Showing posts with label Qubole. Show all posts

Sunday, April 15, 2018

Enabling Hive Authorization in Qubole

Once Hive authorization is enabled in Qubole, users and permissions have to be managed through Hive authorization. The following are some of the commands used for this.

1. Listing the Current Roles

Set role admin;
show roles;

2. Create the roles

CREATE ROLE <role_name>;
Creates a new role. Only the admin role has privilege for this.

Set role admin;
Create role sysadmin;

3. Grant Role to users

GRANT ROLE <role_name> TO USER <user_name>;

Set role admin;
Grant Role sysadmin to user rahul;

4. Revoke a role from user

REVOKE ROLE <role_name> FROM USER <user_name>;

Set role admin;
REVOKE Role sysadmin from user rahul;

5. List Roles attached to a user

SHOW ROLE GRANT USER <user_name>;

Set role admin;
show role grant user `rahul`;

6. List Users under a role

SHOW PRINCIPALS <role_name>;

Set role admin;
show principals sysadmin;

7. Assign Role access to tables

Sample permissions:
SELECT privilege: provides read access to an object (table).
INSERT privilege: allows adding data to an object (table).
UPDATE privilege: allows running UPDATE queries on an object (table).
DELETE privilege: allows deleting data in an object (table).
ALL privilege: grants all of the above privileges.

GRANT <Permission> ON <table_name> TO ROLE <role_name>;

Grant all on default.testtable to role sysadmin;

8. View Role/user Permissions on tables

Check the privileges that a specific user or role has been granted on a table

SHOW GRANT USER <user_name> ON <table_name|ALL>;
SHOW GRANT ROLE <role_name> ON <table_name|ALL>;

SHOW GRANT user analytics on all;
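
The role commands above are usually run together when a new group of users is onboarded. As a rough sketch, the shell function below prints the statements for one role in the order used in this post. The function name role_setup_sql is hypothetical, and how the generated statements get submitted to Hive is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the HiveQL needed to create a role,
# assign it to users, and grant it one privilege on one table.
# Usage: role_setup_sql <role> <privilege> <table> <user>...
role_setup_sql() {
  local role="$1" privilege="$2" table="$3"
  shift 3
  echo "SET ROLE admin;"
  echo "CREATE ROLE ${role};"
  local user
  for user in "$@"; do
    echo "GRANT ROLE ${role} TO USER ${user};"
  done
  echo "GRANT ${privilege} ON ${table} TO ROLE ${role};"
}

# Example with the names used in this post
role_setup_sql sysadmin ALL default.testtable rahul
```

Passing more user names after the table adds one GRANT ROLE line per user.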

Saturday, March 31, 2018

Parsing a Value from a JSON Field in Qubole

When the data in one of the fields in Hive is in JSON format and we need to extract a value from it, we can use the get_json_object function:

get_json_object(column_name, '$.keyvalue')

For example, suppose the column is named jdata and contains the following JSON:

{
    "Foo": "ABC",
    "Bar": "20090101100000",
    "Quux": {
        "QuuxId": 1234,
        "QuuxName": "Sam"
    }
}

To extract ABC: get_json_object(jdata, '$.Foo')
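
Nested keys follow the same pattern, with each level separated by a dot in the path. A small sketch that prints the query for a given column and key path is shown below. The function name json_extract_sql and the table name my_table are made up for illustration.

```shell
# Hypothetical helper: prints a SELECT that extracts one JSON path from a column.
# Usage: json_extract_sql <column> <key_path> <table>
json_extract_sql() {
  printf "SELECT get_json_object(%s, '\$.%s') FROM %s;\n" "$1" "$2" "$3"
}

# Top-level key, as in the example above
json_extract_sql jdata Foo my_table
# Nested key: QuuxName inside Quux
json_extract_sql jdata Quux.QuuxName my_table
```

With the sample JSON above, the first query would return ABC and the second would return Sam.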

Sunday, December 3, 2017

Qubole: Load CSV with Spark

This is a code snippet using Spark on Qubole to load a CSV file into a DataFrame, register it as a temporary table, and create a permanent table from the data in the temporary table.

The first line of code reads the CSV file from an S3 location into a DataFrame. The options set for the format, delimiter, header, and inferSchema specify how the CSV file should be read and parsed.

val df = sqlContext.read.format("com.databricks.spark.csv")
                    .option("delimiter", "|")
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .load("<S3 PATH TO CSV FILE>")
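
Since the reader is told to expect a header row and a | delimiter, it can help to check the file before uploading it to S3. The sketch below writes a tiny sample file and confirms that every row has as many fields as the header. The column names and the function name check_delimited are assumptions for illustration, and the check does not handle quoted fields that contain the delimiter.

```shell
# Sketch: write a small pipe-delimited sample (hypothetical columns)
# in the layout that the options above describe.
sample_file="$(mktemp)"
cat > "$sample_file" <<'EOF'
id|name|signup_date
1|Sam|2017-12-01
2|Alex|2017-12-02
EOF

# Prints "ok" when every row has the header's field count,
# otherwise one line per mismatching row.
check_delimited() {
  awk -F'|' 'NR == 1 { n = NF; next }
             NF != n { print "line " NR ": " NF " fields, expected " n; bad = 1 }
             END     { if (!bad) print "ok" }' "$1"
}

check_delimited "$sample_file"
```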

The second line of code registers the DataFrame as a temporary table, which can be used for querying.

df.registerTempTable("temp_table")

The third line of code creates a permanent table in the specified database by running a SQL query on the temporary table. The query selects all columns and rows from the temporary table and creates a new table with the same data.

create table database.table as
select * from temp_table

Sunday, August 13, 2017

Qubole: Load Multiple Tables into Qubole Hive from a Data Store

The script below uses Qubole API calls to load multiple tables from a Qubole Data Store into Hive tables.

[rahul@local qubole]$ cat /databasescript 

#Qubole API Key
#Database Name
#Host Name
#User Name

echo $DB_PASS

## Request a table import from the tap. $1 = table name, $2 = tap ID.
function tableImport() {

request_body=$(cat <<EOF
{
   ...
   "hive_table":"<HIVE TABLE NAME>.$1",
   "tags":[" Data"]
}
EOF
)

echo "$request_body"
   curl -X POST \
-H "X-AUTH-TOKEN: $AUTH" \
-H "Content-Type:application/json" \
-d "$request_body" "<COMMANDS API URL>"
}

## Register the database as a tap
request_body=$(cat <<EOF
{
  ...
  "gateway_ip": "***********",
  "gateway_port": "***********",
  "gateway_username": "***********",
  "gateway_private_key": "***********"}
EOF
)

echo $KEY
ID=$(curl -s -X POST \
-H "X-AUTH-TOKEN: $AUTH" \
-H "Content-Type:application/json" \
-d "$request_body" "<DB TAP API URL>" | jq .id)

# Get the tables from the tap and call the import for each one
curl -s -H "X-AUTH-TOKEN: $AUTH" \
     -H "Content-Type:application/json" \
     "<DB TAP API URL>/$ID/tables" | jq -r '.[]' | while read -r x; do tableImport "$x" "$ID"; done

# The tap can't be deleted while jobs are still active, so keep retrying the delete until it returns a status.

STATUS=null
while [ "$STATUS" = "null" ]
do
STATUS=$(curl  -s -X DELETE \
 -H "X-AUTH-TOKEN: $AUTH" \
 -H "Content-Type:application/json" \
 "<DB TAP API URL>/$ID" | jq .status)
echo -n "."
sleep 5
done
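
A loop like the one above runs forever if the delete never succeeds, so a capped variant may be safer. The sketch below retries any command until it prints something other than null, and gives up after a fixed number of attempts. The names poll_until_not_null and fake_delete are hypothetical; fake_delete only stands in for the real DELETE call so that the example runs without the API.

```shell
# Sketch: retry a command until it prints something other than "null",
# giving up after a fixed number of attempts.
# Usage: poll_until_not_null <max_attempts> <delay_seconds> <command> [args...]
poll_until_not_null() {
  local max="$1" delay="$2"
  shift 2
  local attempt=1 result
  while [ "$attempt" -le "$max" ]; do
    result="$("$@")"
    if [ "$result" != "null" ]; then
      echo "$result"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Stand-in for the DELETE call: answers "null" twice, then a quoted status.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
fake_delete() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  if [ "$n" -lt 3 ]; then echo "null"; else echo '"done"'; fi
}

poll_until_not_null 5 0 fake_delete
```

In the real script the command would be a small function wrapping the curl DELETE call, with a delay of 5 seconds as in the loop above.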