Pages

Showing posts with label BigData. Show all posts
Showing posts with label BigData. Show all posts

Saturday, March 31, 2018

Parsing Value from a Json Field in Qubole.

Description of how to extract a value from a JSON field in Hive using the get_json_object function. When the data in one of the fields in the Hive environment is in JSON format, and we need to extract a value out of the JSON, we can use the get_json_object function. For example, if we have a column named jdata containing the following JSON:


get_json_object(column_name, '$.keyvalue')

The column name is : jdata and json the Column is as followes.

{
    "Foo": "ABC",
    "Bar": "20090101100000",
    "Quux": {
        "QuuxId": 1234,
        "QuuxName": "Sam"
    }
}

if we have to extract ABC : get_json_object(jdata, '$.Foo') 

Sunday, August 13, 2017

Qubole : Load Multiple tables to Qubole Hive table from a Data Store

API call to Load Multiple tables from a Qubole Data Store to Hive table. 


[rahul@local qubole]$ cat /databasescript 
#!/bin/bash

#Qubole API Key
AUTH="***********"
#Database Name
DB_NAME="***********"
#Host Name
DB_HOST="***********"
#User Name
DB_USER="***********
#Password 
DB_PASS='***********'

echo $DB_PASS


## request table import from tap;
function tableImport() {

request_body=$(cat <<EOF
{
   "command_type":"DbImportCommand",
   "mode":"1",
   "hive_serde":"orc",
   "hive_table":"<HIVE TABLE NAME>.$1",
   "dbtap_id":"$2",
   "db_table":"$1",
   "db_parallelism":"1",
   "use_customer_cluster":"1",
   "customer_cluster_label":"Qubole_Data_Import",
   "tags":[" Data"]
}
EOF
)

echo $request_body
   curl -X POST \
-H "X-AUTH-TOKEN: $AUTH" \
-H "Content-Type:application/json" \
-d "$request_body" https://api.qubole.com/api/v1.2/commands/
}

##register database with tap
request_body=$(cat <<EOF
{
  "db_name":"$DB_NAME",
  "db_host":"$DB_HOST",
  "db_user":"$DB_USER",
  "db_passwd":"$DB_PASS",
  "db_type":"sqlserver",
  "db_location":"on-premise",
  "gateway_ip": "***********",
  "gateway_port": "***********",
  "gateway_username": "***********",
  "gateway_private_key": "***********"}

EOF
)

echo $KEY
ID=$(curl -s -X POST \
-H "X-AUTH-TOKEN: $AUTH" \
-H "Content-Type:application/json" \
-d "$request_body" https://api.qubole.com/api/v1.2/db_taps/ | jq .id)

#get the tables and call import
curl -s -H "X-AUTH-TOKEN: $AUTH" \
     -H "Content-Type:application/json" \
     https://api.qubole.com/api/v1.2/db_taps/$ID/tables | jq -r .[] | while read x; do  tableImport $x $ID; done

# can't delete the tap at the end unless we continuously poll for no active jobs;
STATUS="null"

while [ "$STATUS" = "null" ]
do
STATUS=$(curl  -s -X DELETE \
 -H "X-AUTH-TOKEN: $AUTH" \
 -H "Content-Type:application/json" \
 https://api.qubole.com/api/v1.2/db_taps/$ID | jq .status)
echo -n "."
sleep 5
done