Sunday, December 3, 2017

Qubole: Load CSV with Spark

This is a code snippet using Spark on Qubole to load a CSV file into a DataFrame, register it as a temporary table, and create a permanent table from the data in the temporary table.

The first line of code reads the CSV file from an S3 location into a DataFrame. The options set for the format, delimiter, header, and inferSchema specify how the CSV file should be read and parsed.

val df ="com.databricks.spark.csv")
                    .option("delimiter", "|")
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .load("s3://<bucket>/<path>/file.csv")  // the S3 path here is a placeholder

The second line of code registers the DataFrame as a temporary table, which can then be queried with SQL (the name temp_table below is a placeholder matching the query that follows):

df.registerTempTable("temp_table")

The third line of code creates a permanent table in a specified database by running a CREATE TABLE AS SELECT query against the temporary table, copying all of its rows and columns into the new table.

create table database.table as
select * from temp_table

Tuesday, November 28, 2017

Increase swap in an Azure Linux machine

In Azure, to create a swap file in the directory defined by the ResourceDisk.MountPoint parameter, update the /etc/waagent.conf file by setting the following three parameters:

ResourceDisk.Format=y
ResourceDisk.EnableSwap=y
ResourceDisk.SwapSizeMB=xx

Note: The xx placeholder represents the desired number of megabytes (MB) for the swap file.
Restart the WALinuxAgent service by running one of the following commands, depending on the system in question:

Ubuntu: service walinuxagent restart
Red Hat/CentOS: service waagent restart

Run one of the following commands to show the new swap space that's being used after the restart:

dmesg | grep swap
swapon -s
cat /proc/swaps
file /mnt/resource/swapfile
free | grep -i swap

If the swap file isn't created, you can restart the virtual machine by using one of the following commands:

shutdown -r now
init 6

Wednesday, November 22, 2017

Docker Clustering with Swarm in CentOS 7

Docker Swarm is Docker's native clustering and orchestration feature, and it lets you turn a set of CentOS 7 Docker hosts into a single cluster. The process involves setting up a Swarm manager and one or more worker nodes, configuring the network and storage for the cluster, and deploying and scaling Docker services across it. Clustering Docker hosts this way brings scalability, high availability, and load balancing of Docker services, along with simpler management and deployment of containerized applications.

Installing Docker

mkdir /install-files ; cd /install-files

Package for docker-engine-selinux
yum install -y policycoreutils-python
rpm -i docker-engine-selinux-1.13.1-1.el7.centos.noarch.rpm
Package for docker-engine
yum install -y libtool-ltdl libseccomp
rpm -i docker-engine-1.13.1-1.el7.centos.x86_64.rpm
Remove rpm packages
rm docker-engine-* -f
Enable systemd service
systemctl enable docker
Start docker

systemctl start docker

Firewalld: Enabling Firewall Rules

firewall-cmd --get-active-zones
firewall-cmd --list-all
firewall-cmd --zone=public --add-port=2377/tcp --permanent
firewall-cmd --permanent --add-source=
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload
Enable and Restart systemd service
systemctl enable docker;
systemctl restart docker
Docker Cluster Env

docker swarm init --advertise-addr=

Swarm initialized: current node (b4b79zi3t1mq1572r0iubxdhc) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-1wcz7xfyvhewvj3dd4wcbhufw4lub3b1vgpuoybh90myzookbf-4ksxoxrilifb2tmvuligp9krs \

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

To join as a Swarm manager

docker swarm join-token manager

  docker swarm join \
    --token SWMTKN-1-10cqx6yryq5kyfe128m2xhyxzplsc90lzksqggmscv1nfipsbb-bfdbvfhuw9sg8mx2i1a4rkvlv \
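
Once the manager is initialized and the workers have joined, the cluster state can be checked from the manager node and a test service deployed; this is a minimal sketch, with the stock nginx image used purely as an example:

docker node ls
docker service create --name web --replicas 2 --publish 80:80 nginx
docker service ls
docker service ps web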

Sunday, November 5, 2017

Creating CSR with multiple Domains With Openssl

Creating a CSR (Certificate Signing Request) with multiple domains using OpenSSL involves generating a private key and a CSR file, which includes the details of the domain(s) to be included in the certificate. The process involves the following steps:

Generate a private key using the openssl command with the following syntax:

openssl genrsa -out domain.key 2048

This generates a 2048-bit private key in a file named "domain.key".

Create a configuration file (e.g. domain.conf) that contains the details of the domains to be included in the certificate. This file should contain the following details:

[req]
default_bits       = 2048
default_keyfile    = domain.key
distinguished_name = req_distinguished_name
req_extensions     = req_ext

[req_distinguished_name]
countryName             = Country Name (2 letter code)
stateOrProvinceName     = State or Province Name (full name)
localityName            = Locality Name (eg, city)
organizationName        = Organization Name (eg, company)
commonName              = Common Name (e.g. server FQDN or YOUR name)
emailAddress            = Email Address

[req_ext]
subjectAltName          = @alt_names

[alt_names]
DNS.1                  =
DNS.2                  =
DNS.3                  =

In the example above, the DNS.1, DNS.2, and DNS.3 entries under [alt_names] list the alternate domain names to be included in the certificate.

Generate a CSR file using the openssl command with the following syntax:

openssl req -new -sha256 -key domain.key -out domain.csr -config domain.conf

This generates a CSR file named "domain.csr" that is signed with the private key and includes the alternate domain names specified in the configuration file.
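
To confirm that the alternate names were picked up, the CSR can be inspected with OpenSSL before submitting it:

openssl req -in domain.csr -noout -text | grep -A1 "Subject Alternative Name"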

Submit the CSR file to a Certificate Authority (CA) to obtain a signed SSL certificate that can be installed on the server.

Overall, this process allows for the creation of a CSR file with multiple domain names that can be used to obtain a signed SSL certificate to secure those domains.

Tuesday, October 24, 2017

docker: 'stack' is not a docker command.

The error message "docker: 'stack' is not a docker command" means that the Docker version in use does not support the "stack" command, and the fix is to upgrade Docker to version 1.13 or higher. Here we upgrade to 1.13 by downloading the required RPM packages from the Docker project repository, installing them with "rpm -i", and then enabling and starting the Docker service with "systemctl enable docker" and "systemctl start docker". We hit the error while deploying services with the stack deploy command:

docker stack deploy -c docker-compose.yml appslab
docker: 'stack' is not a docker command.
See 'docker --help'.

Upgrade docker to 1.13

In CentOS 7 we used the following to get Docker upgraded. The docker-latest package in CentOS 7 is now at 1.13.

#package for docker-engine-selinux
yum install -y policycoreutils-python
rpm -i docker-engine-selinux-1.13.1-1.el7.centos.noarch.rpm

#package for docker-engine
yum install -y libtool-ltdl libseccomp
rpm -i docker-engine-1.13.1-1.el7.centos.x86_64.rpm

#remove rpm packages
rm docker-engine-* -f

#enable systemd service
systemctl enable docker

#start docker

systemctl start docker
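
After the upgrade, the new version can be confirmed and the stack deployment retried with the same compose file:

docker --version
docker stack deploy -c docker-compose.yml appslab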

Friday, September 22, 2017

Fedora 26 + VirtualBox 5.1 + kernel 4.12

Upgrading your virtual machine (VM) environment can sometimes lead to unexpected issues. A common problem users might encounter after upgrading VirtualBox is the VM failing to start. In this post, we'll walk through a specific error and provide a step-by-step guide to resolve it, ensuring your virtual environment gets back up and running smoothly.

Understanding the Error

Upon attempting to start a VM after an upgrade, you might encounter an error in your logs similar to this:

/tmp/vbox.0/r0drv/linux/memuserkernel-r0drv-linux.o: warning: objtool: .fixup: unexpected end of section if [ "-pg" = "-pg" ]; then if [ /tmp/vbox.0/r0drv/linux/memuserkernel-r0drv-linux.o != "scripts/mod/empty.o" ]; then ./scripts/recordmcount "/tmp/vbox.0/r0drv/linux/memuserkernel-r0drv-linux.o"; fi; fi; make[1]: *** [Makefile:1519: _module_/tmp/vbox.0] Error 2 make: *** [Makefile:304: vboxdrv] Error 2

This error typically indicates a problem with the VirtualBox kernel modules not compiling or loading correctly due to incompatibilities or issues within the system.

Step-by-Step Solution

Fear not, as this issue can often be resolved by applying a patch to the VirtualBox source. Here's how you can fix it:

1. Change to the VirtualBox Source Directory:

Navigate to the directory where VirtualBox sources are stored:

cd /usr/share/virtualbox/src

2. Obtain the Necessary Patch:

Download the patch designed to fix the issue:

sudo wget

3. Apply the Patch:

Apply the downloaded patch to the VirtualBox source:

sudo patch -Np0 < 20170629003423

4. Reconfigure VirtualBox:

After applying the patch, you need to reconfigure VirtualBox to make sure it recognizes the changes:
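
Assuming a standard Oracle VirtualBox 5.x install, the kernel modules are usually rebuilt with the bundled script, for example:

sudo /usr/lib/virtualbox/ setup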


Post-Solution Tips:

Have Fun! Now that you've resolved the issue, your VMs should start as expected. Dive back into your virtual environment and continue your work or play.

Stay Updated: Keep your system and VirtualBox updated to avoid similar issues in the future. Developers regularly release patches and updates to address known bugs and compatibility issues.

Seek Community Help: If you encounter further issues or the problem persists, don't hesitate to seek help from the VirtualBox community forums or check out other user experiences for additional insights.


Encountering errors after a system or software upgrade can be frustrating, but with the right approach and resources, most issues can be resolved. By understanding the error, carefully following the provided steps, and engaging with the community, you can overcome challenges and enjoy a seamless virtualization experience with VirtualBox. Keep exploring, learning, and sharing your knowledge with others!

Friday, September 8, 2017

Minio Running as Service

Minio is a distributed object storage server, similar to Amazon S3, that allows you to store and access large amounts of data. Since the service is running on different hosts, it is important to have a shared storage mechanism so that the data is synchronized across all nodes. To achieve this, a bind mount is used to mount a directory on the host machine to the Minio server container, allowing it to read and write data to the directory. Additionally, two Docker secrets are created for access and secret keys to authenticate and authorize access to the Minio server. Finally, the service is created with the docker service create command, specifying the name of the service, the port to publish, the constraint to run the service only on a manager node, the bind mount for data synchronization, and the two Docker secrets for authentication. The minio/minio image is used to run the Minio server, and the /data directory is specified as the location to store data.

echo "AKIAIOSFODNN7EXAMPLE" | docker secret create access_key -
echo "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" | docker secret create secret_key -

docker service create --name="minio-service" --publish 9000:9000   --constraint 'node.role == manager' --mount type=bind,src=/mnt/minio/,dst=/data --secret="access_key" --secret="secret_key" minio/minio server /data
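
Once created, the service status can be checked from a manager node:

docker service ls
docker service ps minio-service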

Wednesday, September 6, 2017

Minio: S3 Compatible Storage in Docker

Minio is a distributed object storage server that is designed to be scalable and highly available. It is built for cloud-native applications and DevOps. Minio provides Amazon S3 compatible API for cloud-native applications to store and retrieve data. It is open-source and can be deployed on-premise, on the cloud or on Kubernetes.

The command docker pull minio/minio pulls the Minio image from Docker Hub. The command docker run -p 9000:9000 minio/minio server /data runs a Minio container with port forwarding from the host to the container for the Minio web interface. The /data parameter specifies the path to the data directory that will be used to store the data on the container's file system.

**We need to have the docker env up and running.

docker pull minio/minio
docker run -p 9000:9000 minio/minio server /data

After running this command, you can access the Minio web interface by navigating to http://localhost:9000 in your web browser.
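
Because Minio speaks the S3 API, any S3 client can also be pointed at it; for example, with the AWS CLI and the access/secret keys that Minio prints at startup (the key values below are placeholders):

aws configure set aws_access_key_id <minio-access-key>
aws configure set aws_secret_access_key <minio-secret-key>
aws --endpoint-url http://localhost:9000 s3 mb s3://test-bucket
aws --endpoint-url http://localhost:9000 s3 ls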

Thursday, August 17, 2017

Increase the Root Disk Size for CentOS in AWS

Issue: Root Partition not scaled after EBS is resized.

Growpart called by cloud-init only works for kernels >3.8. Only newer kernels support changing the partition size of a mounted partition. When using an older kernel the resizing of the root partition happens in the initrd stage before the root partition is mounted and the subsequent cloud-init growpart run is a no-op.

# lsblk
xvda    202:0    0  30G  0 disk
└─xvda1 202:1    0   8G  0 part /

Perform the following commands as root:

# yum install cloud-utils-growpart

# growpart /dev/xvda 1

# reboot
After the reboot:

# lsblk
xvda    202:0    0  30G  0 disk
└─xvda1 202:1    0  30G  0 part /
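
Assuming cloud-init's resizefs module runs at boot (as it does on the stock CentOS images), the filesystem on the root partition is grown along with it, which can be confirmed with:

# df -h /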

Sunday, August 13, 2017

Qubole : Load Multiple tables to Qubole Hive table from a Data Store

API calls to load multiple tables from a Qubole Data Store into Hive tables.

[rahul@local qubole]$ cat /databasescript 

#Qubole API Key
#Database Name
#Host Name
#User Name

echo $DB_PASS

## request table import from tap;
function tableImport() {

request_body=$(cat <<EOF
{
   "hive_table":"<HIVE TABLE NAME>.$1",
   "tags":[" Data"]
}
EOF
)

echo $request_body
   curl -X POST \
-H "Content-Type:application/json" \
-d "$request_body"
}

##register database with tap
request_body=$(cat <<EOF
{
  "gateway_ip": "***********",
  "gateway_port": "***********",
  "gateway_username": "***********",
  "gateway_private_key": "***********"
}
EOF
)


echo $KEY
ID=$(curl -s -X POST \
-H "Content-Type:application/json" \
-d "$request_body" | jq .id)

#get the tables and call import
curl -s -H "X-AUTH-TOKEN: $AUTH" \
     -H "Content-Type:application/json" \$ID/tables | jq -r .[] | while read x; do  tableImport $x $ID; done

# can't delete the tap at the end unless we continuously poll for no active jobs;

while [ "$STATUS" = "null" ]
STATUS=$(curl  -s -X DELETE \
 -H "Content-Type:application/json" \$ID | jq .status)
echo -n "."
sleep 5

Thursday, July 6, 2017

GrayLog Configuration Error : Please verify that the server is healthy and working correctly.

First, we need to make sure that Elasticsearch is running fine.

Then check the Graylog configuration and make sure the entries for the following attributes are correct:

rest_listen_uri =
web_listen_uri =
rest_transport_uri =
web_endpoint_uri =
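
As an illustration only (the addresses below are hypothetical placeholders, and the exact format depends on the Graylog version), these entries usually point at an address the clients can reach, for example:

rest_listen_uri = http://<public-ip>:9000/api/
web_listen_uri = http://<public-ip>:9000/
rest_transport_uri = http://<public-ip>:9000/api/
web_endpoint_uri = http://<public-ip-or-lb>:9000/api/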

** In AWS/Azure, make sure we give the server's public IP or the load balancer's IP.

Friday, June 30, 2017

Reduce TIME_WAIT socket connections

We will reduce the number of TIME_WAIT sockets by tweaking sysctl so that they time out sooner and can be reused.

List the number of TIME_WAIT and ESTABLISHED connections:

netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n

cat /proc/sys/net/ipv4/tcp_fin_timeout
cat /proc/sys/net/ipv4/tcp_tw_recycle
cat /proc/sys/net/ipv4/tcp_tw_reuse

If you have default settings, you'll probably see values of 60, 0 and 0. Let's change those values to 30, 1, 1.

Now, edit the /etc/sysctl.conf with your favorite editor and add these lines to the end of it (or edit the values you have in yours if they exist already):

# Decrease TIME_WAIT seconds
net.ipv4.tcp_fin_timeout = 30

# Recycle and Reuse TIME_WAIT sockets faster
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1

sysctl -p

netstat -nat | awk '{print $6}' | sort | uniq -c | sort -n

Tuesday, June 6, 2017

ELK: JSON Data not Logged Correctly in Elasticsearch

Data written to S3 from Logstash is in the format:
2016-12-08T21:55:36.381Z %{host} %{message}
2016-12-08T21:55:36.385Z %{host} %{message}
2016-12-08T21:55:36.385Z %{host} %{message}
2016-12-08T21:55:36.390Z %{host} %{message}
2016-12-08T21:55:36.391Z %{host} %{message}
2016-12-08T21:55:36.421Z %{host} %{message}
2016-12-08T21:55:36.421Z %{host} %{message}
2016-12-08T21:55:36.421Z %{host} %{message}
What happens here is that the default plain codec is being used for the S3 output from Logsearch. In the configuration for Custom Logstash outputs, you should use the JSON Lines Codec. There are more codecs you can use which are listed here.
You can add the codec by adding the json_lines codec to your Custom Logstash Outputs Configuration in the Logstash tile settings. Your configuration should look like the following:
output {
    s3 {
        access_key_id => "****************"
        secret_access_key => "*********************"
        region => "region name"
        bucket => "bucket-name"
        time_file => 15
        codec => "json_lines"
    }
}
After adding the json_lines codec, your S3 bucket Logstash entries should look more like this:
{"@timestamp":"2016-12-12T15:58:37.000Z","port":34854,"@type":"CounterEvent","@message":"{\"cf_origin\":\"firehose\",\"delta\":65,\"deployment\":\"cf\",\"event_type\":\"CounterEvent\",\"index\":\"9439da9a-fb72-4064-839f-934d4e8a6a5c\",\"ip\":\"\",\"job\":\"router\",\"level\":\"info\",\"msg\":\"\",\"name\":\"udp.sentMessageCount\",\"origin\":\"MetronAgent\",\"time\":\"2016-12-12T15:58:37Z\",\"total\":5257491}","syslog_pri":"6","syslog_pid":"6229","@raw":"<6>2016-12-12T15:58:37Z f7643aae-c011-4715-a88b-2333aaf770ab doppler[6229]: {\"cf_origin\":\"firehose\",\"delta\":65,\"deployment\":\"cf\",\"event_type\":\"CounterEvent\",\"index\":\"9439da9a-fb72-4064-839f-934d4e8a6a5c\",\"ip\":\"\",\"job\":\"router\",\"level\":\"info\",\"msg\":\"\",\"name\":\"udp.sentMessageCount\",\"origin\":\"MetronAgent\",\"time\":\"2016-12-12T15:58:37Z\",\"total\":5257491}","tags":["syslog_standard","firehose","CounterEvent"],"syslog_severity_code":6,"syslog_facility_code":0,"syslog_facility":"kernel","syslog_severity":"informational","@source":{"host":"f7643aae-c011-4715-a88b-2333aaf770ab","deployment":"cf","job":"router","ip":"","program":"doppler","index":9439,"vm":"router/9439"},"@level":"INFO","CounterEvent":{"delta":65,"name":"udp.sentMessageCount","origin":"MetronAgent","total":5257491}}

Saturday, April 29, 2017

Checking Oracle Database Connection from Java

import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class OracleSample {

    public static final String DBURL = "jdbc:oracle:thin:@***.***.***.***:1521:oracledatabase";
    public static final String DBUSER = "username";
    public static final String DBPASS = "Oracle8521";

    public static void main(String[] args) throws SQLException {
        // Load Oracle JDBC Driver
        DriverManager.registerDriver(new oracle.jdbc.OracleDriver());
        // Connect to Oracle Database
        Connection con = DriverManager.getConnection(DBURL, DBUSER, DBPASS);

        Statement statement = con.createStatement();

        // Execute a SELECT query on Oracle Dummy DUAL Table. Useful for retrieving system values
        // Enables us to retrieve values as if querying from a table
        ResultSet rs = statement.executeQuery("SELECT SYSDATE FROM DUAL");
        if ( {
            Date currentDate = rs.getDate(1); // get first column returned
            System.out.println("Current Date from Oracle is : " + currentDate);
        }
        con.close();
    }
}

>># javac -cp "./ojdbc7.jar:."
>># java -cp "./ojdbc7.jar:." OracleSample
Current Date from Oracle is : 2017-02-09

Tuesday, January 17, 2017

Kibana Authentication with Nginx on Centos

Kibana doesn't support authentication or restricting access to dashboards by default. We can restrict access to Kibana 4 by using Nginx as a proxy in front of Kibana.

Install nginx server:
To install Nginx using yum we need to include the Nginx repository, install the Nginx repository using,
rpm -Uvh
Install Nginx and httpd-tools by issuing the following command,
yum -y install nginx httpd-tools
Create a password file for basic authentication of HTTP users; this enables password-protected access to the Kibana portal. Replace "admin" with your own user name.
htpasswd -c /etc/nginx/conf.d/kibana.htpasswd admin
Configure Nginx:
Create a configuration file named kibana.conf in the /etc/nginx/conf.d directory
vi /etc/nginx/conf.d/kibana.conf
Place the following content in the kibana.conf file, assuming that both Kibana and Nginx are installed on the same server:

server {
    listen *:8080;
    server_name 192.168.01;
    access_log /var/log/nginx/kibana-access.log;
    error_log /var/log/nginx/kibana-error.log;
    location / {
        auth_basic "Restricted Access";
        auth_basic_user_file /etc/nginx/conf.d/kibana.htpasswd;
        proxy_pass http://192.168.01:5601;
        #proxy_connect_timeout 150;
        #proxy_send_timeout 100;
        #proxy_read_timeout 100;
    }
}

Restart nginx server:
sudo service nginx restart
Go to the URL http://192.168.01:8080; we should get an authentication prompt on successful setup.
If nothing is showing up, check the logs and see whether you have encountered an error like the one below:
2015/08/11 22:31:13 [crit] 80274#0: *3 connect() to failed (13: Permission denied) while connecting to upstream, client:, server:, request: "GET / HTTP/1.1", upstream: "", host: ""
Error Resolution:
This is happening because SELinux is enabled on the machine.
Allow Nginx to make network connections under SELinux by setting the following boolean:
sudo setsebool -P httpd_can_network_connect 1
Restart nginx:
sudo service nginx restart
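
To quickly verify the proxy and the basic authentication from the command line (substitute the server address and the user created earlier):

curl -I -u admin:<password> http://<server-ip>:8080/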