Sunday, December 3, 2017

Loading a CSV with Spark on Qubole

This is a code snippet using Spark on Qubole to load a CSV file into a DataFrame, register it as a temporary table, and create a permanent table from the data in the temporary table.

The first line of code reads the CSV file from an S3 location into a DataFrame. The options set for the format, delimiter, header, and inferSchema specify how the CSV file should be read and parsed.

val df = sqlContext.read.format("com.databricks.spark.csv")
                    .option("delimiter", "|")
                    .option("header", "true")
                    .option("inferSchema", "true")
                    .load("s3://*****.CSV")
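
Before pointing Spark at the file, it can help to confirm the delimiter and header row on a small local sample. The following is a minimal shell sketch under stated assumptions: the function name show_header_columns and the sample data are hypothetical and not part of the original job.

```shell
#!/usr/bin/env bash
# Print each header column of a delimited file with its position.
# $1 = file path, $2 = delimiter (defaults to "|", matching the Spark option above).
show_header_columns() {
  local file="$1" delim="${2:-|}"
  head -n 1 "$file" | awk -F"$delim" '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Demo on a throwaway sample file (hypothetical data).
sample="$(mktemp)"
printf 'id|name|amount\n1|alice|10.5\n' > "$sample"
show_header_columns "$sample"
rm -f "$sample"
```

If the header splits into the expected column names, the same delimiter and header settings should suit the Spark reader.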

The second line of code registers the DataFrame as a temporary table, which can be used for querying. The table name uses an underscore because an unquoted hyphen would not parse as an identifier in the SQL query that follows.

df.registerTempTable("temp_table")

The third line of code creates a permanent table in a specified database by running a CREATE TABLE ... AS SELECT query against the temporary table. The query copies all columns and rows from the temporary table into the new table. Here, database.table stands in for the target database and table names.

sqlContext.sql("""
create table database.table as
select * from temp_table
""")

Tuesday, November 28, 2017

Increasing swap on an Azure Linux machine

In Azure, to create a swap file in the directory defined by the ResourceDisk.MountPoint parameter, update the /etc/waagent.conf file by setting the following three parameters:

ResourceDisk.Format=y
ResourceDisk.EnableSwap=y
ResourceDisk.SwapSizeMB=xx


Note: The xx placeholder represents the desired size of the swap file in megabytes (MB).

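The three settings can also be applied from a script rather than by hand. This is a minimal sketch, assuming GNU sed (for the -i flag) and a config file made of plain KEY=value lines. The helper names set_conf_value and enable_swap are hypothetical.

```shell
#!/usr/bin/env bash
# set_conf_value FILE KEY VALUE
# Replace an existing "KEY=..." line in FILE, or append one if the key is absent.
set_conf_value() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# enable_swap FILE SIZE_MB
# Apply the three swap-related settings described above.
enable_swap() {
  local file="$1" size_mb="$2"
  set_conf_value "$file" ResourceDisk.Format y
  set_conf_value "$file" ResourceDisk.EnableSwap y
  set_conf_value "$file" ResourceDisk.SwapSizeMB "$size_mb"
}

# Usage on a real machine (as root), here with a 2048 MB swap file:
#   enable_swap /etc/waagent.conf 2048
```

Taking the file path as an argument lets the script be tried on a copy of the file before touching /etc/waagent.conf.
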
Restart the WALinuxAgent service by running one of the following commands, depending on the system in question:

Ubuntu: service walinuxagent restart
Red Hat/CentOS: service waagent restart
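
The choice between the two service names can be made from the distribution ID. This is a small sketch, assuming the ID comes from /etc/os-release (where it is "ubuntu", "rhel", or "centos"). The function name agent_service_name is hypothetical.

```shell
#!/usr/bin/env bash
# Map a distribution ID to the Azure Linux agent service name listed above.
agent_service_name() {
  case "$1" in
    ubuntu)      echo "walinuxagent" ;;
    rhel|centos) echo "waagent" ;;
    *)           echo "unknown distribution ID: $1" >&2; return 1 ;;
  esac
}

# Usage on a real machine (as root):
#   . /etc/os-release
#   service "$(agent_service_name "$ID")" restart
```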


Run one of the following commands to show the new swap space that's in use after the restart:

dmesg | grep swap
swapon -s
cat /proc/swaps
file /mnt/resource/swapfile
free | grep -i swap
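
For a scripted check, the total swap size can be read from /proc/swaps, which lists one swap area per line with its size in kilobytes in the third column. This is a minimal sketch. The function name swap_total_kb is hypothetical, and it assumes swap paths contain no spaces.

```shell
#!/usr/bin/env bash
# Sum the Size column (KB) of a /proc/swaps-style file, skipping the header row.
# $1 = file to read (defaults to /proc/swaps).
swap_total_kb() {
  local file="${1:-/proc/swaps}"
  awk 'NR > 1 { total += $3 } END { print total + 0 }' "$file"
}

# Usage on a real machine:
#   swap_total_kb
# A result of 0 means no swap area is active.
```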


If the swap file isn't created, you can restart the virtual machine by using one of the following commands:

shutdown -r now
init 6

Wednesday, November 22, 2017

Docker Clustering with Swarm in CentOS 7

Docker Swarm is the native clustering and orchestration feature built into Docker. Clustering with Swarm on CentOS 7 means setting up a Swarm manager and one or more Swarm nodes, configuring the network and storage for the cluster, and deploying and scaling Docker services across it. A Swarm cluster provides scalability, high availability, and load balancing for Docker services, and it simplifies the deployment and management of containerized applications.

Installing Docker

mkdir -p /install-files && cd /install-files
wget https://yum.dockerproject.org/repo/main/centos/7/Packages/docker-engine-1.13.1-1.el7.centos.x86_64.rpm
wget https://yum.dockerproject.org/repo/main/centos/7/Packages/docker-engine-selinux-1.13.1-1.el7.centos.noarch.rpm


Package for docker-engine-selinux

yum install -y policycoreutils-python
rpm -i docker-engine-selinux-1.13.1-1.el7.centos.noarch.rpm

Package for docker-engine

yum install -y libtool-ltdl libseccomp
rpm -i docker-engine-1.13.1-1.el7.centos.x86_64.rpm

Remove rpm packages

rm -f docker-engine-*

Enable systemd service

systemctl enable docker

Start docker

systemctl start docker

Enabling Firewall Rules with firewalld

firewall-cmd --get-active-zones
firewall-cmd --list-all
firewall-cmd --zone=public --add-port=2377/tcp --permanent
firewall-cmd --permanent --add-source=192.168.56.0/24
firewall-cmd --permanent --add-port=2377/tcp
firewall-cmd --permanent --add-port=7946/tcp
firewall-cmd --permanent --add-port=7946/udp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload
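
The ports opened above are the ones Swarm uses: 2377/tcp for cluster management, 7946/tcp and 7946/udp for communication among nodes, and 4789/udp for overlay network traffic. The same rules can be generated in a loop. This is a minimal sketch that only prints the commands, so it is safe to run anywhere. The function name swarm_firewall_commands is hypothetical.

```shell
#!/usr/bin/env bash
# Print the firewall-cmd invocations for the Swarm ports, followed by a reload.
# Nothing is executed here; the commands are only echoed.
swarm_firewall_commands() {
  local port
  for port in 2377/tcp 7946/tcp 7946/udp 4789/udp; do
    echo "firewall-cmd --permanent --add-port=${port}"
  done
  echo "firewall-cmd --reload"
}

swarm_firewall_commands
```

To apply the rules on a real host, the printed lines could be reviewed and then piped to a root shell.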

Enable and restart the systemd service

systemctl enable docker
systemctl restart docker

Docker Cluster Env

docker swarm init --advertise-addr=192.168.56.101

Swarm initialized: current node (b4b79zi3t1mq1572r0iubxdhc) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-1wcz7xfyvhewvj3dd4wcbhufw4lub3b1vgpuoybh90myzookbf-4ksxoxrilifb2tmvuligp9krs \
    192.168.56.101:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

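To join as a Swarm manager, print the manager join command on an existing manager, then run the printed command on the new node: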

docker swarm join-token manager

    docker swarm join \
    --token SWMTKN-1-10cqx6yryq5kyfe128m2xhyxzplsc90lzksqggmscv1nfipsbb-bfdbvfhuw9sg8mx2i1a4rkvlv \
    192.168.56.101:2377
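
When adding several nodes, the join command can be assembled from a token and the manager address instead of being copied from the output. This is a minimal sketch. The function name swarm_join_command is hypothetical, and port 2377 is assumed as the default because it is the port shown in the output above.

```shell
#!/usr/bin/env bash
# Build a "docker swarm join" command line.
# $1 = join token, $2 = manager IP, $3 = port (defaults to 2377).
swarm_join_command() {
  local token="$1" manager_ip="$2" port="${3:-2377}"
  echo "docker swarm join --token ${token} ${manager_ip}:${port}"
}

# Usage on a real manager, where -q prints only the token:
#   token="$(docker swarm join-token -q worker)"
#   swarm_join_command "$token" 192.168.56.101
```

Passing the manager token (from docker swarm join-token -q manager) instead of the worker token produces the command for joining as a manager.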