Sunday, July 22, 2018

Deploying Kafka on Ubuntu

Apache Kafka is a distributed message broker designed to handle large volumes of real-time data efficiently. Unlike traditional brokers such as ActiveMQ and RabbitMQ, Kafka runs as a cluster of one or more servers, which makes it highly scalable. This distributed design also gives it built-in fault tolerance and higher throughput than its counterparts.

Implementation of Single Node Kafka

Installing Java

sudo apt-get update
sudo apt-get install default-jre

Installing Zookeeper

sudo apt-get install zookeeperd

Create a service User for Kafka

sudo adduser --system --no-create-home --disabled-password --disabled-login kafka

Download Kafka

cd ~
curl https://kafka.apache.org/KEYS | gpg --import
wget https://dist.apache.org/repos/dist/release/kafka/1.0.1/kafka_2.12-1.0.1.tgz
wget https://dist.apache.org/repos/dist/release/kafka/1.0.1/kafka_2.12-1.0.1.tgz.asc
gpg --verify kafka_2.12-1.0.1.tgz.asc kafka_2.12-1.0.1.tgz
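
The verification step needs two files: the tarball and the detached signature that signs it. As a minimal sketch, the helper below prints both URLs for a given release so the pair always matches. The function name kafka_release_urls is hypothetical, and the URL layout is assumed to be the same as in the commands above.

```shell
# Hypothetical helper: print the tarball URL and its signature URL for one release.
# Assumes the dist.apache.org layout used in the commands above.
kafka_release_urls() {
    version="$1"   # Kafka version, e.g. 1.0.1
    scala="$2"     # Scala build, e.g. 2.12
    base="https://dist.apache.org/repos/dist/release/kafka/${version}"
    file="kafka_${scala}-${version}.tgz"
    printf '%s\n%s\n' "${base}/${file}" "${base}/${file}.asc"
}

# Example: download both files for the release used in this post
# kafka_release_urls 1.0.1 2.12 | xargs -n 1 wget
```

gpg --verify exits with a non-zero status when the signature does not match, so a script can stop before extracting anything.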

Create a directory for extracting Kafka

sudo mkdir /opt/kafka
sudo tar -xvzf kafka_2.12-1.0.1.tgz --directory /opt/kafka --strip-components 1

Delete Kafka tarball and .asc file

rm -rf kafka_2.12-1.0.1.tgz kafka_2.12-1.0.1.tgz.asc

Configuring Kafka Server

Set up Kafka to start automatically at boot

Copy the following init script to /etc/init.d/kafka:
======***
#!/bin/bash
DAEMON_PATH=/opt/kafka/bin
DAEMON_NAME=kafka
# Check that networking is up.
#[ ${NETWORKING} = "no" ] && exit 0

PATH=$PATH:$DAEMON_PATH

# See how we were called.
case "$1" in
 start)
       # Start daemon.
       echo "Starting $DAEMON_NAME";
       nohup $DAEMON_PATH/kafka-server-start.sh -daemon /opt/kafka/config/server.properties
       ;;
 stop)
       # Stop daemons.
       echo "Shutting down $DAEMON_NAME";
       pid=`ps ax | grep -i 'kafka.Kafka' | grep -v grep | awk '{print $1}'`
       if [ -n "$pid" ]
         then
         kill -9 $pid
       else
         echo "Kafka was not Running"
       fi
       ;;
 restart)
       $0 stop
       sleep 2
       $0 start
       ;;
 status)
       pid=`ps ax | grep -i 'kafka.Kafka' | grep -v grep | awk '{print $1}'`
       if [ -n "$pid" ]
         then
         echo "Kafka is Running as PID: $pid"
       else
         echo "Kafka is not Running"
       fi
       ;;
 *)
       echo "Usage: $0 {start|stop|restart|status}"
       exit 1
esac

exit 0
======***
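
The stop branch of the script signals the broker by PID. A stop routine can also give the broker time to shut down cleanly before forcing it. The sketch below shows one way to do that: it sends SIGTERM, waits, and uses SIGKILL only as a last resort. The function name stop_gracefully and the 30-second default are assumptions for illustration, not part of the original script.

```shell
# Hypothetical helper: ask a process to exit, and force it only if it does not.
# Returns 0 if the process is gone after SIGTERM, 1 if SIGKILL was needed.
stop_gracefully() {
    pid="$1"
    timeout="${2:-30}"   # seconds to wait; 30 is an assumed default
    # If the signal cannot be delivered, the process is usually already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # process has exited
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$pid" 2>/dev/null                # last resort
    return 1
}
```

In the init script this could be called as stop_gracefully "$pid" where the stop branch currently signals the process.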

Make the script executable and register the Kafka service

sudo chmod 755 /etc/init.d/kafka
sudo update-rc.d kafka defaults

Start and stop the Kafka service

sudo service kafka start
sudo service kafka status
sudo service kafka stop

Testing Kafka topics

sudo service kafka start
sudo service kafka status

Topic creation

/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

Publish a message to the test topic

/opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

This opens a prompt for messages; type a test message and press Enter.
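
The console producer reads messages from standard input, one per line, so test messages can also be piped in instead of typed. A small sketch, assuming a hypothetical gen_messages helper that prints numbered lines:

```shell
# Hypothetical helper: print N numbered test messages, one per line.
gen_messages() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'test message %d\n' "$i"
        i=$((i + 1))
    done
}

# Example: send five messages to the test topic without the interactive prompt
# gen_messages 5 | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
```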

Consume messages from the topic

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning

Making Kafka Scalable

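Requirements
A ZooKeeper ensemble spanning all the servers
A Kafka cluster spanning all the servers

Install ZooKeeper on all the servers, then list every ZooKeeper node in /etc/zookeeper/conf/zoo.cfg on each server. Each server also needs its own ID in its myid file (with Ubuntu's zookeeperd package this is /etc/zookeeper/conf/myid); the ID must match the number N in that server's server.N line below.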

server.0=10.0.0.1:2888:3888
server.1=10.0.0.2:2888:3888
server.2=10.0.0.3:2888:3888
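
For more than a few nodes, the server lines above can be generated rather than typed. This is a minimal sketch: the function name zk_server_lines is hypothetical, IDs start at 0 to match the example, and 2888 and 3888 are the ports used in the lines above.

```shell
# Hypothetical helper: print one server.N line per ZooKeeper host.
# IDs start at 0 to match the numbering in the example above.
zk_server_lines() {
    id=0
    for host in "$@"; do
        printf 'server.%d=%s:2888:3888\n' "$id" "$host"
        id=$((id + 1))
    done
}

# Example: append the lines to zoo.cfg on each server
# zk_server_lines 10.0.0.1 10.0.0.2 10.0.0.3 | sudo tee -a /etc/zookeeper/conf/zoo.cfg
```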

Once Kafka is installed on all the servers, edit /opt/kafka/config/server.properties on each node and change the following settings.

broker.id must be unique for each node in the cluster:

for node-1 broker.id=0 (the default)
for node-2 broker.id=1
for node-3 broker.id=2

Change the zookeeper.connect value so that it lists all ZooKeeper hosts with their ports:

zookeeper.connect=10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
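
Editing server.properties by hand on every node is easy to get wrong, so the two settings can be applied with a small helper instead. The sketch below assumes a hypothetical set_property function. It replaces an existing key=value line or appends one, and it assumes the value contains no "|" or "&" characters, which holds for broker IDs and host:port lists.

```shell
# Hypothetical helper: set key=value in a Java-style properties file.
# Replaces an existing "key=" line, or appends one if the key is absent.
# Assumes the value contains no "|" or "&" characters.
set_property() {
    file="$1"
    key="$2"
    value="$3"
    if grep -q "^${key}=" "$file"; then
        # Write to a temporary file, then copy back to keep the original file's permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "${file}.tmp" \
            && cat "${file}.tmp" > "$file" \
            && rm -f "${file}.tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example for node-2 (run with sufficient privileges, then restart Kafka):
# set_property /opt/kafka/config/server.properties broker.id 1
# set_property /opt/kafka/config/server.properties zookeeper.connect 10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181
```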