Monday, 11 September 2017

Installation of Kafka on CentOS 7


Apache Kafka is an open-source stream-processing platform, originally developed at LinkedIn and now maintained by the Apache Software Foundation, and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. One of Kafka's strongest points is its massively scalable pub/sub message queue, architected as a distributed transaction log, which makes it well suited to handling streaming data.

It is possible to deploy Kafka on a single server or to build a distributed Kafka cluster for greater performance.

### Update system
$ sudo yum update -y && sudo reboot

### Install OpenJDK runtime
$ sudo yum install java-1.8.0-openjdk.x86_64

Check the Java version
$ java -version

### Add JAVA_HOME and JRE_HOME in /etc/profile
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export JRE_HOME=/usr/lib/jvm/jre

Apply the modified profile to the current shell (source is a shell builtin, so it is run without sudo)
$ source /etc/profile
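
Instead of hard-coding the JVM path, JAVA_HOME can be derived from wherever the java binary actually resolves. The following is a minimal sketch, assuming GNU readlink (the default on CentOS 7); the function name java_home_from_bin is made up for illustration.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from a java executable.
# Resolves symlinks (e.g. /usr/bin/java -> /etc/alternatives/java -> ...)
# and strips the trailing /bin/java from the resolved path.
java_home_from_bin() {
    java_bin=$(readlink -f "$1") || return 1
    dirname "$(dirname "$java_bin")"
}

# Example (not run automatically):
#   export JAVA_HOME="$(java_home_from_bin "$(command -v java)")"
```

The resulting path depends on the installed package, so it is worth printing it once and comparing it with the directories under /usr/lib/jvm before adding it to /etc/profile.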

### Download Apache Kafka (0.11.0.0 is the latest version at the time of writing)
$ cd ~
$ wget -c https://archive.apache.org/dist/kafka/0.11.0.0/kafka_2.12-0.11.0.0.tgz

Extract the archive and move it to a preferred location such as /opt
$ tar -xvf kafka_2.12-0.11.0.0.tgz
$ sudo mv kafka_2.12-0.11.0.0 /opt

### Start and test Apache Kafka
Go to the Kafka directory
$ cd /opt/kafka_2.12-0.11.0.0

#### Start ZooKeeper server
$ bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

#### Modify the configuration of the Kafka server
$ vim bin/kafka-server-start.sh

Adjust the memory usage according to your specific system parameters.

By default,
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"

Replace it with:
export KAFKA_HEAP_OPTS="-Xmx512M -Xms256M"
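
Editing the script is not strictly necessary: the stock kafka-server-start.sh applies its default only when KAFKA_HEAP_OPTS is empty, so the variable can be exported before starting the broker instead. A minimal sketch follows; the helper names and the quarter-of-RAM sizing rule are assumptions chosen for illustration, not a Kafka recommendation.

```shell
# Hypothetical sizing rule (an assumption): use a quarter of total RAM as the
# maximum heap and half of that as the initial heap.
heap_opts_for_mb() {
    total_mb=$1
    max_mb=$((total_mb / 4))
    min_mb=$((max_mb / 2))
    echo "-Xmx${max_mb}M -Xms${min_mb}M"
}

# Total RAM in MB, read from /proc/meminfo (Linux only).
total_ram_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Example (not run automatically):
#   export KAFKA_HEAP_OPTS="$(heap_opts_for_mb "$(total_ram_mb)")"
#   bin/kafka-server-start.sh config/server.properties
```

On a machine with 2048 MB of RAM this rule yields -Xmx512M -Xms256M, the same values used above.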

#### Start Kafka server
$ bin/kafka-server-start.sh config/server.properties

If everything went successfully, you will see several messages about the Kafka server's status, and the last one will read:

INFO [Kafka Server 0], started (kafka.server.KafkaServer)

Congratulations! You have started the Kafka server. Press CTRL + C to stop the server.

Now run Kafka in daemon mode like this:
$ bin/kafka-server-start.sh -daemon config/server.properties
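
In daemon mode nothing is printed to the terminal, so the startup message has to be looked for in the broker log. A minimal sketch, assuming the default log4j configuration writes to logs/server.log under the Kafka directory; the function name wait_for_broker and its arguments are hypothetical.

```shell
# Hypothetical helper: succeed once the startup message appears in the log.
# Arguments: log file path, max attempts (default 30), delay in seconds (default 1).
wait_for_broker() {
    log_file=$1
    attempts=${2:-30}
    delay=${3:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -F treats the pattern as a fixed string, so the parentheses are literal.
        if grep -qF 'started (kafka.server.KafkaServer)' "$log_file" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example (not run automatically):
#   wait_for_broker logs/server.log && echo "broker is up"
```

Note that the log may still contain the message from an earlier run, so this check is only reliable on a fresh or rotated log file.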

### Create a topic "test" on Kafka server
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

To list the existing topics:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181

In this case, the output will be:
test
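
The topic list is plain text with one name per line, which makes it easy to check for a topic from a script before creating it. A minimal sketch; the function name topic_in_list is hypothetical, and it only inspects whatever list it is given on standard input.

```shell
# Hypothetical helper: read a topic list (one name per line) on stdin and
# succeed only if the given name appears as an exact, whole-line match.
topic_in_list() {
    grep -qxF -- "$1"
}

# Example (not run automatically):
#   bin/kafka-topics.sh --list --zookeeper localhost:2181 | topic_in_list test \
#       || echo "topic 'test' does not exist yet"
```

Matching whole lines matters here: a plain substring search for "test" would also match a topic named "testing".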

### Produce messages using topic "test"

$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Now, at the console prompt, you can type any number of messages, such as:
Welcome Joshi
Enjoy Kafka journey!

Use CTRL + C to stop sending messages.

If you receive an error similar to "WARN Error while fetching metadata with correlation id" while inputting a message, you'll need to update the config/server.properties file with the following settings and then restart the Kafka server:

port = 9092
advertised.host.name = localhost
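
If the change needs to be scripted rather than made in an editor, a small helper can update the properties file in place. This is a minimal sketch; the function name set_property is hypothetical, and it assumes values that contain neither "|" nor "&" (both are special in the sed expression used here).

```shell
# Hypothetical helper: set key=value in a Java properties file.
# Replaces an existing uncommented line for the key, otherwise appends one.
set_property() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally in the regular expressions.
    key_re=$(printf '%s' "$key" | sed 's/[.]/\\./g')
    if grep -q "^${key_re}[[:space:]]*=" "$file"; then
        sed "s|^${key_re}[[:space:]]*=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example (not run automatically):
#   set_property config/server.properties port 9092
#   set_property config/server.properties advertised.host.name localhost
```

Because the file is rewritten through a temporary copy, it is worth checking its ownership and permissions afterwards if the Kafka directory is owned by another user.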

### Consume messages
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning

Hola! Whatever you typed earlier will now be visible on the console. In effect, you have consumed the messages.
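
The same produce and consume steps can be run without typing at a prompt, which is handy for a quick smoke test. A minimal sketch, assuming KAFKA_HOME points at the extracted Kafka directory; the function name round_trip is hypothetical, and the ports are the ones used in this post.

```shell
# Assumption: Kafka was extracted to this location; override as needed.
KAFKA_HOME=${KAFKA_HOME:-/opt/kafka_2.12-0.11.0.0}

# Hypothetical helper: publish one message to a topic, then read one message
# back. With --from-beginning the message returned is the oldest one in the
# topic, which is the message just sent only if the topic was empty before.
round_trip() {
    topic=$1 message=$2
    printf '%s\n' "$message" |
        "$KAFKA_HOME/bin/kafka-console-producer.sh" \
            --broker-list localhost:9092 --topic "$topic" >/dev/null
    "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
        --zookeeper localhost:2181 --topic "$topic" \
        --from-beginning --max-messages 1
}

# Example (not run automatically):
#   round_trip test "Welcome Joshi"
```

The --max-messages option makes the consumer exit after one message instead of waiting for more, so the function returns on its own.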

### Role of Zookeeper

ZooKeeper coordinates and synchronizes the configuration information of distributed nodes. A Kafka cluster depends on ZooKeeper to perform operations such as electing leaders and detecting failed nodes.

### Testing ZooKeeper

Connect with telnet and type 'ruok'; a healthy ZooKeeper server responds with 'imok'.

$ telnet localhost 2181
Connected to localhost
Escape character is '^]'.
ruok
imok
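
The same check can be done non-interactively, for example from a monitoring script. A minimal sketch, assuming nc (netcat) is installed; the function name zk_is_ok is hypothetical, and the host and port default to the values used above.

```shell
# Hypothetical helper: send ZooKeeper's "ruok" command and check for "imok".
# Arguments: host (default localhost), port (default 2181).
zk_is_ok() {
    host=${1:-localhost}
    port=${2:-2181}
    reply=$(printf 'ruok' | nc "$host" "$port")
    [ "$reply" = "imok" ]
}

# Example (not run automatically):
#   zk_is_ok && echo "ZooKeeper is healthy" || echo "ZooKeeper is not responding"
```

The function returns a non-zero status both when the connection fails and when the reply is anything other than 'imok', so it can be used directly in an if statement.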

### Counting the number of messages stored in a Kafka topic
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test --time -1

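This prints the latest offset of each partition, one topic:partition:offset line per partition. Summing these offsets gives the number of messages written to the topic, provided no old messages have been removed by retention.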
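
The summation itself can be scripted by piping the command's output through awk. A minimal sketch; the function name sum_offsets is hypothetical, and it assumes each input line has the form topic:partition:offset.

```shell
# Hypothetical helper: sum the third colon-separated field of
# "topic:partition:offset" lines read from stdin.
sum_offsets() {
    awk -F: '{ total += $3 } END { print total + 0 }'
}

# Example (not run automatically):
#   bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
#       --broker-list localhost:9092 --topic test --time -1 | sum_offsets
```

For a topic where retention has already deleted messages, running the same command with --time -2 gives the earliest offsets, and the difference between the two sums is the number of messages currently stored.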
