Apache Kafka is an open-source stream-processing platform, originally developed at LinkedIn and now maintained by the Apache Software Foundation. It is written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. One of Kafka's strongest points is its massively scalable pub/sub message queue, architected as a distributed transaction log, which makes it well suited to handling streaming data.
You can deploy Kafka on a single server or build a distributed Kafka cluster for greater performance.
### Update system
$ sudo yum update -y && sudo reboot
### Install OpenJDK runtime
$ sudo yum install java-1.8.0-openjdk.x86_64
Check the Java version:
$ java -version
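### Add JAVA_HOME and JRE_HOME to /etc/profile
Append these lines as root. The shell does not allow spaces around the = sign in an assignment:
export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
export JRE_HOME=/usr/lib/jvm/jre
Apply the modified profile to the current shell. Because source is a shell builtin, it cannot be run through sudo:
$ source /etc/profile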
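The JVM directory names differ between OpenJDK packages, so the hard-coded paths may not match every system. The sketch below shows one way to derive the home directory from the location of the java executable. The helper name java_home_from_bin is hypothetical, and the approach assumes the executable sits in a bin directory directly under the Java home.

```shell
# Hypothetical helper: strip the trailing /bin/java from a resolved path.
# $1: full path to the java executable
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

# Usage (assumes java is on the PATH and readlink supports -f):
#   export JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")"
```

Compare the derived value with the path used in /etc/profile before relying on it.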
### Download Apache Kafka
This guide uses release 0.11.0.0, built for Scala 2.12:
$ cd ~
$ wget -c https://archive.apache.org/dist/kafka/0.11.0.0/kafka_2.12-0.11.0.0.tgz
Extract the archive and move it to your preferred location, such as /opt:
$ tar -xvf kafka_2.12-0.11.0.0.tgz
$ sudo mv kafka_2.12-0.11.0.0 /opt
### Start and test Apache Kafka
Go to the Kafka directory:
$ cd /opt/kafka_2.12-0.11.0.0
#### Start Zookeeper server
$ bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
#### Modify the configuration of the Kafka server
$ vim bin/kafka-server-start.sh
Adjust the JVM heap size to match the memory available on your system.
By default, the script sets:
export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
On a low-memory machine, replace it with, for example:
export KAFKA_HEAP_OPTS="-Xmx512M -Xms256M"
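As an alternative to editing the script, the heap setting can usually be supplied through the environment. This is a minimal sketch, assuming the start script applies its 1G default only when KAFKA_HEAP_OPTS is empty (worth confirming in your copy of bin/kafka-server-start.sh). The helper resolve_heap_opts is hypothetical and only mirrors that assumed guard.

```shell
# Hypothetical helper mirroring the assumed guard in bin/kafka-server-start.sh:
# the 1G default applies only when KAFKA_HEAP_OPTS is unset or empty.
resolve_heap_opts() {
  if [ -z "${KAFKA_HEAP_OPTS:-}" ]; then
    echo "-Xmx1G -Xms1G"
  else
    echo "$KAFKA_HEAP_OPTS"
  fi
}

# Usage (requires the Kafka installation from this guide):
#   KAFKA_HEAP_OPTS="-Xmx512M -Xms256M" bin/kafka-server-start.sh config/server.properties
```

Setting the variable on the command line leaves the shipped script untouched, which makes upgrades simpler.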
#### Start Kafka server
$ bin/kafka-server-start.sh config/server.properties
If everything went well, you will see several messages about the Kafka server's status, and the last one will read:
INFO [Kafka Server 0], started (kafka.server.KafkaServer)
Congratulations! You have started the Kafka server. Press CTRL + C to stop it.
Now run Kafka in daemon mode:
$ bin/kafka-server-start.sh -daemon config/server.properties
### Create a topic "test" on the Kafka server
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
To list the existing topics:
$ bin/kafka-topics.sh --list --zookeeper localhost:2181
In this case, the output will be:
test
### Produce messages to topic "test"
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
At the console prompt, type as many messages as you wish, one per line, such as:
Welcome Joshi
Enjoy Kafka journey!
Use CTRL + C to stop the producer.
If you receive an error similar to "WARN Error while fetching metadata with correlation id" while entering a message, add the following settings to config/server.properties and restart the Kafka server:
port = 9092
advertised.host.name = localhost
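The edit above can also be scripted. The following sketch appends the same two settings only when advertised.host.name is not already present, so running it twice does not duplicate lines. The helper name add_advertised_host is hypothetical, and the sketch assumes the file is writable by the current user.

```shell
# Hypothetical helper: append the two settings unless
# advertised.host.name is already configured in the file.
# $1: path to the properties file (config/server.properties in this guide)
add_advertised_host() {
  if ! grep -q '^advertised\.host\.name' "$1"; then
    printf 'port = 9092\nadvertised.host.name = localhost\n' >> "$1"
  fi
}

# Usage:
#   add_advertised_host config/server.properties
```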
### Consume messages
$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Voilà! Everything you typed earlier now appears on the console. In effect, you have consumed the messages.
### Role of Zookeeper
ZooKeeper coordinates and synchronizes the configuration information of distributed nodes. A Kafka cluster depends on ZooKeeper for operations such as electing leaders and detecting failed nodes.
### Testing ZooKeeper
Connect to ZooKeeper with telnet and type 'ruok'. A healthy server responds with 'imok':
$ telnet localhost 2181
Connected to localhost
Escape character is '^]'.
ruok
imok
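The same check can be run without an interactive session, which is handy in scripts. This is a minimal sketch, assuming nc (netcat) is installed and ZooKeeper listens on localhost:2181 as in this guide. The helper name zk_reply_ok is hypothetical.

```shell
# Hypothetical helper: succeed only when the reply to "ruok" is exactly "imok".
# $1: the text returned by the ZooKeeper server
zk_reply_ok() {
  [ "$1" = "imok" ]
}

# Usage (requires a running ZooKeeper and nc):
#   if zk_reply_ok "$(echo ruok | nc localhost 2181)"; then
#     echo "ZooKeeper is healthy"
#   fi
```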
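### Counting the number of messages stored in a Kafka topic
$ bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic test --time -1
This prints the latest offset of each partition in the form topic:partition:offset. Summing the offsets across all partitions gives the total message count, provided no messages have been removed by retention.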
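To get a single total, the per-partition offsets can be added up with awk. This is a sketch, assuming each output line has the form topic:partition:offset. The helper name sum_offsets is hypothetical.

```shell
# Hypothetical helper: sum the last colon-separated field (the offset)
# of every input line shaped like "test:0:42", read from standard input.
sum_offsets() {
  awk -F: 'NF >= 3 { total += $NF } END { print total + 0 }'
}

# Usage (requires a running broker):
#   bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
#     --broker-list localhost:9092 --topic test --time -1 | sum_offsets
```

If older messages have already been deleted, run the tool again with --time -2 to get the earliest offsets and subtract that sum from the first one.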