Basic setup of a Multi Node Apache Kafka/Zookeeper Cluster


Install three nodes with CentOS 7, each with at least 20 GB of disk, 2 GB of RAM and two CPU cores.

Install JDK

yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel net-tools

Set JAVA_HOME in ~/.bashrc

# Set Java-Home
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-"
export PATH=$JAVA_HOME/bin:$PATH
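The exact JDK directory name depends on the installed package version, so one option is to derive JAVA_HOME from the java binary instead of hard-coding it. A minimal sketch, assuming that java resolves to a path ending in /bin/java or /jre/bin/java; the helper name java_home_from_bin is hypothetical:

```shell
# Hypothetical helper: strip the trailing /bin/java (and /jre, if present)
# from the resolved path of a java binary to get a JAVA_HOME candidate.
java_home_from_bin() {
    local java_bin="$1"
    local home="${java_bin%/bin/java}"
    home="${home%/jre}"
    printf '%s\n' "$home"
}

# Usage (assumes java is on the PATH and GNU readlink is available):
#   export JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")"
#   export PATH="$JAVA_HOME/bin:$PATH"
```

Check the result with echo $JAVA_HOME before relying on it.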

Disable SELinux, Firewall and IPv6

systemctl disable firewalld
systemctl stop firewalld

echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
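Running the echo command more than once adds duplicate lines to /etc/sysctl.conf. A small sketch of an idempotent variant; the helper name append_once is hypothetical:

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once "net.ipv6.conf.all.disable_ipv6 = 1" /etc/sysctl.conf
#   sysctl -p    # load the settings from /etc/sysctl.conf right away
```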

Set SELINUX=disabled in /etc/selinux/config and check the setting:

[root@kafka3 ~]# grep "^SELINUX=" /etc/selinux/config
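The SELinux setting can also be changed without opening an editor. A minimal sketch, assuming GNU sed (as shipped with CentOS 7); the helper name set_selinux_disabled is hypothetical:

```shell
# Hypothetical helper: set SELINUX=disabled in the given SELinux config file
# (defaults to /etc/selinux/config). The SELINUXTYPE line is left untouched.
set_selinux_disabled() {
    local config_file="${1:-/etc/selinux/config}"
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}

# Usage (as root):
#   set_selinux_disabled
```

The change only takes effect after the next reboot.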

Reboot the server so that the changes take effect.

Installing Kafka

Download Kafka and unpack it under /opt

tar zxvf kafka_2.11-

Starting Zookeeper

On each node, create a zookeeper directory and a file 'myid' containing a unique number:

mkdir /zookeeper
echo '1' > /zookeeper/myid
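When the host names end in the node number (kafka1, kafka2, kafka3, as used later in this guide), the id can be derived from the host name instead of typed by hand. A sketch under that naming assumption; the helper name myid_from_hostname is hypothetical:

```shell
# Hypothetical helper: print the trailing digits of a host name
# (kafka3 -> 3). Prints nothing if the name does not end in digits.
myid_from_hostname() {
    local host="${1%%.*}"    # drop any domain part, e.g. kafka3.example.com
    printf '%s\n' "$host" | sed -n 's/.*[^0-9]\([0-9][0-9]*\)$/\1/p'
}

# Usage (as root):
#   mkdir -p /zookeeper
#   myid_from_hostname "$(hostname)" > /zookeeper/myid
```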

On all three servers, go to the Kafka home folder /opt/kafka_2.11- and set up Zookeeper like this:

vi config/

# the directory where the snapshot is stored.
# the port at which the clients will connect
# disable the per-ip limit on the number of connections since this is a non-production config

# The number of milliseconds of each tick
# The number of ticks that the initial synchronization phase can take
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
# zoo servers
# add more servers here if you want
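The values belonging to these comments did not survive in the listing above, so the following is only a sketch of what a three-node configuration might look like. dataDir and clientPort follow the /zookeeper directory and port 2181 used elsewhere in this guide; tickTime, initLimit, syncLimit, maxClientCnxns and the peer ports 2888 and 3888 are commonly used Zookeeper values and are assumptions here, as is the helper name print_zookeeper_config:

```shell
# Hypothetical helper: print a Zookeeper ensemble configuration for the
# host names given as arguments. All values are assumptions, not the
# original author's settings.
print_zookeeper_config() {
    cat <<'EOF'
# the directory where the snapshot is stored.
dataDir=/zookeeper
# the port at which the clients will connect
clientPort=2181
# disable the per-ip limit on the number of connections
maxClientCnxns=0
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# zoo servers
EOF
    local id=1 host
    for host in "$@"; do
        # server.<myid>=<host>:<peer port>:<leader election port>
        printf 'server.%d=%s:2888:3888\n' "$id" "$host"
        id=$((id + 1))
    done
}

print_zookeeper_config kafka1 kafka2 kafka3
```

The number in each server.N line has to match the content of the myid file on that host.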

Start Zookeeper on all three servers

./bin/ -daemon config/

Change the Kafka configuration on all three servers (set a unique broker id on each server):

vi config/

# The id of the broker. This must be set to a unique integer for each broker.

#     listeners = PLAINTEXT://

# A comma separated list of directories under which to store log files

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. ",,".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.,,
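The settings that go with these comments were lost from the listing above. As a rough sketch, the per-broker values could be generated as shown below. The property names (broker.id, listeners, log.dirs, zookeeper.connect) are standard Kafka broker settings; port 9093 and port 2181 are taken from the commands later in this guide; the log directory /tmp/kafka-logs is only a placeholder, and the helper name print_broker_overrides is hypothetical:

```shell
# Hypothetical helper: print the broker settings that differ per node.
# Arguments: <broker id> <host name>
print_broker_overrides() {
    local broker_id="$1" host="$2"
    printf 'broker.id=%s\n' "$broker_id"
    printf 'listeners=PLAINTEXT://%s:9093\n' "$host"
    printf 'log.dirs=/tmp/kafka-logs\n'    # placeholder, choose a persistent path
    printf 'zookeeper.connect=kafka1:2181,kafka2:2181,kafka3:2181\n'
}

print_broker_overrides 1 kafka1
```

Only broker.id and the listener host differ between the three nodes; the Zookeeper connection string is the same everywhere.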

Start Kafka on all three nodes:

./bin/ -daemon config/

Verify that Kafka and Zookeeper are running (for example with jps):

4150 Jps
2365 QuorumPeerMain
1743 Kafka
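This check can be scripted so that it can run on each node. A minimal sketch that reads jps output on standard input; the helper name both_services_running is hypothetical:

```shell
# Hypothetical helper: read jps output on stdin and succeed only if both
# the Zookeeper process (QuorumPeerMain) and the Kafka process are listed.
both_services_running() {
    local out
    out="$(cat)"
    printf '%s\n' "$out" | grep -qw 'QuorumPeerMain' &&
        printf '%s\n' "$out" | grep -qw 'Kafka'
}

# Usage:
#   jps | both_services_running && echo "Kafka and Zookeeper are running"
```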

Also verify that all brokers are registered in Zookeeper:

# ./bin/ kafka1:2181 ls /brokers/ids
Connecting to kafka1:2181


WatchedEvent state:SyncConnected type:None path:null
[1, 2, 3]

Create an example topic with three partitions and a replication factor of 3:

# ./bin/ --create --zookeeper kafka1:2181 --topic example-topic --partitions 3 --replication-factor 3
Created topic "example-topic".

# ./bin/ --list --zookeeper kafka1:2181 --topic example-topic

# ./bin/ --describe --zookeeper kafka1:2181 --topic example-topic
Topic:example-topic	PartitionCount:3	ReplicationFactor:3	Configs:
	Topic: example-topic	Partition: 0	Leader: 2	Replicas: 2,3,1	Isr: 2,3,1
	Topic: example-topic	Partition: 1	Leader: 3	Replicas: 3,1,2	Isr: 3,2,1
	Topic: example-topic	Partition: 2	Leader: 1	Replicas: 1,2,3	Isr: 1,2,3

Test the Topic

Start a Producer on one node:

# ./bin/ --broker-list kafka1:9093,kafka2:9093,kafka3:9093 --topic example-topic

Also start a consumer on a different node:

# ./bin/ --zookeeper kafka1:2181 --topic example-topic --from-beginning

Write some text in the producer console. You should then see the text in the consumer console.

Stop one node and write some more messages in the producer console to verify that high availability is working.
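While a node is stopped, the --describe output shown earlier should list fewer entries under Isr than under Replicas for the affected partitions. A small sketch that picks those partitions out of the describe output, assuming the output format shown above; the helper name under_replicated is hypothetical:

```shell
# Hypothetical helper: read the topic --describe output on stdin and print
# the number of every partition whose Isr list is shorter than its
# Replicas list.
under_replicated() {
    awk '{
        part = ""; replicas = ""; isr = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Partition:") part = $(i + 1)
            if ($i == "Replicas:")  replicas = $(i + 1)
            if ($i == "Isr:")       isr = $(i + 1)
        }
        # Compare the number of comma separated broker ids in both lists
        if (part != "" && split(isr, a, ",") < split(replicas, b, ","))
            print part
    }'
}

# Usage: pipe the output of the --describe command from above into it:
#   <describe command> | under_replicated
```

Once the stopped node is back and has caught up, the function should print nothing again.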