# Zookeeper Cluster Installation on Linux

Published 2022-12-03

This tutorial explains step by step how to install an Apache ZooKeeper cluster on a Linux machine.

ZooKeeper is a coordination service that Kafka uses for cluster management. It was originally developed at Yahoo to address the bugs that can arise in distributed big data applications, and it stores the status of processes running on clusters. Like Kafka, ZooKeeper is an open source technology under the Apache License.

ZooKeeper acts as a centralized service that maintains naming and configuration data and provides flexible and robust synchronization within distributed systems. For Kafka, it keeps track of the status of the cluster nodes, as well as topics, partitions, etc.

The Kafka server does not work unless ZooKeeper is started. For this reason, in a cluster (and in a single-node installation as well) you must install and start ZooKeeper before you install and start the Kafka server.

In this tutorial I will install and start the ZooKeeper cluster. In my example the ZooKeeper cluster runs on the same machines as the Kafka servers, but in a production environment it is common to have separate machines for the ZooKeeper cluster. After the ZooKeeper cluster is installed, configured and started, you can install the Kafka cluster.

Here are the steps for installing a ZooKeeper cluster on Linux (CentOS 8):

  1. Install Java 8+ on your Linux machine. In my example I use CentOS 8.
sudo yum install java-11-openjdk-devel
  1. Minimize RAM swapping. A value of 1 tells the kernel to swap only when it has to; on certain Linux versions/distributions the value can be set to 0.
sudo sysctl vm.swappiness=1
echo 'vm.swappiness=1' | sudo tee --append /etc/sysctl.conf
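
To confirm that the setting took effect, you can read the value back from the kernel. The helper below is only a sketch: the function name `check_swappiness` and its optional arguments (a file to read and a maximum accepted value) are my own and are not part of any tool.

```shell
# Compare the current swappiness value with the maximum we expect.
# Arguments (both optional, both illustrative):
#   $1 = file to read (defaults to the kernel interface /proc/sys/vm/swappiness)
#   $2 = highest acceptable value (defaults to 1, the value set above)
check_swappiness() {
  local file="${1:-/proc/sys/vm/swappiness}" max="${2:-1}"
  local value
  value="$(cat "$file")" || return 2
  if [ "$value" -le "$max" ]; then
    echo "ok: vm.swappiness=$value"
  else
    echo "warning: vm.swappiness=$value (expected <= $max)"
    return 1
  fi
}

# Usage on a node:
#   check_swappiness
```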
  1. Add hosts entries (in production you will probably use DNS)
echo "192.168.85.153 kafka1
192.168.85.153 zookeeper1
192.168.85.154 kafka2
192.168.85.154 zookeeper2
192.168.85.155 kafka3
192.168.85.155 zookeeper3" | sudo tee --append /etc/hosts

These values are for my nodes.
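
Since every node needs the same entries, it can help to check the hosts file on each machine before going further. The following is a minimal sketch; `missing_hosts` is a hypothetical helper name, and it only checks that each name appears as a field on a non-comment line, not that the address is correct.

```shell
# Print every expected host name that has no entry in the given hosts file.
#   $1     = hosts file to inspect (normally /etc/hosts)
#   $2...  = host names that must be present
missing_hosts() {
  local hosts_file="$1"; shift
  local name
  for name in "$@"; do
    # Look for the name as a whole field (after the address) on a non-comment line.
    if ! awk -v n="$name" '
          !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
          END { exit !found }' "$hosts_file"; then
      echo "$name"
    fi
  done
}

# Usage on a node (prints nothing when all names are present):
#   missing_hosts /etc/hosts zookeeper1 zookeeper2 zookeeper3
```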

  1. Install Kafka and ZooKeeper (they are packaged together)

For this step, take a look at the tutorial named "Install Kafka cluster".

  1. Configure Zookeeper

-- Create Zookeeper data directory :

sudo mkdir -p /kafka/kafka_2.13-2.8.0/data/zookeeper
sudo chown -R kafka:kafka /kafka/kafka_2.13-2.8.0/data

-- Create ZOOKEEPER_HOME environment variable :

export ZOOKEEPER_HOME=/kafka/kafka_2.13-2.8.0/

-- Modify zookeeper.properties file :

vi $ZOOKEEPER_HOME/config/zookeeper.properties

and add:

# Length of a single tick, in milliseconds
tickTime=2000

# Number of ticks (the time) the ZooKeeper servers in quorum have to connect to a leader
initLimit=5

# syncLimit limits how far out of date (in ticks) a server can be from a leader
syncLimit=2

# Declare Zookeeper servers
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

You also have to modify dataDir :

dataDir=/kafka/kafka_2.13-2.8.0/data/zookeeper
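
Because initLimit and syncLimit are counted in ticks, the real timeouts are tickTime multiplied by each value. With the settings above that gives 2000 x 5 = 10000 ms for the initial connection and 2000 x 2 = 4000 ms for staying in sync. A small sketch that computes this from a properties file (`zk_timeouts` is an illustrative name, and it assumes `key=value` lines without spaces around `=`):

```shell
# Print the effective initLimit and syncLimit timeouts in milliseconds.
#   $1 = path to zookeeper.properties
zk_timeouts() {
  awk -F= '
    $1 == "tickTime"  { tick_ms = $2 }
    $1 == "initLimit" { init_ticks = $2 }
    $1 == "syncLimit" { sync_ticks = $2 }
    END {
      printf "init timeout: %d ms\n", tick_ms * init_ticks
      printf "sync timeout: %d ms\n", tick_ms * sync_ticks
    }' "$1"
}

# Usage:
#   zk_timeouts "$ZOOKEEPER_HOME/config/zookeeper.properties"
```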

-- Create the ID of each Zookeeper Server

Run on Zookeeper node #1:

echo '1' > /kafka/kafka_2.13-2.8.0/data/zookeeper/myid

Run on Zookeeper node #2:

echo '2' > /kafka/kafka_2.13-2.8.0/data/zookeeper/myid

Run on Zookeeper node #3:

echo '3' > /kafka/kafka_2.13-2.8.0/data/zookeeper/myid
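
The number written to myid must match the N of that node's `server.N` line in zookeeper.properties. If you prefer not to type it by hand on each node, the ID can be looked up from the properties file. This is a sketch only: `zk_id_for_host` is a hypothetical helper, and you pass it the ZooKeeper name of the node (zookeeper1, zookeeper2, ...) explicitly.

```shell
# Print the N of the "server.N=host:port:port" line that matches a host name.
#   $1 = path to zookeeper.properties
#   $2 = ZooKeeper host name of this node, as written in the server.N lines
# Returns non-zero (and prints nothing) when the host is not declared.
zk_id_for_host() {
  local props="$1" host="$2"
  awk -F'[=:]' -v h="$host" '
    $1 ~ /^server\.[0-9]+$/ && $2 == h {
      sub(/^server\./, "", $1)   # keep only the numeric ID
      print $1
      found = 1
    }
    END { exit !found }' "$props"
}

# Usage on node #2 (the file is written only if the lookup succeeds):
#   id="$(zk_id_for_host "$ZOOKEEPER_HOME/config/zookeeper.properties" zookeeper2)" \
#     && echo "$id" > /kafka/kafka_2.13-2.8.0/data/zookeeper/myid
```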
  1. If needed, open the ZooKeeper ports (as root/administrator user). Rules added with --permanent become active only after a reload:
firewall-cmd --permanent --add-port 2888/tcp
firewall-cmd --permanent --add-port 3888/tcp
firewall-cmd --permanent --add-port 2181/tcp
firewall-cmd --reload

Now you can start the ZooKeeper servers and see that they communicate with each other.

(A better practice is to configure Zookeeper servers to start/stop as a Linux service)
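
As a rough idea of what such a service could look like, the sketch below prints a systemd unit file. Everything in it is an assumption on my side rather than part of this installation: the function name `render_zookeeper_unit`, the unit name zookeeper.service, and the chosen settings (Type, Restart). Adjust them to your environment before using anything like this.

```shell
# Print a systemd unit for the ZooKeeper bundled with Kafka to stdout.
#   $1 = Kafka installation directory (a trailing slash is removed)
#   $2 = Linux user that runs the service
render_zookeeper_unit() {
  local kafka_home="${1%/}" run_user="$2"
  cat <<EOF
[Unit]
Description=Apache ZooKeeper (bundled with Kafka)
After=network.target

[Service]
Type=simple
User=${run_user}
ExecStart=${kafka_home}/bin/zookeeper-server-start.sh ${kafka_home}/config/zookeeper.properties
ExecStop=${kafka_home}/bin/zookeeper-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Possible usage on each node:
#   render_zookeeper_unit /kafka/kafka_2.13-2.8.0 kafka | sudo tee /etc/systemd/system/zookeeper.service
#   sudo systemctl daemon-reload
#   sudo systemctl enable --now zookeeper
```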

  1. Start the ZooKeeper server manually on each machine:
cd /kafka/kafka_2.13-2.8.0
./bin/zookeeper-server-start.sh ./config/zookeeper.properties
  1. Verify that Zookeeper cluster is working properly

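Before verifying the ZooKeeper cluster, you may also need to enable the Four Letter Words commands (starting with ZooKeeper 3.5.3 they have to be whitelisted through the 4lw.commands.whitelist setting).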
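
In ZooKeeper 3.5.3 and later only `srvr` is allowed by default, so `ruok` has to be added to `4lw.commands.whitelist` in zookeeper.properties, and the server restarted afterwards. A minimal sketch of doing that without adding the line twice (`enable_4lw` is an illustrative helper name; it leaves an existing whitelist line untouched):

```shell
# Make sure the properties file whitelists the ruok and srvr commands.
#   $1 = path to zookeeper.properties
enable_4lw() {
  local props="$1"
  if grep -q '^4lw\.commands\.whitelist=' "$props"; then
    # Do not overwrite a whitelist someone already configured.
    echo "whitelist already set: $(grep '^4lw\.commands\.whitelist=' "$props")"
  else
    # Leading newline guards against a file that does not end with one.
    printf '\n%s\n' '4lw.commands.whitelist=ruok,srvr' >> "$props"
    echo "added 4lw.commands.whitelist=ruok,srvr"
  fi
}

# Usage on each node, followed by a ZooKeeper restart:
#   enable_4lw "$ZOOKEEPER_HOME/config/zookeeper.properties"
```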

Check whether the node accepts connections (is running in a non-error state):
echo ruok | nc 127.0.0.1 2181
imok

If there is a problem, the command returns no response.

For more information about the ZooKeeper server, you can run the following command (you can run it on any node to receive information about node #1):

echo "srvr" | nc zookeeper1 2181
Zookeeper version: 3.5.9-83df9301aa5c2a5d284a9940177808c01bc35cef, built on 01/06/2021 20:03 GMT
Latency min/avg/max: 0/0/0
Received: 7
Sent: 6
Connections: 1
Outstanding: 0
Zxid: 0x1f
Mode: follower
Node count: 26
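
The `Mode` line is the interesting part: in a healthy cluster exactly one node reports `leader` and the others report `follower`. The sketch below extracts the mode from `srvr` output and checks that rule across nodes. Both function names are my own, and the check is deliberately simple (one leader, at least one follower).

```shell
# Read "srvr" output on stdin and print the value of its "Mode:" line.
# Returns non-zero when no Mode line is present (for example, no response).
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2; found = 1 } END { exit !found }'
}

# Read one mode per line on stdin; succeed only if there is exactly
# one leader and at least one follower.
zk_ensemble_ok() {
  awk '
    $1 == "leader"   { leaders++ }
    $1 == "follower" { followers++ }
    END { exit !(leaders == 1 && followers >= 1) }'
}

# Usage from any node:
#   for h in zookeeper1 zookeeper2 zookeeper3; do
#     echo srvr | nc "$h" 2181 | zk_mode
#   done | zk_ensemble_ok && echo "cluster has a leader"
```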

If initLimit is too small, the servers may not have enough time to connect to a leader, so the ZooKeeper cluster might not elect a leader and the cluster will fail!

At this point you have the ZooKeeper cluster running, and you can install, configure and then run the Kafka cluster.