Kafka: cluster installation

Cluster Env Setup

1. Set up key-based remote SSH access using a generated SSH key:
   a. ssh-keygen -t rsa
   b. go to ~/.ssh/ and check that id_rsa and id_rsa.pub exist
   c. ssh-copy-id -i ~/.ssh/id_rsa.pub <targetserver>
   d. on the target server, run "more ~/.ssh/authorized_keys" to confirm the public key was added
2. Download Kafka and ZooKeeper to each node.
3. For this purpose, use clush to push the Kafka directory to all nodes:
    clush -g nodeGroup -c <path of kafka>
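The key setup in step 1 has to be repeated for every target server, so it can be wrapped in a small loop. The following is only a minimal sketch: the host names, the HOSTS and DRY_RUN variables, and the run and setup_keys helpers are illustrative assumptions, and it assumes a key with an empty passphrase. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: distribute one SSH key to every node.
# HOSTS, DRY_RUN, run and setup_keys are illustrative names, not part of any tool.
HOSTS="node1 node2 node3"      # hypothetical host names
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"                  # show what would be executed
  else
    "$@"
  fi
}

setup_keys() {
  # Generate a key pair only if none exists yet (empty passphrase assumed).
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  # Append the public key to authorized_keys on each target node.
  for host in $HOSTS; do
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$host"
  done
}

setup_keys
```

Running it with DRY_RUN=0 executes the commands instead of printing them; ssh-copy-id will still prompt for each node's password once.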

Install ZooKeeper in the cluster

4. Set up ZooKeeper:
   a. rename conf/zoo_sample.cfg to conf/zoo.cfg
   b. add the following to zoo.cfg for the cluster setup:
        server.1=<host1>:2888:3888
        server.2=<host2>:2888:3888
        server.3=<host3>:2888:3888
    2888: used by followers to connect to the leader.
    3888: used for leader election.

5. Sync zoo.cfg to all cluster nodes.
6. Assign a different ID to each node.
    1. On each node, create a file named myid under the dataDir (/tmp/zookeeper in the sample config) that contains only that node's ID. The ID must match the N of the node's server.N line.
7. Start ZooKeeper on each node.
8. Test that all 3 ZooKeeper nodes are in sync.
    1. log in to the ZooKeeper console on one node: ./zookeeper-shell.sh <nodename>:2181
    2. create /hello-test test
    3. log in to each of the other ZooKeeper nodes; all of them should see the created path /hello-test (for example with "ls /")
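Steps 4 to 6 can be scripted so that zoo.cfg and the per-node ID file stay consistent. This is a minimal sketch, not a definitive setup: write_zoo_cfg and write_myid are hypothetical helper names, the host names are placeholders, and the base settings are the defaults shipped in zoo_sample.cfg.

```shell
#!/bin/sh
# Sketch: generate zoo.cfg and the per-node myid file.
# write_zoo_cfg and write_myid are hypothetical helper names.

# write_zoo_cfg <output file> <host1> <host2> ...
write_zoo_cfg() {
  out="$1"; shift
  # Base settings copied from the defaults in zoo_sample.cfg.
  cat > "$out" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
EOF
  # One server.N line per node; N must match that node's myid.
  id=1
  for host in "$@"; do
    echo "server.$id=$host:2888:3888" >> "$out"
    id=$((id + 1))
  done
}

# write_myid <dataDir> <id>: run on each node with its own ID.
write_myid() {
  mkdir -p "$1"
  echo "$2" > "$1/myid"
}
```

For example, "write_zoo_cfg conf/zoo.cfg node1 node2 node3" produces the shared config, and on the second node "write_myid /tmp/zookeeper 2" writes its ID. Since /tmp is often cleared on reboot, a different dataDir is the safer choice outside of a test setup.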

Start Kafka in the cluster

1. Update server.properties on every node so that zookeeper.connect points to the ZooKeeper cluster.

2. Update the broker ID (broker.id) on each node.
Here, we just set the broker IDs to 1, 2, and 3, one per node.
3. Start Kafka on each node (the -daemon flag must come before the properties file):  clush -g kafka {path}/kafka-server-start.sh -daemon {path}/../config/server.properties
4. Verify that the Kafka cluster works by creating a topic.
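The two per-node settings from steps 1 and 2 can be applied with a small helper. This is a sketch under assumptions: configure_broker is a hypothetical name, and the ZooKeeper connect string uses placeholder host names with the default client port 2181.

```shell
#!/bin/sh
# Sketch: set broker.id and zookeeper.connect in a server.properties file.
# configure_broker is a hypothetical helper name.

# configure_broker <server.properties> <broker id> <zk connect string>
configure_broker() {
  file="$1"; id="$2"; zk="$3"
  tmp="$file.tmp"
  # Drop any existing values for the two keys, keep everything else.
  grep -v -e '^broker\.id=' -e '^zookeeper\.connect=' "$file" > "$tmp"
  # Append the new values for this node.
  echo "broker.id=$id" >> "$tmp"
  echo "zookeeper.connect=$zk" >> "$tmp"
  mv "$tmp" "$file"
}
```

On the second node this would be called as "configure_broker config/server.properties 2 node1:2181,node2:2181,node3:2181". For step 4, recent Kafka versions accept "kafka-topics.sh --create --topic <name> --partitions 3 --replication-factor 3 --bootstrap-server <node>:9092", and "kafka-topics.sh --describe --topic <name> --bootstrap-server <node>:9092" should then list replicas on all three brokers. Older releases take --zookeeper <node>:2181 instead of --bootstrap-server.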
