Skip to main content

kafka: Topic & Partition

Step 1: understand Topic & Partition

1. one topic can be distributed to one or multiple brokers
2. each broker hold different partitions for topic
3. one broker hold multiple partitions for one or more topics
4. One partition is impl by one physical folder with multiple files (segment).


Step 2: check the physical file folder on /tmp/kafka-logs  (the default folder for kafka logs files)



1. the folder is named as {topic}-{partition no}
2. the same {filename}.index, .log and .timeindex construct one segment. 
3. 00000000000000000000 here = the offset of this segment
4. how to locate the message:
    1. decide the partition. 
    2. decide the segment by offset number of message.
    3. decide the offset inside of segment. 
5. for kafka log, it is clear that the SLM mechanism is used which provides the max write/read turnover. 

Step 3: Customize the partition no based on message





Comments

Popular posts from this blog

How to fix "ValueError when trying to compile python module with VC Express"

When I tried to compile the python, I always get compile issue as following: ------------ ... File "C:\Python26\lib\ distutils\msvc9compiler.py ", line 358, in initialize vc_env = query_vcvarsall(VERSION, plat_spec) File "C:\Python26\lib\ distutils\msvc9compiler.py ", line 274, in query_vcvarsall raise ValueError(str(list(result.keys()))) ValueError: [u'path'] --------------------- Python community discussed a lot but no solution: http://bugs.python.org/issue7511 The root cause is because the latest visual studio change the *.bat file a lot especially on 64bit env. The python 2.7 didn't update the path accordingly. Based on the assumption above, the following solution worked for me. To install Visual Studio 2008 Express Edition with all required components: 1. Install Microsoft Visual Studio 2008 Express Edition. The main Visual Studio 2008 Express installer is available from (the C++ installer name is vcsetup.exe): https://ww

How to convert the ResultSet to Stream

Java 8 provided the Stream family and easy operation of it. The way of pipeline usage made the code clear and smart. However, ResultSet is still go with very legacy way to process. Per actual ResultSet usage, it is really helpful if converted as Stream. Here is the simple usage of above: StreamUtils.uncheckedConsumer is required to convert the the SQLException to runtimeException to make the Lamda clear.

Interview for System Design 1: Designing a URL Shortening service like TinyURL.

Problem:  This service will provide short aliases redirecting to long URLs. Step 1: Requirement Analysis Understand the the basic core features: 1. create short url from long url. 2. get the long url from  the short url.  Nice to have feature: 3. will url get expired in certain time? 4. could user define their customized short url? here is some questions need to clarify:  1. How long we need keep the url?  (it will have impact on storage, it is very import to understand to how long will the data be if such data will be stored in local storage). 2. Do we allow N : 1 or only 1: 1 mapping? (have impact about algorithm and data storage.  Step 2:   Estimation Of  Resource Usage common resources: data storage || web services: QPS Let's the estimation right now:  Assume DAU is about 500M,  Create: and one user will create new one item every 5 days. so the total creation per Second will be a. yearly new record: 500M/5 * 365 ~ 50G, new records a. monthly storage: 500M/5 * 100  * 30 = 100M *