Skip to main content

Interview for System Design 1: Designing a URL Shortening service like TinyURL.

Problem: This service will provide short aliases redirecting to long URLs.

Step 1: Requirement Analysis

Understand the the basic core features:

1. create short url from long url.

2. get the long url from  the short url. 


Nice to have feature:

3. will url get expired in certain time?

4. could user define their customized short url?


here is some questions need to clarify: 

1. How long we need keep the url?  (it will have impact on storage, it is very import to understand to how long will the data be if such data will be stored in local storage).

2. Do we allow N : 1 or only 1: 1 mapping? (have impact about algorithm and data storage. 


Step 2:   Estimation Of Resource Usage

common resources: data storage

|| web services: QPS

Let's the estimation right now: 

Assume DAU is about 500M, 

Create: and one user will create new one item every 5 days. so the total creation per Second will be

a. yearly new record: 500M/5 * 365 ~ 50G, new records

a. monthly storage: 500M/5 * 100  * 30 = 100M * 3K = 300G --> yearly storage = 3600G = 3.6T ~ 5T

b. QPS

    add: 500M / 5 / 86400 ~ 1000 / seconds.

Update: None

Delete: None

Query: one user will have 100 friends, and will click about 100 / 5 / 2 = 10 per day

     QPS: 500M * 10 / 86400 ~ 5000M / 0.1 M = 50K second.

     PEAK QPS: 50K * 3 = 150K


So in summary:

Year Storage = 5T

Write QPS = 1000, BandWidth = 100K/S

Read QPS = 50K, BandWidth = 5M/S,  

Memory Usage to Cache (only cache 1%) = 1000 * 86400 * 1000 * 1% * 365 = 300G 

Both of them are not big number, can be handle even by single server. 


Step 3: Services

API:

 1. CreateShortUrl(String longUrl); 

 2. GetLongUrl(String shortUrl);  + redirect 303


Core DataModel:

    {

        String shortUrl;

        String longUrl;

        DateTime: createdTime;

    }

Logic to generate the shortUrl:

1. Base62 encode: 62 ^ 7 = 3.6T vs 50G.    

2. How to generate ShortURL:    Random String (offline) VS Auto Inc ID

 Random String:   

     Define new table  to store all available Keys generated by  Random Key Generator b offline.

Auto Inc ID:   the number will be mod by 62 iteratively. 

Step 4:  Database

1. Any relationship between data?  NO
2. Always looked up by PK ? YES

Then we can use Non SQL since it will benefit with better performance and easy to scale.
But if we decided to use Auto Inc Id, we need pick up the sql database.

Now, we get the basic diagram of whole design






Step 5:  Scaling and Shredding


Cache.   Since it is just simple Key Value lookup, Redis will be candidate to pick up.

Database Shredding:
     How to find which Database the data belong to.
        a. use first digital as machine ID.
        b. if based on SQL, based on the KEY RANGE.




    
System Partition:








Comments

Popular posts from this blog

How to fix "ValueError when trying to compile python module with VC Express"

When I tried to compile the python, I always get compile issue as following: ------------ ... File "C:\Python26\lib\ distutils\msvc9compiler.py ", line 358, in initialize vc_env = query_vcvarsall(VERSION, plat_spec) File "C:\Python26\lib\ distutils\msvc9compiler.py ", line 274, in query_vcvarsall raise ValueError(str(list(result.keys()))) ValueError: [u'path'] --------------------- Python community discussed a lot but no solution: http://bugs.python.org/issue7511 The root cause is because the latest visual studio change the *.bat file a lot especially on 64bit env. The python 2.7 didn't update the path accordingly. Based on the assumption above, the following solution worked for me. To install Visual Studio 2008 Express Edition with all required components: 1. Install Microsoft Visual Studio 2008 Express Edition. The main Visual Studio 2008 Express installer is available from (the C++ installer name is vcsetup.exe): https://ww

How to convert the ResultSet to Stream

Java 8 provided the Stream family and easy operation of it. The way of pipeline usage made the code clear and smart. However, ResultSet is still go with very legacy way to process. Per actual ResultSet usage, it is really helpful if converted as Stream. Here is the simple usage of above: StreamUtils.uncheckedConsumer is required to convert the the SQLException to runtimeException to make the Lamda clear.