Efficient Database Storage & Retrieval of TinyURL

Let’s discuss the mapping of a long URL into a short URL in our database:

1. Using Base62 Encoding

Assume we generate the Tiny URL using base62 encoding then we need to perform the steps given below:

  • The tiny URL should be unique so firstly check the existence of this tiny URL in the database (doing get(tiny) on DB). If it’s already present there for some other long URL then generate a new short URL.
  • If the short URL isn’t present in DB then put the long URL and TinyURL in DB (put(TinyURL, long URL)).

This technique works with one server very well but if there will be multiple servers then this technique will create a race condition.

  • When multiple servers will work together, there will be a possibility that they all can generate the same unique id or the same tiny URL for different long URLs
  • Even after checking the database, they will be allowed to insert the same tiny URLs simultaneously in the database and this may end up corrupting the data. 

We can use putIfAbsent(TinyURL, long URL) or INSERT-IF-NOT-EXIST condition while inserting the tiny URL but this requires support from DB which is available in RDBMS but not in NoSQL.  

2. Using MD5 Approach

  • Encode the long URL using the MD5 approach and take only the first 7 chars to generate TinyURL.
  • The first 7 characters could be the same for different long URLs so check the DB (as we have discussed in Technique 1) to verify that TinyURL is not used already

This approach saves some space in the database but how?

  • If two users want to generate a tiny URL for the same long URL then the first technique will generate two random numbers and it requires two rows in the database
  • In the second technique, both the longer URL will have the same MD5 so it will have the same first 43 bits
  • This means we will get some deduping and we will end up with saving some space since we only need to store one row instead of two rows in the database.

Note: We can use putIfAbsent but NoSQL doesn’t support this feature. So let’s move to the third technique to solve this problem.

3. Using Counter Approach

Using a counter is a good decision for a scalable solution because counters always get incremented so we can get a new value for every new request. 

Single server approach:  

  • A single host or server (say database) will be responsible for maintaining the counter.
  • When the worker host receives a request it talks to the counter host, which returns a unique number and increments the counter. When the next request comes the counter host again returns the unique number and this goes on.
  • Every worker host gets a unique number which is used to generate TinyURL.

Now Let’s Discuss a problem:

If the counter host goes down for some time then it will create a problem, also if the number of requests will be high then the counter host might not be able to handle the load. So challenges are a Single point of failure and a single point of bottleneck. 

what if there are multiple servers? 

You can’t maintain a single counter and returns the output to all the servers. To solve this problem we can use multiple internal counters for multiple servers which use different counter ranges.

For example: Server 1 ranges from 1 to 1M, server 2 ranges from 1M to 10M, and so on.

But still there are some problems with this architecture such as:

  • If one of the counters goes down then for another server it will be difficult to get the range of the failure counter and maintain it again.
  • If one counter reaches its maximum limit then resetting the counter will be difficult because there is no single host available for coordination among all these multiple servers.
  • The architecture will be messed up since we don’t know which server is the master or which one is a slave and which one is responsible for coordination and synchronization. 

Now let’s discuss the solution

To solve this problem we can use a distributed service Zookeeper to manage all these tedious tasks and to solve the various challenges of a distributed system like a race condition, deadlock, or particle failure of data.

Zookeeper is basically a distributed coordination service that manages a large set of hosts. It keeps track of all the things such as the naming of the servers, active servers, dead servers, and configuration information of all the hosts. It provides coordination and maintains the synchronization between the multiple servers. 

Let’s discuss how to maintain a counter for distributed hosts using Zookeeper. 

  • From 3.5 trillion combinations take 1st billion combinations.
  • In Zookeeper maintain the range and divide the 1st billion into 1000 ranges of 1 million each i.e. range 1->(1 – 1,000,000), range 2->(1,000,001 – 2,000,000)…. range 1000->(999,000,001 – 1,000,000,000)
  • When servers will be added these servers will ask for the unused range from Zookeepers.
    • Suppose the W1 server is assigned range 1, now W1 will generate the tiny URL incrementing the counter and using the encoding technique.
    • Every time it will be a unique number so there is no possibility of collision and also there is no need to keep checking the DB to ensure that the URL already exists or not.
    • We can directly insert the mapping of a long URL and a short URL into the DB.
  • In the worst case, if one of the servers goes down then we will only lose a million combinations in Zookeeper (which will be unused and we can’t reuse it as well) but since we have 3.5 trillion combinations we should not worry about losing this combination.
  • If one of the servers will reach its maximum range or limit then again it can take a new fresh range from Zookeeper.
  • The addition of a new server is also easy. Zookeeper will assign an unused counter range to this new server.
  • We will take the 2nd billion when the 1st billion is exhausted to continue the process.

System Design | URL Shortner (bit.ly, TinyURL, etc)

The need for efficient and concise URL management has become a big matter in this technical age. URL shortening services, such as bit.ly, and TinyURL, play a massive role in transforming lengthy web addresses into shorter, shareable links. As the demand for such services grows, it has become vital to undertand the System Design of URL Shortner and mastering the art of designing a scalable and reliable URL-shortening system, to gain a crucial skill for software engineers.

This article gets into the System Design of URL Shortner (URL Shortening Service), which will help in architecting a robust system that can seamlessly generate and redirect short URLs while ensuring scalability, durability, and high availability.

Important Topics for System Design of URL Shortner (URL Shortening Service)

  • How to Design a URL Shortener Service Like TinyURL?
  • Requirements for System Design of URL Shortner
  • Capacity estimation for System Design of URL Shortner
  • Low Level Design for System Design of URL Shortner
    • URL Shortening Logic (Encoding)
    • Techniques to Generate and Store TinyURL 
  • High-level Design for System Design of URL Shortner
  • Database
  • Conclusion

Similar Reads

How Would You Design a URL Shortener Service Like TinyURL?

URL shortening services like bit.ly or TinyURL are very popular to generate shorter aliases for long URLs. You need to design this kind of web service where if a user gives a long URL then the service returns a short URL and if the user gives a short URL then it returns the original long URL....

1. Requirements for System Design of URL Shortner Service

1.1 Functional requirements of URL Shortening service...

2. Capacity estimation for System Design of URL Shortner

Let’s assume our service has 30M new URL shortenings per month. Let’s assume we store every URL shortening request (and associated shortened link) for 5 years. For this period the service will generate about 1.8 B records....

3. Low Level Design for System Design of URL Shortner

...

3.1 URL Encoding Techniques to create Shortened URL

To convert a long URL into a unique short URL we can use some hashing techniques like Base62 or MD5. We will discuss both approaches....

3.2 Efficient Database Storage & Retrieval of TinyURL

...

4. High-level Design of a URL-Shortening Service

...

5. Database

Let’s discuss the mapping of a long URL into a short URL in our database:...

Conclusion

...

Contact Us