Algorithms to Design a Rate Limiter API

High Level Design (HLD) to Design a Rate Limiter API

Examples of Rate Limiting APIs used worldwide

Several algorithms are used for rate limiting, including

The Token bucket,
Leaky bucket,
Sliding window logs, and
Sliding window counters.

Let’s discuss each algorithm in detail:

Token Bucket

The token bucket algorithm is a simple algorithm that uses a fixed-size token bucket to limit the rate of incoming requests. The token bucket is filled with tokens at a fixed rate, and each request requires a token to be processed. If the bucket is empty, the request is rejected.

The token bucket algorithm can be implemented using the following steps:

Initialize the token bucket with a fixed number of tokens.
For each request, remove a token from the bucket.
If there are no tokens left in the bucket, reject the request.
Add tokens to the bucket at a fixed rate.

Thus, by allocating a bucket with a predetermined number of tokens for each user, we are successfully limiting the number of requests per user per time unit. When the counter of tokens comes down to 0 for a certain user, we know that he or she has reached the maximum amount of requests in a particular timeframe. The bucket will be auto-refilled whenever the new timeframe starts.

Token bucket example with initial bucket token count of 3 for each user in one minute

Leaky Bucket

It is based on the idea that if the average rate at which water is poured exceeds the rate at which the bucket leaks, the bucket will overflow.

The leaky bucket algorithm is similar to the token bucket algorithm, but instead of using a fixed-size token bucket, it uses a leaky bucket that empties at a fixed rate. Each incoming request adds to the bucket’s depth, and if the bucket overflows, the request is rejected.

One way to implement this is using a queue, which corresponds to the bucket that will contain the incoming requests. Whenever a new request is made, it is added to the queue’s end. If the queue is full at any time, then the additional requests are discarded.

The leaky bucket algorithm can be separated into the following concepts:

Initialize the leaky bucket with a fixed depth and a rate at which it leaks.
For each request, add to the bucket’s depth.
If the bucket’s depth exceeds its capacity, reject the request.
Leak the bucket at a fixed rate.

Leaky bucket example with token count per user per minute is 3, which is the queue size.

Sliding Window Logs

Another approach to rate limiting is to use sliding window logs. This data structure involves a “window” of fixed size that slides along a timeline of events, storing information about the events that fall within the window at any given time.

The window can be thought of as a buffer of limited size that holds the most recent events or changes that have occurred. As new events or changes occur, they are added to the buffer, and old events that fall outside of the window are removed. This ensures that the buffer stays within its fixed size, and only contains the most recent events.

This rate limitation keeps track of each client’s request in a time-stamped log. These logs are normally stored in a time-sorted hash set or table.

The sliding window logs algorithm can be implemented using the following steps:

A time-sorted queue or hash table of timestamps within the time range of the most recent window is maintained for each client making the requests.
When a certain length of the queue is reached or after a certain number of minutes, whenever a new request comes, a check is done for any timestamps older than the current window time.
The queue is updated with new timestamp of incoming request and if number of elements in queue does not exceed the authorised count, it is proceeded otherwise an exception is triggered.

Sliding window logs in a timeframe of 1 minute

Sliding Window Counters

The sliding window counter algorithm is an optimization over sliding window logs. As we can see in the previous approach, memory usage is high. For example, to manage numerous users or huge window timeframes, all the request timestamps must be kept for a window time, which eventually uses a huge amount of memory. Also, removing numerous timestamps older than a particular timeframe means high complexity of time as well.

To reduce surges of traffic, this algorithm accounts for a weighted value of the previous window’s request based on timeframe. If we have a one-minute rate limit, we can record the counter for each second and calculate the sum of all counters in the previous minute whenever we get a new request to determine the throttling limit.

The sliding window counters can be separated into the following concepts:

Remove all counters which are more than 1 minute old.
If a request comes which falls in the current bucket, the counter is increased.
If a request comes when the current bucket has reached it’s throat limit, the request is blocked.

sliding window counters with a timeframe of 20 seconds

How to Design a Rate Limiter API | Learn System Design

A Rate Limiter API is a tool that developers can use to define rules that specify how many requests can be made in a given time period and what actions should be taken when these limits are exceeded.

Rate limiting is an essential technique used in software systems to control the rate of incoming requests. It helps to prevent the overloading of servers by limiting the number of requests that can be made in a given time frame.

It helps to prevent a high volume of requests from overwhelming a server or API. Here is a basic design for a rate limiter API In this article, we will discuss the design of a rate limiter API, including its requirements, high-level design, and algorithms used for rate limiting.