Algorithms to Design a Rate Limiter API

Several algorithms are used for rate limiting, including:

  • Token bucket,
  • Leaky bucket,
  • Sliding window logs, and
  • Sliding window counters.

Let’s discuss each algorithm in detail:

Token Bucket

The token bucket algorithm is a simple algorithm that uses a fixed-size token bucket to limit the rate of incoming requests. The token bucket is filled with tokens at a fixed rate, and each request requires a token to be processed. If the bucket is empty, the request is rejected.

The token bucket algorithm can be implemented using the following steps:

  • Initialize the token bucket with a fixed number of tokens.
  • For each request, remove a token from the bucket.
  • If there are no tokens left in the bucket, reject the request.
  • Add tokens to the bucket at a fixed rate.

Thus, by allocating a bucket with a predetermined number of tokens to each user, we limit the number of requests per user per time unit. When a user’s token count drops to 0, we know that they have reached the maximum number of requests for that timeframe. The bucket is refilled automatically when the next timeframe starts.
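The steps above can be sketched in Python as follows. This is a minimal, single-process illustration rather than a production implementation: the class name `TokenBucket`, its parameters, and the optional `now` argument (used to make the timing testable) are all assumptions made for this example. It also refills tokens continuously in proportion to elapsed time, which is one common way to realise "add tokens at a fixed rate".

```python
import time


class TokenBucket:
    """Hypothetical per-user bucket: holds up to `capacity` tokens."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity        # maximum tokens the bucket can hold
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Consume one token and return True, or return False if none is left."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.last_refill
        # Add the tokens earned since the last call, capped at the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For a limit of 3 requests per minute, one bucket per user could be created as `TokenBucket(capacity=3, refill_rate=3 / 60)`. Three calls to `allow()` in quick succession succeed, and a fourth made immediately afterwards is rejected until enough time has passed for a new token to accumulate.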

Token bucket example with an initial token count of 3 per user per minute

Leaky Bucket

The leaky bucket is based on the idea that if the average rate at which water is poured into a bucket exceeds the rate at which the bucket leaks, the bucket will overflow.

The leaky bucket algorithm is similar to the token bucket algorithm, but instead of using a fixed-size token bucket, it uses a leaky bucket that empties at a fixed rate. Each incoming request adds to the bucket’s depth, and if the bucket overflows, the request is rejected.

One way to implement this is using a queue, which corresponds to the bucket that will contain the incoming requests. Whenever a new request is made, it is added to the queue’s end. If the queue is full at any time, then the additional requests are discarded.

The leaky bucket algorithm can be implemented using the following steps:

  • Initialize the leaky bucket with a fixed depth and a rate at which it leaks.
  • For each request, add to the bucket’s depth.
  • If the bucket’s depth exceeds its capacity, reject the request.
  • Leak the bucket at a fixed rate.
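A queue-based version of these steps might look like the sketch below. The names `LeakyBucket`, `add`, and `leak` are hypothetical, as is the assumption that a separate worker calls `leak()` periodically and processes whatever it returns. The `now` argument exists only so the timing can be controlled in examples.

```python
import time
from collections import deque


class LeakyBucket:
    """Hypothetical bounded FIFO queue that is drained at a fixed rate."""

    def __init__(self, capacity, leak_rate, now=None):
        self.capacity = capacity    # maximum queued requests (bucket depth)
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic() if now is None else now

    def add(self, request):
        """Queue the request if there is room; return False if it overflows."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def leak(self, now=None):
        """Remove the requests that are due and return them for processing."""
        now = time.monotonic() if now is None else now
        due = int((now - self.last_leak) * self.leak_rate)
        if due <= 0:
            return []
        # Advance by whole leak intervals only, so partial intervals carry over.
        self.last_leak += due / self.leak_rate
        drained = []
        while due > 0 and self.queue:
            drained.append(self.queue.popleft())
            due -= 1
        return drained
```

Because requests leave the queue at a steady pace regardless of how they arrive, this variant smooths bursts into a constant outflow, at the cost of delaying queued requests.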

Leaky bucket example with a limit of 3 requests per user per minute, which is also the queue size

Sliding Window Logs

Another approach to rate limiting is to use sliding window logs. This data structure involves a “window” of fixed size that slides along a timeline of events, storing information about the events that fall within the window at any given time.

The window can be thought of as a buffer of limited size that holds the most recent events or changes that have occurred. As new events or changes occur, they are added to the buffer, and old events that fall outside of the window are removed. This ensures that the buffer stays within its fixed size, and only contains the most recent events.

This rate limiting approach keeps a time-stamped log of each client’s requests. These logs are normally stored in a time-sorted hash set or table.

The sliding window logs algorithm can be implemented using the following steps:

  • A time-sorted queue or hash table of timestamps that fall within the most recent window is maintained for each client making requests.
  • When a new request arrives (or once the queue reaches a certain length or a certain number of minutes has passed), the queue is checked for timestamps older than the current window, and those are removed.
  • The timestamp of the incoming request is added to the queue. If the number of elements in the queue does not exceed the authorised count, the request proceeds; otherwise, an exception is triggered.
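As a rough illustration of these steps, the sketch below keeps one deque of timestamps per client. The class name `SlidingWindowLog` and its parameters are assumptions for this example. Two simplifications are worth noting: it returns `False` instead of raising an exception, and it does not record the timestamps of rejected requests.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLog:
    """Hypothetical limiter allowing `limit` requests per `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)  # client id -> timestamps, oldest first

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        log = self.logs[client_id]
        # Drop timestamps that have slid out of the current window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

Since timestamps are appended in arrival order, the deque stays sorted and expired entries can always be removed from the left end.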

Sliding window logs in a timeframe of 1 minute

Sliding Window Counters

The sliding window counter algorithm is an optimization over sliding window logs. As we saw in the previous approach, memory usage is high. For example, to manage numerous users or long window timeframes, every request timestamp must be kept for the duration of the window, which eventually uses a huge amount of memory. Removing numerous timestamps older than a particular timeframe also adds to the time complexity.

To reduce surges of traffic, this algorithm accounts for a weighted value of the previous window’s requests based on the timeframe. If we have a one-minute rate limit, we can record a counter for each second and, whenever we get a new request, calculate the sum of all counters in the previous minute to determine whether the throttling limit has been reached.
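One common way to weight the previous window is sketched below. The function name and parameters are hypothetical, and the formula assumes requests in the previous window were spread evenly, so it gives an estimate rather than an exact count.

```python
def weighted_request_count(previous_count, current_count, elapsed_fraction):
    """Estimate the number of requests in the sliding window ending now.

    elapsed_fraction is how far we are into the current fixed window,
    from 0.0 (just started) to 1.0 (about to end).
    """
    # The part of the previous window still covered by the sliding window.
    previous_weight = 1.0 - elapsed_fraction
    return previous_count * previous_weight + current_count
```

For example, with 40 requests in the previous minute, 10 so far in the current minute, and 15 seconds elapsed (a fraction of 0.25), the estimate is 40 × 0.75 + 10 = 40 requests, which is then compared against the limit.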

The sliding window counters algorithm can be broken down into the following steps:

  • Remove all counters that are more than 1 minute old.
  • If a request arrives that falls in the current bucket, the counter is increased.
  • If a request arrives when the current bucket has reached its throttle limit, the request is blocked.
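The per-second counter variant described above could be sketched as follows. The class name `SlidingWindowCounter`, the `bucket_seconds` parameter, and the choice to compare the sum of all live counters against the limit are assumptions made for this illustration.

```python
import time
from collections import defaultdict


class SlidingWindowCounter:
    """Hypothetical limiter that keeps one counter per small time bucket."""

    def __init__(self, limit, window_seconds=60, bucket_seconds=1):
        self.limit = limit
        self.window = window_seconds
        self.bucket = bucket_seconds
        self.counters = defaultdict(dict)  # client id -> {bucket start: count}

    def allow(self, client_id, now=None):
        now = time.time() if now is None else now
        buckets = self.counters[client_id]
        current = int(now // self.bucket) * self.bucket
        # Remove counters that are older than the window.
        for start in [s for s in buckets if s <= current - self.window]:
            del buckets[start]
        # Block the request if the window has already reached the limit.
        if sum(buckets.values()) >= self.limit:
            return False
        buckets[current] = buckets.get(current, 0) + 1
        return True
```

Compared with a timestamp log, memory per client is bounded by the number of buckets in the window (60 for a one-minute window with one-second buckets), no matter how many requests arrive.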

Sliding window counters with a timeframe of 20 seconds

How to Design a Rate Limiter API | Learn System Design

A Rate Limiter API is a tool that developers can use to define rules that specify how many requests can be made in a given time period and what actions should be taken when these limits are exceeded. 

Rate limiting is an essential technique used in software systems to control the rate of incoming requests. It helps to prevent the overloading of servers by limiting the number of requests that can be made in a given time frame. 

In this article, we will discuss the design of a rate limiter API, including its requirements, high-level design, and the algorithms used for rate limiting.
