High Level Design (HLD) to Design a Rate Limiter API

Where to place the Rate Limiter – Client Side or Server Side?

A rate limiter should generally be implemented on the server side rather than on the client side, for the following reasons:

  • Positional Advantage: The server is in a better position to enforce rate limits across all clients, whereas client-side rate limiting would require every client to implement their own rate limiter, which would be difficult to coordinate and enforce consistently.
  • Security: Implementing rate limiting on the server side also provides better security, as it allows the server to prevent malicious clients from overwhelming the system with a large number of requests. If rate limiting were implemented on the client side, it would be easier for attackers to bypass the rate limit by just modifying or disabling the client-side code.
  • Flexibility: Server-side rate limiting allows more flexibility in adjusting the rate limits and managing resources. The server can dynamically adjust the rate limits based on traffic patterns and resource availability, and can also prioritize certain types of requests or clients over others. This leads to better utilization of available resources and keeps performance good.

HLD of Rate Limiter API – rate limiter placed at server side

The basic structure of a rate limiter is relatively simple: we need a counter associated with each user to track how many requests they have submitted in a particular timeframe. The request is rejected if the counter value hits the limit.

Memory Structure/Approximation

Now let’s think about a data structure that can help us. Since we need fast retrieval of the counter values associated with each user, we can use a hash table. Each entry is a key-value pair: the key is the hash of the User Id, and the corresponding value is a pair (or structure) of the counter and the startTime, e.g.,
UserId -> {counter, startTime}
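As a sketch, the structure above can be held in a plain hash map keyed by user id. The names below (RATE_LIMIT, WINDOW_SECONDS, allow_request) are illustrative assumptions, not from the article:

```python
import time

RATE_LIMIT = 5        # assumed: max requests allowed per window
WINDOW_SECONDS = 60   # assumed: fixed window length

# UserId -> {"counter": int, "start_time": float}
counters = {}

def allow_request(user_id, now=None):
    """Fixed-window check: reset the counter when the window rolls over."""
    now = time.time() if now is None else now
    entry = counters.get(user_id)
    if entry is None or now - entry["start_time"] >= WINDOW_SECONDS:
        # First request in a fresh window: start a new counter.
        counters[user_id] = {"counter": 1, "start_time": now}
        return True
    if entry["counter"] < RATE_LIMIT:
        entry["counter"] += 1
        return True
    # Counter has hit the limit for this window: reject.
    return False
```

Passing `now` explicitly makes the window logic easy to test; in production you would simply call `allow_request(user_id)` and let it read the clock.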

Now, suppose each UserId takes 8 bytes (a long long) and the counter takes 2 bytes (a 16-bit integer), which is enough to count up to the 50k limit. If we store only the minutes and seconds of the time, that takes another 2 bytes. So in total, we need 12 bytes to store each user’s data.

Now, considering an overhead of 10 bytes for each record in our hash table, and that we need to track at least 5 million users at any time (traffic), the total memory needed would be:
(12 + 10) bytes × 5 million = 110 MB
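The approximation above can be checked with a quick calculation (the 110 MB figure uses 1 MB = 10^6 bytes):

```python
record_bytes = 8 + 2 + 2   # user id + counter + timestamp
overhead_bytes = 10        # per-record hash-table overhead
users = 5_000_000          # peak users tracked at any time

total_bytes = (record_bytes + overhead_bytes) * users
print(total_bytes // 10**6, "MB")  # -> 110 MB
```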

Key Components in the Rate Limiter

  • Define the rate limiting policy: The first step is to determine the policy for rate limiting. This policy should include the maximum number of requests allowed per unit of time, the time window for measuring requests, and the actions to be taken when a limit is exceeded (e.g., return an error code or delay the request).
  • Store request counts: The rate limiter API should keep track of the number of requests made by each client. One way to do this is to use a database, such as Redis or Cassandra, to store the request counts.
  • Identify the client: The API must identify each client that makes a request. This can be done using a unique identifier such as an IP address or an API key.
  • Handle incoming requests: When a client makes a request, the API should first check if the client has exceeded their request limit within the specified time window. If the limit has been reached, the API can take the action specified in the rate-limiting policy (e.g., return an error code). If the limit has not been reached, the API should update the request count for the client and allow the request to proceed.
  • Set headers: When a request is allowed, the API should set appropriate headers in the response to indicate the remaining number of requests that the client can make within the time window, as well as the time at which the limit will be reset.
  • Expose an endpoint: Finally, the rate limiter API should expose an endpoint for clients to check their current rate limit status. This endpoint can return the number of requests remaining within the time window, as well as the time at which the limit will be reset.
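Putting the steps above together, a minimal fixed-window handler might look like the following. The header names follow the widely used X-RateLimit-* convention, and all identifiers here (LIMIT, WINDOW, handle_request) are assumptions for illustration:

```python
import time

LIMIT = 100    # assumed: max requests per window
WINDOW = 3600  # assumed: window length in seconds

# client_id -> {"count": int, "reset_at": float}
store = {}

def handle_request(client_id, now=None):
    """Return (status_code, headers) per the rate-limiting policy."""
    now = time.time() if now is None else now
    entry = store.get(client_id)
    if entry is None or now >= entry["reset_at"]:
        # New client or expired window: start a fresh window.
        entry = {"count": 0, "reset_at": now + WINDOW}
        store[client_id] = entry
    if entry["count"] >= LIMIT:
        # Limit exceeded: reject with 429 and tell the client when to retry.
        return 429, {"X-RateLimit-Remaining": 0,
                     "X-RateLimit-Reset": int(entry["reset_at"])}
    entry["count"] += 1
    return 200, {"X-RateLimit-Remaining": LIMIT - entry["count"],
                 "X-RateLimit-Reset": int(entry["reset_at"])}
```

The same `(status, headers)` pair can back both the request path and a status endpoint that reports the remaining quota without consuming it.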

Where should we keep the counters?

Because database operations are slow, a database is not a smart option for us here. This problem can be handled by an in-memory cache such as Redis: it is fast and supports time-based expiration out of the box.

We can rely on two commands used with the in-memory storage:

  • INCR: This is used for increasing the stored counter by 1.
  • EXPIRE: This sets a timeout on the stored counter; the counter is automatically deleted from the storage when the timeout expires.
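Since a live Redis instance isn't assumed here, the sketch below mimics the INCR/EXPIRE semantics with a tiny in-memory class to show the check-and-set pattern; against real Redis the same two calls would go through a client library, and the `MiniStore` class and `allowed` helper are purely illustrative:

```python
import time

class MiniStore:
    """Tiny in-memory stand-in for Redis INCR/EXPIRE (illustrative only)."""
    def __init__(self):
        self._data = {}  # key -> [value, expires_at or None]

    def _alive(self, key, now):
        """Return the entry if present and unexpired, else drop it."""
        item = self._data.get(key)
        if item is None:
            return None
        if item[1] is not None and now >= item[1]:
            del self._data[key]
            return None
        return item

    def incr(self, key, now=None):
        """INCR: create the key at 0 if absent, then add 1 and return it."""
        now = time.time() if now is None else now
        item = self._alive(key, now)
        if item is None:
            item = [0, None]
            self._data[key] = item
        item[0] += 1
        return item[0]

    def expire(self, key, seconds, now=None):
        """EXPIRE: set a timeout after which the key is deleted."""
        now = time.time() if now is None else now
        item = self._alive(key, now)
        if item is not None:
            item[1] = now + seconds

def allowed(store, user, limit, window, now=None):
    """Increment the user's counter; the first hit in a window sets expiry."""
    count = store.incr(f"rl:{user}", now=now)
    if count == 1:
        store.expire(f"rl:{user}", window, now=now)
    return count <= limit
```

Note that with real Redis, INCR-then-EXPIRE should be made atomic (e.g. via a pipeline or Lua script) so a crash between the two calls cannot leave a counter that never expires.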

In this design, client requests pass through a rate limiter middleware, which checks against the configured rate limits. The rate limiter module stores and retrieves rate limit data from a backend storage system. If a client exceeds a rate limit, the rate limiter module returns an appropriate response to the client.
