Design a Rate Limiter — System Design Guide

DodaTech Updated Jun 15, 2026 5 min read

A rate limiter controls how many requests a client can make in a given time window. It prevents abuse, protects backend services, and ensures fair resource allocation across all users.

What You’ll Learn

You’ll master token bucket and sliding window algorithms, distributed rate limiting with Redis, per-user vs global limits, and response headers. You’ll implement a rate limiter in Python.

Why This Problem Matters

Most major public APIs enforce rate limits. Commonly cited figures include Twitter (300 req/15min), GitHub (5000 req/hour for authenticated users), and Stripe (100 req/sec), though exact values vary by endpoint and plan. Without rate limiting, a single abusive client can degrade service for everyone. At DodaTech, rate limiting protects DodaZIP cloud sync and Durga Antivirus Pro update servers.

Rate Limiter Architecture

flowchart TB
  Client[Client] -->|Request| API[API Gateway]
  API --> RL[Rate Limiter Middleware]
  RL --> Counter[(Redis Counter)]
  RL --> Config[(Limits Config)]
  Config -->|Per user limits| RL
  Config -->|Global limits| RL
  RL -->|Within limit| Backend[Backend Service]
  RL -->|Exceeded| Response[429 Too Many Requests]

Token Bucket Algorithm

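The token bucket is one of the most widely used rate-limiting algorithms. A bucket holds up to capacity tokens and refills at rate tokens per second; each request spends one token and is rejected when the bucket is empty: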

import time
import threading

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                 # Tokens added per second
        self.capacity = capacity         # Max tokens in bucket (burst size)
        self.tokens = float(capacity)    # Current tokens; fractional after refills
        # monotonic() never jumps when the system clock is adjusted
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()     # Guards tokens and last_refill

    def allow_request(self) -> bool:
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def _refill(self):
        # Credit tokens for the time elapsed since the last call, capped at capacity
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now


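# Usage: 10 requests per second, burst up to 20
limiter = TokenBucket(rate=10, capacity=20)

# Requests arrive at about 20/sec, twice the refill rate. The initial 20 tokens
# absorb the excess for roughly the first 40 requests; after that about every
# other request is rejected. (With only 30 requests the bucket never runs dry.)
for i in range(60):
    if limiter.allow_request():
        print(f"Request {i}: Allowed")
    else:
        print(f"Request {i}: Rate limited")
    time.sleep(0.05)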

Sliding Window Log

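A fixed window resets its counter at each boundary, so a client can send a full quota just before the reset and another just after it. A sliding window log avoids this by storing the timestamp of every accepted request and counting only those inside the last window_size seconds, at the cost of one stored timestamp per request: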

import time
from collections import deque

class SlidingWindowLimiter:
    def __init__(self, window_size: float, max_requests: int):
        self.window_size = window_size  # e.g., 60 seconds
        self.max_requests = max_requests
        self.requests = deque()

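    def allow_request(self) -> bool:
        # Not thread-safe as written; guard with a lock if shared across threads
        now = time.monotonic()  # Unaffected by wall-clock adjustments
        # Remove timestamps that have fallen out of the window
        cutoff = now - self.window_size
        while self.requests and self.requests[0] <= cutoff:
            self.requests.popleft()

        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False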

Distributed Rate Limiting with Redis

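For multi-server deployments, keep the request log in Redis so that every server sees the same state. The class below stores one sorted set per client, scored by request time in milliseconds: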

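import time
import uuid

import redis.asyncio as redis

class RedisSlidingWindow:
    def __init__(self, redis_client: redis.Redis, window_size: int = 60, max_requests: int = 100):
        self.redis = redis_client
        self.window_size = window_size    # Seconds
        self.max_requests = max_requests

    async def allow_request(self, client_id: str) -> bool:
        key = f"ratelimit:{client_id}"
        now = int(time.time() * 1000)  # Milliseconds
        window_start = now - self.window_size * 1000
        # Unique member, so two requests in the same millisecond stay separate entries
        member = f"{now}:{uuid.uuid4().hex}"

        # redis-py pipelines default to transaction=True, so these run as one MULTI/EXEC
        pipe = self.redis.pipeline()
        pipe.zremrangebyscore(key, 0, window_start)  # Remove old entries
        pipe.zcard(key)                              # Count entries before this request
        pipe.zadd(key, {member: now})                # Add current request
        pipe.expire(key, self.window_size * 2)       # Cleanup TTL
        results = await pipe.execute()

        count = results[1]  # zcard result, excluding the current request
        if count < self.max_requests:
            return True
        # Over the limit: drop the entry so rejected requests do not extend the block
        await self.redis.zrem(key, member)
        return False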

Per-User vs Global Limits

| Type | Scope | Example |
|---|---|---|
| Per-user | Each user has their own limit | 100 req/min per API key |
| Global | All clients share one limit | 10000 req/min total |
| Per-IP | Based on client IP | 10 req/sec per IP |
| Endpoint-specific | Different limits per route | POST /login: 5 req/min |
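
A request usually has to satisfy several of these limit types at once. The sketch below shows one possible way to work out which counters apply to a request. The Limit dataclass, the constant names, the counter key format, and the limits_for helper are all hypothetical, and the numbers are just the examples listed above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Limit:
    max_requests: int
    window_seconds: int


# Hypothetical configuration using the example values from the list above
GLOBAL_LIMIT = Limit(10_000, 60)
PER_USER_LIMIT = Limit(100, 60)
PER_IP_LIMIT = Limit(10, 1)
ENDPOINT_LIMITS = {("POST", "/login"): Limit(5, 60)}


def limits_for(method: str, path: str, api_key: Optional[str], ip: str) -> Dict[str, Limit]:
    """Return every (counter key -> limit) pair that a request must satisfy."""
    checks = {"global": GLOBAL_LIMIT, f"ip:{ip}": PER_IP_LIMIT}
    if api_key is not None:
        # Per-user limits only apply once the caller is identified
        checks[f"user:{api_key}"] = PER_USER_LIMIT
    endpoint = ENDPOINT_LIMITS.get((method, path))
    if endpoint is not None:
        # Endpoint limits are keyed per IP here; keying per user is equally valid
        checks[f"endpoint:{method}:{path}:ip:{ip}"] = endpoint
    return checks


for key, limit in limits_for("POST", "/login", None, "203.0.113.7").items():
    print(key, limit)
```

Each returned key would name one counter in the limiter's store; the request is allowed only if every counter is under its limit.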

Response Headers

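Tell clients where they stand by returning rate-limit headers on every response. The X-RateLimit-* names are a widely used convention rather than a formal standard. In the examples below, X-RateLimit-Reset is a Unix timestamp and Retry-After is a wait in seconds: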

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1623456789

HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1623456789
Retry-After: 45
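
As a rough illustration, the headers above could be produced by a small helper like the one below. The function name rate_limit_response and its parameters are hypothetical. It assumes remaining is the quota left after counting the current request, so a negative value means the request went over the limit.

```python
import math
import time
from typing import Dict, Optional, Tuple


def rate_limit_response(limit: int, remaining: int, reset_at: float,
                        now: Optional[float] = None) -> Tuple[int, Dict[str, str]]:
    """Return (status_code, headers) for one request."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_at)),  # Unix timestamp, in seconds
    }
    if remaining < 0:
        # Round up so a client that waits Retry-After seconds arrives after the reset
        headers["Retry-After"] = str(max(1, math.ceil(reset_at - now)))
        return 429, headers
    return 200, headers


print(rate_limit_response(100, 87, reset_at=1623456789, now=1623456744))
print(rate_limit_response(100, -1, reset_at=1623456789, now=1623456744))
```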

Common Mistakes

1. Fixed Window Burst Problem

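A fixed 100 req/min window resets its counter at each boundary. A client can send 100 requests in the last second of one window and 100 more in the first second of the next, so 200 requests arrive within about two seconds. A sliding window caps the count over any 60-second span, and a token bucket caps a burst at the bucket capacity.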
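
The burst is worst at a window boundary, where the counter resets. The sketch below is a deliberately naive fixed-window counter (the class name and the explicit now argument are illustrative choices, made so the demo is deterministic) that accepts twice the limit within one second.

```python
class FixedWindowCounter:
    """Naive fixed-window limiter; the caller passes the current time in seconds."""

    def __init__(self, window_size: int, max_requests: int):
        self.window_size = window_size
        self.max_requests = max_requests
        self.current_window = None
        self.count = 0

    def allow_request(self, now: float) -> bool:
        window = int(now // self.window_size)
        if window != self.current_window:
            # A new window has started: the counter resets to zero
            self.current_window = window
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(window_size=60, max_requests=100)
late = sum(limiter.allow_request(59.5) for _ in range(100))   # End of window 0
early = sum(limiter.allow_request(60.5) for _ in range(100))  # Start of window 1
print(late + early)  # 200 requests accepted within one second
```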

2. Not Using Atomic Operations in Redis

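Client A and B both read count=99, both increment to 100, both pass. Use a single atomic command (INCR), a MULTI/EXEC transaction, or a Lua script. Plain pipelining only batches round trips; it does not make a read-then-write sequence atomic.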
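
The race disappears when reading and incrementing happen in one server-side command. The sketch below is a fixed-window counter (a simpler technique than the sliding window log) built on INCR. It assumes a synchronous redis-py client, whose incr and expire methods match the CounterStore protocol; the protocol, the function name, and the key format are hypothetical.

```python
import time
from typing import Optional, Protocol


class CounterStore(Protocol):
    """The two client methods this sketch relies on (redis-py provides both)."""

    def incr(self, name: str) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


def allow_request(store: CounterStore, client_id: str, max_requests: int = 100,
                  window_size: int = 60, now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    window = int(now // window_size)
    key = f"ratelimit:{client_id}:{window}"
    # INCR reads and writes in a single step, so two servers can never
    # both observe 99 and both write 100
    count = store.incr(key)
    if count == 1:
        # First request in this window: schedule cleanup of the key.
        # If the process dies between INCR and EXPIRE the key lingers;
        # a Lua script would close that gap.
        store.expire(key, window_size * 2)
    return count <= max_requests
```

Because the count is taken after the increment, the comparison is count <= max_requests rather than a strict less-than.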

3. Clock Drift in Distributed Systems

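Servers with unsynchronized clocks compute different window boundaries and produce inconsistent rate limits. Take timestamps from a single source, such as the Redis server's TIME command, or keep all application servers NTP-synchronized.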

4. No Rate Limit on Login Endpoints

Brute-force attacks target /login. Always apply stricter limits (5 req/min per IP) on authentication endpoints.

5. Ignoring Redis Failures

If Redis goes down, a limiter that fails open lets every request through unlimited, and one that fails closed rejects all traffic. Implement a fallback: a local in-memory limiter with reduced capacity.
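
One way to sketch such a fallback is a wrapper that tries the shared limiter first and switches to a per-process token bucket when the call fails. Everything here is illustrative: LocalBucket, FallbackLimiter, and the remote_check callable are hypothetical names, and the code is not thread-safe. With redis-py you would likely pass redis.exceptions.RedisError in errors, since its connection errors do not derive from Python's built-in ConnectionError.

```python
import time
from typing import Callable, Dict, Tuple, Type


class LocalBucket:
    """Minimal token bucket, used only while the shared store is unreachable."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float]):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class FallbackLimiter:
    def __init__(self, remote_check: Callable[[str], bool], rate: float, capacity: float,
                 errors: Tuple[Type[Exception], ...] = (ConnectionError, TimeoutError),
                 clock: Callable[[], float] = time.monotonic):
        self.remote_check = remote_check  # e.g. a function that queries Redis
        self.rate, self.capacity = rate, capacity
        self.errors, self.clock = errors, clock
        self.local: Dict[str, LocalBucket] = {}

    def allow_request(self, client_id: str) -> bool:
        try:
            return self.remote_check(client_id)
        except self.errors:
            # Shared store unavailable: degrade to a per-process limit
            if client_id not in self.local:
                self.local[client_id] = LocalBucket(self.rate, self.capacity, self.clock)
            return self.local[client_id].allow()
```

Each server enforces the local limit independently, so the local rate should be roughly the shared limit divided by the number of servers.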

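6. Rate Limiting Too Late in the Request Path

Apply a cheap per-IP limit at the API gateway before authentication. Otherwise, you waste auth resources on requests that will be rejected anyway. Per-user limits need the caller's identity, so they run once the API key has been resolved.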

7. Not Communicating Limits to Developers

Publish rate limits in the API docs and return X-RateLimit-* headers on every response. Client developers build their retry and backoff logic around these values.

Practice Questions

1. What problem does the token bucket algorithm solve?

It allows smooth traffic with burst tolerance. Clients can burst up to the bucket capacity, then settle to the steady-state rate.
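
The burst-then-settle behavior can be checked with a small deterministic simulation. The simulate function below is a hypothetical helper that replays a list of request timestamps through the token bucket arithmetic instead of reading a real clock.

```python
from typing import List


def simulate(arrivals: List[float], rate: float, capacity: float) -> List[bool]:
    """Replay ascending request timestamps (seconds) through a token bucket."""
    tokens = float(capacity)
    last = arrivals[0] if arrivals else 0.0
    decisions = []
    for t in arrivals:
        # Refill for the time elapsed since the previous request, capped at capacity
        tokens = min(capacity, tokens + (t - last) * rate)
        last = t
        if tokens >= 1:
            tokens -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions


# 25 requests arrive at t=0, then 15 more at t=1
decisions = simulate([0.0] * 25 + [1.0] * 15, rate=10, capacity=20)
print(sum(decisions[:25]))  # 20: the burst is capped at the bucket capacity
print(sum(decisions[25:]))  # 10: one second of refill at 10 tokens/sec
```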

2. Why use a sliding window instead of a fixed window?

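A fixed window resets at each boundary, so a client can send up to twice the limit across the edge of two adjacent windows. A sliding window enforces the limit over every window-length interval, whenever that interval starts.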

3. How do you implement rate limiting across multiple servers?

Use a centralized Redis counter. All servers read/write the same Redis key. Use atomic operations to prevent race conditions.

4. What does the Retry-After header tell the client?

How long to wait before retrying, given either as a number of seconds or as an HTTP date. The client should respect this and not retry immediately.
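
On the client side, HTTP permits two forms of Retry-After: a number of seconds or an HTTP date. A minimal sketch of a parser that handles both is shown below; the function name retry_delay is hypothetical, and malformed values are left to raise.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_delay(retry_after: str, now: Optional[datetime] = None) -> float:
    """Seconds to wait for a Retry-After value (delta-seconds or HTTP date)."""
    value = retry_after.strip()
    if value.isdigit():
        return float(value)
    now = now or datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)  # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    # A date in the past means the client may retry right away
    return max(0.0, (when - now).total_seconds())


print(retry_delay("45"))
```

A client would typically sleep for this long, often with a little random jitter added, before sending the request again.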

5. Challenge: Implement hierarchical rate limiting.

A user has 1000 req/hour total, 100 req/min, and 10 req/sec. A request must pass all three limits. Use a multi-bucket approach.
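
One possible starting point, sketched here with sliding-window logs rather than token buckets, is to check every tier first and record the request only if all of them have room. That way a rejected request never consumes quota in another tier. The class name and the explicit now argument are hypothetical choices for this example.

```python
from collections import deque
from typing import Deque, List, Tuple


class HierarchicalLimiter:
    """One sliding-window log per (window_seconds, max_requests) tier."""

    def __init__(self, tiers: List[Tuple[float, int]]):
        self.tiers = tiers
        self.logs: List[Deque[float]] = [deque() for _ in tiers]

    def allow_request(self, now: float) -> bool:
        # Pass 1: expire old timestamps and check each tier without recording
        for (window, limit), log in zip(self.tiers, self.logs):
            while log and log[0] <= now - window:
                log.popleft()
            if len(log) >= limit:
                return False
        # Pass 2: every tier has room, so record the request in all of them
        for log in self.logs:
            log.append(now)
        return True


# 10 req/sec, 100 req/min, 1000 req/hour
limiter = HierarchicalLimiter([(1, 10), (60, 100), (3600, 1000)])
print(sum(limiter.allow_request(0.0) for _ in range(12)))  # 10: per-second tier caps the burst
```

A production version would need a lock or a shared store, and storing up to 1000 timestamps per user for the hourly tier may justify switching that tier to a counter.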

Mini Project: Rate Limited API

Build a Flask/FastAPI application with:

  1. Redis-based sliding window rate limiter middleware
  2. 10 req/sec per user (identified by API key)
  3. 100 req/min per user
  4. Proper X-RateLimit-* response headers
  5. 429 JSON response with Retry-After
  6. Graceful degradation (local fallback if Redis is down)

What’s Next

Congratulations on completing this Rate Limiter design! Here’s where to go from here:

  • Practice daily — Implement one algorithm per day
  • Build a project — Create a production-grade rate limiter
  • Explore related topics — Distributed counters, Lua scripting in Redis
  • Join the community — Share your implementations and get feedback

Remember: every expert was once a beginner. Keep limiting!
