Design a Rate Limiter — System Design Guide
A rate limiter controls how many requests a client can make in a given time window. It prevents abuse, protects backend services, and ensures fair resource allocation across all users.
What You’ll Learn
You’ll master token bucket and sliding window algorithms, distributed rate limiting with Redis, per-user vs global limits, and response headers. You’ll implement a rate limiter in Python.
Why This Problem Matters
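Most major public APIs enforce rate limits. Commonly cited examples are Twitter (300 requests per 15 minutes on some endpoints), GitHub (5,000 requests per hour for authenticated requests), and Stripe (100 requests per second in live mode); exact limits vary by endpoint and plan, and they change over time. Without rate limiting, a single abusive client can degrade service for everyone. At DodaTech, rate limiting protects DodaZIP cloud sync and Durga Antivirus Pro update servers.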
Rate Limiter Architecture
flowchart TB
Client[Client] -->|Request| API[API Gateway]
API --> RL[Rate Limiter Middleware]
RL --> Counter[(Redis Counter)]
RL --> Config[(Limits Config)]
Config -->|Per user limits| RL
Config -->|Global limits| RL
RL -->|Within limit| Backend[Backend Service]
RL -->|Exceeded| Response[429 Too Many Requests]
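The flow in the diagram can be sketched as a small wrapper around a request handler. This is a minimal illustration rather than a gateway implementation: `rate_limited`, `Request`, and the `allow` callable are hypothetical names, and the in-memory `allow_twice` function stands in for the Redis counter.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Request:
    api_key: str
    path: str


Response = Tuple[int, str]  # (status code, body)


def rate_limited(handler: Callable[[Request], Response],
                 allow: Callable[[str], bool]) -> Callable[[Request], Response]:
    """Wrap a handler so the limiter runs before the backend is reached."""
    def wrapper(request: Request) -> Response:
        if not allow(request.api_key):
            return 429, "Too Many Requests"
        return handler(request)
    return wrapper


# Toy limiter (hypothetical stand-in): at most 2 requests per key, ever.
seen: dict = {}


def allow_twice(key: str) -> bool:
    seen[key] = seen.get(key, 0) + 1
    return seen[key] <= 2


def backend(request: Request) -> Response:
    return 200, f"ok {request.path}"


app = rate_limited(backend, allow_twice)
statuses = [app(Request("key-1", "/sync"))[0] for _ in range(3)]
print(statuses)  # [200, 200, 429]
```

Keeping the limiter behind a plain `allow(key)` callable means any of the algorithms in this guide can be swapped in without touching the handler.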
Token Bucket Algorithm
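The token bucket is one of the most widely used rate-limiting algorithms. A bucket holds up to capacity tokens and is refilled at rate tokens per second; each request spends one token and is rejected when the bucket is empty: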
import threading
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate                     # Tokens added per second
        self.capacity = capacity             # Max tokens in bucket
        self.tokens = float(capacity)        # Current tokens (fractional after refills)
        self.last_refill = time.monotonic()  # Monotonic clock ignores wall-clock jumps
        self.lock = threading.Lock()

    def allow_request(self) -> bool:
        with self.lock:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def _refill(self):
        # Add tokens for the time elapsed since the last call, capped at capacity
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

# Usage: 10 requests per second, burst up to 20
limiter = TokenBucket(rate=10, capacity=20)
for i in range(30):
    if limiter.allow_request():
        print(f"Request {i}: Allowed")
    else:
        print(f"Request {i}: Rate limited")
    time.sleep(0.01)  # Offer about 100 req/s so the bucket drains within the loop

Sliding Window Log
Instead of a fixed window (which allows bursts at boundaries), track timestamps in a sliding window:
import time
from collections import deque

class SlidingWindowLimiter:
    # Not thread-safe; guard allow_request with a lock if shared across threads
    def __init__(self, window_size: float, max_requests: int):
        self.window_size = window_size  # e.g., 60 seconds
        self.max_requests = max_requests
        self.requests = deque()         # Timestamps of accepted requests, oldest first

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Remove expired timestamps
        while self.requests and self.requests[0] < now - self.window_size:
            self.requests.popleft()
        if len(self.requests) < self.max_requests:
            self.requests.append(now)
            return True
        return False

Distributed Rate Limiting with Redis
For multi-server deployments, use Redis as a centralized counter:
import time
import uuid

import redis.asyncio as redis

class RedisSlidingWindow:
    def __init__(self, redis_client, window_size: int = 60, max_requests: int = 100):
        self.redis = redis_client
        self.window_size = window_size
        self.max_requests = max_requests

    async def allow_request(self, client_id: str) -> bool:
        key = f"ratelimit:{client_id}"
        now = int(time.time() * 1000)  # Milliseconds
        window_start = now - self.window_size * 1000
        # Unique member so two requests in the same millisecond are both counted
        member = f"{now}-{uuid.uuid4().hex}"
        pipe = self.redis.pipeline()  # redis-py pipelines run as MULTI/EXEC by default
        pipe.zremrangebyscore(key, 0, window_start)  # Remove old entries
        pipe.zcard(key)                              # Count entries before this request
        pipe.zadd(key, {member: now})                # Add current request
        pipe.expire(key, self.window_size * 2)       # Cleanup TTL
        results = await pipe.execute()
        count = results[1]  # zcard result, excludes the request just added
        # Rejected requests are still recorded, so clients that keep retrying stay limited
        return count < self.max_requests

Per-User vs Global Limits
| Type | Scope | Example |
|---|---|---|
| Per-user | Each user has their own limit | 100 req/min per API key |
| Global | All clients share one limit | 10000 req/min total |
| Per-IP | Based on client IP | 10 req/sec per IP |
| Endpoint-specific | Different limits per route | POST /login: 5 req/min |
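The scopes in the table usually differ only in the key a counter is stored under. The following is a minimal sketch, assuming hypothetical key formats and reusing the example limits from the table; `Limit` and `limits_for` are illustrative names, not part of any library.

```python
from typing import List, NamedTuple, Optional


class Limit(NamedTuple):
    key: str            # counter key, e.g. a Redis key
    max_requests: int
    window_seconds: int


# Endpoint-specific overrides (values taken from the table's examples).
ENDPOINT_LIMITS = {("POST", "/login"): (5, 60)}


def limits_for(method: str, path: str, ip: str,
               api_key: Optional[str]) -> List[Limit]:
    """Return every limit a request must pass; the key encodes the scope."""
    limits = [
        Limit("ratelimit:global", 10_000, 60),
        Limit(f"ratelimit:ip:{ip}", 10, 1),
    ]
    if api_key is not None:
        limits.append(Limit(f"ratelimit:user:{api_key}", 100, 60))
    if (method, path) in ENDPOINT_LIMITS:
        max_requests, window = ENDPOINT_LIMITS[(method, path)]
        limits.append(
            Limit(f"ratelimit:endpoint:{method}:{path}:{ip}", max_requests, window)
        )
    return limits


for limit in limits_for("POST", "/login", "203.0.113.7", api_key=None):
    print(limit.key, limit.max_requests, limit.window_seconds)
```

A request is allowed only if every returned limit still has room, so the strictest applicable scope wins.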
Response Headers
Inform clients about their rate limit status:
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1623456789
HTTP/1.1 429 Too Many Requests
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1623456789
Retry-After: 45

Common Mistakes
1. Fixed Window Burst Problem
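With a fixed 100 req/min window, a client can send 100 requests in the last second of one window and 100 more in the first second of the next: 200 requests in about two seconds, twice the intended limit. A sliding window caps any 60-second span at 100 requests, and a token bucket caps bursts at its capacity.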
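The boundary effect of a fixed window can be reproduced with a simulated clock. In this sketch, `FixedWindowCounter` is a hypothetical minimal limiter, and timestamps are passed in explicitly so the result is deterministic.

```python
class FixedWindowCounter:
    """Counts requests per aligned window of `window_size` seconds."""

    def __init__(self, window_size: int, max_requests: int):
        self.window_size = window_size
        self.max_requests = max_requests
        self.current_window = None
        self.count = 0

    def allow_request(self, now: float) -> bool:
        window = int(now // self.window_size)
        if window != self.current_window:  # a new window resets the counter
            self.current_window = window
            self.count = 0
        if self.count < self.max_requests:
            self.count += 1
            return True
        return False


limiter = FixedWindowCounter(window_size=60, max_requests=100)
# 100 requests at t=59.5 s (end of window 0), then 100 at t=60.5 s (start of window 1).
allowed = sum(limiter.allow_request(59.5) for _ in range(100))
allowed += sum(limiter.allow_request(60.5) for _ in range(100))
print(allowed)  # 200 requests accepted within one second
```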
2. Not Using Atomic Operations in Redis
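Two servers both read count=99, both see room for one more request, and both let it through, so 101 requests pass. Make the read-and-update a single atomic step: INCR for simple counters, a MULTI/EXEC transaction, or a Lua script. Pipelining alone only batches commands; it does not make them atomic.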
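One way to make the check-and-add step atomic is a Lua script, which Redis executes as a single unit. The sketch below rests on several assumptions: a synchronous redis-py client (its `register_script` method), a hypothetical class name `AtomicSlidingWindow`, and the `ratelimit:{client_id}` key format used in the Redis example. Unlike that example, it records a request only when it is accepted.

```python
import time
import uuid

# Runs atomically inside Redis: trim, count, and conditionally add in one step.
SLIDING_WINDOW_LUA = """
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_ms = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local member = ARGV[4]
redis.call('ZREMRANGEBYSCORE', key, 0, now - window_ms)
local count = redis.call('ZCARD', key)
if count < limit then
  redis.call('ZADD', key, now, member)
  redis.call('PEXPIRE', key, window_ms)
  return 1
end
return 0
"""


class AtomicSlidingWindow:
    def __init__(self, client, window_size: int = 60, max_requests: int = 100):
        # `client` is assumed to be a redis.Redis instance (hypothetical wiring).
        self.window_ms = window_size * 1000
        self.max_requests = max_requests
        self._script = client.register_script(SLIDING_WINDOW_LUA)

    def allow_request(self, client_id: str) -> bool:
        now_ms = int(time.time() * 1000)
        member = f"{now_ms}-{uuid.uuid4().hex}"  # unique per request
        result = self._script(
            keys=[f"ratelimit:{client_id}"],
            args=[now_ms, self.window_ms, self.max_requests, member],
        )
        return result == 1
```

Because the count and the insert happen inside one script, two servers can no longer both observe the last free slot.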
3. Clock Drift in Distributed Systems
Servers with unsynchronized clocks produce inconsistent rate limits. Use Redis (single source of truth) or NTP-synchronized servers.
4. No Rate Limit on Login Endpoints
Brute-force attacks target /login. Always apply stricter limits (5 req/min per IP) on authentication endpoints.
5. Ignoring Redis Failures
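If Redis goes down and the limiter fails open, every request passes through unlimited; if it fails closed, every request is rejected. Decide the failure mode explicitly, and consider a fallback: a local in-memory limiter with reduced capacity.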
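A local fallback can be sketched as a wrapper that tries the shared limiter first and degrades to an in-process bucket on connection errors. The names `LocalTokenBucket` and `FallbackLimiter` are hypothetical, the local bucket is shared by all clients of the process, and the sketch is not thread-safe. With redis-py, the `errors` tuple would likely be `(redis.exceptions.RedisError,)`; the built-in exceptions are used here only so the example runs without a server.

```python
import time
from typing import Callable, Tuple, Type


class LocalTokenBucket:
    """Small in-process bucket used only while the shared store is unreachable."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = capacity
        self.last = clock()

    def allow_request(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class FallbackLimiter:
    def __init__(self, primary: Callable[[str], bool], local: LocalTokenBucket,
                 errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)):
        self.primary = primary
        self.local = local
        self.errors = errors

    def allow_request(self, client_id: str) -> bool:
        try:
            return self.primary(client_id)
        except self.errors:
            # Shared store unavailable: degrade to the stricter per-process limit.
            return self.local.allow_request()


def redis_down(client_id: str) -> bool:
    raise ConnectionError("redis unavailable")  # simulated outage


limiter = FallbackLimiter(
    redis_down, LocalTokenBucket(rate=1, capacity=2, clock=lambda: 0.0)
)
print([limiter.allow_request("key-1") for _ in range(3)])  # [True, True, False]
```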
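6. Rate Limiting Too Late in the Request Pipeline
Apply rate limiting at the API gateway, before expensive work such as authentication. Otherwise, you spend auth resources on requests that will be rejected anyway. Per-IP limits can run before any identity is known; per-user limits need at least the client's API key.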
7. Not Communicating Limits to Developers
Publish rate limits in API docs. Provide X-RateLimit-Remaining headers. Developers build retry logic around these headers.
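The headers mentioned above can be produced by one small helper on the server and consumed by another on the client. Both `rate_limit_headers` and `retry_delay` are hypothetical names for this sketch; it assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds, as in the example responses in this guide.

```python
import math
from typing import Dict


def rate_limit_headers(limit: int, remaining: int, reset_epoch: int,
                       now: float, allowed: bool) -> Dict[str, str]:
    """Build rate limit headers; Retry-After is added only for rejected requests."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(reset_epoch),
    }
    if not allowed:
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers


def retry_delay(headers: Dict[str, str], default: float = 1.0) -> float:
    """Client side: seconds to wait; falls back to `default` for non-numeric values."""
    value = headers.get("Retry-After")
    if value is not None and value.isdigit():
        return float(value)
    return default


blocked = rate_limit_headers(100, 0, 1623456789, now=1623456744.0, allowed=False)
print(blocked["Retry-After"])  # 45
print(retry_delay(blocked))    # 45.0
```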
Practice Questions
1. What problem does the token bucket algorithm solve?
It allows smooth traffic with burst tolerance. Clients can burst up to the bucket capacity, then settle to the steady-state rate.
2. Why use a sliding window instead of a fixed window?
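A fixed window resets at each boundary, so a client can send a full quota just before the boundary and another just after it. A sliding window enforces the limit over every interval of the window length, which removes that doubled burst.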
3. How do you implement rate limiting across multiple servers?
Use a centralized Redis counter. All servers read/write the same Redis key. Use atomic operations to prevent race conditions.
4. What does the Retry-After header tell the client?
How long to wait before retrying, given either as a number of seconds or as an HTTP date. The client should respect this and not retry immediately.
5. Challenge: Implement hierarchical rate limiting.
A user has 1000 req/hour total, 100 req/min, and 10 req/sec. A request must pass all three limits. Use a multi-bucket approach.
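One possible starting point for the challenge is a set of token buckets that must all have a token before any is spent. This is a sketch under stated assumptions, not a reference solution: `Bucket` and `HierarchicalLimiter` are hypothetical names, each limit of N requests per period is approximated by a bucket of capacity N refilled at N/period tokens per second, and the code is not thread-safe.

```python
import time
from typing import Callable, List, Tuple


class Bucket:
    def __init__(self, max_requests: int, period: float, now: float):
        self.capacity = float(max_requests)
        self.rate = max_requests / period  # tokens per second
        self.tokens = self.capacity
        self.last_refill = now

    def refill(self, now: float) -> None:
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now


class HierarchicalLimiter:
    def __init__(self, limits: List[Tuple[int, float]],
                 clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        now = clock()
        self.buckets = [Bucket(n, period, now) for n, period in limits]

    def allow_request(self) -> bool:
        now = self.clock()
        for bucket in self.buckets:
            bucket.refill(now)
        if any(bucket.tokens < 1 for bucket in self.buckets):
            return False  # nothing is deducted when any level rejects
        for bucket in self.buckets:
            bucket.tokens -= 1
        return True


# Simulated clock so the demo is deterministic.
t = [0.0]
limiter = HierarchicalLimiter([(10, 1), (100, 60), (1000, 3600)], clock=lambda: t[0])
print(sum(limiter.allow_request() for _ in range(15)))  # 10
t[0] += 1.0
print(limiter.allow_request())  # True
```

Checking every bucket before deducting from any of them keeps a request rejected by the per-second limit from silently using up the per-minute and per-hour budgets.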
Mini Project: Rate Limited API
Build a Flask/FastAPI application with:
- Redis-based sliding window rate limiter middleware
- 10 req/sec per user (identified by API key)
- 100 req/min per user
- Proper X-RateLimit-* response headers
- 429 JSON response with Retry-After
- Graceful degradation (local fallback if Redis is down)
What’s Next
Congratulations on completing this Rate Limiter design! Here’s where to go from here:
- Practice daily — Implement one algorithm per day
- Build a project — Create a production-grade rate limiter
- Explore related topics — Distributed counters, Lua scripting in Redis
- Join the community — Share your implementations and get feedback
Remember: every expert was once a beginner. Keep limiting!
Built by the developers of DodaTech