Rate Limiting and Throttling for APIs — Complete Guide
Rate limiting is a technique that controls how many requests a client can make to an API within a specific time window. It prevents abuse, ensures fair usage, and protects backend resources from overload.
What You'll Learn
You will learn four rate limiting algorithms (token bucket, leaky bucket, fixed window, and sliding window), along with implementation examples, header conventions, and distributed rate limiting strategies.
Why Rate Limiting Matters
Without rate limiting, a single misbehaving client can consume all API resources and degrade performance for every other user. Rate limiting helps defend against DDoS attacks, brute-force login attempts, and runaway scripts. It also enables tiered pricing models in which premium customers get higher limits.
Real-World Use
DodaTech implements rate limiting across all of its products. The Doda Browser sync API allows 100 requests per minute for free users and 1,000 for premium users. The DodaZIP update service uses sliding window limits to distribute bandwidth fairly. Durga Antivirus Pro applies strict limits to its threat-reporting submission endpoints.
Rate Limiting Learning Path
```mermaid
flowchart LR
    A[API Design Basics] --> B[Why Rate Limit?]
    B --> C[Token Bucket]
    B --> D[Leaky Bucket]
    B --> E[Fixed Window]
    B --> F[Sliding Window]
    C --> G[Implementation]
    D --> G
    E --> G
    F --> G
    B:::current
    classDef current fill:#f90,color:#fff,stroke:#333,stroke-width:2px
```
Prerequisites
You should understand RESTful API design best practices and API security best practices. Familiarity with authentication patterns (JWT, OAuth2, API keys) is helpful. Basic knowledge of Redis or another in-memory data store is recommended.
Algorithm 1: Token Bucket
The token bucket algorithm maintains a bucket of tokens that refills at a fixed rate.
How It Works
- A bucket holds a maximum number of tokens (burst limit)
- Tokens are added at a steady rate (refill rate)
- Each request consumes one token
- If the bucket is empty, the request is rejected
```python
import time
import threading

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity             # Max tokens (burst)
        self.tokens = capacity               # Current tokens; the bucket starts full
        self.refill_rate = refill_rate       # Tokens added per second
        self.last_refill = time.monotonic()  # Monotonic clock: unaffected by wall-clock changes
        self.lock = threading.Lock()         # Makes allow_request safe across threads

    def allow_request(self):
        with self.lock:
            now = time.monotonic()
            elapsed = now - self.last_refill
            # Refill in proportion to elapsed time, never above capacity
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1  # Spend one token on this request
                return True
            return False

# Usage: 10 requests per second, burst up to 20
bucket = TokenBucket(capacity=20, refill_rate=10)
for i in range(25):
    if bucket.allow_request():
        print(f"Request {i+1}: Allowed")
    else:
        print(f"Request {i+1}: Rate limited")
```
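Expected output: the loop finishes far faster than the refill rate of 10 tokens per second, so effectively only the initial burst of 20 tokens is available.
```text
Request 1: Allowed
Request 2: Allowed
...
Request 20: Allowed
Request 21: Rate limited
Request 22: Rate limited
...
```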
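The printed output only shows the initial burst. The sketch below makes the refill behavior deterministic. It repeats the same refill logic but takes a hypothetical `clock` argument (not part of the `TokenBucket` class above), so time can be advanced by hand instead of by sleeping:

```python
class TokenBucketWithClock:
    """Same refill logic as TokenBucket, but time comes from an injected clock."""

    def __init__(self, capacity, refill_rate, clock):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = capacity
        self.last_refill = clock()

    def allow_request(self):
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


t = [0.0]  # Fake clock value in seconds
bucket = TokenBucketWithClock(capacity=20, refill_rate=10, clock=lambda: t[0])
burst = sum(bucket.allow_request() for _ in range(25))       # Full burst, then empty
t[0] = 0.5                                                    # Half a second later
after_wait = sum(bucket.allow_request() for _ in range(25))  # 0.5 s * 10 tokens/s
t[0] = 10.0                                                   # A long idle period
capped = sum(bucket.allow_request() for _ in range(25))      # Refill stops at capacity
print(burst, after_wait, capped)  # 20 5 20
```

With `capacity=20` and `refill_rate=10`, half a second of idle time restores 5 tokens. A long idle period refills the bucket only up to its capacity of 20, and that cap is what bounds the burst size.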
Algorithm 2: Leaky Bucket
The leaky bucket algorithm processes requests at a constant rate, smoothing out traffic bursts.
How It Works
- Requests enter a queue (the bucket)
- Requests are processed at a fixed rate (the leak rate)
- If the queue is full, new requests are rejected
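```python
from collections import deque
import time

class LeakyBucket:
    """Leaky bucket used as a meter: admits a request if the bucket has room.
    Not thread-safe; guard allow_request with a lock if it is shared across threads."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # Max requests the bucket can hold
        self.queue = deque()        # Admission times of requests still in the bucket
        self.leak_rate = leak_rate  # Requests processed per second
        self.last_leak = time.monotonic()

    def allow_request(self):
        now = time.monotonic()
        elapsed = now - self.last_leak
        leaks = int(elapsed * self.leak_rate)  # Whole requests drained since last leak
        if leaks > 0:
            for _ in range(min(leaks, len(self.queue))):
                self.queue.popleft()
            # Advance only by the time those leaks account for. Resetting to `now`
            # on every call would discard the fractional remainder, so frequent
            # callers would never see the bucket drain.
            self.last_leak += leaks / self.leak_rate
        if len(self.queue) < self.capacity:
            self.queue.append(now)
            return True
        return False
```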
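The class above acts as a meter. It admits or rejects requests, but it does not delay anything. The "processes requests at a constant rate" behavior described earlier corresponds more closely to the queue form of the leaky bucket. The sketch below shows that form. It assumes a hypothetical injectable `clock` and assumes that callers wait for the returned delay before handling each request. `LeakyBucketQueue` and `schedule` are illustrative names:

```python
from collections import deque


class LeakyBucketQueue:
    """Leaky bucket as a queue: accepted requests start 1/leak_rate seconds apart,
    and requests that would overflow the waiting queue are rejected."""

    def __init__(self, capacity, leak_rate, clock):
        self.capacity = capacity         # Max requests waiting at once
        self.interval = 1.0 / leak_rate  # Seconds between processed requests
        self.clock = clock
        self.pending = deque()           # Start times of requests still waiting
        self.last_start = float("-inf")  # Start time of the most recent request

    def schedule(self):
        """Return seconds to wait before processing, or None if the queue is full."""
        now = self.clock()
        # Requests whose start time has arrived have left the queue
        while self.pending and self.pending[0] <= now:
            self.pending.popleft()
        if len(self.pending) >= self.capacity:
            return None
        # Space this request one interval after the previous one (or start now)
        start = max(now, self.last_start + self.interval)
        self.last_start = start
        if start > now:
            self.pending.append(start)
        return start - now


t = [0.0]
bucket = LeakyBucketQueue(capacity=3, leak_rate=10, clock=lambda: t[0])
delays = [bucket.schedule() for _ in range(5)]
print([None if d is None else round(d, 2) for d in delays])  # [0.0, 0.1, 0.2, 0.3, None]
```

A burst of five requests turns into an evenly spaced stream of four: one starts immediately and three wait in the queue. The fifth is rejected because the queue is full.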
Algorithm 3: Fixed Window
The fixed window algorithm counts requests in discrete time windows.
```python
import time

class FixedWindow:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.time()
        self.count = 0

    def allow_request(self):
        now = time.time()
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# Usage: 100 requests per minute
limiter = FixedWindow(limit=100, window_seconds=60)
```
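Problem: at window boundaries, clients can double the effective request rate. If the limit is 100 per minute, a client can send 100 requests at 00:59 (the end of one window) and 100 more at 01:00 (the start of the next). That is 200 requests accepted in about one second.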
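The boundary problem is easy to reproduce with a simulated clock. The sketch below uses clock-aligned windows (00:00 to 00:59, 01:00 to 01:59, and so on), which matches the example above. Unlike the `FixedWindow` class, it takes a hypothetical `clock` function so the result is deterministic:

```python
class FixedWindowCounter:
    """Fixed window aligned to the clock: windows are [0, 60), [60, 120), ..."""

    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.current_window = None
        self.count = 0

    def allow_request(self):
        window = int(self.clock() // self.window_seconds)  # Index of the current window
        if window != self.current_window:
            self.current_window = window  # New window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


now = [59.0]  # 00:59
limiter = FixedWindowCounter(limit=100, window_seconds=60, clock=lambda: now[0])
allowed = sum(limiter.allow_request() for _ in range(100))
now[0] = 60.0  # 01:00, one second later but in a new window
allowed += sum(limiter.allow_request() for _ in range(100))
print(allowed)  # 200
```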
Algorithm 4: Sliding Window
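The sliding window algorithm solves the fixed window boundary problem by recording the timestamp of each accepted request and counting only the requests within a rolling window that ends at the current time. This variant is often called a sliding window log.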
```python
from collections import deque
import time

class SlidingWindow:
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow_request(self):
        now = time.time()
        # Remove timestamps outside the window
        while self.timestamps and self.timestamps[0] < now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```
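Replaying the 00:59/01:00 burst against a sliding window log shows the difference. This sketch uses the same eviction rule as `SlidingWindow`, but it takes a hypothetical injectable `clock` so the timing is deterministic:

```python
from collections import deque


class SlidingWindowLog:
    def __init__(self, limit, window_seconds, clock):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.timestamps = deque()  # One entry per accepted request

    def allow_request(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the rolling window
        while self.timestamps and self.timestamps[0] < now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


t = [59.0]  # 00:59
limiter = SlidingWindowLog(limit=100, window_seconds=60, clock=lambda: t[0])
at_0059 = sum(limiter.allow_request() for _ in range(100))
t[0] = 60.0   # 01:00: the 00:59 requests are still inside the last 60 seconds
at_0100 = sum(limiter.allow_request() for _ in range(100))
t[0] = 120.0  # 02:00: the 00:59 requests have now aged out
at_0200 = sum(limiter.allow_request() for _ in range(100))
print(at_0059, at_0100, at_0200)  # 100 0 100
```

The trade-off is memory. A sliding window log stores one timestamp per accepted request, so each client can hold up to `limit` entries, whereas a fixed window keeps a single counter.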
Express.js Rate Limiting with express-rate-limit
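```javascript
const express = require("express");
const rateLimit = require("express-rate-limit");

const app = express();
// Behind a reverse proxy or load balancer, enable this so req.ip is the
// client's address rather than the proxy's:
// app.set("trust proxy", 1);

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100,            // 100 requests per minute per key
  message: {
    error: "Too many requests",
    retryAfter: "60 seconds"
  },
  standardHeaders: true, // Send RateLimit-* headers
  legacyHeaders: true    // Send X-RateLimit-* headers
  // keyGenerator is omitted: by default requests are keyed by client IP (req.ip)
});

// Apply to all API routes
app.use("/api", limiter);

// Stricter limits for auth endpoints
const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 5,                   // 5 attempts
  message: { error: "Too many login attempts" }
});
app.use("/api/auth/login", authLimiter);
```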
Distributed Rate Limiting with Redis
For applications running on multiple servers, keep rate limit state in a centralized store such as Redis, so that every server checks and updates the same counters.
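```javascript
const redis = require("redis");
const crypto = require("crypto");

const client = redis.createClient(); // Defaults to redis://localhost:6379
client.on("error", (err) => console.error("Redis error:", err));
// node-redis v4+ needs an explicit connect(); await it before sending commands
const ready = client.connect();

// Sliding window log in a sorted set: one member per request, scored by time (ms)
async function slidingWindowRedis(userId, limit, windowSeconds) {
  await ready;
  const now = Date.now();
  const key = `rate_limit:${userId}`;
  const windowStart = now - windowSeconds * 1000;
  // Unique member, so two requests in the same millisecond are both counted
  const member = `${now}-${crypto.randomUUID()}`;

  // MULTI/EXEC runs these commands as one transaction, so concurrent
  // requests cannot slip in between the "count" and the "add"
  const [, , count] = await client
    .multi()
    .zRemRangeByScore(key, 0, windowStart)    // Remove old timestamps
    .zAdd(key, { score: now, value: member }) // Record this request
    .zCard(key)                               // Count requests in window, including this one
    .expire(key, windowSeconds)               // Let idle keys expire
    .exec();

  if (count <= limit) {
    return true; // Allowed
  }
  // Rejected: remove this request's entry so it does not use up the quota
  await client.zRem(key, member);
  return false; // Rate limited
}
```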
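The following in-process simulation shows why independent limiters are not enough. It needs no Redis. It compares three servers that each keep their own counter with three servers that share one counter. `CounterStore`, `Server`, and `simulate` are hypothetical names. The shared `CounterStore` stands in for Redis, and only a single rate limit window is modeled:

```python
class CounterStore:
    """Stand-in for a key-value store: one per server, or one shared (like Redis)."""

    def __init__(self):
        self.counts = {}

    def increment(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


class Server:
    def __init__(self, store, limit):
        self.store = store
        self.limit = limit

    def handle(self, user_id):
        # Count this request, then allow it only if the total is within the limit
        return self.store.increment(f"rate_limit:{user_id}") <= self.limit


def simulate(servers, requests):
    allowed = 0
    for i in range(requests):
        # A round-robin load balancer spreads one client's requests across servers
        if servers[i % len(servers)].handle("user-1"):
            allowed += 1
    return allowed


LIMIT = 100
independent = [Server(CounterStore(), LIMIT) for _ in range(3)]
shared_store = CounterStore()
shared = [Server(shared_store, LIMIT) for _ in range(3)]

independent_allowed = simulate(independent, 300)
shared_allowed = simulate(shared, 300)
print(independent_allowed)  # 300: each server enforces 100 on its own
print(shared_allowed)       # 100: all servers see the same counter
```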
Response Headers
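The X-RateLimit-* headers are a widely used convention, not a formal standard, for telling clients about their usage. X-RateLimit-Reset is commonly a Unix timestamp (in seconds) marking when the current window resets. Retry-After is a standard HTTP header.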
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1719104400
```
When rate limited (HTTP 429):
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0

{
  "error": "Rate limit exceeded",
  "retryAfter": 60,
  "limit": 100,
  "window": "1 minute"
}
```
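The sketch below assembles these headers on the server side. `rate_limit_response` is a hypothetical helper. It assumes the limiter already reports whether the request was allowed, how many requests remain, and when the window resets (as a Unix timestamp):

```python
import math


def rate_limit_response(allowed, limit, remaining, reset_epoch, now_epoch):
    """Return (status_code, headers) using the X-RateLimit-* convention."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # Unix time (seconds) of the reset
    }
    if allowed:
        return 200, headers
    # Retry-After is in seconds: round up, and never tell clients to retry immediately
    headers["Retry-After"] = str(max(1, math.ceil(reset_epoch - now_epoch)))
    return 429, headers


status, headers = rate_limit_response(True, 100, 87, 1719104400, 1719104340)
print(status, headers["X-RateLimit-Remaining"])  # 200 87
status, headers = rate_limit_response(False, 100, 0, 1719104400, 1719104340)
print(status, headers["Retry-After"])            # 429 60
```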
Common Errors
Rate limiting by IP only: IP-based limiting lumps together every user behind the same NAT or corporate proxy, so one heavy user can block everyone else. Combine IP-based and user-based limiting, and use authenticated user IDs as the primary key.
Not including the Retry-After header: without it, clients are not told when to retry. Always include the Retry-After header in 429 responses so clients know how long to back off instead of guessing or retrying immediately.
Window boundary spikes: the fixed window algorithm allows double bursts at window boundaries. Use a sliding window or token bucket for smoother rate enforcement.
No distributed coordination: when each server runs its own rate limiter independently, a client that hits multiple servers can exceed the limit. Use Redis or a similar centralized store.
Rate limiting all endpoints equally: auth endpoints need stricter limits than read endpoints. Apply different limits based on endpoint sensitivity.
Not logging rate limit events: rate limiting without monitoring hides abuse patterns. Log every rate limit trigger with the client ID and endpoint.
Overly aggressive throttling: limits that are too low block legitimate users. Monitor real usage patterns before setting limits, start generous, and tighten over time.
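The first and fifth points above can be combined into one lookup. Choose the key from the authenticated user ID when there is one, falling back to the client IP, and pick the limit based on the endpoint. This is a minimal sketch. `ENDPOINT_LIMITS`, `DEFAULT_POLICY`, and `rate_limit_key_and_policy` are hypothetical names, and the numbers mirror the Express example earlier:

```python
# Hypothetical policy table: path -> (max requests, window in seconds)
ENDPOINT_LIMITS = {
    "/api/auth/login": (5, 15 * 60),  # Sensitive: 5 attempts per 15 minutes
}
DEFAULT_POLICY = (100, 60)            # Everything else: 100 requests per minute


def rate_limit_key_and_policy(path, user_id=None, client_ip=None):
    """Return (store key, (limit, window_seconds)) for one request."""
    # Prefer the authenticated user so users sharing a NAT are not lumped together;
    # fall back to the IP address for anonymous traffic such as login attempts
    principal = f"user:{user_id}" if user_id else f"ip:{client_ip}"
    if path in ENDPOINT_LIMITS:
        # Sensitive endpoints get their own, stricter counter
        return f"rate_limit:{principal}:{path}", ENDPOINT_LIMITS[path]
    # All other endpoints share one general counter per principal
    return f"rate_limit:{principal}:default", DEFAULT_POLICY


print(rate_limit_key_and_policy("/api/auth/login", client_ip="203.0.113.7"))
# ('rate_limit:ip:203.0.113.7:/api/auth/login', (5, 900))
print(rate_limit_key_and_policy("/api/items", user_id="42", client_ip="203.0.113.7"))
# ('rate_limit:user:42:default', (100, 60))
```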
Practice Questions
- How does the token bucket algorithm differ from the leaky bucket algorithm?
- What problem does the sliding window algorithm solve that fixed window does not?
- Why does distributed rate limiting need a centralized store such as Redis?
- What headers should a rate-limited API response include?
- How should rate limits differ between auth and data endpoints?
Challenge
Implement a multi-tier rate limiting system for a SaaS API. Free tier: 10 requests per minute, 100 per hour. Pro tier: 100 requests per minute, 10,000 per hour. Enterprise tier: 1,000 requests per minute, unlimited hourly. Use Redis for distributed state and the token bucket algorithm for smooth rate enforcement, and return proper rate limit headers. Include a mechanism that lets clients check their current usage without consuming a request.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro