Distributed Caching — Redis Cluster, Memcached, and Multi-Tier Cache Strategies
Distributed caching spreads cached data across multiple nodes to provide low-latency data access at scale, reducing database load and improving application response times. This guide covers Redis Cluster architecture, Memcached optimization, caching patterns, eviction policies, stampede prevention, and multi-tier designs used by companies serving billions of requests daily.
Why Distributed Caching Matters
A single Redis instance handles roughly 100K ops/sec, while a service like Twitter needs millions of timeline reads per second. A distributed cache scales horizontally: add more nodes, get more throughput. Amazon reportedly observed that every 100 ms of added latency cost 1% in revenue. Caching turns 20 ms database queries into sub-millisecond cache lookups. At DodaTech, Redis-based caching accelerates file metadata lookups in DodaZIP and signature database queries in Durga Antivirus Pro.
Cache Cluster Architecture
graph TD
App[Application] --> Router[Cache Router]
Router -->|Hash slot 0-5460| R1[Redis Node 1<br/>Master]
Router -->|Hash slot 5461-10922| R2[Redis Node 2<br/>Master]
Router -->|Hash slot 10923-16383| R3[Redis Node 3<br/>Master]
R1 --- R1R[Replica]
R2 --- R2R[Replica]
R3 --- R3R[Replica]
App --> L1[L1: Local Cache<br/>In-Process Memory]
L1 -->|Miss| Router
style R1 fill:#3498db,color:#fff
style R2 fill:#e67e22,color:#fff
style R3 fill:#27ae60,color:#fff
style L1 fill:#9b59b6,color:#fff
Redis Cluster Architecture
Redis Cluster uses hash slots instead of consistent hashing. The keyspace is divided into 16,384 slots. Each node owns a subset. When a node is added or removed, slots are migrated between nodes — only affected keys move.
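To make the slot mapping concrete: the Redis Cluster specification defines a key's slot as CRC16(key) mod 16384, and when the key contains a non-empty {tag}, only the tag is hashed so that related keys share a slot. The sketch below is illustrative only. `key_slot` is a hypothetical helper name, and it assumes Python's `binascii.crc_hqx` with an initial value of 0 is the same CRC16 variant (XMODEM) that Redis uses.

```python
import binascii

HASH_SLOTS = 16384  # number of slots in a Redis Cluster keyspace


def key_slot(key: str) -> int:
    """Return the hash slot for a key (hypothetical helper, not a redis-py API)."""
    data = key.encode("utf-8")
    # Hash tags: if the key has "{...}" with at least one character inside,
    # only that substring is hashed.
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # closing brace found and tag is non-empty
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


# Keys that share a hash tag map to the same slot, so they live on one node.
same_node = key_slot("{user:42}:profile") == key_slot("{user:42}:settings")
```

Sharing a hash tag is what allows multi-key operations on related keys, since a cluster can only serve them when all keys fall in one slot.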
# Redis Cluster client (redis-py 4.1+)
from redis.cluster import ClusterNode, RedisCluster

rc = RedisCluster(
    # redis-py expects ClusterNode objects here, not plain dicts
    startup_nodes=[
        ClusterNode("127.0.0.1", 7000),
        ClusterNode("127.0.0.1", 7001),
        ClusterNode("127.0.0.1", 7002),
    ],
    decode_responses=True,
)

# Keys are automatically routed to the node that owns their hash slot
rc.set("user:42:profile", '{"name": "Alice", "tier": "premium"}')
rc.set("user:99:profile", '{"name": "Bob", "tier": "free"}')

# Cluster-aware get
profile = rc.get("user:42:profile")
print(f"User profile: {profile}")

Memcached vs Redis
| Feature | Redis | Memcached |
|---|---|---|
| Data types | Strings, lists, sets, hashes, streams | Strings only |
| Persistence | RDB snapshots, AOF logs | None |
| Replication | Master-replica, Cluster | None built in; clients shard keys across nodes, often with consistent hashing |
| Memory efficiency | Overhead per key (~200 bytes) | Lower overhead |
| Eviction | 8+ policies (LRU, LFU, TTL, etc.) | LRU only |
| Use case | General-purpose cache + data structures | Simple key-value, large objects |
| Max value size | 512 MB | 1 MB by default (configurable) |
Use Redis when you need data structures, persistence, or pub/sub. Use Memcached for simple key-value caching of large blobs where simplicity and memory efficiency matter most.
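Because Memcached nodes do not coordinate with each other, the client decides which node holds a key. A minimal sketch of a consistent-hash ring follows. The class name `HashRing`, the node names, the virtual-node count, and the choice of SHA-256 are all assumptions made for illustration; real Memcached clients ship their own implementations.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node: str) -> None:
        # Each node gets several points so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The property that matters for caching is that removing a node only remaps the keys that node owned; keys on the surviving nodes keep their placement, so most of the cache stays warm.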
Caching Patterns
Cache-Aside (Lazy Loading)
The application checks cache first, loads from DB on miss, and populates the cache for next time. This is the most common pattern.
import json

# Assumes `redis` is a connected Redis client and `db` is a database handle.
def get_user(user_id: int) -> dict:
    # Check cache first
    cached = redis.get(f"user:{user_id}")
    if cached is not None:
        return json.loads(cached)

    # Cache miss: load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Populate cache with a 300-second TTL
    redis.setex(f"user:{user_id}", 300, json.dumps(user))
    return user

Read-Through
The cache library itself handles cache misses transparently. The application only talks to the cache.
from cachetools import cached, TTLCache

cache = TTLCache(maxsize=1000, ttl=300)

@cached(cache)
def get_expensive_data(key: str) -> dict:
    # The application never calls DB directly
    return db.query("SELECT * FROM expensive_view WHERE key = ?", key)

Write-Through
Every write goes through the cache to the database. Data is always consistent but writes are slower.
def write_through(key: str, value: dict):
    db.save(key, value)    # Write to database
    cache.set(key, value)  # Update cache synchronously

Write-Behind (Write-Back)
Writes go to cache first and are asynchronously batched to the database. Fast writes but risk of data loss if cache fails before persistence.
import asyncio

write_buffer = []

async def write_behind(key: str, value: dict):
    cache.set(key, value)
    write_buffer.append((key, value))

async def flush_buffer():
    while True:
        if write_buffer:
            batch = write_buffer[:100]
            del write_buffer[:100]
            db.batch_save(batch)  # Batch insert to database
        await asyncio.sleep(1)

Cache Eviction Policies
| Policy | Strategy | Best For |
|---|---|---|
| LRU | Evicts least recently used | General purpose |
| LFU | Evicts least frequently used | Skewed access patterns (few hot keys) |
| TTL | Evicts entries after time expires | Time-bounded data |
| FIFO | Evicts oldest first | Simple, predictable workloads |
| Random | Random eviction | Testing |
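To make the LRU row concrete, here is a minimal in-process sketch built on `collections.OrderedDict`. The class name `LRUCache` is hypothetical, and this is exact LRU; Redis itself approximates LRU by sampling keys, so its behavior can differ slightly.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def set(self, key, value) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.set("c", 3)  # capacity exceeded, so "b" is evicted
```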
Redis defaults to noeviction (returns errors on writes when full). For production, set maxmemory-policy allkeys-lru.
# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru

Cache Stampede Prevention
A cache stampede occurs when a popular key expires and thousands of concurrent requests hit the database simultaneously. Three prevention strategies:
Mutex locking: only one thread reloads the cache; the others wait for it. Note that threading.Lock coordinates threads within a single process only, so each application server may still issue one reload of its own.
import json
import threading

lock = threading.Lock()

# Assumes `redis` is a connected Redis client.
def get_expensive(key: str) -> dict:
    cached = redis.get(key)
    if cached is not None:
        return json.loads(cached)

    with lock:  # Only one thread enters
        # Double-check after acquiring lock
        cached = redis.get(key)
        if cached is not None:
            return json.loads(cached)
        value = expensive_computation()
        redis.setex(key, 300, json.dumps(value))
        return value

Probabilistic early expiration: refresh the cache before it actually expires. If the TTL is 300s, refresh at around 250s with some probability.
Hot-standby cache: maintain a secondary cache that’s always slightly behind but never empty.
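Probabilistic early expiration can be sketched as a small decision function. The version below uses one common formulation, often called XFetch, which is an assumption about the variant intended here: a request refreshes early with a probability that rises as expiry approaches and as the recomputation gets more expensive. The function name and parameters are hypothetical.

```python
import math
import random
import time


def should_refresh_early(expires_at: float, recompute_seconds: float,
                         beta: float = 1.0, now: float = None,
                         rng=random.random) -> bool:
    """Decide whether this request should refresh the entry before it expires.

    expires_at: absolute expiry time of the cached entry (epoch seconds)
    recompute_seconds: how long the last recomputation took
    beta: values above 1.0 refresh earlier, values below 1.0 refresh later
    """
    now = time.time() if now is None else now
    u = 1.0 - rng()  # in (0, 1], which avoids log(0)
    # log(u) is <= 0, so this adds a random head start to the current time.
    return now - recompute_seconds * beta * math.log(u) >= expires_at
```

A caller would store the expiry time and the measured recomputation time next to the cached value, then recompute whenever this function returns True. Because each request rolls independently, refreshes tend to be spread out instead of all arriving at the moment of expiry.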
Multi-Tier Caching
Large systems use multiple cache layers:
- L1 (Local memory): Per-application-server cache (in-process map). Microsecond latency, limited by RAM per server.
- L2 (Distributed): Redis Cluster or Memcached. Sub-millisecond latency, shared across all servers.
- L3 (CDN): Edge servers for static content or API responses. Geographic distribution.
import json
from typing import Optional

from redis import Redis

class MultiTierCache:
    def __init__(self):
        self.local = {}       # L1: in-memory dict (unbounded, no TTL in this sketch)
        self.redis = Redis()  # L2: distributed cache

    def get(self, key: str) -> Optional[dict]:
        # L1 check
        if key in self.local:
            return self.local[key]
        # L2 check
        cached = self.redis.get(key)
        if cached is not None:
            self.local[key] = json.loads(cached)  # Populate L1
            return self.local[key]
        return None

    def set(self, key: str, value: dict, ttl: int = 300):
        self.local[key] = value
        self.redis.setex(key, ttl, json.dumps(value))

Common Errors
No TTL on cache entries: Stale data lives forever. Eventually the cache diverges from the source of truth. Always set TTL.
Cache stampede without protection: A popular key expires and the database is hammered. Use mutex locks or probabilistic early expiration.
Caching the entire dataset: If your dataset fits in cache but 90% of requests hit 10% of keys, you’re wasting memory on cold data. Let LRU/LFU evict it.
Ignoring serialization overhead: JSON serialization/deserialization costs CPU. For high-throughput paths, use binary formats (MessagePack, Protocol Buffers).
Single point of failure: If a single Redis instance fails, the entire cache goes down. Use Redis Cluster or Sentinel for high availability.
No cache warmup: After a deployment, the cache is cold. First requests experience high latency. Pre-load hot keys on application startup.
Unbounded cache growth: Without eviction limits, the cache grows until it exhausts memory and crashes. Always configure maxmemory.
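The warmup advice above can be sketched as a small startup routine. Everything here is hypothetical: `warm_cache`, `load`, and `store` are illustrative names, and how the list of hot keys is obtained (for example, from access logs) is left as an assumption.

```python
from typing import Any, Callable, Iterable, Optional


def warm_cache(hot_keys: Iterable[str],
               load: Callable[[str], Optional[Any]],
               store: Callable[[str, Any, int], None],
               ttl: int = 300) -> int:
    """Pre-load hot keys into the cache and return how many were stored.

    load(key) fetches the value from the source of truth (None if absent).
    store(key, value, ttl) writes the value into the cache with a TTL.
    """
    warmed = 0
    for key in hot_keys:
        value = load(key)
        if value is None:
            continue  # nothing to cache for this key
        store(key, value, ttl)
        warmed += 1
    return warmed
```

With redis-py, `store` could be a thin wrapper such as `lambda k, v, ttl: client.setex(k, ttl, json.dumps(v))`, where `client` is an already connected Redis client.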
Mini Project
Build a multi-tier cache for a user profile service:
import json
import time

import redis

redis_client = redis.Redis(decode_responses=True)

class ProfileCache:
    def __init__(self):
        self.local = {}  # L1: in-process dict (unbounded in this exercise)
        self.local_hits = 0
        self.redis_hits = 0
        self.misses = 0

    def get_profile_l1(self, user_id: int) -> dict:
        # L1: in-memory lookup, counted so the stats below are real
        if user_id in self.local:
            self.local_hits += 1
            return self.local[user_id]
        profile = self.get_profile_l2(user_id)
        self.local[user_id] = profile
        return profile

    def get_profile_l2(self, user_id: int) -> dict:
        cached = redis_client.get(f"profile:{user_id}")
        if cached is not None:
            self.redis_hits += 1
            return json.loads(cached)
        return self.get_profile_db(user_id)

    def get_profile_db(self, user_id: int) -> dict:
        self.misses += 1
        # Simulate database query
        profile = {"id": user_id, "name": f"User_{user_id}", "loaded_at": time.time()}
        redis_client.setex(f"profile:{user_id}", 300, json.dumps(profile))
        return profile

cache = ProfileCache()
for _ in range(5):
    p = cache.get_profile_l1(42)

# With an empty Redis, expect 4 L1 hits, 0 Redis hits, and 1 DB miss
print(f"L1 hits: {cache.local_hits}, Redis hits: {cache.redis_hits}, DB misses: {cache.misses}")