Learn System: Distributed Caching — Redis Cluster, Memcached, and Multi-Tier Cache Strategies

DodaTech Updated Jun 20, 2026 7 min read

Distributed caching spreads cached data across multiple nodes to provide low-latency data access at scale, reducing database load and improving application response times. This guide covers Redis Cluster architecture, Memcached optimization, caching patterns, eviction policies, stampede prevention, and multi-tier designs used by companies serving billions of requests daily.

Why Distributed Caching Matters

A single Redis instance typically handles on the order of 100K ops/sec, while a service like Twitter needs millions of timeline reads per second. A distributed cache scales horizontally: add more nodes, get more throughput. Latency matters commercially too: Amazon famously reported that every 100 ms of added latency cost roughly 1% in sales. Caching can turn a 20 ms database query into a sub-millisecond cache lookup. At DodaTech, Redis-based caching accelerates file metadata lookups in DodaZIP and signature database queries in Durga Antivirus Pro.
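As a rough illustration of the latency argument, the average read latency depends on the cache hit ratio. The sketch below assumes that a miss pays for the cache lookup plus the database query; the function name and the hit ratios are illustrative, not measurements.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Average read latency when a miss costs a cache lookup plus a DB query."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ratio = 1.0 - hit_ratio
    return hit_ratio * cache_ms + miss_ratio * (cache_ms + db_ms)


if __name__ == "__main__":
    # Figures from the text: about 1 ms per cache lookup, 20 ms per database query.
    for ratio in (0.0, 0.5, 0.95):
        latency = effective_latency_ms(ratio, 1.0, 20.0)
        print(f"hit ratio {ratio:.0%}: {latency:.2f} ms on average")
```

Under these assumptions, a 95% hit ratio brings the average from 21 ms down to about 2 ms, which is why hit ratio is usually the first metric to watch.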

Cache Cluster Architecture

    graph TD
    App[Application] --> Router[Cache Router]
    Router -->|Hash slot 0-5460| R1[Redis Node 1<br/>Master]
    Router -->|Hash slot 5461-10922| R2[Redis Node 2<br/>Master]
    Router -->|Hash slot 10923-16383| R3[Redis Node 3<br/>Master]
    R1 --- R1R[Replica]
    R2 --- R2R[Replica]
    R3 --- R3R[Replica]
    App --> L1[L1: Local Cache<br/>In-Process Memory]
    L1 -->|Miss| Router
    style R1 fill:#3498db,color:#fff
    style R2 fill:#e67e22,color:#fff
    style R3 fill:#27ae60,color:#fff
    style L1 fill:#9b59b6,color:#fff

Redis Cluster Architecture

Redis Cluster uses hash slots instead of consistent hashing. The keyspace is divided into 16,384 slots. Each node owns a subset. When a node is added or removed, slots are migrated between nodes — only affected keys move.

# Redis Cluster client (redis-py 4.1+)
from redis.cluster import ClusterNode, RedisCluster

rc = RedisCluster(
    # redis-py expects ClusterNode objects here, not plain dicts
    startup_nodes=[
        ClusterNode("127.0.0.1", 7000),
        ClusterNode("127.0.0.1", 7001),
        ClusterNode("127.0.0.1", 7002),
    ],
    decode_responses=True,
)

# Keys are automatically routed to the correct node
rc.set("user:42:profile", '{"name": "Alice", "tier": "premium"}')
rc.set("user:99:profile", '{"name": "Bob", "tier": "free"}')

# Cluster-aware get
profile = rc.get("user:42:profile")
print(f"User profile: {profile}")
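To make the slot mapping concrete, here is a minimal sketch of how a key's slot can be computed on the client side. It relies on the fact that Redis Cluster uses the XMODEM variant of CRC16, which Python's `binascii.crc_hqx` computes when seeded with 0, and on the hash tag rule (only the text inside the first non-empty `{...}` is hashed). The node names and slot ranges mirror the three-master diagram above and are purely illustrative.

```python
from binascii import crc_hqx

HASH_SLOTS = 16384


def hash_tag(key: str) -> str:
    """Return the part of the key that is hashed, honouring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # Only a non-empty tag counts; "{}" falls back to the whole key.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM) of the hashed part of the key, modulo 16384."""
    return crc_hqx(hash_tag(key).encode("utf-8"), 0) % HASH_SLOTS


# Illustrative slot ownership for three masters (hypothetical node names).
SLOT_RANGES = {
    "node-1": range(0, 5461),
    "node-2": range(5461, 10923),
    "node-3": range(10923, 16384),
}


def owner(key: str) -> str:
    """Look up which node owns the slot for this key."""
    slot = key_slot(key)
    return next(name for name, slots in SLOT_RANGES.items() if slot in slots)


if __name__ == "__main__":
    for key in ("foo", "user:42:profile", "{user:42}:profile", "{user:42}:settings"):
        print(f"{key!r}: slot {key_slot(key)} on {owner(key)}")
```

Keys that share a hash tag, such as `{user:42}:profile` and `{user:42}:settings`, land in the same slot, which is what makes multi-key operations on them possible in a cluster.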

Memcached vs Redis

Feature	Redis	Memcached
Data types	Strings, lists, sets, hashes, streams	Strings only
Persistence	RDB snapshots, AOF logs	None
Replication	Master-replica, Cluster	None built in; clients shard keys across nodes, often with consistent hashing
Memory efficiency	Overhead per key (~200 bytes)	Lower overhead
Eviction	8 policies (noeviction plus LRU, LFU, random, and TTL-based variants)	LRU only
Use case	General-purpose cache + data structures	Simple key-value caching
Max value size	512 MB	1 MB by default (keys are limited to 250 bytes)

Use Redis when you need data structures, persistence, or pub/sub. Use Memcached for simple key-value caching of small to medium objects where simplicity and memory efficiency matter most.
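Memcached servers do not know about each other, so the client decides which server holds a key. A common approach is a consistent hash ring. The following is a minimal sketch of that idea rather than the algorithm of any particular Memcached client; the class name, the server names, and the choice of SHA-256 and 100 virtual nodes are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring with virtual nodes (client-side sharding sketch)."""

    def __init__(self, servers, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, server) pairs
        for server in servers:
            self.add(server)

    @staticmethod
    def _point(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, server: str) -> None:
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._point(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != server]

    def server_for(self, key: str) -> str:
        if not self._ring:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key's point, wrapping around.
        index = bisect.bisect_left(self._ring, (self._point(key), ""))
        return self._ring[index % len(self._ring)][1]


if __name__ == "__main__":
    ring = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
    print(ring.server_for("user:42:profile"))
```

The property that matters is that removing a server only remaps the keys that lived on it; keys on the surviving servers stay where they were, so most of the cache remains warm.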

Caching Patterns

Cache-Aside (Lazy Loading)

The application checks cache first, loads from DB on miss, and populates the cache for next time. This is the most common pattern.

import json

# Assumes `redis` is a connected client, e.g. redis.Redis(decode_responses=True),
# and `db` is your database access layer.
def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"

    # Check cache first
    cached = redis.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: load from database
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # Populate cache with a 300-second TTL
    redis.setex(key, 300, json.dumps(user))
    return user

Read-Through

The cache library itself handles cache misses transparently. The application only talks to the cache.

from cachetools import cached, TTLCache

cache = TTLCache(maxsize=1000, ttl=300)

@cached(cache)
def get_expensive_data(key: str) -> dict:
    # The application never calls DB directly
    return db.query("SELECT * FROM expensive_view WHERE key = ?", key)

Write-Through

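Every write updates the database and the cache in the same operation, so cache reads reflect the latest successful write. The cost is slower writes, and the two stores can still diverge if one of the two updates fails.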

def write_through(key: str, value: dict):
    db.save(key, value)     # Write to database
    cache.set(key, value)   # Update cache synchronously

Write-Behind (Write-Back)

Writes go to the cache first and are asynchronously batched to the database. Writes are fast, but buffered data is lost if the cache fails before it has been persisted.

import asyncio

write_buffer = []

async def write_behind(key: str, value: dict):
    cache.set(key, value)
    write_buffer.append((key, value))

async def flush_buffer():
    while True:
        if write_buffer:
            batch = write_buffer[:100]
            del write_buffer[:100]
            db.batch_save(batch)    # Batch insert to database
        await asyncio.sleep(1)

Cache Eviction Policies

Policy	Strategy	Best For
LRU	Evicts least recently used	General purpose
LFU	Evicts least frequently used	Skewed access patterns (few hot keys)
TTL	Evicts entries after time expires	Time-bounded data
FIFO	Evicts oldest first	Simple, predictable workloads
Random	Random eviction	Testing

Redis defaults to noeviction, which returns errors on writes once maxmemory is reached. For a production cache, allkeys-lru is a common starting point.

# redis.conf
maxmemory 4gb
maxmemory-policy allkeys-lru
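To make the LRU row of the table concrete, here is a minimal in-process LRU cache built on `collections.OrderedDict`. It is a sketch of the policy itself, not of how Redis implements it: Redis approximates LRU by sampling a few keys rather than tracking exact recency. The class name and capacity are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU: the least recently read or written key is evicted first."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key

    def __contains__(self, key) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")     # "a" is now more recent than "b"
    cache.put("c", 3)  # capacity exceeded, so "b" is evicted
    print("a" in cache, "b" in cache, "c" in cache)
```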

Cache Stampede Prevention

A cache stampede occurs when a popular key expires and thousands of concurrent requests hit the database simultaneously. Three prevention strategies:

Mutex locking — only one thread reloads the cache; others wait for it.

import json
import threading

lock = threading.Lock()

def get_expensive(key: str) -> dict:
    cached = redis.get(key)
    if cached:
        return json.loads(cached)

    with lock:  # Only one thread enters
        # Double-check after acquiring lock
        cached = redis.get(key)
        if cached:
            return json.loads(cached)
        value = expensive_computation()
        redis.setex(key, 300, json.dumps(value))
        return value
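The example above uses a single process-wide lock, so a reload of one key also blocks reloads of unrelated keys, and it does not coordinate across processes or servers. A possible refinement within one process is a lock per key. The sketch below uses a plain dict as a stand-in for Redis so that it runs without a server; the class name and the `loader` callback are hypothetical.

```python
import threading
from collections import defaultdict


class SingleFlightCache:
    """Per-key locking: concurrent misses on one key trigger a single load."""

    def __init__(self, loader):
        self._loader = loader               # hypothetical expensive function
        self._values = {}                   # stand-in for the real cache
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        # Guard the lock table itself so two threads never create two locks.
        with self._locks_guard:
            return self._locks[key]

    def get(self, key):
        if key in self._values:
            return self._values[key]
        with self._lock_for(key):
            # Double-check: another thread may have loaded it while we waited.
            if key in self._values:
                return self._values[key]
            value = self._loader(key)
            self._values[key] = value
            return value


if __name__ == "__main__":
    calls = []

    def slow_loader(key):
        calls.append(key)
        threading.Event().wait(0.05)  # simulate an expensive query
        return key.upper()

    cache = SingleFlightCache(slow_loader)
    threads = [threading.Thread(target=cache.get, args=("hot",)) for _ in range(20)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"loader calls: {len(calls)}")
```

This sketch never removes locks from its table, so a long-running service would need to prune it. Coordinating across several application servers requires a lock held in the shared cache itself, which is not shown here.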

Probabilistic early expiration — refresh the cache before it actually expires. If TTL is 300s, refresh at 250s with some probability.
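One well-known way to pick "some probability" is to make an early refresh more likely as expiry approaches, scaled by how long the value takes to recompute (this formulation is often called XFetch). The sketch below is one possible implementation under that assumption; the function names, the `beta` tuning knob, and the dict used as a cache are illustrative.

```python
import math
import random
import time
from typing import Optional


def should_refresh(expiry: float, recompute_seconds: float, beta: float = 1.0,
                   now: Optional[float] = None, rng=random.random) -> bool:
    """Decide whether this request should refresh the entry early."""
    now = time.time() if now is None else now
    # 1 - rng() lies in (0, 1], so log() is defined and never positive.
    head_start = -recompute_seconds * beta * math.log(1.0 - rng())
    return now + head_start >= expiry


def fetch(cache: dict, key, ttl: float, recompute, beta: float = 1.0):
    """Return the cached value, recomputing it on a miss or an early refresh."""
    entry = cache.get(key)
    if entry is not None:
        value, expiry, recompute_seconds = entry
        if not should_refresh(expiry, recompute_seconds, beta):
            return value
    started = time.monotonic()
    value = recompute()
    recompute_seconds = time.monotonic() - started
    cache[key] = (value, time.time() + ttl, recompute_seconds)
    return value


if __name__ == "__main__":
    store = {}
    print(fetch(store, "report", 300, lambda: {"rows": 3}))
    print(fetch(store, "report", 300, lambda: {"rows": 4}))  # served from cache
```

Because each request draws its own random number, only a few requests refresh early instead of all of them missing at the same instant. A larger `beta` makes early refreshes more frequent.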

Hot-standby cache — maintain a secondary cache that’s always slightly behind but never empty.

Multi-Tier Caching

Large systems use multiple cache layers:

L1 (Local memory): Per-application-server cache (in-process map). Microsecond latency, limited by RAM per server.
L2 (Distributed): Redis Cluster or Memcached. Sub-millisecond latency, shared across all servers.
L3 (CDN): Edge servers for static content or API responses. Geographic distribution.

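import json
import time
from typing import Optional

from redis import Redis

class MultiTierCache:
    def __init__(self, l1_ttl: float = 5.0):
        self.local = {}       # L1: in-memory dict of key -> (expires_at, value)
        self.l1_ttl = l1_ttl  # Illustrative default: short L1 TTL limits staleness
        self.redis = Redis(decode_responses=True)  # L2: distributed cache

    def _set_local(self, key: str, value: dict):
        self.local[key] = (time.monotonic() + self.l1_ttl, value)

    def get(self, key: str) -> Optional[dict]:
        # L1 check (ignore entries whose TTL has passed)
        entry = self.local.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
        # L2 check
        cached = self.redis.get(key)
        if cached is not None:
            value = json.loads(cached)
            self._set_local(key, value)  # Populate L1
            return value
        self.local.pop(key, None)  # Drop the expired L1 entry, if any
        return None

    def set(self, key: str, value: dict, ttl: int = 300):
        self._set_local(key, value)
        self.redis.setex(key, ttl, json.dumps(value))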

Common Errors

No TTL on cache entries: Stale data lives forever, and the cache eventually diverges from the source of truth. Always set a TTL.
Cache stampede without protection: A popular key expires and the database is hammered. Use mutex locks or probabilistic early expiration.
Caching the entire dataset: If your dataset fits in cache but 90% of requests hit 10% of keys, you’re wasting memory on cold data. Let LRU/LFU evict it.
Ignoring serialization overhead: JSON serialization/deserialization costs CPU. For high-throughput paths, use binary formats (MessagePack, Protocol Buffers).
Single point of failure: If a single Redis instance fails, the entire cache goes down. Use Redis Cluster or Sentinel for high availability.
No cache warmup: After a deployment, the cache is cold. First requests experience high latency. Pre-load hot keys on application startup.
Unbounded cache growth: Without eviction limits, the cache grows until it exhausts memory and crashes. Always configure maxmemory.

Practice Questions

1. How does Redis Cluster distribute keys across nodes?

Redis uses 16,384 hash slots. Each key is hashed with CRC16 modulo 16384 to determine its slot. Each node owns a range of slots. The client knows the slot-to-node mapping and routes directly.

2. What is the difference between cache-aside and read-through patterns?

In cache-aside, the application manually checks cache and populates on miss. In read-through, the cache library transparently handles misses and population.

3. How do you prevent a cache stampede?

Use mutex locking (one thread reloads, others wait), probabilistic early expiration (refresh before TTL expires), or a hot-standby cache.

4. Challenge: Design a write-behind cache with durability guarantees.

Use Redis AOF persistence with appendfsync always. The write-behind queue writes to a local log file first. If the process crashes, replay the log on restart. Batch-flush to database every 100ms.
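The log-then-replay part of this answer can be sketched as follows. This is a minimal single-process illustration, not a production design: the class name, the JSON-lines log format, and the `save_batch` callback (standing in for the database write) are all assumptions. A timer calling `flush()` every 100 ms would provide the batching and is left out.

```python
import json
import os
import tempfile


class DurableWriteBehind:
    """Write-behind buffer that logs each write to disk before acknowledging it."""

    def __init__(self, log_path: str, save_batch):
        self._log_path = log_path
        self._save_batch = save_batch  # hypothetical: persists a list of (key, value)
        self._pending = self._replay()  # recover writes left over from a crash
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        pending = []
        if os.path.exists(self._log_path):
            with open(self._log_path, encoding="utf-8") as log:
                for line in log:
                    try:
                        key, value = json.loads(line)
                    except ValueError:
                        break  # torn final line from a crash mid-write
                    pending.append((key, value))
        return pending

    def write(self, key: str, value: dict) -> None:
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # on disk before the caller continues
        self._pending.append((key, value))

    def flush(self) -> int:
        if not self._pending:
            return 0
        batch = list(self._pending)
        self._save_batch(batch)  # if this raises, the log is left intact
        self._pending.clear()
        self._log.close()
        self._log = open(self._log_path, "w", encoding="utf-8")  # truncate the log
        return len(batch)

    def close(self) -> None:
        self._log.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "write_behind.log")
    saved = []
    first = DurableWriteBehind(path, saved.extend)
    first.write("user:1", {"tier": "premium"})
    first.close()  # simulate a crash before flush()
    second = DurableWriteBehind(path, saved.extend)
    print(second.flush(), saved)
    second.close()
```

A crash between the database write and the log truncation replays the same batch on restart, so `save_batch` should be idempotent (for example, an upsert keyed on the cache key).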

Mini Project

Build a multi-tier cache for a user profile service:

import json
import time

import redis

redis_client = redis.Redis(decode_responses=True)

class ProfileCache:
    def __init__(self, l1_maxsize: int = 100):
        self.l1 = {}  # L1: in-process dict, bounded to l1_maxsize entries
        self.l1_maxsize = l1_maxsize
        self.local_hits = 0
        self.redis_hits = 0
        self.misses = 0

    def get_profile_l1(self, user_id: int) -> dict:
        if user_id in self.l1:
            self.local_hits += 1
            return self.l1[user_id]
        profile = self.get_profile_l2(user_id)
        if len(self.l1) >= self.l1_maxsize:
            self.l1.pop(next(iter(self.l1)))  # Evict the oldest inserted entry
        self.l1[user_id] = profile
        return profile

    def get_profile_l2(self, user_id: int) -> dict:
        cached = redis_client.get(f"profile:{user_id}")
        if cached is not None:
            self.redis_hits += 1
            return json.loads(cached)
        return self.get_profile_db(user_id)

    def get_profile_db(self, user_id: int) -> dict:
        self.misses += 1
        # Simulate database query
        profile = {"id": user_id, "name": f"User_{user_id}", "loaded_at": time.time()}
        redis_client.setex(f"profile:{user_id}", 300, json.dumps(profile))
        return profile

cache = ProfileCache()
for _ in range(5):
    p = cache.get_profile_l1(42)
# With a cold Redis this prints: L1 hits: 4, Redis hits: 0, DB misses: 1
print(f"L1 hits: {cache.local_hits}, Redis hits: {cache.redis_hits}, DB misses: {cache.misses}")
