System Design Interview: Complete Preparation Guide

DodaTech Updated Jun 20, 2026 9 min read

System design interviews evaluate your ability to architect large-scale distributed systems. Unlike coding interviews, there’s no single correct answer — interviewers assess your process, tradeoffs, and communication as you design a system from scratch.

Learning Path

    flowchart LR
  A["DSA Patterns<br/>Foundation"] --> B["System Design<br/>Framework"]
  B --> C["Practice Problems<br/>URL Shortener, Chat, Twitter"]
  C --> D["Mock Interviews<br/>System Design Focus"]
  style B fill:#f90,color:#fff,stroke-width:2px

What you’ll learn: A step-by-step framework for system design interviews, covering requirements, estimation, data modeling, deep dives, and tradeoff analysis.

Why it matters: System design interviews are a primary gatekeeper for senior engineering roles (L5+ at FAANG).

Real-world use: DodaTech’s infrastructure team designs systems serving 10M+ users across Doda Browser and Durga Antivirus Pro, using the patterns covered here.

System Design Framework

A reliable 6-step framework that works for any system design problem:

    flowchart TD
  A["1. Requirements<br/>Functional + Non-functional"] --> B["2. Estimation<br/>Traffic, Storage, Bandwidth"]
  B --> C["3. Data Model<br/>Schema + Storage Choice"]
  C --> D["4. High-Level Design<br/>Components + Data Flow"]
  D --> E["5. Deep Dive<br/>Bottlenecks + Scaling"]
  E --> F["6. Tradeoffs<br/>What you chose and why"]
  style A fill:#1a73e8,color:#fff
  style B fill:#34a853,color:#fff
  style C fill:#fbbc04,color:#333
  style D fill:#ea4335,color:#fff
  style E fill:#ab47bc,color:#fff
  style F fill:#1a73e8,color:#fff

Step 1: Requirements Gathering (5 min)

Always start by clarifying what you’re building. Don’t assume anything.

Functional requirements (what the system does):

Can users create short URLs?
Do we need analytics (clicks, geolocation)?
Should URLs have expiration?

Non-functional requirements (quality attributes):

Read vs write ratio (typical: 100:1 reads to writes)
Latency requirements (P99 < 200ms)
Availability target (99.9% or 99.99%?)
Consistency vs availability tradeoff
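
To make the availability question concrete, it can help to convert each target into a downtime budget. The sketch below is illustrative only: the helper name `downtime_per_year_minutes` is hypothetical and a 365-day year is assumed.

```python
# Downtime budget implied by an availability target (illustrative helper).
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_per_year_minutes(availability_percent: float) -> float:
    """Return the downtime allowed per year, in minutes."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * SECONDS_PER_YEAR / 60


for target in (99.9, 99.99):
    print(f"{target}% -> {downtime_per_year_minutes(target):.1f} min/year")
# 99.9%  allows roughly 525.6 minutes (about 8.8 hours) per year
# 99.99% allows roughly 52.6 minutes per year
```

Each extra nine shrinks the budget tenfold, which is why the target is worth confirming before choosing replication and failover strategies.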

def gather_requirements():
    """Template for collecting system design requirements."""
    requirements = {
        "functional": [
            "Create shortened URLs",
            "Redirect to original URL",
            "Custom alias support",
            "Analytics (click count, referrer)",
            "URL expiration",
            "REST API for programmatic access",
        ],
        "non_functional": {
            "traffic": "100M URLs created/month",
            "reads": "10K redirects/second peak",
            "latency_p99_ms": 200,
            "availability": "99.99%",
            "consistency": "Eventual for analytics, strong for redirects"
        }
    }
    return requirements

reqs = gather_requirements()
print("Functional:", len(reqs["functional"]), "items")
print(f"Traffic: {reqs['non_functional']['traffic']}")
print(f"Redirects: {reqs['non_functional']['reads']}")

Expected output:

Functional: 6 items
Traffic: 100M URLs created/month
Redirects: 10K redirects/second peak

Step 2: Estimation (5 min)

Back-of-the-envelope calculations show you can think at scale.

URL Shortener Estimations:

100M new URLs/month → ~40 URLs/second writes
10K redirects/second reads (peak)
Storage: 100M URLs × 500 bytes ≈ 50 GB/month → 600 GB/year
Bandwidth: 10K/s × 500 bytes response ≈ 5 MB/s → ~13 TB/month at sustained peak

Write throughput: 100,000,000 / (30 × 24 × 3600) ≈ 38.6 writes/second
Read throughput: 10,000 reads/second (peak)
Read-to-write ratio: 10,000 / 38.6 ≈ 259:1
Storage per year: 100M × 12 × 500 bytes ≈ 600 GB
Cache-miss traffic (80% cache hit rate): 10K/s × 0.2 × 200 bytes ≈ 0.4 MB/s reaching the database
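
The same arithmetic can be scripted so the inputs are easy to change during practice. This is a minimal sketch: the function name `estimate` and its parameter names are hypothetical, and a 30-day month is assumed. The text rounds its figures, so small differences are expected.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def estimate(urls_per_month, peak_reads_per_sec, bytes_per_url, bytes_per_response):
    """Back-of-the-envelope numbers for a URL shortener."""
    writes_per_sec = urls_per_month / SECONDS_PER_MONTH
    bytes_out_per_sec = peak_reads_per_sec * bytes_per_response
    return {
        "writes_per_sec": writes_per_sec,
        "read_write_ratio": peak_reads_per_sec / writes_per_sec,
        "storage_gb_per_year": urls_per_month * 12 * bytes_per_url / 1e9,
        "bandwidth_mb_per_sec": bytes_out_per_sec / 1e6,
        # Upper bound: assumes peak read traffic is sustained all month
        "bandwidth_tb_per_month": bytes_out_per_sec * SECONDS_PER_MONTH / 1e12,
    }


result = estimate(
    urls_per_month=100_000_000,
    peak_reads_per_sec=10_000,
    bytes_per_url=500,
    bytes_per_response=500,
)
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the write rate comes out near 39 per second, storage at 600 GB per year, and sustained-peak bandwidth near 13 TB per month.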

Step 3: Data Model (10 min)

Design the schema and choose storage technology.

-- URL Shortener Schema
CREATE TABLE urls (
    -- Base62 of a 64-bit ID can need 11 characters; widen further if custom aliases may be longer
    short_key VARCHAR(11) PRIMARY KEY,
    original_url TEXT NOT NULL,
    user_id BIGINT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP,
    click_count BIGINT DEFAULT 0
);

CREATE TABLE click_events (
    id BIGSERIAL PRIMARY KEY,
    short_key VARCHAR(11) REFERENCES urls(short_key),
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    ip_address INET,
    user_agent TEXT,
    referrer TEXT
);

CREATE INDEX idx_click_events_short_key ON click_events(short_key);
CREATE INDEX idx_urls_user_id ON urls(user_id);

Storage choice:

Primary DB: PostgreSQL or CockroachDB for strong consistency on redirects
Cache: Redis for hot URLs (LRU eviction, 80% cache hit target)
Analytics: Cassandra or ClickHouse for time-series click data
CDN: CloudFront/Cloudflare for regional edge caching
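
The cache choice above implies a cache-aside read path with LRU eviction. The sketch below shows the idea with an in-memory stand-in: `LRUCache` and `resolve` are hypothetical names, a plain dict plays the role of the primary database, and nothing here talks to a real Redis instance.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory LRU cache (a stand-in for Redis with LRU eviction)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


def resolve(short_key, cache, database):
    """Cache-aside read: try the cache, fall back to the DB, then fill the cache."""
    url = cache.get(short_key)
    if url is None:
        url = database.get(short_key)
        if url is not None:
            cache.put(short_key, url)
    return url


database = {k: f"https://example.com/{k}" for k in ("a", "b", "c")}
cache = LRUCache(capacity=2)
for key in ("a", "b", "a", "c"):  # "b" is least recently used when "c" arrives
    resolve(key, cache, database)
print(cache.get("b"))  # None: evicted
print(cache.get("a"))  # https://example.com/a
```

In an interview, the interesting follow-ups are the cache size needed to reach the hit-rate target and what happens to the database when the cache is cold.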

Step 4: High-Level Design (15 min)

Draw the system architecture and data flow.

    flowchart TD
    Client["User/Browser"] --> CDN["CDN Edge Cache"]
    CDN --> LB["Load Balancer"]
    LB --> WS["Web Servers"]
    WS --> Cache["Redis Cache"]
    WS --> DB[("Primary DB<br/>PostgreSQL")]
    WS --> Queue["Message Queue<br/>Kafka"]
    Queue --> Analytics["Analytics Workers"]
    Analytics --> ADB[("Analytics DB<br/>ClickHouse")]
    DB --> Replica[("Read Replicas")]
    
    style Client fill:#4285f4,color:#fff
    style Cache fill:#ff6d01,color:#fff
    style DB fill:#34a853,color:#fff
    style Queue fill:#ea4335,color:#fff
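
One way to talk through the diagram is to trace a single redirect. The sketch below is a simplified, single-process illustration: `queue.Queue` stands in for Kafka, plain dicts stand in for the primary and analytics databases, and the function names are hypothetical. It assumes a 302 response so that repeat clicks keep reaching the service and can be counted.

```python
import queue

click_queue = queue.Queue()  # stand-in for the message queue
url_table = {"abc123": "https://example.com/landing"}  # stand-in for the primary DB
click_counts = {}  # stand-in for the analytics DB


def handle_redirect(short_key, referrer=None):
    """Web server path: look up the URL, enqueue a click event, respond."""
    url = url_table.get(short_key)
    if url is None:
        return 404, None
    # Analytics is recorded off the request path so redirects stay fast
    click_queue.put({"short_key": short_key, "referrer": referrer})
    return 302, url


def drain_clicks():
    """Analytics worker path: consume queued events and aggregate them."""
    processed = 0
    while True:
        try:
            event = click_queue.get_nowait()
        except queue.Empty:
            return processed
        key = event["short_key"]
        click_counts[key] = click_counts.get(key, 0) + 1
        processed += 1


print(handle_redirect("abc123", referrer="newsletter"))
print(handle_redirect("missing"))
print(drain_clicks(), click_counts)
```

Because the write to the queue is decoupled from aggregation, click counts are eventually consistent while the redirect itself reads from the authoritative store.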

Step 5: Deep Dive (15 min)

Focus on the most interesting or bottleneck component.

Key deep dive topics for URL shortener:

Key generation: Base62 encoding of unique IDs. Use Twitter Snowflake or a distributed ID generator:

import time
import threading

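class SnowflakeID:
    """Distributed unique ID generator (Twitter Snowflake-style)."""
    EPOCH_MS = 1609459200000  # custom epoch: 2021-01-01 00:00:00 UTC

    def __init__(self, worker_id, datacenter_id):
        # Each field gets 5 bits in the layout below, so valid values are 0-31
        if not (0 <= worker_id < 32 and 0 <= datacenter_id < 32):
            raise ValueError("worker_id and datacenter_id must be in 0..31")
        self.worker_id = worker_id
        self.datacenter_id = datacenter_id
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()

    def _gen_timestamp(self):
        return int(time.time() * 1000)

    def get_id(self):
        with self.lock:
            timestamp = self._gen_timestamp()
            if timestamp < self.last_timestamp:
                raise RuntimeError("Clock moved backwards; refusing to generate an ID")
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 4095
                if self.sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next one
                    while timestamp <= self.last_timestamp:
                        timestamp = self._gen_timestamp()
            else:
                self.sequence = 0
            self.last_timestamp = timestamp

            # Layout: 41-bit timestamp | 5-bit datacenter | 5-bit worker | 12-bit sequence
            new_id = (timestamp - self.EPOCH_MS) << 22
            new_id |= self.datacenter_id << 17
            new_id |= self.worker_id << 12
            new_id |= self.sequence
            return new_id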

gen = SnowflakeID(worker_id=1, datacenter_id=1)
print(f"ID: {gen.get_id()}")
print(f"ID: {gen.get_id()}")

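Expected output:

The values vary with the current time. Each run prints two distinct integers, the second larger than the first (exactly one larger when both calls land in the same millisecond).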
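
To check the bit layout, an ID can be unpacked back into its fields. This is a small sketch assuming the same layout and custom epoch as the generator above; `decode_snowflake` is a hypothetical helper name, and the sample ID is built by hand so the result is deterministic.

```python
EPOCH_MS = 1609459200000  # same custom epoch as the generator (2021-01-01 UTC)


def decode_snowflake(snowflake_id):
    """Split an ID into timestamp, datacenter, worker, and sequence fields."""
    return {
        "timestamp_ms": (snowflake_id >> 22) + EPOCH_MS,
        "datacenter_id": (snowflake_id >> 17) & 0x1F,  # 5 bits
        "worker_id": (snowflake_id >> 12) & 0x1F,      # 5 bits
        "sequence": snowflake_id & 0xFFF,              # 12 bits
    }


# Hand-built ID: 1000 ms after the epoch, datacenter 1, worker 1, sequence 5
sample = (1000 << 22) | (1 << 17) | (1 << 12) | 5
print(decode_snowflake(sample))
# {'timestamp_ms': 1609459201000, 'datacenter_id': 1, 'worker_id': 1, 'sequence': 5}
```

Because the timestamp occupies the high bits, sorting by ID also sorts roughly by creation time, which is the ordering property usually cited in the tradeoff discussion.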

Base62 encoding (short key from numeric ID):

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_base62(num):
    """Convert a number to base62 string."""
    if num == 0:
        return BASE62[0]
    result = []
    while num > 0:
        result.append(BASE62[num % 62])
        num //= 62
    return ''.join(reversed(result))

def decode_base62(s):
    """Convert base62 string back to number."""
    result = 0
    for char in s:
        result = result * 62 + BASE62.index(char)
    return result

# Test
ids = [0, 1, 61, 62, 123456789]
for num in ids:  # avoid shadowing the built-in id()
    encoded = encode_base62(num)
    decoded = decode_base62(encoded)
    print(f"{num:>10} → {encoded:>7} → {decoded}")

Expected output:

         0 →       0 → 0
         1 →       1 → 1
        61 →       z → 61
        62 →      10 → 62
 123456789 →   8M0kX → 123456789
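
Key length is worth estimating alongside the encoding, since it determines both the keyspace and the column width. The sketch below uses a hypothetical helper, `base62_length`, to count the characters a given number needs.

```python
def base62_length(num):
    """Number of Base62 characters needed to represent a non-negative integer."""
    length = 1
    while num >= 62:
        num //= 62
        length += 1
    return length


print(62 ** 7)                     # 3521614606208 distinct 7-character keys
print(base62_length(123456789))    # 5
print(base62_length(2 ** 63 - 1))  # 11
```

Seven characters already cover about 3.5 trillion keys, while a full 63-bit Snowflake-style ID can need 11 characters, so the key column should be sized for whichever ID scheme is chosen.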

Step 6: Tradeoffs (5 min)

Every design decision involves tradeoffs. Explicitly discuss them:

Decision	Alternative	Why You Chose This
PostgreSQL	Cassandra (NoSQL)	Strong consistency needed for redirects
Snowflake ID	UUID	Time-ordered and fits in 64 bits; a single auto-increment sequence would become a contention point
Redis	Memcached (simpler to operate)	Optional persistence, richer data structures, configurable eviction policies
Monolith	Microservices	~40 writes/sec doesn’t need microservices; simpler ops, lower latency, one codebase

Practice Problems

These 5 problems cover most of the patterns that come up in system design interviews:

URL Shortener (e.g., bit.ly) — Key generation, redirection, analytics, caching
Chat System (e.g., WhatsApp) — WebSockets, message ordering, presence, offline storage
Social Media Feed (e.g., Twitter) — Fan-out on write vs read, timeline generation, ranking
Video Streaming (e.g., YouTube) — Upload pipeline, transcoding, CDN, recommendation
Rate Limiter — Token bucket, sliding window, distributed rate limiting
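
For the rate limiter problem, the token bucket is usually the first algorithm to sketch. The version below is a minimal single-process illustration, not a distributed implementation: the class name, parameters, and the injectable `clock` argument are assumptions made for the example. A distributed limiter would need shared state, for instance in a key-value store.

```python
import time


class TokenBucket:
    """Token bucket: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last_refill = clock()

    def allow(self, cost=1):
        """Return True and consume tokens if the request fits the budget."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Fake clock so the example is deterministic
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=1, capacity=2, clock=lambda: fake_now[0])
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
fake_now[0] = 1.0  # one second later, one token has been refilled
print(bucket.allow())  # True
```

The capacity controls how large a burst is tolerated, and the rate controls the long-run average, which are the two knobs interviewers tend to ask about.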

Scalability Concepts You Must Know

Concept	Why It Matters
Horizontal vs Vertical Scaling	Add more servers vs make servers bigger. Know when each is appropriate.
Database Indexing	B-tree for range queries, Hash for point lookups, GIN for full-text.
Caching Strategies	Cache aside, read through, write through, write behind. LRU vs LFU eviction.
Consistency Models	Strong vs eventual vs causal consistency. CAP theorem tradeoffs.
Partitioning (Sharding)	Hash-based, range-based, directory-based. Rebalancing challenges.
Replication	Leader-follower, leader-leader, quorum-based reads/writes.
Load Balancing	Round robin, least connections, consistent hashing for cache affinity.
Message Queues	Kafka for high throughput, RabbitMQ for reliable delivery, SQS for simplicity.
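
Consistent hashing, mentioned above for cache affinity, is easier to explain with a small ring in hand. The sketch below is illustrative: the class name, the choice of 100 virtual nodes, and the use of SHA-256 truncated to 64 bits are all assumptions for the example rather than a prescribed design.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; each key maps to the next point clockwise."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node, vnodes)

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node, vnodes=100):
        for i in range(vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        # First ring point with hash >= the key's hash, wrapping around at the end
        idx = bisect.bisect_left(self._ring, (self._hash(key),))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("cache-d")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

Only keys that now fall to the new node move, whereas a plain `hash(key) % N` scheme would remap most keys when N changes.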

Common System Design Mistakes

Jumping to solution without clarifying requirements — Always confirm functional and non-functional requirements first. Building the wrong system is worse than building a system poorly.
Ignoring storage estimation — You can’t design a system without knowing how much data it needs to store. Calculate storage requirements before choosing a database.
Single-server mindset — Everything runs on one server. Interviewers want to see distributed thinking — sharding, replication, load balancers, CDNs.
Not discussing tradeoffs — Every decision has tradeoffs. Saying “we’ll use NoSQL” without explaining why SQL doesn’t fit shows shallow thinking.
Over-engineering — Adding Kubernetes, microservices, and 15 caches for a system serving 100 users. Start simple and scale up as requirements grow.
Ignoring failure modes — What happens when the database goes down? When cache is cold? When traffic spikes 10x? Discuss resilience patterns.
No monitoring or observability — Every production system needs metrics, logging, and alerting. Mention tools like Prometheus, Grafana, and structured logging.
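
When discussing failure modes, retrying transient errors with exponential backoff is one resilience pattern that is quick to sketch. The example below is a hedged illustration: `call_with_retries` and its parameters are hypothetical, only `ConnectionError` is treated as retryable, and the `sleep` argument is injectable so the behavior can be tested without waiting.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on ConnectionError, retry with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: wait a random time up to base_delay * 2^(attempt - 1)
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


calls = {"count": 0}


def flaky_lookup():
    """Simulated dependency that fails twice, then recovers."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("database unavailable")
    return "https://example.com/landing"


print(call_with_retries(flaky_lookup, base_delay=0.01))
print("attempts:", calls["count"])  # attempts: 3
```

The jitter spreads retries out so that many clients recovering at once do not hit the database in synchronized waves.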

Practice Questions

1. What’s the first step in any system design interview? Clarify requirements — both functional (what the system does) and non-functional (scale, latency, availability, consistency).

2. How do you estimate storage needs for a URL shortener? Calculate: URLs created per month × average storage per URL (500 bytes for key, URL, metadata) × retention period. Add analytics storage separately.

3. When would you choose eventual consistency over strong consistency? For read-heavy systems where freshness isn’t critical (analytics, timelines, recommendations). Never for financial transactions or systems where correctness depends on order.

4. What’s the difference between vertical and horizontal scaling? Vertical = bigger machine (limited, single point of failure). Horizontal = more machines (elastic, fault-tolerant, but more complex).

5. Challenge: Design a real-time leaderboard. Design a system that shows the top 100 scores in a multiplayer game with 10M daily active users. Requirements: update scores in real time (sub-second), handle 100K writes/second, and serve 1M reads/second. Discuss your data model, storage choice, caching strategy, and tradeoffs.

Mini Project: System Design Template

from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class SystemDesign:
    """Template for practicing system design interviews."""
    problem: str
    requirements: Dict = field(default_factory=dict)
    estimation: Dict = field(default_factory=dict)
    data_model: str = ""
    high_level_design: List[str] = field(default_factory=list)
    deep_dive_topics: List[str] = field(default_factory=list)
    tradeoffs: List[tuple] = field(default_factory=list)
    
    def present(self):
        print(f"\n{'='*50}")
        print(f"SYSTEM DESIGN: {self.problem}")
        print(f"{'='*50}")
        print(f"\n📋 Requirements:")
        for k, v in self.requirements.items():
            print(f"   {k}: {v}")
        print(f"\n📊 Estimation:")
        for k, v in self.estimation.items():
            print(f"   {k}: {v}")
        print(f"\n🗄️  Data Model:")
        print(f"   {self.data_model}")
        print(f"\n🏗️  High-Level Components:")
        for c in self.high_level_design:
            print(f"   - {c}")
        print(f"\n🔍 Deep Dive:")
        for t in self.deep_dive_topics:
            print(f"   - {t}")
        print(f"\n⚖️  Tradeoffs:")
        for decision, alt, reason in self.tradeoffs:
            print(f"   {decision} over {alt}: {reason}")

design = SystemDesign(problem="URL Shortener")
design.requirements = {
    "functional": ["Create short URLs", "Redirect", "Analytics"],
    "non-functional": {"reads": "10K/s", "writes": "40/s", "latency": "<200ms"}
}
design.estimation = {"storage": "600 GB/year", "bandwidth": "~13 TB/month"}
design.data_model = "PostgreSQL urls + click_events tables"
design.high_level_design = ["Load Balancer", "Web Servers", "Redis Cache", "PostgreSQL", "Kafka", "ClickHouse"]
design.deep_dive_topics = ["Key generation (Snowflake + Base62)", "Caching strategy", "Database sharding"]
design.tradeoffs = [
    ("PostgreSQL", "Cassandra", "Strong consistency needed"),
    ("Snowflake", "UUID", "Ordered, fits in 64-bit"),
    ("Monolith", "Microservices", "40 writes/sec, simpler ops"),
]
design.present()

FAQ

How long should I prepare for system design interviews?

Senior engineers typically need 4-8 weeks of targeted practice. Focus on understanding fundamentals (CDN, caching, sharding, CAP theorem) before practicing specific designs.

Do I need to know exact numbers for estimation?

No. Order-of-magnitude estimates are fine: distinguishing 10M from 100M from 1B matters more than distinguishing 47M from 53M. Interviewers want to see that you can reason about scale.

What if I don’t know a specific technology?

Mention alternatives you do know. Say “I’d use a key-value store like Redis or Memcached” — showing you understand the category is more important than knowing one specific tool.

Should I draw diagrams on a whiteboard or paper?

Yes. Visualizing components and data flow helps both you and the interviewer. Draw boxes for services, arrows for data flow, and databases with cylinders.

How do I handle a problem I’ve never seen before?

Use the same framework: start with requirements, then estimation, then the data model, and continue through the remaining steps. The framework applies to any system, and an imperfect design that follows it usually scores better than a stronger design presented with no visible process.
