Design YouTube/Netflix: Video Streaming Architecture

DodaTech Updated Jun 20, 2026 6 min read

Designing a video streaming platform like YouTube or Netflix means building a system that ingests raw video, transcodes it into dozens of formats, stores petabytes of data, and delivers it with sub-second startup to billions of devices worldwide.

What You’ll Learn

You’ll master video transcoding pipelines, adaptive bitrate streaming (HLS/DASH), CDN edge delivery, recommendation system architecture, watch history tracking, and video search at scale.

Why This Problem Matters

More than 500 hours of video are reportedly uploaded to YouTube every minute, and Netflix has been measured at roughly 15% of global downstream internet traffic. Video streaming is the most bandwidth-intensive application on the internet, and its architecture is a masterclass in data pipelines, encoding tradeoffs, and global content delivery. At DodaTech, streaming patterns inform video playback optimization in Doda Browser.

Video Upload Pipeline

    flowchart LR
    Upload[Creator Upload] --> LB[Load Balancer]
    LB --> Ingest[Ingestion Service]
    Ingest --> Validate[Validation: Format, Size, Virus Scan]
    Validate --> Queue[Job Queue - Kafka]
    Queue --> T1[Transcode 4K/H.265]
    Queue --> T2[Transcode 1080p/H.264]
    Queue --> T3[Transcode 720p/VP9]
    Queue --> T4[Generate Thumbnails]
    Queue --> T5[Generate Captions]
    T1 --> Storage[(Object Store - S3)]
    T2 --> Storage
    T3 --> Storage
    T4 --> Storage
    T5 --> Storage
    Storage --> CDN[CDN Edge]
    User[Viewer] --> CDN
    Storage --> Meta[(Metadata DB)]
    User --> Search[Search Service]
    Search --> Index[(Elasticsearch)]

Transcoding Pipeline

Raw uploads from creators are too large to stream as-is, and they often arrive in formats that viewers' devices cannot play. The transcoding pipeline converts them in stages:

Step	Input	Output	Worker
Demux and decode	Source container (e.g. MP4)	Decoded video frames + PCM audio	FFmpeg
Encode video	Raw frames	H.264 1080p, H.265 4K, VP9 720p	GPU/CPU farm
Encode audio	Raw PCM	AAC stereo, Dolby 5.1, Opus	FFmpeg
Package	Encoded streams	HLS/DASH segments + manifests	Packager
Thumbnail	Keyframes	Multiple sizes (320×180 to 1920×1080)	Image worker
Captions	Audio track	SRT/VTT files	ASR/ML worker

# Simplified transcoding job
import subprocess

def transcode_video(input_path: str, output_dir: str, resolutions: list) -> list:
    """Encode one H.264/AAC MP4 per requested resolution and return the job records."""
    jobs = []
    for resolution in resolutions:
        height = resolution["height"]
        bitrate = resolution["bitrate"]
        output = f"{output_dir}/{height}p.mp4"
        cmd = [
            "ffmpeg", "-i", input_path,
            "-vf", f"scale=-2:{height}",   # -2 preserves aspect ratio with an even width
            "-c:v", "libx264",
            "-b:v", bitrate,               # target video bitrate, e.g. "2500k"
            "-c:a", "aac",
            "-y", output                   # overwrite an existing output file
        ]
        subprocess.run(cmd, check=True)    # raises CalledProcessError if FFmpeg fails
        jobs.append({"resolution": f"{height}p", "output": output, "bitrate": bitrate})
    return jobs

# Simulated run: list the planned renditions without invoking FFmpeg
resolutions = [
    {"height": 360, "bitrate": "500k"},
    {"height": 720, "bitrate": "2500k"},
    {"height": 1080, "bitrate": "5000k"},
]
print("Transcoding jobs generated:")
for res in resolutions:
    print(f"  {res['height']}p at {res['bitrate']}")

Output:

Transcoding jobs generated:
  360p at 500k
  720p at 2500k
  1080p at 5000k
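
The Package step from the table can be sketched with FFmpeg's HLS muxer. This is a minimal sketch under stated assumptions, not the pipeline any particular platform uses: the helper name `build_hls_package_cmd`, the paths, the segment naming scheme, and the 6-second default are all illustrative. The function only builds the command; pass the result to `subprocess.run` to execute it.

```python
from pathlib import Path

def build_hls_package_cmd(encoded_path: str, output_dir: str,
                          segment_seconds: int = 6) -> list[str]:
    """Build an FFmpeg command that segments an already-encoded MP4 into HLS.

    Hypothetical helper: paths and the naming scheme are illustrative.
    """
    stem = Path(encoded_path).stem            # e.g. "720p"
    out = Path(output_dir)
    return [
        "ffmpeg", "-i", encoded_path,
        "-c", "copy",                          # repackage only, no re-encode
        "-f", "hls",
        "-hls_time", str(segment_seconds),     # target segment duration
        "-hls_playlist_type", "vod",
        "-hls_segment_filename", str(out / f"{stem}_segment_%04d.ts"),
        "-y", str(out / f"{stem}.m3u8"),
    ]

cmd = build_hls_package_cmd("out/720p.mp4", "out/hls")
print(" ".join(cmd))
```

Because `-c copy` does not re-encode, segments can only be cut at existing keyframes, so the actual segment length depends on the keyframe interval chosen at encode time.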

Adaptive Bitrate Streaming (HLS/DASH)

Adaptive streaming lets the player switch quality mid-playback based on network conditions. The server prepares multiple quality variants and a manifest file.

HLS (HTTP Live Streaming)

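Apple’s HLS splits video into short segments, packaged as MPEG-TS (.ts) or fragmented MP4. Apple's current authoring guidance recommends a 6-second target duration; older deployments commonly used 10 seconds. The master playlist (.m3u8) lists every variant, and each variant has its own media playlist: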

#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p.m3u8
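
To make the player's side concrete, here is a minimal sketch of reading that master playlist into a list of variants. It assumes the simple layout shown above (one `#EXT-X-STREAM-INF` line followed by its URI); the function name `parse_master_playlist` is hypothetical, and a production player would use a full HLS parser.

```python
import re

MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=500000,RESOLUTION=640x360
360p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p.m3u8
"""

def parse_master_playlist(text: str) -> list[dict]:
    """Return variants sorted by ascending bandwidth."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:") or i + 1 >= len(lines):
            continue
        attrs = line.split(":", 1)[1]
        # Anchor on start-or-comma so AVERAGE-BANDWIDTH is not matched
        bw = re.search(r"(?:^|,)BANDWIDTH=(\d+)", attrs)
        res = re.search(r"RESOLUTION=(\d+)x(\d+)", attrs)
        if bw is None:
            continue  # BANDWIDTH is a required attribute; skip malformed entries
        variants.append({
            "bandwidth": int(bw.group(1)),
            "width": int(res.group(1)) if res else None,
            "height": int(res.group(2)) if res else None,
            "uri": lines[i + 1],   # the URI is the line after the tag
        })
    return sorted(variants, key=lambda v: v["bandwidth"])

variants = parse_master_playlist(MASTER)
for v in variants:
    print(v["uri"], v["bandwidth"], v["height"])
```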

DASH (Dynamic Adaptive Streaming over HTTP)

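DASH describes its segments (typically fragmented MP4) in an XML manifest called the MPD (Media Presentation Description). It is codec-agnostic, and its period-based structure supports features such as ad insertion and multiple camera angles.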

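# Generate HLS playlists: one master playlist plus one media playlist per variant
import math

def generate_hls_playlists(variants: list, duration_seconds: int,
                           segment_seconds: int = 10) -> dict:
    """Return {filename: playlist text}.

    HLS keeps the master playlist and each variant's media playlist in
    separate files, so they are returned separately rather than concatenated.
    """
    # Round up so a trailing partial segment is not dropped
    segment_count = math.ceil(duration_seconds / segment_seconds)
    master = ["#EXTM3U"]
    playlists = {}
    for v in variants:
        master.append(f"#EXT-X-STREAM-INF:BANDWIDTH={v['bandwidth']},RESOLUTION={v['resolution']}")
        master.append(f"{v['name']}.m3u8")

        # Per-variant media playlist
        media = [
            "#EXTM3U",
            "#EXT-X-VERSION:3",
            f"#EXT-X-TARGETDURATION:{segment_seconds}",
            "#EXT-X-MEDIA-SEQUENCE:0",
            "#EXT-X-PLAYLIST-TYPE:VOD",
        ]
        for i in range(segment_count):
            # The final segment may be shorter than the target duration
            length = min(segment_seconds, duration_seconds - i * segment_seconds)
            media.append(f"#EXTINF:{length:.1f},")
            media.append(f"{v['name']}_segment_{i:04d}.ts")
        media.append("#EXT-X-ENDLIST")
        playlists[f"{v['name']}.m3u8"] = "\n".join(media)
    playlists["master.m3u8"] = "\n".join(master)
    return playlists

playlists = generate_hls_playlists([
    {"name": "360p", "bandwidth": 500000, "resolution": "640x360"},
    {"name": "720p", "bandwidth": 2500000, "resolution": "1280x720"},
], 60)
print(playlists["master.m3u8"])
print(playlists["360p.m3u8"][:300] + "...")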

CDN for Video Delivery

Video segments are highly cache-friendly: they are immutable once encoded, large, and (for popular videos) requested over and over. Common delivery strategies:

CDN Strategy	Benefit	Implementation
Edge caching	Serve segments from nearest PoP	Open Connect (Netflix), CloudFront
Origin shield	Reduce load on origin	Parent cache tier before origin
Pre-positioning	Push popular content to edges	ML predicts trending videos
P2P delivery	Peers share segments	WebRTC data channels

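Netflix’s Open Connect CDN places dedicated cache servers inside ISP networks, so the vast majority of traffic (figures of 95%+ are commonly cited) is served from a cache close to the viewer rather than from the origin.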
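
The interaction between edge caching and an origin shield can be sketched with a small LRU cache whose misses fall through to the next tier. This is a toy model under stated assumptions, not how Open Connect or CloudFront is implemented: the class name `LRUCache`, the capacities, and the segment key are all illustrative.

```python
from collections import OrderedDict
from typing import Callable

class LRUCache:
    """A cache tier that fetches from the next tier up on a miss."""

    def __init__(self, capacity: int, fetch_upstream: Callable[[str], bytes]):
        self.capacity = capacity
        self.fetch_upstream = fetch_upstream
        self.store: OrderedDict[str, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.store[key]
        self.misses += 1
        data = self.fetch_upstream(key)      # cache miss: ask the next tier
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)   # evict the least recently used
        return data

origin_requests = []

def origin(key: str) -> bytes:
    """Stand-in for the object store; records every request it receives."""
    origin_requests.append(key)
    return f"bytes-of-{key}".encode()

shield = LRUCache(capacity=100, fetch_upstream=origin)   # parent cache tier
edge_a = LRUCache(capacity=2, fetch_upstream=shield.get)
edge_b = LRUCache(capacity=2, fetch_upstream=shield.get)

segment = "vid-001/720p_segment_0000.ts"
edge_a.get(segment)   # misses edge A and the shield, reaches the origin
edge_a.get(segment)   # served from edge A
edge_b.get(segment)   # misses edge B, but the shield absorbs it
print(f"origin requests: {len(origin_requests)}")
```

Two edges requested the same segment, yet the origin saw a single request; that absorption is the point of the shield tier.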

Recommendation System

    flowchart TB
    User[User Watches Video] --> Event[Watch Event]
    Event --> Stream[Event Stream - Kafka]
    Stream --> Batch[Batch Processor - Spark]
    Stream --> Online[Online Predictor - ML]
    Batch --> CF[Collaborative Filtering]
    Batch --> CB[Content-Based Filtering]
    Batch --> Trending[Trending Detector]
    CF --> Embeddings[User Embeddings]
    CB --> Embeddings
    Online --> Rank[Ranking Model]
    Trending --> Rank
    Embeddings --> Rank
    Rank --> Results[Recommended Videos]

Three recommendation approaches work together:

Collaborative filtering: Users who watched X also watched Y
Content-based: Similar video categories, tags, descriptions
Trending: Global and regional popular videos
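
As a toy illustration of how two of these signals can be blended, the sketch below scores candidates by co-watch counts ("users who watched X also watched Y") and adds a weighted trending score. It is a simplification under stated assumptions: real systems use learned embeddings and a ranking model as in the diagram, content-based features are omitted, and the names `co_watch_counts`, `recommend`, and `trending_weight` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_watch_counts(histories: dict[str, list[str]]) -> dict[str, Counter]:
    """Count how often each pair of videos appears in the same user's history."""
    co = defaultdict(Counter)
    for videos in histories.values():
        for a, b in combinations(sorted(set(videos)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co

def recommend(user: str, histories: dict[str, list[str]],
              trending: dict[str, float], top_n: int = 3,
              trending_weight: float = 0.5) -> list[str]:
    seen = set(histories[user])
    co = co_watch_counts(histories)
    scores = Counter()
    # Collaborative signal: videos co-watched with anything the user has seen
    for video in seen:
        for other, count in co[video].items():
            if other not in seen:
                scores[other] += count
    # Trending signal: popularity scores, down-weighted
    for video, popularity in trending.items():
        if video not in seen:
            scores[video] += trending_weight * popularity
    return [video for video, _ in scores.most_common(top_n)]

histories = {
    "ann": ["cats", "dogs"],
    "bob": ["cats", "dogs", "birds"],
    "cy": ["cats", "birds"],
    "dee": ["dogs", "fish"],
}
trending = {"news": 1.0, "fish": 1.0}
print(recommend("ann", histories, trending))
```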

Search Architecture

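Video search uses Elasticsearch with explicit field mappings. Field weights (for example, title at 3 and description at 1.5) are applied at query time rather than in the mapping; the mapping-level boost parameter was deprecated in Elasticsearch 5.0 and removed in 8.0.

{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "description": { "type": "text" },
      "tags": { "type": "keyword" },
      "category": { "type": "keyword" },
      "view_count": { "type": "long" },
      "upload_date": { "type": "date" }
    }
  }
}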

Search relevance combines text matching with popularity signals (views, watch time, recency).
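
One way to combine text matching with popularity and recency is a `function_score` query. The sketch below only builds the request body as a Python dict, assuming the field names from the mapping above; the function name `build_video_search_query`, the 30-day decay scale, and the choice of the `log2p` modifier are illustrative assumptions rather than tuned values. Watch time is left out because the mapping has no field for it.

```python
import json

def build_video_search_query(text: str, size: int = 10) -> dict:
    """Build an Elasticsearch request body; weights here are illustrative."""
    return {
        "size": size,
        "query": {
            "function_score": {
                # Text relevance, with title weighted above description
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["title^3", "description^1.5"],
                    }
                },
                "functions": [
                    # Popularity: log2p keeps the factor above zero
                    # for videos with no views yet
                    {"field_value_factor": {
                        "field": "view_count",
                        "modifier": "log2p",
                        "missing": 0,
                    }},
                    # Recency: score decays as the upload date ages
                    {"gauss": {
                        "upload_date": {"origin": "now", "scale": "30d", "decay": 0.5}
                    }},
                ],
                "score_mode": "multiply",   # combine the two functions
                "boost_mode": "multiply",   # then scale the text score
            }
        },
    }

body = build_video_search_query("system design interview")
print(json.dumps(body, indent=2))
```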

Common Errors

Fixed segment duration: Using 10s segments for all content. Live streams need shorter segments (2-4s) for lower latency. VOD can use longer segments (6-10s) for better compression.
No multi-codec strategy: Encoding only H.264 misses the compression gains of H.265/AV1 (often cited as 40-50% smaller files at similar quality). Serve the best codec the client supports and keep H.264 as the fallback.
Hot video overload: When a video goes viral, origin servers get millions of simultaneous segment requests. Pre-position popular content on CDN edges and implement origin shielding.
Ignoring source aspect ratio: A 4:3 video forced into a 16:9 frame gets stretched or wrongly cropped. Always validate the aspect ratio and pad (pillarbox or letterbox) instead of stretching.
Storing master files with no backup: Raw studio masters are irreplaceable. Replicate across at least two geographic regions with versioning enabled.
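
The padding advice in the aspect-ratio item can be made concrete with a little arithmetic: scale the source to fit inside the target frame, then center it with bars. This is a minimal sketch; the helper name `fit_with_padding` is hypothetical, and the returned string uses FFmpeg's `scale` and `pad` filter syntax (`pad=width:height:x:y`).

```python
def fit_with_padding(src_w: int, src_h: int, dst_w: int, dst_h: int) -> dict:
    """Scale to fit inside the target frame without distortion, then pad."""
    scale = min(dst_w / src_w, dst_h / src_h)   # the tighter dimension wins
    # Round to even dimensions, which 4:2:0 encoders generally require
    out_w = round(src_w * scale) // 2 * 2
    out_h = round(src_h * scale) // 2 * 2
    pad_x = (dst_w - out_w) // 2                # bars left and right (pillarbox)
    pad_y = (dst_h - out_h) // 2                # bars top and bottom (letterbox)
    return {
        "scaled": (out_w, out_h),
        "offset": (pad_x, pad_y),
        "vf": f"scale={out_w}:{out_h},pad={dst_w}:{dst_h}:{pad_x}:{pad_y}",
    }

result = fit_with_padding(640, 480, 1920, 1080)   # 4:3 source, 16:9 target
print(result["vf"])
```

For the 4:3 source the picture is scaled to 1440x1080 and centered with 240-pixel bars on each side, so nothing is stretched or cropped.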

Practice Questions

1. How does adaptive bitrate streaming decide which quality to serve?

The player monitors the download speed of the last few segments. If segments download faster than real time, it requests a higher quality; if slower, it drops to a lower one. This is called rate-based (throughput-based) adaptation. Many players also take buffer occupancy into account.
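
Rate-based adaptation can be sketched in a few lines. This is a simplified model under stated assumptions: the function names and the 0.8 safety factor are illustrative, the variants reuse the bandwidths from the HLS example, and real players add smoothing and buffer-aware logic on top.

```python
def estimate_throughput_bps(samples: list[tuple[int, float]]) -> float:
    """samples: (bytes_downloaded, seconds_taken) for recent segments."""
    total_bits = sum(size for size, _ in samples) * 8
    total_seconds = sum(seconds for _, seconds in samples)
    return total_bits / total_seconds

def choose_variant(variants: list[dict], samples: list[tuple[int, float]],
                   safety: float = 0.8) -> dict:
    """Pick the highest-bandwidth variant that fits within a safety margin."""
    ordered = sorted(variants, key=lambda v: v["bandwidth"])
    budget = estimate_throughput_bps(samples) * safety
    affordable = [v for v in ordered if v["bandwidth"] <= budget]
    # Fall back to the lowest quality when nothing fits the budget
    return affordable[-1] if affordable else ordered[0]

variants = [
    {"name": "360p", "bandwidth": 500_000},
    {"name": "720p", "bandwidth": 2_500_000},
    {"name": "1080p", "bandwidth": 5_000_000},
]

fast = [(1_250_000, 2.0), (1_250_000, 3.0)]   # 4 Mbit/s measured
slow = [(625_000, 10.0)]                      # 0.5 Mbit/s measured
print(choose_variant(variants, fast)["name"])
print(choose_variant(variants, slow)["name"])
```

With 4 Mbit/s measured, the 0.8 safety factor leaves a 3.2 Mbit/s budget, so the player picks 720p rather than risking a stall at 1080p.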

2. What happens when a CDN edge doesn’t have a video segment?

It fetches from the origin (cache miss), then serves and caches it. The first viewer experiences higher latency. Pre-positioning popular content eliminates this cold-start penalty.

3. How would you design the upload pipeline for live streaming?

Live streaming skips the store-then-transcode batch step. Video is transcoded in real time within a fixed latency budget (e.g., 5 seconds), and segments are pushed directly to the CDN as they are produced. Use chunked transfer and the low-latency HLS/DASH extensions so players can start fetching a segment before it is complete.

4. Challenge: Design a video platform with user-uploaded 8K/120fps content.

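8K has 4x the pixels of 4K, so at the same frame rate it needs roughly 4x the raw data rate; compared with 4K at 60fps, 8K at 120fps is about 8x (compressed bitrate grows less than linearly). Most devices can’t play it. Store the original, but produce the 8K rendition only for clients that can decode and display it. Use the AV1 codec (often cited as about 30% better compression than H.265). Serve 4K as the default maximum resolution.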
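
The raw data-rate comparison is simple arithmetic on pixels per second. The sketch below assumes the usual UHD frame sizes (3840x2160 for 4K, 7680x4320 for 8K) and compares uncompressed pixel rates only; encoded bitrates do not scale this cleanly.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Uncompressed pixels per second."""
    return width * height * fps

rate_4k60 = pixel_rate(3840, 2160, 60)
rate_4k120 = pixel_rate(3840, 2160, 120)
rate_8k120 = pixel_rate(7680, 4320, 120)

print(f"8K/120 vs 4K/120: {rate_8k120 / rate_4k120:.0f}x")
print(f"8K/120 vs 4K/60:  {rate_8k120 / rate_4k60:.0f}x")
```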

Mini Project

Build a video transcoding orchestrator:

import random, time

class VideoTranscoder:
    def __init__(self):
        self.jobs = []
        self.workers = 4

    def submit_job(self, video_id: str, resolutions: list):
        job = {
            "video_id": video_id,
            "resolutions": resolutions,
            "status": "queued",
            "created_at": time.time(),
        }
        self.jobs.append(job)
        return job

    def process(self):
        for job in self.jobs:
            if job["status"] == "queued":
                job["status"] = "processing"
                for res in job["resolutions"]:
                    duration = random.uniform(5, 30)
                    time.sleep(0.1)  # Simulate work
                    print(f"Transcoded {job['video_id']} to {res['height']}p ({duration:.0f}s)")
                job["status"] = "completed"
                job["completed_at"] = time.time()
        return [j for j in self.jobs if j["status"] == "completed"]

tx = VideoTranscoder()
tx.submit_job("vid-001", [{"height": 360}, {"height": 720}, {"height": 1080}])
tx.submit_job("vid-002", [{"height": 720}, {"height": 2160}])
completed = tx.process()
print(f"Completed {len(completed)} video jobs")

