Design Instagram — Photo Sharing System Architecture

DodaTech Updated Jun 15, 2026 5 min read

Designing Instagram means building a system for photo upload, processing, storage, and delivery, plus social features such as feeds, stories, likes, and comments, at billion-user scale.

What You’ll Learn

You’ll master photo upload pipelines, CDN delivery, feed generation (fan-out push vs pull), caching strategies, and scalable like/comment systems. You’ll build a complete Instagram architecture.

Why This Problem Matters

Instagram has 2 billion monthly active users uploading 100 million photos daily. The architecture teaches media processing pipelines, read vs write optimization, and caching at scale. At DodaTech, CDN and caching patterns from Instagram inform Doda Browser’s media delivery system.

Feed Architecture

flowchart TB
  User[User Upload] -->|Photo| LB[Load Balancer]
  LB --> API[Photo Service]
  API --> Queue[Message Queue]
  Queue --> WP[Worker: Thumbnail Gen]
  Queue --> WP2[Worker: Filter Apply]
  Queue --> WP3[Worker: Metadata Extract]
  WP --> CDN[(CDN - CloudFront)]
  WP2 --> CDN
  WP3 --> MetaDB[(Metadata DB)]
  API --> FeedGen[Feed Generator]
  FeedGen --> FeedCache[(Feed Cache - Redis)]
  User2[User Scroll Feed] -->|GET /feed| FeedLB[Load Balancer]
  FeedLB --> FeedAPI[Feed Service]
  FeedAPI --> FeedCache

Photo Upload Pipeline

1. The user uploads a photo (multipart POST).
2. The server validates it (file type, size < 20MB).
3. The photo is stored temporarily and a processing job is enqueued.
4. A worker generates multiple resolutions: thumbnail (150×150), small (320×320), medium (640×640), full (1080×1080).
5. The processed photos are uploaded to object storage (S3) and served through the CDN (CloudFront).
6. Metadata is stored in the DB (URLs, dimensions, filters, timestamp).
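The validation and job-planning steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not Instagram's implementation: the accepted formats (JPEG and PNG), the `ResizeJob` type, and the `photos/{photo_id}/...` key layout are assumptions made for the example. Only the 20MB limit and the four rendition sizes come from the pipeline description.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # the 20MB limit from the validation step

# Leading "magic" bytes for the formats this sketch accepts (an assumption;
# the pipeline description only says "file type").
MAGIC_BYTES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
}

# Rendition names and square target sizes from the pipeline description.
RENDITIONS = {"thumbnail": 150, "small": 320, "medium": 640, "full": 1080}


@dataclass(frozen=True)
class ResizeJob:
    photo_id: str
    rendition: str
    size_px: int
    object_key: str  # hypothetical object-storage key layout


def detect_format(data: bytes) -> Optional[str]:
    """Return the image format implied by the leading bytes, or None."""
    for magic, fmt in MAGIC_BYTES.items():
        if data.startswith(magic):
            return fmt
    return None


def plan_resize_jobs(photo_id: str, data: bytes) -> List[ResizeJob]:
    """Validate an upload and return one resize job per rendition."""
    if len(data) >= MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    fmt = detect_format(data)
    if fmt is None:
        raise ValueError("unsupported file type")
    return [
        ResizeJob(photo_id, name, size, f"photos/{photo_id}/{name}.{fmt}")
        for name, size in RENDITIONS.items()
    ]
```

In a real service each `ResizeJob` would be published to the message queue and picked up by a worker, so the request thread only pays for validation.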

Feed Generation: Push vs Pull

Two strategies for generating the user’s feed:

Fan-out on Write (Push)

When User A posts a photo, push the post to all followers’ feed caches immediately.

Pro: Feed read is O(1), just a fetch from cache.
Con: Write cost scales with follower count. A celebrity with 50M followers triggers 50M cache writes.
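To make the trade-off concrete, here is a small in-memory sketch of fan-out on write. The `PushFeed` class, its method names, and the 500-entry cap are hypothetical; a production system would keep these lists in a cache such as Redis rather than in Python dictionaries.

```python
from collections import defaultdict, deque
from itertools import islice


class PushFeed:
    """Fan-out on write: each follower owns a precomputed list of post ids."""

    def __init__(self, feed_limit: int = 500):
        self.followers = defaultdict(set)  # author_id -> follower ids
        # user_id -> post ids, newest first; old entries fall off the end
        self.feeds = defaultdict(lambda: deque(maxlen=feed_limit))

    def follow(self, follower_id: int, author_id: int) -> None:
        self.followers[author_id].add(follower_id)

    def publish(self, author_id: int, post_id: str) -> int:
        """Copy the post id into every follower's feed; return the write count."""
        targets = self.followers[author_id]
        for follower_id in targets:
            self.feeds[follower_id].appendleft(post_id)
        return len(targets)

    def read(self, user_id: int, limit: int = 50) -> list:
        """Reading is one lookup plus a slice; no merging is needed."""
        return list(islice(self.feeds[user_id], limit))
```

The value returned by `publish` is the number of cache writes for that post, which is exactly the cost that grows with follower count.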

Fan-out on Read (Pull)

Store the post once. When a user requests their feed, pull posts from all followed users and merge.

Pro: Write is O(1). No follower explosion.
Con: Read is O(N) where N = number of followed users. Merging is expensive.
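The pull model can be sketched the same way. In this hypothetical `PullFeed` class each post is written once, and the read path merges the per-author timelines with `heapq.merge`. The sketch assumes each author's timestamps only increase, so every timeline stays sorted newest first.

```python
import heapq
from collections import defaultdict
from itertools import islice


class PullFeed:
    """Fan-out on read: posts are stored once and merged when a feed is requested."""

    def __init__(self):
        self.following = defaultdict(set)  # user_id -> author ids
        self.posts = defaultdict(list)     # author_id -> [(timestamp, post_id)], newest first

    def follow(self, follower_id: int, author_id: int) -> None:
        self.following[follower_id].add(author_id)

    def publish(self, author_id: int, timestamp: float, post_id: str) -> None:
        """One write per post, regardless of follower count."""
        self.posts[author_id].insert(0, (timestamp, post_id))

    def read(self, user_id: int, limit: int = 50) -> list:
        """Merge every followed author's timeline; cost grows with followees."""
        timelines = [self.posts[a] for a in self.following[user_id]]
        merged = heapq.merge(*timelines, key=lambda entry: entry[0], reverse=True)
        return [post_id for _, post_id in islice(merged, limit)]
```

Because `heapq.merge` is lazy, only enough entries to fill the requested page are pulled from each timeline, but every followed author still has to be consulted.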

Hybrid Approach (What Instagram Uses)

Celebrities (100K+ followers): Pull model. Their posts are indexed and merged on read.
Regular users: Push model. Posts are pre-populated to follower feeds.
Optimization for high-follower accounts that still use push: Fan out only to followers active in the last 30 days; inactive followers get those posts merged on read.
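A rough sketch of how the two models can be combined is shown below. The `HybridFeed` class and its method names are hypothetical, the active-follower optimization is left out for brevity, and the sketch assumes timestamps increase with every publish so each list stays sorted newest first. The default threshold is the 100K-follower figure from the list above.

```python
import heapq
from collections import defaultdict
from itertools import islice

CELEBRITY_THRESHOLD = 100_000  # follower count at which an author switches to pull


class HybridFeed:
    def __init__(self, celebrity_threshold: int = CELEBRITY_THRESHOLD):
        self.threshold = celebrity_threshold
        self.followers = defaultdict(set)         # author_id -> follower ids
        self.following = defaultdict(set)         # user_id -> author ids
        self.pushed = defaultdict(list)           # user_id -> [(ts, post_id)], newest first
        self.celebrity_posts = defaultdict(list)  # author_id -> [(ts, post_id)], newest first

    def follow(self, follower_id: int, author_id: int) -> None:
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_celebrity(self, author_id: int) -> bool:
        return len(self.followers[author_id]) >= self.threshold

    def publish(self, author_id: int, timestamp: float, post_id: str) -> str:
        entry = (timestamp, post_id)
        if self.is_celebrity(author_id):
            # Pull path: a single write, merged into feeds at read time.
            self.celebrity_posts[author_id].insert(0, entry)
            return "pull"
        # Push path: copy the entry into each follower's feed.
        for follower_id in self.followers[author_id]:
            self.pushed[follower_id].insert(0, entry)
        return "push"

    def read(self, user_id: int, limit: int = 50) -> list:
        """Merge the precomputed feed with followed celebrities' timelines."""
        celebrity_timelines = [
            self.celebrity_posts[a]
            for a in self.following[user_id]
            if a in self.celebrity_posts
        ]
        merged = heapq.merge(
            self.pushed[user_id], *celebrity_timelines,
            key=lambda entry: entry[0], reverse=True,
        )
        return [post_id for _, post_id in islice(merged, limit)]
```

The read path now merges only the handful of celebrity timelines a user follows instead of every followee, while regular authors keep the cheap cached read.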

Caching the Feed

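import json

# Assumed to be configured elsewhere in the application:
#   redis          - an async Redis client (for example redis.asyncio.Redis)
#   db             - an asyncpg connection or pool (the $1 placeholders suggest asyncpg)
#   serialize_post - converts a row into a JSON-serializable dict


async def get_feed(user_id: int, page: int, page_size: int = 50):
    """Return one page of the user's feed; `page` is zero-based."""
    # Try the feed cache first. The page size is part of the key so that
    # requests with different page sizes never share a cached page.
    cache_key = f"feed:{user_id}:{page}:{page_size}"
    cached = await redis.get(cache_key)
    if cached is not None:
        return json.loads(cached)

    # On a cache miss, build the page from followed users' recent posts
    followed = await db.fetch(
        "SELECT followee_id FROM follows WHERE follower_id = $1", user_id
    )
    posts = await db.fetch(
        """SELECT p.* FROM posts p
           WHERE p.user_id = ANY($1::bigint[])
           ORDER BY p.created_at DESC
           LIMIT $2 OFFSET $3""",
        # asyncpg records are indexed by column name, not by attribute
        [row["followee_id"] for row in followed],
        page_size,
        page * page_size,
    )

    # Cache the serialized page for 60 seconds
    feed_data = [serialize_post(p) for p in posts]
    await redis.setex(cache_key, 60, json.dumps(feed_data))
    return feed_data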

Stories

Stories differ from feed posts:

TTL: Expire after 24 hours (stored separately)
Format: Vertical 9:16 aspect ratio
View tracking: Track who viewed each story
Ranking: Show stories from closest connections first

CREATE TABLE stories (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL,
    media_urls TEXT[] NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at TIMESTAMP GENERATED ALWAYS AS (created_at + INTERVAL '24 hours') STORED
);

CREATE TABLE story_views (
    story_id BIGINT,
    viewer_id BIGINT,
    viewed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (story_id, viewer_id)
);
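The expiry and ranking rules can be illustrated with a short Python sketch. The `Story` dataclass mirrors the `stories` table, while the `closeness` mapping (author id to an affinity score) is a hypothetical input; how such scores are computed is outside the scope of this article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List

STORY_TTL = timedelta(hours=24)


@dataclass(frozen=True)
class Story:
    id: int
    user_id: int
    created_at: datetime

    @property
    def expires_at(self) -> datetime:
        # Same rule as the generated expires_at column
        return self.created_at + STORY_TTL


def story_tray(stories: List[Story], closeness: Dict[int, float],
               now: datetime) -> List[Story]:
    """Return unexpired stories, closest authors first, newest first per author."""
    active = [s for s in stories if s.expires_at > now]
    # Two stable sorts: the secondary key (recency) first, then the primary key.
    active.sort(key=lambda s: s.created_at, reverse=True)
    active.sort(key=lambda s: closeness.get(s.user_id, 0.0), reverse=True)
    return active
```

Filtering on `expires_at` at read time means an expired story disappears immediately, even if the cleanup job that deletes old rows runs later.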

Likes and Comments

Likes use a counter cache to avoid COUNT queries:

CREATE TABLE likes (
    post_id BIGINT,
    user_id BIGINT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (post_id, user_id)
);

-- Denormalized count on posts table
ALTER TABLE posts ADD COLUMN like_count INT DEFAULT 0;

-- Increment on like
UPDATE posts SET like_count = like_count + 1 WHERE id = $1;
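One detail the SQL above leaves open is that the counter should only move when a new row actually lands in `likes`; otherwise a repeated like inflates the count. The sketch below shows one way to do that, using Python's built-in `sqlite3` module purely so the example is self-contained. The `create_schema` and `like_post` helpers are hypothetical, and the `INSERT OR IGNORE` syntax is SQLite-specific.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            like_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE likes (
            post_id INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (post_id, user_id)
        );
    """)


def like_post(conn: sqlite3.Connection, post_id: int, user_id: int) -> bool:
    """Record a like and bump the counter once; return False for a repeat like."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO likes (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        if cur.rowcount == 0:
            return False  # the primary key already existed, so nothing changed
        conn.execute(
            "UPDATE posts SET like_count = like_count + 1 WHERE id = ?",
            (post_id,),
        )
        return True
```

Keeping the insert and the increment in one transaction means the counter and the `likes` table cannot drift apart if either statement fails.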

Comments are partitioned by post_id and ordered by creation time within each partition, so fetching a post's most recent comments reads a single partition in order.

Common Mistakes

1. Pre-Generating Feeds for All Followers

Celebrities with 50M followers cause 50M writes per post. Use hybrid push/pull — celebrities use pull, regular users use push.

2. Storing Photos as Database BLOBs

Never store images in the DB. Use object storage (S3) + CDN. DB only holds metadata and URLs.

3. No Image Processing Pipeline

User uploads need resizing, format conversion (WebP), and thumbnail generation. Use async workers, not the request thread.

4. Counting Likes with SELECT COUNT

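Scanning millions of likes per page view is slow. Maintain a denormalized counter. Handle race conditions with an atomic increment (like_count = like_count + 1) or optimistic locking, never a read-modify-write in application code.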

5. Not Using CDN for Media Delivery

Serving images directly from app servers wastes bandwidth and increases latency. Always use CDN with edge caching.

6. Feed Cache Without Pagination

Caching the entire feed (500 posts) per user is wasteful. Cache by page with a TTL. When a new post arrives, every page boundary shifts, so invalidate the cached pages it affects, starting from the first.
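A per-page cache with a TTL can be sketched in memory as follows. `PagedFeedCache` is a hypothetical class, and it takes the simplest invalidation route: when a user's feed changes, all of that user's cached pages are dropped. The injectable `clock` parameter exists only to make expiry easy to test.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class PagedFeedCache:
    """Caches feed pages individually, each with its own expiry time."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # (user_id, page) -> (expires_at, posts)
        self._pages: Dict[Tuple[int, int], Tuple[float, List[str]]] = {}

    def get(self, user_id: int, page: int) -> Optional[List[str]]:
        entry = self._pages.get((user_id, page))
        if entry is None:
            return None
        expires_at, posts = entry
        if self.clock() >= expires_at:
            del self._pages[(user_id, page)]  # lazily evict the expired page
            return None
        return posts

    def put(self, user_id: int, page: int, posts: List[str]) -> None:
        self._pages[(user_id, page)] = (self.clock() + self.ttl, list(posts))

    def invalidate_user(self, user_id: int) -> int:
        """Drop every cached page for one user; return how many were removed."""
        stale = [key for key in self._pages if key[0] == user_id]
        for key in stale:
            del self._pages[key]
        return len(stale)
```

With Redis the same idea maps to one key per page plus `SETEX`, as in the `get_feed` example earlier; only the pages a user actually scrolls to are ever built or stored.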

7. Ignoring Deleted/Private Content

When unfollowing, deleting, or making an account private, remove cached feed entries. Use a tombstone or version counter.
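The tombstone idea can be sketched as a read-time filter: cached feed entries are left in place, and anything deleted or no longer visible is screened out before the feed is returned. The `FeedVisibility` class and its method names are hypothetical, and feed entries are assumed to be `(post_id, author_id)` pairs.

```python
from collections import defaultdict
from typing import List, Tuple


class FeedVisibility:
    """Tracks tombstones so stale cache entries are filtered on read."""

    def __init__(self):
        self.deleted_posts = set()               # tombstones for deleted post ids
        self.hidden_authors = defaultdict(set)   # viewer_id -> authors no longer visible

    def mark_deleted(self, post_id: str) -> None:
        self.deleted_posts.add(post_id)

    def hide_author(self, viewer_id: int, author_id: int) -> None:
        """Call after an unfollow, or when an account the viewer cannot see goes private."""
        self.hidden_authors[viewer_id].add(author_id)

    def filter_feed(self, viewer_id: int,
                    entries: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
        hidden = self.hidden_authors.get(viewer_id, set())
        return [
            (post_id, author_id)
            for post_id, author_id in entries
            if post_id not in self.deleted_posts and author_id not in hidden
        ]
```

Filtering on read avoids rewriting millions of cached feeds when a popular post is deleted; the stale entries simply age out with the cache TTL.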

Practice Questions

1. What problem does fan-out on write solve?

Feed reads are instant (cache hit). New posts appear immediately without querying multiple followees.

2. When does fan-out on write become a problem?

For users with millions of followers. Each post triggers millions of cache writes, which may time out or overload the cache.

3. Why use CDN for photo delivery?

CDNs cache photos at edge servers close to users, reducing latency and offloading origin servers.

4. Why maintain a denormalized like_count?

To avoid COUNT(*) scans on the likes table every time a post is viewed.

5. Challenge: Implement feed ranking.

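Instead of a chronological feed, rank posts by engagement (likes and comments) combined with recency. One possible scoring function: score = log(likes + comments * 2) + gravity / age_hours. Guard against posts with no engagement (log of 0) and against age_hours = 0.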
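As a starting point for the challenge, the scoring function can be written as below. Two guards are assumptions added for the sketch rather than part of the formula as stated: `log1p` is used so a post with no likes or comments scores 0 instead of raising an error, and the age is clamped to at least one hour to avoid division by zero. The `rank_posts` helper and the dictionary keys are likewise hypothetical.

```python
import math
from typing import Dict, List


def engagement_score(likes: int, comments: int, age_hours: float,
                     gravity: float = 1.0) -> float:
    """score = log(1 + likes + 2 * comments) + gravity / max(age_hours, 1)."""
    engagement = math.log1p(likes + 2 * comments)  # comments count double
    freshness = gravity / max(age_hours, 1.0)      # newer posts get a larger boost
    return engagement + freshness


def rank_posts(posts: List[Dict], gravity: float = 1.0) -> List[Dict]:
    """Return posts sorted by descending score."""
    return sorted(
        posts,
        key=lambda p: engagement_score(p["likes"], p["comments"],
                                       p["age_hours"], gravity),
        reverse=True,
    )
```

Raising `gravity` makes the feed favor fresh posts more strongly; lowering it lets older, highly engaged posts stay near the top.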

Mini Project: Mini Instagram API

Build a REST API with:

POST /upload — accepts image, returns CDN URL
POST /feed — generates feed for a user (fan-out on write for < 1000 followers)
POST /like and POST /comment endpoints
GET /feed/{user_id} — returns cached feed with pagination
Background worker for image resizing (simulate with a sleep)

What’s Next

Congratulations on completing this Instagram design! Here’s where to go from here:

Practice daily — Design one subsystem per day
Build a project — Implement a feed generator
Explore related topics — CDN optimization, media processing
Join the community — Share your system designs and get feedback

Remember: every expert was once a beginner. Keep designing!
