Design Twitter/Instagram: Social Media Feed Architecture
Designing a social media feed like Twitter or Instagram means building a system that surfaces the most relevant content from thousands of followed accounts in a personalized, real-time, and scalable way, while ingesting hundreds of thousands of new posts per minute.
What You’ll Learn
You’ll master feed generation strategies (fan-out on write vs read), push/pull hybrid architectures, ML-based relevance ranking, caching at scale, real-time updates via WebSocket, CDN media storage, and notification delivery.
Why This Problem Matters
Twitter serves 500 million tweets per day. Instagram handles 100 million photo uploads daily with 2 billion monthly active users. Feed architecture is the most performance-critical component — a 500ms delay in feed loading reduces engagement by 10%. At DodaTech, feed patterns power the activity feeds in DodaZIP and notification streams in Doda Browser.
System Design Learning Path
    flowchart LR
        A[Pastebin] --> B[Video Streaming]
        B --> C[Ride Sharing]
        C --> D[Social Media Feed]
        D --> E{You Are Here}
        E --> F[E-Commerce Platform]
        style E fill:#f90,color:#fff
Feed Generation: Fan-Out on Write vs Read
This is the central design decision in any social media feed.
Fan-Out on Write (Push)
When a user posts, the system pushes the post to all followers’ feed caches at write time.
Pros: Feed reads are O(1): the system just fetches the pre-computed timeline, so read latency is minimal.
Cons: Write amplification. A celebrity with 100M followers generates 100M cache writes per post.
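To get a feel for the write amplification, a back-of-the-envelope sketch can help. The cache write rate used here (1 million writes per second) is an assumption picked for illustration, not a measured figure, and `fanout_seconds` is a hypothetical helper name.

```python
def fanout_seconds(followers: int, writes_per_second: int) -> float:
    """Seconds needed to push one post into every follower's feed cache."""
    return followers / writes_per_second

# Assumed cluster-wide cache write rate (hypothetical, for illustration only)
ASSUMED_WRITES_PER_SECOND = 1_000_000

for label, followers in [("regular user", 500), ("celebrity", 100_000_000)]:
    seconds = fanout_seconds(followers, ASSUMED_WRITES_PER_SECOND)
    print(f"{label}: {followers:,} cache writes, about {seconds:g} s to fan out")
```

Under that assumed rate, a single celebrity post keeps the whole cache tier busy for well over a minute, which is why push alone does not scale to very large accounts.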
Fan-Out on Read (Pull)
When a user opens the app, the system fetches recent posts from all followed accounts at read time.
Pros: No write amplification. New followers immediately see past content.
Cons: Read latency scales with the number of followed accounts. A user following 1,000 accounts can trigger up to 1,000 queries, one per followed account.
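One way to picture the pull model is as a k-way merge of per-author timelines at read time. The sketch below assumes each author's timeline is already stored newest first as `(timestamp, post_id)` pairs; the `pull_feed` function and the sample data are hypothetical.

```python
import heapq
from itertools import islice

def pull_feed(timelines: dict, following: list, limit: int = 10) -> list:
    """Merge per-author timelines (each sorted newest first) at read time."""
    streams = [timelines.get(author, []) for author in following]
    # heapq.merge lazily merges already-sorted inputs; reverse=True matches
    # the newest-first ordering of every input timeline.
    merged = heapq.merge(*streams, key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

# Hypothetical data: author -> [(timestamp, post_id), ...], newest first
timelines = {
    "bob":   [(105, "b3"), (101, "b2"), (90, "b1")],
    "carol": [(103, "c2"), (95, "c1")],
}
print(pull_feed(timelines, ["bob", "carol"], limit=4))  # ['b3', 'c2', 'b2', 'c1']
```

Because the merge is lazy, only the first `limit` items are consumed, but the system still has to open one timeline per followed account, which is where the read cost comes from.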
Hybrid Approach
Twitter uses a hybrid: push for regular users and pull for celebrities. With an illustrative cutoff of 10K followers, accounts below the cutoff are pushed and accounts at or above it are pulled.
    import time
    from collections import defaultdict

    class FeedGenerator:
        def __init__(self, celebrity_threshold: int = 10_000):
            self.followers = defaultdict(set)    # user -> set of followers
            self.following = defaultdict(set)    # user -> set of followed accounts
            self.posts = {}                      # post_id -> post, in creation order
            self.feed_cache = defaultdict(list)  # user -> pushed post IDs
            self.celebrity_threshold = celebrity_threshold

        def is_celebrity(self, user: str) -> bool:
            return len(self.followers[user]) >= self.celebrity_threshold

        def follow(self, follower: str, followee: str):
            self.followers[followee].add(follower)
            self.following[follower].add(followee)
            # Back-fill the follower's cache from the 20 most recent posts
            for post_id, post in list(self.posts.items())[-20:]:
                if post["author"] == followee and post_id not in self.feed_cache[follower]:
                    self.feed_cache[follower].append(post_id)

        def post(self, author: str, content: str) -> str:
            seq = len(self.posts) + 1
            post_id = f"post-{seq}"
            self.posts[post_id] = {"author": author, "content": content,
                                   "ts": time.time(), "seq": seq}
            follower_count = len(self.followers[author])
            if not self.is_celebrity(author):
                # Fan-out on write: push to all followers
                for follower in self.followers[author]:
                    self.feed_cache[follower].append(post_id)
                return f"Pushed to {follower_count} followers"
            # Celebrity: just store, pull on read
            return f"Stored (fan-out on read for {follower_count} followers)"

        def get_feed(self, user: str, limit: int = 10) -> list:
            # Hybrid: start from the pushed entries in the cache
            candidates = set(self.feed_cache[user])
            # Pull recent posts from followed celebrities at read time
            celebrities = {f for f in self.following[user] if self.is_celebrity(f)}
            for post_id, post in list(self.posts.items())[-20:]:
                if post["author"] in celebrities:
                    candidates.add(post_id)
            # Merge both sources, newest first
            return sorted(candidates, key=lambda pid: self.posts[pid]["seq"], reverse=True)[:limit]

    # A tiny threshold so the demo exercises both the push and the pull path
    fg = FeedGenerator(celebrity_threshold=2)
    fg.follow("alice", "bob")
    fg.follow("alice", "celebrity1")
    fg.follow("bob", "celebrity1")
    print(fg.post("bob", "Hello from Bob!"))
    print(fg.post("celebrity1", "Big announcement!"))
    feed = fg.get_feed("alice")
    print(f"Alice's feed: {[fg.posts[p]['content'][:20] for p in feed]}")

Output:

    Pushed to 1 followers
    Stored (fan-out on read for 2 followers)
    Alice's feed: ['Big announcement!', 'Hello from Bob!']

Caching Strategy
| Layer | Cache | Data | TTL |
|---|---|---|---|
| L1 | Client-side (device) | Pre-fetched next feed | 5 min |
| L2 | CDN (edge) | Rendered feed HTML | 1 min |
| L3 | Redis (feed cache) | Post IDs + scores | 15 min |
| L4 | Application | Post objects | 30 min |
    import time

    class MultiLevelCache:
        def __init__(self, ttl_l1: int = 300, ttl_l3: int = 900):
            self.l1 = {}  # Simulated L1: fastest, shortest-lived layer
            self.l3 = {}  # Simulated L3: Redis feed cache
            self.ttl_l1 = ttl_l1
            self.ttl_l3 = ttl_l3

        def get(self, key: str):
            now = time.time()
            cached = self.l1.get(key)
            if cached and cached["expires"] > now:
                return cached["value"]
            cached = self.l3.get(key)
            if cached and cached["expires"] > now:
                # Promote to L1 with L1's own TTL, never outliving the L3 entry
                self.l1[key] = {"value": cached["value"],
                                "expires": min(now + self.ttl_l1, cached["expires"])}
                return cached["value"]
            return None

        def set(self, key: str, value: list):
            now = time.time()
            self.l1[key] = {"value": value, "expires": now + self.ttl_l1}
            self.l3[key] = {"value": value, "expires": now + self.ttl_l3}

    cache = MultiLevelCache()
    cache.set("feed:alice", [1, 2, 3])
    print(f"Feed: {cache.get('feed:alice')}")

Ranking (ML-Based Relevance)
Feeds rank by relevance, not just chronology. Key ranking signals:
| Signal | Weight | Source |
|---|---|---|
| Recency | 0.3 | Timestamp |
| Engagement score | 0.25 | Likes + comments + shares |
| Affinity score | 0.20 | Past interaction with author |
| Media type | 0.10 | Video > image > text |
| Ad score | 0.15 | Bid price + relevance |
    import time

    # Weights taken from the signal table above
    WEIGHTS = {"recency": 0.30, "engagement": 0.25, "affinity": 0.20, "media": 0.10, "ad": 0.15}

    def rank_feed(posts: list, user_profile: dict, limit: int = 20) -> list:
        now = time.time()

        def score(post: dict) -> float:
            # Each signal is normalized to [0, 1] so the weights stay comparable
            age_hours = max(now - post["ts"], 0) / 3600
            recency = 1 / (1 + age_hours)
            engagement = min((post["likes"] * 0.5 + post["comments"] * 1.5) / 100, 1.0)
            affinity = user_profile.get(f"affinity:{post['author']}", 0.5)
            # The sample data only flags video, so image and text score the same here
            media = 1.0 if post.get("has_video") else 0.5
            ad = post.get("ad_score", 0.0)  # 0 for organic posts
            return (recency * WEIGHTS["recency"] + engagement * WEIGHTS["engagement"]
                    + affinity * WEIGHTS["affinity"] + media * WEIGHTS["media"]
                    + ad * WEIGHTS["ad"])

        return sorted(posts, key=score, reverse=True)[:limit]

    posts = [
        {"id": 1, "author": "friend1", "ts": time.time() - 60, "likes": 50, "comments": 5, "has_video": True},
        {"id": 2, "author": "friend2", "ts": time.time() - 3600, "likes": 200, "comments": 20, "has_video": False},
    ]
    user_profile = {"affinity:friend1": 0.9, "affinity:friend2": 0.2}
    ranked = rank_feed(posts, user_profile)
    print(f"Ranked: {[p['id'] for p in ranked]}")

Real-Time Updates via WebSocket
A WebSocket connection lets new posts appear in the feed without a refresh:
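    import asyncio, json

    class FeedWebSocket:
        def __init__(self):
            self.connections = {}  # user_id -> [ws_connections]

        async def broadcast(self, author: str, post: dict, followers: list):
            # Each ws is assumed to expose an async send(message) method
            message = json.dumps({"type": "new_post", "author": author, "post": post})
            # Snapshot the targets: connections can change while sends are awaited
            targets = [(f, ws) for f in followers for ws in self.connections.get(f, [])]
            results = await asyncio.gather(
                *(ws.send(message) for _, ws in targets), return_exceptions=True)
            # Drop connections whose send failed so one dead socket does not block the rest
            for (follower, ws), result in zip(targets, results):
                if isinstance(result, Exception):
                    await self.disconnect(follower, ws)

        async def connect(self, user_id: str, ws):
            self.connections.setdefault(user_id, []).append(ws)

        async def disconnect(self, user_id: str, ws):
            conns = self.connections.get(user_id, [])
            if ws in conns:
                conns.remove(ws)
            if not conns:
                # Remove empty entries so the map does not grow without bound
                self.connections.pop(user_id, None)

    print("WebSocket feed service ready")

Media Storage (CDN for Images/Video)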
Social media is media-heavy. Platforms like Instagram store each image in several versions, for example:
| Version | Size | Use |
|---|---|---|
| Thumbnail | 150×150 | Grid view |
| Small | 320×320 | Feed preview |
| Medium | 640×640 | Detail view |
| Original | Up to 4K | Full resolution |
Upload pipeline: Client → Load balancer → Upload service → Queue → Workers (resize, filter, compress) → CDN.
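The resize step in the worker stage can be sketched as a planning function that decides which crops and sizes to produce for one upload. This is a minimal sketch under two assumptions that the text does not state: the square versions are produced by a center crop, and images are never upscaled. The names `center_crop_box` and `plan_variants` are hypothetical.

```python
# Square edge lengths in pixels, taken from the version table above
VARIANTS = {"thumbnail": 150, "small": 320, "medium": 640}

def center_crop_box(width: int, height: int) -> tuple:
    """Largest centered square inside the image, as (left, top, right, bottom)."""
    edge = min(width, height)
    left = (width - edge) // 2
    top = (height - edge) // 2
    return (left, top, left + edge, top + edge)

def plan_variants(width: int, height: int) -> dict:
    """Describe the resize jobs a worker would run for one upload."""
    box = center_crop_box(width, height)
    edge = box[2] - box[0]
    jobs = {}
    for name, target in VARIANTS.items():
        # Assumption: never upscale, so cap the target at the cropped edge
        size = min(target, edge)
        jobs[name] = {"crop": box, "size": (size, size)}
    # The original is stored as uploaded
    jobs["original"] = {"crop": None, "size": (width, height)}
    return jobs

for name, job in plan_variants(4000, 3000).items():
    print(name, job)
```

A real worker would hand each job to an image library and write the results to object storage behind the CDN; the planning step is separated here only to keep the sketch dependency-free.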
Notification System
    flowchart TB
        Event[User Action: Like, Comment, Follow] --> Stream[Event Stream - Kafka]
        Stream --> Classify[Event Classifier]
        Classify -->|High priority| Instant[Instant Push]
        Classify -->|Low priority| Digest[Digest Queue]
        Instant --> Push[Push Notification Service]
        Instant --> InApp[In-App Notification]
        Digest --> Batch[Batch Sender]
        Batch --> Email[Email/SMS]
        Batch --> Push
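The classifier and digest branches of the diagram can be sketched in a few lines. Which event types count as high priority is a product decision; the mapping below (comments and follows are instant, likes are batched) is an assumption for illustration, and `NotificationRouter` is a hypothetical class, with plain lists standing in for the event stream and delivery services.

```python
from collections import defaultdict

# Assumed priority rule, purely illustrative
HIGH_PRIORITY = {"comment", "follow"}

class NotificationRouter:
    def __init__(self):
        self.instant = []                # events delivered right away (push + in-app)
        self.digest = defaultdict(list)  # recipient -> events batched for later

    def route(self, event: dict) -> str:
        """Classify one event and place it on the matching path."""
        if event["type"] in HIGH_PRIORITY:
            self.instant.append(event)
            return "instant"
        self.digest[event["recipient"]].append(event)
        return "digest"

    def flush_digest(self) -> dict:
        """Collapse each recipient's queued events into one summary message."""
        summaries = {user: f"You have {len(events)} new notifications"
                     for user, events in self.digest.items()}
        self.digest.clear()
        return summaries

router = NotificationRouter()
print(router.route({"type": "comment", "recipient": "alice", "actor": "bob"}))  # instant
print(router.route({"type": "like", "recipient": "alice", "actor": "carol"}))   # digest
print(router.route({"type": "like", "recipient": "alice", "actor": "dave"}))    # digest
print(router.flush_digest())  # {'alice': 'You have 2 new notifications'}
```

Batching low-priority events this way trades a little delay for far fewer notifications per user.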
Common Errors
Fan-out on write for everyone: Pushing every post to every follower works for 10K users but breaks at 100M. A celebrity with 100M followers triggers 100M cache writes per post. Always use the hybrid approach.
No feed cache warmup: After a deployment, all feed caches are cold — every user experiences 1-2 second load times. Pre-generate hot feeds for active users.
Chronological only: Strictly chronological feeds show the most noise (spam, low-quality posts). Rank by relevance, not time. Give users the option to switch.
No pagination cursors: Using page numbers (page=1, page=2) breaks when new content arrives — page 2 now has different content. Use cursor-based pagination with before and after post IDs.
Ignoring read-after-write consistency: A user posts, then immediately opens their feed — their post should appear. Use read-your-writes consistency for the author’s own feed.
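The cursor-based pagination recommended in the list above can be sketched as follows. The sketch assumes post IDs increase over time and that the feed is held newest first; `get_page` and the `next_before` field are hypothetical names, and only the `before` direction is shown.

```python
from typing import Optional

def get_page(post_ids: list, limit: int = 3, before: Optional[int] = None) -> dict:
    """Return up to `limit` post IDs older than the `before` cursor.

    `post_ids` is assumed to be sorted newest first, with IDs that grow over time.
    """
    if before is not None:
        candidates = [pid for pid in post_ids if pid < before]
    else:
        candidates = post_ids
    page = candidates[:limit]
    # Hand back a cursor only when older posts remain
    next_cursor = page[-1] if len(candidates) > limit else None
    return {"posts": page, "next_before": next_cursor}

feed = [109, 107, 106, 104, 101]
first = get_page(feed)
print(first)   # {'posts': [109, 107, 106], 'next_before': 106}

# New posts arrive between requests; the cursor still points at the same spot
feed = [112, 110] + feed
second = get_page(feed, before=first["next_before"])
print(second)  # {'posts': [104, 101], 'next_before': None}
```

With page numbers, the two new posts would have shifted everything down and page 2 would have repeated posts 107 and 106; the cursor avoids that because it is tied to a post ID rather than an offset.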
Mini Project
Build a feed simulator with hybrid fan-out:
    import time
    from collections import defaultdict

    class SocialFeedSimulator:
        def __init__(self, celebrity_threshold: int = 50):
            self.posts = {}                      # post_id -> post, in creation order
            self.feed_cache = defaultdict(list)  # user -> pushed post IDs
            self.followers = defaultdict(set)
            self.following = defaultdict(set)
            self.celebrity_threshold = celebrity_threshold

        def add_user(self, user_id: str):
            # Register the user with empty follower and following sets
            self.followers.setdefault(user_id, set())
            self.following.setdefault(user_id, set())

        def follow(self, follower: str, followee: str):
            self.followers[followee].add(follower)
            self.following[follower].add(followee)

        def create_post(self, author: str, content: str):
            seq = len(self.posts)
            post_id = f"p{seq}"
            self.posts[post_id] = {"author": author, "content": content,
                                   "ts": time.time(), "seq": seq}
            # Push only for accounts below the celebrity threshold
            if len(self.followers[author]) < self.celebrity_threshold:
                for f in self.followers[author]:
                    self.feed_cache[f].append(post_id)

        def get_feed(self, user: str, limit: int = 5) -> list:
            # Pushed entries plus posts pulled from followed celebrities
            candidates = set(self.feed_cache[user])
            for followee in self.following[user]:
                if len(self.followers[followee]) >= self.celebrity_threshold:
                    for pid, p in self.posts.items():
                        if p["author"] == followee:
                            candidates.add(pid)
            # Newest first
            return sorted(candidates, key=lambda pid: self.posts[pid]["seq"], reverse=True)[:limit]

    # A tiny threshold so "celebrity" is actually served through the pull path
    sim = SocialFeedSimulator(celebrity_threshold=2)
    for u in ["alice", "bob", "charlie", "celebrity"]:
        sim.add_user(u)
    sim.follow("alice", "bob")
    sim.follow("alice", "celebrity")
    sim.follow("bob", "celebrity")
    sim.create_post("bob", "Morning everyone!")
    sim.create_post("celebrity", "Big news incoming!")
    sim.create_post("charlie", "Hello world")
    feed = sim.get_feed("alice")
    print(f"Alice sees: {[sim.posts[p]['content'][:20] for p in feed]}")

Cross-References
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro