Design Twitter — Social Media System Architecture
Designing Twitter challenges you to build a system that handles tweets, timelines, hashtags, trending topics, and search — all while serving 500 million tweets per day and supporting real-time delivery.
What You’ll Learn
You’ll master tweet storage, timeline generation (fan-out on write vs read), hashtag indexing, trending topic calculation, and distributed ID generation with Snowflake.
Why This Problem Matters
Twitter’s architecture pioneered many patterns now standard in social media: pre-computed timelines, fan-out strategies, and unique ID generation. These patterns apply to any feed-based application. At DodaTech, tweet-analogous patterns power notification feeds in DodaZIP.
Tweet Architecture
```mermaid
flowchart TB
    User[User Posts Tweet] --> LB[Load Balancer]
    LB --> TweetSvc[Tweet Service]
    TweetSvc --> ID[Snowflake ID Gen]
    TweetSvc --> TS[(Tweet Store)]
    TweetSvc --> Queue[Fan-out Queue]
    Queue --> Fanout[Fan-out Workers]
    Fanout --> TL[(Timeline Cache - Redis)]
    Fanout --> HT[Hashtag Index]
    Fanout --> Trending[Trending Calculator]
    Fanout --> Search[Search Index - Elasticsearch]
```
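The write path in the diagram can be sketched with asyncio: the Tweet Service persists the tweet and enqueues its ID, and Fan-out Workers drain the queue in the background, so the poster never waits for fan-out. This is a toy under stated assumptions: plain dicts stand in for the Tweet Store and the Redis timeline cache, an `asyncio.Queue` stands in for the fan-out queue (a real deployment would use a durable broker), and `post_tweet`, `fanout_worker` and `demo` are hypothetical names.

```python
import asyncio


async def post_tweet(store: dict, queue: asyncio.Queue,
                     tweet_id: int, author: int, text: str) -> None:
    """Tweet Service: persist the tweet, then enqueue it for fan-out."""
    store[tweet_id] = {"author": author, "text": text}  # Tweet Store write
    await queue.put(tweet_id)  # fan-out happens asynchronously


async def fanout_worker(store: dict, queue: asyncio.Queue,
                        followers: dict, timelines: dict) -> None:
    """Fan-out Worker: copy each queued tweet ID into followers' timelines."""
    while True:
        tweet_id = await queue.get()
        author = store[tweet_id]["author"]
        for follower in followers.get(author, ()):
            timelines.setdefault(follower, []).insert(0, tweet_id)  # newest first
        queue.task_done()


async def demo(num_workers: int = 2) -> dict:
    store, timelines = {}, {}
    followers = {1: [10, 11], 2: [10]}  # author -> follower IDs (toy data)
    queue: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(fanout_worker(store, queue, followers, timelines))
               for _ in range(num_workers)]
    await post_tweet(store, queue, 100, 1, "hello #world")
    await post_tweet(store, queue, 101, 2, "second tweet")
    await queue.join()  # wait until every queued tweet has been fanned out
    for worker in workers:
        worker.cancel()
    return timelines
```

Calling `asyncio.run(demo())` leaves tweet 100 in the timelines of users 10 and 11, and tweet 101 only in user 10's.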
Tweet ID Generation (Snowflake)
Twitter’s Snowflake generates 64-bit unique IDs:
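| 1 bit (reserved) | 41 bits (timestamp) | 10 bits (worker ID) | 12 bits (sequence) |

```python
import time

class Snowflake:
    def __init__(self, worker_id: int, epoch: int = 1288834974657):
        if not 0 <= worker_id < 1024:  # worker ID must fit in 10 bits
            raise ValueError("worker_id must be in [0, 1023]")
        self.worker_id = worker_id
        self.epoch = epoch  # Twitter's custom epoch, in ms since the Unix epoch
        self.sequence = 0
        self.last_timestamp = -1

    def _now(self) -> int:
        return int(time.time() * 1000) - self.epoch

    def next_id(self) -> int:
        # Not thread-safe: guard with a lock if threads share one instance
        timestamp = self._now()
        if timestamp < self.last_timestamp:
            raise RuntimeError("clock moved backwards; refusing to generate IDs")
        if timestamp == self.last_timestamp:
            self.sequence = (self.sequence + 1) & 4095  # 12-bit sequence
            if self.sequence == 0:  # 4096 IDs used this ms: wait for the next ms
                while timestamp <= self.last_timestamp:
                    timestamp = self._now()
        else:
            self.sequence = 0
        self.last_timestamp = timestamp
        return (timestamp << 22) | (self.worker_id << 12) | self.sequence
```

Snowflake IDs are roughly time-sortable: IDs from one worker strictly increase, and IDs from different workers are ordered to the millisecond (subject to clock skew). No central coordinator is needed at generation time; the only coordination is assigning each worker a unique 10-bit ID.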
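Because the timestamp occupies the high bits, the creation time can be recovered from any ID with shifts and masks, which is also why sorting by ID sorts by time. A minimal sketch, assuming the same 41/10/12 bit layout and Twitter epoch as the `Snowflake` class; `decode_snowflake` is a hypothetical helper:

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # same custom epoch as the Snowflake class


def decode_snowflake(snowflake_id: int, epoch_ms: int = TWITTER_EPOCH_MS) -> dict:
    """Split a 64-bit ID into its timestamp, worker ID and sequence fields."""
    timestamp_ms = (snowflake_id >> 22) + epoch_ms  # top 41 bits: ms since epoch
    return {
        "created_at": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "worker_id": (snowflake_id >> 12) & 0x3FF,  # next 10 bits
        "sequence": snowflake_id & 0xFFF,  # low 12 bits
    }
```

For example, the ID `(1000 << 22) | (5 << 12) | 7` decodes to worker 5, sequence 7, one second after the epoch (November 2010).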
Timeline Generation
The two approaches mirror those in Instagram's feed design:
Fan-out on Write (Push). When User A tweets, push the tweet ID into every follower's cached timeline:
- Home timeline read is a single Redis lookup, O(1) in the number of accounts followed
- Write cost scales with the author's follower count
- Best for users with < 100K followers
Fan-out on Read (Pull). When a user opens their timeline, fetch recent tweets from every account they follow:
- Write is O(1): the tweet is stored once
- Read must merge tweets from all followed accounts
- Best for users who follow < 1000 accounts
Twitter's hybrid: tweets from celebrities (high-follower accounts) are pulled at read time, tweets from regular users are pushed at write time, and the timeline service merges both sources on read.
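The hybrid strategy can be sketched with in-memory dicts standing in for Redis and the tweet store. The 100K threshold comes from the text; the cached-timeline cap of 800 and the class `HybridTimeline` are hypothetical. Merging by sorting on tweet ID relies on Snowflake IDs being time-ordered.

```python
from collections import defaultdict


class HybridTimeline:
    """Toy hybrid fan-out: push for regular users, pull for celebrities."""

    def __init__(self, celebrity_threshold: int = 100_000, cache_len: int = 800):
        self.celebrity_threshold = celebrity_threshold
        self.cache_len = cache_len          # hypothetical cap per cached timeline
        self.followers = defaultdict(set)   # author -> follower IDs
        self.following = defaultdict(set)   # user -> followee IDs
        self.tweets_by = defaultdict(list)  # author -> own tweet IDs, oldest first
        self.cache = defaultdict(list)      # user -> pushed tweet IDs (Redis stand-in)

    def follow(self, follower: int, followee: int) -> None:
        self.followers[followee].add(follower)
        self.following[follower].add(followee)

    def is_celebrity(self, user_id: int) -> bool:
        return len(self.followers[user_id]) >= self.celebrity_threshold

    def post(self, author: int, tweet_id: int) -> None:
        self.tweets_by[author].append(tweet_id)  # stored once either way
        if self.is_celebrity(author):
            return  # pull: followers merge this at read time
        for follower in self.followers[author]:  # push: one cache write per follower
            timeline = self.cache[follower]
            timeline.append(tweet_id)
            if len(timeline) > self.cache_len:
                del timeline[0]  # drop the oldest entry

    def home_timeline(self, user_id: int, limit: int = 20) -> list:
        pushed = self.cache[user_id][-limit:]
        pulled = [
            tweet_id
            for followee in self.following[user_id]
            if self.is_celebrity(followee)
            for tweet_id in self.tweets_by[followee][-limit:]
        ]
        # Snowflake IDs are time-ordered, so descending ID means newest first
        return sorted(set(pushed) | set(pulled), reverse=True)[:limit]
```

A celebrity's post costs one write instead of one per follower; the price is an extra merge on every timeline read by someone who follows celebrities.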
Tweet Storage
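```sql
-- PostgreSQL dialect (matches the $1 placeholders and ON CONFLICT used below)
CREATE TABLE tweets (
    id            BIGINT PRIMARY KEY,              -- Snowflake ID
    user_id       BIGINT NOT NULL,
    content       VARCHAR(280) NOT NULL,
    created_at    TIMESTAMP NOT NULL,
    retweet_count INT NOT NULL DEFAULT 0,
    like_count    INT NOT NULL DEFAULT 0,
    is_deleted    BOOLEAN NOT NULL DEFAULT FALSE   -- soft delete
);
-- A user's tweets, newest first (serves fan-out on read)
CREATE INDEX idx_user_created ON tweets (user_id, created_at DESC);

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)         -- "whom does X follow?"
);
-- "Who follows X?" (needed by fan-out on write)
CREATE INDEX idx_follows_followee ON follows (followee_id);

CREATE TABLE hashtags (
    id  SERIAL PRIMARY KEY,
    tag VARCHAR(140) NOT NULL UNIQUE               -- stored lowercase
);

CREATE TABLE tweet_hashtags (
    tweet_id   BIGINT NOT NULL,
    hashtag_id INT NOT NULL,
    PRIMARY KEY (tweet_id, hashtag_id)
);
-- Tweets for a given hashtag
CREATE INDEX idx_tweet_hashtags_tag ON tweet_hashtags (hashtag_id, tweet_id);
```

Hashtag Search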
Parse hashtags from tweet content on write, record the tweet-to-hashtag links in SQL, and index the tweet in Elasticsearch for full-text and prefix search:
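```python
import re
from datetime import datetime

import asyncpg
from elasticsearch import AsyncElasticsearch

HASHTAG_RE = re.compile(r"#(\w+)")


async def process_tweet(db: asyncpg.Connection, es: AsyncElasticsearch, tweet_id: int,
                        user_id: int, content: str, created_at: datetime) -> None:
    # Lowercase before extracting so "#OpenCode" and "#opencode" share one row
    tags = sorted(set(HASHTAG_RE.findall(content.lower())))
    async with db.transaction():
        for tag in tags:
            # Upsert; the no-op UPDATE makes RETURNING yield the id of existing tags too
            hashtag_id = await db.fetchval(
                "INSERT INTO hashtags (tag) VALUES ($1) "
                "ON CONFLICT (tag) DO UPDATE SET tag = EXCLUDED.tag RETURNING id",
                tag,
            )
            await db.execute(
                "INSERT INTO tweet_hashtags (tweet_id, hashtag_id) VALUES ($1, $2) "
                "ON CONFLICT DO NOTHING",
                tweet_id, hashtag_id,
            )
    # Index for full-text search; a Python set is not JSON-serializable, so send a list
    await es.index(index="tweets", id=str(tweet_id), document={
        "content": content,
        "hashtags": tags,
        "user_id": user_id,
        "created_at": created_at.isoformat(),
    })
```

Trending Topics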
Calculate trending topics in near real-time:
- Count mentions of each hashtag by distinct users in the last N minutes
- Apply TF-IDF-like scoring: compare against a longer baseline so sudden spikes outrank perennially popular tags
- Deduplicate trends that differ only in casing
- Rank and return the top 10 per geographic region
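The counting and scoring steps can be sketched in plain Python. This is a simplified stand-in for "TF-IDF-like" scoring, not Twitter's algorithm: it assumes mentions arrive as `(hashtag, user_id)` pairs and that a per-tag baseline (average distinct users per past window) has already been computed; `distinct_users`, `trending_scores` and their parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def distinct_users(mentions: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Count distinct users per hashtag, folding case so WorldCup == worldcup."""
    users = defaultdict(set)
    for tag, user_id in mentions:
        users[tag.lower()].add(user_id)
    return {tag: len(ids) for tag, ids in users.items()}


def trending_scores(
    recent: Iterable[Tuple[str, int]],
    baseline_avg: Dict[str, float],
    min_users: int = 3,
    top_n: int = 10,
) -> List[str]:
    """Rank tags by how far current distinct-user counts exceed their baseline."""
    scored = []
    for tag, count in distinct_users(recent).items():
        if count < min_users:
            continue  # one account spamming a tag is not a trend
        expected = baseline_avg.get(tag, 0.0)  # avg distinct users per past window
        scored.append((count / (expected + 1.0), tag))  # +1 smooths brand-new tags
    scored.sort(reverse=True)
    return [tag for _, tag in scored[:top_n]]
```

Counting distinct users rather than raw mentions means 100 posts from one bot count once, and dividing by the baseline lets a new tag with 10 users outrank a routinely busy tag with 20.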
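```python
import time
from redis.asyncio import Redis

async def get_trending(r: Redis, region: str, window_minutes: int = 60) -> list:
    # Assumed write side: each mention runs ZINCRBY trending:{region}:{minute} 1 {tag}
    now_min = int(time.time()) // 60
    keys = [f"trending:{region}:{m}" for m in range(now_min - window_minutes + 1, now_min + 1)]
    dest = f"trending:{region}:last{window_minutes}m"
    await r.zunionstore(dest, keys)  # sum per-minute counts; missing keys count as empty
    await r.expire(dest, 60)  # scratch key, let it expire
    trends = await r.zrevrange(dest, 0, 9, withscores=True)  # top 10 by count
    return [{"hashtag": t, "score": s} for t, s in trends]
```

Common Mistakes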
1. Fan-Out on Write for Celebrities
Posting to 50M followers causes 50M cache writes. Use pull model for high-follower accounts.
2. Sequential Integer IDs
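Auto-increment IDs from a single database become a write bottleneck, cannot be generated independently on each shard, and reveal total tweet volume. Snowflake IDs are generated locally on every worker and stay time-sortable. They are not secret, though: anyone can decode a tweet's creation time from its ID.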
3. Not Caching Timelines
Without Redis timeline caches, every timeline request hits the DB, which cannot handle Twitter’s read volume.
4. Case-Sensitive Hashtags
“#OpenCode” and “#opencode” should map to the same topic. Normalize to lowercase on storage.
5. Real-Time Trending Without Deduplication
Bot networks can artificially inflate a hashtag. Require diversity of unique users, not just volume.
6. No Soft Delete for Tweets
Deleted tweets should be soft-deleted (marked as is_deleted = True). Hard deletion breaks replies and conversations.
7. Ignoring Retweet Recursion
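Retweeting a retweet creates a chain (C retweets B's retweet of A's tweet) that counting and rendering code must walk. A true cycle cannot form, because a retweet can only reference a tweet that already exists, but unbounded chains still cost extra lookups. Resolve retweet_of_id to the original tweet at write time so every chain has depth 1, or cap the depth (for example at 2).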
Practice Questions
1. Why does Twitter use Snowflake for ID generation?
Snowflake generates time-sortable, unique, 64-bit IDs without a central coordinator: each worker generates IDs locally, and the 10-bit worker field allows up to 1,024 workers.
2. How does fan-out on write scale for celebrities?
It doesn’t. For users with > 100K followers, switch to fan-out on read. Merge celebrity tweets into the timeline on access.
3. How are trending topics calculated?
Count hashtag mentions in a time window, apply frequency scoring, deduplicate, and rank.
4. Why index tweets in Elasticsearch?
Elasticsearch provides full-text search with prefix matching, fuzzy search, and relevance ranking. SQL LIKE '%term%' queries cannot use a B-tree index and must scan the table, which is impractical at Twitter's scale.
5. Challenge: Implement a retweet feature.
Add a retweet_of_id column to tweets. On retweet, create a new tweet referencing the original. Show retweet counts. When the original is deleted, hide all retweets.
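A minimal in-memory sketch of this answer, assuming the retweet_of_id and is_deleted columns described above; `Tweet`, `TweetStore` and their methods are hypothetical. It resolves retweets of retweets to the original, which also keeps every retweet chain at depth 1.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Tweet:
    id: int
    user_id: int
    content: str
    retweet_of_id: Optional[int] = None  # points at the original tweet
    retweet_count: int = 0
    is_deleted: bool = False


class TweetStore:
    """In-memory stand-in for the tweets table."""

    def __init__(self) -> None:
        self.tweets: Dict[int, Tweet] = {}

    def post(self, tweet_id: int, user_id: int, content: str) -> Tweet:
        tweet = Tweet(tweet_id, user_id, content)
        self.tweets[tweet_id] = tweet
        return tweet

    def retweet(self, tweet_id: int, user_id: int, target_id: int) -> Tweet:
        target = self.tweets[target_id]
        # Retweeting a retweet points at the original, so chains never exceed depth 1
        original_id = target.retweet_of_id if target.retweet_of_id is not None else target.id
        original = self.tweets[original_id]
        if original.is_deleted:
            raise ValueError("cannot retweet a deleted tweet")
        original.retweet_count += 1
        retweet = Tweet(tweet_id, user_id, content="", retweet_of_id=original_id)
        self.tweets[tweet_id] = retweet
        return retweet

    def delete(self, tweet_id: int) -> None:
        self.tweets[tweet_id].is_deleted = True  # soft delete keeps references valid

    def is_visible(self, tweet_id: int) -> bool:
        tweet = self.tweets[tweet_id]
        if tweet.is_deleted:
            return False
        if tweet.retweet_of_id is not None:  # hide retweets of deleted originals
            return not self.tweets[tweet.retweet_of_id].is_deleted
        return True
```

This sketch does not stop the same user from retweeting a tweet twice; a unique constraint on (user_id, retweet_of_id) would be one way to add that.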
Mini Project: Tweet API
Build a Twitter-like API:
- POST /tweet: create a tweet with a Snowflake ID
- GET /timeline/{user_id}: return the merged timeline
- POST /follow: follow another user
- GET /trending: return the top 10 hashtags (last hour)
- GET /search?q=hashtag: search tweets by hashtag
- Use Redis for timeline caching (fan-out on write for users with < 1000 followers)
What’s Next
Congratulations on completing this Twitter design! Here’s where to go from here:
- Practice daily — Design one Twitter subsystem per day
- Build a project — Implement a timeline generator with fan-out
- Explore related topics — Distributed IDs, search indexing, real-time analytics
- Join the community — Share your designs and get feedback
Remember: every expert was once a beginner. Keep designing!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro