Design Twitter — Social Media System Architecture

DodaTech Updated Jun 15, 2026 5 min read

Designing Twitter challenges you to build a system that handles tweets, timelines, hashtags, trending topics, and search — all while serving 500 million tweets per day and supporting real-time delivery.

What You’ll Learn

You’ll master tweet storage, timeline generation (fan-out on write vs read), hashtag indexing, trending topic calculation, and distributed ID generation with Snowflake.

Why This Problem Matters

Twitter’s architecture pioneered many patterns now standard in social media: pre-computed timelines, fan-out strategies, and unique ID generation. These patterns apply to any feed-based application. At DodaTech, the same patterns power notification feeds in DodaZIP.

Tweet Architecture

flowchart TB
  User[User Posts Tweet] --> LB[Load Balancer]
  LB --> TweetSvc[Tweet Service]
  TweetSvc --> ID[Snowflake ID Gen]
  TweetSvc --> TS[(Tweet Store)]
  TweetSvc --> Queue[Fan-out Queue]
  Queue --> Fanout[Fan-out Workers]
  Fanout --> TL[(Timeline Cache - Redis)]
  Fanout --> HT[Hashtag Index]
  Fanout --> Trending[Trending Calculator]
  Fanout --> Search[Search Index - Elasticsearch]

Tweet ID Generation (Snowflake)

Twitter’s Snowflake generates 64-bit unique IDs:

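| 1 bit (reserved) | 41 bits (timestamp) | 10 bits (worker ID) | 12 bits (sequence) |

import threading
import time

class Snowflake:
    def __init__(self, worker_id: int, epoch: int = 1288834974657):
        if not 0 <= worker_id < 1024:  # must fit in 10 bits
            raise ValueError("worker_id must be in [0, 1023]")
        self.worker_id = worker_id
        self.epoch = epoch  # custom epoch in milliseconds
        self.sequence = 0
        self.last_timestamp = -1
        self._lock = threading.Lock()  # next_id may be called from several threads

    def _now(self) -> int:
        return int(time.time() * 1000) - self.epoch

    def next_id(self) -> int:
        with self._lock:
            timestamp = self._now()
            if timestamp < self.last_timestamp:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if timestamp == self.last_timestamp:
                self.sequence = (self.sequence + 1) & 4095  # 12 bits
                if self.sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next
                    while timestamp <= self.last_timestamp:
                        timestamp = self._now()
            else:
                self.sequence = 0
            self.last_timestamp = timestamp
            return (timestamp << 22) | (self.worker_id << 12) | self.sequence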

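Snowflake IDs are roughly time-sortable (ordered by creation millisecond, with ties across workers broken by worker ID) and don’t require a central coordinator: each worker only needs a unique worker ID.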
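Because the timestamp sits in the high bits, an ID can be split back into its fields with shifts and masks. The sketch below assumes the 41/10/12-bit layout and the default epoch shown above; `decode_snowflake` and `TWITTER_EPOCH_MS` are illustrative names, not part of any library.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # same default epoch as the generator

def decode_snowflake(snowflake_id: int, epoch_ms: int = TWITTER_EPOCH_MS) -> dict:
    """Split a 64-bit Snowflake ID into timestamp, worker ID, and sequence."""
    sequence = snowflake_id & 0xFFF            # low 12 bits
    worker_id = (snowflake_id >> 12) & 0x3FF   # next 10 bits
    timestamp_ms = (snowflake_id >> 22) + epoch_ms
    return {
        "created_at": datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc),
        "timestamp_ms": timestamp_ms,
        "worker_id": worker_id,
        "sequence": sequence,
    }

# Build an ID by hand: 1000 ms after the epoch, worker 5, sequence 7.
example_id = (1000 << 22) | (5 << 12) | 7
print(decode_snowflake(example_id))
```

An ID built from a later millisecond is always numerically larger, which is why sorting by ID approximates sorting by creation time.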

Timeline Generation

Two approaches mirror Instagram’s feed problem:

Fan-out on Write (Push) — When User A tweets, push the tweet to all followers’ timelines:

  • Home timeline read is O(1) — just fetch from Redis
  • Write cost scales with follower count
  • Best for users with < 100K followers
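A minimal in-memory sketch of the push model, assuming plain dictionaries stand in for the follower store and the Redis timeline cache. The names `followers`, `timelines`, and `TIMELINE_LIMIT` are hypothetical, and the cap of 800 entries is an arbitrary illustration.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # hypothetical cap on cached entries per user

followers = defaultdict(set)  # author_id -> set of follower ids
# user_id -> tweet ids, newest first; the oldest entry drops off when full
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))

def follow(follower_id: int, followee_id: int) -> None:
    followers[followee_id].add(follower_id)

def fan_out_on_write(author_id: int, tweet_id: int) -> int:
    """Push the tweet ID to every follower's timeline; return the write count."""
    for follower_id in followers[author_id]:
        timelines[follower_id].appendleft(tweet_id)
    return len(followers[author_id])

def read_timeline(user_id: int, count: int = 20) -> list:
    """Reading is a single cache lookup, independent of how many accounts are followed."""
    return list(timelines[user_id])[:count]

follow(2, 1)
follow(3, 1)
fan_out_on_write(1, 100)
fan_out_on_write(1, 101)
print(read_timeline(2))  # → [101, 100]
```

The return value of `fan_out_on_write` makes the cost visible: one cache write per follower, which is what rules this model out for very large accounts.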

Fan-out on Read (Pull) — When User opens timeline, fetch from all followed users:

  • Write is O(1) — store once
  • Read must merge tweets from all followed users
  • Best for users who follow < 1000 accounts

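Twitter’s hybrid: tweets from regular users are pushed to their followers’ cached timelines at write time, while tweets from celebrities are pulled at read time. The timeline a user sees is the merge of both.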
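The read path of the hybrid can be sketched as a merge of sorted lists. This assumes tweet IDs are time-sortable Snowflake IDs, so a larger ID means a newer tweet, and reuses the 100K follower threshold from the list above. `is_celebrity` and `hybrid_timeline` are illustrative names.

```python
import heapq
from itertools import islice

CELEBRITY_THRESHOLD = 100_000  # above this follower count, tweets are not pushed

def is_celebrity(follower_count: int) -> bool:
    """Write-path decision: celebrities skip fan-out and are pulled on read."""
    return follower_count > CELEBRITY_THRESHOLD

def hybrid_timeline(cached_ids, celebrity_feeds, count=20):
    """Merge the pushed timeline with tweets pulled from followed celebrities.

    cached_ids: tweet IDs already fanned out to this user, newest first.
    celebrity_feeds: one newest-first list of tweet IDs per followed celebrity.
    """
    merged = heapq.merge(cached_ids, *celebrity_feeds, reverse=True)
    return list(islice(merged, count))

print(hybrid_timeline([108, 103, 101], [[109, 104], [107]], count=4))
# → [109, 108, 107, 104]
```

`heapq.merge` is lazy, so only the first `count` entries are materialized even when the celebrity feeds are long.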

Tweet Storage

CREATE TABLE tweets (
    id BIGINT PRIMARY KEY,          -- Snowflake ID
    user_id BIGINT NOT NULL,
    content VARCHAR(280) NOT NULL,
    created_at TIMESTAMP NOT NULL,
    retweet_count INT DEFAULT 0,
    like_count INT DEFAULT 0
);

-- PostgreSQL declares indexes separately (an inline INDEX clause is MySQL syntax)
CREATE INDEX idx_user_created ON tweets (user_id, created_at DESC);

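CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

-- Fan-out on write looks up "who follows this user", which the primary key does not cover
CREATE INDEX idx_follows_followee ON follows (followee_id);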

CREATE TABLE hashtags (
    id SERIAL PRIMARY KEY,
    tag VARCHAR(140) NOT NULL UNIQUE
);

CREATE TABLE tweet_hashtags (
    tweet_id BIGINT NOT NULL,
    hashtag_id INT NOT NULL,
    PRIMARY KEY (tweet_id, hashtag_id)
);

Hashtag Search

Parse hashtags from tweet content on write. Index in Elasticsearch for prefix search:

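import re

# `db` (an asyncpg-style connection) and `elasticsearch` (an async client)
# are assumed to be created elsewhere.
async def process_tweet(tweet_id: int, user_id: int, content: str, created_at: str):
    tags = set(re.findall(r'#(\w+)', content.lower()))
    for tag in tags:
        # Upsert so the statement returns the id whether or not the tag already exists
        hashtag_id = await db.fetchval(
            "INSERT INTO hashtags (tag) VALUES ($1) ON CONFLICT (tag) DO UPDATE SET tag=EXCLUDED.tag RETURNING id",
            tag
        )
        await db.execute(
            "INSERT INTO tweet_hashtags (tweet_id, hashtag_id) VALUES ($1, $2) ON CONFLICT DO NOTHING",
            tweet_id, hashtag_id
        )
    # Index for full-text search
    await elasticsearch.index(index="tweets", id=tweet_id, document={
        "content": content,
        "hashtags": sorted(tags),  # a set is not JSON-serializable
        "user_id": user_id,
        "created_at": created_at  # ISO 8601 string
    })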
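The parsing step can be tried in isolation, without a database or search cluster. The sketch below uses the same regular expression as the function above; `extract_hashtags` is an illustrative helper name.

```python
import re

HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(content: str) -> list:
    """Return the distinct hashtags in a tweet, lowercased and sorted."""
    return sorted(set(HASHTAG_RE.findall(content.lower())))

print(extract_hashtags("Shipping #OpenCode today! #opencode #Python3"))
# → ['opencode', 'python3']
```

One caveat of this pattern: it also matches URL fragments, so `https://example.com/page#section` yields `section`. A production parser would likely need to strip URLs first.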

Trending Topics

Calculate trending topics in near real-time:

  1. Count unique mentions of each hashtag in the last N minutes
  2. Apply TF-IDF-like scoring to boost unusual spikes
  3. Deduplicate (remove identical trends with different casings)
  4. Rank and return top 10 per geographic region

async def get_trending(region: str) -> list:
    # Assumes the Trending Calculator keeps "trending:{region}" as a sorted set
    # of hashtag -> score covering only the last hour (older mentions are aged
    # out by the writer), so the read is a plain top-10 lookup by score.
    trends = await redis.zrevrange(f"trending:{region}", 0, 9, withscores=True)
    return [{"hashtag": t, "score": s} for t, s in trends]
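The four steps above can be sketched in plain Python. This is a simplified stand-in, not Twitter's scoring: instead of a TF-IDF-like weight it divides the recent unique-user count by a per-hashtag baseline, and both the `baseline` input and the add-one smoothing are assumptions made for illustration.

```python
from collections import defaultdict

def trending(mentions, baseline, now, window_s=3600, top_n=10):
    """Rank hashtags by how far recent unique-user counts exceed a baseline.

    mentions: iterable of (timestamp, user_id, hashtag) tuples.
    baseline: hashtag -> typical unique users per window (hypothetical input).
    """
    users_by_tag = defaultdict(set)
    for ts, user_id, tag in mentions:
        if now - window_s <= ts <= now:
            # Lowercasing merges casings; the set counts each user once.
            users_by_tag[tag.lower()].add(user_id)

    scored = []
    for tag, users in users_by_tag.items():
        # Add-one smoothing so tags without a baseline do not divide by zero.
        scored.append((len(users) / (baseline.get(tag, 0) + 1), tag))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [{"hashtag": tag, "score": score} for score, tag in scored[:top_n]]

mentions = [
    (950, 1, "OpenCode"), (960, 2, "opencode"), (970, 2, "opencode"),
    (980, 3, "python"), (985, 4, "python"), (990, 5, "python"),
    (10, 9, "old"),
]
print(trending(mentions, baseline={"python": 5}, now=1000, window_s=100))
# → [{'hashtag': 'opencode', 'score': 2.0}, {'hashtag': 'python', 'score': 0.5}]
```

Although `python` has more unique users in the window, `opencode` ranks higher because it has no baseline activity, which is the "unusual spike" the scoring step is meant to reward.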

Common Mistakes

1. Fan-Out on Write for Celebrities

Posting to 50M followers causes 50M cache writes. Use pull model for high-follower accounts.

2. Sequential Integer IDs

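Sequential IDs need a central counter, which becomes a bottleneck and a single point of failure, and they expose exact tweet volume. Snowflake IDs are generated independently on each worker and stay time-sortable. They are not random, though: the creation timestamp can be read from the ID.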

3. Not Caching Timelines

Without Redis timeline caches, every timeline request hits the DB, which cannot handle Twitter’s read volume.

4. Case-Sensitive Hashtags

“#OpenCode” and “#opencode” should map to the same topic. Normalize to lowercase on storage.

5. Real-Time Trending Without Deduplication

Bot networks can artificially inflate a hashtag. Require diversity of unique users, not just volume.

6. No Soft Delete for Tweets

Deleted tweets should be soft-deleted (marked as is_deleted = True). Hard deletion breaks replies and conversations.
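A small runnable sketch of soft deletion, using an in-memory SQLite database so it runs without a server. The `reply_to_id` column and the `[deleted]` placeholder are assumptions for illustration; the schema earlier in this article does not define them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tweets (
        id INTEGER PRIMARY KEY,
        content TEXT NOT NULL,
        reply_to_id INTEGER,                   -- hypothetical threading column
        is_deleted INTEGER NOT NULL DEFAULT 0
    )
""")
conn.executemany(
    "INSERT INTO tweets (id, content, reply_to_id) VALUES (?, ?, ?)",
    [(1, "original", None), (2, "a reply", 1)],
)

def soft_delete(tweet_id: int) -> None:
    """Mark the row as deleted instead of removing it."""
    conn.execute("UPDATE tweets SET is_deleted = 1 WHERE id = ?", (tweet_id,))

def load_thread(root_id: int) -> list:
    """Return the root tweet and its replies, masking deleted content."""
    rows = conn.execute(
        "SELECT id, content, is_deleted FROM tweets "
        "WHERE id = ? OR reply_to_id = ? ORDER BY id",
        (root_id, root_id),
    ).fetchall()
    return [(i, "[deleted]" if deleted else content) for i, content, deleted in rows]

soft_delete(1)
print(load_thread(1))
# → [(1, '[deleted]'), (2, 'a reply')]
```

The reply still has a parent row to attach to, so the conversation renders with a placeholder rather than breaking.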

7. Ignoring Retweet Recursion

Track retweet depth so reference chains cannot grow without bound. If A retweets B’s retweet and B then retweets A’s, rendering has to follow an ever-longer chain. Allow a max depth of 2.
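One alternative to a depth counter, shown here as a sketch rather than the approach described above, is to flatten on write: a retweet of a retweet is stored as a retweet of the original, so no chain is ever longer than one hop. The `retweet_of` dictionary is a hypothetical stand-in for a `retweet_of_id` column.

```python
# tweet_id -> ID of the original it retweets, or None for an original tweet
retweet_of = {1: None}

def retweet(new_id: int, target_id: int) -> int:
    """Record new_id as a retweet and return the original tweet it points at."""
    parent = retweet_of[target_id]
    # If the target is itself a retweet, point at its original instead.
    original = target_id if parent is None else parent
    retweet_of[new_id] = original
    return original

print(retweet(2, 1))  # → 1 (retweet of the original)
print(retweet(3, 2))  # → 1 (retweet of a retweet resolves to the original)
```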

Practice Questions

1. Why does Twitter use Snowflake for ID generation?

Snowflake generates time-sortable, unique, 64-bit IDs without a central coordinator, scaling across thousands of servers.

2. How does fan-out on write scale for celebrities?

It doesn’t. For users with > 100K followers, switch to fan-out on read. Merge celebrity tweets into the timeline on access.

3. How are trending topics calculated?

Count hashtag mentions in a time window, apply frequency scoring, deduplicate, and rank.

4. Why index tweets in Elasticsearch?

Elasticsearch provides full-text search with prefix matching, fuzzy search, and relevance ranking. SQL LIKE queries with a leading wildcard cannot use a standard B-tree index, so they become impractical at this scale.

5. Challenge: Implement a retweet feature.

Add a retweet_of_id column to tweets. On retweet, create a new tweet referencing the original. Show retweet counts. When the original is deleted, hide all retweets.

Mini Project: Tweet API

Build a Twitter-like API:

  1. POST /tweet — create tweet with Snowflake ID
  2. GET /timeline/{user_id} — returns merged timeline
  3. POST /follow — follow another user
  4. GET /trending — return top 10 hashtags (last hour)
  5. GET /search?q=hashtag — search tweets by hashtag
  6. Use Redis for timeline caching (fan-out on write for < 1000 followers)

What’s Next

Congratulations on completing this Twitter design! Here’s where to go from here:

  • Practice daily — Design one Twitter subsystem per day
  • Build a project — Implement a timeline generator with fan-out
  • Explore related topics — Distributed IDs, search indexing, real-time analytics
  • Join the community — Share your designs and get feedback

Remember: every expert was once a beginner. Keep designing!
