Design Pastebin: Architecture for Text Sharing

DodaTech Updated Jun 20, 2026 6 min read

Designing Pastebin means building a service where users paste text, get a shareable URL, and optionally set expiration — while handling millions of pastes and reads per day with low latency.

What You’ll Learn

You’ll master write-heavy vs read-heavy tradeoffs, blob storage architecture, URL generation strategies, expiration policies, and rate limiting. You’ll build a complete Pastebin architecture.

Why This Problem Matters

Pastebin and similar services (GitHub Gist, hastebin) handle billions of pastes. The core challenge is balancing durability (don’t lose pastes) with cost (storage is expensive) and performance (pastes must load fast). At DodaTech, similar blob storage patterns handle encrypted data in DodaZIP and signature caches in Durga Antivirus Pro.

System Design Learning Path

flowchart LR
  A[System Design Overview] --> B[URL Shortener]
  B --> C[Pastebin]
  C --> D{You Are Here}
  D --> E[Video Streaming]
  D --> F[Ride Sharing]
  style D fill:#f90,color:#fff

Requirements

Functional:

Create a paste with text content, optional expiration, optional password
Read a paste by its unique URL
List recent pastes for a user (optional auth)
Report/flag inappropriate content (optional)

Non-functional:

High durability (no lost pastes)
Fast reads (< 100ms for cached pastes)
Support pastes up to 10MB
Scale: 1M pastes/day, 10M reads/day
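
As a rough sanity check, the scale targets can be converted into per-second rates. The sketch below is back-of-the-envelope arithmetic only. The 10 KB average paste size is an assumption made for illustration, since the requirements state only the 10MB maximum.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

pastes_per_day = 1_000_000      # from the requirements
reads_per_day = 10_000_000      # from the requirements
avg_paste_bytes = 10 * 1024     # ASSUMPTION: 10 KB average, not stated in the requirements

# Average rates; real traffic peaks will be higher than these averages
writes_per_sec = pastes_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY
storage_per_day_gb = pastes_per_day * avg_paste_bytes / 1024**3

print(f"Average writes/sec: {writes_per_sec:.1f}")
print(f"Average reads/sec:  {reads_per_sec:.1f}")
print(f"New content/day:    {storage_per_day_gb:.1f} GB (under the 10 KB assumption)")
```

The averages come out to roughly 12 writes and 116 reads per second, so the hard parts of this design are durability and storage growth rather than raw request throughput.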

System Architecture

flowchart TB
  User[User] -->|Create/Read| LB[Load Balancer]
  LB --> Web[Web Servers]
  Web --> Cache[(Redis Cache)]
  Web --> MetaDB[(Metadata DB - SQL)]
  Web --> Blob[(Blob Store - S3)]
  Web --> Analytics[(Analytics Pipeline)]
  Web --> RateLimiter[Rate Limiter]
  Cache --> MetaDB
  RateLimiter --> Redis2[(Redis - Counter)]
  Web -->|Generate URL| URLGen[URL Generator]

Read-Heavy vs Write-Heavy

Aspect	Read-Heavy (Posts)	Write-Heavy (Analytics)
Ratio	10:1 reads to writes	1:10 reads to writes
Cache Strategy	Aggressive caching (CDN + Redis)	Minimal caching, batch writes
DB Pattern	Read replicas, denormalized	Append-only, columnar store
Optimization	Fast reads, tolerate slow writes	Fast ingestion, tolerate slow reads

Pastebin is read-heavy — pastes are shared and viewed many times, but written once.
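
Because reads dominate, the cache sits in front of the metadata DB and blob store in the architecture diagram. A minimal cache-aside sketch of that read path follows. Plain dictionaries stand in for Redis, the SQL metadata table, and the blob store, and the class and key names are hypothetical.

```python
from typing import Optional

class PasteReader:
    """Cache-aside read path: try the cache first, then metadata + blob store."""

    def __init__(self, cache: dict, metadata: dict, blobs: dict):
        # Plain dicts stand in for Redis, the SQL metadata table, and the blob store
        self.cache = cache
        self.metadata = metadata
        self.blobs = blobs
        self.origin_reads = 0  # counts reads that had to go past the cache

    def read(self, short_url: str) -> Optional[str]:
        cached = self.cache.get(short_url)
        if cached is not None:
            return cached
        meta = self.metadata.get(short_url)
        if meta is None:
            return None  # unknown paste
        self.origin_reads += 1
        content = self.blobs[meta["content_url"]]
        self.cache[short_url] = content  # populate so later reads are cache hits
        return content

reader = PasteReader(
    cache={},
    metadata={"abc12345": {"content_url": "pastes/abc12345"}},
    blobs={"pastes/abc12345": "print('hello')"},
)
for _ in range(3):
    reader.read("abc12345")
print(f"Origin reads for 3 requests: {reader.origin_reads}")
```

Only the first request reaches the blob store. A real deployment would also set a cache TTL no longer than the paste's remaining lifetime.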

URL Generation

import hashlib, time

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def generate_url(content: str, salt: str = "") -> str:
    # Hash content + timestamp + salt so identical content still gets a distinct key
    unique = f"{content}{time.time_ns()}{salt}"
    hash_bytes = hashlib.sha256(unique.encode()).digest()
    # Treat the first 8 bytes as an integer and Base62-encode it into 8 characters
    num = int.from_bytes(hash_bytes[:8], "big")
    chars = []
    for _ in range(8):
        num, rem = divmod(num, 62)
        chars.append(BASE62[rem])
    return "".join(chars)

def validate_url(short: str) -> bool:
    # Accept only 8-character keys drawn from the Base62 alphabet
    return len(short) == 8 and all(c in BASE62 for c in short)

# Examples
for content in ["Hello World", "Code snippet", "Long document"]:
    url = generate_url(content)
    print(f"Content: '{content}' → URL: {url}")

Sample output (the keys differ on every run because the timestamp is part of the hash):

Content: 'Hello World' → URL: wT7XpB2k
Content: 'Code snippet' → URL: mR9sK4nL
Content: 'Long document' → URL: fG5hJ8qW

Collision Handling

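An 8-character Base62 key has 62^8 ≈ 218 trillion possible values, so any single new key is very unlikely to collide. Across the whole keyspace the birthday bound applies, though: the chance of at least one collision reaches about 50% after roughly 17 million pastes, which is under three weeks at 1M pastes/day. Collisions therefore have to be handled explicitly:

Check if the generated URL already exists in the DB (or rely on the UNIQUE constraint on short_url)
If a collision is detected, append a counter and re-hash
Alternatively, derive the key from a distributed unique ID (Snowflake-style) instead of a content hash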
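
The check-and-re-hash steps above can be sketched as follows. This is an illustrative sketch, not a production implementation: a Python set stands in for the DB uniqueness check, and the helper names base62_key and allocate_key are hypothetical.

```python
import hashlib

BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
KEY_LENGTH = 8

def base62_key(seed: str) -> str:
    """Derive an 8-character Base62 key from the SHA-256 hash of the seed."""
    num = int.from_bytes(hashlib.sha256(seed.encode()).digest()[:8], "big")
    chars = []
    for _ in range(KEY_LENGTH):
        num, rem = divmod(num, 62)
        chars.append(BASE62[rem])
    return "".join(chars)

def allocate_key(seed: str, existing: set, max_attempts: int = 5) -> str:
    """Try seed:0, seed:1, ... until a key is found that is not already taken."""
    for attempt in range(max_attempts):
        key = base62_key(f"{seed}:{attempt}")
        # With a real DB: INSERT the row and retry on a unique-constraint violation
        if key not in existing:
            existing.add(key)
            return key
    raise RuntimeError("could not allocate a unique key")

# Force a collision on the first attempt to show the retry path
taken = {base62_key("demo-seed:0")}
key = allocate_key("demo-seed", taken)
print(f"Allocated after retry: {key}")
```

Relying on the database's unique constraint rather than a separate existence check avoids a race between two writers that generate the same key at the same time.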

Storage Architecture

Data Type	Storage	Reason
Paste content	Blob store (S3, GCS)	Cheaper than DB for large text
Metadata	SQL (PostgreSQL)	Relational queries, indexes
Cache	Redis	Fast reads for hot pastes
Analytics	ClickHouse or TimescaleDB	High write throughput for views

Metadata Schema

CREATE TABLE pastes (
    id BIGSERIAL PRIMARY KEY,
    short_url VARCHAR(8) UNIQUE NOT NULL,
    user_id BIGINT REFERENCES users(id),
    title VARCHAR(255),
    content_url TEXT NOT NULL,  -- S3 key
    content_length INT NOT NULL,
    content_type VARCHAR(50) DEFAULT 'text/plain',
    password_hash VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    expires_at TIMESTAMP,
    view_count INT DEFAULT 0,
    is_flagged BOOLEAN DEFAULT FALSE
);

-- short_url is already indexed through its UNIQUE constraint, so no separate index is needed
CREATE INDEX idx_user_id ON pastes(user_id);
CREATE INDEX idx_expires_at ON pastes(expires_at);
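
The password_hash column should hold a salted, deliberately slow hash rather than the password itself. A minimal sketch using the standard library's PBKDF2 is below. The storage format string and the iteration count are assumptions for illustration, and a dedicated password-hashing library may be preferable in production.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # ASSUMPTION: tune to your own latency budget

def hash_password(password: str) -> str:
    """Return a self-describing string: scheme$iterations$salt$digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _scheme, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

stored = hash_password("s3cret")
print(f"Stored value length: {len(stored)} characters")
print(f"Correct password accepted: {verify_password('s3cret', stored)}")
print(f"Wrong password accepted:   {verify_password('wrong', stored)}")
```

The resulting string fits comfortably in the VARCHAR(255) column defined in the schema.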

Expiration Policies

Policy	Implementation	Storage Saving
TTL-based	Set `expires_at` column, periodic cleanup	High (cold data removed)
LRU eviction	Evict least recently viewed pastes	Medium
Size-based	Pastes > 1MB expire in 24h	Low
Never expire	Pro accounts, permanent pastes	None

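class ExpirationManager:
    def __init__(self, db_connection):
        self.db = db_connection

    def cleanup_expired(self) -> int:
        # Delete expired rows and return the keys needed for blob and cache cleanup
        query = """
            DELETE FROM pastes
            WHERE expires_at IS NOT NULL
            AND expires_at < NOW()
            RETURNING short_url, content_url
        """
        expired = list(self.db.execute(query))
        for paste in expired:
            self._delete_from_blob(paste['content_url'])
            self._evict_from_cache(paste['short_url'])
        return len(expired)

    def _delete_from_blob(self, url: str):
        # Placeholder: a real implementation would call the blob store's delete API
        print(f"Deleting blob: {url}")

    def _evict_from_cache(self, short_url: str):
        # Placeholder: a real implementation would delete the Redis key
        print(f"Evicting cache: paste:{short_url}")

class StubDB:
    # Stand-in connection so the example runs; returns one made-up expired row
    def execute(self, query: str):
        return [{"short_url": "abc12345", "content_url": "pastes/abc12345"}]

manager = ExpirationManager(StubDB())
count = manager.cleanup_expired()
print(f"Cleaned up {count} expired pastes")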

Rate Limiting

Pastebin needs rate limiting to prevent abuse (spam, DDoS):

import time
from collections import defaultdict

class SlidingWindowRateLimiter:
    def __init__(self, max_requests: int = 10, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)

    def allow(self, client_id: str) -> bool:
        now = time.time()
        window_start = now - self.window_seconds
        client_requests = self.requests[client_id]
        # Remove old requests
        self.requests[client_id] = [t for t in client_requests if t > window_start]
        if len(self.requests[client_id]) >= self.max_requests:
            return False
        self.requests[client_id].append(now)
        return True

limiter = SlidingWindowRateLimiter(max_requests=5, window_seconds=60)
client = "user-42"
for i in range(7):
    allowed = limiter.allow(client)
    print(f"Request {i+1}: {'✓ allowed' if allowed else '✗ rate limited'}")

CDN for Content Delivery

Pastebin benefits from CDN for cached paste content:

CDN Use	Cache Hit	Benefit
Paste HTML pages	High (pastes shared widely)	Reduced origin load
Static assets (CSS, JS)	Very high	Fast page load
API responses	Medium (depends on popularity)	Lower latency

Hot pastes (millions of views) should be cached at the CDN edge. When a paste expires or is deleted, remove it from the edge through the CDN's purge API so stale copies are not served.
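
One way to keep the CDN from serving a paste past its expiration is to cap the edge TTL at the paste's remaining lifetime. The sketch below builds a Cache-Control header value on that basis. The function name, the one-hour default, and the treatment of password-protected pastes as uncacheable are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_MAX_AGE = 3600  # ASSUMPTION: one-hour edge TTL for pastes that never expire

def cache_control(expires_at: Optional[datetime], now: datetime,
                  is_private: bool = False) -> str:
    """Build a Cache-Control value that never outlives the paste."""
    if is_private:
        return "private, no-store"  # e.g. password-protected pastes
    if expires_at is None:
        return f"public, max-age={DEFAULT_MAX_AGE}"
    remaining = int((expires_at - now).total_seconds())
    if remaining <= 0:
        return "no-store"  # already expired, do not cache
    return f"public, max-age={min(remaining, DEFAULT_MAX_AGE)}"

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(cache_control(now + timedelta(minutes=5), now))  # public, max-age=300
print(cache_control(now + timedelta(days=7), now))     # public, max-age=3600
print(cache_control(None, now, is_private=True))       # private, no-store
```

With a capped TTL, an explicit purge is only needed for early removal, such as a flagged or user-deleted paste.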

Common Errors

Storing content in SQL directly: Large text blobs in relational databases kill query performance. Always store content in a blob store, with metadata in SQL.
No content deduplication: If the same text is pasted 1000 times, you store it 1000 times. Hash content and store once with a reference counter.
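URL collision at scale: With 1M pastes/day, after 5 years you have ~1.8B URLs. Against 62^8 ≈ 218 trillion keys, every new 8-char key then has roughly a 1-in-120,000 chance of hitting an existing one, so collisions are expected, not hypothetical. Enforce uniqueness at insert time, monitor the retry rate, and expand to 10 chars proactively.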
Forgetting expiration cleanup: Without a background job deleting expired pastes, storage grows unbounded. Run cleanup every 15 minutes.
No abuse prevention: Without rate limiting, a single script can create millions of pastes (spam, malware hosting). Always rate limit per IP and per user.
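
The deduplication idea from the list above (hash the content, store it once, keep a reference counter) can be sketched like this. It is an in-memory illustration under stated assumptions: dictionaries stand in for the blob store and the counter table, and the class name DedupBlobStore is hypothetical.

```python
import hashlib

class DedupBlobStore:
    """Content-addressed store: identical text is kept once and reference-counted."""

    def __init__(self):
        self.blobs = {}      # content hash -> content
        self.refcount = {}   # content hash -> number of pastes pointing at it

    def put(self, content: str) -> str:
        key = hashlib.sha256(content.encode()).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = content  # first copy: actually store it
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def release(self, key: str) -> None:
        # Called when a paste expires or is deleted
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            del self.blobs[key]  # last reference gone: free the storage

store = DedupBlobStore()
keys = [store.put("same text") for _ in range(1000)]
print(f"Pastes: {len(keys)}, blobs stored: {len(store.blobs)}")
```

Each paste row still gets its own short URL and metadata; only the content_url would point at the shared blob.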

Practice Questions

1. Why use a blob store instead of a database for paste content?

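Blob stores (S3, GCS) cost roughly $0.023/GB/month, while database storage typically runs $0.10-0.50/GB/month. Keeping large text out of the database also keeps its tables and indexes small. Blob stores additionally offer CDN integration, versioning, and lifecycle policies for automatic expiration.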

2. How would you handle a paste that goes viral (millions of views)?

The CDN caches the response after the first request. If the paste is behind auth, you need edge-side includes (ESI) or a custom caching layer. Consider generating a static HTML page at paste creation time.

3. What’s the best expiration strategy?

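TTL-based expiration with tiered limits: free users get 30 days, pro users get 1 year, enterprise gets permanent. Run the cleanup job on a regular schedule (every 15 minutes, as recommended under Common Errors) so expired pastes do not linger. This balances storage costs with user needs.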
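
A tiered TTL policy like this one reduces to clamping the requested lifetime to the tier's maximum. The sketch below assumes tier names free, pro, and enterprise, and treats "1 year" as 365 days. Both are illustrative choices, as is the function name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Maximum lifetime per tier; None means the paste may live forever
MAX_TTL = {
    "free": timedelta(days=30),
    "pro": timedelta(days=365),   # ASSUMPTION: "1 year" taken as 365 days
    "enterprise": None,
}

def compute_expires_at(tier: str, requested: Optional[timedelta],
                       now: datetime) -> Optional[datetime]:
    """Clamp the requested lifetime to the tier's cap; None means never expires."""
    cap = MAX_TTL[tier]
    if cap is None:
        return None if requested is None else now + requested
    ttl = cap if requested is None else min(requested, cap)
    return now + ttl

now = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(compute_expires_at("free", timedelta(days=90), now))  # clamped to 30 days
print(compute_expires_at("pro", None, now))                 # defaults to the 1-year cap
print(compute_expires_at("enterprise", None, now))          # None: permanent
```

The returned value maps directly onto the nullable expires_at column in the metadata schema.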

4. Challenge: Design a pastebin that supports real-time collaboration like Google Docs.

Use Operational Transform (OT) or CRDTs. Each character edit is an operation broadcast via WebSocket. The server maintains the authoritative document state. This changes the entire architecture from write-once-read-many to a collaborative editing system.

Mini Project

Build a simple Pastebin API:

import hashlib, time
from datetime import datetime, timedelta

class SimplePastebin:
    def __init__(self):
        self.store = {}      # url -> content (stands in for the blob store)
        self.metadata = {}   # url -> metadata (stands in for the SQL table)

    def create(self, content: str, expires_in_hours: int = 24) -> str:
        content_hash = hashlib.md5(content.encode()).hexdigest()[:8]
        # 8 hex characters give only 16^8 ≈ 4.3 billion keys, which is fine for a demo.
        # Re-hash with a fresh timestamp until the key is unused.
        while True:
            timestamp = str(time.time_ns())
            url = hashlib.sha256(f"{content_hash}{timestamp}".encode()).hexdigest()[:8]
            if url not in self.store:
                break
        now = datetime.now()
        self.store[url] = content
        self.metadata[url] = {
            "created_at": now.isoformat(),
            "expires_at": (now + timedelta(hours=expires_in_hours)).isoformat(),
            "size": len(content),
        }
        return url

    def get(self, url: str) -> dict:
        if url not in self.store:
            return {"error": "Not found"}
        meta = self.metadata[url]
        if datetime.fromisoformat(meta["expires_at"]) < datetime.now():
            del self.store[url]
            del self.metadata[url]
            return {"error": "Expired"}
        return {"content": self.store[url], "metadata": meta}

api = SimplePastebin()
url = api.create("def hello():\n    print('Hello, Pastebin!')", expires_in_hours=1)
print(f"Paste URL: {url}")
paste = api.get(url)
print(f"Content: {paste['content'][:50]}...")
print(f"Expires: {paste['metadata']['expires_at']}")
