Design Pastebin: Architecture for Text Sharing

DodaTech Updated Jun 20, 2026 6 min read

Designing Pastebin means building a service where users paste text, get a shareable URL, and optionally set expiration — while handling millions of pastes and reads per day with low latency.

What You’ll Learn

You’ll master write-heavy vs read-heavy tradeoffs, blob storage architecture, URL generation strategies, expiration policies, and rate limiting. You’ll build a complete Pastebin architecture.

Why This Problem Matters

Pastebin and similar services (GitHub Gist, hastebin) handle billions of pastes. The core challenge is balancing durability (don’t lose pastes) with cost (storage is expensive) and performance (pastes must load fast). At DodaTech, similar blob storage patterns handle encrypted data in DodaZIP and signature caches in Durga Antivirus Pro.

System Design Learning Path

    flowchart LR
  A[System Design Overview] --> B[URL Shortener]
  B --> C[Pastebin]
  C --> D{You Are Here}
  D --> E[Video Streaming]
  D --> F[Ride Sharing]
  style D fill:#f90,color:#fff
  

Requirements

Functional:

  • Create a paste with text content, optional expiration, optional password
  • Read a paste by its unique URL
  • List recent pastes for a user (optional auth)
  • Report/flag inappropriate content (optional)

Non-functional:

  • High durability (no lost pastes)
  • Fast reads (< 100ms for cached pastes)
  • Support pastes up to 10MB
  • Scale: 1M pastes/day, 10M reads/day
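
The scale targets above translate into fairly modest per-second rates. The following back-of-the-envelope sketch makes that concrete. The 10 KB average paste size and the peak factor of 3 are assumptions chosen for illustration; the requirements only state the 10MB maximum.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def estimate(pastes_per_day: int, reads_per_day: int,
             avg_paste_bytes: int, peak_factor: float = 3.0) -> dict:
    """Rough capacity numbers; avg_paste_bytes and peak_factor are assumptions."""
    writes_per_sec = pastes_per_day / SECONDS_PER_DAY
    reads_per_sec = reads_per_day / SECONDS_PER_DAY
    storage_per_day_gb = pastes_per_day * avg_paste_bytes / 1e9
    return {
        "avg_writes_per_sec": round(writes_per_sec, 1),
        "avg_reads_per_sec": round(reads_per_sec, 1),
        "peak_reads_per_sec": round(reads_per_sec * peak_factor, 1),
        "storage_per_day_gb": round(storage_per_day_gb, 1),
        "storage_per_year_tb": round(storage_per_day_gb * 365 / 1000, 2),
    }

numbers = estimate(1_000_000, 10_000_000, avg_paste_bytes=10_000)
for name, value in numbers.items():
    print(f"{name}: {value}")
```

Under these assumptions the service averages about 12 writes and 116 reads per second, so raw request volume is not the hard part. Storage growth and hot-paste read spikes matter more.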

System Architecture

    flowchart TB
  User[User] -->|Create/Read| LB[Load Balancer]
  LB --> Web[Web Servers]
  Web --> Cache[(Redis Cache)]
  Web --> MetaDB[(Metadata DB - SQL)]
  Web --> Blob[(Blob Store - S3)]
  Web --> Analytics[(Analytics Pipeline)]
  Web --> RateLimiter[Rate Limiter]
  Cache --> MetaDB
  RateLimiter --> Redis2[(Redis - Counter)]
  Web -->|Generate URL| URLGen[URL Generator]
  

Read-Heavy vs Write-Heavy

| Aspect | Read-Heavy (Posts) | Write-Heavy (Analytics) |
| --- | --- | --- |
| Ratio | 10:1 reads to writes | 1:10 reads to writes |
| Cache Strategy | Aggressive caching (CDN + Redis) | Minimal caching, batch writes |
| DB Pattern | Read replicas, denormalized | Append-only, columnar store |
| Optimization | Fast reads, tolerate slow writes | Fast ingestion, tolerate slow reads |

Pastebin is read-heavy — pastes are shared and viewed many times, but written once.
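
Because each paste is written once and read many times, a cache-aside read path pays off quickly. Here is a minimal sketch of that pattern. The `PasteReader` class is hypothetical, and plain dictionaries stand in for Redis and the blob store.

```python
from typing import Optional

class PasteReader:
    """Cache-aside read path. Plain dicts stand in for Redis and the blob store."""

    def __init__(self, blob_store: dict):
        self.cache: dict = {}          # stand-in for Redis
        self.blob_store = blob_store   # stand-in for S3
        self.blob_reads = 0            # counts trips to the slow store

    def read(self, short_url: str) -> Optional[str]:
        # 1. Try the cache first.
        if short_url in self.cache:
            return self.cache[short_url]
        # 2. On a miss, fall back to the blob store.
        self.blob_reads += 1
        content = self.blob_store.get(short_url)
        # 3. Populate the cache so later reads skip the blob store.
        if content is not None:
            self.cache[short_url] = content
        return content

reader = PasteReader({"abc12345": "print('hello')"})
for _ in range(10):
    reader.read("abc12345")
print(f"10 reads, {reader.blob_reads} blob fetch")
```

Only the first read reaches the blob store; the other nine are served from the cache. A real deployment would also set a TTL on cache entries so expired pastes drop out.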

URL Generation

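import hashlib, time

# Base62 alphabet: digits, uppercase, lowercase
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def generate_url(content: str, salt: str = "") -> str:
    # Hash content + timestamp + salt so identical content still gets distinct URLs
    unique = f"{content}{time.time_ns()}{salt}"
    hash_bytes = hashlib.sha256(unique.encode()).digest()
    # Read the first 8 bytes as an integer and reduce it into the 62^8 keyspace
    n = int.from_bytes(hash_bytes[:8], "big") % (62 ** 8)
    # Base62-encode into exactly 8 characters (left-padded with '0')
    chars = []
    for _ in range(8):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def validate_url(short: str) -> bool:
    # Check against the alphabet; str.isalnum() would also accept non-ASCII letters
    return len(short) == 8 and all(c in ALPHABET for c in short)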

# Examples
for content in ["Hello World", "Code snippet", "Long document"]:
    url = generate_url(content)
    print(f"Content: '{content}' → URL: {url}")

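Output (illustrative; the URLs differ on every run because the current timestamp is part of the hash input):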

Content: 'Hello World' → URL: wT7XpB2k
Content: 'Code snippet' → URL: mR9sK4nL
Content: 'Long document' → URL: fG5hJ8qW

Collision Handling

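An 8-character Base62 URL has 62^8 ≈ 218 trillion combinations, so any single new URL is very unlikely to collide. Across the whole dataset the birthday bound applies, though: once the number of stored pastes approaches the square root of the keyspace (about 15 million URLs), at least one collision becomes likely. Handle collisions explicitly:

  1. Check whether the generated URL already exists in the DB
  2. If a collision is detected, append a counter and re-hash
  3. Alternatively, Base62-encode a distributed ID (Snowflake) directly; the IDs are unique by construction, at the cost of a longer URL (up to 11 characters for a 64-bit ID)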
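
The collision risk and the check-and-retry step can be sketched as follows. This is an illustration under stated assumptions, not the article's generator: `expected_collisions` and `allocate_url` are hypothetical helpers, candidates are drawn at random with `secrets` instead of being hashed, and a Python set stands in for the database lookup.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters

def expected_collisions(n_urls: int, length: int = 8) -> float:
    """Birthday approximation: about n^2 / (2 * keyspace) colliding pairs."""
    keyspace = len(ALPHABET) ** length
    return n_urls ** 2 / (2 * keyspace)

def allocate_url(taken: set, length: int = 8, max_attempts: int = 5) -> str:
    """Draw random candidates until one is unused, then reserve it."""
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Assumption: a set lookup stands in for the DB uniqueness check.
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise RuntimeError("could not allocate a free URL; consider longer URLs")

print(f"8 chars:  {expected_collisions(1_800_000_000):,.0f} expected collisions")
print(f"10 chars: {expected_collisions(1_800_000_000, length=10):.2f} expected collisions")
print(allocate_url(set()))
```

At 1.8 billion URLs the estimate is in the thousands for 8 characters and about two for 10 characters. This is why the existence check matters even though any single collision is rare.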

Storage Architecture

| Data Type | Storage | Reason |
| --- | --- | --- |
| Paste content | Blob store (S3, GCS) | Cheaper than DB for large text |
| Metadata | SQL (PostgreSQL) | Relational queries, indexes |
| Cache | Redis | Fast reads for hot pastes |
| Analytics | ClickHouse or TimescaleDB | High write throughput for views |

Metadata Schema

CREATE TABLE pastes (
    id BIGSERIAL PRIMARY KEY,
    short_url VARCHAR(8) UNIQUE NOT NULL,
    user_id BIGINT REFERENCES users(id),
    title VARCHAR(255),
    content_url TEXT NOT NULL,  -- S3 key
    content_length INT NOT NULL,
    content_type VARCHAR(50) DEFAULT 'text/plain',
    password_hash VARCHAR(255),
    created_at TIMESTAMPTZ DEFAULT NOW(),  -- time-zone aware, so expiry checks do not depend on server settings
    expires_at TIMESTAMPTZ,
    view_count INT DEFAULT 0,
    is_flagged BOOLEAN DEFAULT FALSE
);

-- No separate index on short_url: the UNIQUE constraint already creates one
CREATE INDEX idx_user_id ON pastes(user_id);
CREATE INDEX idx_expires_at ON pastes(expires_at);
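
The `password_hash` column should hold a salted, deliberately slow hash, never the password itself. Below is a minimal sketch using PBKDF2 from Python's standard library. The storage format, the helper names, and the iteration count are assumptions for illustration; a production service might prefer a dedicated password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumption: tune to your hardware and current guidance

def hash_password(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    # Store iterations and salt next to the digest so verification is self-describing.
    return f"pbkdf2_sha256${ITERATIONS}${salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    _, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(candidate.hex(), digest_hex)

stored = hash_password("s3cret")
print(verify_password("s3cret", stored), verify_password("guess", stored))
```

The resulting string is well under the 255 characters the column allows.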

Expiration Policies

| Policy | Implementation | Storage Saving |
| --- | --- | --- |
| TTL-based | Set expires_at column, periodic cleanup | High (cold data removed) |
| LRU eviction | Evict least recently viewed pastes | Medium |
| Size-based | Pastes > 1MB expire in 24h | Low |
| Never expire | Pro accounts, permanent pastes | None |

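class ExpirationManager:
    def __init__(self, db_connection):
        self.db = db_connection

    def cleanup_expired(self) -> int:
        # Delete expired rows and get back the keys needed for blob and cache cleanup
        query = """
            DELETE FROM pastes
            WHERE expires_at IS NOT NULL
            AND expires_at < NOW()
            RETURNING short_url, content_url
        """
        # Materialize the result so it can be both iterated and counted
        expired = list(self.db.execute(query))
        for paste in expired:
            self._delete_from_blob(paste['content_url'])
            self._evict_from_cache(paste['short_url'])
        return len(expired)

    def _delete_from_blob(self, url: str):
        # Placeholder: a real implementation would call the blob store's delete API
        print(f"Deleting blob: {url}")

    def _evict_from_cache(self, short_url: str):
        # Placeholder: a real implementation would delete the cache key
        print(f"Evicting cache: paste:{short_url}")

class StubDB:
    """Hypothetical stand-in for a database connection so the example runs."""
    def execute(self, query: str):
        return [{"short_url": "abc12345", "content_url": "pastes/abc12345.txt"}]

manager = ExpirationManager(StubDB())
count = manager.cleanup_expired()
print(f"Cleaned up {count} expired pastes")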

Rate Limiting

Pastebin needs rate limiting to prevent abuse (spam, DDoS):

import time
from collections import defaultdict

class SlidingWindowRateLimiter:
    def __init__(self, max_requests: int = 10, window_seconds: int = 60):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.requests = defaultdict(list)

    def allow(self, client_id: str) -> bool:
        # Monotonic clock: unaffected by system clock adjustments
        now = time.monotonic()
        window_start = now - self.window_seconds
        client_requests = self.requests[client_id]
        # Remove old requests
        self.requests[client_id] = [t for t in client_requests if t > window_start]
        if len(self.requests[client_id]) >= self.max_requests:
            return False
        self.requests[client_id].append(now)
        return True

limiter = SlidingWindowRateLimiter(max_requests=5, window_seconds=60)
client = "user-42"
for i in range(7):
    allowed = limiter.allow(client)
    print(f"Request {i+1}: {'✓ allowed' if allowed else '✗ rate limited'}")

CDN for Content Delivery

Pastebin benefits from CDN for cached paste content:

| CDN Use | Cache Hit | Benefit |
| --- | --- | --- |
| Paste HTML pages | High (pastes shared widely) | Reduced origin load |
| Static assets (CSS, JS) | Very high | Fast page load |
| API responses | Medium (depends on popularity) | Lower latency |

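Hot pastes (millions of views) should be cached at the CDN edge. When a paste expires or is deleted, purge it through the CDN's purge API, and keep the edge TTL no longer than the time remaining until expires_at so that a missed purge cannot serve stale content for long.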
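
One way to keep the edge from outliving a paste is to derive the `Cache-Control` header from `expires_at`. This sketch assumes a hypothetical helper, `cache_control`, and a one-hour cap on edge caching; both are illustrative choices rather than part of the design above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_EDGE_TTL = 3600  # assumption: cap edge caching at one hour

def cache_control(expires_at: Optional[datetime], now: datetime) -> str:
    """Build a Cache-Control value that never outlives the paste."""
    if expires_at is None:
        return f"public, max-age={MAX_EDGE_TTL}"
    remaining = int((expires_at - now).total_seconds())
    if remaining <= 0:
        return "no-store"
    return f"public, max-age={min(remaining, MAX_EDGE_TTL)}"

now = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
print(cache_control(None, now))                        # never-expiring paste
print(cache_control(now + timedelta(minutes=5), now))  # expires in 5 minutes
print(cache_control(now - timedelta(minutes=1), now))  # already expired
```

Password-protected pastes should not be marked `public`; they need a private or uncached response instead.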

Common Errors

  1. Storing content in SQL directly: Large text values bloat tables, backups, and replication traffic, and make the database the most expensive place to keep bulk data. Store content in a blob store and keep only metadata in SQL.

  2. No content deduplication: If the same text is pasted 1000 times, you store it 1000 times. Hash content and store once with a reference counter.

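  3. URL collision at scale: With 1M pastes/day, after 5 years you have ~1.8B URLs. By the birthday bound, 8-char Base62 (about 2.18 × 10^14 values) would produce several thousand collisions at that volume if URLs were never checked. Enforce uniqueness in the DB, monitor the retry rate, and expand to 10 chars proactively.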

  4. Forgetting expiration cleanup: Without a background job deleting expired pastes, storage grows unbounded. Run cleanup every 15 minutes.

  5. No abuse prevention: Without rate limiting, a single script can create millions of pastes (spam, malware hosting). Always rate limit per IP and per user.
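
The deduplication idea from item 2 can be sketched as a content-addressed store with reference counts. `DedupStore` is a hypothetical in-memory stand-in for the blob store; a real system would keep the counts in the metadata DB and update them transactionally.

```python
import hashlib

class DedupStore:
    """Content-addressed store with reference counts (in-memory stand-in)."""

    def __init__(self):
        self.blobs = {}      # content hash -> content
        self.refcount = {}   # content hash -> number of pastes using it

    def put(self, content: str) -> str:
        key = hashlib.sha256(content.encode()).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = content
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def release(self, key: str) -> None:
        """Call when a paste expires; the blob goes away with its last reference."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            del self.blobs[key]

store = DedupStore()
keys = [store.put("same text") for _ in range(1000)]
print(len(store.blobs), store.refcount[keys[0]])
```

A thousand identical pastes end up as one stored blob with a count of 1000. Each paste still gets its own short URL and metadata row; only the content is shared.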

Practice Questions

1. Why use a blob store instead of a database for paste content?
Blob stores (S3, GCS) cost roughly $0.023/GB/month at standard-tier list prices, versus roughly $0.10-0.50/GB/month for managed database storage. They also integrate with CDNs and offer versioning and lifecycle policies for automatic expiration.
2. How would you handle a paste that goes viral (millions of views)?
The CDN caches the response after the first request. If the paste is behind auth, you need edge-side includes (ESI) or a custom caching layer. Consider generating a static HTML page at paste creation time.
3. What’s the best expiration strategy?
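TTL-based expiration with tiered limits: free users get 30 days, pro users get 1 year, enterprise gets permanent. Run cleanup jobs on a regular schedule, and also check expires_at at read time so an expired paste is never served between runs. This balances storage costs with user needs.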
4. Challenge: Design a pastebin that supports real-time collaboration like Google Docs.
Use Operational Transform (OT) or CRDTs. Each character edit is an operation broadcast via WebSocket. The server maintains the authoritative document state. This changes the entire architecture from write-once-read-many to a collaborative editing system.

Mini Project

Build a simple Pastebin API:

import hashlib, time
from datetime import datetime, timedelta

class SimplePastebin:
    def __init__(self):
        self.store = {}
        self.metadata = {}

    def create(self, content: str, expires_in_hours: int = 24) -> str:
        # Hash content + timestamp; retry with a fresh timestamp if the URL is taken
        while True:
            seed = f"{content}{time.time_ns()}"
            url = hashlib.sha256(seed.encode()).hexdigest()[:8]
            if url not in self.store:
                break
        now = datetime.now()
        self.store[url] = content
        self.metadata[url] = {
            "created_at": now.isoformat(),
            "expires_at": (now + timedelta(hours=expires_in_hours)).isoformat(),
            "size": len(content.encode()),  # size in bytes, not characters
        }
        return url

    def get(self, url: str) -> dict:
        if url not in self.store:
            return {"error": "Not found"}
        meta = self.metadata[url]
        if datetime.fromisoformat(meta["expires_at"]) < datetime.now():
            del self.store[url]
            del self.metadata[url]
            return {"error": "Expired"}
        return {"content": self.store[url], "metadata": meta}

api = SimplePastebin()
url = api.create("def hello():\n    print('Hello, Pastebin!')", expires_in_hours=1)
print(f"Paste URL: {url}")
paste = api.get(url)
print(f"Content: {paste['content'][:50]}...")
print(f"Expires: {paste['metadata']['expires_at']}")
