Design a URL Shortener — System Design Interview Guide
Designing a URL shortener is a classic system design interview problem. You’re asked to create a service that takes a long URL and returns a short, unique alias — like TinyURL or bit.ly.
What You’ll Learn
You’ll master requirements gathering, Base62 encoding, hash collision handling, 301 vs 302 redirect strategies, and analytics tracking. You’ll build a complete architecture for a production URL shortener.
Why This Problem Matters
URL shorteners are deceptively simple — the core feature is trivial, but scaling to millions of URLs requires careful design of hashing, storage, redirect performance, and analytics. At DodaTech, similar URL routing techniques optimize link sharing in Doda Browser.
System Design Learning Path
flowchart LR
A[System Design Overview] --> B[URL Shortener]
B --> C{You Are Here}
C --> D[Chat System]
C --> E[Rate Limiter]
style C fill:#f90,color:#fff
Requirements
Functional:
- Generate a unique short URL for any given long URL
- Redirect short URL to the original long URL
- Optional: custom alias, expiration, analytics
Non-functional:
- Highly available (redirects must never fail)
- Low latency (redirects in < 100ms)
- Scale: 100M+ URLs, 10K writes/sec, 100K reads/sec
System Architecture
flowchart TB
User[User] -->|Short URL click| LB[Load Balancer]
LB --> Web[Web Servers]
Web --> Cache[(Redis Cache)]
Web --> DB[(SQL Database)]
Web --> Analytics[(Analytics DB)]
Cache --> DB
Web -->|Generate hash| Hash[Hash Service]
Hash --> DB
URL Encoding (Base62)
A short URL like https://short.ly/abc123 uses Base62 encoding (lowercase + uppercase + digits):
Base62: [a-z][A-Z][0-9] = 26 + 26 + 10 = 62 charactersWith 7 characters: 62^7 = ~3.5 trillion unique URLs.
import string
BASE62 = string.ascii_lowercase + string.ascii_uppercase + string.digits
def encode(num: int) -> str:
if num == 0:
return BASE62[0]
result = []
while num > 0:
result.append(BASE62[num % 62])
num //= 62
return ''.join(reversed(result))
def decode(short: str) -> int:
num = 0
for c in short:
num = num * 62 + BASE62.index(c)
return numHash Collision Strategies
Approach 1: Hash + Base62
import hashlib
def generate_short(url: str) -> str:
hash_bytes = hashlib.md5(url.encode()).digest()
num = int.from_bytes(hash_bytes[:6], 'big') # 6 bytes
return encode(num % 62**7) # 7 charsProblem: Hash collisions — two URLs could produce the same short code.
Approach 2: Unique ID + Base62 (Recommended)
Use a distributed unique ID generator (like Snowflake or database sequence). Encode that ID to Base62. No collisions guaranteed.
-- Auto-increment ID (for single DB)
INSERT INTO urls (long_url) VALUES ('https://example.com');
-- Use LAST_INSERT_ID() and encode itApproach 3: Bloom Filter Check
Use a Bloom filter (well, Bloom filter isn’t in glossary with that key, but the concept is clear) to check if a URL was already shortened. If yes, return existing short URL.
301 vs 302 Redirect
| Status | Meaning | Browser Caches? | Use Case |
|---|---|---|---|
| 301 | Permanently Moved | Yes (forever) | Production URLs that won’t change |
| 302 | Temporarily Moved | No | Analytics tracking, A/B testing |
Use 302 if you need analytics (each click hits your server). Use 301 for maximum performance (browser skips your server entirely after first request).
Analytics Tracking
Track per short URL:
- Number of clicks
- Referrer (HTTP Referer header)
- Browser, OS, device type
- Geographic location (from IP)
- Timestamp of each click
Store in a separate analytics database (columnar like MySQL or time-series DB). Update cache asynchronously.
Common Mistakes
1. Using MD5/SHA Directly as Short Code
Full hashes are too long. Always truncate and handle collisions.
2. Not Handling Duplicate URLs
The same long URL should return the same short URL. Check existing entries before creating a new one.
3. Using 301 Without Analytics
If you need click data, use 302. A 301 redirect means browsers skip your server.
4. Single Database Write Bottleneck
Auto-increment IDs don’t scale across multiple write masters. Use distributed ID generators (Snowflake, UUID).
5. No Expiration Cleanup
Old unused URLs accumulate. Implement TTL-based cleanup using a background job.
6. Forgetting Rate Limiting
Without Rate Limiting, bots can exhaust your ID space. Limit to 10 shortens/second per user.
7. Not Encoding Characters for URLs
Base62 works because it avoids ambiguous characters (O/0, l/1). Always use a URL-safe alphabet.
Practice Questions
1. Why use Base62 instead of Base64?
Base62 omits + and / which have special meaning in URLs. Base62 is URL-safe without encoding.
2. What’s the advantage of 302 over 301 for redirects?
302 redirects always hit your server, allowing click analytics. 301 redirects are cached by browsers.
3. How do you handle hash collisions?
Use a unique ID generator + Base62 encoding (collision-free). Alternatively, check against existing entries and rehash with a salt.
4. Challenge: Add custom aliases.
Allow users to specify custom short codes. Check uniqueness before saving. Reserve common words (admin, api, login).
Mini Project: URL Shortener API
Build a minimal URL shortener with Express or Flask:
POST /shorten— accepts{ "url": "..." }, returns{ "short": "abc123" }GET /{short}— redirects to the original URL- Store mappings in Redis with TTL
- Track click count per URL
- Add rate limiting per IP
What’s Next
Congratulations on completing this system design problem! Here’s where to go from here:
- Practice daily — Design one system per day
- Build a project — Implement a working URL shortener
- Explore related topics — Distributed ID generation, CDN caching
- Join the community — Share your system designs and get feedback
Remember: every expert was once a beginner. Keep designing!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro