CDN Deep Dive — Edge Servers, Cache Invalidation, and Geo-Routing Explained
A Content Delivery Network (CDN) is a globally distributed network of proxy servers that cache and deliver content from locations closest to end users, reducing latency by 50-80% and offloading origin traffic. In this deep dive, you’ll move beyond basic CDN setup to understand edge architecture, cache invalidation strategies, geo-routing mechanics, and how to choose between CloudFront, Cloudflare, and Akamai.
Why CDN Architecture Matters
Netflix accounts for 15% of global internet traffic — almost entirely delivered through its Open Connect CDN. A poorly configured CDN can mean 2-second load times for a Tokyo user hitting a Virginia origin. At DodaTech, CDN delivery patterns optimize asset loading in Doda Browser and ensure fast antivirus definition updates for Durga Antivirus Pro. Understanding CDN internals lets you design systems that feel local no matter where users are.
Edge Server Architecture
A CDN edge server is a reverse proxy running a high-performance HTTP cache (typically NGINX, Varnish, or a custom implementation). When a user requests content, the edge server checks its local cache. On a hit, it serves the cached copy immediately. On a miss, it fetches the content from the origin or from a parent tier.
```mermaid
graph TD
    User[User Browser] -->|DNS lookup| CDN_DNS[CDN DNS]
    CDN_DNS -->|Returns nearest edge IP| Edge[Edge Server]
    Edge -->|Cache hit| User
    Edge -->|Cache miss| Parent[Parent Tier]
    Parent -->|Miss| Origin[Origin Server]
    Origin --> Parent
    Parent --> Edge
    Edge -.->|Miss, no parent tier| Origin
    style Edge fill:#8e44ad,color:#fff
    style Parent fill:#3498db,color:#fff
    style Origin fill:#c0392b,color:#fff
```
Many CDNs add a tiered caching hierarchy: L1 edges near users → L2 regional parents → origin. The parent answers repeated misses from many edges, so a wave of simultaneous L1 misses does not all land on the origin.
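A toy simulation shows why the parent tier matters. The sketch below sends one cold request through each of 1,000 edges and counts how many requests reach the origin. `CacheNode` and `simulate` are hypothetical names, not any CDN's API. Requests are processed one at a time, so the model does not cover the concurrent-miss case that real parent tiers handle with request collapsing.

```python
from typing import Callable, Dict


class CacheNode:
    """One cache tier; on a miss it asks `upstream` (a parent tier or the origin)."""

    def __init__(self, name: str, upstream: Callable[[str], bytes]):
        self.name = name
        self.upstream = upstream
        self.store: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path in self.store:        # hit: answered locally, nothing sent upstream
            return self.store[path]
        body = self.upstream(path)    # miss: fetch from the next tier and keep a copy
        self.store[path] = body
        return body


def simulate(num_edges: int, use_parent: bool) -> int:
    """Count origin fetches when every edge receives one request for a cold object."""
    origin_fetches = 0

    def origin(path: str) -> bytes:
        nonlocal origin_fetches
        origin_fetches += 1
        return b"content of " + path.encode()

    upstream = CacheNode("parent", origin).get if use_parent else origin
    edges = [CacheNode(f"edge-{i}", upstream) for i in range(num_edges)]
    for edge in edges:
        edge.get("/video/intro.mp4")
    return origin_fetches


print(simulate(1000, use_parent=False))  # 1000: every edge miss reaches the origin
print(simulate(1000, use_parent=True))   # 1: the parent absorbs the other 999
```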
Origin Pull vs Push
Origin pull is the default mode. The CDN requests content from your origin on the first cache miss. Later requests for the same resource are served from that edge until the cached copy expires. This is simple to set up, but the first request for a resource at each edge location pays the full round trip to the origin.
Origin push proactively uploads content to edge servers before any user requests it. This eliminates the cold-start penalty and gives you control over what gets cached and when.
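The cold-start trade-off can be made concrete with a toy model. This is a sketch, not any provider's implementation; `EdgeCache`, `push`, and `get` are hypothetical names. The pull edge misses on the first request for each path, while the push edge was preloaded and never has to contact the origin while serving users.

```python
class EdgeCache:
    """Toy edge cache that counts requests that had to wait on the origin."""

    def __init__(self, origin: dict):
        self.origin = origin
        self.store = {}
        self.misses = 0

    def push(self, paths):
        """Origin push: copy content to the edge before anyone asks for it."""
        for path in paths:
            self.store[path] = self.origin[path]

    def get(self, path):
        """Origin pull: fetch lazily on the first request, then serve from cache."""
        if path not in self.store:
            self.misses += 1  # this user pays the origin round trip
            self.store[path] = self.origin[path]
        return self.store[path]


origin = {"/app.js": "...", "/logo.png": "..."}
requests_seen = ["/app.js", "/logo.png", "/app.js", "/logo.png"]

pull_edge = EdgeCache(origin)
for p in requests_seen:
    pull_edge.get(p)

push_edge = EdgeCache(origin)
push_edge.push(origin.keys())
for p in requests_seen:
    push_edge.get(p)

print(pull_edge.misses, push_edge.misses)  # 2 0
```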
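```python
# Origin pull configuration (AWS CloudFront + S3)
import time

import boto3

client = boto3.client('cloudfront')

response = client.create_distribution(
    DistributionConfig={
        'CallerReference': str(time.time()),  # unique token so retries are idempotent
        'Comment': 'Static assets (origin pull from S3)',
        'Enabled': True,
        'Origins': {
            'Quantity': 1,
            'Items': [{
                'Id': 's3-origin',
                'DomainName': 'my-bucket.s3.amazonaws.com',
                # Empty string = no origin access identity (bucket must be readable)
                'S3OriginConfig': {'OriginAccessIdentity': ''},
            }],
        },
        'DefaultCacheBehavior': {
            'TargetOriginId': 's3-origin',
            'ViewerProtocolPolicy': 'redirect-to-https',
            # Legacy cache settings; a CachePolicyId is the newer alternative
            'ForwardedValues': {
                'QueryString': False,
                'Cookies': {'Forward': 'none'},
            },
            'MinTTL': 0,
            'DefaultTTL': 86400,   # 1 day when the origin sends no Cache-Control
            'MaxTTL': 604800,      # never cache longer than 7 days
        },
        'PriceClass': 'PriceClass_100',  # lowest-cost price class (fewer edge regions)
    }
)
print(f"CDN domain: {response['Distribution']['DomainName']}")
```

Cache Invalidation Strategies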
Invalidating cached content is one of the hardest CDN problems. There are four main approaches:
- Versioned filenames: the gold standard. `styles.a1b2c3.css` never changes content. When you update, deploy `styles.d4e5f6.css` and reference the new URL. The old URL simply stops being requested and ages out of cache, so no invalidation is needed.
- TTL-based expiration: set `Cache-Control: max-age=3600` and content expires automatically after 1 hour. The next request then fetches a fresh copy. This is simple, but you can't force an immediate update.
- API-based purge: call the CDN provider's API to invalidate specific paths (CloudFront's call is `CreateInvalidation`). This works, but providers impose rate limits or quotas, and a purge takes time to propagate to every edge.
- Origin response with `Cache-Control: no-cache`: the CDN may store the response but must revalidate it with the origin on every request. It does this with `If-Modified-Since` (paired with `Last-Modified`) or `If-None-Match` (paired with `ETag`). Each request still costs a round trip. However, an unchanged resource comes back as a small `304 Not Modified` instead of the full body, so you save bandwidth without serving stale content.
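Two of these strategies are easy to sketch in a few lines. The code below is a simplified illustration, not a production build step or server. `versioned_name` embeds a content hash in the filename; the 8-character SHA-256 prefix is an assumption. `revalidate` mimics an origin answering a conditional request with either `304` or `200`. Both function names are hypothetical, and the ETag check is an exact match on a single strong ETag, ignoring weak ETags and lists.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Optional, Tuple


def versioned_name(filename: str, content: bytes, length: int = 8) -> str:
    """Embed a content hash in the filename: styles.css -> styles.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


def revalidate(current_body: bytes, if_none_match: Optional[str]) -> Tuple[int, bytes, str]:
    """Origin-side handling of a conditional request (simplified ETag matching)."""
    etag = '"' + hashlib.sha256(current_body).hexdigest()[:16] + '"'
    if if_none_match == etag:
        return 304, b"", etag          # unchanged: headers only, no body
    return 200, current_body, etag     # changed (or first fetch): full body


v1 = versioned_name("styles.css", b"body { color: black; }")
v2 = versioned_name("styles.css", b"body { color: navy; }")
print(v1 != v2)  # True: new content gets a new URL, so nothing needs purging

status, body, tag = revalidate(b"hello", None)
status2, body2, _ = revalidate(b"hello", tag)
print(status, status2, len(body2))  # 200 304 0
```

Content hashing means the URL changes exactly when the bytes change, which is why versioned files can safely get very long TTLs.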
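```python
# API-based cache purge for CloudFront
import time

import boto3

client = boto3.client('cloudfront')


def purge_cache(distribution_id: str, paths: list) -> str:
    """Invalidate paths (each must start with '/'; a trailing '*' wildcard is allowed)."""
    response = client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            'Paths': {'Quantity': len(paths), 'Items': paths},
            'CallerReference': str(time.time()),  # must be unique per request
        },
    )
    print(f"Purge initiated for {len(paths)} paths")
    return response['Invalidation']['Id']
```

CDN for Dynamic Content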
CDNs traditionally serve static assets, but modern providers also accelerate dynamic content. Cloudflare Workers and AWS Lambda@Edge let you run code at edge locations. Some responses can't be cached at all. For those, dynamic content acceleration speeds up the edge-to-origin leg instead: edges keep persistent (keep-alive) connections open to the origin, use tuned TCP settings, and route requests over the CDN's private backbone rather than the public internet.
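Edge code often exists to make more responses cacheable. The sketch below is a Lambda@Edge viewer-request handler in Python. It strips tracking parameters so that URLs differing only in those parameters can share one cache entry. The event shape (`event["Records"][0]["cf"]["request"]`, with `querystring` lacking the leading `?`) is the standard CloudFront one. The list of parameters to drop is a hypothetical example. Also note that the query string only affects the cache key if your cache behavior includes it.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical list of parameters that never change the response; adjust for your app
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def lambda_handler(event, context):
    """Viewer-request handler: drop tracking parameters before the cache lookup."""
    request = event["Records"][0]["cf"]["request"]
    params = parse_qsl(request.get("querystring", ""), keep_blank_values=True)
    kept = [
        (key, value)
        for key, value in params
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    request["querystring"] = urlencode(kept)
    return request  # returning the request lets CloudFront continue processing it


event = {"Records": [{"cf": {"request": {
    "uri": "/pricing",
    "querystring": "plan=pro&utm_source=mail&gclid=abc",
}}}]}
print(lambda_handler(event, None)["querystring"])  # plan=pro
```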
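```nginx
# NGINX cache configuration for API responses
# (http context) define the cache zone that proxy_cache refers to
proxy_cache_path /var/cache/nginx/api keys_zone=my_cache:10m max_size=1g inactive=10m;

# (server context)
location /api/ {
    proxy_pass http://origin-server;
    proxy_cache my_cache;
    proxy_cache_valid 200 60s;   # cache 200 responses for 60s
    proxy_cache_key "$scheme$request_method$host$request_uri";
    add_header X-Cache-Status $upstream_cache_status;   # HIT, MISS, EXPIRED, ...
}
```

Geo-Routing and Anycast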
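CDNs steer users to an edge location with one of two routing techniques, and many combine them:
- Anycast: the same IP address is advertised from many edge locations worldwide. BGP routes each user to the nearest location in network terms, which is determined by BGP path selection and is usually, but not always, the geographically closest. Anycast is simple, fast, and fault-tolerant: if a location withdraws its route, traffic shifts to the next-best site.
- DNS-based routing (GeoDNS): the CDN's authoritative DNS server returns different IP addresses depending on where the query comes from. That is normally the user's recursive resolver, or the client's subnet when the resolver sends EDNS Client Subnet. This approach is more flexible. However, routing changes only take effect as cached DNS answers expire, and a user behind a distant resolver can be sent to the wrong region.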
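At its core, a GeoDNS answer is a nearest-site lookup. This minimal sketch picks the edge with the shortest great-circle (haversine) distance from the querying location. The edge list and coordinates are hypothetical. Real GeoDNS systems map IPs to locations with geolocation databases and also weigh edge health and load, which this sketch ignores.

```python
import math
from typing import Dict, Tuple

# Hypothetical edge sites: name -> (latitude, longitude)
EDGES: Dict[str, Tuple[float, float]] = {
    "tokyo": (35.68, 139.69),
    "frankfurt": (50.11, 8.68),
    "virginia": (38.95, -77.45),
    "sao-paulo": (-23.55, -46.63),
}


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # 6371 km = mean Earth radius


def pick_edge(query_location: Tuple[float, float]) -> str:
    """GeoDNS-style answer: the edge closest to where the DNS query came from."""
    return min(EDGES, key=lambda name: haversine_km(query_location, EDGES[name]))


print(pick_edge((34.69, 135.50)))  # Osaka -> tokyo
print(pick_edge((51.51, -0.13)))   # London -> frankfurt
```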
CDN Provider Comparison
| Feature | CloudFront | Cloudflare | Akamai |
|---|---|---|---|
| Edge locations (approx., at time of writing) | 450+ | 310+ | 4,100+ |
| Pricing | Pay-as-you-go (free tier available) | Free plan available | Enterprise contract |
| Edge compute | Lambda@Edge, CloudFront Functions | Workers | EdgeWorkers |
| DDoS protection | AWS Shield (Standard included) | Built-in, unmetered | Prolexic |
| TLS certificates | Free (via ACM) | Free (Universal SSL) | Custom certs |
| Origin types | Any (S3, ALB, HTTP) | Any HTTP(S) origin, Cloudflare Tunnel | Any |
DDoS Protection at the Edge
CDNs absorb DDoS attacks by spreading traffic across thousands of edge servers. As long as the origin accepts traffic only from the CDN, it is never directly exposed. Cloudflare has mitigated attacks of more than 2 Tbps. Key mechanisms:
- Rate limiting per IP at the edge before traffic reaches origin
- WAF rules to filter malicious patterns (SQL injection, XSS)
- Challenge pages (CAPTCHA, JavaScript challenge) for suspicious requests
- Connection limiting to prevent resource exhaustion
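Per-IP rate limiting at the edge is commonly built on a token bucket. The sketch below is a minimal in-memory version: each IP gets `capacity` requests as a burst and then `rate` requests per second. `TokenBucket` and `PerIpLimiter` are hypothetical names. A real edge would also expire idle buckets and share state across servers, which this sketch does not do. The clock is injectable so the behaviour can be demonstrated deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    rate: float      # tokens added per second (sustained requests/second)
    capacity: float  # maximum burst size
    tokens: float
    updated: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIpLimiter:
    """One bucket per client IP, as an edge server might keep in memory."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, ip: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(ip)
        if bucket is None:  # new IP starts with a full bucket
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[ip] = bucket
        return bucket.allow(now)


fake_now = [0.0]
limiter = PerIpLimiter(rate=5, capacity=10, clock=lambda: fake_now[0])
burst = [limiter.allow("203.0.113.7") for _ in range(12)]
print(burst.count(True))  # 10: the burst capacity
fake_now[0] = 1.0         # one second later, 5 tokens have refilled
print(sum(limiter.allow("203.0.113.7") for _ in range(12)))  # 5
print(limiter.allow("198.51.100.2"))  # True: other IPs are unaffected
```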
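```python
# Cloudflare WAF custom rule via API
import os

import requests

# Placeholder environment variable names; use whatever your deployment provides
CF_API_TOKEN = os.environ["CF_API_TOKEN"]  # API token with zone WAF edit permission
ZONE_ID = os.environ["CF_ZONE_ID"]

rule = {
    "description": "Block SQL injection",
    # Illustrative only: real payloads are often URL-encoded or mixed-case,
    # so Cloudflare's managed WAF rules catch far more than this match.
    "expression": '(http.request.uri contains "union select")',
    "action": "block",
}
headers = {"Authorization": f"Bearer {CF_API_TOKEN}", "Content-Type": "application/json"}

# PUT to the custom-rules phase entry point. This REPLACES all existing
# custom rules in the zone with the list sent; rules run in list order.
resp = requests.put(
    f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/rulesets/phases/http_request_firewall_custom/entrypoint",
    json={"rules": [rule]},
    headers=headers,
)
print(f"WAF rule deployed: {resp.status_code}")
```

Common Errors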
- Caching dynamic user-specific content: Personalized dashboards, shopping carts, and account settings must not be served from a shared CDN cache. Send `Cache-Control: private` (or `no-store`) on these responses. Adding the session cookie to the cache key also separates users, but it leaves almost nothing reusable in cache.
- No origin redundancy: If your CDN points to a single origin server and that server goes down, every cache miss fails. Use multiple origins with failover, or put a load balancer behind the CDN.
- TTL too short for static assets: Setting `max-age=60` on versioned CSS/JS forces browsers and edges to refetch or revalidate files that can never change. Give immutable assets `Cache-Control: max-age=31536000, immutable` (1 year).
- Ignoring cache hit ratio: A 30% cache hit ratio means 70% of requests go to the origin (or to the parent tier). Monitor this metric in CDN analytics and tune your caching rules.
- Cold start stampede: When a popular video is uploaded and not pre-cached, the first wave of users all trigger cache misses simultaneously. Use origin push (or pre-warm the cache) for anticipated traffic.
- Not using origin shield: A parent cache tier consolidates L1 edge misses before they reach the origin. Without it, 1,000 edges missing simultaneously send 1,000 requests to the origin.
- Misconfigured cross-origin requests: CDN-hosted fonts or scripts get blocked by CORS. Set `Access-Control-Allow-Origin: *` on CDN responses for public assets.
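Most of these fixes come down to two things: choosing the right `Cache-Control` value per response, and watching the hit ratio. Here is a minimal sketch of both. It assumes a hypothetical helper `cache_control_for` and a versioned-filename pattern with a hex hash of 6+ characters before the extension (as in `styles.a1b2c3.css`). The path rules and TTLs are illustrative choices, not a standard.

```python
import re
from typing import Iterable

# Assumed naming convention: hex content hash of 6+ characters before the extension
VERSIONED = re.compile(r"\.[0-9a-f]{6,}\.[A-Za-z0-9]+$")


def cache_control_for(path: str, personalized: bool = False) -> str:
    """Choose a Cache-Control value that avoids the mistakes listed above."""
    if personalized:
        return "private, no-store"                    # keep per-user pages out of shared caches
    if VERSIONED.search(path):
        return "public, max-age=31536000, immutable"  # hash in name: cache for 1 year
    if path.endswith((".html", "/")):
        return "no-cache"                             # revalidate HTML so new asset URLs show up
    return "public, max-age=3600"                     # other static files: 1 hour


def hit_ratio(statuses: Iterable[str]) -> float:
    """Fraction of HIT entries, e.g. from NGINX $upstream_cache_status values.

    Simplification: only HIT counts; STALE or REVALIDATED also avoid a full origin fetch.
    """
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(1 for s in statuses if s == "HIT") / len(statuses)


print(cache_control_for("/static/styles.a1b2c3.css"))  # public, max-age=31536000, immutable
print(hit_ratio(["HIT", "MISS", "HIT", "EXPIRED"]))     # 0.5
```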
Mini Project
Build a CDN performance comparison tool:
```python
import statistics
import time
import urllib.request


def measure_latency(url: str, samples: int = 5) -> dict:
    """Download `url` several times and summarize the timings in seconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()  # monotonic, high-resolution timer
        with urllib.request.urlopen(url, timeout=5) as resp:
            resp.read()  # include the body transfer, not just the headers
        times.append(time.perf_counter() - start)
    return {
        "min": min(times),
        "max": max(times),
        "avg": statistics.mean(times),
        "median": statistics.median(times),
    }


# Compare direct origin vs CDN (replace the example.com URLs with your own).
# The first CDN request may be a cache miss; the median is less sensitive to it.
direct = measure_latency("https://origin.example.com/asset.jpg")
cdn = measure_latency("https://cdn.example.com/asset.jpg")
print(f"Direct origin: avg={direct['avg']:.3f}s, median={direct['median']:.3f}s")
print(f"CDN: avg={cdn['avg']:.3f}s, median={cdn['median']:.3f}s")
print(f"Speedup: {direct['avg'] / cdn['avg']:.1f}x")
```

Expected output (varies by location):
```text
Direct origin: avg=0.342s, median=0.338s
CDN: avg=0.045s, median=0.042s
Speedup: 7.6x
```
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro