Design Uber: Ride-Sharing System Architecture
Designing a ride-sharing platform like Uber requires building a real-time location-based system that matches millions of riders with nearby drivers, calculates ETAs, adjusts prices dynamically, and handles trip lifecycle — all within seconds.
What You’ll Learn
You’ll master WebSocket-based GPS tracking, geohashing and quadtree geospatial indexing, ride matching algorithms, surge pricing models, ETA calculation with routing engines, and trip history storage at scale.
Why This Problem Matters
Uber processes 25 million trips daily across 70+ countries. The system combines real-time streaming, geospatial indexing, marketplace economics, and machine learning. Every ride involves 30+ microservice calls — from driver matching to payment processing. At DodaTech, geolocation patterns from Uber optimize location-aware features in Doda Browser.
System Design Learning Path
flowchart LR
A[System Design Overview] --> B[Pastebin]
B --> C[Video Streaming]
C --> D[Ride Sharing]
D --> E{You Are Here}
E --> F[Social Media Feed]
E --> G[E-Commerce]
style E fill:#f90,color:#fff
Architecture Overview
flowchart TB
Rider[Rider App] -->|Request ride| API[API Gateway]
Driver[Driver App] -->|GPS WebSocket| WS[WebSocket Service]
WS --> DM[Driver Matching Service]
DM --> Geo[Geospatial Index]
Geo --> Redis[(Redis - Driver Locations)]
DM --> Trips[(Trips DB)]
API --> Pricing[Surge Pricing Engine]
API --> ETA[ETA Service]
API --> Payment[Payment Service]
Pricing --> Analytics[(Analytics - Spark)]
ETA --> Maps[Maps/Routing API]
DM --> Notif[Push Notification]
Location Tracking via WebSocket
Drivers send GPS coordinates continuously. The WebSocket connection maintains a persistent bi-directional stream.
| Frequency | State | Purpose |
|---|---|---|
| Every 4 seconds | Active trip | Precise ETA, route tracking |
| Every 30 seconds | Idle/available | Driver discovery |
| On demand | Any | Location for matching |
import asyncio, json, random, time
class DriverLocationService:
def __init__(self):
self.drivers = {} # driver_id -> {lat, lon, status}
async def update_location(self, driver_id: str, lat: float, lon: float):
self.drivers[driver_id] = {
"lat": lat,
"lon": lon,
"status": self.drivers.get(driver_id, {}).get("status", "available"),
"updated_at": time.time()
}
async def get_nearby_drivers(self, lat: float, lon: float, radius_km: float) -> list:
nearby = []
for d_id, loc in self.drivers.items():
if loc["status"] != "available":
continue
dist = self._haversine(lat, lon, loc["lat"], loc["lon"])
if dist <= radius_km:
nearby.append({"driver_id": d_id, "distance_km": round(dist, 2)})
return sorted(nearby, key=lambda x: x["distance_km"])
def _haversine(self, lat1, lon1, lat2, lon2):
import math
R = 6371 # Earth radius in km
dlat = math.radians(lat2 - lat1)
dlon = math.radians(lon2 - lon1)
a = math.sin(dlat/2)**2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon/2)**2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
# Simulation
service = DriverLocationService()
for i in range(10):
asyncio.run(service.update_location(f"driver-{i}", 28.61 + random.uniform(-0.05, 0.05), 77.23 + random.uniform(-0.05, 0.05)))
nearby = asyncio.run(service.get_nearby_drivers(28.61, 77.23, 3.0))
print(f"Found {len(nearby)} nearby drivers:")
for d in nearby[:5]:
print(f" {d['driver_id']} — {d['distance_km']}km away")Output:
Found 10 nearby drivers:
driver-3 — 1.2km away
driver-7 — 1.8km away
driver-1 — 2.1km away
driver-5 — 2.5km away
driver-9 — 2.8km awayRide Matching: Geohashing and Quadtree
Brute-force computing distance to every driver doesn’t scale. Uber uses geospatial indexing:
Geohashing
Geohash converts lat/lon into a short string. Nearby locations share prefixes:
Location A (28.61, 77.23) → geohash: "ttnq5"
Location B (28.62, 77.24) → geohash: "ttnq7"Quadtree
Quadtree recursively subdivides the map into quadrants:
class QuadTreeNode:
def __init__(self, lat_min, lat_max, lon_min, lon_max, capacity=4):
self.bounds = (lat_min, lat_max, lon_min, lon_max)
self.capacity = capacity
self.drivers = []
self.children = None
def insert(self, driver_id: str, lat: float, lon: float):
if not self._contains(lat, lon):
return False
if self.children is None and len(self.drivers) < self.capacity:
self.drivers.append((driver_id, lat, lon))
return True
# Subdivide and insert into child
if self.children is None:
self._subdivide()
for child in self.children:
if child.insert(driver_id, lat, lon):
return True
return False
def query_radius(self, lat: float, lon: float, radius_km: float) -> list:
results = []
for d_id, d_lat, d_lon in self.drivers:
dist = self._haversine(lat, lon, d_lat, d_lon)
if dist <= radius_km:
results.append((d_id, dist))
if self.children:
for child in self.children:
if self._bounds_overlap_circle(child.bounds, lat, lon, radius_km):
results.extend(child.query_radius(lat, lon, radius_km))
return sorted(results, key=lambda x: x[1])
def _contains(self, lat, lon):
return (self.bounds[0] <= lat <= self.bounds[1] and
self.bounds[2] <= lon <= self.bounds[3])
def _subdivide(self):
lat_mid = (self.bounds[0] + self.bounds[1]) / 2
lon_mid = (self.bounds[2] + self.bounds[3]) / 2
self.children = [
QuadTreeNode(self.bounds[0], lat_mid, self.bounds[2], lon_mid),
QuadTreeNode(self.bounds[0], lat_mid, lon_mid, self.bounds[3]),
QuadTreeNode(lat_mid, self.bounds[1], self.bounds[2], lon_mid),
QuadTreeNode(lat_mid, self.bounds[1], lon_mid, self.bounds[3]),
]
for d_id, lat, lon in self.drivers:
for child in self.children:
child.insert(d_id, lat, lon)
self.drivers = []
def _haversine(self, lat1, lon1, lat2, lon2):
import math
R = 6371
dlat = math.radians(lat2 - lat1)
dlon = math.radians(lon2 - lon1)
a = math.sin(dlat/2)**2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon/2)**2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
def _bounds_over_circle(self, bounds, clat, clon, radius):
# Simplified check — true if circle center within bounds or near
return (bounds[0] <= clat + radius/111 <= bounds[1] or
bounds[2] <= clon + radius/111 <= bounds[3])
root = QuadTreeNode(-90, 90, -180, 180)
for i in range(50):
root.insert(f"d-{i}", random.uniform(28.5, 28.7), random.uniform(77.1, 77.3))
nearby = root.query_radius(28.61, 77.23, 2.0)
print(f"Quadtree found {len(nearby)} drivers within 2km")Pricing Engine (Surge Pricing)
Surge pricing adjusts fares based on supply and demand:
class SurgePricingEngine:
def __init__(self):
self.demand_history = []
self.supply_history = []
def calculate_surge(self, zone_id: str, current_demand: int, current_supply: int) -> float:
self.demand_history.append(current_demand)
self.supply_history.append(current_supply)
ratio = current_demand / max(current_supply, 1)
if ratio < 1.5:
return 1.0 # No surge
if ratio < 2.0:
return 1.5 # 1.5x
if ratio < 3.0:
return 2.0 # 2x
return 3.0 # 3x max
engine = SurgePricingEngine()
scenarios = [
("Downtown", 120, 100), # Balanced
("Airport", 200, 40), # High demand, low supply
("Suburbs", 30, 50), # Low demand
]
for zone, demand, supply in scenarios:
surge = engine.calculate_surge(zone, demand, supply)
print(f"{zone}: demand={demand}, supply={supply} → {surge}x surge")Output:
Downtown: demand=120, supply=100 → 1.0x surge
Airport: demand=200, supply=40 → 3.0x surge
Suburbs: demand=30, supply=50 → 1.0x surgeETA Calculation
ETA combines:
- Route distance: From maps/routing API (OSRM, Google Maps)
- Historical speed: Per-road-segment averages at current time
- Real-time traffic: Live traffic data from other drivers
- Pickup time: Driver’s estimated arrival to rider location
def calculate_eta(driver_lat: float, driver_lon: float,
rider_lat: float, rider_lon: float) -> dict:
import math
# 1. Straight-line distance (as approximation)
R = 6371
dlat = math.radians(rider_lat - driver_lat)
dlon = math.radians(rider_lon - driver_lon)
a = math.sin(dlat/2)**2 + math.cos(math.radians(driver_lat)) * math.cos(math.radians(rider_lat)) * math.sin(dlon/2)**2
distance_km = R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))
# 2. Apply road network multiplier (typically 1.3-1.5x for city)
road_distance = distance_km * 1.4
# 3. Average speed (city: ~25 km/h)
avg_speed_kmh = 25
eta_minutes = (road_distance / avg_speed_kmh) * 60
return {
"distance_km": round(road_distance, 1),
"eta_minutes": round(eta_minutes, 0),
"speed_assumed": avg_speed_kmh
}
eta = calculate_eta(28.60, 77.22, 28.63, 77.25)
print(f"ETA: {eta['eta_minutes']} min, Distance: {eta['distance_km']} km")Trip History Storage
Trips are write-once, read-frequently for history:
| Data | Storage | Retention |
|---|---|---|
| Active trip | Redis (in-memory) | Duration of trip |
| Completed trip | PostgreSQL + S3 | 7 years (regulatory) |
| GPS trail | S3 (Parquet) | 90 days |
| Analytics | ClickHouse | 2 years |
Common Errors
Missing geospatial indexing: Linear search across all drivers doesn’t scale past a few thousand. Always use geohashing, quadtree, or S2 geometry.
WebSocket reconnection storms: When a cell tower goes down, hundreds of drivers reconnect simultaneously. Use exponential backoff and connection coalescing.
Surge pricing without smoothing: Instant surge updates at 3x shock users. Smooth price changes over 2-5 minutes. Show surge zones on the map, not per-request.
Ignoring ETA variance: Showing “5 min” when actual is 5-15 min destroys trust. Show range: “5-8 min” or add uncertainty indicator.
No trip deduplication: A rider tapping “Request” twice creates duplicate trips. Use idempotency keys on the client side with server-side dedup.
Mini Project
Build a ride-matching simulation:
import random, math, time
class RideMatchingSimulation:
def __init__(self):
self.drivers = {}
self.rides = []
def add_driver(self, driver_id: str, lat: float, lon: float):
self.drivers[driver_id] = {"lat": lat, "lon": lon, "available": True}
def request_ride(self, rider_id: str, lat: float, lon: float):
best_driver = None
best_dist = float("inf")
for d_id, d in self.drivers.items():
if not d["available"]:
continue
dist = math.sqrt((d["lat"] - lat)**2 + (d["lon"] - lon)**2) * 111
if dist < best_dist:
best_dist = dist
best_driver = d_id
if best_driver:
self.drivers[best_driver]["available"] = False
ride = {"rider": rider_id, "driver": best_driver, "distance_km": round(best_dist, 1)}
self.rides.append(ride)
return ride
return {"error": "No drivers available"}
sim = RideMatchingSimulation()
for i in range(5):
sim.add_driver(f"d{i}", 28.61 + random.uniform(-0.03, 0.03), 77.23 + random.uniform(-0.03, 0.03))
for i in range(3):
result = sim.request_ride(f"r{i}", 28.61 + random.uniform(-0.02, 0.02), 77.23 + random.uniform(-0.02, 0.02))
print(f"Ride {i+1}: {result}")Cross-References
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro