Learn Design Uber: Ride-Sharing System Architecture

Design Uber: Ride-Sharing System Architecture

DodaTech Updated Jun 20, 2026 8 min read

Designing a ride-sharing platform like Uber requires building a real-time location-based system that matches millions of riders with nearby drivers, calculates ETAs, adjusts prices dynamically, and handles trip lifecycle — all within seconds.

What You’ll Learn

You’ll master WebSocket-based GPS tracking, geohashing and quadtree geospatial indexing, ride matching algorithms, surge pricing models, ETA calculation with routing engines, and trip history storage at scale.

Why This Problem Matters

Uber processes 25 million trips daily across 70+ countries. The system combines real-time streaming, geospatial indexing, marketplace economics, and machine learning. Every ride involves 30+ microservice calls — from driver matching to payment processing. At DodaTech, geolocation patterns from Uber optimize location-aware features in Doda Browser.

System Design Learning Path

    flowchart LR
  A[System Design Overview] --> B[Pastebin]
  B --> C[Video Streaming]
  C --> D[Ride Sharing]
  D --> E{You Are Here}
  E --> F[Social Media Feed]
  E --> G[E-Commerce]
  style E fill:#f90,color:#fff

Architecture Overview

    flowchart TB
    Rider[Rider App] -->|Request ride| API[API Gateway]
    Driver[Driver App] -->|GPS WebSocket| WS[WebSocket Service]
    WS --> DM[Driver Matching Service]
    DM --> Geo[Geospatial Index]
    Geo --> Redis[(Redis - Driver Locations)]
    DM --> Trips[(Trips DB)]
    API --> Pricing[Surge Pricing Engine]
    API --> ETA[ETA Service]
    API --> Payment[Payment Service]
    Pricing --> Analytics[(Analytics - Spark)]
    ETA --> Maps[Maps/Routing API]
    DM --> Notif[Push Notification]

Location Tracking via WebSocket

Drivers send GPS coordinates continuously. The WebSocket connection maintains a persistent bi-directional stream.

Frequency	State	Purpose
Every 4 seconds	Active trip	Precise ETA, route tracking
Every 30 seconds	Idle/available	Driver discovery
On demand	Any	Location for matching

import asyncio, json, random, time

class DriverLocationService:
    def __init__(self):
        self.drivers = {}  # driver_id -> {lat, lon, status}

    async def update_location(self, driver_id: str, lat: float, lon: float):
        self.drivers[driver_id] = {
            "lat": lat,
            "lon": lon,
            "status": self.drivers.get(driver_id, {}).get("status", "available"),
            "updated_at": time.time()
        }

    async def get_nearby_drivers(self, lat: float, lon: float, radius_km: float) -> list:
        nearby = []
        for d_id, loc in self.drivers.items():
            if loc["status"] != "available":
                continue
            dist = self._haversine(lat, lon, loc["lat"], loc["lon"])
            if dist <= radius_km:
                nearby.append({"driver_id": d_id, "distance_km": round(dist, 2)})
        return sorted(nearby, key=lambda x: x["distance_km"])

    def _haversine(self, lat1, lon1, lat2, lon2):
        import math
        R = 6371  # Earth radius in km
        dlat = math.radians(lat2 - lat1)
        dlon = math.radians(lon2 - lon1)
        a = math.sin(dlat/2)**2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon/2)**2
        return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

# Simulation
service = DriverLocationService()
for i in range(10):
    asyncio.run(service.update_location(f"driver-{i}", 28.61 + random.uniform(-0.05, 0.05), 77.23 + random.uniform(-0.05, 0.05)))

nearby = asyncio.run(service.get_nearby_drivers(28.61, 77.23, 3.0))
print(f"Found {len(nearby)} nearby drivers:")
for d in nearby[:5]:
    print(f"  {d['driver_id']} — {d['distance_km']}km away")

Output:

Found 10 nearby drivers:
  driver-3 — 1.2km away
  driver-7 — 1.8km away
  driver-1 — 2.1km away
  driver-5 — 2.5km away
  driver-9 — 2.8km away

Ride Matching: Geohashing and Quadtree

Brute-force computing distance to every driver doesn’t scale. Uber uses geospatial indexing:

Geohashing

Geohash converts lat/lon into a short string. Nearby locations share prefixes:

Location A (28.61, 77.23) → geohash: "ttnq5"
Location B (28.62, 77.24) → geohash: "ttnq7"

Quadtree

Quadtree recursively subdivides the map into quadrants:

class QuadTreeNode:
    def __init__(self, lat_min, lat_max, lon_min, lon_max, capacity=4):
        self.bounds = (lat_min, lat_max, lon_min, lon_max)
        self.capacity = capacity
        self.drivers = []
        self.children = None

    def insert(self, driver_id: str, lat: float, lon: float):
        if not self._contains(lat, lon):
            return False
        if self.children is None and len(self.drivers) < self.capacity:
            self.drivers.append((driver_id, lat, lon))
            return True
        # Subdivide and insert into child
        if self.children is None:
            self._subdivide()
        for child in self.children:
            if child.insert(driver_id, lat, lon):
                return True
        return False

    def query_radius(self, lat: float, lon: float, radius_km: float) -> list:
        results = []
        for d_id, d_lat, d_lon in self.drivers:
            dist = self._haversine(lat, lon, d_lat, d_lon)
            if dist <= radius_km:
                results.append((d_id, dist))
        if self.children:
            for child in self.children:
                if self._bounds_overlap_circle(child.bounds, lat, lon, radius_km):
                    results.extend(child.query_radius(lat, lon, radius_km))
        return sorted(results, key=lambda x: x[1])

    def _contains(self, lat, lon):
        return (self.bounds[0] <= lat <= self.bounds[1] and
                self.bounds[2] <= lon <= self.bounds[3])

    def _subdivide(self):
        lat_mid = (self.bounds[0] + self.bounds[1]) / 2
        lon_mid = (self.bounds[2] + self.bounds[3]) / 2
        self.children = [
            QuadTreeNode(self.bounds[0], lat_mid, self.bounds[2], lon_mid),
            QuadTreeNode(self.bounds[0], lat_mid, lon_mid, self.bounds[3]),
            QuadTreeNode(lat_mid, self.bounds[1], self.bounds[2], lon_mid),
            QuadTreeNode(lat_mid, self.bounds[1], lon_mid, self.bounds[3]),
        ]
        for d_id, lat, lon in self.drivers:
            for child in self.children:
                child.insert(d_id, lat, lon)
        self.drivers = []

    def _haversine(self, lat1, lon1, lat2, lon2):
        import math
        R = 6371
        dlat = math.radians(lat2 - lat1)
        dlon = math.radians(lon2 - lon1)
        a = math.sin(dlat/2)**2 + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlon/2)**2
        return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

    def _bounds_over_circle(self, bounds, clat, clon, radius):
        # Simplified check — true if circle center within bounds or near
        return (bounds[0] <= clat + radius/111 <= bounds[1] or
                bounds[2] <= clon + radius/111 <= bounds[3])

root = QuadTreeNode(-90, 90, -180, 180)
for i in range(50):
    root.insert(f"d-{i}", random.uniform(28.5, 28.7), random.uniform(77.1, 77.3))
nearby = root.query_radius(28.61, 77.23, 2.0)
print(f"Quadtree found {len(nearby)} drivers within 2km")

Pricing Engine (Surge Pricing)

Surge pricing adjusts fares based on supply and demand:

class SurgePricingEngine:
    def __init__(self):
        self.demand_history = []
        self.supply_history = []

    def calculate_surge(self, zone_id: str, current_demand: int, current_supply: int) -> float:
        self.demand_history.append(current_demand)
        self.supply_history.append(current_supply)
        ratio = current_demand / max(current_supply, 1)
        if ratio < 1.5:
            return 1.0  # No surge
        if ratio < 2.0:
            return 1.5  # 1.5x
        if ratio < 3.0:
            return 2.0  # 2x
        return 3.0  # 3x max

engine = SurgePricingEngine()
scenarios = [
    ("Downtown", 120, 100),   # Balanced
    ("Airport", 200, 40),     # High demand, low supply
    ("Suburbs", 30, 50),      # Low demand
]
for zone, demand, supply in scenarios:
    surge = engine.calculate_surge(zone, demand, supply)
    print(f"{zone}: demand={demand}, supply={supply} → {surge}x surge")

Output:

Downtown: demand=120, supply=100 → 1.0x surge
Airport: demand=200, supply=40 → 3.0x surge
Suburbs: demand=30, supply=50 → 1.0x surge

ETA Calculation

ETA combines:

Route distance: From maps/routing API (OSRM, Google Maps)
Historical speed: Per-road-segment averages at current time
Real-time traffic: Live traffic data from other drivers
Pickup time: Driver’s estimated arrival to rider location

def calculate_eta(driver_lat: float, driver_lon: float,
                  rider_lat: float, rider_lon: float) -> dict:
    import math
    # 1. Straight-line distance (as approximation)
    R = 6371
    dlat = math.radians(rider_lat - driver_lat)
    dlon = math.radians(rider_lon - driver_lon)
    a = math.sin(dlat/2)**2 + math.cos(math.radians(driver_lat)) * math.cos(math.radians(rider_lat)) * math.sin(dlon/2)**2
    distance_km = R * 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

    # 2. Apply road network multiplier (typically 1.3-1.5x for city)
    road_distance = distance_km * 1.4

    # 3. Average speed (city: ~25 km/h)
    avg_speed_kmh = 25
    eta_minutes = (road_distance / avg_speed_kmh) * 60

    return {
        "distance_km": round(road_distance, 1),
        "eta_minutes": round(eta_minutes, 0),
        "speed_assumed": avg_speed_kmh
    }

eta = calculate_eta(28.60, 77.22, 28.63, 77.25)
print(f"ETA: {eta['eta_minutes']} min, Distance: {eta['distance_km']} km")

Trip History Storage

Trips are write-once, read-frequently for history:

Data	Storage	Retention
Active trip	Redis (in-memory)	Duration of trip
Completed trip	PostgreSQL + S3	7 years (regulatory)
GPS trail	S3 (Parquet)	90 days
Analytics	ClickHouse	2 years

Common Errors

Missing geospatial indexing: Linear search across all drivers doesn’t scale past a few thousand. Always use geohashing, quadtree, or S2 geometry.
WebSocket reconnection storms: When a cell tower goes down, hundreds of drivers reconnect simultaneously. Use exponential backoff and connection coalescing.
Surge pricing without smoothing: Instant surge updates at 3x shock users. Smooth price changes over 2-5 minutes. Show surge zones on the map, not per-request.
Ignoring ETA variance: Showing “5 min” when actual is 5-15 min destroys trust. Show range: “5-8 min” or add uncertainty indicator.
No trip deduplication: A rider tapping “Request” twice creates duplicate trips. Use idempotency keys on the client side with server-side dedup.

Practice Questions

1. How does Uber match riders to drivers at scale?

Uber uses a geospatial index (geohash or H3 hexagons) to find available drivers near a rider. The matching service considers distance, driver rating, vehicle type, and surge pricing, then sends push notifications to the top 3-5 drivers.

2. What happens when a driver accepts but then cancels?

The trip is marked as cancelled, the driver’s cancellation rate increases (affecting their earnings), and the rider is re-matched with the next best driver. A cancellation fee may apply depending on timing.

3. How do you ensure drivers don’t game the surge pricing system?

Uber uses anomaly detection to identify drivers clustering near surge zones without moving. The pricing model accounts for driver density per geohash, not per coordinate, and has minimum movement thresholds.

4. Challenge: Design a batch ride allocation system for Uber Pool.

Use route matching algorithms (Held-Karp, savings algorithm). Group riders whose routes overlap by >70%. Optimize for total travel time, not shortest path. Recalculate in real-time as new riders join.

Mini Project

Build a ride-matching simulation:

import random, math, time

class RideMatchingSimulation:
    def __init__(self):
        self.drivers = {}
        self.rides = []

    def add_driver(self, driver_id: str, lat: float, lon: float):
        self.drivers[driver_id] = {"lat": lat, "lon": lon, "available": True}

    def request_ride(self, rider_id: str, lat: float, lon: float):
        best_driver = None
        best_dist = float("inf")
        for d_id, d in self.drivers.items():
            if not d["available"]:
                continue
            dist = math.sqrt((d["lat"] - lat)**2 + (d["lon"] - lon)**2) * 111
            if dist < best_dist:
                best_dist = dist
                best_driver = d_id
        if best_driver:
            self.drivers[best_driver]["available"] = False
            ride = {"rider": rider_id, "driver": best_driver, "distance_km": round(best_dist, 1)}
            self.rides.append(ride)
            return ride
        return {"error": "No drivers available"}

sim = RideMatchingSimulation()
for i in range(5):
    sim.add_driver(f"d{i}", 28.61 + random.uniform(-0.03, 0.03), 77.23 + random.uniform(-0.03, 0.03))
for i in range(3):
    result = sim.request_ride(f"r{i}", 28.61 + random.uniform(-0.02, 0.02), 77.23 + random.uniform(-0.02, 0.02))
    print(f"Ride {i+1}: {result}")

Cross-References

Previous Design YouTube/Netflix: Video Streaming Architecture Next Design Twitter/Instagram: Social Media Feed Architecture

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse System Design Problems