Load Balancing — Algorithms, Configurations, and Best Practices Explained

DodaTech Updated Jun 15, 2026 5 min read

A load balancer is a component that distributes incoming network traffic across multiple backend servers to ensure no single server bears too much demand, improving responsiveness and availability.

Why Load Balancing Matters

Without a load balancer, when a server goes down, all users connected to it lose access. When traffic spikes, a single server gets overwhelmed and slows to a crawl. Load balancers address both problems: they detect failed servers and reroute traffic, and they spread requests across servers so capacity grows roughly in proportion to the number of servers, as long as a shared dependency such as the database is not the bottleneck. Major e-commerce platforms run hundreds of load-balanced servers behind a single entry point.

Plain-Language Explanation

Imagine a bank with only one teller. When one customer takes a long time, everyone else waits. If that teller goes home sick, the bank closes. Now imagine the bank adds more tellers and a receptionist who sends each customer to the shortest line. That receptionist is your load balancer.

The load balancer sits between users and your servers. It receives every request, decides which server should handle it based on a scheduling algorithm, and forwards the request. If a server fails its health check, the load balancer stops sending traffic there until it recovers.


graph LR
    Users --> LB[Load Balancer]
    LB --> S1["Server A<br/>Healthy"]
    LB --> S2["Server B<br/>Healthy"]
    LB --> S3["Server C<br/>DEGRADED"]
    S3 -.->|Health Check Failed| LB
    Users2[Users] --> LB
    style LB fill:#e67e22,color:#fff
    style S1 fill:#27ae60,color:#fff
    style S2 fill:#27ae60,color:#fff
    style S3 fill:#e74c3c,color:#fff

Load Balancing Algorithms

Round Robin

Each server gets requests in a rotating sequence. Simple and works well when servers have equal capacity.

Server A → Request 1, 4, 7
Server B → Request 2, 5, 8
Server C → Request 3, 6, 9
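The rotation above can be reproduced in a few lines of Python. This is a minimal sketch of the idea, not how any particular load balancer implements it; the function name round_robin is made up for illustration.

```python
from itertools import cycle

def round_robin(servers):
    """Return an iterator that yields the servers in a fixed rotating order, forever."""
    return cycle(servers)

picker = round_robin(["Server A", "Server B", "Server C"])

# Assign requests 1..9 to servers in rotation.
assignments = [(request_id, next(picker)) for request_id in range(1, 10)]
for request_id, server in assignments:
    print(f"Request {request_id} -> {server}")
```

Requests 1, 4, and 7 land on Server A, matching the listing above.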

Least Connections

Sends each request to the server with the fewest active connections. This works better when requests vary in processing time: a server tied up with long-lived WebSocket connections receives fewer new requests until its connection count drops.
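A minimal sketch of the selection step, assuming the balancer keeps a per-server count of open connections. The dictionary contents and the function name are hypothetical.

```python
def pick_least_connections(active):
    """Return the server with the fewest active connections.

    Ties go to whichever server appears first in the mapping.
    """
    return min(active, key=active.get)

# Hypothetical snapshot of open connections per server.
active = {"Server A": 12, "Server B": 3, "Server C": 7}

chosen = pick_least_connections(active)
active[chosen] += 1  # the new request opens a connection on the chosen server
print(chosen, active[chosen])
```

A real balancer would also decrement the count when a connection closes; that bookkeeping is left out here.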

IP Hash

Hashes the client’s IP address to determine which server handles it. The same client reaches the same server as long as its IP address and the server pool stay unchanged, which is useful for session persistence (sticky sessions) without storing session state in a shared database.
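The mapping can be sketched as a hash of the address taken modulo the pool size. This is an illustration only: the use of SHA-256 and the function name are assumptions, and real load balancers use their own hash functions.

```python
import hashlib

def pick_by_ip_hash(client_ip, servers):
    """Map a client IP to one server using a stable hash of the address."""
    # hashlib gives the same digest on every run, unlike Python's built-in
    # hash() for strings, which is randomized per process.
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["Server A", "Server B", "Server C"]
first = pick_by_ip_hash("203.0.113.7", servers)
second = pick_by_ip_hash("203.0.113.7", servers)
print(first, second)  # the same server both times
```

Because the index depends on len(servers), adding or removing a server remaps many clients. Consistent hashing is the usual way to limit that, and it is outside the scope of this sketch.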

Weighted Variants

Assign weights to servers based on capacity. A server with weight 3 gets three times the traffic of a server with weight 1. Useful during migrations or when hardware is heterogeneous.
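One way to honor weights without sending bursts to the heaviest server is an approach often called smooth weighted round robin. The sketch below is a simplified illustration under that assumption; the server names and weights are hypothetical.

```python
from collections import Counter

def smooth_weighted_round_robin(weights, count):
    """Return `count` picks, distributed in proportion to the given weights."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every server gains its weight each round...
        for name, weight in weights.items():
            current[name] += weight
        # ...the server with the highest running score is chosen...
        chosen = max(current, key=current.get)
        # ...and pays back the total so the others can catch up.
        current[chosen] -= total
        picks.append(chosen)
    return picks

picks = smooth_weighted_round_robin({"big": 3, "small": 1}, 8)
print(picks)
print(Counter(picks))
```

With weights 3 and 1, three of every four picks go to the weight-3 server.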

NGINX Configuration Example

Here’s an example NGINX load balancer configuration. It listens on plain HTTP only, so add TLS before using it in production:

# /etc/nginx/nginx.conf (these blocks belong inside the http { } context)
upstream backend_servers {
    least_conn;
    server api-01.example.com weight=3 max_fails=3 fail_timeout=30s;
    server api-02.example.com weight=2 max_fails=3 fail_timeout=30s;
    server api-03.example.com backup;
}

server {
    listen 80;
    listen [::]:80;
    server_name api.example.com;

    location / {
        proxy_pass http://backend_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_next_upstream error timeout invalid_header http_500;
    }

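    # Liveness endpoint for the load balancer itself (it does not probe the backends)
    location /health {
        access_log off;
        # Set the type with default_type; add_header would emit a second Content-Type
        default_type text/plain;
        return 200 "healthy\n";
    }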
}

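Expected behavior: Traffic is distributed with weighted least-connections, so api-01 (weight 3) takes a larger share than api-02 (weight 2). api-03 is a backup and receives traffic only when both primary servers are unavailable. max_fails and fail_timeout are passive checks: if 3 attempts to reach a server fail within 30 seconds, NGINX stops sending it traffic for the next 30 seconds. proxy_next_upstream passes a failed request to the next server on connection errors, timeouts, invalid responses, and HTTP 500 responses. The /health location reports on the load balancer itself, not on the backends.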

Health Checks

Load balancers need to know which servers are alive and ready to accept traffic. Two common approaches:

Active health checks: The load balancer periodically sends requests (HTTP GET /health, TCP connect, ICMP ping) to each server. If the server doesn’t respond correctly within a timeout, it’s marked unhealthy.

Passive health checks: The load balancer monitors real traffic — if a server returns 5xx errors or times out repeatedly, it’s removed from rotation.
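The passive approach can be sketched as a failure counter with a time window. The class below is a simplified model, loosely following the max_fails and fail_timeout idea from the NGINX example; the class and method names are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
class PassiveHealth:
    """Track failures seen on real traffic for a single backend server."""

    def __init__(self, max_fails=3, fail_timeout=30.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = []     # timestamps of recent failures
        self.down_until = 0.0  # server is skipped until this time

    def record_failure(self, now):
        # Forget failures that fell out of the window, then add this one.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            # Too many failures in the window: take the server out for a while.
            self.down_until = now + self.fail_timeout
            self.failures.clear()

    def is_available(self, now):
        return now >= self.down_until

health = PassiveHealth(max_fails=3, fail_timeout=30.0)
for timestamp in (0.0, 5.0, 10.0):
    health.record_failure(timestamp)
print(health.is_available(11.0))  # inside the 30-second penalty
print(health.is_available(40.0))  # penalty has expired
```

In a running proxy the timestamps would come from a clock such as time.monotonic().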

A good pattern is a dedicated /health endpoint that checks not just the web server but also its dependencies:

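# health.py
from flask import Flask, jsonify
import psycopg2
import redis

app = Flask(__name__)

@app.route('/health')
def health():
    status = {"status": "healthy", "checks": {}}
    # Check Redis; short timeouts keep a hung dependency from stalling the probe.
    try:
        r = redis.Redis(socket_connect_timeout=1, socket_timeout=1)
        r.ping()
        status["checks"]["redis"] = "ok"
    except Exception as e:
        status["checks"]["redis"] = str(e)
        status["status"] = "degraded"
    # Check PostgreSQL by opening and closing a connection.
    try:
        conn = psycopg2.connect("dbname=test connect_timeout=2")
        conn.close()
        status["checks"]["database"] = "ok"
    except Exception as e:
        status["checks"]["database"] = str(e)
        status["status"] = "degraded"
    # Return 503 when any dependency fails so the load balancer stops routing here.
    http_code = 200 if status["status"] == "healthy" else 503
    return jsonify(status), http_code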
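On the load balancer side, an active check against such an endpoint can be sketched with the standard library. This is an illustration under assumptions: the function names, the backend URLs, and the 2-second timeout are hypothetical, and real load balancers run their probes on a schedule with their own thresholds.

```python
import http.client
import urllib.request

def http_probe(url, timeout=2.0):
    """Return True if GET `url` answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and non-2xx replies such as 503,
        # which urllib raises as HTTPError (a subclass of OSError).
        return False

def healthy_servers(servers, probe=http_probe):
    """Probe every server once and return the names that passed."""
    return [name for name, url in servers.items() if probe(url)]

# Hypothetical backends; each is assumed to expose GET /health.
BACKENDS = {
    "api-01": "http://localhost:8001/health",
    "api-02": "http://localhost:8002/health",
    "api-03": "http://localhost:8003/health",
}
```

Calling healthy_servers(BACKENDS) every few seconds and routing only to the returned names gives a basic active check. The probe argument is swappable so the selection logic can be tried without real servers.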

Common Mistakes

No health checks: The load balancer sends traffic to dead servers, causing random errors. Always configure health checks.
Sticky sessions without understanding tradeoffs: IP hash can cause uneven load if many users share the same NAT gateway (common in offices).
No client IP forwarding headers: Without X-Forwarded-For, the backend sees the load balancer’s IP instead of the real client IP, breaking logging and rate limiting.
Single load balancer: The load balancer itself is a single point of failure. Use a pair in active-passive configuration (e.g., keepalived with floating IP).
Not tuning timeouts: Default timeouts may be too short for slow database queries, causing the load balancer to mark healthy servers as failed.
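For the X-Forwarded-For point, the backend also has to decide when to believe the header, since any client can send one. A minimal sketch, assuming the load balancers live in a known subnet; TRUSTED_PROXIES and the function name are hypothetical.

```python
import ipaddress

# Hypothetical subnet where the load balancers run.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]

def is_trusted(address):
    return any(address in network for network in TRUSTED_PROXIES)

def client_ip(remote_addr, x_forwarded_for):
    """Return the client IP, honoring X-Forwarded-For only from trusted proxies."""
    if not x_forwarded_for or not is_trusted(ipaddress.ip_address(remote_addr)):
        # Direct connection, or a peer we do not trust: ignore the header.
        return remote_addr
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    # Each proxy appends to the right, so walk from the right and
    # return the first address that is not one of our own proxies.
    for hop in reversed(hops):
        try:
            address = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the rest of the header
        if not is_trusted(address):
            return hop
    return remote_addr

print(client_ip("10.0.0.5", "203.0.113.7"))
print(client_ip("198.51.100.9", "1.2.3.4"))
```

The first call returns the forwarded address because the peer is a trusted proxy; the second ignores the header because the request did not come through one.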

Practice Questions

What is the difference between round robin and least connections? Round robin distributes equally regardless of current load. Least connections sends requests to the server with the fewest active connections, which handles variable-length requests better.
How does a load balancer detect a failed server? Through health checks — either active (periodic probes) or passive (monitoring real traffic errors).
Why would you use weighted round robin? When backend servers have different capacities (CPU, RAM, network). A server with weight 5 receives roughly 5x the requests of a weight-1 server.
What is the role of the X-Forwarded-For header? It preserves the original client IP address when traffic passes through a proxy or load balancer, so the backend can log, rate-limit, or geo-route correctly.
How does Layer 4 differ from Layer 7 load balancing? Layer 4 (transport) routes based on IP and port — faster, less overhead. Layer 7 (application) can inspect HTTP headers, cookies, and paths — more intelligent routing but slightly slower.

Mini Project

Set up a local NGINX load balancer with three simple Python HTTP servers. Create a test script to verify distribution:

# server.py: run three instances, e.g. python server.py 8001 (then 8002 and 8003)
from http.server import HTTPServer, BaseHTTPRequestHandler
import sys

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Reply with the port so the client can tell which backend answered.
        body = f"Server {sys.argv[1]}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

port = int(sys.argv[1])
server = HTTPServer(('0.0.0.0', port), Handler)
print(f"Server running on {port}")
server.serve_forever()

# test_lb.py
import urllib.request

for i in range(10):
    # Each urlopen call opens a new connection, so NGINX picks a server per request.
    with urllib.request.urlopen("http://localhost/") as resp:
        print(f"Request {i+1}: {resp.read().decode()}")

With NGINX listening on port 80 and configured to round-robin across localhost:8001, localhost:8002, and localhost:8003, the output should cycle through the three servers in order:

Request 1: Server 8001
Request 2: Server 8002
Request 3: Server 8003
Request 4: Server 8001
...
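To check the distribution without reading the output by eye, the response bodies can be tallied. A small sketch, assuming responses look like the "Server 8001" strings above; the helper names are made up for illustration, and the sample list stands in for real responses.

```python
from collections import Counter

def tally(responses):
    """Count how many responses came from each backend."""
    return Counter(responses)

def is_balanced(counts, tolerance=1):
    """True if the busiest and quietest backends differ by at most `tolerance`."""
    return max(counts.values()) - min(counts.values()) <= tolerance

# Ten responses in the order round robin across three backends would produce them.
sample = [f"Server {8001 + i % 3}" for i in range(10)]
counts = tally(sample)
print(dict(counts))
print("balanced:", is_balanced(counts))
```

In test_lb.py the same helpers could be fed the decoded response bodies instead of the sample list.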

