Microservices Design Patterns — Service Discovery, Circuit Breaker, Saga, CQRS Explained
Microservices design patterns are reusable solutions to common challenges in distributed systems — service discovery, fault tolerance, data consistency, and inter-service communication — that emerge when breaking a monolith into independently deployable services.
Why Microservices Patterns Matter
A monolith is simple to build but hard to scale. As your team and codebase grow, deployment coordination becomes a nightmare, a bug in any module can crash everything, and scaling requires replicating the entire application. Microservices solve these problems but introduce new ones: how do services find each other? How do you handle a failing service? How do you maintain data consistency across services? These patterns — proven at companies like Netflix, Amazon, and Uber — provide battle-tested answers.
graph TD
Client --> GW[API Gateway]
GW --> Auth[Auth Service]
GW --> Users[User Service]
GW --> Orders[Order Service]
GW --> Inventory[Inventory Service]
GW --> Payments[Payment Service]
Auth --> SD[Service Discovery]
Users --> SD
Orders --> SD
Inventory --> SD
Payments --> SD
Orders --> CB[Circuit Breaker]
CB --> Payments
Orders --> Saga[Saga Orchestrator]
Saga --> Payments
Saga --> Inventory
style GW fill:#e67e22,color:#fff
style SD fill:#9b59b6,color:#fff
style CB fill:#e74c3c,color:#fff
style Saga fill:#3498db,color:#fff
API Gateway Pattern
A single entry point for all client requests. The gateway routes to appropriate services, handles authentication, rate limiting, and response aggregation.
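Rate limiting, one of the gateway duties listed above, is commonly implemented with a token bucket. The following is a minimal sketch, not a production limiter: the `TokenBucket` and `RateLimiter` classes, their parameters, and the injectable `clock` argument are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock              # injectable so the sketch can be tested without sleeping
        self.tokens = float(capacity)   # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (for example an API key or client IP)."""

    def __init__(self, capacity: int = 5, rate: float = 1.0, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.buckets = {}

    def allow(self, client_id: str) -> bool:
        if client_id not in self.buckets:
            self.buckets[client_id] = TokenBucket(self.capacity, self.rate, self.clock)
        return self.buckets[client_id].allow()


limiter = RateLimiter(capacity=2, rate=1.0)
print([limiter.allow("client-a") for _ in range(3)])  # [True, True, False]
```

In a gateway, a check like this would typically run before routing, with the handler answering HTTP 429 when `allow()` returns false.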
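# Simple API gateway using FastAPI
from fastapi import FastAPI, Request, Response
from fastapi.responses import JSONResponse
import httpx

app = FastAPI()

SERVICES = {
    "users": "http://user-service:8001",
    "orders": "http://order-service:8002",
    "inventory": "http://inventory-service:8003",
}

@app.api_route("/{service}/{path:path}", methods=["GET", "POST", "PUT", "DELETE"])
async def gateway(service: str, path: str, request: Request):
    if service not in SERVICES:
        # FastAPI does not treat a (body, status) tuple as a status code; return a response object
        return JSONResponse({"error": "Service not found"}, status_code=404)
    backend_url = f"{SERVICES[service]}/{path}"
    # Drop Host and Content-Length so httpx sets values that match the backend request
    headers = {k: v for k, v in request.headers.items()
               if k.lower() not in ("host", "content-length")}
    async with httpx.AsyncClient() as client:
        resp = await client.request(
            method=request.method,
            url=backend_url,
            headers=headers,
            params=dict(request.query_params),
            content=await request.body(),  # forward the body for POST and PUT
        )
    # Relay the backend's body, status code, and content type
    return Response(content=resp.content, status_code=resp.status_code,
                    media_type=resp.headers.get("content-type"))
Service Discovery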
Services register themselves with a registry (Consul, etcd, ZooKeeper) so other services can find them without hardcoded addresses.
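To make the mechanism concrete before turning to a real registry, here is a minimal in-memory sketch of what such a registry does: instances register, refresh themselves with heartbeats, and expire when the heartbeats stop. The `ServiceRegistry` class, its `ttl` parameter, and the round-robin `resolve` method are hypothetical and only illustrate the idea; they are not the API of Consul, etcd, or ZooKeeper.

```python
import itertools
import time


class ServiceRegistry:
    def __init__(self, ttl: float = 30.0, clock=time.monotonic):
        self.ttl = ttl            # seconds an instance stays valid without a heartbeat
        self.clock = clock        # injectable so expiry can be tested without sleeping
        self._instances = {}      # service name -> {(host, port): last heartbeat time}
        self._counters = {}       # service name -> round-robin counter

    def register(self, name: str, host: str, port: int) -> None:
        self._instances.setdefault(name, {})[(host, port)] = self.clock()

    heartbeat = register          # a heartbeat simply refreshes the timestamp

    def deregister(self, name: str, host: str, port: int) -> None:
        self._instances.get(name, {}).pop((host, port), None)

    def healthy(self, name: str) -> list:
        """Return instances whose last heartbeat is within the TTL."""
        now = self.clock()
        return sorted(addr for addr, seen in self._instances.get(name, {}).items()
                      if now - seen <= self.ttl)

    def resolve(self, name: str) -> tuple:
        """Pick one healthy instance, rotating through them on each call."""
        candidates = self.healthy(name)
        if not candidates:
            raise LookupError(f"no healthy instance of {name!r}")
        counter = self._counters.setdefault(name, itertools.count())
        return candidates[next(counter) % len(candidates)]


registry = ServiceRegistry(ttl=30)
registry.register("payment-service", "10.0.1.50", 8004)
registry.register("payment-service", "10.0.1.51", 8004)
print(registry.resolve("payment-service"))  # ('10.0.1.50', 8004)
print(registry.resolve("payment-service"))  # ('10.0.1.51', 8004)
```

A real registry adds replication, active health checks, and change notifications on top of this basic bookkeeping.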
# Service registration with Consul
import consul

c = consul.Consul(host='consul.example.com')

# Register this service with an HTTP health check polled every 10 seconds
c.agent.service.register(
    name='order-service',
    service_id='order-service-v1-instance-3',
    address='10.0.1.42',
    port=8002,
    check=consul.Check.http('http://10.0.1.42:8002/health', interval='10s'),
)

# Discover healthy instances of another service across the cluster.
# (agent.services() would list only services registered with the local agent.)
_, nodes = c.health.service('payment-service', passing=True)
for node in nodes:
    svc = node['Service']
    print(f"Payment service at {svc['Address']}:{svc['Port']}")
Circuit Breaker Pattern
When a downstream service fails, the circuit breaker trips and subsequent calls fail fast instead of waiting for timeouts. After a cooldown, a half-open state probes for recovery.
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Failing: reject fast
    HALF_OPEN = "half_open"  # Testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.state = CircuitState.CLOSED
        self.last_failure_time = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # After the cooldown, let one trial call through
            if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN: request rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
            raise  # Re-raise with the original traceback
        # Any success closes the circuit and resets the consecutive-failure count
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        return result

breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=5)

def unreliable_service():
    raise Exception("Service unavailable")

for i in range(6):
    try:
        breaker.call(unreliable_service)
    except Exception as e:
        print(f"Attempt {i+1}: {e}")
Expected output:
Attempt 1: Service unavailable
Attempt 2: Service unavailable
Attempt 3: Service unavailable
Attempt 4: Circuit breaker is OPEN: request rejected
Attempt 5: Circuit breaker is OPEN: request rejected
Attempt 6: Circuit breaker is OPEN: request rejected
Saga Pattern
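A saga is a sequence of local transactions, one per service. In a choreographed saga, each step publishes an event that triggers the next; in an orchestrated saga, a central coordinator invokes each step in turn, which is the style the OrderSaga class in this section uses. If a step fails, the saga runs compensating transactions to undo the steps that already completed, in reverse order.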
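The event-triggered flow described above is often called choreography. Here is a minimal sketch of it, assuming a hypothetical in-process `EventBus` in place of a real message broker and an invented `PAYMENT_LIMIT` rule whose only purpose is to force the failure path. The event names are also illustrative.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for a message broker (hypothetical, for illustration only)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []                      # every event type published, in order

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        self.log.append(event_type)
        for handler in self.handlers[event_type]:
            handler(payload)


bus = EventBus()
PAYMENT_LIMIT = 100.0  # invented rule so a declined payment can be shown


def reserve_inventory(order):
    bus.publish("inventory_reserved", order)


def charge_payment(order):
    if order["amount"] > PAYMENT_LIMIT:
        bus.publish("payment_failed", order)
    else:
        bus.publish("payment_completed", order)


def release_inventory(order):
    # Compensating transaction for reserve_inventory
    bus.publish("inventory_released", order)


def confirm_order(order):
    bus.publish("order_confirmed", order)


# Each service reacts to the previous service's event; no coordinator is involved
bus.subscribe("order_created", reserve_inventory)
bus.subscribe("inventory_reserved", charge_payment)
bus.subscribe("payment_completed", confirm_order)
bus.subscribe("payment_failed", release_inventory)

bus.publish("order_created", {"order_id": "ORD-1", "amount": 250.0})
print(bus.log)
# ['order_created', 'inventory_reserved', 'payment_failed', 'inventory_released']
```

Choreography avoids a central coordinator, at the cost of a workflow that is spread across subscribers and harder to trace.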
class OrderSaga:
    def create_order(self, order_id: str, amount: float):
        # Each step: (name, action, compensating action)
        steps = [
            ("reserve_inventory", self.reserve_inventory, self.release_inventory),
            ("process_payment", self.process_payment, self.refund_payment),
            ("confirm_order", self.confirm_order, self.cancel_order),
        ]
        completed = []
        try:
            for name, action, _ in steps:
                print(f"Executing: {name}")
                action(order_id, amount)
                completed.append(name)
            print("Order completed successfully")
        except Exception as e:
            print(f"Failed at {name}: {e}. Rolling back...")
            # Undo only the steps that finished, most recent first
            for step_name, _, compensate in reversed(steps[:len(completed)]):
                print(f"Compensating: {step_name}")
                compensate(order_id, amount)

    def reserve_inventory(self, oid, amt): pass
    def release_inventory(self, oid, amt): pass
    def process_payment(self, oid, amt): raise Exception("Payment declined")
    def refund_payment(self, oid, amt): pass
    def confirm_order(self, oid, amt): pass
    def cancel_order(self, oid, amt): pass

saga = OrderSaga()
saga.create_order("ORD-123", 99.99)
Expected output:
Executing: reserve_inventory
Executing: process_payment
Failed at process_payment: Payment declined. Rolling back...
Compensating: reserve_inventory
CQRS and Event Sourcing
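CQRS (Command Query Responsibility Segregation) separates the write model, which handles commands, from the read model, which answers queries. The two can run as separate services but do not have to. Event sourcing stores every state change as an event in an append-only log, and current state is derived by replaying that log.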
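To see how the append-only log and a read model fit together, here is a minimal in-memory sketch. The `EventStore` and `OrderSummaryProjection` classes and the shape of the summary rows are hypothetical; a real system would use a durable log and update the read model asynchronously.

```python
from collections import defaultdict


class EventStore:
    """Append-only event log that notifies subscribers of each new event."""

    def __init__(self):
        self.events = []
        self.subscribers = []

    def append(self, event: dict) -> None:
        self.events.append(event)
        for notify in self.subscribers:
            notify(event)


class OrderSummaryProjection:
    """Read model: per-user order and item counts, kept up to date from events."""

    def __init__(self):
        self.summary = defaultdict(lambda: {"orders": 0, "items": 0})

    def apply(self, event: dict) -> None:
        if event["type"] == "order_placed":
            row = self.summary[event["user_id"]]
            row["orders"] += 1
            row["items"] += len(event["items"])


store = EventStore()
view = OrderSummaryProjection()
store.subscribers.append(view.apply)

# Write side: commands only append events
store.append({"type": "order_placed", "user_id": "u1", "items": ["book", "pen"]})
store.append({"type": "order_placed", "user_id": "u1", "items": ["lamp"]})

# Read side: queries hit the projection, never the log
print(view.summary["u1"])  # {'orders': 2, 'items': 3}

# Because the log is the source of truth, a read model can be rebuilt by replay
rebuilt = OrderSummaryProjection()
for event in store.events:
    rebuilt.apply(event)
print(rebuilt.summary == view.summary)  # True
```

Replaying the log is also how a new read model with a different shape can be introduced later without touching the write side.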
# Command side (write)
class OrderCommandService:
    def __init__(self, event_store):
        self.event_store = event_store  # any object with an append() method

    def place_order(self, user_id: str, items: list):
        event = {"type": "order_placed", "user_id": user_id, "items": items}
        self.event_store.append(event)  # Append to event log
        return event

# Query side (read, optimized for reads)
class OrderQueryService:
    def __init__(self, read_db):
        self.read_db = read_db  # database client exposing query(); its API is assumed here

    def get_order_summary(self, user_id: str):
        # Read from a materialized view, not the event store
        return self.read_db.query("SELECT * FROM order_summary WHERE user_id = ?", user_id)
When Microservices vs Monolith
Start with a monolith. Don’t add microservices complexity until you need it. Consider microservices when:
- Your team grows beyond 10-15 developers
- Deployment coordination takes days
- Different parts of the system need different scaling
- You need to use different tech stacks for different subsystems
Common Mistakes
Distributed monolith: Services that are tightly coupled and can’t be deployed independently. No real benefit over a monolith.
Shared database: Multiple services sharing the same database creates coupling. Each service should own its data.
Over-engineering: Starting with microservices for a 3-developer team. The operational overhead (monitoring, deployment, networking) dwarfs any benefit.
Synchronous chains: Service A calls B, B calls C, C calls D. A failure anywhere cascades. Use async communication and circuit breakers.
Ignoring data consistency: Without sagas or a comparable mechanism, an operation that spans several services and fails partway leaves data in an inconsistent state.
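One way to break the synchronous chain described in the list above is to hand work to a queue, so the caller does not block on the downstream service. The sketch below uses `asyncio.Queue` as an in-process stand-in for a message broker; the service names and the event shape are assumptions made for illustration.

```python
import asyncio


async def order_service(queue: asyncio.Queue, order_ids: list) -> None:
    """Producer: publishes an event and moves on without waiting for consumers."""
    for order_id in order_ids:
        await queue.put({"type": "order_placed", "order_id": order_id})


async def notification_service(queue: asyncio.Queue, handled: list) -> None:
    """Consumer: processes events at its own pace."""
    while True:
        event = await queue.get()
        handled.append(event["order_id"])
        queue.task_done()


async def main() -> list:
    queue = asyncio.Queue()
    handled = []
    consumer = asyncio.create_task(notification_service(queue, handled))
    await order_service(queue, ["ORD-1", "ORD-2", "ORD-3"])
    await queue.join()   # wait until every queued event has been processed
    consumer.cancel()    # stop the consumer loop
    return handled


print(asyncio.run(main()))  # ['ORD-1', 'ORD-2', 'ORD-3']
```

With a real broker the producer and consumer run in separate processes, so a slow or failed consumer delays its own work instead of failing the caller.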
Practice Questions
What problem does the API gateway solve? Single entry point for authentication, routing, rate limiting, and response aggregation. Clients don’t need to know about individual services.
How does a circuit breaker differ from a retry? A retry assumes the failure is transient. A circuit breaker assumes the failure will persist and fails fast to prevent cascading.
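When should you use the saga pattern? When a business transaction spans multiple services and must either complete or be undone through compensating transactions, without a distributed (two-phase commit) transaction. A saga provides eventual consistency, not full ACID guarantees: intermediate states are visible to other requests while it runs.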
What is CQRS and when is it useful? Separating read and write models. Useful when reads and writes have very different patterns (e.g., writes are normalized, reads are denormalized for fast queries).
Why should you start with a monolith? Microservices introduce operational complexity (networking, monitoring, deployments) that adds cost without benefit until you genuinely need independent scaling and deployment.
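The retry half of the circuit breaker comparison above can be sketched as follows, using exponential backoff with jitter. The `retry` helper, its parameters, and the `flaky` function are hypothetical, and the injectable `sleep` argument exists only so the example runs without real delays.

```python
import random
import time


def retry(func, attempts: int = 3, base_delay: float = 0.1, sleep=time.sleep):
    """Call func, retrying on any exception with exponential backoff and full jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait a random time up to base_delay * 2^(attempt - 1)
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


calls = {"n": 0}


def flaky():
    """Fails twice, then succeeds: a transient fault that retries can absorb."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary glitch")
    return "ok"


print(retry(flaky, sleep=lambda seconds: None))  # ok
```

Retries suit faults that clear within a few attempts. Against a service that stays down they only add load, which is the case a circuit breaker is meant to handle.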
Mini Project
Build a circuit breaker wrapper for API calls:
import functools
import random
import time

class CircuitBreaker:
    def __init__(self, fail_max: int = 3, reset_timeout: float = 10):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.last_fail = 0.0
        self.state = "closed"

    def __call__(self, func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            if self.state == "open":
                if time.time() - self.last_fail > self.reset_timeout:
                    self.state = "half-open"  # cooldown elapsed: allow a trial call
                else:
                    raise Exception("Circuit breaker open")
            try:
                result = func(*args, **kwargs)
            except Exception:  # a bare except would also trap KeyboardInterrupt
                self.failures += 1
                self.last_fail = time.time()
                if self.failures >= self.fail_max:
                    self.state = "open"
                raise
            # Any success closes the circuit and resets the consecutive-failure count
            self.state = "closed"
            self.failures = 0
            return result
        return wrapper

cb = CircuitBreaker()

@cb
def fragile_api():
    if random.random() < 0.7:
        raise Exception("API error")
    return "Success"

for i in range(10):
    try:
        result = fragile_api()
        print(f"Call {i+1}: {result}")
    except Exception as e:
        print(f"Call {i+1}: {e}")
    time.sleep(0.5)
Cross-References
- System Design Overview
- Message Queues
- Event-Driven Architecture
- API Design
- Load Balancing