Learn System: Microservices Communication — REST, gRPC, Events, and Service Mesh Patterns

DodaTech Updated Jun 20, 2026 7 min read

Microservices communication patterns define how independent services exchange data — synchronously via REST or gRPC, or asynchronously through events and message queues. Choosing the right pattern directly impacts system resilience, latency, scalability, and coupling. This guide covers the full spectrum from direct HTTP calls to service mesh sidecars.

Why Communication Patterns Matter

In a monolith, function calls are instant and reliable. In microservices, a call between services involves network I/O, serialization, potential failures, and latency. Netflix handles 2+ billion API edge requests daily through a sophisticated mix of synchronous and async communication. A badly chosen pattern creates a “distributed monolith” — services that are as coupled as a monolith but with network overhead. At DodaTech, microservices communication patterns orchestrate the backend for Doda Browser cloud sync and DodaZIP collaboration features.

Communication Architecture

    graph TB
    Client[Client] --> GW[API Gateway]
    GW -->|REST| US[User Service]
    GW -->|gRPC| OS[Order Service]
    GW -->|Async Events| Queue[Message Queue]
    Queue --> IS[Inventory Service]
    Queue --> PS[Payment Service]
    Queue --> NS[Notification Service]
    subgraph Mesh[Service Mesh]
        US ---|mTLS| SP1[Sidecar Proxy]
        OS ---|mTLS| SP2[Sidecar Proxy]
        SP1 <-->|mTLS| SP2
    end
    style GW fill:#e67e22,color:#fff
    style Queue fill:#3498db,color:#fff
    style Mesh fill:#9b59b6,color:#fff

Synchronous Protocols

REST over HTTP

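REST is the simplest approach: services expose HTTP endpoints and call each other with standard HTTP methods. Benefits: simple, well understood, and usable from any language. Drawbacks: the caller depends on the callee being available, each call blocks the request until the response arrives, and latency adds up across a chain of calls.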

# Order service calling payment service via REST
import asyncio

import httpx

async def process_order(order: dict) -> dict:
    # Explicit timeout so a slow downstream service cannot stall the order
    async with httpx.AsyncClient(timeout=5.0) as client:
        # Request/response call: the order waits for the payment reply
        payment_resp = await client.post(
            "http://payment-service:8001/charge",
            json={"order_id": order["id"], "amount": order["total"]}
        )
        payment_resp.raise_for_status()

        # Reserve stock only after the charge has succeeded
        inventory_resp = await client.post(
            "http://inventory-service:8002/reserve",
            json={"order_id": order["id"], "items": order["items"]}
        )
        inventory_resp.raise_for_status()

        return {"status": "confirmed", "payment_id": payment_resp.json()["id"]}

result = asyncio.run(process_order({"id": "ORD-123", "total": 99.99, "items": ["widget"]}))
print(f"Order result: {result}")

gRPC

gRPC is a high-performance RPC framework built on Protocol Buffers and HTTP/2. It supports unary calls, server and client streaming, bidirectional streaming, and strictly typed contracts. It is typically used for internal service-to-service communication where performance matters.

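// order.proto
syntax = "proto3";

service OrderService {
    rpc CreateOrder (CreateOrderRequest) returns (OrderResponse);
    rpc StreamOrderUpdates (OrderFilter) returns (stream OrderEvent);
}

// Fields match the Item constructed in the client example
message Item {
    string product_id = 1;
    int32 qty = 2;
    double price = 3;
}

message CreateOrderRequest {
    string user_id = 1;
    repeated Item items = 2;
    double total = 3;
}

message OrderResponse {
    string order_id = 1;
    string status = 2;
}

// OrderFilter and OrderEvent: illustrative fields (assumed, adjust to your domain)
message OrderFilter {
    string user_id = 1;
}

message OrderEvent {
    string order_id = 1;
    string status = 2;
}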

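# gRPC client (order_pb2 and order_pb2_grpc are generated from order.proto)
import grpc
import order_pb2
import order_pb2_grpc

# Plaintext channel; use grpc.secure_channel when nothing else encrypts the traffic
with grpc.insecure_channel('order-service:50051') as channel:
    stub = order_pb2_grpc.OrderServiceStub(channel)

    response = stub.CreateOrder(
        order_pb2.CreateOrderRequest(
            user_id="USR-42",
            items=[order_pb2.Item(product_id="widget", qty=2, price=9.99)],
            total=19.98
        ),
        timeout=5.0,  # deadline in seconds so the call cannot hang indefinitely
    )
    print(f"Order created: {response.order_id}, status: {response.status}")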
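
To make the serialization point concrete, the sketch below hand-encodes a simplified CreateOrderRequest (only user_id and total, no items) in the Protocol Buffers wire format and compares its size with compact JSON. The helper encode_request is hypothetical and exists only for illustration; real services should use the classes generated from order.proto.

```python
import json
import struct

def encode_request(user_id: str, total: float) -> bytes:
    """Hand-rolled protobuf encoding of user_id (field 1) and total (field 3).

    Illustration only: assumes user_id is shorter than 128 bytes so its
    length fits in a single varint byte.
    """
    uid = user_id.encode("utf-8")
    assert len(uid) < 128
    out = bytearray()
    # Field 1, wire type 2 (length-delimited): tag = (1 << 3) | 2
    out.append((1 << 3) | 2)
    out.append(len(uid))
    out += uid
    # Field 3, wire type 1 (64-bit): tag = (3 << 3) | 1, then a little-endian double
    out.append((3 << 3) | 1)
    out += struct.pack("<d", total)
    return bytes(out)

wire = encode_request("USR-42", 19.98)
as_json = json.dumps(
    {"user_id": "USR-42", "total": 19.98}, separators=(",", ":")
).encode("utf-8")

print(len(wire), "bytes as protobuf")
print(len(as_json), "bytes as JSON")
```

For this message the binary form is 17 bytes against 34 for JSON, mainly because field names are replaced by one-byte tags.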

Asynchronous Communication

Events and messages decouple services. The producer publishes an event without knowing who consumes it. This enables independent scaling, failure isolation, and new subscribers without modifying producers.

# Event-driven communication with Kafka
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    value_serializer=lambda v: json.dumps(v).encode()
)

# Publish event — no knowledge of consumers
producer.send('order_events', {
    "type": "OrderPlaced",
    "order_id": "ORD-123",
    "user_id": "USR-42",
    "total": 99.99
})
producer.flush()
print("OrderPlaced event published — inventory, payment, and notification will react independently")
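
The decoupling can be seen without a broker. The following is a minimal in-memory sketch, not Kafka: EventBus, subscribe, and publish are hypothetical names, and delivery here is synchronous and in-process, whereas a real broker delivers asynchronously and durably.

```python
from collections import defaultdict

class EventBus:
    """Toy in-process pub/sub: topic name -> list of handler callables."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver the event to every subscriber and return how many received it."""
        for handler in self._handlers[topic]:
            handler(event)
        return len(self._handlers[topic])

bus = EventBus()
log = []

# Two consumers register themselves; the producer code below never names them
bus.subscribe("order_events", lambda e: log.append(f"inventory reserved for {e['order_id']}"))
bus.subscribe("order_events", lambda e: log.append(f"notification sent for {e['order_id']}"))

event = {"type": "OrderPlaced", "order_id": "ORD-123", "user_id": "USR-42", "total": 99.99}
delivered_before = bus.publish("order_events", event)

# A new subscriber is added later without touching the publish call
bus.subscribe("order_events", lambda e: log.append(f"analytics recorded {e['order_id']}"))
delivered_after = bus.publish("order_events", event)

print(delivered_before, delivered_after)
print(log)
```

The publish call is identical before and after the third subscriber appears, which is the property that lets new consumers be added without changing the producer.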

API Gateway Patterns

An API gateway is a single entry point for all clients. It handles routing, authentication, rate limiting, and response aggregation.

Gateway routing — clients call the gateway, which routes to internal services. The gateway knows the service topology; clients don’t.

Gateway aggregation — the gateway calls multiple services and combines responses. Useful for dashboards that need data from user + order + recommendation services.

# FastAPI gateway with aggregation
import asyncio

import httpx
from fastapi import FastAPI

app = FastAPI()

@app.get("/dashboard/{user_id}")
async def get_dashboard(user_id: str):
    async with httpx.AsyncClient(timeout=5.0) as client:
        # Fan-out to multiple services; the coroutines start when gathered
        user_resp = client.get(f"http://user-service/users/{user_id}")
        order_resp = client.get(f"http://order-service/orders?user_id={user_id}")
        rec_resp = client.get(f"http://recommendation-service/recs?user_id={user_id}")

        user, orders, recs = await asyncio.gather(user_resp, order_resp, rec_resp)

        # Surface downstream 4xx/5xx responses instead of returning error bodies
        for resp in (user, orders, recs):
            resp.raise_for_status()

        return {
            "user": user.json(),
            "recent_orders": orders.json(),
            "recommendations": recs.json(),
        }
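
A gateway that aggregates usually needs to tolerate a failing dependency. The sketch below shows one possible way to return partial results with asyncio.gather(return_exceptions=True). The three fetch_* coroutines and their data are hypothetical stand-ins for the HTTP calls, and the recommendation service is made to fail on purpose.

```python
import asyncio

async def fetch_user(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada"}  # made-up sample data

async def fetch_orders(user_id: str) -> list:
    return [{"order_id": "ORD-123"}]

async def fetch_recs(user_id: str) -> list:
    # Simulated outage of the recommendation service
    raise ConnectionError("recommendation-service unreachable")

async def get_dashboard(user_id: str) -> dict:
    # return_exceptions=True hands exceptions back as results instead of raising
    user, orders, recs = await asyncio.gather(
        fetch_user(user_id),
        fetch_orders(user_id),
        fetch_recs(user_id),
        return_exceptions=True,
    )
    if isinstance(user, Exception):
        # Without the user record there is nothing meaningful to show
        raise user

    body = {"user": user, "degraded": []}
    for key, value in (("recent_orders", orders), ("recommendations", recs)):
        if isinstance(value, Exception):
            body[key] = []                # fallback: empty section
            body["degraded"].append(key)  # tell the client what is missing
        else:
            body[key] = value
    return body

dashboard = asyncio.run(get_dashboard("USR-42"))
print(dashboard)
```

Which sections are optional is a product decision; here the user record is treated as required and the other two as optional.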

Circuit Breaker with Retry and Exponential Backoff

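When calls to a downstream service keep failing, the circuit breaker trips (opens) once a failure threshold is reached, and subsequent calls fail fast without touching the service. After a cooldown period, the breaker moves to half-open and allows a probe request. If the probe succeeds, the circuit closes; if it fails, the circuit opens again.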

import asyncio
import random
import time

class CircuitBreaker:
    def __init__(self, threshold: int = 5, recovery: float = 30.0):
        self.threshold = threshold
        self.recovery = recovery
        self.failures = 0
        self.last_fail = 0.0
        self.state = "closed"

    async def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.last_fail > self.recovery:
                self.state = "half-open"  # cooldown elapsed: let one probe through
            else:
                raise Exception("Circuit breaker open — request rejected")

        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_fail = time.monotonic()
            if self.failures >= self.threshold:
                self.state = "open"
            raise
        # Any success closes the circuit and clears the failure count
        self.state = "closed"
        self.failures = 0
        return result

# Retry with exponential backoff
async def call_with_retry(func, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            return await func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # 1s, 2s, 4s, ... plus jitter so clients do not retry in lockstep
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed, retrying in {wait:.1f}s...")
            await asyncio.sleep(wait)

cb = CircuitBreaker(threshold=3, recovery=10)

async def unreliable_service():
    raise Exception("Downstream timeout")

# Test circuit breaker
async def main():
    for i in range(6):
        try:
            await cb.call(unreliable_service)
        except Exception as e:
            print(f"Attempt {i + 1}: {e}")

asyncio.run(main())

Expected output:

Attempt 1: Downstream timeout
Attempt 2: Downstream timeout
Attempt 3: Downstream timeout
Attempt 4: Circuit breaker open — request rejected
Attempt 5: Circuit breaker open — request rejected
Attempt 6: Circuit breaker open — request rejected
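
Retries can deliver the same request more than once, so the operation being retried should be safe to repeat. A minimal sketch of an idempotency key check follows. PaymentService, charge, and the in-memory dict are assumptions for illustration; a real service would keep the keys in a durable store with an expiry.

```python
import uuid

class PaymentService:
    """Toy payment handler that remembers results by idempotency key."""

    def __init__(self):
        self._results = {}     # idempotency key -> stored response
        self.charges_made = 0  # how many times money actually moved

    def charge(self, idempotency_key: str, order_id: str, amount: float) -> dict:
        # A repeated key returns the stored response without charging again
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        response = {
            "payment_id": f"PAY-{self.charges_made}",
            "order_id": order_id,
            "amount": amount,
        }
        self._results[idempotency_key] = response
        return response

service = PaymentService()
key = str(uuid.uuid4())  # generated once by the caller and reused on every retry

first = service.charge(key, "ORD-123", 99.99)
retry = service.charge(key, "ORD-123", 99.99)  # e.g. the first response was lost

print(first == retry, service.charges_made)
```

The caller, not the server, generates the key, so every retry of the same logical request carries the same value.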

Service Mesh (Istio, Linkerd)

A service mesh offloads communication concerns (retries, circuit breaking, mTLS, observability) to a sidecar proxy. The application code only contains business logic.

Istio injects an Envoy proxy as a sidecar container into each service pod. It provides:

mTLS — automatic encryption between all services
Traffic splitting — canary deployments, A/B testing
Observability — metrics, traces, logs per service call
Circuit breaking — configurable via CRDs

# Istio VirtualService for traffic splitting
# (subsets v1 and v2 must be defined in a DestinationRule for order-service)
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: order-service
spec:
  hosts:
  - order-service
  http:
  - route:
    - destination:
        host: order-service
        subset: v1
      weight: 90
    - destination:
        host: order-service
        subset: v2
      weight: 10
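
Conceptually, the 90/10 weights mean each request is assigned to a subset with probability proportional to its weight. The sketch below imitates that with random.choices. It illustrates weighted selection only and makes no claim about how Envoy implements it; pick_subset is a hypothetical helper.

```python
import random
from collections import Counter

ROUTES = {"v1": 90, "v2": 10}  # subset name -> weight, as in the VirtualService

def pick_subset(rng: random.Random) -> str:
    """Choose a subset with probability proportional to its weight."""
    names = list(ROUTES)
    weights = [ROUTES[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(42)  # seeded so repeated runs give the same split
total_requests = 10_000
counts = Counter(pick_subset(rng) for _ in range(total_requests))
share_v2 = counts["v2"] / total_requests

print(counts)
print(f"v2 share: {share_v2:.1%}")
```

Over many requests the v2 share settles near 10%, which is what makes a weighted route usable for canary releases.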

Common Errors

Synchronous call chains: Service A calls B calls C calls D. A failure anywhere in the chain cascades. If D is slow, all upstream services hold connections. Use async communication for long chains.
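No timeout on HTTP calls: Many HTTP clients default to long timeouts or none at all (Python's requests, for example, waits indefinitely unless a timeout is passed). A slow downstream service can then exhaust connection pools. Always set explicit, aggressive timeouts (e.g., 5 seconds).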
Shared database across services: Multiple services reading/writing the same database creates tight coupling. Each service should own its data and expose an API.
Ignoring idempotency: Retries (triggered by timeouts or retry policies) can send duplicate requests. Every mutating endpoint should be idempotent or use idempotency keys.
No bulkhead pattern: A failure in one service shouldn’t exhaust resources needed by others. Use separate connection pools or thread pools per downstream service.
Over-reliance on service mesh: A service mesh adds latency (sidecar overhead) and operational complexity. Not every service needs it. Use it selectively for high-value resilience patterns.
No fallback in API gateway: If a downstream service fails, the gateway should return a degraded response (e.g., cached data, partial results) instead of failing completely.
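
Two of the fixes in this list, timeouts and bulkheads, can be sketched with plain asyncio. In this hedged example each downstream gets its own asyncio.Semaphore (the bulkhead) and every call is wrapped in asyncio.wait_for (the timeout). The Bulkhead class, the limits, and the simulated services are assumptions chosen for illustration.

```python
import asyncio

class Bulkhead:
    """Caps concurrent calls to one downstream and bounds each call's duration."""

    def __init__(self, max_concurrent: int, timeout: float):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._timeout = timeout

    async def call(self, func):
        async with self._sem:
            # wait_for cancels the call and raises TimeoutError after the deadline
            return await asyncio.wait_for(func(), timeout=self._timeout)

async def slow_inventory() -> str:
    await asyncio.sleep(1.0)  # simulated slow downstream
    return "reserved"

async def fast_user() -> str:
    await asyncio.sleep(0.01)
    return "user-ok"

async def main() -> dict:
    # Separate pools: a stalled inventory service cannot use up user-service slots
    inventory = Bulkhead(max_concurrent=2, timeout=0.1)
    users = Bulkhead(max_concurrent=2, timeout=0.1)

    results = await asyncio.gather(
        *(inventory.call(slow_inventory) for _ in range(4)),
        *(users.call(fast_user) for _ in range(4)),
        return_exceptions=True,
    )
    timed_out = sum(isinstance(r, asyncio.TimeoutError) for r in results)
    succeeded = sum(r == "user-ok" for r in results)
    return {"timed_out": timed_out, "succeeded": succeeded}

summary = asyncio.run(main())
print(summary)
```

All four inventory calls time out while all four user calls succeed, because the two downstreams never compete for the same slots. Note that this sketch bounds only the call itself, not the time spent waiting for a free slot.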

Practice Questions

1. When should you use gRPC over REST?

When you need high performance, streaming, or strict typing. gRPC uses HTTP/2 with multiplexed streams and Protocol Buffers for efficient serialization. Best for internal service-to-service communication.

2. What is the difference between a circuit breaker and a retry?

A retry assumes the failure is transient and tries again. A circuit breaker assumes the failure will persist and fails fast to prevent cascading. They work well together — retry a few times, then open the circuit.

3. How does a service mesh improve inter-service communication?

It offloads retry, circuit breaking, mTLS, and observability to sidecar proxies. Application code contains only business logic. The mesh handles all cross-cutting communication concerns.

4. Challenge: Design a saga with compensating transactions.

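Design a saga for order processing: reserve inventory, process payment, confirm order. In the orchestrated form, a coordinator calls each step in turn and, if payment fails, invokes the compensating action that releases the inventory reservation. In the choreographed form there is no coordinator: each step publishes an event, a failed payment emits a PaymentFailed event, and the inventory service listens for it and releases the reservation.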
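
One possible shape for the orchestrated form is sketched below. It is a sketch under stated assumptions: run_saga and the step functions are hypothetical, everything runs in-process, and the payment step is forced to fail so the compensation path is visible. A real orchestrator would persist saga state and call the services over the network.

```python
from typing import Callable, List, Tuple

# (name, action, compensation)
Step = Tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"{name}: done")
            completed.append(step)
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"{done_name}: compensated")
            return False
    return True

inventory = {"widget": 5}

def reserve_inventory() -> None:
    inventory["widget"] -= 1

def release_inventory() -> None:
    inventory["widget"] += 1

def process_payment() -> None:
    raise RuntimeError("card declined")  # forced failure for the demo

def refund_payment() -> None:
    pass  # never reached here: the charge did not succeed

def confirm_order() -> None:
    pass

def cancel_order() -> None:
    pass

saga_log: List[str] = []
ok = run_saga(
    [
        ("reserve_inventory", reserve_inventory, release_inventory),
        ("process_payment", process_payment, refund_payment),
        ("confirm_order", confirm_order, cancel_order),
    ],
    saga_log,
)
print(ok, inventory)
print(saga_log)
```

Only steps that completed are compensated, so the failed payment step is not refunded and the stock count returns to its starting value.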

Mini Project

Build a resilient API gateway with circuit breaker and retry:

import asyncio, time, random

class RetryCircuitBreaker:
    def __init__(self, retries=3, fail_threshold=3, recovery=10):
        self.retries = retries
        self.fail_threshold = fail_threshold
        self.recovery = recovery
        self.failures = 0
        self.last_fail = 0.0  # time of the most recent exhausted call
        self.state = "closed"

    async def call(self, func):
        if self.state == "open":
            if time.monotonic() - self.last_fail > self.recovery:
                self.state = "half-open"
            else:
                raise Exception("Service unavailable (circuit open)")

        for attempt in range(self.retries):
            try:
                result = await func()
            except Exception:
                if attempt < self.retries - 1:
                    # Exponential backoff with jitter before the next attempt
                    wait = 2 ** attempt + random.uniform(0, 0.5)
                    await asyncio.sleep(wait)
                else:
                    # Retries exhausted: count one failure against the circuit
                    self.failures += 1
                    self.last_fail = time.monotonic()
                    if self.failures >= self.fail_threshold:
                        self.state = "open"
                    raise
            else:
                # Any success closes the circuit and clears the failure count
                self.state = "closed"
                self.failures = 0
                return result

async def downstream():
    # Stand-in for a real service call
    await asyncio.sleep(0.1)
    return "Success"

gateway = RetryCircuitBreaker()

async def test():
    for i in range(8):
        try:
            result = await gateway.call(downstream)
            print(f"Call {i+1}: {result}")
        except Exception as e:
            print(f"Call {i+1}: {e}")

asyncio.run(test())

Cross-References

Previous: Distributed Caching — Redis Cluster, Memcached, and Multi-Tier Cache Strategies
Next: Event-Driven Architecture — Event Sourcing, CQRS, and Pub/Sub with Kafka Examples
