Google Cloud Run Guide — Container-Based Serverless Deployment
Google Cloud Run is a managed compute platform that runs stateless containers on a fully serverless infrastructure, automatically scaling from zero based on HTTP request traffic.
What You’ll Learn
By the end of this tutorial, you’ll understand Cloud Run’s auto-scaling behavior, concurrency settings, request timeout configuration, IAM for access control, and how to containerize and deploy a web application step by step.
Why Cloud Run Matters
Cloud Run combines the portability of containers with the simplicity of serverless. You get the flexibility of any runtime, any language, any library — packaged in a Docker container — without managing servers. It’s ideal for APIs, web apps, and event-driven microservices. DodaTech uses Cloud Run for Doda Browser API services that handle variable traffic patterns without paying for idle capacity.
Google Cloud Run Learning Path
flowchart LR
A[Cloud Basics] --> B[GCP]
B --> C[Cloud Run]
C --> D{You Are Here}
D --> E[Containerization]
D --> F[Auto-scaling]
D --> G[IAM]
E --> H[Dockerfile]
E --> I[Deploy]
F --> J[Concurrency]
F --> K[Timeout]
What Is Cloud Run?
Think of Cloud Run like a food truck that parks outside your office only when people are hungry. When no one orders, the truck drives away (scale to zero). When 100 people line up, suddenly 10 trucks appear (scale up). You don’t pay for parked trucks — only for food served.
Each truck (container instance) can serve multiple customers simultaneously (concurrency). If an order takes too long, it is cancelled (timeout). You control the menu (container image), and trucks always use the same recipe (immutable image).
Containerizing a Web App for Cloud Run
# Dockerfile
# Multi-stage build for a Python web app on Cloud Run
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt
FROM python:3.12-slim
WORKDIR /app
# Copy only the installed packages from builder
COPY --from=builder /root/.local /root/.local
COPY app.py .
# Cloud Run provides the PORT environment variable
ENV PATH=/root/.local/bin:$PATH
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 app:app

# app.py
# Simple web app for Cloud Run
import os
from datetime import datetime

from flask import Flask, request, jsonify

app = Flask(__name__)

# Cloud Run injects the port via environment variable
PORT = int(os.environ.get('PORT', 8080))


@app.route('/')
def home():
    return jsonify({
        "service": "dodatech-cloud-run-demo",
        "version": "1.0.0",
        "status": "running",
        "timestamp": datetime.now().isoformat(),
    })


@app.route('/api/process', methods=['POST'])
def process():
    """Example API endpoint."""
    # silent=True returns None for a missing or malformed JSON body,
    # so the caller gets the 400 response below
    data = request.get_json(silent=True)
    if not data:
        return jsonify({"error": "No data provided"}), 400
    result = {
        "received": data,
        "processed": True,
        "processed_at": datetime.now().isoformat(),
        # K_SERVICE and K_REVISION are set by Cloud Run
        "service": os.environ.get('K_SERVICE', 'unknown'),
        "revision": os.environ.get('K_REVISION', 'unknown'),
    }
    return jsonify(result)


@app.route('/health')
def health():
    return jsonify({"status": "healthy"}), 200


if __name__ == '__main__':
    # Local development only; on Cloud Run, gunicorn serves the app
    app.run(host='0.0.0.0', port=PORT, debug=False)

# requirements.txt
flask>=3.0
gunicorn>=22.0

# cloud_run_simulator.py
# Simulate Cloud Run auto-scaling and concurrency for a burst of requests
from datetime import datetime


class CloudRunSimulator:
    def __init__(self, max_concurrent=80, min_instances=0):
        self.max_concurrent = max_concurrent
        self.min_instances = min_instances
        self.instances = []
        self.total_requests = 0

    def start_request(self, request_id):
        """Route a request to an instance with spare capacity, or start a new one."""
        instance = None
        for inst in self.instances:
            if inst["concurrent_requests"] < self.max_concurrent:
                instance = inst
                break
        cold_start = instance is None
        if cold_start:
            instance = {
                "id": len(self.instances) + 1,
                "concurrent_requests": 0,
                "created": datetime.now().isoformat(),
            }
            self.instances.append(instance)
        instance["concurrent_requests"] += 1
        self.total_requests += 1
        # Only the request that triggered a new instance counts as a cold start
        return {"request_id": request_id, "instance": instance["id"], "cold_start": cold_start}

    def finish_request(self, result):
        """Release the slot the request was holding."""
        self.instances[result["instance"] - 1]["concurrent_requests"] -= 1


sim = CloudRunSimulator(max_concurrent=10)
print("=== Cloud Run Auto-Scaling Simulation ===\n")

# Simulate a traffic spike: all 30 requests are in flight at the same time
all_results = []
for i in range(30):
    result = sim.start_request(f"req-{i:03d}")
    all_results.append(result)
    label = "[COLD]" if result["cold_start"] else "[WARM]"
    print(f" {label} {result['request_id']} → Instance {result['instance']}")

# The spike is over: every request completes and frees its slot
for result in all_results:
    sim.finish_request(result)

cold_starts = sum(1 for r in all_results if r["cold_start"])
instances_used = len(set(r["instance"] for r in all_results))
print(f"\nSummary: {len(all_results)} requests, {cold_starts} cold starts, {instances_used} instances")
print(f"Peak instances: {instances_used} (auto-scaled from demand)")

Expected output:
=== Cloud Run Auto-Scaling Simulation ===

 [COLD] req-000 → Instance 1
 [WARM] req-001 → Instance 1
 ...
 [WARM] req-009 → Instance 1
 [COLD] req-010 → Instance 2
 ...
 [COLD] req-020 → Instance 3
 ...
 [WARM] req-029 → Instance 3

Summary: 30 requests, 3 cold starts, 3 instances
Peak instances: 3 (auto-scaled from demand)

Key Cloud Run Features
Concurrency
Cloud Run can route multiple requests to the same container instance at once, up to a configurable limit (default 80, maximum 1000).
# concurrency_demo.py
# Demonstrate Cloud Run concurrency
import time
import threading


class ContainerInstance:
    def __init__(self, max_concurrency=80):
        self.max_concurrency = max_concurrency
        self.active_requests = 0
        self._lock = threading.Lock()  # protects the shared counter

    def can_accept(self):
        return self.active_requests < self.max_concurrency

    def handle(self, request_id):
        with self._lock:
            self.active_requests += 1
            print(f" [{request_id}] Active requests on instance: {self.active_requests}")
        time.sleep(0.05)  # simulated work, done outside the lock
        with self._lock:
            self.active_requests -= 1
        return f"{request_id} done"


instance = ContainerInstance(max_concurrency=5)
threads = []
print("=== Concurrency Test (5 concurrent requests, 5 capacity) ===\n")
for i in range(5):
    # Pass the ID through args so each thread gets its own value of i
    t = threading.Thread(target=instance.handle, args=(f"req-{i:03d}",))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print("\n✓ All requests completed concurrently on one instance")

Auto-scaling
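Cloud Run scales mainly on request concurrency, together with the CPU utilization of instances that are serving requests; memory usage is not a scaling signal. When existing instances reach their maximum concurrency, new instances are started, up to the configured maximum.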
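The concurrency part of that rule comes down to simple arithmetic. The sketch below is only an approximation for building intuition: `instances_needed` is a hypothetical helper, not part of any Cloud Run API, and it ignores CPU utilization and the smoothing a real autoscaler applies over time.

```python
import math


def instances_needed(in_flight, concurrency, min_instances=0, max_instances=100):
    """Estimate how many instances a number of in-flight requests requires."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    wanted = math.ceil(in_flight / concurrency)
    # Clamp the estimate to the configured instance limits
    return max(min_instances, min(wanted, max_instances))


for load in (0, 1, 80, 81, 400):
    print(f"{load} in-flight requests -> {instances_needed(load, concurrency=80)} instances")
```

With the default concurrency of 80, the 81st simultaneous request is the one that needs a second instance.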
Request Timeout
The default request timeout is 300 seconds (5 minutes). It can be set with --timeout to any value from 1 to 3600 seconds (60 minutes). A request that exceeds the timeout is ended with a 504 error.
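Because a request that runs past the timeout is cut off, a long-running handler may prefer to stop early and return what it has. Here is a minimal sketch of that idea. `RequestBudget` and `process_items` are hypothetical names invented for this example, and the timeout you pass in is assumed to mirror the value configured on the service.

```python
import time


class RequestBudget:
    """Track how much of a request's time budget remains."""

    def __init__(self, timeout_seconds=300, clock=time.monotonic):
        if not 1 <= timeout_seconds <= 3600:
            raise ValueError("timeout must be between 1 and 3600 seconds")
        self._clock = clock
        self._deadline = clock() + timeout_seconds

    def remaining(self):
        return max(0.0, self._deadline - self._clock())

    def has_time_for(self, estimated_seconds):
        return self.remaining() >= estimated_seconds


def process_items(items, budget, seconds_per_item):
    """Process items until the budget cannot cover another one."""
    done = []
    for item in items:
        if not budget.has_time_for(seconds_per_item):
            break
        done.append(item.upper())  # stand-in for real work
    return done, items[len(done):]


budget = RequestBudget(timeout_seconds=300)
done, pending = process_items(["a", "b", "c"], budget, seconds_per_item=0.5)
print(done, pending)
```

Anything left in `pending` can be handed to a follow-up request instead of being lost when the timeout fires.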
IAM and Authentication
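Unless a service allows unauthenticated access, a caller needs the roles/run.invoker role and must send an identity token in the Authorization header of each request. The sketch below shows only the caller's side. The service URL is a made-up placeholder, `build_invoker_request` is a hypothetical helper, and obtaining the token itself (for example with `gcloud auth print-identity-token` during local testing) is left out.

```python
import urllib.request


def build_invoker_request(service_url, id_token, path="/health"):
    """Return a request carrying the identity token Cloud Run checks."""
    if not id_token:
        raise ValueError("an identity token is required for non-public services")
    url = service_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {id_token}"},
        method="GET",
    )


# Placeholder values: substitute your own service URL and a real token
req = build_invoker_request("https://dodatech-api-example.a.run.app", "ID_TOKEN")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it. The commands below control who is allowed to make such calls.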
# Grant a service account invoker access
gcloud run services add-iam-policy-binding dodatech-api \
  --member="serviceAccount:api-consumer@project.iam.gserviceaccount.com" \
  --role="roles/run.invoker"

# Allow unauthenticated access (for public APIs)
gcloud run services add-iam-policy-binding dodatech-api \
  --member="allUsers" \
  --role="roles/run.invoker"

Deploying to Cloud Run
# Build the image with Cloud Build, then deploy it
gcloud builds submit --tag gcr.io/PROJECT/dodatech-api
gcloud run deploy dodatech-api \
  --image gcr.io/PROJECT/dodatech-api \
  --platform managed \
  --region us-central1 \
  --allow-unauthenticated \
  --concurrency=80 \
  --timeout=300 \
  --memory=512Mi \
  --cpu=1

# Deploy a new revision with different settings
gcloud run deploy dodatech-api \
  --image gcr.io/PROJECT/dodatech-api \
  --min-instances=2 \
  --max-instances=10 \
  --concurrency=50

# View logs
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=dodatech-api" --limit=10

Common Cloud Run Mistakes
1. Not Setting Min Instances for Production
The default for min instances is 0 (scale to zero). For production APIs that need low latency, set --min-instances=1 or higher so that requests arriving after an idle period do not wait for a cold start.
2. Over-Provisioning CPU
By default, Cloud Run allocates and bills CPU only while an instance is processing requests. If your app is I/O-bound, a smaller CPU allocation saves costs.
3. Writing to Local Filesystem
Cloud Run instances are ephemeral and filesystem writes are lost when the instance shuts down. Use Cloud Storage or Cloud SQL for persistence.
4. Not Configuring Startup CPU
A container that does heavy initialization can start slowly, and the first request may time out while it waits. Deploy with --cpu-boost to give instances extra CPU during startup.
5. Ignoring Concurrency Settings
Setting concurrency too high (e.g., 250) for a CPU-intensive app causes performance degradation. Benchmark your app to find the optimal concurrency for your workload.
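One quick sanity check is to compare the Cloud Run concurrency setting with what the server inside the container can actually run in parallel. The Dockerfile earlier starts gunicorn with `--workers 1 --threads 8`, while the deploy command uses `--concurrency=80`. The sketch below assumes gunicorn's threaded workers, where each worker handles at most `threads` requests at once; `check_concurrency` is a hypothetical helper, not a Cloud Run or gunicorn function.

```python
def check_concurrency(cloud_run_concurrency, workers, threads):
    """Compare Cloud Run's per-instance limit with the server's parallelism."""
    in_process = workers * threads
    if cloud_run_concurrency > in_process:
        queued = cloud_run_concurrency - in_process
        return f"up to {queued} requests may wait for a free thread"
    return "every admitted request gets a thread immediately"


print(check_concurrency(80, workers=1, threads=8))
print(check_concurrency(8, workers=1, threads=8))
```

Queued requests are not necessarily a problem for fast handlers, but they add latency that a benchmark should account for.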
Practice Questions
1. How does Cloud Run auto-scale?
Cloud Run scales mainly on request concurrency. Each instance handles up to the configured concurrency (--concurrency, default 80). When all instances are saturated, new instances are created. Idle instances are removed, down to the configured minimum (zero by default).
2. What is the difference between Cloud Run and Cloud Functions?
Cloud Run runs arbitrary containers (any runtime, any library) with HTTP requests. Cloud Functions runs specific function code with a wider range of event triggers (Cloud Storage, Pub/Sub, Firestore). Cloud Run is for containerized apps; Functions is for single-purpose functions.
3. How does Cloud Run handle concurrency?
Multiple requests can be routed to the same container instance simultaneously. The instance processes them in parallel (within the limits of your runtime — Gunicorn with multiple workers/threads).
4. What are Cloud Run revisions?
Each deployment creates a new revision. Revisions are immutable and versioned. You can split traffic between revisions (e.g., 90% stable, 10% canary), roll back, and pin specific revisions.
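To get a feel for what a 90/10 split means in practice, here is a small simulation of weighted routing. It is only an illustration of the idea, not how Cloud Run routes traffic internally, and the revision names are made up for the example.

```python
import random


def pick_revision(split, rng=random):
    """Choose a revision name according to percentage weights."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must add up to 100")
    names = list(split)
    weights = [split[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical revision names with a 90/10 canary split
split = {"dodatech-api-stable": 90, "dodatech-api-canary": 10}
rng = random.Random(42)  # seeded so repeated runs give the same counts
counts = {name: 0 for name in split}
for _ in range(10_000):
    counts[pick_revision(split, rng)] += 1
print(counts)
```

Over many requests the canary receives roughly a tenth of the traffic, which is why a small split still needs enough volume before its error rate means anything.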
5. Challenge: Design a Cloud Run deployment strategy for a web app that receives 100 requests/second with occasional spikes to 1000, needs sub-second p50 latency, and costs must be minimized.
Set --min-instances=5 for baseline capacity, --max-instances=50 for peak, --concurrency=40 (balanced for typical web app). Use --cpu-boost for faster cold starts. Monitor with Cloud Monitoring and adjust min instances based on baseline traffic.
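The numbers in that answer can be checked with Little's law: requests in flight equal the arrival rate multiplied by the average latency. The sketch below assumes an average latency of 0.5 seconds, which is a guess because the challenge only says "sub-second"; `plan_instances` is a hypothetical helper.

```python
import math


def plan_instances(requests_per_second, avg_latency_seconds, concurrency, headroom=1.0):
    """Little's law: in-flight requests = arrival rate x average latency."""
    in_flight = requests_per_second * avg_latency_seconds
    return math.ceil(in_flight * headroom / concurrency)


# Assumed average latency of 0.5 s for both baseline and peak traffic
baseline = plan_instances(100, 0.5, concurrency=40)
peak = plan_instances(1000, 0.5, concurrency=40)
print(f"baseline: {baseline} instances, peak: {peak} instances")
```

Under that assumption the estimate comes to 2 instances at baseline and 13 at peak, so min-instances=5 and max-instances=50 leave generous headroom for slower requests and sudden bursts.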
Mini Project: Cloud Run Deployment Checker
# cloud_run_check.py
# Check Cloud Run deployment configuration best practices


class CloudRunConfig:
    def __init__(self, min_instances=0, max_instances=100, concurrency=80,
                 timeout=300, memory="512Mi", cpu=1):
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.concurrency = concurrency
        self.timeout = timeout
        self.memory = memory
        self.cpu = cpu

    def check(self):
        issues = []
        warnings = []
        if self.min_instances == 0:
            warnings.append("min_instances=0: Requests will experience cold start delays")
        if self.concurrency > 100:
            warnings.append(f"concurrency={self.concurrency}: High concurrency may degrade performance for CPU-bound apps")
        if self.timeout < 10:
            issues.append(f"timeout={self.timeout}s: Very short timeout; requests may be cut off")
        # Memory is given as '<number>Mi' or '<number>Gi'; 1 Gi = 1024 Mi
        value, unit = self.memory[:-2], self.memory[-2:]
        memory_mb = int(value) * (1024 if unit == "Gi" else 1)
        if memory_mb < 256:
            warnings.append(f"memory={self.memory}: Low memory may cause OOM errors for Python/Node apps")
        return {"issues": issues, "warnings": warnings, "pass": len(issues) == 0}


configs = [
    CloudRunConfig(min_instances=0, max_instances=100, concurrency=80, timeout=60, memory="128Mi"),
    CloudRunConfig(min_instances=2, max_instances=50, concurrency=40, timeout=300, memory="1Gi"),
    CloudRunConfig(min_instances=0, max_instances=10, concurrency=200, timeout=5, memory="256Mi"),
]

print("=== Cloud Run Config Checker ===\n")
for i, cfg in enumerate(configs):
    result = cfg.check()
    print(f"Config {i+1}: min={cfg.min_instances}, max={cfg.max_instances}, "
          f"conc={cfg.concurrency}, timeout={cfg.timeout}s, mem={cfg.memory}")
    for w in result["warnings"]:
        print(f" ⚠ {w}")
    for e in result["issues"]:
        print(f" ✗ {e}")
    if result["pass"] and not result["warnings"]:
        print(" ✓ Perfect configuration")
    print()

Expected output:
=== Cloud Run Config Checker ===

Config 1: min=0, max=100, conc=80, timeout=60s, mem=128Mi
 ⚠ min_instances=0: Requests will experience cold start delays
 ⚠ memory=128Mi: Low memory may cause OOM errors for Python/Node apps

Config 2: min=2, max=50, conc=40, timeout=300s, mem=1Gi
 ✓ Perfect configuration

Config 3: min=0, max=10, conc=200, timeout=5s, mem=256Mi
 ⚠ min_instances=0: Requests will experience cold start delays
 ⚠ concurrency=200: High concurrency may degrade performance for CPU-bound apps
 ✗ timeout=5s: Very short timeout; requests may be cut off
What’s Next
You now understand Google Cloud Run! Next, explore Docker and Kubernetes for deeper container orchestration, and learn cloud monitoring for observability.
- Practice daily — Containerize a simple Flask app with Docker
- Build a project — Deploy a containerized API on Cloud Run
- Explore related topics — Check out Cloud Run jobs for batch workloads
Remember: every expert was once a beginner. Keep coding!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro