Build a Complete Docker Monitoring Stack with Prometheus, Grafana, and cAdvisor (Step by Step)
Build a complete Docker monitoring stack using Prometheus, Grafana, cAdvisor, and Node Exporter with docker-compose, custom dashboards, and alerting rules.
What You’ll Build
You’ll deploy a production-grade monitoring stack for Docker containers that collects CPU, memory, disk, and network metrics from every running container and the host machine, visualizes them in Grafana dashboards, and sends alerts when things go wrong. This is the same monitoring architecture used internally by DodaTech to track DodaZIP conversion servers and Durga Antivirus Pro update services.
Why Monitoring Stacks Matter
You cannot fix what you cannot see. Containerized applications are ephemeral — they start, crash, and restart without warning. A monitoring stack gives you visibility into resource usage, capacity planning, and incident detection. Prometheus + Grafana is the industry standard for cloud-native monitoring, used by companies from startups to Netflix.
Prerequisites
- Docker and Docker Compose installed
- Basic understanding of YAML
- Ports 9090, 3000, 8080, 9100 available
Step 1: Project Structure
mkdir docker-monitoring
cd docker-monitoring
mkdir -p prometheus grafana/provisioning/datasources grafana/provisioning/dashboards
Step 2: Docker Compose Configuration
# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      # Mount the alert rules as well; prometheus.yml loads them via rule_files
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - '9090:9090'
    restart: unless-stopped
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - '3000:3000'
    depends_on:
      - prometheus
    restart: unless-stopped
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - '8080:8080'
    privileged: true
    restart: unless-stopped
  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--rootfs=/rootfs'
    ports:
      - '9100:9100'
    restart: unless-stopped
volumes:
  prometheus_data:
  grafana_data:
What each service does:
- Prometheus — scrapes and stores metrics every 15 seconds, with 30-day retention
- Grafana — visualizes metrics with dashboards, auto-provisions data sources
- cAdvisor — Google’s container advisor, exposes per-container resource metrics
- Node Exporter — exposes host-level metrics (CPU, memory, disk, network)
Step 3: Prometheus Configuration
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # Add Alertmanager later: "alertmanager:9093"
rule_files:
  - "alerts.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']
# prometheus/alerts.yml
groups:
  - name: docker_alerts
    rules:
      - alert: HighCpuUsage
        # rate() of CPU seconds is measured in cores, so 0.8 means 80% of one core
        expr: sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (name) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} CPU usage above 80% of one core"
      - alert: HighMemoryUsage
        # Containers without a memory limit report a limit of 0; the "> 0" filter
        # drops them so the division cannot produce +Inf and a false alert
        expr: (container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0)) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} memory usage above 85%"
      - alert: ContainerDown
        expr: time() - container_last_seen{name!=""} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} has not been seen for 60 seconds"
      - alert: NodeHighDiskUsage
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Host disk usage above 90%"
Expected output: Prometheus scrapes metrics from all three targets (prometheus, cadvisor, node_exporter) every 15 seconds. The alerting rules are also evaluated every 15 seconds. When a condition holds for the duration set in the rule's for field, the alert fires.
Step 4: Grafana Auto-Provisioning
# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
# grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1
providers:
  - name: 'Docker Monitoring'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /etc/grafana/provisioning/dashboards
Step 5: Launch the Stack
docker-compose up -d
Verify each service:
- Prometheus: http://localhost:9090/targets (all three targets should be UP)
- cAdvisor: http://localhost:8080 (per-container metrics page)
- Node Exporter: http://localhost:9100/metrics (raw host metrics)
- Grafana: http://localhost:3000 (log in with admin/admin)
Step 6: Create a Dashboard in Grafana
- Click + → Import Dashboard
- Enter Dashboard ID 893 (Docker Monitoring) or 1860 (Node Exporter Full)
- Select the Prometheus data source
- Click Import
To build a custom panel manually:
- Create → Dashboard → Add new panel
- Query: rate(container_cpu_usage_seconds_total{name!=""}[1m])
- Legend: {{ name }}
- Unit: Percent (0-1)
- Panel type: Time series
Expected output: A dashboard showing CPU usage lines for each container, updating every 15 seconds.
Architecture
graph TD
subgraph "Host Machine"
A[cAdvisor] -->|Per-container metrics| B[Prometheus]
C[Node Exporter] -->|Host metrics| B
end
B -->|Scrape every 15s| B
B -->|Store metrics| D[(Prometheus TSDB)]
E[Grafana] -->|Query data source| B
E -->|Dashboards| F[User Browser]
B -->|Evaluate rules| G[Alertmanager]
G -->|Send alerts| H[Email / Slack / PagerDuty]
style B fill:#e6562b,color:white
style E fill:#f4722b,color:white
style G fill:#e6562b,color:white
Common Errors
1. cAdvisor fails with “cgroups: cannot find cgroup mount destination”
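cAdvisor needs privileged access and a view of the host's cgroup hierarchy. The compose file already sets privileged: true and mounts /sys (which contains /sys/fs/cgroup) read-only. On some systems you may also need to share the host PID namespace: --pid=host with docker run, or pid: host on the cadvisor service in Compose.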
2. Prometheus targets show “connection refused”
Services must resolve each other by container name. Ensure all services are in the same Docker network. Docker Compose creates a default network. If services can’t reach each other, add networks: explicitly.
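If the default network turns out not to be enough, the services can be attached to a named network. The fragment below is a minimal sketch of that change to docker-compose.yml; the network name monitoring is an assumption chosen for illustration, and the remaining keys of each service from Step 2 are omitted.

```yaml
# docker-compose.yml (fragment): attach every service to one named network.
# "monitoring" is a hypothetical name; any name works as long as it matches
# the entry under the top-level networks: key.
services:
  prometheus:
    networks:
      - monitoring
  grafana:
    networks:
      - monitoring
  cadvisor:
    networks:
      - monitoring
  node_exporter:
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge
```

With all four services on the same network, names such as cadvisor:8080 and prometheus:9090 resolve through Docker's built-in DNS.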
3. Grafana “data source not found” error
The provisioning file references http://prometheus:9090. If Prometheus has a different container name, update the URL. Check Grafana logs: docker logs grafana.
4. Metrics show gaps in time series
Prometheus scrapes every 15s. If a gap appears, the target was unreachable for a scrape interval. Check network stability. For long gaps, increase scrape_timeout or check target load.
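As a sketch of the timeout adjustment, a per-job scrape_timeout can be set in prometheus/prometheus.yml. The value 14s is only an illustration, not a recommendation; the timeout must not exceed the scrape interval.

```yaml
# prometheus/prometheus.yml (fragment): longer timeout for a slow target.
scrape_configs:
  - job_name: 'cadvisor'
    scrape_interval: 15s
    scrape_timeout: 14s   # illustrative; default is 10s and it must be <= scrape_interval
    static_configs:
      - targets: ['cadvisor:8080']
```

If scrapes still time out at this setting, the target itself is probably overloaded, and raising the timeout further will not help.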
5. Alert rules never fire
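A condition must hold continuously for the full for duration, checked at each evaluation interval, before the alert fires. A spike shorter than 5 minutes won’t trigger HighCpuUsage. Use for: 0s for immediate alerts. Also verify that the expression returns results in the Prometheus expression browser and that the rule file is loaded at http://localhost:9090/rules.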
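One way to check that rules are loaded and evaluated at all is a temporary rule that is always true. The group and alert names below are hypothetical and not part of the original rule set; remove the rule once the test is done.

```yaml
# prometheus/alerts.yml (fragment): hypothetical always-firing test rule.
groups:
  - name: pipeline_test
    rules:
      - alert: AlwaysFiring
        expr: vector(1)   # always returns one sample, so the condition is always true
        for: 0s           # fire on the first evaluation, with no waiting period
        labels:
          severity: info
        annotations:
          summary: "Test alert: rule loading and evaluation work"
```

If this alert does not appear on the Prometheus Alerts page after a restart, the problem is in rule loading, not in the expressions of the real alerts.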
Practice Questions
1. What is the difference between cAdvisor and Node Exporter?
cAdvisor exposes per-container metrics (CPU, memory, network per container). Node Exporter exposes host-level metrics (total CPU, memory, disk, network interfaces). You need both for complete visibility.
2. Why does Prometheus use a pull model instead of push?
Pull (scraping) lets Prometheus control the collection interval, detect dead targets immediately, and avoid overwhelming the server during traffic spikes. Push-based systems like Graphite can be flooded.
3. How long does Prometheus retain data by default?
The default is 15 days. Our config sets 30 days with --storage.tsdb.retention.time=30d. For longer retention, add object storage (Thanos, Cortex) or increase disk.
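Retention can also be capped by size, which helps when disk space is the limiting factor. The fragment below is a sketch of the prometheus service's command list with a size limit added; the 20GB figure is an arbitrary illustration. When both limits are set, whichever is reached first applies.

```yaml
# docker-compose.yml (fragment of the prometheus service)
services:
  prometheus:
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
      - '--storage.tsdb.retention.size=20GB'   # illustrative cap on TSDB disk usage
```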
4. Challenge: Add Alertmanager
Add an alertmanager service to the compose file. Configure it to send Slack notifications when HighCpuUsage fires. Use Prometheus’s alerting.alertmanagers config. Test by running a CPU-stress container.
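A possible starting point for this challenge is sketched below as three fragments, one per file, separated by YAML document markers. The ./alertmanager directory, the receiver name slack, the channel, and the webhook URL are all placeholders or assumptions, not values from this tutorial.

```yaml
# docker-compose.yml (fragment): add under the existing services: key
services:
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      # ./alertmanager is a hypothetical directory next to ./prometheus
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - '9093:9093'
    restart: unless-stopped
---
# prometheus/prometheus.yml (fragment): point Prometheus at Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
---
# alertmanager/alertmanager.yml: route every alert to one Slack receiver
route:
  receiver: slack
receivers:
  - name: slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOUR-WEBHOOK'   # placeholder
        channel: '#alerts'    # hypothetical channel name
        send_resolved: true   # also notify when the alert clears
```

To send only HighCpuUsage to Slack, the route could be narrowed with a matcher on alertname; this sketch routes everything, to keep the first test simple.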
5. Challenge: Custom container labels
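cAdvisor exports Docker labels as container_label_<name> metric labels by default (--store_container_labels=true). To export only selected labels, set --store_container_labels=false and list them in --whitelisted_container_labels. Label your containers with service=api, tier=backend in docker-compose. Create a Grafana panel grouped by container_label_service.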
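The labeling step might look like the fragment below. The api service and its image are stand-ins for your own application, not part of the monitoring stack.

```yaml
# docker-compose.yml (fragment): "api" is a hypothetical application service
services:
  api:
    image: nginx:alpine   # stand-in image for illustration only
    labels:
      service: api
      tier: backend

# A Grafana panel query grouped by the exported label could then be:
#   sum(rate(container_cpu_usage_seconds_total{container_label_service!=""}[5m])) by (container_label_service)
```

The query in the trailing comment assumes the label reaches Prometheus as container_label_service; check the label names on cAdvisor's /metrics page before building the panel.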
Next Steps
- Add Alertmanager for Slack/PagerDuty notifications
- Explore Grafana Loki for log aggregation
- Containerize your own apps and monitor them with this stack
- Check the Docker fundamentals tutorial for deeper container knowledge
- Try the Kubernetes deployment tutorial for monitoring at scale
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro