Build a Complete Docker Monitoring Stack with Prometheus, Grafana, and cAdvisor (Step by Step)

DodaTech Updated Jun 20, 2026 6 min read

Build a complete Docker monitoring stack using Prometheus, Grafana, cAdvisor, and Node Exporter with docker-compose, custom dashboards, and alerting rules.

What You’ll Build

You’ll deploy a production-grade monitoring stack for Docker containers that collects CPU, memory, disk, and network metrics from every running container and the host machine, visualizes them in Grafana dashboards, and sends alerts when things go wrong. This is the same monitoring architecture used internally by DodaTech to track DodaZIP conversion servers and Durga Antivirus Pro update services.

Why Monitoring Stacks Matter

You cannot fix what you cannot see. Containerized applications are ephemeral: they start, crash, and restart without warning. A monitoring stack gives you visibility into resource usage and supports capacity planning and incident detection. Prometheus with Grafana is a widely adopted combination for cloud-native monitoring, used by organizations from startups to large enterprises.

Prerequisites

Docker and Docker Compose installed
Basic understanding of YAML
Ports 9090 (Prometheus), 3000 (Grafana), 8080 (cAdvisor), and 9100 (Node Exporter) free on the host

Step 1: Project Structure

mkdir docker-monitoring
cd docker-monitoring
mkdir -p prometheus grafana/provisioning/datasources grafana/provisioning/dashboards

Step 2: Docker Compose Configuration

# docker-compose.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
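      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      # Rule file referenced by rule_files in prometheus.yml
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml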
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'
    ports:
      - '9090:9090'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=admin
      - GF_INSTALL_PLUGINS=grafana-piechart-panel
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    ports:
      - '3000:3000'
    depends_on:
      - prometheus
    restart: unless-stopped

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - '8080:8080'
    privileged: true
    restart: unless-stopped

  node_exporter:
    image: prom/node-exporter:latest
    container_name: node_exporter
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    ports:
      - '9100:9100'
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

What each service does:

Prometheus — scrapes and stores metrics every 15 seconds, with 30-day retention
Grafana — visualizes metrics with dashboards, auto-provisions data sources
cAdvisor — Google’s container advisor, exposes per-container resource metrics
Node Exporter — exposes host-level metrics (CPU, memory, disk, network)

Step 3: Prometheus Configuration

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []
          # Add Alertmanager later, e.g. targets: ["alertmanager:9093"]

rule_files:
  - "alerts.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node_exporter:9100']

# prometheus/alerts.yml
groups:
  - name: docker_alerts
    rules:
      - alert: HighCpuUsage
        expr: sum(rate(container_cpu_usage_seconds_total{name!=""}[5m])) by (name) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} CPU usage above 80% of one core"

      - alert: HighMemoryUsage
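        # Containers without a memory limit report a limit of 0; the "> 0" filter
        # drops them so the division cannot yield +Inf and fire spuriously.
        expr: (container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0)) > 0.85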
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} memory usage above 85%"

      - alert: ContainerDown
        expr: time() - container_last_seen{name!=""} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} has not been seen for 60 seconds"

      - alert: NodeHighDiskUsage
        expr: (1 - (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"})) > 0.9
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Host disk usage above 90%"

Expected output: Prometheus scrapes metrics from all three targets (itself, cAdvisor, and Node Exporter) every 15 seconds and evaluates the alerting rules on the same interval. When a rule's condition holds continuously for the length of its for: setting, the alert fires.
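The same rule format can also watch the scrape targets themselves. The following is a minimal sketch and not part of the original rule set: it assumes the three job names from prometheus.yml and relies on the built-in up metric, which Prometheus sets to 0 whenever a scrape fails. The group name target_alerts and the alert name TargetDown are illustrative.

```yaml
# Illustrative rule group; merge it into the "groups:" list in prometheus/alerts.yml
groups:
  - name: target_alerts            # hypothetical group name
    rules:
      - alert: TargetDown          # hypothetical alert name
        # "up" is written by Prometheus for every scrape: 1 on success, 0 on failure
        expr: up{job=~"prometheus|cadvisor|node_exporter"} == 0
        # The condition must stay true for a full minute before the alert fires
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Scrape target {{ $labels.instance }} (job {{ $labels.job }}) is down"
```

Until Alertmanager is added, a firing alert of this kind is only visible on the Alerts page of the Prometheus UI.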

Step 4: Grafana Auto-Provisioning

# grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true

# grafana/provisioning/dashboards/dashboard.yml
apiVersion: 1

providers:
  - name: 'Docker Monitoring'
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    editable: true
    options:
      path: /etc/grafana/provisioning/dashboards

Step 5: Launch the Stack

docker-compose up -d

Verify each service:

Prometheus: http://localhost:9090/targets (all three targets should be UP)
cAdvisor: http://localhost:8080 (per-container metrics page)
Node Exporter: http://localhost:9100/metrics (raw host metrics)
Grafana: http://localhost:3000 (log in with admin/admin)

Step 6: Create a Dashboard in Grafana

Click + → Import Dashboard
Enter Dashboard ID 893 (Docker Monitoring) or 1860 (Node Exporter Full)
Select the Prometheus data source
Click Import

To build a custom panel manually:

Create → Dashboard → Add new panel
Query: rate(container_cpu_usage_seconds_total{name!=""}[1m])
Legend: {{ name }}
Unit: Percent (0.0-1.0); the value is in CPU cores, so a container using two full cores reads 200%
Panel type: Time series

Expected output: A dashboard showing CPU usage lines for each container, updating every 15 seconds.
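If the same CPU query ends up on several panels, it can be precomputed with a Prometheus recording rule so that Grafana reads a stored series instead of recalculating the rate on every refresh. This is a sketch under assumptions: the group name and the recorded series name are hypothetical (the name follows Prometheus's level:metric:operations naming convention), and the group would need to be merged into prometheus/alerts.yml or another file listed under rule_files.

```yaml
# Illustrative recording rule; merge into the "groups:" list of a loaded rule file
groups:
  - name: container_cpu_recording            # hypothetical group name
    rules:
      # Per-container CPU usage in cores, averaged over the last minute
      - record: name:container_cpu_usage_seconds:rate1m   # hypothetical series name
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[1m]))
```

A panel would then query name:container_cpu_usage_seconds:rate1m directly, with the same legend and unit settings as above.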

Architecture


graph TD
    subgraph "Host Machine"
        A[cAdvisor] -->|Per-container metrics, scraped every 15s| B[Prometheus]
        C[Node Exporter] -->|Host metrics, scraped every 15s| B
    end
    B -->|Store metrics| D[(Prometheus TSDB)]
    
    E[Grafana] -->|Query data source| B
    E -->|Dashboards| F[User Browser]
    
    B -->|Evaluate rules, send firing alerts| G[Alertmanager]
    G -->|Send alerts| H[Email / Slack / PagerDuty]
    
    style B fill:#e6562b,color:white
    style E fill:#f4722b,color:white
    style G fill:#e6562b,color:white

Common Errors

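1. cAdvisor fails with “cgroups: cannot find cgroup mount destination”. cAdvisor needs privileged access and read access to /sys/fs/cgroup. The compose file already sets privileged: true and mounts /sys read-only, which includes /sys/fs/cgroup. On some systems, also add pid: host to the cadvisor service (the Compose form of docker run --pid=host).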

2. Prometheus targets show “connection refused”. Prometheus reaches its targets by Compose service name (cadvisor, node_exporter), so all services must share a Docker network. Docker Compose creates a default network for the project. If services still can’t reach each other, define a network explicitly under networks: and attach every service to it.

3. Grafana “data source not found” error. The provisioning file points at http://prometheus:9090. If the Prometheus service has a different name, update the URL to match. Check the Grafana logs with docker logs grafana.

4. Metrics show gaps in time series. Prometheus scrapes every 15s, so a gap means the target was unreachable or too slow for at least one scrape. Check network stability first. For recurring gaps, check the load on the target or raise scrape_timeout (default 10s; it cannot exceed scrape_interval).

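5. Alert rules never fire. The condition must hold continuously for the full for: duration (plus up to one evaluation interval) before an alert fires, so a spike shorter than 5 minutes won’t trigger HighCpuUsage. Use for: 0s for immediate alerts. Also confirm that the rules are loaded at http://localhost:9090/rules and that the expression returns results in the Prometheus expression browser.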
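The fixes suggested for errors 1 and 2 can be kept out of the main compose file with an override file. The following is a sketch under assumptions: the network name monitoring is made up for illustration, and pid: host is only worth adding if cAdvisor still cannot read cgroup data.

```yaml
# docker-compose.override.yml (Compose merges this file automatically when it
# sits next to docker-compose.yml). The network name "monitoring" is an assumption.
version: '3.8'

services:
  prometheus:
    networks:
      - monitoring
  grafana:
    networks:
      - monitoring
  cadvisor:
    networks:
      - monitoring
    # Compose form of "docker run --pid=host"; only add this if error 1 persists
    pid: host
  node_exporter:
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge
```

Once a service lists networks explicitly it no longer joins the project's default network, which is why all four services are attached to the same one here.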

Practice Questions

1. What is the difference between cAdvisor and Node Exporter? cAdvisor exposes per-container metrics (CPU, memory, network per container). Node Exporter exposes host-level metrics (total CPU, memory, disk, network interfaces). You need both for complete visibility.

2. Why does Prometheus use a pull model instead of push? Pull (scraping) lets Prometheus control the collection interval, detect unreachable targets within one scrape interval (a failed scrape sets the target's up metric to 0), and avoid being overwhelmed during traffic spikes. Push-based systems like Graphite can be flooded.

3. How long does Prometheus retain data by default? The default is 15 days. Our config sets 30 days with --storage.tsdb.retention.time=30d. For longer retention, add object storage (Thanos, Cortex) or increase disk.

4. Challenge: Add Alertmanager. Add an alertmanager service to the compose file. Configure it to send Slack notifications when HighCpuUsage fires. Use Prometheus’s alerting.alertmanagers config. Test by running a CPU-stress container.

5. Challenge: Custom container labels. cAdvisor can expose Docker labels as metric labels prefixed with container_label_, controlled by its --store_container_labels and --whitelisted_container_labels flags. Label your containers with service=api and tier=backend in docker-compose.yml, then create a Grafana panel grouped by container_label_service.
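As a starting point for challenge 4, an Alertmanager configuration for Slack might look roughly like the sketch below. Everything specific in it is an assumption: the file path, the receiver name, the channel, and the webhook URL, which is a placeholder to be replaced with a real Slack incoming-webhook URL. It also assumes an alertmanager service (for example from the prom/alertmanager image, listening on port 9093) has been added to the compose file and listed as alertmanager:9093 under alerting.alertmanagers in prometheus.yml.

```yaml
# alertmanager/alertmanager.yml (hypothetical path; mount it into the container)
route:
  receiver: slack-notifications      # default receiver for every alert
  group_by: ['alertname', 'name']    # one notification per alert and container
  group_wait: 30s                    # wait briefly so related alerts are batched
  group_interval: 5m
  repeat_interval: 4h                # re-notify if the alert is still firing

receivers:
  - name: slack-notifications        # hypothetical receiver name
    slack_configs:
      # Placeholder URL; replace with your own Slack incoming webhook
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK'
        channel: '#alerts'           # assumed channel name
        send_resolved: true          # also notify when the alert clears
        title: '{{ .CommonAnnotations.summary }}'
```

The title reuses the summary annotation already defined on the rules in alerts.yml, so the Slack message names the affected container.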

FAQ

Can I use this stack in production?

Yes, with additions: Alertmanager for notifications, a strong Grafana admin password in place of the default admin/admin, and resource limits on the monitoring containers themselves. Grafana and Prometheus data already persist in the named volumes defined in the compose file. This stack handles a few dozen containers well. For hundreds, use Thanos for horizontal scaling.
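Two of those additions, resource limits and a non-default Grafana password, can be sketched as a second compose file layered on top of the original. The file name, the limit values, and the GRAFANA_ADMIN_PASSWORD variable are all assumptions chosen for illustration; tune the limits to your own workload.

```yaml
# docker-compose.prod.yml (hypothetical file name; limit values are illustrative)
# Apply with: docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
version: '3.8'

services:
  prometheus:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 2G
  grafana:
    environment:
      # Taken from the shell or an .env file; Compose aborts if the variable is unset
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD:?set GRAFANA_ADMIN_PASSWORD}
    deploy:
      resources:
        limits:
          memory: 256M
  cadvisor:
    deploy:
      resources:
        limits:
          memory: 256M
  node_exporter:
    deploy:
      resources:
        limits:
          memory: 128M
```

Compose v2 (docker compose) applies deploy.resources.limits to ordinary containers. The legacy docker-compose v1 binary ignores the deploy section unless it is run with --compatibility.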

How many resources does the monitoring stack use?

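As a rough guide, Prometheus uses ~1-2 GB RAM for 10k time series, cAdvisor ~50 MB per host, Node Exporter ~20 MB, and Grafana ~100 MB. Actual usage depends on the number of containers, the scrape interval, and retention, so allocate at least 2 GB RAM for the stack and measure from there.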

Can I monitor multiple Docker hosts?

Yes. Deploy cAdvisor and Node Exporter on each host. Add each host as a separate target in prometheus.yml using DNS or service discovery. Prometheus supports file-based, Consul, and Kubernetes service discovery.
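File-based service discovery is the simplest of these to try. The sketch below shows the two pieces involved as two YAML documents. The job name, the targets directory, the host label, and the example.internal hostnames are all placeholders, and the targets directory would also need to be mounted into the Prometheus container.

```yaml
# --- Addition to scrape_configs in prometheus/prometheus.yml ---
scrape_configs:
  - job_name: 'remote_hosts'                  # hypothetical job name
    file_sd_configs:
      - files:
          - '/etc/prometheus/targets/*.yml'   # assumed path inside the container
        refresh_interval: 1m                  # re-read the files at least this often
---
# --- prometheus/targets/host-a.yml (hypothetical target file, one per host) ---
- targets:
    - 'host-a.example.internal:8080'          # cAdvisor on the remote host (placeholder)
    - 'host-a.example.internal:9100'          # Node Exporter on the remote host
  labels:
    host: 'host-a'                            # extra label for filtering in Grafana
```

Adding a host then means dropping another target file into the directory; Prometheus picks it up without a restart.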

Next Steps

Add Alertmanager for Slack/PagerDuty notifications
Explore Grafana Loki for log aggregation
Containerize your own apps and monitor them with this stack
Check the Docker fundamentals tutorial for deeper container knowledge
Try the Kubernetes deployment tutorial for monitoring at scale
