Learn Linux: Monitoring and Logging — syslog, journalctl, logrotate, Prometheus, Grafana

DodaTech Updated Jun 20, 2026 10 min read

Monitoring and logging are the eyes and ears of every production system. This guide covers traditional syslog, systemd’s journalctl, log management with logrotate, and modern metrics monitoring with Prometheus and Grafana.

What You’ll Learn

You’ll configure rsyslog for centralized log collection, query logs with journalctl, automate log rotation with logrotate, collect system metrics with Prometheus node_exporter, and visualize everything in Grafana dashboards. DodaZIP uses Prometheus+Grafana for real-time compression cluster monitoring, and Durga Antivirus Pro uses centralized syslog for threat event correlation.

Why Monitoring and Logging Matter

You can’t fix what you can’t see. Logs tell you what happened after an outage; metrics tell you what’s happening right now. Together, they provide observability — the ability to understand a system’s internal state from its external outputs. Without monitoring, you’re flying blind.

Learning Path

    flowchart LR
  A[Shell Scripting] --> B[Monitoring & Logging<br/>You are here]
  B --> C[Infrastructure Automation]
  C --> D[Container Orchestration]
  style B fill:#f90,color:#fff

syslog and rsyslog

Syslog is the standard logging protocol on Linux. Rsyslog is a high-performance implementation of it and the default syslog daemon on many distributions.

Rsyslog Configuration

# Main configuration
cat /etc/rsyslog.conf

# Common log facilities and priorities
# auth, authpriv, cron, daemon, kern, lpr, mail, news, syslog, user, uucp, local0-local7
# debug, info, notice, warning, err, crit, alert, emerg

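# Default log destinations (file location and log paths vary by distribution)
# RHEL/CentOS/Fedora define them in /etc/rsyslog.conf:
# *.info;mail.none;authpriv.none;cron.none   /var/log/messages
# authpriv.*                                  /var/log/secure
# mail.*                                      /var/log/maillog
# cron.*                                      /var/log/cron
# *.emerg                                     :omusrmsg:*
# Debian/Ubuntu define them in /etc/rsyslog.d/50-default.conf and write to
# /var/log/syslog and /var/log/auth.log instead:
cat /etc/rsyslog.d/50-default.conf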
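Every syslog message carries its facility and priority as a single number, PRI = facility * 8 + severity, which is the value that selectors such as authpriv.* or *.emerg are matched against. The sketch below works through that arithmetic using the standard syslog numbering; the function names are illustrative and not part of rsyslog.

```shell
# Map a facility name to its numeric code (standard syslog numbering).
facility_code() {
    case "$1" in
        kern) echo 0 ;; user) echo 1 ;; mail) echo 2 ;; daemon) echo 3 ;;
        auth) echo 4 ;; syslog) echo 5 ;; lpr) echo 6 ;; news) echo 7 ;;
        uucp) echo 8 ;; cron) echo 9 ;; authpriv) echo 10 ;;
        local[0-7]) echo $((16 + ${1#local})) ;;   # local0..local7 -> 16..23
        *) return 1 ;;
    esac
}

# Map a priority (severity) name to its numeric code; lower is more severe.
severity_code() {
    case "$1" in
        emerg) echo 0 ;; alert) echo 1 ;; crit) echo 2 ;; err) echo 3 ;;
        warning) echo 4 ;; notice) echo 5 ;; info) echo 6 ;; debug) echo 7 ;;
        *) return 1 ;;
    esac
}

# PRI = facility * 8 + severity
syslog_pri() {
    f=$(facility_code "$1") || return 1
    s=$(severity_code "$2") || return 1
    echo $((f * 8 + s))
}

syslog_pri local0 err   # prints 131
syslog_pri auth crit    # prints 34
```

To generate a real message with a given facility and priority for testing your rules, `logger -p local0.err -t myapp "test message"` is the usual tool.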

Centralized Logging Server

# On the log server (receiver)
# /etc/rsyslog.d/server.conf
module(load="imtcp")
module(load="imudp")
input(type="imtcp" port="514")
input(type="imudp" port="514")

# Template for organizing logs by host
$template RemoteLogs,"/var/log/remote/%HOSTNAME%/%PROGRAMNAME%.log"
*.* ?RemoteLogs

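# On client machines (senders)
# /etc/rsyslog.d/client.conf
# Use ONE of the two rules below; enabling both sends every message twice.
# TCP (reliable delivery):
*.* @@logserver.dodatech.com:514
# UDP (lower overhead, but messages can be lost):
# *.* @logserver.dodatech.com:514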

# Restart
sudo systemctl restart rsyslog

journalctl — Systemd Logging

Journald collects logs from all systemd services, the kernel, and syslog:

# View all logs
journalctl

# Follow new logs (like tail -f)
journalctl -f

# Last N lines
journalctl -n 50

# Logs since boot
journalctl -b

# Previous boot logs
journalctl -b -1

# Logs for a specific service
journalctl -u nginx
journalctl -u sshd.service

# Logs for a specific time
journalctl --since "1 hour ago"
journalctl --since "2026-06-20" --until "2026-06-21"

# Logs by priority
journalctl -p err               # Errors and above
journalctl -p warning           # Warnings and above

# JSON output
journalctl -o json-pretty

# Show only kernel messages
journalctl -k

# Export to file
journalctl -u myapp > /tmp/myapp.log

# Show disk usage
journalctl --disk-usage

# Vacuum (clean old logs)
sudo journalctl --vacuum-time=30d     # Keep 30 days
sudo journalctl --vacuum-size=500M    # Keep 500MB max

Expected journalctl --disk-usage:

Archived and active journals take up 256.0M in the file system.
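The JSON output mode is handy for quick summaries without extra tooling. As a rough sketch, the hypothetical helper below tallies entries by their PRIORITY field from `journalctl -o json` output (one JSON object per line). It assumes the field appears as "PRIORITY":"N", with or without spaces around the colon, and silently skips entries that have no priority.

```shell
# Count journal entries per priority level from JSON lines on stdin.
count_by_priority() {
    grep -oE '"PRIORITY" ?: ?"[0-7]"' |   # isolate the PRIORITY field
        grep -oE '[0-7]"$' |              # keep the digit and closing quote
        tr -d '"' |                       # drop the quote
        sort | uniq -c |                  # tally each level
        awk '{ print $2 "=" $1 }'         # format as level=count
}

# Demo with three hand-written sample entries
printf '%s\n' \
    '{"PRIORITY":"3","MESSAGE":"Connection refused"}' \
    '{"PRIORITY":"6","MESSAGE":"Started nginx"}' \
    '{"PRIORITY":"3","MESSAGE":"Timeout"}' | count_by_priority
# prints:
# 3=2
# 6=1
```

On a real system the input would come from something like `journalctl -b -o json | count_by_priority`. Level 3 is err and level 6 is info, matching the names accepted by `journalctl -p`.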

Journald Configuration

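# /etc/systemd/journald.conf
# Comments must sit on their own lines; systemd treats a trailing "# ..." as part of the value.
[Journal]
# Logs survive reboot
Storage=persistent
# Compress large journal entries
Compress=yes
# Cryptographic sealing (takes effect once keys exist, see journalctl --setup-keys)
Seal=yes
# Split journal files by user
SplitMode=uid
SyncIntervalSec=5m
RateLimitIntervalSec=30s
RateLimitBurst=10000
# Max 1GB for system logs
SystemMaxUse=1G
# Rotate monthly
MaxFileSec=1month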

logrotate — Log Rotation

Logrotate prevents logs from consuming all disk space by rotating, compressing, and deleting old logs:

# Main configuration
cat /etc/logrotate.conf

# Default settings
# rotate 4           # Keep 4 old logs
# weekly             # Rotate weekly
# create             # Create new log after rotation
# compress           # Compress old logs (gzip)
# include /etc/logrotate.d  # Include individual configs

Example logrotate Configurations

# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 640 nginx adm
    sharedscripts
    postrotate
        [ -f /var/run/nginx.pid ] && kill -USR1 `cat /var/run/nginx.pid`
    endscript
}

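# /etc/logrotate.d/myapp
/var/log/myapp/*.log {
    daily
    rotate 30
    compress
    # Rotate whenever logrotate runs and the log exceeds 100MB, even if a day has not passed
    maxsize 100M
    # On the daily schedule, only rotate logs larger than 1MB
    minsize 1M
    missingok
    notifempty
    # Add date to rotated file name
    dateext
    create 640 myapp myapp
    # Run postrotate once for all matched logs, not once per file
    sharedscripts
    postrotate
        systemctl reload myapp
    endscript
}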

Testing Logrotate

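# Debug mode: a dry run that prints what would happen without rotating anything
sudo logrotate -d /etc/logrotate.conf

# Force rotation now, even if the logs are not due
sudo logrotate -f /etc/logrotate.conf

# Dry-run a single config file
sudo logrotate -d /etc/logrotate.d/nginx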

Expected logrotate -d /etc/logrotate.d/nginx output:

reading config file /etc/logrotate.d/nginx
Allocating hash table for state file, size 15360 B

Handling 1 logs

rotating pattern: /var/log/nginx/*.log  daily (14 rotations)
empty log files are not rotated, old logs are removed
considering log /var/log/nginx/access.log
  Now: 2026-06-20 10:00
  Last rotated at 2026-06-19 10:00
  log needs rotating
rotating log /var/log/nginx/access.log, log->rotateCount is 14
...

Prometheus Node Exporter

Node exporter exposes system metrics (CPU, memory, disk, network) in Prometheus format:

# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.0/node_exporter-1.8.0.linux-amd64.tar.gz
tar xzf node_exporter-1.8.0.linux-amd64.tar.gz
sudo mv node_exporter-1.8.0.linux-amd64/node_exporter /usr/local/bin/

# Create systemd service
# The heredoc delimiter is quoted so the shell does not expand $$ to its own PID
sudo tee /etc/systemd/system/node_exporter.service <<'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
Type=simple
User=node_exporter
Group=node_exporter
ExecStart=/usr/local/bin/node_exporter \
    --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/) \
    --collector.textfile.directory=/var/lib/node_exporter/textfile
Restart=always

[Install]
WantedBy=multi-user.target
EOF

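# Create user and textfile directory, then start
sudo useradd -rs /bin/false node_exporter
sudo mkdir -p /var/lib/node_exporter/textfile
sudo chown -R node_exporter:node_exporter /var/lib/node_exporter
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter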

Verify Metrics

# Check metrics endpoint
curl -s http://localhost:9100/metrics | head -20

Expected output:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 1234567.89
node_cpu_seconds_total{cpu="0",mode="system"} 23456.78
node_cpu_seconds_total{cpu="0",mode="user"} 123456.78
# HELP node_disk_read_bytes_total The total number of bytes read successfully.
# TYPE node_disk_read_bytes_total counter
node_disk_read_bytes_total{device="sda"} 1234567890123
# HELP node_filesystem_avail_bytes Filesystem space available to non-root users.
# TYPE node_filesystem_avail_bytes gauge
node_filesystem_avail_bytes{device="/dev/sda1",fstype="ext4",mountpoint="/"} 123456789012
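The metrics endpoint returns plain text, so individual series are easy to pull out with standard tools. The following is a minimal sketch of a filter (the name `metric_samples` is made up for this example) that prints every sample of one metric. It assumes the value is the last field on the line, which holds for node_exporter output but not for exporters that append timestamps.

```shell
# Print "series<TAB>value" for each sample of the metric named in $1.
# Reads Prometheus text exposition format on stdin.
metric_samples() {
    awk -v name="$1" '
        /^#/ || NF < 2 { next }                 # skip HELP/TYPE comments and blanks
        {
            value = $NF                          # last field is the sample value
            series = $0
            sub(/[ \t]+[^ \t]+$/, "", series)    # strip the value, keep name{labels}
            base = series
            sub(/[{].*/, "", base)               # strip labels to get the bare name
            if (base == name) print series "\t" value
        }'
}

# Demo on a few sample lines
printf '%s\n' \
    '# TYPE node_cpu_seconds_total counter' \
    'node_cpu_seconds_total{cpu="0",mode="idle"} 1234567.89' \
    'node_cpu_seconds_total{cpu="0",mode="user"} 123456.78' \
    'node_load1 0.42' | metric_samples node_cpu_seconds_total
```

Against a live exporter this would be used as `curl -s http://localhost:9100/metrics | metric_samples node_load1`.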

Prometheus Configuration

# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:
        - 'web-01.dodatech.com:9100'
        - 'web-02.dodatech.com:9100'
        - 'db-01.dodatech.com:9100'
        - 'app-01.dodatech.com:9100'

Grafana Dashboards

Grafana visualizes Prometheus metrics in customizable dashboards:

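# Install Grafana (Debian/Ubuntu)
# apt-key is deprecated, so the signing key goes into a dedicated keyring instead
sudo apt-get install -y apt-transport-https wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install -y grafana

sudo systemctl enable --now grafana-server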

Useful PromQL Queries for Dashboards

# CPU usage percentage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage percentage
100 * (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)

# Disk usage percentage
100 * (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"})
    / node_filesystem_size_bytes{mountpoint="/"}

# Network receive rate (bytes/sec)
rate(node_network_receive_bytes_total{device="eth0"}[5m])

# Disk I/O rate
rate(node_disk_read_bytes_total{device="sda"}[5m])

# System load average (15-minute)
node_load15

# Uptime in days
(time() - node_boot_time_seconds) / 86400
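To see what the CPU query computes, it helps to do the arithmetic by hand for one core. The sketch below is a simplification: it takes two readings of the idle counter and the seconds between them, whereas the real rate() also handles counter resets and extrapolates to the window edges. The function names are illustrative only.

```shell
# CPU usage % from two idle-counter readings taken $3 seconds apart.
# (end - start) / seconds is the fraction of time spent idle.
cpu_usage_pct() {
    awk -v a="$1" -v b="$2" -v t="$3" \
        'BEGIN { printf "%.1f\n", 100 - ((b - a) / t) * 100 }'
}

# Uptime in days from the current time and the boot time (both Unix seconds).
uptime_days() {
    awk -v now="$1" -v boot="$2" \
        'BEGIN { printf "%.1f\n", (now - boot) / 86400 }'
}

cpu_usage_pct 1000 1270 300          # idle for 270 of 300 seconds, prints 10.0
uptime_days 1781949600 1781517600    # 432000 seconds, prints 5.0
```

The same pattern applies to the other rate() queries: a counter's increase divided by the window length gives a per-second rate.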

Import Node Exporter Full Dashboard

Grafana dashboard ID 1860 is the standard Node Exporter Full dashboard. Import it via:

Grafana UI → Dashboards → New → Import → Enter 1860 → Load (in older Grafana versions: Create → Import)

Setting Up Alerts

# prometheus-alert.yml — Alert rules
# Prometheus only loads this file if its path is listed under rule_files in prometheus.yml
groups:
  - name: node_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been down for more than 5 minutes."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 10 minutes."

      - alert: DiskSpaceLow
        expr: 100 * (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_avail_bytes{mountpoint="/"})
          / node_filesystem_size_bytes{mountpoint="/"} > 90
        for: 5m
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk usage is above 90%."

Common Monitoring Mistakes

1. Not Setting Retention Policies

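Logs and metrics grow without bounds. Set logrotate for logs and the --storage.tsdb.retention.time flag (default 15d) or --storage.tsdb.retention.size for Prometheus. Grafana stores dashboards, not metric data, so retention is controlled at the data source. Monitor storage usage.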

2. Monitoring Everything

More metrics don’t equal better observability. Focus on the Four Golden Signals: latency, traffic, errors, and saturation. Start with 10 key metrics and expand as needed.

3. No Alert Fatigue Management

Too many alerts cause alert fatigue — critical alerts get ignored. Set appropriate thresholds, use for: duration to avoid flapping, and route alerts by severity.

4. Ignoring Log Compression

Text logs compress well, so uncompressed logs can take roughly 5-10x more disk space than gzipped ones. Enable compression in logrotate and journald. Archive old logs to cold storage.

5. Scraping Too Frequently

Scraping 1000 targets every 5 seconds generates significant load. The 15s interval used in this guide is usually sufficient (Prometheus's built-in default is 1m). Adjust based on how quickly your metrics change.

6. Not Having a Logging Strategy

Logs are useless if you can’t find them in a crisis. Structure logs as JSON with consistent fields: timestamp, level, service, message, request_id. Use centralized logging for multi-server environments.

7. Dashboard Sprawl

Hundreds of dashboards nobody maintains or looks at. Keep dashboards focused — one per service or team. Archive unused dashboards.
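The cost described under "Scraping Too Frequently" can be estimated with simple arithmetic: ingested samples per second is roughly targets times series per target, divided by the scrape interval. In the sketch below, the figure of 1000 series per target is an assumption chosen for illustration; check your own exporters for the real number.

```shell
# Approximate samples ingested per second (integer division).
# $1 = number of targets, $2 = series per target (assumed), $3 = scrape interval in seconds
samples_per_second() {
    echo $(( $1 * $2 / $3 ))
}

samples_per_second 1000 1000 5    # prints 200000
samples_per_second 1000 1000 15   # prints 66666
```

Moving from 5s to 15s cuts ingestion to a third, which is why the interval is usually the first setting to revisit when Prometheus is under load.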

Practice Questions

1. What’s the difference between syslog and journald? Syslog is the traditional text-based logging system, using rsyslog for routing. Journald is systemd’s structured binary logging with metadata (PID, priority, boot ID). They can work together — journald forwards to rsyslog.

2. How does logrotate prevent disk full issues? It rotates logs (renames + creates new), compresses old logs, removes logs older than the retention period, and can trigger reload signals to services so they write to the new file.

3. What’s the difference between Prometheus and Graphite? Prometheus pulls metrics via HTTP (pull model), has built-in alerting, and uses dimensional labels. Graphite uses push model with dot-separated metric names. Prometheus is more modern and widely adopted.

4. What are the Four Golden Signals of monitoring? Latency (response time), traffic (requests per second), errors (error rate), saturation (resource utilization). These four signals give a complete picture of system health.

5. Challenge: A server runs out of disk space every 3 months due to logs. Design a comprehensive log management strategy that prevents this and provides 90-day retention. Answer: Configure logrotate for all services with weekly rotation, 14 compressed copies, and maxsize limits. Set journald SystemMaxUse=2G. Add Prometheus node_exporter alert at 80% disk usage. Archive logs older than 30 days to S3/Glacier for remaining retention.
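For the challenge, a quick way to sanity-check a logrotate policy against a retention target is to multiply the rotation interval by the rotate count. This sketch uses a hypothetical helper and approximates a month as 30 days. Note that size-based triggers such as maxsize can rotate logs early, so real retention may be shorter than the figure computed here.

```shell
# Days of history kept by a time-based logrotate policy.
# $1 = interval (daily|weekly|monthly), $2 = rotate count
retention_days() {
    case "$1" in
        daily) d=1 ;;
        weekly) d=7 ;;
        monthly) d=30 ;;   # approximation
        *) return 1 ;;
    esac
    echo $(( d * $2 ))
}

retention_days weekly 14   # prints 98, which covers a 90-day target
retention_days weekly 12   # prints 84, which falls short of 90 days
retention_days daily 30    # prints 30
```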

Mini Project: Server Monitoring Stack

Create a script that sets up a complete monitoring stack:

#!/bin/bash
# setup_monitoring.sh — Install and configure monitoring stack
# Usage: sudo ./setup_monitoring.sh

set -euo pipefail

echo "=== Setting Up Monitoring Stack ==="

# 1. Install and configure node_exporter
echo "Installing node_exporter..."
NODE_EXPORTER_VERSION="1.8.0"
cd /tmp
wget -q "https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
tar xzf "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz"
sudo mv "node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64/node_exporter" /usr/local/bin/

sudo useradd -rs /bin/false node_exporter 2>/dev/null || true

# Create systemd service
cat << 'SERVICEEOF' | sudo tee /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
Type=simple
User=node_exporter
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
SERVICEEOF

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter

# 2. Install Prometheus
echo "Installing Prometheus..."
PROMETHEUS_VERSION="2.53.0"
cd /tmp
wget -q "https://github.com/prometheus/prometheus/releases/download/v${PROMETHEUS_VERSION}/prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
tar xzf "prometheus-${PROMETHEUS_VERSION}.linux-amd64.tar.gz"
sudo mv "prometheus-${PROMETHEUS_VERSION}.linux-amd64" /etc/prometheus

# Create prometheus config
cat << 'PROMEOF' | sudo tee /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
PROMEOF

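sudo useradd -rs /bin/false prometheus 2>/dev/null || true
# Prometheus runs as an unprivileged user, so its data directory must exist and be writable
sudo mkdir -p /var/lib/prometheus
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus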

cat << 'SERVICEEOF' | sudo tee /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/etc/prometheus/prometheus --config.file=/etc/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus/
Restart=always

[Install]
WantedBy=multi-user.target
SERVICEEOF

sudo systemctl daemon-reload
sudo systemctl enable --now prometheus

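# 3. Install Grafana
echo "Installing Grafana..."
# apt-key is deprecated, so the signing key goes into a dedicated keyring instead
sudo apt-get install -y -qq apt-transport-https wget gnupg
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list > /dev/null
sudo apt-get update -qq
sudo apt-get install -y -qq grafana
sudo systemctl enable --now grafana-server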

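# 4. Configure logrotate
# Scope the pattern to your application's own directory (/var/log/myapp is an example path).
# A catch-all such as /var/log/*.log overlaps files already covered by distribution
# configs, and logrotate rejects duplicate log entries.
cat << 'LOGEOF' | sudo tee /etc/logrotate.d/custom
/var/log/myapp/*.log {
    weekly
    rotate 12
    compress
    delaycompress
    missingok
    notifempty
    create 640 root root
}
LOGEOF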

echo ""
echo "=== Monitoring Stack Installed ==="
echo "Node Exporter: http://localhost:9100/metrics"
echo "Prometheus: http://localhost:9090"
echo "Grafana: http://localhost:3000 (admin/admin)"
echo ""
echo "Add the Prometheus data source in Grafana to start graphing."
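After the script finishes, it is worth confirming that each service actually answers. The following is a small sketch of a post-install check, assuming curl is installed and the default ports from the script are in use. `check_endpoint` and `check_stack` are names made up for this example; /-/healthy and /api/health are the health endpoints of Prometheus and Grafana respectively.

```shell
# Report whether an HTTP endpoint responds successfully within 3 seconds.
# $1 = label, $2 = URL
check_endpoint() {
    if curl -fsS --max-time 3 -o /dev/null "$2" 2>/dev/null; then
        echo "OK   $1 ($2)"
    else
        echo "FAIL $1 ($2)"
        return 1
    fi
}

# Check all three services; returns non-zero if any of them failed.
check_stack() {
    rc=0
    check_endpoint node_exporter http://localhost:9100/metrics || rc=1
    check_endpoint prometheus    http://localhost:9090/-/healthy || rc=1
    check_endpoint grafana       http://localhost:3000/api/health || rc=1
    return "$rc"
}
```

Calling `check_stack` at the end of setup_monitoring.sh (or from cron) gives a one-line status per service and a non-zero exit code when something is down.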

FAQ

What’s the difference between monitoring and observability?

Monitoring tells you if something is wrong (known unknowns). Observability lets you understand why (unknown unknowns). Monitoring uses predefined dashboards and alerts. Observability provides the tools to explore and question.

How much log retention do I need?

Depends on compliance requirements. Minimum: 30 days for operational debugging, 90 days for security analysis, 1+ year for compliance (PCI-DSS, SOC2, HIPAA). Archive older logs to cold storage.

Should I use a SaaS monitoring solution?

SaaS (Datadog, New Relic, Grafana Cloud) reduces operational overhead but costs scale with data volume. Self-hosted Prometheus+Grafana is cost-effective for predictable workloads.

How do I monitor Docker containers?

Use cAdvisor for container metrics, Prometheus with cAdvisor as a target, and Grafana dashboards. For Kubernetes, use Prometheus Operator with kube-state-metrics and node_exporter.

What’s the best alerting strategy?

Use the “fire” vs “water” approach: critical alerts page someone immediately (fire), warnings go to a dashboard that’s checked daily (water). Avoid alert fatigue by tuning thresholds.

How do I structure structured logging?

Use JSON format with consistent field names:

{"timestamp": "2026-06-20T10:00:00Z", "level": "ERROR", "service": "api", "message": "Connection refused", "request_id": "abc123"}

Your logging library likely has JSON formatting built in.
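For shell scripts that have no logging library, the same structure can be produced by hand. This is a minimal sketch using the field names from the example above; `log_json` and `json_escape` are hypothetical helpers, and the escaping only covers backslashes and double quotes, not control characters such as newlines.

```shell
# Escape backslashes and double quotes so the value is safe inside a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Emit one JSON log line with a UTC timestamp.
# $1 = level, $2 = service, $3 = message, $4 = request_id
log_json() {
    printf '{"timestamp": "%s", "level": "%s", "service": "%s", "message": "%s", "request_id": "%s"}\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$(json_escape "$1")" \
        "$(json_escape "$2")" \
        "$(json_escape "$3")" \
        "$(json_escape "$4")"
}

log_json ERROR api "Connection refused" abc123
```

Piping the output to `logger -t api` would hand each line to syslog or journald, so the JSON ends up in the centralized logs described earlier.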

What’s Next

Networking Commands Deep Dive

Shell Scripting Guide

Cron Jobs

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.
