Learn DevOps: Centralized Logging: ELK Stack, Loki, and Best Practices

DodaTech Updated Jun 20, 2026 8 min read

Centralized logging aggregates logs from every server, container, and service into a single searchable platform — enabling debugging, auditing, and alerting across your entire infrastructure.

What You’ll Learn

Log aggregation architecture with Elasticsearch, Logstash, Kibana (ELK)
Setting up Grafana Loki for cloud-native logging
Structured logging with JSON format and log levels
Log rotation, retention policies, and cost optimization strategies

Why Centralized Logging Matters

When you have 50 microservices spread across 20 servers, logs are scattered everywhere. Debugging an issue means SSHing into multiple machines, grepping through files, and correlating timestamps manually. Centralized logging brings all logs into one place with full-text search, filtering, and alerting. DodaTech uses the ELK Stack for Durga Antivirus Pro’s backend services — every API call, database query, and error is indexed and searchable within seconds.

    flowchart LR
    A[Monitoring Basics] --> B[Centralized Logging]
    B --> C[Log Shipping]
    B --> D[Indexing & Storage]
    B --> E[Search & Visualization]
    B --> F[Alerting]
    C --> G[Filebeat / Fluentd]
    D --> H[Elasticsearch / Loki]
    E --> I[Kibana / Grafana]
    style B fill:#005571,color:#fff

Prerequisites: Basic Linux administration and Bash skills. Familiarity with Prometheus and Grafana is helpful.

ELK Stack Architecture

The ELK Stack consists of Elasticsearch (storage and search), Logstash (processing), and Kibana (visualization). Filebeat ships logs from servers.

Docker Compose Setup

# docker-compose.elk.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - xpack.security.enabled=false  # local testing only; keep security enabled in production
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200

volumes:
  esdata:

Logstash Configuration

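# logstash.conf
input {
  beats {
    port => 5000
  }
}

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}" }
    # Replace the raw line with the captured text; without this, "message"
    # becomes an array holding both the original and the captured value
    overwrite => ["message"]
  }

  # Use the parsed time as the event time (@timestamp) instead of ingest time.
  # ISO8601 expects the T-separated form; add Joda patterns for other layouts.
  date {
    match => ["timestamp", "ISO8601"]
  }

  if [level] == "ERROR" {
    mutate { add_tag => ["error"] }
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    # One index per day, e.g. logs-2024.06.20
    index => "logs-%{+YYYY.MM.dd}"
  }
}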

Output: Logs flow from Filebeat → Logstash → Elasticsearch. Logstash parses, enriches, and structures logs before indexing. Kibana queries Elasticsearch for visualization.
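As a rough illustration of the filter stage, the Python sketch below approximates what the grok pattern and the ERROR conditional in logstash.conf do to a single line. The regular expression is a simplified stand-in for the TIMESTAMP_ISO8601, LOGLEVEL, and GREEDYDATA patterns (the real grok patterns accept more formats), and parse_line is a hypothetical helper, not part of Logstash.

```python
import re

# Simplified stand-ins for the grok patterns used in logstash.conf:
#   TIMESTAMP_ISO8601 -> an ISO-8601-like date and time
#   LOGLEVEL          -> a handful of common level names
#   GREEDYDATA        -> the rest of the line
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)"
    r"\s+(?P<level>TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL|CRITICAL)"
    r"\s+(?P<message>.*)$"
)


def parse_line(line: str) -> dict:
    """Approximate the grok + conditional filter: parse fields, then tag errors."""
    match = LINE_RE.match(line)
    if match is None:
        # Logstash's grok filter adds this tag when its pattern does not match.
        return {"message": line, "tags": ["_grokparsefailure"]}
    event = match.groupdict()
    event["tags"] = ["error"] if event["level"] == "ERROR" else []
    return event


print(parse_line("2024-06-20T10:30:00Z ERROR Database connection timeout"))
print(parse_line("not a structured line"))
```

Lines that fail to parse are kept rather than dropped, mirroring how Logstash still indexes unmatched events with a failure tag you can search for.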

Shipping Logs with Filebeat

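# filebeat.yml
filebeat.inputs:
  # The "log" input is deprecated since Filebeat 7.16 in favor of "filestream";
  # it still works in 8.x, but new setups should prefer filestream
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
      - /var/log/nginx/access.log

# Filebeat allows exactly one enabled output.
# "logstash" resolves inside the Compose network; from the host use localhost:5000
output.logstash:
  hosts: ["logstash:5000"]

# Alternative: ship directly to Elasticsearch (disable output.logstash first)
# output.elasticsearch:
#   hosts: ["http://elasticsearch:9200"]

# Start Filebeat with custom config (-e logs to stderr)
filebeat -e -c filebeat.yml

# Illustrative output (abridged; exact wording and format vary by Filebeat version):
# 2024/01/15 10:30:00.123456 beat is running
# 2024/01/15 10:30:01.234567 Harvester started for file: /var/log/nginx/access.log
# 2024/01/15 10:30:01.345678 Events sent: 47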
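Filebeat does not resend a whole file after a restart: each harvester reads a file, and the registry records how far it got. The Python sketch below illustrates that idea with a plain byte offset and a hypothetical read_new_lines helper. It is a simplification; Filebeat’s real registry also tracks file identity (for example inode and device) so it can follow rotated files, which this sketch ignores.

```python
import os
import tempfile


def read_new_lines(path: str, offset: int) -> tuple[list[str], int]:
    """Return complete lines written after `offset`, plus the new offset.

    A trailing partial line (no newline yet) is left for the next call,
    so a half-written event is never shipped.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    last_newline = data.rfind(b"\n")
    if last_newline == -1:
        return [], offset
    complete = data[: last_newline + 1]
    lines = complete.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(complete)


# Demo with a temporary file standing in for /var/log/app.log
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app.log")
    with open(log_path, "wb") as f:
        f.write(b"first event\nsecond event\npartial")
    lines, offset = read_new_lines(log_path, 0)
    print(lines, offset)  # ['first event', 'second event'] 25

    with open(log_path, "ab") as f:
        f.write(b" line done\nthird event\n")
    lines, offset = read_new_lines(log_path, offset)
    print(lines, offset)  # ['partial line done', 'third event'] 55
```

Persisting the offset (to disk, as Filebeat does with its registry) is what makes restarts resume instead of re-shipping or skipping lines.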

Grafana Loki

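Loki is a log aggregation system from Grafana Labs, inspired by Prometheus and well suited to Kubernetes and other cloud-native environments. Unlike Elasticsearch, Loki indexes only metadata labels (such as job or namespace), not the full log content. That makes ingestion and storage much cheaper, at the cost of slower full-text searches, because Loki must scan the raw lines of every stream a query selects.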
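The trade-off is easier to see in miniature. In the toy model below (not Loki’s actual storage format), lines are grouped into streams keyed by their label set, and only those label sets are indexed. A query first picks streams by label, which is cheap, and then scans every line in them, which is what a |= line filter costs. The push and query helpers are hypothetical.

```python
from collections import defaultdict

# Toy model: the "index" maps a frozen label set to the raw lines of that stream.
streams: dict[frozenset, list[str]] = defaultdict(list)


def push(labels: dict[str, str], line: str) -> None:
    streams[frozenset(labels.items())].append(line)


def query(selector: dict[str, str], contains: str) -> list[str]:
    """Select streams whose labels include `selector`, then brute-force scan lines."""
    wanted = set(selector.items())
    results = []
    for label_set, lines in streams.items():
        if wanted <= label_set:  # label matching: uses the small index
            # content matching: every line in the stream is scanned, nothing indexed
            results.extend(line for line in lines if contains in line)
    return results


push({"job": "nginx", "host": "web-1"}, "GET /api 200")
push({"job": "nginx", "host": "web-2"}, "GET /api 500 ERROR upstream timeout")
push({"job": "api"}, "ERROR req-abc123 failed")
print(query({"job": "nginx"}, "ERROR"))  # ['GET /api 500 ERROR upstream timeout']
```

This is also why Loki guidance favors a small number of low-cardinality labels: every distinct label set creates a new stream.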

# docker-compose.loki.yml
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log
      - ./promtail.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml

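  grafana:
    image: grafana/grafana:latest
    environment:
      # Anonymous admin access so Explore works without login: local testing only
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    ports:
      - "3000:3000"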

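# promtail.yml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Where Promtail records how far it has read each file
positions:
  filename: /tmp/positions.yaml

# Push endpoint of the Loki service from docker-compose.loki.yml
clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log

  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log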

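Output: Promtail reads log files, attaches labels, and pushes them to Loki. In Grafana, add Loki as a data source (URL http://loki:3100), then query it from the Explore tab with LogQL, a PromQL-inspired query language for logs. Grafana Labs has since deprecated Promtail in favor of Grafana Alloy; the configuration here targets the 2.9 images used above.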
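Promtail is only one way to get lines into Loki: anything that can POST to Loki’s HTTP push endpoint, /loki/api/v1/push, works. The sketch below builds the JSON body that endpoint accepts, one stream per label set with [timestamp, line] pairs where the timestamp is a string of Unix epoch nanoseconds. The URL assumes the Compose setup above with port 3100 published on localhost, and build_push_payload and push_to_loki are hypothetical helper names.

```python
import json
import time
import urllib.request


def build_push_payload(labels: dict[str, str], lines: list[str]) -> dict:
    """One stream (one label set) with one [timestamp_ns, line] pair per entry."""
    now_ns = time.time_ns()
    return {
        "streams": [
            {
                "stream": labels,
                # Timestamps are strings; +i keeps them strictly increasing
                "values": [[str(now_ns + i), line] for i, line in enumerate(lines)],
            }
        ]
    }


def push_to_loki(payload: dict, url: str = "http://localhost:3100/loki/api/v1/push") -> int:
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status  # Loki replies 204 No Content on success


payload = build_push_payload({"job": "demo"}, ["hello from python", "second line"])
print(json.dumps(payload, indent=2))
# push_to_loki(payload)  # uncomment with Loki running
```

Batching several lines per request, as this payload shape allows, is far cheaper than one HTTP call per log line.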

LogQL Queries

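# Find lines containing "error" (any case) in the nginx job, e.g. from error.log
{job="nginx"} |~ "(?i)error"

# Count error lines per one-minute window (pick "Last 1 hour" as the time range in Grafana)
sum(count_over_time({job="nginx"} |~ "(?i)error" [1m]))

# Find 5xx responses in the access log (the status code follows the quoted request)
{job="nginx"} |~ "\" 5[0-9]{2} "

# Find logs containing a specific request ID (assumes an "api" job is being scraped)
{job="api"} |= "req-abc123"

# Output: matching log lines with timestamps and labels, for example a 5xx match:
# 2024-06-20T10:30:00Z {job="nginx"} 192.168.1.1 - - [20/...] "GET /api" 500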
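LogQL metric queries such as count_over_time and rate turn matching log lines into numbers per time window. As a rough mental model (Loki actually evaluates sliding range windows at each query step), the sketch below buckets matching lines into fixed one-minute windows. The sample lines and the count_per_minute helper are made up for illustration.

```python
from collections import Counter
from datetime import datetime

lines = [
    ("2024-06-20T10:30:05", "GET /api 200"),
    ("2024-06-20T10:30:41", "GET /api 500 ERROR upstream timeout"),
    ("2024-06-20T10:31:02", "GET /login 500 ERROR db unavailable"),
    ("2024-06-20T10:31:59", "GET /api 500 ERROR upstream timeout"),
]


def count_per_minute(entries, needle):
    """Count lines containing `needle` in each one-minute bucket (filter, then count)."""
    counts = Counter()
    for ts, line in entries:
        if needle in line:
            minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
            counts[minute.isoformat()] += 1
    return dict(sorted(counts.items()))


print(count_per_minute(lines, "ERROR"))
# {'2024-06-20T10:30:00': 1, '2024-06-20T10:31:00': 2}
```

Dividing each bucket by its length in seconds (here 60) gives the per-second figure that rate reports.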

Structured Logging

Write logs as JSON for machine parsing:

{
  "timestamp": "2024-06-20T10:30:00Z",
  "level": "ERROR",
  "service": "user-service",
  "request_id": "req-abc123",
  "message": "Database connection timeout",
  "duration_ms": 5234,
  "user_id": "user-456",
  "stack_trace": "TimeoutError: ..."
}

# structured_logger.py
import logging
import json

class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if hasattr(record, 'extra_data'):
            log_entry.update(record.extra_data)
        return json.dumps(log_entry)

logger = logging.getLogger("myapp")
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("User login", extra={"extra_data": {"user_id": "abc123", "ip": "10.0.0.1"}})

# Output:
# {"timestamp": "2024-06-20 10:30:00,000", "level": "INFO", "logger": "myapp", "message": "User login", "user_id": "abc123", "ip": "10.0.0.1"}
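The formatter above drops exception details, so the stack_trace field from the JSON example never appears. One way to add it, sketched here as a variant of the same class, is to serialize record.exc_info with logging.Formatter.formatException; logger.exception() fills exc_info automatically. Rendering the timestamp as UTC ISO 8601 from record.created is an optional extra tweak, and the logger name myapp.errors is only an example.

```python
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    def format(self, record):
        log_entry = {
            # record.created is a Unix timestamp; render it as UTC ISO 8601
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if hasattr(record, "extra_data"):
            log_entry.update(record.extra_data)
        if record.exc_info:
            # Keep the whole traceback in one JSON field so it ships as one event
            log_entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(log_entry)


logger = logging.getLogger("myapp.errors")
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)

try:
    1 / 0
except ZeroDivisionError:
    logger.exception("Calculation failed", extra={"extra_data": {"request_id": "req-abc123"}})
```

Because the traceback lives inside a single JSON line, the shipper never has to stitch stack-trace lines back together.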

Log Levels

Level	Purpose	Example
DEBUG	Development details	Variable values, function entry/exit
INFO	Normal operations	Request received, job completed
WARN	Potential issues	Slow query, retry attempt
ERROR	Operation failures	Database connection failed, API returned 500
FATAL	System crash	Out of memory, unrecoverable error
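Python’s logging module spells two of these levels differently (WARNING and CRITICAL). The sketch below maps the table’s names onto Python levels and reads the active level from an environment variable, defaulting to INFO when it is unset or unrecognized. The LOG_LEVEL variable name and the configure_level helper are assumptions for illustration.

```python
import logging
import os

# Table names -> Python logging levels (Python spells them WARNING and CRITICAL)
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
    "CRITICAL": logging.CRITICAL,
}


def configure_level(logger: logging.Logger, env_var: str = "LOG_LEVEL") -> int:
    """Set the logger's level from an environment variable, falling back to INFO."""
    name = os.environ.get(env_var, "INFO").strip().upper()
    level = LEVELS.get(name, logging.INFO)
    logger.setLevel(level)
    return level


logger = logging.getLogger("user-service")
print(logging.getLevelName(configure_level(logger)))
```

Setting LOG_LEVEL=DEBUG on one deployment then raises verbosity for that service alone, without a code change.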

Cost Optimization for Logs

Log volume is the primary cost driver. Depending on traffic and verbosity, a production system can generate roughly 50-500 GB of logs per day.

# log_cost_calculator.py
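def estimate_log_cost(daily_gb, retention_days, provider="elastic"):
    # Illustrative per-GB rates in USD for comparison only, not current vendor pricing
    rates = {
        "elastic": {"storage_gb": 0.15, "ingest_gb": 0.10},
        "loki": {"storage_gb": 0.04, "ingest_gb": 0.03},
        "datadog": {"storage_gb": None, "ingest_gb": 0.10},  # None = no separate storage charge in this model
    }
    rate = rates[provider]
    # Steady-state data on disk: one retention window of daily ingest
    total_storage = daily_gb * retention_days
    # Compare against None so a 0.0 rate is still treated as a number
    if rate["storage_gb"] is not None:
        storage_cost = total_storage * rate["storage_gb"]
    else:
        storage_cost = "included"
    # Assumes a 30-day month
    ingest_cost = daily_gb * 30 * rate["ingest_gb"]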
    return {
        "provider": provider,
        "daily_ingest_gb": daily_gb,
        "retention_days": retention_days,
        "monthly_ingest_cost": round(ingest_cost, 2),
        "total_storage_cost": round(storage_cost, 2) if isinstance(storage_cost, float) else storage_cost,
    }

print("=== Log Cost Comparison ===")
for provider in ["elastic", "loki", "datadog"]:
    cost = estimate_log_cost(100, 30, provider)
    print(f"  {provider:<10} Monthly ingest: ${cost['monthly_ingest_cost']:<8}")

Expected output:

=== Log Cost Comparison ===
  elastic    Monthly ingest: $300.0
  loki       Monthly ingest: $90.0
  datadog    Monthly ingest: $300.0

Common Mistakes

Logging sensitive data: Passwords, API keys, and PII in logs create compliance violations. Use log scrubbing or redaction filters before shipping.
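Not setting log rotation and retention: Without rotation, logs fill the disk. Without retention policies, storage costs balloon. Rotate files with logrotate (or your container runtime’s log options, such as Docker’s max-size and max-file), and enforce retention with index lifecycle management (ILM) policies in Elasticsearch, which supersede the older Curator tool, or with Loki’s retention settings.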
Over-logging at DEBUG level in production: DEBUG logs generate enormous volume and make it hard to find real issues. Use INFO as default in production; toggle DEBUG per-service during debugging.
Storing unstructured logs: Plain text logs are hard to query. Use structured (JSON) logging so you can filter by field — level:ERROR is faster than grepping through lines.
Not correlating logs across services: A single user request spans multiple services. Include a request_id or trace_id in every log entry to correlate them.
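Two of these fixes can live in the application’s logging setup. The sketch below uses standard logging.Filter hooks: one masks email addresses and password=-style values before a record is formatted, and the other stamps every record with a request ID held in a contextvars.ContextVar. The regexes, the redaction markers, and the request_id field name are illustrative assumptions; real PII scrubbing usually needs a broader, reviewed rule set.

```python
import contextvars
import logging
import re

request_id_var = contextvars.ContextVar("request_id", default="-")

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SECRET_RE = re.compile(r"(?i)\b(password|api_key|token)=\S+")


class RedactFilter(logging.Filter):
    def filter(self, record):
        message = record.getMessage()
        message = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
        message = SECRET_RE.sub(r"\1=[REDACTED]", message)
        # Store the already-formatted text so the original args cannot leak back in
        record.msg, record.args = message, None
        return True


class RequestIdFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


logger = logging.getLogger("user-service")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s [%(request_id)s] %(message)s"))
handler.addFilter(RedactFilter())
handler.addFilter(RequestIdFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

request_id_var.set("req-abc123")  # normally set once per incoming request
logger.info("Password reset for %s with password=hunter2", "alice@example.com")
# stderr: INFO [req-abc123] Password reset for [REDACTED_EMAIL] with password=[REDACTED]
```

A ContextVar keeps the request ID correct even when many requests are handled concurrently in async code.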

Practice Questions

What is the difference between Elasticsearch and Loki? Answer: Elasticsearch indexes the full log content, enabling complex full-text search but using more storage. Loki indexes only labels, so it uses less storage and costs less, but full-text searches are slower because it scans raw log lines.
What is structured logging and why use it? Answer: Structured logging outputs logs as JSON (or other structured format) with named fields. It enables machine parsing, filtering by field, and automated analysis without regex.
How does Filebeat differ from Logstash? Answer: Filebeat is a lightweight shipper that reads log files and forwards them. Logstash is a heavier processing pipeline that parses, transforms, and enriches logs. They work together: Filebeat ships, Logstash transforms.
What is LogQL and how does it relate to PromQL? Answer: LogQL is Loki’s query language, inspired by PromQL. It uses the same label-selector syntax as PromQL, combined with log line filters such as |= (contains) and |~ (matches regex).

Challenge

Set up a complete centralized logging stack: deploy ELK with Docker Compose, configure Filebeat to ship NGINX access logs, enable structured JSON logging in a sample Node.js or Python app, create a Kibana dashboard showing error rates over time, and add a Logstash filter that redacts email addresses from logs.

FAQ

Should I use ELK or Loki?

: ELK is better for complex full-text search, compliance (audit logs), and existing Elasticsearch expertise. Loki is better for Kubernetes environments, cost-sensitive deployments, and teams already using Grafana.

How long should I retain logs?

: Typical retention: 7-30 days for debugging, 90 days for trends, 1-7 years for compliance. Use tiered storage — hot (fast), warm (cheaper), cold (archive).
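In Elasticsearch, tiered retention is usually expressed as an index lifecycle management (ILM) policy. The sketch below builds one as a Python dict shaped like the body of a PUT _ilm/policy/<name> request, with phase ages chosen to mirror the ranges above (warm after 7 days, cold after 30, delete after 90). The ages, priorities, and the build_ilm_policy helper are assumptions to adapt; a policy only takes effect once indices reference it, for example through index.lifecycle.name in an index template.

```python
import json


def build_ilm_policy(warm_after_days=7, cold_after_days=30, delete_after_days=90):
    """Return an ILM policy body: hot -> warm -> cold -> delete by index age."""
    if not 0 < warm_after_days < cold_after_days < delete_after_days:
        raise ValueError("phase ages must increase: warm < cold < delete")
    return {
        "policy": {
            "phases": {
                "hot": {"actions": {"set_priority": {"priority": 100}}},
                "warm": {
                    "min_age": f"{warm_after_days}d",
                    "actions": {"set_priority": {"priority": 50}},
                },
                "cold": {
                    "min_age": f"{cold_after_days}d",
                    "actions": {"set_priority": {"priority": 0}},
                },
                "delete": {"min_age": f"{delete_after_days}d", "actions": {"delete": {}}},
            }
        }
    }


print(json.dumps(build_ilm_policy(), indent=2))
```

Compliance archives measured in years are usually kept outside the hot cluster (for example as snapshots in object storage) rather than by stretching the delete age.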

What is the difference between a log and a metric?

: A log is an event record with context (what happened, when, details). A metric is a numeric value over time (count, rate, average). Logs answer “what happened”, metrics answer “what’s the trend”.

How do I handle multi-line logs (stack traces)?

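: Configure Filebeat’s multiline settings so continuation lines join the preceding event: multiline.pattern: '^\d{4}-\d{2}-\d{2}' (a new log line starts with a date), multiline.negate: true, and multiline.match: after. Lines that do not start with a date, such as stack-trace frames, are appended to the last line that does.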
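Filebeat’s multiline.negate: true with multiline.match: after means every line that does not match the pattern is appended to the most recent line that does. The Python sketch below mimics that grouping rule offline; it illustrates the rule rather than reproducing Filebeat code, and group_multiline is a hypothetical helper that is handy for testing a pattern against a sample log before deploying it.

```python
import re

# Same pattern as the FAQ answer: a new event starts with a YYYY-MM-DD date
START_RE = re.compile(r"^\d{4}-\d{2}-\d{2}")


def group_multiline(lines):
    """Join continuation lines (not matching START_RE) onto the preceding event."""
    events = []
    for line in lines:
        # A leading continuation line with nothing before it starts its own event
        if START_RE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


sample = [
    "2024-06-20 10:30:00 ERROR Unhandled exception",
    "Traceback (most recent call last):",
    '  File "app.py", line 10, in <module>',
    "ZeroDivisionError: division by zero",
    "2024-06-20 10:30:01 INFO Recovered",
]
for event in group_multiline(sample):
    print(repr(event))
```

The stack trace ends up as one event together with its ERROR line, which is what makes it searchable as a unit in Kibana or Grafana.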

What is Logstash’s grok filter?

: Grok parses unstructured log lines into structured fields using patterns. Example: %{COMBINEDAPACHELOG} parses NGINX/Apache access logs into IP, timestamp, method, path, status, bytes.
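For intuition about what %{COMBINEDAPACHELOG} extracts, here is a simplified Python regex for the NGINX/Apache combined log format. It is narrower than the real grok pattern, which tolerates more unusual input, and its field names are chosen for readability rather than matching grok’s output. The parse_combined helper is hypothetical.

```python
import re

COMBINED_RE = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None when the line is not in combined format."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # "-" means no body was sent; keep it distinguishable from 0 bytes
    fields["bytes"] = None if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


line = ('192.168.1.1 - - [20/Jun/2024:10:30:00 +0000] "GET /api HTTP/1.1" 500 512 '
        '"-" "curl/8.4.0"')
print(parse_combined(line))
```

Converting status and bytes to integers at parse time is what later allows range queries such as status >= 500.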

Mini Project: JSON Logger Library

# json_logger.py
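import json
import sys
from datetime import datetime, timezone

class JsonLogger:
    def __init__(self, service_name, level="INFO"):
        self.service = service_name
        self.level = level
        # Numeric severity used for filtering; higher means more severe
        self.levels = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}

    def log(self, level, message, **extra):
        # Drop entries below the configured minimum level
        if self.levels.get(level, 0) < self.levels.get(self.level, 1):
            return
        entry = {
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="microseconds").replace("+00:00", "Z"),
            "level": level,
            "service": self.service,
            "message": message,
        }
        entry.update(extra)
        # One JSON object per line on stdout; default=str avoids crashes on
        # non-serializable extras such as datetime or UUID values
        print(json.dumps(entry, default=str), file=sys.stdout, flush=True)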

logger = JsonLogger("my-api")
logger.log("INFO", "Server started", port=8080, env="production")
logger.log("ERROR", "Connection refused", host="db.internal", retry=3)

# Output:
# {"timestamp": "2024-06-20T10:30:00.000000Z", "level": "INFO", "service": "my-api", "message": "Server started", "port": 8080, "env": "production"}
# {"timestamp": "2024-06-20T10:30:01.000000Z", "level": "ERROR", "service": "my-api", "message": "Connection refused", "host": "db.internal", "retry": 3}

What’s Next

Topic	Description
SRE Guide	Reliability practices for production systems
Monitoring Tools	Metrics and dashboards with Prometheus

Related topics: Prometheus, Grafana, Loki, ELK Stack

Congratulations on completing this Centralized Logging tutorial! Here’s where to go from here:

Practice daily — Convert your application logs to JSON format
Build a project — Set up ELK stack to monitor a sample application
Explore related topics — Check out SRE and incident response practices

Remember: every expert was once a beginner. Keep coding!

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
