Centralized Logging: ELK Stack, Loki, and Best Practices
Centralized logging aggregates logs from every server, container, and service into a single searchable platform — enabling debugging, auditing, and alerting across your entire infrastructure.
What You’ll Learn
- Log aggregation architecture with Elasticsearch, Logstash, Kibana (ELK)
- Setting up Grafana Loki for cloud-native logging
- Structured logging with JSON format and log levels
- Log rotation, retention policies, and cost optimization strategies
Why Centralized Logging Matters
When you have 50 microservices spread across 20 servers, logs are scattered everywhere. Debugging an issue means SSHing into multiple machines, grepping through files, and correlating timestamps manually. Centralized logging brings all logs into one place with full-text search, filtering, and alerting. DodaTech uses the ELK Stack for Durga Antivirus Pro’s backend services — every API call, database query, and error is indexed and searchable within seconds.
flowchart LR
A[Monitoring Basics] --> B[Centralized Logging]
B --> C[Log Shipping]
B --> D[Indexing & Storage]
B --> E[Search & Visualization]
B --> F[Alerting]
C --> G[Filebeat / Fluentd]
D --> H[Elasticsearch / Loki]
E --> I[Kibana / Grafana]
style B fill:#005571,color:#fff
ELK Stack Architecture
The ELK Stack consists of Elasticsearch (storage and search), Logstash (processing), and Kibana (visualization). Filebeat ships logs from servers.
Docker Compose Setup
# docker-compose.elk.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - esdata:/usr/share/elasticsearch/data
  logstash:
    image: docker.elastic.co/logstash/logstash:8.11.0
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    ports:
      - "5000:5000"
  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
volumes:
  esdata:
Logstash Configuration
# logstash.conf
input {
  beats {
    port => 5000
  }
}
filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}" }
    # Replace the raw line with the parsed remainder instead of appending to it
    overwrite => ["message"]
  }
  if [level] == "ERROR" {
    mutate { add_tag => ["error"] }
  }
}
output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
Output: Logs flow from Filebeat → Logstash → Elasticsearch. Logstash parses, enriches, and structures logs before indexing. Kibana queries Elasticsearch for visualization.
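To see what the filter stage above does to a single line, here is a minimal Python sketch that imitates it. The regular expression is a simplified stand-in for the grok patterns (real grok accepts more timestamp and level variants), and `parse_line` is a hypothetical helper, not part of Logstash.

```python
import re

# Simplified stand-in for:
# %{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
    r"(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)"
    r" (?P<level>[A-Za-z]+)"
    r" (?P<message>.*)$"
)


def parse_line(line):
    """Return a structured event for one raw log line (hypothetical helper)."""
    match = LINE_PATTERN.match(line)
    if match is None:
        # Logstash marks unparseable events with this tag by default.
        return {"message": line, "tags": ["_grokparsefailure"]}
    event = match.groupdict()
    # Mirrors: if [level] == "ERROR" { mutate { add_tag => ["error"] } }
    event["tags"] = ["error"] if event["level"] == "ERROR" else []
    return event


sample = "2024-06-20T10:30:00Z ERROR Database connection timeout"
print(parse_line(sample))
```

The parsed event carries separate `timestamp`, `level`, and `message` fields, which is what makes field-level filtering in Kibana possible.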
Shipping Logs with Filebeat
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
      - /var/log/nginx/access.log
output.logstash:
  hosts: ["logstash:5000"]
# Alternative: ship directly to Elasticsearch. Filebeat allows only one
# enabled output, so comment out output.logstash before enabling this.
# output.elasticsearch:
#   hosts: ["http://elasticsearch:9200"]

# Start Filebeat with custom config
filebeat -e -c filebeat.yml
# Output:
# 2024/01/15 10:30:00.123456 beat is running
# 2024/01/15 10:30:01.234567 Harvester started for file: /var/log/nginx/access.log
# 2024/01/15 10:30:01.345678 Events sent: 47
Grafana Loki
Loki is a log aggregation system designed for Kubernetes and cloud-native environments. Unlike Elasticsearch, Loki indexes only metadata labels, not the full log content. That keeps the index small, which makes Loki cheaper to store and operate, at the cost of slower searches over log content.
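The label-only indexing idea can be illustrated with a toy in-memory model. This is a sketch of the concept, not Loki's actual storage engine; `TinyLabelStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class TinyLabelStore:
    """Toy model: label sets are indexed, log lines are stored unindexed."""

    def __init__(self):
        # Maps a frozenset of (label, value) pairs to that stream's raw lines.
        self.streams = defaultdict(list)

    def push(self, labels, line):
        self.streams[frozenset(labels.items())].append(line)

    def query(self, selector, contains=""):
        """Select streams by label, then scan their lines for a substring."""
        wanted = set(selector.items())
        results = []
        for stream_labels, lines in self.streams.items():
            if wanted <= stream_labels:  # index lookup on labels only
                # Content is not indexed, so matching lines means scanning.
                results.extend(line for line in lines if contains in line)
        return results


store = TinyLabelStore()
store.push({"job": "nginx"}, "GET /api 500 ERROR upstream timed out")
store.push({"job": "nginx"}, "GET /health 200")
store.push({"job": "api"}, "ERROR req-abc123 failed")

# Roughly what the LogQL query {job="nginx"} |= "ERROR" expresses.
print(store.query({"job": "nginx"}, contains="ERROR"))
```

Narrow label selectors keep the scan small, which is why label design matters more in Loki than in a full-text index.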
# docker-compose.loki.yml
services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log
      - ./promtail.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
    ports:
      - "3000:3000"

# promtail.yml
# Where Promtail pushes logs (the loki service from the Compose file)
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx
          __path__: /var/log/nginx/*.log
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log
Output: Promtail reads log files, attaches labels, and pushes to Loki. Grafana’s Explore tab queries Loki with LogQL, a PromQL-inspired query language for logs.
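For a feel of what "pushes to Loki" means on the wire, the sketch below builds the JSON body that Loki's HTTP push endpoint (`/loki/api/v1/push`) accepts. It only constructs the payload and sends nothing; `build_push_payload` is a hypothetical helper, and the timestamp is fixed so the example is reproducible.

```python
import json
import time


def build_push_payload(labels, lines, now_ns=None):
    """Build a JSON push body: one stream, identified by its labels."""
    if now_ns is None:
        now_ns = time.time_ns()
    # Each value is [timestamp in nanoseconds as a string, log line].
    values = [[str(now_ns + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


payload = build_push_payload(
    {"job": "nginx"},
    ['192.168.1.1 - - "GET /api" 500'],
    now_ns=1718879400000000000,  # fixed value for a reproducible example
)
print(json.dumps(payload, indent=2))
```

Note that the labels travel with the stream while the log line itself is an opaque string, matching the label-only indexing described earlier.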
LogQL Queries
# Find all ERROR logs in the nginx job
{job="nginx"} |= "ERROR"
# Count ERROR lines per minute (pick "last 1 hour" as the query time range)
sum(count_over_time({job="nginx"} |= "ERROR" [1m]))
# Find logs containing a specific request ID
{job="api"} |= "req-abc123"
# Output: streaming log lines with timestamps and labels
# 2024-06-20T10:30:00Z {job="nginx"} 192.168.1.1 - - [20/...] "GET /api" 500
Structured Logging
Write logs as JSON for machine parsing:
{
  "timestamp": "2024-06-20T10:30:00Z",
  "level": "ERROR",
  "service": "user-service",
  "request_id": "req-abc123",
  "message": "Database connection timeout",
  "duration_ms": 5234,
  "user_id": "user-456",
  "stack_trace": "TimeoutError: ..."
}
# structured_logger.py
import json
import logging

class JSONFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""
    def format(self, record):
        log_entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Merge fields passed via extra={"extra_data": {...}}
        if hasattr(record, "extra_data"):
            log_entry.update(record.extra_data)
        return json.dumps(log_entry)

logger = logging.getLogger("myapp")
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("User login", extra={"extra_data": {"user_id": "abc123", "ip": "10.0.0.1"}})
# Output:
# {"timestamp": "2024-06-20 10:30:00,000", "level": "INFO", "logger": "myapp", "message": "User login", "user_id": "abc123", "ip": "10.0.0.1"}
Log Levels
| Level | Purpose | Example |
|---|---|---|
| DEBUG | Development details | Variable values, function entry/exit |
| INFO | Normal operations | Request received, job completed |
| WARN | Potential issues | Slow query, retry attempt |
| ERROR | Operation failures | Database connection failed, API returned 500 |
| FATAL | System crash | Out of memory, unrecoverable error |
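The table maps onto Python's standard `logging` module with two naming differences: Python calls WARN `WARNING` and FATAL `CRITICAL`. The sketch below shows how a level threshold drops everything beneath it; `ListHandler` is a hypothetical helper that collects records in memory so the effect is easy to inspect.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical helper that keeps emitted records in a list."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}: {record.getMessage()}")


logger = logging.getLogger("levels-demo")
logger.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.INFO)       # a typical production threshold
logger.debug("cache key computed")  # dropped: DEBUG is below INFO
logger.info("request received")
logger.warning("slow query")        # WARN in the table above
logger.critical("out of memory")    # FATAL in the table above

print(handler.lines)
# ['INFO: request received', 'WARNING: slow query', 'CRITICAL: out of memory']
```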
Cost Optimization for Logs
Log volume is the primary cost driver. A busy production system can generate on the order of 50-500 GB of logs per day, and each gigabyte is billed for ingest, storage, or both.
# log_cost_calculator.py
def estimate_log_cost(daily_gb, retention_days, provider="elastic"):
    # Example per-GB rates in USD; check current vendor pricing before relying on them.
    rates = {
        "elastic": {"storage_gb": 0.15, "ingest_gb": 0.10},
        "loki": {"storage_gb": 0.04, "ingest_gb": 0.03},
        "datadog": {"storage_gb": None, "ingest_gb": 0.10},
    }
    rate = rates[provider]
    total_storage = daily_gb * retention_days
    if rate["storage_gb"] is not None:
        storage_cost = round(total_storage * rate["storage_gb"], 2)
    else:
        storage_cost = "included"
    ingest_cost = daily_gb * 30 * rate["ingest_gb"]  # assumes a 30-day month
    return {
        "provider": provider,
        "daily_ingest_gb": daily_gb,
        "retention_days": retention_days,
        "monthly_ingest_cost": round(ingest_cost, 2),
        "total_storage_cost": storage_cost,
    }

print("=== Log Cost Comparison ===")
for provider in ["elastic", "loki", "datadog"]:
    cost = estimate_log_cost(100, 30, provider)
    print(f" {provider:<10} Monthly ingest: ${cost['monthly_ingest_cost']:<8}")
Expected output:
=== Log Cost Comparison ===
 elastic    Monthly ingest: $300.0
 loki       Monthly ingest: $90.0
 datadog    Monthly ingest: $300.0
Common Mistakes
- Logging sensitive data: Passwords, API keys, and PII in logs create compliance violations. Use log scrubbing or redaction filters before shipping.
- Not setting log rotation and retention: Without rotation, logs fill the disk. Without retention policies, storage costs balloon. Rotate log files on each host (for example with logrotate), and expire old indices with Curator or index lifecycle management (ILM) policies in Elasticsearch.
- Over-logging at DEBUG level in production: DEBUG logs generate enormous volume and make it hard to find real issues. Use INFO as the default in production; toggle DEBUG per service while debugging.
- Storing unstructured logs: Plain text logs are hard to query. Use structured (JSON) logging so you can filter by field; a query such as level:ERROR is faster and more reliable than grepping through lines.
- Not correlating logs across services: A single user request spans multiple services. Include a request_id or trace_id in every log entry to correlate them.
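As one way to address the first mistake above, here is a minimal redaction sketch using Python's standard `logging.Filter`. The two regular expressions are illustrative assumptions that cover only email addresses and `password=`/`api_key=` pairs; a real deployment needs patterns matched to its own data.

```python
import logging
import re

# Illustrative patterns only; extend them for the secrets your services handle.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
SECRET_RE = re.compile(r"(password|api_key)=\S+", re.IGNORECASE)


def scrub(text):
    """Mask email addresses and key=value secrets in a log message."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return SECRET_RE.sub(r"\1=[REDACTED]", text)


class RedactingFilter(logging.Filter):
    """Scrub every record before any handler formats or ships it."""

    def filter(self, record):
        record.msg = scrub(record.getMessage())
        record.args = ()  # arguments are already merged into msg
        return True


logger = logging.getLogger("redaction-demo")
logger.addFilter(RedactingFilter())

print(scrub("login failed for alice@example.com password=hunter2"))
# login failed for [REDACTED_EMAIL] password=[REDACTED]
```

Scrubbing inside the application keeps the sensitive values from ever reaching the shipper or the central store.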
Practice Questions
What is the difference between Elasticsearch and Loki? Answer: Elasticsearch indexes full log content, enabling complex full-text search but using more storage. Loki indexes only labels, which cuts storage and cost, but content searches must scan the matching log streams at query time.
What is structured logging and why use it? Answer: Structured logging outputs logs as JSON (or other structured format) with named fields. It enables machine parsing, filtering by field, and automated analysis without regex.
How does Filebeat differ from Logstash? Answer: Filebeat is a lightweight shipper that reads log files and forwards them. Logstash is a heavier processing pipeline that parses, transforms, and enriches logs. They work together: Filebeat ships, Logstash transforms.
What is LogQL and how does it relate to PromQL? Answer: LogQL is Loki’s query language, inspired by PromQL. It uses the same label selectors as PromQL and adds log line filters such as |= (line contains) and |~ (line matches a regex).
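The structured logging and correlation answers can be made concrete with a few JSON log lines. This sketch uses made-up sample events whose field names follow the JSON example earlier in the tutorial; it filters by field and groups entries by `request_id` without any regex.

```python
import json
from collections import defaultdict

# Made-up sample lines from two services handling the same request.
raw_lines = [
    '{"timestamp": "2024-06-20T10:30:00Z", "level": "INFO", "service": "gateway", '
    '"request_id": "req-abc123", "message": "Request received"}',
    '{"timestamp": "2024-06-20T10:30:01Z", "level": "ERROR", "service": "user-service", '
    '"request_id": "req-abc123", "message": "Database connection timeout"}',
    '{"timestamp": "2024-06-20T10:30:02Z", "level": "INFO", "service": "gateway", '
    '"request_id": "req-def456", "message": "Request received"}',
]
events = [json.loads(line) for line in raw_lines]

# Filter by field instead of grepping for a substring.
errors = [e for e in events if e["level"] == "ERROR"]

# Correlate: rebuild each request's path across services, in time order.
by_request = defaultdict(list)
for event in sorted(events, key=lambda e: e["timestamp"]):
    by_request[event["request_id"]].append(f'{event["service"]}: {event["message"]}')

print(errors[0]["message"])
print(by_request["req-abc123"])
```

A central platform runs the same two operations at scale: a field filter to find the failure, then a lookup on the shared ID to see everything that led up to it.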
Challenge
Set up a complete centralized logging stack: deploy ELK with Docker Compose, configure Filebeat to ship NGINX access logs, enable structured JSON logging in a sample Node.js/Python app, create a Kibana dashboard showing error rates over time, and set a Logstash filter to redact email addresses from logs.
Mini Project: JSON Logger Library
# json_logger.py
import json
from datetime import datetime, timezone

class JsonLogger:
    def __init__(self, service_name, level="INFO"):
        self.service = service_name
        self.level = level
        self.levels = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3, "FATAL": 4}

    def log(self, level, message, **extra):
        # Skip entries below the configured threshold
        if self.levels.get(level, 0) < self.levels.get(self.level, 1):
            return
        now = datetime.now(timezone.utc)
        entry = {
            # UTC timestamp in ISO 8601 form with a trailing "Z"
            "timestamp": now.isoformat(timespec="microseconds").replace("+00:00", "Z"),
            "level": level,
            "service": self.service,
            "message": message,
        }
        entry.update(extra)
        print(json.dumps(entry))

logger = JsonLogger("my-api")
logger.log("INFO", "Server started", port=8080, env="production")
logger.log("ERROR", "Connection refused", host="db.internal", retry=3)
# Output:
# {"timestamp": "2024-06-20T10:30:00.000000Z", "level": "INFO", "service": "my-api", "message": "Server started", "port": 8080, "env": "production"}
# {"timestamp": "2024-06-20T10:30:01.000000Z", "level": "ERROR", "service": "my-api", "message": "Connection refused", "host": "db.internal", "retry": 3}
What’s Next
- Reliability practices for production systems
- Metrics and dashboards with Prometheus
Related topics: Prometheus, Grafana, Loki, ELK Stack
Congratulations on completing this Centralized Logging tutorial! Here’s where to go from here:
- Practice daily — Convert your application logs to JSON format
- Build a project — Set up ELK stack to monitor a sample application
- Explore related topics — Check out SRE and incident response practices
Remember: every expert was once a beginner. Keep coding!
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.