Real-Time Analytics Explained — Streaming Architectures, Lambda vs Kappa & Dashboards
Real-time analytics processes data as it arrives — providing insights within seconds or milliseconds instead of hours.
What You’ll Learn
In this tutorial, you’ll learn streaming architectures (Lambda vs Kappa), real-time dashboards, alerting and anomaly detection, and how to build a real-time analytics pipeline with Kafka and ClickHouse.
Why It Matters
In a world moving at internet speed, waiting for batch reports means missed opportunities. Real-time analytics powers fraud detection (detect within milliseconds), operational monitoring (CPU spikes → auto-scale), and user-facing dashboards.
Real-World Use
When you check your DoorDash order status, the map updates in real time. This requires processing GPS events, updating order state, and pushing updates to your browser — all within seconds. Durga Antivirus Pro uses real-time analytics to detect threats as they emerge across millions of endpoints.
```mermaid
graph TD
    subgraph "Lambda Architecture"
        A[Data Stream] --> B[Speed Layer<br/>Real-time]
        A --> C[Batch Layer<br/>Historical]
        B --> D[Merged View]
        C --> D
    end
    subgraph "Kappa Architecture"
        E[Data Stream] --> F[Stream Processor]
        F --> G[Real-time View]
        F --> H[Long-term Storage]
    end
    style B fill:#4f46e5,color:#fff
    style F fill:#4f46e5,color:#fff
```
Lambda vs Kappa Architecture
Lambda Architecture
Lambda runs two parallel pipelines: a speed layer that returns low-latency results (within seconds) and a batch layer that produces accurate, comprehensive results (after hours). A serving layer merges the two.
| Layer | Technology | Latency | Accuracy |
|---|---|---|---|
| Speed | Flink, Spark Streaming | Seconds | Approximate |
| Batch | Hadoop, Spark Batch | Hours | Exact |
| Serving | Druid, Cassandra | “Real-time” query | Merged |
Problem: Maintaining two separate codebases that produce compatible results is complex and error-prone.
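The merge step in the serving layer can be sketched as follows. This is a minimal illustration under stated assumptions, not a production design: the names `batch_view`, `speed_view`, `merged_count`, and `on_batch_complete` are hypothetical, the batch layer is assumed to hold exact counts up to its last run, and the speed layer is assumed to hold only events that arrived after that run.

```python
# Hypothetical views: the batch layer holds exact counts up to its last run,
# the speed layer holds counts only for events that arrived after that run.
batch_view = {"page_view": 10_000, "purchase": 420}
speed_view = {"page_view": 37, "click": 5}


def merged_count(event_type, batch, speed):
    """Serving-layer read: exact batch total plus the recent speed-layer delta."""
    return batch.get(event_type, 0) + speed.get(event_type, 0)


def on_batch_complete(batch, new_batch_totals, speed):
    """When a batch run finishes, its totals replace the old ones and the
    speed-layer counts it now covers are discarded (simplified: all of them)."""
    batch.clear()
    batch.update(new_batch_totals)
    speed.clear()


for event_type in ("page_view", "purchase", "click"):
    print(event_type, merged_count(event_type, batch_view, speed_view))
```

Even in this toy form, the two views must agree on what an event type means and where the cutoff lies, which is the compatibility burden described above.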
Kappa Architecture
A single stream-processing pipeline handles everything. The stream processor maintains state for real-time results and can reprocess historical data by replaying the stream from the log.
| Component | Technology |
|---|---|
| Message broker | Kafka (immutable log) |
| Stream processor | Flink, ksqlDB |
| Serving database | ClickHouse, Pinot, Druid |
| Dashboard | Grafana, Metabase, Superset |
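Replay is what lets a Kappa pipeline fix a bug without a separate batch layer. The sketch below is an assumption-laden illustration: `EventLog` is a hypothetical in-memory stand-in for a single-partition Kafka topic, and `count_v1` / `count_v2` are made-up processing functions representing the buggy and corrected logic.

```python
class EventLog:
    """Append-only list standing in for a Kafka topic (assumption: one partition)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset=0):
        """Yield (offset, record) pairs starting at the given offset."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


def count_v1(state, record):
    # Buggy logic: counts every event, including refunds, as revenue.
    state["revenue"] = state.get("revenue", 0) + record["amount"]


def count_v2(state, record):
    # Fixed logic: refunds reduce revenue.
    sign = -1 if record["kind"] == "refund" else 1
    state["revenue"] = state.get("revenue", 0) + sign * record["amount"]


def rebuild(log, process):
    """Recompute state from scratch by replaying the log from offset 0."""
    state = {}
    for _offset, record in log.read_from(0):
        process(state, record)
    return state


log = EventLog()
log.append({"kind": "sale", "amount": 100})
log.append({"kind": "refund", "amount": 30})
log.append({"kind": "sale", "amount": 50})

print(rebuild(log, count_v1))  # {'revenue': 180}
print(rebuild(log, count_v2))  # {'revenue': 120}
```

Because the log is immutable, deploying the corrected function and replaying from the first offset yields the corrected state with the same code path that serves live traffic.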
```python
# Simulated Kappa architecture pipeline
import random
from datetime import datetime


class StreamEvent:
    def __init__(self, event_type, value):
        self.event_type = event_type
        self.value = value
        self.timestamp = datetime.now()


class KappaPipeline:
    def __init__(self):
        self.state = {}   # Key: event type, Value: running stats
        self.stream = []  # Raw events, kept so they could be replayed

    def ingest(self, event):
        self.stream.append(event)
        key = event.event_type
        if key not in self.state:
            self.state[key] = {"count": 0, "sum": 0, "min": float("inf"), "max": float("-inf")}
        s = self.state[key]
        s["count"] += 1
        s["sum"] += event.value
        s["min"] = min(s["min"], event.value)
        s["max"] = max(s["max"], event.value)

    def get_stats(self, event_type):
        s = self.state.get(event_type)
        if not s:
            return None
        return {
            **s,
            "avg": round(s["sum"] / s["count"], 2) if s["count"] else 0,
        }


pipeline = KappaPipeline()

# Simulate streaming events
print("Ingesting events...")
for i in range(1, 101):
    event = StreamEvent(
        random.choice(["page_view", "purchase", "click"]),
        random.gauss(50, 20)
    )
    pipeline.ingest(event)

print("\nReal-time stats:")
for event_type in ["page_view", "purchase", "click"]:
    stats = pipeline.get_stats(event_type)
    # Format max to two decimals so it matches the rounded average
    print(f"  {event_type}: count={stats['count']}, avg=${stats['avg']}, max=${stats['max']:.2f}")
```

Expected output (values vary between runs because the events are random):

```
Ingesting events...

Real-time stats:
  page_view: count=35, avg=$49.23, max=$95.12
  purchase: count=28, avg=$52.87, max=$98.44
  click: count=37, avg=$48.15, max=$91.76
```

Real-Time Dashboard Architecture
A typical real-time dashboard stack:
```text
Data Sources → Kafka    → Stream Processor → Serving DB   → Dashboard
(apps)         (buffer)   (Flink/ksqlDB)     (ClickHouse)   (Grafana)
```

Example: Kafka + ClickHouse + Grafana
```sql
-- ClickHouse table for real-time analytics
CREATE TABLE events (
    event_time DateTime,
    event_type String,
    user_id String,
    value Float64
) ENGINE = MergeTree()
ORDER BY (event_time, event_type);

-- Materialized view for per-second aggregations.
-- DateTime already has one-second resolution, so no rounding is needed
-- (toStartOfSecond only accepts DateTime64).
CREATE MATERIALIZED VIEW events_per_sec
ENGINE = SummingMergeTree()
ORDER BY (event_time, event_type)
AS SELECT
    event_time,
    event_type,
    count() AS count,
    sum(value) AS total_value
FROM events
GROUP BY event_time, event_type;

-- Query for dashboard (last 1 hour, refreshed every 5 seconds).
-- sum() is still required: SummingMergeTree collapses rows only during background merges.
SELECT
    toStartOfMinute(event_time) AS minute,
    event_type,
    sum(count) AS events
FROM events_per_sec
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY minute, event_type
ORDER BY minute;
```

Alerting and Anomaly Detection
Real-time alerts trigger when metrics cross thresholds or deviate from expected patterns.
```python
import random
import statistics


class AnomalyDetector:
    def __init__(self, window_size=20, threshold=3):
        self.values = []
        self.window_size = window_size
        self.threshold = threshold

    def check(self, value):
        """Return (is_anomaly, message) for a new value."""
        # Score against the window as it was before this value arrived,
        # so a spike does not inflate its own baseline.
        baseline = list(self.values)
        self.values.append(value)
        if len(self.values) > self.window_size:
            self.values.pop(0)
        if len(baseline) < 5:
            return False, f"Warming up: {value:.1f}"  # Not enough data
        mean = statistics.mean(baseline)
        stdev = statistics.stdev(baseline)
        z_score = abs(value - mean) / max(stdev, 0.001)
        if z_score > self.threshold:
            return True, f"Anomaly: {value:.1f} (mean={mean:.1f}, z={z_score:.1f})"
        return False, f"Normal: {value:.1f} (mean={mean:.1f}, z={z_score:.1f})"


detector = AnomalyDetector(window_size=30, threshold=3)

# Simulate normal traffic with a spike
print("Monitoring for anomalies:")
for i in range(40):
    if 25 <= i <= 27:
        value = random.gauss(200, 30)  # Spike
    else:
        value = random.gauss(50, 10)   # Normal
    is_anomaly, msg = detector.check(value)
    if is_anomaly:
        print(f"  ⚠ {msg}")
    else:
        print(f"    {msg}")
```

Expected output (approximate; values vary between runs):

```
Monitoring for anomalies:
    Warming up: 52.3
    Warming up: 48.7
    ...
    Normal: 55.1 (mean=52.0, z=0.3)
    ...
  ⚠ Anomaly: 215.3 (mean=51.8, z=14.5)
    ...
    Normal: 49.2 (mean=65.0, z=0.3)
```

Common Mistakes
- Using Lambda architecture unnecessarily: Most use cases can be handled with Kappa (single pipeline). Lambda adds complexity.
- Not planning for data replay: Real-time pipelines need to reprocess historical data when fixing bugs. Store raw data in Kafka with long retention.
- Ignoring backpressure: If your stream processor can’t keep up with data ingestion, you need backpressure handling or auto-scaling.
- Serving dashboard queries from the stream processor: Stream processors aren’t query engines. Use a serving database (ClickHouse, Pinot) for dashboard queries.
- Sampling data for dashboards: Sampling hides the tail. Real-user monitoring needs full data to catch rare issues affecting individual users.
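The backpressure point in the list above can be illustrated with a bounded buffer. This is a minimal sketch using only the Python standard library, not how Kafka or Flink implement it: the names `buffer`, `producer`, and `consumer` are hypothetical, and the buffer size of 5 is an arbitrary assumption. When the buffer is full, `put` blocks, which slows the producer to the consumer's pace instead of dropping events.

```python
import queue
import threading
import time

buffer = queue.Queue(maxsize=5)   # bounded: a full buffer blocks the producer
processed = []


def producer(n_events):
    for i in range(n_events):
        buffer.put(i)             # blocks while the buffer is full
    buffer.put(None)              # sentinel: no more events


def consumer():
    while True:
        item = buffer.get()
        if item is None:
            break
        time.sleep(0.001)         # simulate a slow downstream step
        processed.append(item)


t_prod = threading.Thread(target=producer, args=(50,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()

print(f"processed {len(processed)} events, none dropped")
```

Blocking trades latency for completeness. A pipeline that prefers freshness could instead call `put_nowait` and count the events it rejects.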
Practice Questions
What’s the difference between Lambda and Kappa architectures? Lambda uses two pipelines (speed + batch). Kappa uses one stream processing pipeline for everything, simplifying maintenance.
Why use a serving database like ClickHouse for dashboards? Stream processors are optimized for real-time computation, not sub-second query serving. ClickHouse is optimized for fast analytical queries.
What is anomaly detection in real-time analytics? Identifying data points that deviate significantly from historical patterns — used for fraud detection, system monitoring, and quality control.
How do you handle late-arriving data in real-time pipelines? Use event time processing with watermarks (Flink) or upsert semantics in the serving database.
What is backpressure? When a downstream component can’t keep up with the upstream data rate. Solutions include buffering, scaling, or dropping data.
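The late-data answer above can be made concrete with a small watermark sketch. It is a simplified model of event-time windowing, not Flink's actual API: `WindowedCounter`, `WINDOW`, and `ALLOWED_LATENESS` are hypothetical names, timestamps are plain integers in seconds, and the watermark is assumed to trail the newest event time by a fixed bound.

```python
from collections import defaultdict

WINDOW = 60            # window length in seconds (event time)
ALLOWED_LATENESS = 10  # how far the watermark trails the newest event


class WindowedCounter:
    def __init__(self):
        self.open_windows = defaultdict(int)  # window start -> count
        self.closed = {}                      # window start -> final count
        self.late = []                        # events that missed their window
        self.watermark = float("-inf")

    def on_event(self, event_time):
        start = event_time - event_time % WINDOW
        if start + WINDOW <= self.watermark:
            # The window was already emitted; route the event aside so a
            # correction (for example an upsert) can be applied later.
            self.late.append(event_time)
        else:
            self.open_windows[start] += 1
        # Advance the watermark and emit every window that ends at or before it.
        self.watermark = max(self.watermark, event_time - ALLOWED_LATENESS)
        for s in sorted(self.open_windows):
            if s + WINDOW <= self.watermark:
                self.closed[s] = self.open_windows.pop(s)


counter = WindowedCounter()
for t in [5, 20, 61, 58, 75, 30, 130]:
    counter.on_event(t)

print(counter.closed)  # {0: 3, 60: 2}
print(counter.late)    # [30]
```

The event at time 58 arrives out of order but before the watermark passes its window, so it is counted normally. The event at time 30 arrives after window 0 has closed and is set aside, which is where upsert semantics in the serving database would come in.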
Challenge
Build a real-time pipeline: produce events to Kafka → consume with Flink or ksqlDB → aggregate per minute → write to ClickHouse → query with Grafana. Start with simulated data from a Python script.
Real-World Task
Open Grafana or a monitoring dashboard your team uses. How often does it refresh? Are metrics powered by real-time queries or pre-aggregated data?
Mini Project: Real-Time Website Monitor
Build a real-time dashboard for website traffic. Use Kafka to stream page views, a Python stream processor to aggregate per minute, and display via a WebSocket-based dashboard (or Grafana with a database).
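As a possible starting point for the aggregation step, the sketch below counts page views per minute and path using only the standard library. The event shape (`ts` as epoch seconds, `path` as the page) and the function names `minute_bucket` and `aggregate_page_views` are assumptions made for illustration. Reading from Kafka and sending over a WebSocket are left out.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def minute_bucket(epoch_seconds):
    """Truncate an epoch timestamp to the start of its minute (UTC, ISO 8601)."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.replace(second=0, microsecond=0).isoformat()


def aggregate_page_views(events):
    """Count page views per (minute, path)."""
    counts = Counter()
    for event in events:
        counts[(minute_bucket(event["ts"]), event["path"])] += 1
    return counts


# Hypothetical page-view events; in the project these would come from Kafka.
events = [
    {"ts": 1_700_000_000, "path": "/"},
    {"ts": 1_700_000_010, "path": "/"},
    {"ts": 1_700_000_015, "path": "/pricing"},
    {"ts": 1_700_000_070, "path": "/"},
]

for (minute, path), views in sorted(aggregate_page_views(events).items()):
    # One JSON message per row, ready to send over a WebSocket or insert into a DB.
    print(json.dumps({"minute": minute, "path": path, "views": views}))
```

Bucketing on the event's own timestamp rather than on arrival time keeps the per-minute counts stable if events are delayed in transit.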
Security angle: Real-time security analytics enable immediate response to threats — detecting brute-force attacks, data exfiltration, or compromised accounts as they happen.
What’s Next
Congratulations on completing this Real-Time Analytics tutorial! Here’s where to go from here:
- Practice daily — Consistency is more important than long study sessions
- Build a project — Apply what you learned by building something real
- Explore related topics — Check out other tutorials in the same category
- Join the community — Discuss with other learners and share your progress
Remember: every expert was once a beginner. Keep coding!
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro