Skip to content
Apache Flink — Stream Processing, Event Time, Watermarks & Windowing

Apache Flink — Stream Processing, Event Time, Watermarks & Windowing

DodaTech Updated Jun 15, 2026 5 min read

Apache Flink is a distributed stream processing framework that processes data in real time — designed for event-driven applications, real-time analytics, and continuous data pipelines.

What You’ll Learn

In this tutorial, you’ll learn Flink’s stream processing model, event time vs processing time, watermarks, windowing (tumbling, sliding, session), state management, exactly-once semantics, and how Flink compares to Spark Streaming.

Why It Matters

Real-time data processing is essential for fraud detection, recommendation systems, monitoring, and IoT analytics. Flink’s true streaming architecture (not micro-batching) makes it the best choice for low-latency stream processing.

Real-World Use

Uber uses Flink for real-time fraud detection — analyzing ride transactions within milliseconds. Alibaba processes petabytes of data per day with Flink for real-time search and recommendations. Durga Antivirus Pro uses stream processing to detect threats as events arrive, not hours later.


graph LR
  subgraph "Flink Architecture"
    A[Data Source
Kafka / Kinesis] --> B[Flink Job] B --> C[Operator 1
Map] C --> D[Operator 2
Window] D --> E[Sink
Database / Dashboard] end subgraph "Event Time Processing" F[Events with Timestamps] --> G[Watermark] G --> H[Window Trigger] H --> I[Window Result] end

Stream Processing Model

Unlike Spark Streaming’s micro-batch model (processing data in small batches), Flink processes each event as it arrives — a true event-by-event streaming architecture.

FeatureFlinkSpark Streaming
ModelTrue streamingMicro-batching
LatencyMillisecondsSeconds
Exactly-onceBuilt-inTransactional sinks
Event timeNative supportRequires watermarking
State managementKeyed state, timersCheckpointing via DStream
SQL supportFlink SQLStructured Streaming

Event Time vs Processing Time

ConceptDefinitionUse Case
Event TimeWhen the event actually occurredSensor data, financial transactions
Ingestion TimeWhen Flink received the eventLog collection
Processing TimeThe current system timeMonitoring, alerting

Watermarks

A watermark is Flink’s mechanism to handle late events. It declares: “I’ve processed all events up to time T.”

// Conceptual watermark: events with timestamp < watermark are "complete"
// Watermark = CurrentEventTime - MaxOutOfOrderness

If you set max out-of-orderness to 5 seconds, a watermark of 12:00:10 means all events up to 12:00:05 are expected to have arrived. Events with timestamps before the watermark are considered late.

Windowing

Flink supports several window types for aggregating over time or count.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import MapFunction, WindowFunction
from pyflink.common.time import Time
from pyflink.common.typeinfo import Types
from pyflink.datastream.window import TumblingEventTimeWindows

# Flink word count (Python API)
env = StreamExecutionEnvironment.get_execution_environment()

# Create stream (simulated)
data = ["hello world", "hello flink", "world hello", "hello flink flink"]
stream = env.from_collection(data)

# Word count with tumbling window of 5 seconds
result = (
    stream
    .flat_map(lambda text: [(word, 1) for word in text.split()],
              output_type=Types.TUPLE([Types.STRING(), Types.INT()]))
    .key_by(lambda x: x[0])
    .window(TumblingEventTimeWindows.of(Time.seconds(5)))
    .sum(1)
)

result.print()
env.execute("Flink Word Count")

Window Types

WindowDescriptionExample
TumblingFixed-size, non-overlappingEvery 1 hour: “sales per hour”
SlidingFixed-size, overlappingEvery 10 minutes: “30-minute moving average”
SessionGap-based, dynamic size“User session — 5 min inactivity ends session”
GlobalSingle global windowTriggers when a condition is met

State Management

Flink’s state is keyed — each operator stores state per partition key.

State TypeDescriptionExample
ValueStateSingle value per keyLast seen temperature
ListStateList of values per keyRecent transactions
MapStateMap (key-value) per keyUser session attributes
AggregatingStateIncremental aggregationRunning sum

State is checkpointed to durable storage (HDFS, S3) for fault recovery.

Exactly-Once Semantics

Flink guarantees exactly-once through lightweight distributed snapshots (checkpoints). The algorithm (Chandy-Lamport) captures the state of all operators and sources simultaneously without pausing the stream.

1. Source marks all events with a checkpoint barrier
2. Barrier propagates through operators
3. Each operator snapshots its state
4. When all operators complete, the checkpoint is committed
5. On failure: restore from the last successful checkpoint

Common Mistakes

  1. Using processing time when event time is needed: Processing time ignores event arrival delays. For accurate analytics (especially with mobile/offline sources), always use event time.
  2. Setting watermarks too aggressively: Too-strict watermarks cause excessive data dropping. Too-loose watermarks increase latency. Measure your data’s tardiness.
  3. Ignoring state size: Flink state is stored in memory or RocksDB. Unbounded state grows forever — use time-to-live (TTL) or session windows.
  4. Not configuring checkpointing: Without checkpointing, a Flink job loses all state on failure. Enable checkpointing for production.
  5. Choosing the wrong backend: For memory state (fast, limited), use HashMapStateBackend. For large state (RocksDB, slower), use EmbeddedRocksDBStateBackend.

Practice Questions

  1. How does Flink differ from Spark Streaming? Flink processes events one-by-one (true streaming, sub-second latency). Spark Streaming processes micro-batches (second-level latency).

  2. What is a watermark in Flink? A mechanism to track event time progress and handle late-arriving data. Events with timestamps before the watermark are considered late.

  3. What are the different window types in Flink? Tumbling (fixed, non-overlapping), sliding (fixed, overlapping), session (dynamic, inactivity gap).

  4. How does Flink achieve exactly-once processing? Through distributed snapshots (checkpoints) using the Chandy-Lamport algorithm. State is restored from the last checkpoint on failure.

  5. What is keyed state vs operator state? Keyed state is partitioned by key (e.g., per user). Operator state is per parallel instance (e.g., per Kafka partition).

Challenge

Set up a Flink cluster (local mode). Create a streaming job that reads from a Kafka topic, applies a sliding window (10 min, slide 1 min) to calculate a moving average, and writes results to another Kafka topic.

Real-World Task

Using the Flink dashboard (localhost:8081 in local mode), examine the job graph. How many parallel instances does each operator have? What’s the current watermark progress?

Mini Project: Anomaly Detection Pipeline

Build a Flink job that reads sensor events (temperature, pressure) and detects anomalies: if a value exceeds 3 standard deviations from the 1-hour rolling average, emit an alert. Use event_time and sliding windows.

Security angle: Real-time anomaly detection powers cybersecurity tools. Flink’s low-latency stream processing enables Durga Antivirus Pro to detect and respond to threats in milliseconds.

What’s Next

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

What’s Next

Congratulations on completing this Apache Flink tutorial! Here’s where to go from here:

  • Practice daily — Consistency is more important than long study sessions
  • Build a project — Apply what you learned by building something real
  • Explore related topics — Check out other tutorials in the same category
  • Join the community — Discuss with other learners and share your progress

Remember: every expert was once a beginner. Keep coding!

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro