Apache Kafka Deep Dive — Topics, Partitions, Consumer Groups & Exactly-Once

DodaTech Updated Jun 15, 2026 6 min read

Apache Kafka is a distributed event streaming platform capable of handling trillions of events per day — the backbone of modern real-time data pipelines.

What You’ll Learn

In this tutorial, you’ll learn Kafka’s architecture: topics and partitions, consumer groups and offset management, replication and fault tolerance, exactly-once semantics, and how to produce and consume messages with Python and confluent-kafka.

Why It Matters

Kafka is the de facto standard for event streaming. According to the Apache Kafka project, more than 80% of Fortune 100 companies use it for real-time data pipelines, microservices communication, log aggregation, and stream processing. Understanding Kafka is essential for backend and data engineering roles.

Real-World Use

LinkedIn processes 7 trillion messages per day with Kafka. Uber uses it to track millions of rides in real time. Netflix streams events through Kafka for monitoring and analytics. Durga Antivirus Pro uses Kafka to stream threat intelligence updates to millions of endpoints.


graph LR
  subgraph "Kafka Cluster"
    A[Topic: orders]
    A --> B[Partition 0]
    A --> C[Partition 1]
    A --> D[Partition 2]
    B --> E[Broker 1]
    C --> F[Broker 2]
    D --> G[Broker 3]
  end
  H[Producer] --> A
  E --> I[Consumer Group A]
  F --> I
  G --> I
  E --> J[Consumer Group B]
  F --> J
  G --> J

Topics and Partitions

A topic is a category/feed name to which records are published. Each topic is split into partitions — ordered, immutable sequences of records.

  • Partitions provide parallelism: different consumers can read different partitions concurrently
  • Each message within a partition has a unique, sequential offset (offsets are counted per partition, not per topic)
  • Messages are ordered within a partition, not across partitions
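Partitioning strategy (key-based): the producer hashes each message's key to pick a partition, so the same key always goes to the same partition. That guarantees ordering per key, as long as the topic's partition count doesn't change.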
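To make partitions and key hashing concrete, here is a toy, in-memory sketch in plain Python (no broker needed). `ToyTopic` is a hypothetical helper, and the CRC32-modulo rule is an assumption modeled loosely on librdkafka's default key hashing; real clients differ in the exact hash function.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for one topic: each partition is an append-only list,
    and a record's offset is simply its index in that list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: CRC32(key) % partition count, in the spirit of librdkafka's
        # "consistent" partitioner; the Java client hashes keys with murmur2 instead.
        return zlib.crc32(key) % len(self.partitions)

    def append(self, key, value):
        """Append a record to its key's partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = ToyTopic(num_partitions=3)
placements = defaultdict(list)  # key -> [(partition, offset), ...]
for event in ["created", "paid", "shipped"]:
    for customer in (b"alice", b"bob"):
        placements[customer].append(topic.append(customer, f"{customer.decode()}:{event}"))

for customer, spots in placements.items():
    # Same key -> same partition every time; offsets only grow within it.
    print(customer.decode(), spots)
```

Every (partition, offset) pair printed for a customer shares one partition, and the offsets rise in the order the events were appended.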

Producing Messages with confluent-kafka

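from confluent_kafka import Producer
import json

failed = []  # messages whose delivery failed (filled by the callback)


def delivery_report(err, msg):
    """Called once per message, from poll() or flush(), with its delivery result."""
    if err is not None:
        failed.append(msg)
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] @ offset {msg.offset()}")

# Configure producer
conf = {
    'bootstrap.servers': 'localhost:9092',
    'client.id': 'python-producer',
    'acks': 'all',  # Wait for all in-sync replicas to acknowledge
}

producer = Producer(conf)

# Produce messages
topic = "orders"
for i in range(5):
    order = {"order_id": i, "item": f"Widget {i}", "quantity": i * 2}
    producer.produce(
        topic,
        key=str(i).encode(),  # same key -> same partition
        value=json.dumps(order).encode(),
        callback=delivery_report
    )
    producer.poll(0)  # serve delivery callbacks for messages already acknowledged

# Block (up to 10 s) until every queued message is delivered or has failed;
# flush() returns the number of messages still waiting.
remaining = producer.flush(10)
if remaining == 0 and not failed:
    print("All messages delivered")
else:
    print(f"{len(failed)} failed, {remaining} still queued")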

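Example output (illustrative only: the partition each key lands in depends on the key's hash and the topic's partition count, and offsets count up independently within each partition, so your numbers will differ):

Delivered to orders [<partition>] @ offset <offset>
... (one line per message, five in total)
All messages delivered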

Consumer Groups and Offset Management

A consumer group shares the work of reading a topic. Each partition is assigned to exactly one consumer in the group.

  • Group coordinator: One broker manages the consumer group
  • Rebalance: When a consumer joins or leaves, partitions are reassigned
  • Offsets: Consumers commit their position so they can resume after a restart
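A simplified sketch of how a group splits partitions, and what a rebalance does, under the assumption of a plain round-robin assignor. `assign_round_robin` is hypothetical; Kafka's built-in assignors (such as range and cooperative-sticky) use different rules but likewise give every partition exactly one owner.

```python
def assign_round_robin(partitions, consumers):
    """Toy assignor: the i-th partition (sorted) goes to consumer i % n (sorted).

    Kafka's real assignors (range, roundrobin, sticky, cooperative-sticky) use
    different rules, but they too give every partition exactly one owner.
    """
    ordered = sorted(consumers)
    assignment = {c: [] for c in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = [0, 1, 2]
print(assign_round_robin(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}

# c2 leaves -> rebalance: c1 takes over partition 1 as well
print(assign_round_robin(partitions, ["c1"]))
# {'c1': [0, 1, 2]}

# Scaling past the partition count leaves consumers idle
print(assign_round_robin(partitions, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```

The last call is why adding consumers beyond the partition count buys no extra parallelism.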
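from confluent_kafka import Consumer, KafkaError
import json

consumer_conf = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'order-processors',
    'auto.offset.reset': 'earliest',  # no committed offset yet -> start from the beginning
    'enable.auto.commit': True,       # commit offsets periodically in the background
}

consumer = Consumer(consumer_conf)
consumer.subscribe(["orders"])

print("Consumer started. Waiting for messages...")
received = 0
try:
    while received < 5:  # stop after the 5 orders produced above; Ctrl+C to abort early
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # nothing yet (the group may still be joining)
        if msg.error():
            # _PARTITION_EOF is only reported when enable.partition.eof is True
            if msg.error().code() == KafkaError._PARTITION_EOF:
                continue
            print(f"Error: {msg.error()}")
            continue
        order = json.loads(msg.value().decode())
        print(f"Received: {order} [partition={msg.partition()}, offset={msg.offset()}]")
        received += 1
except KeyboardInterrupt:
    pass
finally:
    consumer.close()  # commit final offsets and leave the group cleanly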

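Example output (records from different partitions can arrive in any order, and the partition/offset values depend on your topic):

Consumer started. Waiting for messages...
Received: {'order_id': 0, 'item': 'Widget 0', 'quantity': 0} [partition=<partition>, offset=<offset>]
...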
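Committed offsets are what let a consumer, or whichever group member inherits its partition after a rebalance, pick up where processing stopped. This toy model (no broker; `consume` and the "orders-0" key are hypothetical) shows the convention that a committed offset points at the next record to read, and why records processed after the last commit are delivered again:

```python
# Toy model of one partition and the group's committed offset for it.
log = [f"order-{i}" for i in range(6)]
committed = {"orders-0": 0}  # hypothetical "topic-partition" key -> next offset to read


def consume(consumer, tp, max_records, commit=True):
    """Process up to max_records starting at the committed offset; optionally commit."""
    start = committed[tp]
    batch = log[start:start + max_records]
    for offset, record in enumerate(batch, start=start):
        print(f"{consumer} processed {record} @ offset {offset}")
    if commit:
        committed[tp] = start + len(batch)  # commit the offset of the NEXT record
    return batch


consume("c1", "orders-0", 3)                # offsets 0-2, then commits 3
consume("c1", "orders-0", 2, commit=False)  # offsets 3-4, then c1 crashes before committing
# Rebalance: c2 inherits the partition and starts from the committed offset, 3
resumed = consume("c2", "orders-0", 3)      # re-processes order-3 and order-4, then order-5
```

This is the at-least-once trade-off: committing only after processing avoids losing records, at the cost of occasional duplicates.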

Replication and Fault Tolerance

Each partition has replicas across multiple brokers:

Role | Description
--- | ---
Leader | Handles all writes for the partition and, by default, all reads
Follower | Copies data from the leader; a follower that keeps up is counted in the ISR
ISR | In-Sync Replicas: the leader plus every follower that is fully caught up with it

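If the leader fails, the controller elects a new leader from the ISR (only in-sync replicas are eligible unless unclean.leader.election.enable is turned on, which it is not by default). The min.insync.replicas setting is the minimum number of in-sync replicas, leader included, that must acknowledge a write when the producer uses acks=all; if fewer are in sync, the write is rejected with a NOT_ENOUGH_REPLICAS error instead of being accepted with weaker durability.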
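The interplay between acks=all, min.insync.replicas, and leader election can be modeled in a few lines. This is a toy simulation (the `ToyPartition` class and its string results are hypothetical stand-ins, not broker code) for a partition with replication factor 3 and min.insync.replicas=2:

```python
class ToyPartition:
    """Toy replication model: replicas are broker ids, and the ISR includes the leader."""

    def __init__(self, replicas, min_insync_replicas):
        self.replicas = list(replicas)
        self.leader = self.replicas[0]
        self.isr = set(self.replicas)
        self.min_isr = min_insync_replicas

    def produce_acks_all(self):
        if self.leader is None:
            return "NO_LEADER"  # partition offline until a replica recovers
        # With acks=all, a write is refused outright if the ISR has shrunk
        # below min.insync.replicas; otherwise every ISR member must store it.
        if len(self.isr) < self.min_isr:
            return "NOT_ENOUGH_REPLICAS"
        return "OK"

    def broker_fails(self, broker):
        self.isr.discard(broker)
        if broker == self.leader:
            # Clean leader election: only a member of the ISR may take over.
            candidates = [b for b in self.replicas if b in self.isr]
            self.leader = candidates[0] if candidates else None


p = ToyPartition(replicas=[1, 2, 3], min_insync_replicas=2)
print(p.produce_acks_all(), p.leader)  # OK 1
p.broker_fails(1)                      # leader dies; broker 2 (in the ISR) takes over
print(p.produce_acks_all(), p.leader)  # OK 2
p.broker_fails(3)                      # ISR is now {2}, below min.insync.replicas
print(p.produce_acks_all(), p.leader)  # NOT_ENOUGH_REPLICAS 2
```

Replication factor 3 with min.insync.replicas=2 tolerates one broker failure without refusing writes, which is why that pairing is a common choice.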

Exactly-Once Semantics (EOS)

Kafka supports three delivery semantics:

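Semantics | Description | Typical use case
--- | --- | ---
At-most-once | Fire and forget; messages may be lost but are never duplicated | Monitoring metrics
At-least-once | Retry until acknowledged; nothing is lost, but duplicates are possible | Most use cases
Exactly-once | Idempotent producer plus transactions; each record takes effect once within Kafka | Financial transactions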
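The difference between the first two rows comes down to retries. In this toy sketch (the `send` helper and its lambda-controlled network failures are hypothetical), a lost write with no retries loses the record, while a lost acknowledgement with retries stores it twice:

```python
def send(log, record, write_lost, ack_lost, retries):
    """Toy producer/broker exchange. write_lost(attempt) means the request never
    reached the broker; ack_lost(attempt) means the broker stored the record but
    the acknowledgement never came back. Returns True once an ack is received."""
    for attempt in range(retries + 1):
        if write_lost(attempt):
            continue              # nothing stored; try again if retries remain
        log.append(record)        # broker stores the record
        if not ack_lost(attempt):
            return True           # producer saw the ack and stops
    return False                  # retries exhausted without an ack


# At-most-once: no retries, and the first write is lost, so the record never lands.
at_most_once = []
send(at_most_once, "order-1", write_lost=lambda a: a == 0, ack_lost=lambda a: False, retries=0)

# At-least-once: the first write lands but its ack is lost, so the producer
# retries and the broker stores the record a second time.
at_least_once = []
send(at_least_once, "order-1", write_lost=lambda a: False, ack_lost=lambda a: a == 0, retries=3)

print(at_most_once)   # []
print(at_least_once)  # ['order-1', 'order-1']
```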

EOS uses:

  • Idempotent producer: Automatically deduplicates retries (set enable.idempotence=true)
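  • Transactions: Atomic writes across multiple partitions, optionally together with the consumer's offset commits. The producer needs a transactional.id; in confluent-kafka the calls are producer.init_transactions(), producer.begin_transaction(), and producer.commit_transaction() (beginTransaction() and commitTransaction() in the Java client)
  • Consumer transactional reads: isolation.level=read_committed, so consumers skip records from aborted transactions and hold back records from transactions that are still open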
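The idempotent producer works because the client tags each batch with a producer id and a sequence number, and the broker refuses sequence numbers it has already written. A simplified, single-partition toy model of that check (`IdempotentLog` is hypothetical; real brokers track this state per partition and keep a window of recent batches):

```python
class IdempotentLog:
    """Toy broker-side dedup for ONE partition: remember the last sequence number
    written by each producer id and refuse anything already seen or out of order."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer_id -> last sequence number appended

    def append(self, producer_id, seq, record):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "DUPLICATE"              # a retry of data already stored: ack it, don't re-append
        if seq != last + 1:
            return "OUT_OF_ORDER_SEQUENCE"  # a gap means an earlier batch went missing
        self.last_seq[producer_id] = seq
        self.records.append(record)
        return "OK"


log = IdempotentLog()
print(log.append(producer_id=7, seq=0, record="order-1"))  # OK
print(log.append(producer_id=7, seq=0, record="order-1"))  # DUPLICATE (retry after a lost ack)
print(log.append(producer_id=7, seq=1, record="order-2"))  # OK
print(log.records)                                         # ['order-1', 'order-2']
```

In confluent-kafka you don't manage producer ids or sequence numbers yourself; setting 'enable.idempotence': True in the producer config turns this behavior on.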

Common Mistakes

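  1. Using the same consumer group for different workloads: Consumers in one group split the partitions between them, so a processing service and an archiver sharing a group.id would each see only part of the data. Give each consumer type (processing, archiving, analytics) its own group.id.
  2. Not handling consumer rebalances: When consumers join or leave, partitions are reassigned. Your consumer should handle revoked partitions gracefully, for example by committing processed offsets in an on_revoke callback passed to consumer.subscribe().
  3. Sending messages without keys: Without a key, the producer spreads messages across partitions (randomly, round-robin, or in sticky batches, depending on the client and version), so related messages can end up in different partitions with no ordering between them. Give related messages the same key to keep them in order.
  4. Setting acks=0 for important data: acks=0 means fire-and-forget. Data can be lost. Use acks=all for durability.
  5. Under-planning the partition count: A topic's partition count can be increased but never decreased, and increasing it changes which partition many keys hash to, so a key's older and newer records can end up in different partitions. Plan capacity carefully.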
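Mistake 5 is easy to underestimate. This sketch counts how many keys would change partition if a topic grew from 6 to 8 partitions, assuming a CRC32-modulo partitioner (an approximation; `partition_for` is a hypothetical helper). If the hash were perfectly uniform, about three quarters of keys would move, because a key stays put only when its hash modulo 24 is below 6:

```python
import zlib


def partition_for(key, num_partitions):
    # Assumption: CRC32(key) % partition count, similar in spirit to librdkafka's
    # "consistent" partitioner; other clients (e.g. Java's murmur2) differ in detail.
    return zlib.crc32(key.encode()) % num_partitions


keys = [f"customer-{i}" for i in range(1000)]
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 8)]
print(f"{len(moved)} of {len(keys)} keys land in a different partition after growing 6 -> 8")
```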

Practice Questions

  1. What is a Kafka partition? An ordered, immutable sequence of records within a topic. Partitions enable parallelism — different consumers read different partitions.

  2. How does a consumer group work? Multiple consumers in the same group share the work of reading a topic. Each partition is assigned to exactly one consumer in the group.

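  3. What is an ISR (In-Sync Replica)? A replica that is caught up with the partition leader (it has not fallen behind for longer than replica.lag.time.max.ms). Only ISR members can become leader during failover, unless unclean leader election is enabled.

  4. How does Kafka achieve exactly-once semantics? Through idempotent producers (deduplication of retries), transactions (atomic writes across partitions), and consumers reading with isolation.level=read_committed.

  5. What happens when a consumer fails? The group coordinator detects the failure through missed heartbeats (a consumer that stops calling poll() for longer than max.poll.interval.ms also leaves the group), triggers a rebalance, and reassigns the failed consumer's partitions to the remaining consumers.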

Challenge

Set up a 3-broker Kafka cluster (using Docker). Create a topic with 6 partitions and replication factor 3. Produce 1000 messages and consume them with a 3-consumer group. Observe the partition distribution. Kill one consumer and watch the rebalance.

Real-World Task

Install Kafka and run kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic orders to see partition and replica details. Which broker leads each partition? Which brokers are in each partition's ISR?

Mini Project: Event Pipeline

Build a Kafka-based event pipeline: a producer reads from a CSV file and sends events to a topic. A consumer processes events and writes results to a database (or file). Add error handling and offset management.
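One way to start the producer side is to turn each CSV row into a (key, value) pair before producing. In this sketch, `csv_to_events` and the column names are hypothetical; keying by customer keeps each customer's events in order:

```python
import csv
import io
import json


def csv_to_events(csv_text, key_field):
    """Turn CSV rows into (key, value) byte pairs ready for producer.produce().

    Rows sharing a key_field value get the same key, so they land in the same
    partition and keep their relative order.
    """
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[key_field].encode()
        value = json.dumps(row).encode()  # CSV values stay strings
        events.append((key, value))
    return events


sample = "order_id,customer,amount\n1,alice,9.99\n2,bob,5.00\n3,alice,12.50\n"
for key, value in csv_to_events(sample, key_field="customer"):
    print(key, value)
```

Each pair can then be sent with producer.produce(topic, key=key, value=value, callback=delivery_report), as in the producer example earlier.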

Security angle: Kafka secures event streams with TLS encryption (configured under the name SSL), SASL authentication, and ACL-based authorization. Configuring them correctly protects sensitive data in transit and limits who can read or write each topic.
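On the client side, encryption and authentication are mostly configuration. A sketch of a confluent-kafka config for TLS plus SASL/SCRAM, assuming the brokers expose a SASL_SSL listener with SCRAM-SHA-512 enabled; the environment variable names, default CA path, and broker address are hypothetical:

```python
import os


def secure_client_config(bootstrap_servers):
    """Build a confluent-kafka (librdkafka) config for TLS encryption plus
    SASL/SCRAM authentication. Credentials come from (hypothetical) environment
    variables rather than source code."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",     # TLS-encrypted connection + SASL auth
        "sasl.mechanisms": "SCRAM-SHA-512",  # assumes the brokers enable SCRAM
        "sasl.username": os.environ.get("KAFKA_USERNAME", ""),
        "sasl.password": os.environ.get("KAFKA_PASSWORD", ""),
        # CA certificate used to verify the brokers' TLS certificates (hypothetical path)
        "ssl.ca.location": os.environ.get("KAFKA_CA_FILE", "/etc/kafka/ca.pem"),
    }


conf = secure_client_config("broker1.example.com:9093")
# Never log the password
print({k: v for k, v in conf.items() if k != "sasl.password"})
```

Pass the dict to Producer(conf), or to Consumer({**conf, 'group.id': ...}). ACLs are enforced on the brokers against the authenticated user, so the client needs no extra settings for them.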

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.

What’s Next

Congratulations on completing this Apache Kafka tutorial! Here’s where to go from here:

  • Practice daily — Consistency is more important than long study sessions
  • Build a project — Apply what you learned by building something real
  • Explore related topics — Check out other tutorials in the same category
  • Join the community — Discuss with other learners and share your progress

Remember: every expert was once a beginner. Keep coding!
