Overview

4 Kafka as a distributed log

This chapter introduces Kafka through the lens of logs: ordered, append-only sequences of events that answer the question “what happened?” It explains core log properties—temporal ordering, append-at-end writes, immutability—and how offsets make large logs navigable while enabling consumers to track progress. Kafka elevates the log to a first-class storage and transport abstraction, using topics as logs and offsets to coordinate reading, but it cautions against treating Kafka as a query system or key-value store. Instead, Kafka acts as a central data backbone, where event streams are shared reliably and different systems materialize the forms they need (databases for queries, caches for fast lookups, search engines for discoverability).

To scale and remain resilient, Kafka is presented as a distributed log. Topics are partitioned so processing can be parallelized, with ordering guaranteed per partition and preserved for records sharing the same key. Without keys, producers spread records across partitions (historically round-robin; newer clients use a sticky partitioner that fills batches per partition before switching). Consumer groups enable horizontal consumption by assigning each partition to exactly one consumer instance within a group while storing per-group offsets for continuity. Reliability is delivered through replication: each partition has a leader and followers (replicas), with in-sync replicas (ISR) ready to take over on failure. Replication is log-based and efficient, and leaders are distributed across brokers to balance load and maintain throughput.

The chapter also outlines Kafka’s building blocks and their roles at scale. A coordination cluster manages cluster metadata, broker membership, partition leadership, and failover; Kafka now recommends KRaft for this role, replacing the operationally heavier ZooKeeper in most cases. Brokers store and serve data, while clients—producers, consumers, Kafka Streams, and Kafka Connect—write, read, process, and integrate data with external systems. In corporate environments, Kafka becomes a data hub: Connect links databases and other systems, Streams enables real-time processing, schema registries standardize data formats across teams, MirrorMaker 2 supports multi-datacenter mirroring, and robust operations (monitoring, automation, governance) turn Kafka into a dependable streaming platform for near–real-time, data-driven decisions.

A log is a sequential list where we add elements at the end and read them from a specific position (offset). For example, we read from offset 0, then choose to read from offset 4, and so on.
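This append-and-read-from-offset behavior can be sketched in a few lines of Python. This is a toy in-memory log to illustrate the idea, not Kafka's actual implementation:

```python
class Log:
    """A toy append-only log: write at the end, read from any offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append at the end and return the offset assigned to the record."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Read all records from the given offset to the end."""
        return self._records[offset:]


log = Log()
for event in ["order-placed", "order-paid", "order-shipped"]:
    log.append(event)

print(log.read(0))  # all events from the beginning
print(log.read(2))  # only events from offset 2 onward
```

Note that reading never removes anything: two consumers can read the same log from different offsets without interfering with each other.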
A log is an ideal data structure for exchanging data between systems. Typically, we do not work directly with the data in the log but store it in a format best suited to our particular use case. For example, we can use relational databases to perform complex queries over our data. If we want to access prepared data quickly, we can use an in-memory key-value store such as Redis. If we want to provide a search function over the data in the log, we can use a search engine.
Scaling vertically means adding more resources to a single instance. Scaling horizontally means adding more instances to a system.
Log A holds all the data for coffee pods and log B holds all the data for cola.
Every odd message was produced to partition 0 and every even message was produced to partition 1.
Messages with the same key (here, the form) were produced to the same partition.
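The routing rule behind this can be sketched as a deterministic hash of the key. Kafka's Java client hashes keys with murmur2; the sketch below uses CRC-32 purely as a stand-in to illustrate the `hash(key) % numPartitions` idea:

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Deterministically map a message key to a partition.

    Simplified stand-in for Kafka's default partitioner, which uses
    murmur2 rather than CRC-32.
    """
    return zlib.crc32(key) % num_partitions


# The same key always lands in the same partition:
assert choose_partition(b"coffee", 2) == choose_partition(b"coffee", 2)
print(choose_partition(b"coffee", 2), choose_partition(b"cola", 2))
```

Because the mapping depends only on the key and the partition count, all messages for one key preserve their order within that one partition.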
If we have only one consumer that needs to read data from all partitions, it may not be able to keep up, and the data will not be processed in a timely manner.
Consumer groups allow us to split the processing of multiple partitions between different instances of the same service. Often several consumer groups, not just one, consume the data from a topic. Consumer groups are isolated from each other and do not influence each other.
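The core assignment rule can be sketched as follows. Kafka's real assignors (range, round-robin, sticky) are more involved; this toy version only shows the essential invariant that each partition is owned by exactly one consumer within a group:

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment of partitions to the consumers of one group.

    Sketch only: each partition goes to exactly one consumer instance,
    so the group processes partitions in parallel without overlap.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


print(assign_partitions([0, 1, 2, 3], ["c1", "c2"]))
# -> {'c1': [0, 2], 'c2': [1, 3]}
```

A second group running the same logic would get its own independent assignment and its own offsets, which is why groups do not influence each other.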
Consumers and producers communicate exclusively with the leader (with rare exceptions). Followers are only there to continuously replicate new messages from the leader. If the leader fails, one of the followers takes over.
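The failover step can be sketched as picking a replacement from the in-sync replicas. This is a simplified illustration of the controller's decision, not Kafka's actual election code:

```python
def elect_leader(replicas, isr, failed):
    """Pick a new leader for a partition after the current leader fails.

    Sketch only: the controller prefers an in-sync replica (ISR),
    because only those are guaranteed to hold all acknowledged messages.
    """
    for broker in replicas:
        if broker in isr and broker not in failed:
            return broker
    return None  # no eligible replica -> partition unavailable


# Replicas live on brokers 1, 2, 3; broker 1 (the leader) fails:
print(elect_leader([1, 2, 3], isr={1, 2, 3}, failed={1}))  # -> 2
```

If no in-sync replica survives, the partition becomes unavailable rather than risking data loss (unless unclean leader election is explicitly enabled).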
A typical Kafka environment consists of the Kafka cluster itself and the clients that write data to and read data from Kafka. Before the KRaft-based coordination cluster existed, a ZooKeeper ensemble was used for coordination. Without ZooKeeper, brokers can either take over the coordination role themselves or delegate it to a standalone coordination cluster.
Kafka uses either a KRaft-based or a ZooKeeper-based coordination cluster. Both should consist of an odd number of nodes (usually 3 or 5) and use a consensus protocol.
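The arithmetic behind the odd-number recommendation is worth spelling out: consensus requires a strict majority (quorum), and an even-sized cluster tolerates no more failures than the next-smaller odd one.

```python
def quorum(n: int) -> int:
    """Minimum number of nodes that must agree: a strict majority."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Failures the cluster can survive while still reaching quorum."""
    return n - quorum(n)


for n in (3, 4, 5):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 nodes: quorum 2, tolerates 1 failure(s)
# 4 nodes: quorum 3, tolerates 1 failure(s)
# 5 nodes: quorum 3, tolerates 2 failure(s)
```

A fourth node adds cost and coordination overhead without improving fault tolerance over three nodes, which is why 3 or 5 are the common choices.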
Kafka alone is usually not enough. The Kafka ecosystem offers numerous components to integrate Kafka into our enterprise landscape and thus build a streaming platform.

Summary

  • A log is a sequential list where we add elements at the end and read them from a specific position.
  • Kafka is a distributed log; the data of a topic is distributed across several partitions on several brokers.
  • Offsets define the position of a message inside a partition.
  • Kafka is used to exchange data between systems; it does not replace databases, key-value stores, or search engines.
  • Partitions are used to scale topics horizontally and to parallelize processing.
  • Producers use partitioners to decide which partition to produce to.
  • Messages with the same key end up in the same partition.
  • Consumer groups are used to scale consumers and let them share the workload; within a group, each partition is consumed by exactly one consumer.
  • Replication ensures reliability by duplicating partitions across multiple brokers within a Kafka cluster.
  • There is always one leader replica per partition, which is responsible for coordinating the partition.
  • Kafka consists of a coordination cluster, brokers, and clients.
  • The coordination cluster is responsible for orchestrating the Kafka cluster, in other words for managing brokers.
  • Brokers form the actual Kafka cluster; they are responsible for receiving, storing, and making messages available for retrieval.
  • Clients are responsible for producing or consuming messages; they connect to brokers.
  • There are various frameworks and tools to integrate Kafka easily into an existing corporate infrastructure.

FAQ

How does Kafka model data as a log?
Kafka stores records in append-only, ordered logs. New messages are written at the end, and consumers read from a specific position (offset) forward. This simple structure enables high throughput, durability, and straightforward replication.

What are offsets and how do consumers track their progress?
An offset is the position of a record within a partition, assigned by the broker when the record is written. Consumers remember the next offset to read; Kafka can persist these positions in the __consumer_offsets topic so consumers can resume from where they left off after restarts.

Why is immutability important in Kafka logs?
Messages, once written, are not changed in place. This immutability simplifies replication, preserves ordering, and allows consistent replay to reconstruct state at any point in time. Retention policies govern how long immutable data is kept.

Should I use Kafka like a database or key-value store?
No. Kafka addresses records by offset, not by key lookup or ad hoc queries. Use Kafka as a central data hub and materialize data into systems optimized for your access patterns (for example, relational databases for analytics, Redis for fast key lookups, Elasticsearch for search).

What are partitions and how do keys affect message ordering?
Topics are split into partitions for parallelism and scalability. Kafka guarantees ordering only within a single partition. If you need ordering for related messages, produce them with the same key so the partitioner routes them to the same partition.

What happens if I change the number of partitions or mix client libraries?
Partition selection typically uses hash(key) % numPartitions; changing the partition count can reshuffle keys and break ordering guarantees across re-partitioned data. Also, Java and librdkafka use different default partitioners—set librdkafka producers to murmur2_random to align with Java clients and ensure consistent routing.
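The reshuffling effect is easy to demonstrate. The sketch below uses CRC-32 as a simplified stand-in for Kafka's murmur2-based default partitioner; the point is only that `hash(key) % numPartitions` changes for some keys when the partition count changes:

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Simplified stand-in for Kafka's murmur2-based default partitioner.
    return zlib.crc32(key) % num_partitions


keys = [b"coffee", b"cola", b"water", b"juice"]
before = {k: choose_partition(k, 3) for k in keys}
after = {k: choose_partition(k, 4) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys changed partition after resizing")
```

Messages written before the resize stay where they are, so a key's old and new messages can end up in different partitions, and per-key ordering across the resize is no longer guaranteed.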
How do consumer groups provide horizontal scalability?
Consumers that share the same group.id form a consumer group. Within a group, each partition is consumed by at most one consumer instance, allowing parallel processing while preserving per-partition order. Groups are isolated from each other, and offsets are tracked per group.

How does replication work and what do Leader, Follower, and ISR mean?
Each partition has one Leader that handles reads and writes, and Followers that replicate data from the Leader. In-Sync Replicas (ISR) are replicas caught up with the Leader. If the Leader fails, an ISR (or eligible replica) is elected Leader and clients automatically switch over.

What is the coordination cluster (KRaft vs. ZooKeeper) and why use an odd number of nodes?
The coordination cluster manages metadata, broker membership, controller elections, and partition leadership. Modern Kafka uses KRaft (Kafka Raft) instead of ZooKeeper, reducing operational complexity and improving performance. Use an odd number of nodes (commonly 3 or 5) to maintain quorum and tolerate failures.

How is Kafka used in enterprises and which ecosystem tools matter?
Kafka typically serves as a central data hub. Kafka Connect integrates external systems; Kafka Streams (and alternatives like Flink) process streams; a schema registry manages data formats; MirrorMaker 2 supports multi-datacenter mirroring. Production setups also require monitoring, automation, and compliance controls.
