Overview

5 Kafka in real-world use cases

This chapter turns Kafka’s fundamentals into practical guidance for real-world decisions. It maps high-impact scenarios—notifications, external data integration, real-time analytics, and log aggregation—to Kafka’s strengths, clarifies the guarantees you gain and the operational costs you incur, and outlines antipatterns and edge cases where another tool may fit better. Along the way, it equips architects with a pragmatic checklist and mental model for evaluating Kafka’s applicability, while surveying viable alternatives such as RabbitMQ, Apache Pulsar, and managed cloud services.

In event-driven microservices, producers publish keyed events to preserve per-entity ordering while consumers maintain local read models; topic compaction, extended retention, and tiered storage support different durability and rebuild strategies, but Kafka remains a log, not a database. Data integration leans on snapshot replication and change data capture, with Kafka Connect and its connector ecosystem enabling low-code pipelines that can be enriched downstream by stream processing—balanced against added operational overhead, security concerns around raw data exposure, and fragility under schema drift. For centralized logging, Kafka decouples producers from backends like Elasticsearch, buffering spikes and improving durability and throughput; this adds cost and often warrants a separate cluster, and may be overkill for small estates. Real-time processing with frameworks such as Kafka Streams delivers low-latency insights and microservice-friendly scaling, but introduces a learning curve, state management complexity, operational sprawl, and uneven tooling.
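The compaction behavior mentioned above can be sketched in plain Python: a compacted topic keeps only the latest record per key, so a service replaying it from the beginning recovers the current state of every entity. This is a simplified model (a dict standing in for the log, with made-up profile data), not real Kafka client code.

```python
def compact(log):
    """Simplified model of log compaction: keep only the latest record per key."""
    latest = {}
    for key, value in log:           # later records overwrite earlier ones
        latest[key] = value
    return latest

# A profile-change stream keyed by customer ID (hypothetical data).
log = [
    ("customer-1", {"email": "a@old.example"}),
    ("customer-2", {"email": "b@example.com"}),
    ("customer-1", {"email": "a@new.example"}),  # newer version survives compaction
]

state = compact(log)
```

Replaying the compacted topic yields exactly one record per customer — the latest — which is what lets a service rebuild its read model from Kafka alone.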

The chapter also delineates where Kafka may not be ideal: it favors publish-subscribe over point-to-point semantics, partitions limit global ordering, brokers don’t perform content-based routing or schema validation, access is sequential (not content-indexed), and large messages require workarounds like externalized payloads. While Kafka excels at high-throughput, fault-tolerant streaming, batch-centric workflows and strict per-message transactions can be awkward. Alternatives fill different niches: RabbitMQ offers queues, request-reply, and smart routing (with a streaming add-on); Pulsar separates stateless brokers from BookKeeper storage and adds multitenancy and geo-replication; cloud services provide managed elasticity with differing semantics. Choosing among them hinges on throughput and latency needs, routing complexity, consistency requirements, ecosystem maturity, team expertise, and whether real-time or batch processing truly delivers the most value.

Flow diagram illustrating how ProfileService sends notifications about profile changes to Customer360Service through Kafka
Using compacted topics in Kafka: you always retain the latest version of each event, allowing Kafka to act as a source of truth
Setting up Kafka Connect for data replication involves using PostgreSQL and MongoDB as source systems. Source connectors are responsible for pulling data from these systems and inserting it into Kafka topics. In turn, sink connectors pull the data from Kafka and insert it into the target systems. In this setup, both sink connectors consume data from the same topics, with one inserting the data into an MS SQL database and the other into Amazon S3.
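In practice, a source connector like the PostgreSQL one above is declared as a small JSON document submitted to the Kafka Connect REST API. The sketch below assumes the Debezium PostgreSQL CDC connector; the hostname, credentials, and topic prefix are placeholders, not values from the text.

```json
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres.internal",
    "database.port": "5432",
    "database.user": "replicator",
    "database.password": "********",
    "database.dbname": "customers",
    "topic.prefix": "crm"
  }
}
```

Sink connectors for MS SQL or S3 follow the same shape, differing only in the connector class and its target-specific settings.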
A workflow for source connectors
A workflow for sink connectors
Conceptual flow for log data collection: Log data is sent from the application to Kafka for processing, it’s then indexed in Elasticsearch, and it’s finally visualized in Kibana.
Sending log data via Kafka to Elasticsearch
The fraud detection application acts as a producer and a consumer for Kafka topics. It reads messages from the Transactions topic, processes them, and sends the output results to the Fraudulent Transactions topic.
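The read-process-write loop in that figure can be reduced to a pure function, leaving the Kafka consumer/producer wiring aside. The fraud rule and threshold below are made up for illustration; a real detector would be far more sophisticated.

```python
def is_fraudulent(txn: dict) -> bool:
    """Toy rule: flag unusually large transactions (threshold is invented)."""
    return txn["amount"] > 10_000

def process(transactions):
    """Model of the read-process-write loop: records consumed from the
    Transactions topic are filtered, and the flagged ones are what the
    application would publish to the Fraudulent Transactions topic."""
    return [t for t in transactions if is_fraudulent(t)]
```

Keeping the detection logic as a pure function like this makes it easy to test independently of the Kafka plumbing around it.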
Passing messages with references to content stored externally
Time-based batch load to the data warehouse: the consumer buffers records and, at fixed intervals, bulk-loads a batch to the data warehouse (rather than per-message processing).
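The buffering consumer in that caption can be sketched without any Kafka client at all: records accumulate in memory, and a flush fires once the interval elapses. The class name and the flush callback are hypothetical; the clock is injectable so the behavior is testable.

```python
import time

class TimedBatchBuffer:
    """Buffer records and flush them as one batch every `interval` seconds."""

    def __init__(self, interval, flush_fn, clock=time.monotonic):
        self.interval = interval
        self.flush_fn = flush_fn      # e.g. a bulk load into the warehouse
        self.clock = clock            # injectable for testing
        self.records = []
        self.last_flush = clock()

    def add(self, record):
        self.records.append(record)
        if self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        if self.records:
            self.flush_fn(self.records)   # one bulk operation per batch
            self.records = []
        self.last_flush = self.clock()
```

A real implementation would also flush on a size threshold and on shutdown, and commit consumer offsets only after a successful bulk load.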
Unexpected ordering. Earlier-timestamped messages can arrive later because of network delays.
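One consumer-side mitigation for this is to hold messages in a small time-ordered buffer and release only those older than a grace period, accepting that events arriving even later still need separate handling. A sketch under those assumptions (the class and message tuples are hypothetical):

```python
import heapq

class ReorderBuffer:
    """Release messages in timestamp order once they are at least `grace` old."""

    def __init__(self, grace):
        self.grace = grace
        self.heap = []                      # min-heap ordered by timestamp

    def add(self, timestamp, payload):
        heapq.heappush(self.heap, (timestamp, payload))

    def drain(self, now):
        """Return buffered messages whose timestamp is <= now - grace,
        in timestamp order."""
        out = []
        while self.heap and self.heap[0][0] <= now - self.grace:
            out.append(heapq.heappop(self.heap))
        return out
```

The grace period trades latency for ordering: a larger value tolerates longer network delays but delays every message by that much.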
RabbitMQ architecture
Apache Pulsar architecture

Summary

  • Microservices that communicate through events can use Kafka as an underlying integration platform, gaining decoupled communication between services along with improved scalability and fault tolerance — an efficient, scalable way to integrate services in distributed architectures.
  • Kafka’s ability to process events with high throughput makes it ideal for collecting logs and metrics, as Kafka can handle vast amounts of data at a high rate.
  • Data replication can be implemented using Kafka Connect, a key component of the Kafka ecosystem. Kafka Connect provides a flexible and scalable way to implement data replication without extensive custom development.
  • Various frameworks tightly integrated with Kafka allow building applications that process data in real time, empowering businesses to react to data as it is generated and enabling advanced real-time use cases.
  • RabbitMQ and Apache Pulsar are messaging platforms that compete with Kafka, each serving its own niche. RabbitMQ excels in low-latency, transactional messaging, while Pulsar’s architecture with stateless brokers and separate storage makes it more scalable for certain use cases. The choice between Kafka, RabbitMQ, and Pulsar depends on non-functional requirements such as scalability, real-time processing, and transactional guarantees.
  • Kafka excels at processing small messages at a high rate with minimal latency, making it a top choice for real-time event-driven systems. Examples include clickstream analytics, fraud scoring on card transactions, IoT telemetry ingestion, and real-time operational alerting.
  • Kafka may not be the best choice for use cases requiring strict ordering, batch transfers, or random data access (e.g., a single-sequence financial ledger or nightly bulk file/table transfers for ETL).

FAQ

What real-world use cases are a great fit for Kafka?
Kafka shines in event-driven microservices, external data integration (especially CDC), real-time stream processing and analytics, and centralized log/metrics aggregation. It also works well for fan-out notifications and as a durable backbone to decouple producers and many independent consumers. Its high throughput and replayable retention make it ideal when multiple systems need the same data at different times.

When is Kafka not the best choice?
Prefer alternatives when you need synchronous request-response, strict cross-service transactions with immediate outcomes, or simple, low-volume systems where Kafka's operational overhead is unnecessary. Kafka is a poor fit for broker-enforced point-to-point semantics, content-based routing or validation at the broker, random access by message content, global ordering across an entire topic, very large messages, and workflows designed purely for batch ETL.

How should I choose between Kafka and RabbitMQ?
Kafka favors publish-subscribe with "smart endpoints/dumb pipes," high throughput, durable retention, and replay. RabbitMQ excels at "dumb endpoints/smart pipes" with exchanges that support complex routing, queues for point-to-point, and request-reply patterns. Pick Kafka for high-throughput event streams and fan-out; pick RabbitMQ when you need queue semantics, sophisticated routing, or simple synchronous patterns. RabbitMQ Streams narrows the gap by adding log-like streaming.

How does Apache Pulsar compare to Kafka?
Pulsar separates stateless brokers from storage (Apache BookKeeper), enabling independent scaling and fast recovery. It offers built-in geo-replication, multitenancy, dead-letter topics, non-persistent messaging, and queue-like subscriptions. Kafka has the larger ecosystem, tooling, and community. Pulsar can expose a Kafka protocol handler so Kafka clients can talk to Pulsar without code changes.

How can microservices use Kafka as a source of truth for state?
Use compacted topics to retain only the latest value per key, allowing services to rebuild state from Kafka and keep only in-memory or lightweight stores. Alternatively, set the topic's retention to "effectively forever" and/or use tiered storage to keep full history more cheaply. Plan for tradeoffs: Kafka is not a database (no indexes, joins, SQL, or referential integrity), and rebuilding state from an event log adds latency.

What is Kafka Connect and when should I use it for data integration?
Kafka Connect is a no-code/low-code framework for moving data in and out of Kafka via pluggable connectors. Use snapshot-based connectors (e.g., JDBC) or CDC via Debezium to stream inserts/updates/deletes. Pros: large connector marketplace, scalable, uniform pipelines. Cons: some commercial licenses/costs, schema-change fragility, and raw data exposure risks. Alternatives include custom producers/consumers, DB replication tools (e.g., GoldenGate), and traditional ETL platforms.

How do I build a centralized logging pipeline with Kafka, and what are the tradeoffs?
Common pattern: apps log via frameworks or agents (e.g., Fluentd/Fluent Bit) → Kafka buffers and fans out → Kafka Connect indexes logs into Elasticsearch/OpenSearch → dashboards in Kibana. Benefits: decoupling, durability, high throughput, and backpressure buffering. Tradeoffs: resource-intensive, and different nonfunctional needs than business streams — often warranting a separate Kafka cluster. Alternatives include syslog, Logstash, direct-to-Elasticsearch, or SaaS platforms like Datadog/New Relic.

What implementation challenges should I expect with event-driven systems on Kafka?
Managing delivery guarantees under retention constraints, immutable messages with no broker-side validation (requiring strong data contracts), and schema evolution/versioning are key challenges. Distributed, asynchronous flows complicate tracing and debugging, especially timing and ordering issues. You'll need solid observability, error handling, and governance around schemas and topic lifecycles.
How do partitions and ordering affect system design?
Kafka guarantees ordering only within a partition; use a stable key (e.g., customer ID) to maintain per-entity order. Global ordering requires tradeoffs: a single partition (limits throughput), adding sequence numbers, or consumer-side reordering (e.g., by timestamp) with careful handling of late events. Multiple producers and network delays can cause unexpected interleaving at the broker, so plan for reordering logic where strict order matters.
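Per-entity ordering follows from the fact that the same key always hashes to the same partition. The sketch below uses a simple stable hash just to show the idea; Kafka's default partitioner actually uses murmur2, so the partition numbers here won't match a real cluster.

```python
def partition_for(key: bytes, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default partitioner (which uses murmur2).

    Any stable hash of the key modulo the partition count has the property
    we care about: the same customer ID always lands on the same partition,
    so that customer's events stay in order.
    """
    h = 0
    for b in key:
        h = (h * 31 + b) & 0x7FFFFFFF
    return h % num_partitions

p1 = partition_for(b"customer-42", 6)
p2 = partition_for(b"customer-42", 6)
# identical keys always map to the same partition
```

Note the flip side: changing the partition count reassigns keys to different partitions, which breaks per-key ordering across the resize.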
How should I handle large messages and batch workflows with Kafka?
Prefer storing large payloads in external object storage and send only a reference through Kafka; alternatively, split payloads into parts, ensuring they land on the same partition. As a last resort, raise size limits across brokers, producers, and consumers (with resource tradeoffs). Batch processing is possible but awkward: you must aggregate across partitions, define batch boundaries despite late events, and handle partial failures. When batch is primary, consider dedicated batch/ETL tools or warehouse loaders.
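The externalized-payload approach (often called the claim-check pattern) can be sketched as: upload the blob, then send only a small reference through Kafka. The storage backend and message shape below are hypothetical — a dict stands in for an object store such as S3.

```python
import json
import uuid

def send_large_payload(blob: bytes, store: dict) -> str:
    """Store the payload externally and return a small Kafka-friendly message.

    `store` stands in for an object store like S3; a real system would
    upload the blob there and put the object URL in the message.
    """
    ref = f"payloads/{uuid.uuid4()}"
    store[ref] = blob                     # "upload" to external storage
    return json.dumps({"payload_ref": ref, "size": len(blob)})

def receive_large_payload(message: str, store: dict) -> bytes:
    """Resolve the reference in a consumed message back to the payload."""
    ref = json.loads(message)["payload_ref"]
    return store[ref]                     # fetch by reference
```

The message that travels through Kafka stays tiny regardless of payload size; the cost is an extra round trip to the object store on the consumer side, plus a lifecycle policy for cleaning up orphaned payloads.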
