1 Getting to know Kafka as an architect
This chapter positions Apache Kafka as a foundational platform for event-driven architecture, explaining how it decouples producers from consumers so that a single published event can be turned into low-latency actions by many independent consumers. It highlights Kafka’s evolution from high-throughput messaging to a durable, real-time event streaming ecosystem that enables use cases like fraud detection, personalized recommendations, operational alerts, and predictive maintenance.
Beyond technology mechanics, the chapter centers on architectural judgment: when Kafka is a good fit, how it reshapes system design and operations, and what governance and event design practices ensure sustainable adoption. It emphasizes that success depends less on code and more on patterns, trade-offs, and integration strategies appropriate to enterprise environments.
- Principles of event-driven architecture
- Overview of the Kafka ecosystem
- Utilizing Kafka in enterprise environments
Key takeaways
- Kafka serves as a durable, scalable backbone for real-time event processing, enabling many consumers to react to a single published event without brittle point-to-point integrations.
- Event-driven systems unlock time-sensitive value across industries; latency reductions translate directly to business impact.
- Adoption requires architectural rigor: event modeling, topic design, consumer patterns, operational considerations, and governance.
- The chapter guides architects (and technical leads) in comparing Kafka to alternatives, assessing fit, and making deliberate design choices rather than focusing on language-specific implementation details.
Outcomes for readers
- Confidently evaluate Kafka’s suitability for a given problem and environment.
- Design event-driven systems that leverage Kafka’s decoupling, durability, and scalability.
- Establish governance and patterns that support long-term maintainability and enterprise integration.
1.1 How an architect sees Kafka
Traditional system integration often relied on synchronous request-response communication (commonly REST) between narrowly focused services. While straightforward and well-tooled, chaining synchronous calls introduces tight coupling, coordination complexity, fragility, and a higher risk of cascading failures. Modern requirements push architects toward more autonomous, flexible interactions where components react to changes independently.
The request-response design pattern: a caller invokes a service directly and waits for its reply.

1.1.1 Event-driven architecture
Event-driven architecture (EDA) centers on producing, detecting, consuming, and reacting to events via an intermediary channel. With Kafka as the backbone, services publish events describing changes, and interested consumers react asynchronously. For example, instead of calling CustomerService for each order, OrderService subscribes to address-change events and maintains a local copy, enabling loose coupling, scalability, and resilience. Multiple consumers can subscribe to the same event stream, and offline components can catch up later, provided delivery is reliable.
The EDA style of communication: systems publish events that describe changes, allowing others to react asynchronously.

EDA introduces trade-offs: eventual consistency (temporary state divergence), concerns about ordering and duplicates, and the need for idempotency and robust error handling. It can add latency and operational complexity, which architects must weigh against the benefits of decoupling and autonomy.
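To make the mechanics concrete, the following is a minimal sketch (not from the chapter) of the producing side of such an address-change event using the Java client; the topic name, broker address, and JSON payload are illustrative assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class AddressChangePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key = customer ID, value = a JSON payload describing the change.
            ProducerRecord<String, String> event = new ProducerRecord<>(
                    "customer-address-changed",                   // hypothetical topic name
                    "customer-42",
                    "{\"customerId\":\"customer-42\",\"city\":\"Berlin\"}");

            // Fire-and-forget with an optional acknowledgment callback.
            producer.send(event, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();                  // real code would retry or alert
                } else {
                    System.out.printf("Acked at %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```

Note that the producer has no knowledge of which services consume the event; OrderService would simply subscribe to the same topic with its own consumer.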
1.1.2 Handling high volumes of data
For high-volume, low-latency scenarios—such as user behavior analytics, log aggregation, fraud detection, and predictive maintenance—event-driven pipelines are often the only practical choice. Kafka excels at sustained throughput and low latency, helping architects implement reliable, scalable EDA and efficiently handle high-rate message streams. The chapter concludes by setting up a scenario where a team evaluates adopting EDA with Kafka.
1.2 Field notes: Journey of an event-driven project
An account manager brings a wave of change requests to a team already stretched by a sluggish data warehouse. Despite modern practices and upgrades, the nightly load now takes 14 hours due to fivefold data growth and poor source data quality, delaying reports and eroding trust.
Amid this pressure, the business proposes a high-profile Customer 360 initiative—cloud-based and backed by marketing—to unify customer views across touchpoints. While the lead architect anticipates added complexity and cost, the senior data engineer suggests a pivot: adopt an event-driven approach with Kafka.
Eva, the senior data engineer, frames Kafka as a low-latency, durable event log enabling fan-out to multiple consumers. Customer 360 can subscribe to streams and build its own projections, potentially reducing batch delays and decoupling downstream needs from upstream constraints. The account manager remains skeptical but open, asking for a concrete proposal with estimates comparable to the traditional approach. The team agrees to explore the Kafka-based design in the next meeting, setting the stage for a shift toward event-driven architecture.
1.3 Key players in the Kafka ecosystem
This section introduces Kafka’s core roles and runtime architecture so architects can evaluate the costs and benefits of adopting Kafka as a foundational system. Although Kafka has grown into a distributed platform for processing real-time events, its essence remains a reliable, high-throughput message broker focused on transporting and persisting messages.
At the data plane, producers push messages to a cluster of brokers, which acknowledge delivery, persist data to disk (with optional tiered storage for older data), and make messages available to consumers. Kafka uses a pull model: consumers subscribe to topics and fetch records on demand. An application can simultaneously act as both producer and consumer. Brokers work in a fault-tolerant cluster, distributing load and reporting their health via heartbeats.
At the control plane, Kafka uses controllers (KRaft) to manage cluster metadata and coordinate broker operations. Any server may run as a broker, a controller, or both. One controller is active while others are hot standbys; if the active controller fails, a new one is elected. Controllers maintain the metadata log (topics, partitions, broker registrations), replicate this metadata for durability, and monitor broker liveness via heartbeats to detect and handle failures.
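As a rough illustration of the KRaft control plane, a minimal server.properties sketch for a single node acting as both broker and controller might look like the following; the node ID, hostnames, ports, and paths are placeholders, not values from the chapter.

```properties
# A server acting as both broker and controller (KRaft combined mode).
process.roles=broker,controller
node.id=1

# Voters in the controller quorum, in id@host:port form (placeholder values).
controller.quorum.voters=1@localhost:9093

# Listeners for client traffic (broker) and quorum traffic (controller).
listeners=PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093
controller.listener.names=CONTROLLER
advertised.listeners=PLAINTEXT://localhost:9092

# Where the commit log (including the metadata log) is stored on disk.
log.dirs=/var/lib/kafka/kraft-combined-logs
```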
- Producers: push messages to brokers; receive delivery acknowledgments.
- Brokers: persist and serve messages, replicate data, balance load, and ensure durability.
- Consumers: pull messages by subscribing to topics; control read pace and position.
- Controllers (KRaft): own cluster metadata, coordinate operations, elect leaders, and handle failover.
- Storage: messages are always persisted; tiered storage can offload older data to cheaper media.
1.4 Architectural principles
This section outlines Kafka’s core architectural principles for event-driven enterprise systems. Kafka goes beyond traditional messaging by combining decoupled communication, durable storage, and replayable event streams—enabling scalable, fault-tolerant, and auditable data flows across services.
1.4.1 The publish-subscribe pattern
Kafka uses publish-subscribe to decouple producers and consumers. Producers publish events in a fire-and-forget style with optional acknowledgments and without direct knowledge of subscribers. Multiple consumers can independently receive and process the same event. In practice, producers emit events driven by explicit business needs, while owning and stabilizing the event schema to maximize reuse.
Example: When a customer address changes, CustomerService updates its database and emits an event. Both BillingService and AnalyticalService can subscribe and process that event independently as part of a Customer 360 scenario.
Publish–subscribe example: CustomerService publishes a “customer updated” event to a channel; all subscribers receive it independently.

Data governance and the organization of event pipelines are addressed later (see Chapter 6).
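Returning to the example above, a minimal sketch of the subscriber side with the Java consumer client might look like this; the topic name, broker address, and group IDs are assumptions. Running the same program with two different group IDs (say, billing-service and analytical-service) gives each service its own independent copy of the event stream.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CustomerUpdatedSubscriber {
    public static void main(String[] args) {
        // Pass "billing-service" or "analytical-service" as the group ID:
        // each consumer group receives its own full copy of the stream.
        String groupId = args.length > 0 ? args[0] : "billing-service";

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");           // start from the oldest retained event

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-updated"));  // hypothetical topic name
            while (true) {
                // Pull model: the consumer fetches records at its own pace.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("[%s] customer %s updated: %s%n",
                            groupId, record.key(), record.value());
                }
            }
        }
    }
}
```

Because each group tracks its own read position, adding a new downstream service is just a new group ID; the producer is unaffected.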
1.4.2 Reliable delivery
Reliability is achieved through a Kafka cluster of cooperating brokers with replicated data for fault tolerance. Producers receive acknowledgments upon durable write; if not received in time, they retry. Messages are persisted on disk with replication, ensuring survival of broker failures.
Consumers track processed messages and can recover after outages by resuming from prior positions. Kafka enables replay: consumers may re-read retained messages to rebuild state or reprocess data intentionally. Client-side configurations balance throughput and durability, with best practices discussed in later chapters.
Acknowledgments: Once the cluster accepts a message, it sends an acknowledgment to the service. If no acknowledgment arrives within the timeout, the service treats the send as failed and retries.
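The producer-side settings behind this behavior can be sketched as follows; the configuration keys come from the standard Java client, while the broker address and timeout value are placeholders.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ReliableProducerConfig {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Wait until the leader and all in-sync replicas have persisted the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry automatically; with idempotence enabled, retries cannot create duplicates.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        // Total time budget for a send, including retries, before it is reported as failed.
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");

        return new KafkaProducer<>(props);
    }
}
```

On the consumer side, offset commits play the complementary role: a consumer advances its committed position only after it has processed a batch, so it can resume from that position after an outage.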

1.4.3 The commit log
Kafka organizes data as an append-only commit log, preserving the exact arrival order of messages for durability and traceability—similar to database write-ahead logs. “Commit” can mean safe storage confirmation for producers or recorded progress for consumers.
Events are immutable; incorrect data cannot be altered or deleted individually. Corrections are issued as new events, and consumers can replay the log to reconstruct current state, benefiting both recovery and intentional reprocessing within defined retention periods.
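A possible sketch of intentional replay with the Java consumer, assuming a single-partition customer-address-changed topic: the consumer rewinds to the beginning of the retained log and folds the events into a map of the latest address per customer.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class AddressLogReplay {
    public static Map<String, String> rebuildCurrentAddresses() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed broker address
        props.put("group.id", "address-replay");                     // hypothetical group ID
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        Map<String, String> latestAddressByCustomer = new HashMap<>();
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("customer-address-changed", 0);
            consumer.assign(List.of(partition));
            consumer.seekToBeginning(List.of(partition));             // replay from the start of the log

            long end = consumer.endOffsets(List.of(partition)).get(partition);
            while (consumer.position(partition) < end) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Later events overwrite earlier ones, reconstructing the current state.
                    latestAddressByCustomer.put(record.key(), record.value());
                }
            }
        }
        return latestAddressByCustomer;
    }
}
```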
1.5 Designing and managing data flows
This section reframes Kafka’s role in microservice architectures from simple messaging to data-first integration. Events become explicit data contracts that must be modeled, validated, versioned, and governed. It introduces a trio of supporting components—Schema Registry, Kafka Connect, and stream processing frameworks—that help define message structure, replicate data at scale, and transform streams where appropriate.
Key architectural questions addressed:
- What structure and guarantees do messages provide?
- How do producers and consumers evolve safely as schemas change?
- Can Kafka replicate data between systems reliably and at scale?
- Where should transformations live: producer, consumer, or processing layer?
1.5.1 Schema Registry: handling data contracts
Kafka brokers treat messages as opaque bytes for performance and flexibility, so structure isn’t enforced by Kafka itself. Schema Registry externalizes data contracts:
- Serves as a central source of truth for message schemas.
- Producers register schemas and embed a schema ID in each message.
- Consumers fetch the schema by ID to deserialize accurately.
- Schemas are immutable; changes create new versions with compatibility checks.
Working with Schema Registry: Schemas are managed by a separate Schema Registry cluster; messages carry only a schema ID, which clients use to fetch (and cache) the writer schema.

This model enables teams to treat message definitions as durable contracts and evolve them without breaking consumers, provided compatibility rules are observed.
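For illustration, a producer using Confluent's Avro serializer might look like the sketch below. The schema, topic, broker, and registry URL are assumptions, and the io.confluent.kafka.serializers.KafkaAvroSerializer class comes from the Confluent platform libraries rather than Apache Kafka itself.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class AvroAddressProducer {
    private static final String ADDRESS_SCHEMA = """
            {
              "type": "record",
              "name": "CustomerAddressChanged",
              "fields": [
                {"name": "customerId", "type": "string"},
                {"name": "city",       "type": "string"}
              ]
            }""";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");                  // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // The Avro serializer registers the schema and embeds its ID in every message.
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");         // assumed registry address

        Schema schema = new Schema.Parser().parse(ADDRESS_SCHEMA);
        GenericRecord event = new GenericData.Record(schema);
        event.put("customerId", "customer-42");
        event.put("city", "Berlin");

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("customer-address-changed", "customer-42", event));
        }
    }
}
```

Consumers use the matching Avro deserializer, which resolves the embedded schema ID against the registry and caches the result.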
1.5.2 Kafka Connect: data replication without code
Instead of writing bespoke producers and consumers to sync data across services, Kafka Connect offers configuration-driven pipelines:
- Runs as a separate cluster with pluggable source and sink connectors.
- Moves data between Kafka and external systems (databases, warehouses, storage) at scale.
- Example: stream address updates from a CustomerService database into Kafka via a JDBC source connector; deliver them to OrderService’s database via a sink connector.
Kafka Connect architecture: connectors integrate Kafka with external systems—moving data in and out.

This reduces custom code and standardizes operational patterns for ingestion and delivery.
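A hypothetical JDBC source connector configuration for the address example above might look like this; all names, credentials, and column choices are placeholders, and the connector class assumes the Confluent JDBC connector plugin is installed.

```json
{
  "name": "customer-addresses-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://customer-db:5432/customers",
    "connection.user": "connect",
    "connection.password": "********",
    "table.whitelist": "addresses",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "topic.prefix": "customerservice."
  }
}
```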
1.5.3 Data transformation: streaming frameworks
Different consumers need differently shaped data. While Kafka Connect can do simple, stateless changes, richer logic belongs in a processing layer. Architectural options:
- Producer-side branching (more upstream complexity).
- Consumer-side filtering (wasteful at scale).
- Dedicated processing services using stream frameworks.
Frameworks like Kafka Streams and Apache Flink enable content-based routing, filtering, joins, aggregations, windowing, and stateful processing with strong delivery guarantees.
An example streaming application: RoutingService implements content-based routing. It consumes messages from Addresses and, based on their contents (e.g., address type), publishes them to ShippingAddresses or BillingAddresses.
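A minimal Kafka Streams sketch of such a RoutingService, assuming String-serialized JSON values that carry a type field; the predicate logic, topic names, and broker address are illustrative assumptions.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class RoutingService {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "routing-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> addresses = builder.stream("Addresses");

        // Content-based routing: inspect the payload and forward to the matching topic.
        addresses.filter((key, value) -> value.contains("\"type\":\"shipping\""))
                 .to("ShippingAddresses");
        addresses.filter((key, value) -> value.contains("\"type\":\"billing\""))
                 .to("BillingAddresses");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```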

The net result: Kafka becomes not just a transport for events, but the backbone for well-governed, evolvable data flows—where structure is explicit, movement is automated, and transformations are applied in the right place.
1.6 Impacting operations and infrastructure
Architects must design solutions that meet functional needs while remaining operable, supportable, and maintainable over time. Effective monitoring, preventive maintenance, and cost-aware operational practices are essential to avoid failures and ensure long-term sustainability. A key strategic choice is where to run Kafka—on-premise or in the cloud—balancing cost, scalability, performance, security, and compliance against project objectives.
1.6.1 Kafka tuning and maintenance
Even when day-to-day operations sit with DevOps, architects need to understand how Kafka fits into enterprise infrastructure and its implications for deployment, security, and data protection. Early requirement gathering directly impacts cost and feasibility.
- Sizing and SLAs: Determine hardware needs for brokers and clients, define key metrics, and shape service level agreements.
- Testing strategy: Plan functional and performance testing and prepare for disaster recovery.
- Observability and troubleshooting: Decoupled systems increase debugging complexity; ensure robust monitoring, tracing, and tooling to locate data loss or bottlenecks.
- Evolution and scalability: Anticipate growth and change; be ready to extend or restructure clusters as requirements evolve.
1.6.2 On-premise and cloud options
On-premise deployments remain common, offering maximum control at the cost of operational responsibility and infrastructure overhead.
- On-premise: Full control over configuration and operations; requires investment in hardware, setup, physical security, monitoring, maintenance, and skilled personnel.
- Managed cloud service: Simplifies provisioning and administration with provider-backed SLAs, but imposes constraints:
  - Kafka versions and upgrade timing are governed by the provider.
  - Limited availability of some ecosystem components (e.g., commercially licensed tools such as Schema Registry), depending on the provider.
  - Restricted low-level broker tuning; baseline performance is the provider’s responsibility.
  - Constrained choice of cluster management tools.
1.6.3 Solutions from other cloud providers
Competing messaging systems—such as Apache Pulsar, Amazon Kinesis, Google Pub/Sub, and Azure Event Hubs—offer different trade-offs and deployment models. Some support the Kafka protocol, enhancing interoperability. Choosing between on-premise, managed, or hybrid approaches influences enterprise capabilities and costs; the book further examines these options to guide selection.
1.7 Applying Kafka in enterprise
Kafka is not a one-size-fits-all solution. Architects should evaluate its fit per project. This section highlights two primary enterprise uses—reliable message delivery and long-lived state storage—and outlines what makes Kafka distinct.
1.7.1 Using Kafka for sending messages
- In event-driven systems, services publish events that signal internal state changes.
- Kafka persists events and replicates them across brokers for durability and availability.
- Retention ensures that subscribers which fail can catch up and reprocess events after recovery; once processed, events are typically no longer relevant to consumers.
1.7.2 Using Kafka for storing data
- With infinite retention, Kafka can act as an immutable log for event sourcing.
- Example: CustomerService emits address-change events forming a change log; consumers rebuild current state by replaying from the start and may materialize it in memory or a local database.
- Supports real-time processing, such as joining an orders stream with an address change-log topic for enrichment (sketched after this list).
- Kafka is not a full database replacement: ad hoc queries (e.g., geospatial lookups) are inefficient because clients must read, deserialize, and compute logic outside the broker.
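As an illustration of the enrichment case, a Kafka Streams sketch might materialize the change log as a table and join it with the orders stream. The topic names and broker address are assumptions, and both topics are presumed to be keyed by customer ID (a co-partitioning requirement for this kind of join).

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class OrderEnrichment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // The address change log, materialized as a table of the latest address per customer.
        KTable<String, String> addresses = builder.table("customer-address-changed");
        // The live stream of orders, keyed by customer ID.
        KStream<String, String> orders = builder.stream("orders");

        // Enrich each order with the customer's current address and publish the result.
        orders.join(addresses, (order, address) -> order + " | shipTo=" + address)
              .to("orders-enriched");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```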
1.7.3 How Kafka is different
- Designed as a distributed commit log: an immutable sequence enabling high throughput over large data volumes.
- Disk persistence and inter-broker replication provide durability and fault tolerance.
- Historical retention supports event sourcing and replay.
- Part of a growing ecosystem with connectors and stream processing frameworks for real-time integration and transformations.
1.8 Field notes: Getting started with a Kafka project
This field-note conversation distills the initial planning needed to kick off a Kafka initiative for a Customer 360 use case while staying lean and budget-conscious.
- Objective: Start small with clear use cases and a pragmatic, cost-aware plan.
- Use cases: Prioritize the key customer events to capture and process for a holistic customer view.
- Data sources: Inventory event producers (e.g., transactions, customer interactions, social mentions) to understand variety and volume.
- Event modeling: Define robust, scalable event schemas to ensure compatibility across services and over time.
- Performance and scalability: Plan topics, partitions, replication factors, and cluster sizing to meet current and future throughput/availability needs.
- Cluster management and deployment: Establish monitoring, scaling, and maintenance practices; decide between self-managed clusters and managed Kafka services.
- Security: Enforce encryption, authentication, and authorization to protect customer data end to end.
- Integration: Ensure smooth data flow between Kafka and existing databases, applications, and analytics tools.
- Cost estimation:
  - Infrastructure: Hardware, storage, and network sized to projected volumes; include potential third-party or managed-service costs.
  - Development and operations: Building producers/consumers, plus ongoing monitoring, troubleshooting, and upgrades.
  - Licensing: Align required features with appropriate Kafka licensing options.
Action items:
- Define and rank use cases and data sources.
- Draft initial event schemas.
- Estimate performance needs and cluster sizing.
- Choose management model (self-managed vs managed service) and monitoring approach.
- Design security controls.
- Map integrations with existing systems.
- Compile infrastructure, development, operations, and licensing cost estimates.
1.9 Online resources
This section curates key references to help architects deepen their Kafka expertise, spanning official documentation, platform-specific materials, events, books, and articles on event-driven architecture and microservices.
- Apache Kafka Documentation: Official guides and API references for configuration, development, and operations.
- Confluent Platform Documentation: Ecosystem guidance covering platform components such as connectors, ksqlDB, and Schema Registry.
- The Data Streaming Event: Community and industry event offering trends, real-world case studies, and best practices in data streaming.
- Kafka: The Definitive Guide: Comprehensive book on Kafka’s architecture, core concepts, and operational know-how.
- Kafka in Action: Practical patterns and hands-on examples for building Kafka-based applications.
- Microservices, Apache Kafka, and Domain-Driven Design: Guidance on aligning event streams with bounded contexts and microservice design.
- Building Event-Driven Microservices: Strategies and patterns for designing and implementing event-driven systems.
- What is Event-Driven Architecture?: Introductory article explaining event-first thinking and its impact on system design.
Together, these resources provide a balanced path from foundational understanding to practical implementation of Kafka-centric, event-driven architectures.