Overview

1 Getting to know Kafka as an architect

This chapter introduces Kafka as a cornerstone of modern, event-driven architecture, tracing its evolution from a high-throughput message broker to a comprehensive streaming platform for real-time pipelines and analytics. It sets expectations for architects, engineers, and leaders to think beyond getting Kafka to run—focusing instead on why Kafka behaves as it does and how to design systems that harness its strengths. The narrative emphasizes architectural tradeoffs and big-picture choices—event modeling, schema evolution, integration strategies, and the balance among performance, ordering, and fault tolerance—rather than code or client APIs.

Through an architectural lens, the chapter contrasts synchronous request-response styles with event-driven design, highlighting Kafka’s ability to decouple producers and consumers, deliver low-latency fan-out, and improve resilience through asynchronous communication. It surfaces the realities of eventual consistency, idempotency, and ordering, and explains core Kafka principles: publish-subscribe, durable storage with replication, acknowledgments and retries, reprocessing via retention and replay, and the immutable commit log. The ecosystem view covers producers, brokers, and consumers; persistent, replicated logs; and KRaft controllers that coordinate cluster metadata and availability. Together, these capabilities make Kafka well-suited for high-volume, low-latency workloads like fraud detection, recommendations, and predictive maintenance.

The chapter then pivots to data flow design and operations. It shows how Schema Registry formalizes data contracts with versioning and compatibility, Kafka Connect moves data in and out of external systems without custom code, and streaming frameworks such as Kafka Streams or Flink power real-time transformations, joins, and routing with strong processing guarantees. Operational guidance spans deployment choices (on-premises, managed cloud, or hybrid), tuning, monitoring, security, and capacity planning. Finally, it outlines when to use Kafka for reliable event delivery versus long-term retention and event sourcing, clarifies how Kafka differs from databases, and encourages architects to weigh benefits, risks, and costs while building scalable, sustainable, and data-centric systems.

Request-response design pattern
The EDA style of communication: systems communicate by publishing events that describe changes, allowing others to react asynchronously.
The key components in the Kafka ecosystem are producers, brokers, and consumers.
Structure of a Kafka cluster: brokers handle client traffic; KRaft controllers manage metadata and coordination.
Publish-subscribe example: CustomerService publishes a “customer updated” event to a channel; all subscribers receive it independently.
Acknowledgments: Once the cluster accepts a message, it sends an acknowledgment to the service. If no acknowledgment arrives within the timeout, the service treats the send as failed and retries.
Working with Schema Registry: Schemas are managed by a separate Schema Registry cluster; messages carry only a schema ID, which clients use to fetch (and cache) the writer schema.
The Kafka Connect architecture: connectors integrate Kafka with external systems, moving data in and out.
An example of a streaming application. RoutingService implements content-based routing, consuming messages from Addresses and, based on their contents (e.g., address type), publishing them to ShippingAddresses or BillingAddresses.
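The content-based routing in the RoutingService example reduces to a small decision function, shown here independently of any Kafka client. The topic names come from the example; the message shape (a `type` field) is an assumption:

```python
# Sketch of content-based routing: choose a destination topic from message content.
# The "type" field and the error fallback are assumptions; the topic names
# (ShippingAddresses, BillingAddresses) come from the RoutingService example.

def route_address(message: dict) -> str:
    """Return the destination topic for an address message."""
    address_type = message.get("type")
    if address_type == "shipping":
        return "ShippingAddresses"
    if address_type == "billing":
        return "BillingAddresses"
    raise ValueError(f"unknown address type: {address_type!r}")

# In a real streaming application this decision sits inside a
# consume-transform-produce loop; here we only exercise the routing choice.
print(route_address({"type": "shipping", "street": "1 Main St"}))  # ShippingAddresses
print(route_address({"type": "billing", "street": "2 Side St"}))   # BillingAddresses
```

Keeping the routing decision a pure function makes it trivial to test before wiring it into a streams topology.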

Summary

  • There are two primary communication patterns between services: request-response and event-driven architecture.
  • In the event-driven approach, services communicate by publishing events and reacting to them asynchronously.
  • The key components of the Kafka ecosystem include brokers, producers, consumers, Schema Registry, Kafka Connect, and streaming applications.
  • Cluster metadata management is handled by KRaft controllers.
  • Kafka is versatile and well-suited for various industries and use cases, including real-time data processing, log aggregation, and microservices communication.
  • Kafka components can be deployed both on-premises and in the cloud.
  • The platform supports two main use cases: message delivery and state storage.

FAQ

When should I choose event-driven architecture over request-response?
Choose EDA when you need loose coupling, fan-out to many consumers, resilience to cascading failures, and the ability for services to operate independently and asynchronously. Expect trade-offs: you must handle eventual consistency, idempotency, out-of-order delivery, added latency in workflows, and higher operational complexity.
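One of those trade-offs, idempotency, can be sketched as a consumer that remembers which event IDs it has already applied. The event shape (`id`, `amount` fields) and the in-memory set are assumptions; production systems back this with durable storage:

```python
# Sketch of an idempotent consumer: processing the same event twice must not
# change state twice. Event fields ("id", "amount") are assumptions.

class IdempotentHandler:
    def __init__(self):
        self.seen_ids = set()   # in production: a durable store, not memory
        self.balance = 0

    def handle(self, event: dict) -> bool:
        """Apply the event exactly once; return False for duplicates."""
        if event["id"] in self.seen_ids:
            return False        # redelivery after a retry: safely ignored
        self.seen_ids.add(event["id"])
        self.balance += event["amount"]
        return True

handler = IdempotentHandler()
handler.handle({"id": "evt-1", "amount": 50})
handler.handle({"id": "evt-1", "amount": 50})  # duplicate delivery
print(handler.balance)  # 50, not 100
```

Because Kafka's delivery retries can duplicate messages, making consumers idempotent is usually simpler than trying to prevent duplicates end to end.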
What role does Kafka play in an event-driven architecture?
Kafka acts as a durable, low-latency event backbone. Producers publish once to a topic, many consumers subscribe and react independently, and messages are persisted and replicated so consumers can process later or replay history if needed. This decoupling boosts flexibility, scalability, and resilience.
What are the core components of the Kafka ecosystem?
Producers send messages to brokers; consumers read them (pull-based). Brokers persist and replicate data for durability. Controllers (KRaft) manage cluster metadata, broker health, and failover. Kafka supports durable local storage and tiered storage to offload older data to cheaper layers.
How does Kafka ensure reliable delivery and fault tolerance?
Producers receive acknowledgments and retry on failure; brokers replicate messages across the cluster; consumers track progress and can resume after outages; and retained data enables replay. Together these mechanisms minimize loss and support recovery from producer, broker, or consumer failures.
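The acknowledge-and-retry behavior on the producer side can be illustrated with a small simulation. The `send` callable, retry count, and backoff are assumptions; real Kafka producer clients implement this loop internally:

```python
# Sketch of producer-side retry on a missing acknowledgment. The send function,
# attempt limit, and backoff are assumptions; Kafka client libraries handle
# this internally via their retry and timeout settings.

import time

def send_with_retries(send, message, max_attempts=3, backoff_s=0.01):
    """Call send(message); on a timeout, retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(message)        # an acknowledgment on success
        except TimeoutError:
            if attempt == max_attempts:
                raise                   # give up: surface the failure
            time.sleep(backoff_s)       # brief pause before retrying

# Simulated broker that drops the first request, then acknowledges.
attempts = []
def flaky_send(msg):
    attempts.append(msg)
    if len(attempts) == 1:
        raise TimeoutError("no ack within timeout")
    return {"status": "ack"}

print(send_with_retries(flaky_send, {"key": "order-1"}))  # succeeds on attempt 2
```

Note the consequence: a retry after a lost acknowledgment can deliver the same message twice, which is exactly why consumer-side idempotency matters.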
What is Kafka’s commit log and why does it matter?
Kafka appends messages to an immutable, ordered log. This preserves arrival order, supports replay to rebuild state, and underpins event-sourcing patterns. Messages aren’t edited or deleted individually; corrections are new events, and retention defines how long history remains available (which can be indefinite).
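The append-only log and replay-to-rebuild-state idea can be sketched in a few lines. The list standing in for a topic partition and the key/value event shape are assumptions:

```python
# Sketch of an immutable commit log and state rebuilt by replay.
# Corrections are appended as new events; history is never edited.

log = []  # append-only: stand-in for a Kafka topic partition

def append(event):
    log.append(event)  # events are never edited or deleted individually

def replay(events):
    """Rebuild current state by folding over the full history."""
    state = {}
    for e in events:
        state[e["key"]] = e["value"]   # later events supersede earlier ones
    return state

append({"key": "customer-1", "value": {"city": "Berlin"}})
append({"key": "customer-1", "value": {"city": "Munich"}})  # correction = new event
print(replay(log))  # {'customer-1': {'city': 'Munich'}}
```

Any consumer replaying the log from the beginning arrives at the same state, which is the foundation of event sourcing with Kafka.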
Why do I need a Schema Registry with Kafka?
Kafka treats messages as opaque bytes and doesn’t enforce structure. A Schema Registry provides a shared contract: it stores versioned schemas, assigns IDs embedded in messages, and enforces compatibility so producers and consumers evolve safely while keeping Kafka fast and flexible.
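The fetch-and-cache behavior of a registry-aware client can be sketched as follows. The in-memory registry stub and the schema contents are assumptions; a real Schema Registry is queried over HTTP:

```python
# Sketch of the client-side pattern: messages carry only a schema ID, and the
# client fetches (and caches) the writer schema from a registry. The registry
# dict and schema contents are stand-ins for a real Schema Registry service.

REGISTRY = {  # schema ID -> writer schema definition
    1: {"type": "record", "name": "Customer",
        "fields": [{"name": "id", "type": "string"}]},
}

_cache = {}
fetches = []  # tracks how often we hit the "registry"

def schema_for(schema_id):
    if schema_id not in _cache:
        fetches.append(schema_id)      # a network round-trip in a real client
        _cache[schema_id] = REGISTRY[schema_id]
    return _cache[schema_id]

schema_for(1)
schema_for(1)          # served from the local cache: no second fetch
print(len(fetches))    # 1
```

Because only a small integer travels with each message, payloads stay compact while every consumer can still resolve the exact schema the producer wrote with.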
What is Kafka Connect and when should I use it?
Kafka Connect moves data between Kafka and external systems via configurable connectors—no custom code required. Use it to stream from sources (for example, databases) into Kafka and sink from Kafka into targets (for example, data stores). It handles scale and operations; built-in transforms are stateless and simple.
Where should I perform transformations and routing of events?
Options include the producer, each consumer, or a dedicated processing layer. Streaming frameworks (Kafka Streams, Apache Flink) excel for shared, reusable logic such as filtering, joins, aggregations, and content-based routing, then publishing tailored topics for downstream services.
What deployment and operational choices should architects consider?
Decide between on-premises (maximum control, more responsibility and cost) and managed cloud services (simpler operations, provider SLAs, but version and tuning limits and possible ecosystem gaps). Plan for monitoring, security (encryption, authN/Z), sizing, SLAs, testing, and disaster recovery.
When should Kafka be used for messaging versus as a system of record?
Use it as a messaging backbone when events are consumed and then become irrelevant, keeping history for reprocessing and recovery. Use it as a source of truth in event-sourcing by retaining change logs indefinitely and replaying to rebuild state. Kafka is not a general query engine; databases still serve ad hoc and complex query needs.
