Kafka Streams in Action
Real-time apps and microservices with the Kafka Streams API
William P. Bejeck Jr.
Foreword by Neha Narkhede
  • August 2018
  • ISBN 9781617294471
  • 280 pages
  • printed in black & white

A great way to learn about Kafka Streams and how it is a key enabler of event-driven applications.

From the Foreword by Neha Narkhede, Cocreator of Apache Kafka

Kafka Streams in Action teaches you everything you need to know to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort.

About the Technology

Not all stream-based applications require a dedicated processing cluster. The lightweight Kafka Streams library provides exactly the power and simplicity you need for message handling in microservices and real-time event processing. With the Kafka Streams API, you filter and transform data streams with just Kafka and your application.
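To give a flavor of what "just Kafka and your application" looks like, here is a minimal sketch of a Kafka Streams topology that filters and transforms a stream using the DSL. The topic names ("src-topic", "out-topic") are hypothetical, and the sketch only builds and describes the topology; a real application would configure a `KafkaStreams` instance and call `start()`.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Read a stream of records from a (hypothetical) source topic
        KStream<String, String> src = builder.stream("src-topic");

        // Filter out null values, transform the rest, and write to a sink topic
        src.filter((key, value) -> value != null)
           .mapValues(value -> value.toUpperCase())
           .to("out-topic");

        // Building the topology requires no running broker
        Topology topology = builder.build();
        System.out.println(topology.describe());
    }
}
```

No separate processing cluster is involved: the topology above runs inside an ordinary JVM application, which is the lightweight deployment model the paragraph describes.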

About the book

Kafka Streams in Action teaches you to implement stream processing within the Kafka platform. In this easy-to-follow book, you’ll explore real-world examples to collect, transform, and aggregate data, work with multiple processors, and handle real-time events. You’ll even dive into streaming SQL with KSQL! Practical to the very end, it finishes with testing and operational aspects, such as monitoring and debugging.

Table of Contents



about this book

Who should read this book

How this book is organized: a roadmap

About the code

Book forum

Other online resources

about the author

about the cover illustration

Part 1: Getting Started with Kafka Streams

1 Welcome to Kafka Streams

1.1 The big data movement, and how it changed the programming landscape

1.1.1 The genesis of big data

1.1.2 Important concepts from MapReduce

1.1.3 Batch processing is not enough

1.2 Introducing stream processing

1.2.1 When to use stream processing, and when not to use it

1.3 Handling a purchase transaction

1.3.1 Weighing the stream-processing option

1.3.2 Deconstructing the requirements into a graph

1.4 Changing perspective on a purchase transaction

1.4.1 Source node

1.4.2 Credit card masking node

1.4.3 Patterns node

1.4.4 Rewards node

1.4.5 Storage node

1.5 Kafka Streams as a graph of processing nodes

1.6 Applying Kafka Streams to the purchase transaction flow

1.6.1 Defining the source

1.6.2 The first processor: masking credit card numbers

1.6.3 The second processor: purchase patterns

1.6.4 The third processor: customer rewards

1.6.5 The fourth processor: writing purchase records


2 Kafka quickly

2.1 The data problem

2.2 Using Kafka to handle data

2.2.1 ZMart’s original data platform

2.2.2 A Kafka sales transaction data hub

2.3 Kafka architecture

2.3.1 Kafka is a message broker

2.3.2 Kafka is a log

2.3.3 How logs work in Kafka

2.3.4 Kafka and partitions

2.3.5 Partitions group data by key

2.3.6 Writing a custom partitioner

2.3.7 Specifying a custom partitioner

2.3.8 Determining the correct number of partitions

2.3.9 The distributed log

2.3.10 ZooKeeper: leaders, followers, and replication

2.3.11 Apache ZooKeeper

2.3.12 Electing a controller

2.3.13 Replication

2.3.14 Controller responsibilities

2.3.15 Log management

2.3.16 Deleting logs

2.3.17 Compacting logs

2.4 Sending messages with producers

2.4.1 Producer properties

2.4.2 Specifying partitions and timestamps

2.4.3 Specifying a partition

2.4.4 Timestamps in Kafka

2.5 Reading messages with consumers

2.5.1 Managing offsets

2.5.2 Automatic offset commits

2.5.3 Manual offset commits

2.5.4 Creating the consumer

2.5.5 Consumers and partitions

2.5.6 Rebalancing

2.5.7 Finer-grained consumer assignment

2.5.8 Consumer example

2.6 Installing and running Kafka

2.6.1 Kafka local configuration

2.6.2 Running Kafka

2.6.3 Sending your first message


Part 2: Kafka Streams Development

3 Developing Kafka Streams

3.1 The Streams Processor API

3.2 Hello World for Kafka Streams

3.2.1 Creating the topology for the Yelling App

3.2.2 Kafka Streams configuration

3.2.3 Serde creation

3.3 Working with customer data

3.3.1 Constructing a topology

3.3.2 Creating a custom Serde

3.4 Interactive development

3.5 Next steps

3.5.1 New requirements

3.5.2 Writing records outside of Kafka


4 Streams and state

4.1 Thinking of events

4.1.1 Streams need state

4.2 Applying stateful operations to Kafka Streams

4.2.1 The transformValues processor

4.2.2 Stateful customer rewards

4.2.3 Initializing the value transformer

4.2.4 Mapping the Purchase object to a RewardAccumulator using state

4.2.5 Updating the rewards processor

4.3 Using state stores for lookups and previously seen data

4.3.1 Data locality

4.3.2 Failure recovery and fault tolerance

4.3.3 Using state stores in Kafka Streams

4.3.4 Additional key/value store suppliers

4.3.5 StateStore fault tolerance

4.3.6 Configuring changelog topics

4.4 Joining streams for added insight

4.4.1 Data setup

4.4.2 Generating keys containing customer IDs to perform joins

4.4.3 Constructing the join

4.4.4 Other join options

4.5 Timestamps in Kafka Streams

4.5.1 Provided TimestampExtractor implementations

4.5.2 WallclockTimestampExtractor

4.5.3 Custom TimestampExtractor

4.5.4 Specifying a TimestampExtractor


5 The KTable API

5.1 The relationship between streams and tables

5.1.1 The record stream

5.1.2 Updates to records or the changelog

5.1.3 Event streams vs. update streams

5.2 Record updates and KTable configuration

5.2.1 Setting cache buffering size

5.2.2 Setting the commit interval

5.3 Aggregations and windowing operations

5.3.1 Aggregating share volume by industry

5.3.2 Windowing operations

5.3.3 Joining KStreams and KTables

5.3.4 GlobalKTables

5.3.5 Queryable state


6 The Processor API

6.1 The trade-offs of higher-level abstractions vs. more control

6.2 Working with sources, processors, and sinks to create a topology

6.2.1 Adding a source node

6.2.2 Adding a processor node

6.2.3 Adding a sink node

6.3 Digging deeper into the Processor API with a stock analysis processor

6.3.1 The stock-performance processor application

6.3.2 The process() method

6.3.3 The punctuator execution

6.4 The co-group processor

6.4.1 Building the co-grouping processor

6.5 Integrating the Processor API and the Kafka Streams API


Part 3: Administering Kafka Streams

7 Monitoring and performance

7.1 Basic Kafka monitoring

7.1.1 Measuring consumer and producer performance

7.1.2 Checking for consumer lag

7.1.3 Intercepting the producer and consumer

7.2 Application metrics

7.2.1 Metrics configuration

7.2.2 How to hook into the collected metrics

7.2.3 Using JMX

7.2.4 Viewing metrics

7.3 More Kafka Streams debugging techniques

7.3.1 Viewing a representation of the application

7.3.2 Getting notification on various states of the application

7.3.3 Using the StateListener

7.3.4 State restore listener

7.3.5 Uncaught exception handler


8 Testing a Kafka Streams application

8.1 Testing a topology

8.1.1 Building the test

8.1.2 Testing a state store in the topology

8.1.3 Testing processors and transformers

8.2 Integration testing

8.2.1 Building an integration test


Part 4: Advanced Concepts with Kafka Streams

9 Advanced applications with Kafka Streams

9.1 Integrating Kafka with other data sources

9.1.1 Using Kafka Connect to integrate data

9.1.2 Setting up Kafka Connect

9.1.3 Transforming data

9.2 Kicking your database to the curb

9.2.1 How interactive queries work

9.2.2 Distributing state stores

9.2.3 Setting up and discovering a distributed state store

9.2.4 Coding interactive queries

9.2.5 Inside the query server

9.3 KSQL

9.3.1 KSQL streams and tables

9.3.2 KSQL architecture

9.3.3 Installing and running KSQL

9.3.4 Creating a KSQL stream

9.3.5 Writing a KSQL query

9.3.6 Creating a KSQL table

9.3.7 Configuring KSQL



Appendix A: Additional configuration information

A.1 Limiting the number of rebalances on startup

A.2 Resilience to broker outages

A.3 Handling deserialization errors

A.4 Scaling up your application

A.5 RocksDB configuration

A.6 Creating repartitioning topics ahead of time

A.7 Configuring internal topics

A.8 Resetting your Kafka Streams application

A.9 Cleaning up local state

Appendix B: Exactly once semantics


What's inside

  • Using the KStreams API
  • Filtering, transforming, and splitting data
  • Working with the Processor API
  • Integrating with external systems

About the reader

Assumes some experience with distributed systems. No knowledge of Kafka or streaming applications required.

About the author

Bill Bejeck is a Kafka Streams contributor and Confluent engineer with over 15 years of software development experience.
