Kafka Streams in Action
Real-time apps and microservices with the Kafka Streams API
William P. Bejeck Jr.
  • MEAP began March 2017
  • Publication in August 2018 (estimated)
  • ISBN 9781617294471
  • 350 pages (estimated)
  • printed in black & white

This is an informative and inspiring book on Kafka Streams.

Pethuru Raj

Kafka Streams in Action teaches you everything you need to know to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort. Starting with a brief overview of Kafka and stream processing, you'll discover everything you need to develop with Kafka Streams, from its APIs to creating your first app. Using real-world examples based on the most common uses of distributed processing, you'll learn to transform and collect data, work with multiple processors, aggregate data, and more. The book also covers the all-important testing and integration techniques that ensure you never have to sacrifice functionality. By the end of the book, you'll be ready to use Kafka Streams in your own projects to quickly and easily reap the benefits of the insight your data holds.

Table of Contents

Part 1: Getting Started with Kafka Streams

1 Welcome to Kafka Streams

1.1 The big data movement, and how it changed the programming landscape

1.1.1 The genesis of big data

1.1.2 Important concepts from MapReduce

1.1.3 Batch processing is not enough

1.2 Introducing stream processing

1.2.1 When to use stream processing, and when not to use it

1.3 Handling a purchase transaction

1.3.1 Weighing the stream-processing option

1.3.2 Deconstructing the requirements into a graph

1.4 Changing perspective on a purchase transaction

1.4.1 Source node

1.4.2 Credit card masking node

1.4.3 Patterns node

1.4.4 Rewards node

1.4.5 Storage node

1.5 Kafka Streams as a graph of processing nodes

1.6 Applying Kafka Streams to the purchase transaction flow

1.6.1 Defining the source

1.6.2 The first processor: masking credit card numbers

1.6.3 The second processor: purchase patterns

1.6.4 The third processor: customer rewards

1.6.5 The fourth processor: writing purchase records

1.7 Summary

2 Kafka quickly

2.1 The data problem

2.2 Using Kafka to handle data

2.2.1 ZMart’s original data platform

2.2.2 A Kafka sales transaction data hub

2.3 Kafka architecture

2.3.1 Kafka is a message broker

2.3.2 Kafka is a log

2.3.3 How logs work in Kafka

2.3.4 Kafka and partitions

2.3.5 Partitions group data by key

2.3.6 Writing a custom partitioner

2.3.7 Specifying a custom partitioner

2.3.8 Determining the correct number of partitions

2.3.9 The distributed log

2.3.10 ZooKeeper: leaders, followers, and replication

2.3.11 Apache ZooKeeper

2.3.12 Electing a controller

2.3.13 Replication

2.3.14 Controller responsibilities

2.3.15 Log management

2.3.16 Deleting logs

2.3.17 Compacting logs

2.4 Sending messages with producers

2.4.1 Producer properties

2.4.2 Specifying partitions and timestamps

2.4.3 Specifying a partition

2.4.4 Timestamps in Kafka

2.5 Reading messages with consumers

2.5.1 Managing offsets

2.5.2 Automatic offset commits

2.5.3 Manual offset commits

2.5.4 Creating the consumer

2.5.5 Consumers and partitions

2.5.6 Rebalancing

2.5.7 Finer-grained consumer assignment

2.5.8 Consumer example

2.6 Installing and running Kafka

2.6.1 Kafka local configuration

2.6.2 Running Kafka

2.6.3 Sending your first message

2.7 Summary

Part 2: Kafka Streams Development

3 Developing Kafka Streams

3.1 The Streams Processor API

3.2 Hello World for Kafka Streams

3.2.1 Creating the topology for the Yelling App

3.2.2 Kafka Streams configuration

3.2.3 Serde creation

3.3 Working with customer data

3.3.1 Constructing a topology

3.3.2 Creating a custom Serde

3.4 Interactive development

3.5 Next steps

3.5.1 New requirements

3.5.2 Writing records outside of Kafka

3.6 Summary

4 Streams and state

4.1 Thinking of events

4.1.1 Streams need state

4.2 Applying stateful operations to Kafka Streams

4.2.1 The transformValues processor

4.2.2 Stateful customer rewards

4.2.3 Initializing the value transformer

4.2.4 Mapping the Purchase object to a RewardAccumulator using state

4.2.5 Updating the rewards processor

4.3 Using state stores for lookups and previously seen data

4.3.1 Data locality

4.3.2 Failure recovery and fault tolerance

4.3.3 Using state stores in Kafka Streams

4.3.4 Additional key/value store suppliers

4.3.5 StateStore fault tolerance

4.3.6 Configuring changelog topics

4.4 Joining streams for added insight

4.4.1 Data setup

4.4.2 Generating keys containing customer IDs to perform joins

4.4.3 Constructing the join

4.4.4 Other join options

4.5 Timestamps in Kafka Streams

4.5.1 Provided TimestampExtractor implementations

4.5.2 WallclockTimestampExtractor

4.5.3 Custom TimestampExtractor

4.5.4 Specifying a TimestampExtractor

4.6 Summary

5 The KTable API

5.1 The relationship between streams and tables

5.1.1 The record stream

5.1.2 Updates to records or the changelog

5.1.3 Event streams vs. update streams

5.2 Record updates and KTable configuration

5.2.1 Setting cache buffering size

5.2.2 Setting the commit interval

5.3 Aggregations and windowing operations

5.3.1 Aggregating share volume by industry

5.3.2 Windowing operations

5.3.3 Joining KStreams and KTables

5.3.4 GlobalKTables

5.3.5 Queryable state

5.4 Summary

6 The Processor API

6.1 The trade-offs of higher-level abstractions vs. more control

6.2 Working with sources, processors, and sinks to create a topology

6.2.1 Adding a source node

6.2.2 Adding a processor node

6.2.3 Adding a sink node

6.3 Digging deeper into the Processor API with a stock analysis processor

6.3.1 The stock-performance processor application

6.3.2 The process() method

6.3.3 The punctuator execution

6.4 The co-group processor

6.4.1 Building the co-grouping processor

6.5 Integrating the Processor API and the Kafka Streams API

6.6 Summary

Part 3: Administering Kafka Streams

7 Monitoring and performance

7.1 Basic Kafka monitoring

7.1.1 Measuring consumer and producer performance

7.1.2 Checking for consumer lag

7.1.3 Intercepting the producer and consumer

7.2 Application metrics

7.2.1 Metrics configuration

7.2.2 How to hook into the collected metrics

7.2.3 Using JMX

7.2.4 Viewing metrics

7.3 More Kafka Streams debugging techniques

7.3.1 Viewing a representation of the application

7.3.2 Getting notification on various states of the application

7.3.3 Using the StateListener

7.3.4 State restore listener

7.3.5 Uncaught exception handler

7.4 Summary

8 Testing a Kafka Streams application

8.1 Testing a topology

8.1.1 Building the test

8.1.2 Testing a state store in the topology

8.1.3 Testing processors and transformers

8.2 Integration testing

8.2.1 Building an integration test

8.3 Summary

Part 4: Advanced Concepts with Kafka Streams

9 Advanced applications with Kafka Streams

9.1 Integrating Kafka with other data sources

9.1.1 Using Kafka Connect to integrate data

9.1.2 Setting up Kafka Connect

9.1.3 Transforming data

9.2 Kicking your database to the curb

9.2.1 How interactive queries work

9.2.2 Distributing state stores

9.2.3 Setting up and discovering a distributed state store

9.2.4 Coding interactive queries

9.2.5 Inside the query server

9.3 KSQL

9.3.1 KSQL streams and tables

9.3.2 KSQL architecture

9.3.3 Installing and running KSQL

9.3.4 Creating a KSQL stream

9.3.5 Writing a KSQL query

9.3.6 Creating a KSQL table

9.3.7 Configuring KSQL

9.4 Summary

Appendixes

Appendix A: Additional configuration information

A.1 Limiting the number of rebalances on startup

A.2 Resilience to broker outages

A.3 Handling deserialization errors

A.4 Scaling up your application

A.5 RocksDB configuration

A.6 Creating repartitioning topics ahead of time

A.7 Configuring internal topics

A.8 Resetting your Kafka Streams application

A.9 Cleaning up local state

Appendix B: Exactly once semantics

About the Technology

Kafka Streams is a library designed for easy stream processing of data flowing into your Kafka cluster. Stream processing has become one of the biggest needs for companies over the last few years as quick data insight grows more and more important, but existing solutions can be complex and heavyweight, requiring additional tools to perform lookups and aggregations. Kafka Streams addresses these issues: it's small and lightweight enough that it doesn't need a dedicated cluster to run, yet powerful enough to be highly fault tolerant, scalable, and easy to use. Because Kafka Streams is 100% compatible with Kafka and runs as part of your application rather than on a separate cluster of machines, it's easy to integrate into existing apps. With a lightweight design that doesn't sacrifice any of the power or capabilities you need, Kafka Streams is a great choice for real-time data processing.
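To give a sense of how little ceremony an embedded Kafka Streams application needs, here is a minimal sketch in the spirit of the "Yelling App" the book builds in chapter 3: a single Java class that uppercases incoming text and forwards it to another topic. The topic names, application id, and broker address are illustrative placeholders, not values taken from the book.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class YellingApp {
    public static void main(String[] args) {
        // Basic configuration; the application id and broker address are placeholders
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "yelling-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build a trivial topology: read text, uppercase it, write it back out
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("src-topic");
        source.mapValues(v -> v.toUpperCase()).to("out-topic");

        // Start the streams engine inside this ordinary Java process
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Because the whole topology runs inside an ordinary main() method, deploying it is just a matter of running the JAR alongside your existing services, and scaling out means starting more instances with the same application id.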

What's inside

  • Developing with Kafka Streams
  • Using the KStream API
  • Filtering, transforming, and splitting data into multiple streams
  • Working with the KTable API
  • Working with the Processor API
  • Testing and debugging
  • Integration with external systems

About the reader

This book is suitable for all Java (or JVM language) developers looking to discover the world of stream processing and its many benefits. Knowledge of Kafka is not required, but would be a bonus.

About the author

Bill Bejeck is a Kafka Streams contributor with over 13 years of software development experience. Having spent the last 6 years working exclusively on the back end with large data volumes, Bill currently uses Kafka to improve data flow to downstream customers.


Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.

The book presents great insights on the problem and explains the concepts clearly.

Michele Adduci

It is a well written book, suitable for anyone who wants to write stream processing applications.

Laszlo Hegedus