Kafka Streams in Action
Real-time apps and microservices with the Kafka Streaming API
William P. Bejeck Jr.
  • MEAP began March 2017
  • Publication in June 2018 (estimated)
  • ISBN 9781617294471
  • 350 pages (estimated)
  • printed in black & white

This is an informative and inspiring book on Kafka Streams.

Pethuru Raj

Kafka Streams in Action teaches you how to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort. Starting with a brief overview of Kafka and stream processing, you'll discover everything you need to know to develop with Kafka Streams, from its APIs to creating your first app. Using real-world examples based on the most common uses of distributed processing, you'll learn to transform and collect data, work with multiple processors, aggregate data, and more. This book also teaches you the all-important testing and integration techniques to ensure you'll never have to sacrifice functionality. By the end of the book, you'll be ready to use Kafka Streams in your projects to reap the benefits of the insight your data holds quickly and easily.

Table of Contents

Part 1: Getting Started with Kafka Streams

1. Welcome to Kafka Streams

1.1. The "Big Data" movement and how it changed the programming landscape

1.1.1. The Genesis of Big Data

1.1.2. How to Implement PageRank

1.1.3. Important Concepts from Map Reduce

1.1.4. Distributing data across a cluster to achieve scale in processing

1.1.5. Using key-value pairs and partitions to group distributed data together

1.1.6. Embracing failure by using replication

1.1.7. Batch Processing is Not Enough

1.2. Introducing Stream Processing

1.2.1. When to use Stream Processing and when not to use it

1.3. Handling a Purchase Transaction

1.3.1. Weighing the Stream Processing Option

1.3.2. The Business Requirements for ZMart's Purchase Transaction Tracking Program

1.3.3. Deconstructing the Requirements into a Graph

1.4. Changing the Perspective of a Purchase Transaction

1.4.1. Source Node

1.4.2. Credit Card Masking Node

1.4.3. Patterns Node

1.4.4. Rewards Node

1.4.5. Storage Node

1.5. Applying Kafka Streams to the Purchase Processing Graph

1.6. Kafka Streams as a Graph of Processing Nodes

1.7. Applying Kafka Streams to the Purchase Transaction Flow

1.7.1. Defining the Source

1.7.2. The First Processor - Masking Credit Card Numbers

1.7.3. The Second Processor - Purchase Patterns

1.7.4. The Third Processor - Customer Rewards

1.7.5. Fourth Processor - Writing Purchase Records

1.8. Summary

2. Kafka Quickly

2.1. Who Should Read This Chapter

2.2. The Data Problem

2.3. Using Kafka to Handle Data

2.3.1. ZMart's Original Data Platform

2.3.2. Sales Transaction Data Hub

2.3.3. Enter Kafka

2.4. Kafka Architecture

2.4.1. Kafka Is A Message Broker

2.4.2. Kafka Is A Log

2.4.3. How logs work in Kafka

2.4.4. Kafka and Partitions

2.4.5. Partitions Group Data by Key

2.4.6. Writing a Custom Partitioner

2.4.7. Specifying a Custom Partitioner

2.4.8. Determining the Correct Number of Partitions

2.4.9. The Distributed Log

2.4.10. Zookeeper - Leaders, Followers, and Replication

2.4.11. Apache Zookeeper

2.4.12. Electing A Leader

2.4.13. Controller Responsibilities

2.4.14. Replication

2.4.15. Log Management

2.4.16. Deleting Logs

2.4.17. Compacted Logs

2.5. Sending Messages with Producers

2.5.1. Producer Properties

2.5.2. Specifying Partitions and Timestamps

2.5.3. Specifying a Partition

2.5.4. Setting a Timestamp

2.6. Reading Messages with Consumers

2.6.1. Managing Offsets

2.6.2. Automatic Offset Commits

2.6.3. Manual Offset Commits

2.6.4. Creating the Consumer

2.6.5. Consumers and Partitions

2.6.6. Rebalancing

2.6.7. Finer Grained Consumer Assignment

2.6.8. Consumer Example

2.7. Installing and Running Kafka

2.7.1. Kafka Local Configuration

2.7.2. Running Kafka

2.7.3. Sending Your First Message

2.8. Conclusion

Part 2: Kafka Streams Development

3. Developing Kafka Streams

3.1. KStreams API

3.2. Kafka Stream Hello World

3.2.1. Configuration

3.2.2. Serde Creation

3.2.3. Creating the Topology

3.3. Working with Customer Data

3.3.1. Building The Application

3.3.2. Creating a Custom Serde

3.3.3. Constructing the Processors

3.4. Interactive Development

3.5. Next Steps

3.5.1. New Requirements

3.5.2. Writing Records Outside of Kafka

3.6. Summary

4. Streams and State

4.1. Thinking of Events

4.1.1. Streams Need State

4.2. Applying Stateful Operations to Kafka Streams

4.2.1. Transform Values Processor

4.2.2. Stateful Customer Rewards

4.2.3. Initialize the Value Transformer

4.2.4. Map the Purchase object to a RewardsAccumulator using state

4.2.5. Updating The Rewards Processor

4.3. Using State Stores for Lookups and Seen Data

4.3.1. Data Locality

4.3.2. Failure Recovery/Fault Tolerance

4.3.3. Using State Stores in Kafka Streams

4.3.4. Key and Value Options

4.3.5. InMemoryKeyValueStoreSupplier

4.3.6. InMemoryLRUCacheStoreSupplier

4.3.7. Persistent Stores

4.3.8. RocksDBKeyValueStoreSupplier

4.3.9. RocksDBWindowStoreSupplier

4.3.10. StateStore Fault Tolerance

4.4. Joining Streams for Added Insight

4.4.1. Data Setup

4.4.2. Generating Keys of Customer ID To Perform Joins

4.4.3. Constructing the Join

4.4.4. Other Join Options

4.5. Timestamps In Kafka Streams

4.5.1. ConsumerRecordTimestampExtractor

4.5.2. WallclockTimestampExtractor

4.5.3. Custom TimestampExtractor

4.5.4. Specifying a TimestampExtractor

4.6. Conclusion

5. The KTable API

5.1. The Relationship Between Streams and Tables

5.1.1. The Record Stream

5.1.2. Updates To Records or The Change Log

5.1.3. Event Streams vs Update Streams

5.2. Record Updates and KTable Configuration

5.2.1. Setting Cache Buffering Size

5.2.2. Setting the Commit Interval

5.3. Aggregations and Windowing Operations

5.3.1. Aggregating Share Volume by Industry

5.3.2. Windowing Operations

5.3.3. Joining KStreams and KTables

5.3.4. GlobalKTables

5.3.5. Queryable State

5.4. Summary

6. The Processor API

6.1. The Tradeoffs of Higher Abstractions vs. More Control

6.2. Working with Sources, Processors, and Sinks to create a Topology

6.2.1. Adding a Source Node

6.2.2. Adding a Processor Node

6.2.3. Adding a Sink Node

6.3. Digging Deeper into the Processor API with a Stock Analysis Processor

6.3.1. Building A Custom Processor

6.3.2. The Punctuate Method

6.3.3. The Close Method

6.3.4. Putting it all together, the completed Stock Analysis Application

6.4. The CoGroup Processor

6.4.1. Building the CoGroup Processor

6.5. Integrating the Processor API and the KStream API

6.5.1. Integrating Processor API into KStream API

6.6. Conclusion

Part 3: Administering Kafka Streams

7. Monitoring and Performance

7.1. Basic Kafka Monitoring

7.1.1. Measuring Consumer and Producer performance

7.1.2. Checking For Consumer Lag

7.1.3. Intercepting the Producer and Consumer

7.2. Application Metrics

7.2.1. Metrics Configuration

7.2.2. How to Hook Into the Collected Metrics

7.2.3. Using JMX

7.2.4. Viewing Metrics

7.3. More Kafka Streams Debugging Techniques

7.3.1. Viewing a representation of the application

7.3.2. Getting notification on various states of the application

7.3.3. Using the State Listener

7.3.4. State Restore listener

7.3.5. Uncaught Exception Handler

7.4. Conclusion

8. Testing a Kafka Streaming Application

8.1. Testing a Topology

8.1.1. Building the Test

8.1.2. Testing a State Store in the Topology

8.1.3. Testing Processors and Transformers

8.2. Integration Testing

8.2.1. Building an Integration Test

8.3. Conclusion

Part 4: Advanced Concepts with Kafka Streams

9. Advanced Applications with Kafka Streams

9.1. Integrating with Other Data Sources

9.1.1. Using Kafka Connect to Integrate Data

9.1.2. Setting Up Kafka Connect

9.1.3. Transforming Data

9.2. Kicking Your Database to the Curb

9.2.1. How Interactive Queries Work

9.2.2. State Store Distribution

9.2.3. Setting Up and Discovering Distributed State Store

9.2.4. Code For Interactive Queries

9.2.5. Inside the Query Server

9.3. KSQL

9.3.1. KSQL Streams and Tables

9.3.2. KSQL Architecture

9.3.3. Installing and Running KSQL

9.3.4. Creating a KSQL Stream

9.3.5. Writing a KSQL Query

9.3.6. Creating A KSQL Table

9.3.7. Configuring KSQL

9.4. Conclusion


Appendix A: Additional Configuration Information

A.1. Limiting the Number of Rebalances on Startup

A.2. Resilience To Broker Outages

A.3. Handling Deserialization Errors

A.4. Scaling Up Your Application

A.5. RocksDB Configuration

A.6. Maybe Create Repartitioning Topics Ahead of Time

A.7. Internal Topic Configuration

A.8. Resetting Your Streams Application

A.9. Cleaning Up Local State

Appendix B: Exactly Once Semantics

About the Technology

Kafka Streams is a library designed to allow for easy stream processing of data flowing into your Kafka cluster. Stream processing has become one of the biggest needs for companies over the last few years as quick data insight becomes more and more important, but current solutions can be complex and large, requiring additional tools to perform lookups and aggregations. Kafka Streams addresses these issues: it is small and lightweight enough that it doesn't need a dedicated cluster to run, yet powerful enough to be highly fault tolerant, scalable, and easy to use. Kafka Streams is 100% compatible with Kafka. Because it runs inside your application and doesn't need a separate cluster of machines to deploy, it's easy to integrate into current apps. Its lightweight design does not sacrifice the power or capabilities you need, which makes Kafka Streams a great alternative for real-time data processing.

What's inside

  • Developing with Kafka Streams
  • Using the KStreams API
  • Filtering, transforming, and splitting data into multiple streams
  • Working with the KTable API
  • Working with the Processor API
  • Testing and debugging
  • Integration with external systems

About the reader

This book is suitable for all Java (or JVM language) developers looking to discover the world of stream processing and its many benefits. Knowledge of Kafka is not required, but would be a bonus.

About the author

Bill Bejeck is a Kafka Streams contributor with over 13 years of software development experience. With 6 years working exclusively on the back-end and large data volumes, Bill currently uses Kafka to improve data flow to downstream customers.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.

MEAP combo $44.99 pBook + eBook + liveBook

MEAP eBook $35.99 pdf + ePub + kindle + liveBook

FREE domestic shipping on three or more pBooks

The book presents great insights on the problem and explains the concepts clearly.

Michele Adduci

It is a well written book, suitable for anyone who wants to write stream processing applications.

Laszlo Hegedus