Kafka Streams in Action
William P. Bejeck Jr.
  • MEAP began March 2017
  • Publication in Early 2018 (estimated)
  • ISBN 9781617294471
  • 350 pages (estimated)
  • printed in black & white

Kafka Streams in Action teaches you how to implement stream processing on data flowing into your Kafka platform, so you can get more from your data without sacrificing time or effort. Starting with a brief overview of Kafka and stream processing, you'll discover everything you need to develop with Kafka Streams, from its APIs to creating your first app. Using real-world examples based on the most common uses of distributed processing, you'll learn to transform and collect data, work with multiple processors, aggregate data, and more. The book also covers the all-important testing and integration techniques, so you'll never have to sacrifice functionality. By the end of the book, you'll be ready to use Kafka Streams in your own projects and to draw insight from your data quickly and easily.

"This is an informative and inspiring book on Kafka Streams."

~ Pethuru Raj

"The book presents great insights on the problem and explains the concepts clearly."

~ Michele Adduci

"It is a well written book, suitable for anyone who wants to write stream processing applications."

~ Laszlo Hegedus

Table of Contents

Part 1: Getting Started with Kafka Streams

1. Welcome to Kafka Streams

1.1. The "Big Data" movement and how it changed the programming landscape

1.1.1. The Genesis of Big Data

1.1.2. How to Implement PageRank

1.1.3. Important Concepts from Map Reduce

1.1.4. Distributing data across a cluster to achieve scale in processing

1.1.5. Using key-value pairs and partitions to group distributed data together

1.1.6. Embracing failure by using replication

1.1.7. Batch Processing is Not Enough

1.2. Introducing Stream Processing

1.2.1. When to use Stream Processing and when not to use it

1.3. Handling a Purchase Transaction

1.3.1. Weighing the Stream Processing Option

1.3.2. The Business Requirements for ZMart's Purchase Transaction Tracking Program

1.3.3. Deconstructing the Requirements into a Graph

1.4. Changing the Perspective of a Purchase Transaction

1.4.1. Source Node

1.4.2. Credit Card Masking Node

1.4.3. Patterns Node

1.4.4. Rewards Node

1.4.5. Storage Node

1.5. Applying Kafka Streams to the Purchase Processing Graph

1.6. Kafka Streams as a Graph of Processing Nodes

1.7. Applying Kafka Streams to the Purchase Transaction Flow

1.7.1. Defining the Source

1.7.2. The First Processor - Masking Credit Card Numbers

1.7.3. The Second Processor - Purchase Patterns

1.7.4. The Third Processor - Customer Rewards

1.7.5. Fourth Processor - Writing Purchase Records

1.8. Summary

2. Kafka Quickly

2.1. Who Should Read This Chapter

2.2. The Data Problem

2.3. Using Kafka to Handle Data

2.3.1. ZMart's Original Data Platform

2.3.2. Sales Transaction Data Hub

2.3.3. Enter Kafka

2.4. Kafka Architecture

2.4.1. Kafka Is A Message Broker

2.4.2. Kafka Is A Log

2.4.3. How logs work in Kafka

2.4.4. Kafka and Partitions

2.4.5. Partitions Group Data by Key

2.4.6. Writing a Custom Partitioner

2.4.7. Specifying a Custom Partitioner

2.4.8. Determining the Correct Number of Partitions

2.4.9. The Distributed Log

2.4.10. Zookeeper - Leaders, Followers, and Replication

2.4.11. Apache Zookeeper

2.4.12. Electing A Leader

2.4.13. Controller Responsibilities

2.4.14. Replication

2.4.15. Log Management

2.4.16. Deleting Logs

2.4.17. Compacted Logs

2.5. Sending Messages with Producers

2.5.1. Producer Properties

2.5.2. Specifying Partitions and Timestamps

2.5.3. Specifying a Partition

2.5.4. Setting a Timestamp

2.6. Reading Messages with Consumers

2.6.1. Managing Offsets

2.6.2. Automatic Offset Commits

2.6.3. Manual Offset Commits

2.6.4. Creating the Consumer

2.6.5. Consumers and Partitions

2.6.6. Rebalancing

2.6.7. Finer Grained Consumer Assignment

2.6.8. Consumer Example

2.7. Installing and Running Kafka

2.7.1. Kafka Local Configuration

2.7.2. Running Kafka

2.7.3. Sending Your First Message

2.8. Conclusion

Part 2: Kafka Streams Development

3. Developing Kafka Streams

3.1. KStreams API

3.2. Kafka Streams Hello World

3.2.1. Configuration

3.2.2. Serde Creation

3.2.3. Creating the Topology

3.3. Working with Customer Data

3.3.1. Building The Application

3.3.2. Creating a Custom Serde

3.3.3. Constructing the Processors

3.4. Interactive Development

3.5. Next Steps

3.5.1. New Requirements

3.5.2. Writing Records Outside of Kafka

3.6. Summary

4. Streams and State

4.1. Thinking of Events

4.1.1. Streams Need State

4.2. Applying Stateful Operations to Kafka Streams

4.2.1. Transform Values Processor

4.2.2. Stateful Customer Rewards

4.2.3. Initialize the Value Transformer

4.2.4. Map the Purchase object to a RewardsAccumulator using state

4.2.5. Using a StreamPartitioner

4.2.6. Updating The Rewards Processor

4.3. Using State Stores for Lookups and Seen Data

4.3.1. Data Locality

4.3.2. Failure Recovery/Fault Tolerance

4.3.3. Using State Stores in Kafka Streams

4.3.4. Key and Value Options

4.3.5. InMemoryKeyValueStoreSupplier

4.3.6. InMemoryLRUCacheStoreSupplier

4.3.7. Persistent Stores

4.3.8. RocksDBKeyValueStoreSupplier

4.3.9. RocksDBWindowStoreSupplier

4.3.10. StateStore Fault Tolerance

4.4. Joining Streams for Added Insight

4.4.1. Data Setup

4.4.2. Generating Keys of Customer ID To Perform Joins

4.4.3. Keys and Partitions

4.4.4. Null Keys

4.4.5. Keys of Different Types

4.4.6. Repartitioning Streams

4.4.7. Constructing the Join

4.4.8. Other Join Options

4.5. How Time and Timestamps Drive Kafka Streams

4.5.1. Timestamps In Kafka

4.5.2. Timestamp Usage

4.5.3. Timestamps in Kafka Streams

4.5.4. ConsumerRecordTimestampExtractor

4.5.5. WallclockTimestampExtractor

4.5.6. Custom TimestampExtractor

4.5.7. Specifying a TimestampExtractor

4.6. Conclusion

5. The KTable API

5.1. The Relationship Between Streams and Tables

5.1.1. The Record Stream

5.1.2. Updates To Records or The Change Log

5.1.3. Event Streams vs Update Streams

5.2. Record Updates and KTable Configuration

5.2.1. Setting Cache Buffering Size

5.2.2. Setting the Commit Interval

5.3. Aggregations and Windowing Operations

5.3.1. Aggregating Share Volume by Industry

5.3.2. Windowing Operations

5.3.3. Joining KStreams and KTables

5.3.4. GlobalKTables

5.3.5. Queryable State

5.4. Summary

6. When You Need More Control - Working with the Processor API

Part 3: Administering Kafka Streams

7. Configuration and Performance

8. Testing

Part 4: Advanced Concepts with Kafka Streams

9. Advanced Applications with Kafka Streams


Appendix A: Resources

Appendix B: Web Application for Viewing All Data

About the Technology

Kafka Streams is a library designed to allow for easy stream processing of data flowing into your Kafka cluster. Stream processing has become one of the biggest needs for companies over the last few years as quick data insight becomes more and more important, but current solutions can be large and complex, requiring additional tools to perform lookups and aggregations. Kafka Streams addresses these issues: it is small and lightweight enough that it doesn't need a dedicated cluster to run, yet powerful enough to be highly fault tolerant, scalable, and easy to use. Kafka Streams is 100% compatible with Kafka. Because it runs as part of your application and doesn't need a separate cluster of machines to deploy, it's easy to integrate into current apps, allowing you to bring in these benefits without issue. Its lightweight design doesn't sacrifice any of the power or capabilities you need, which makes Kafka Streams a great alternative for real-time data processing.

What's inside

  • Developing with Kafka Streams
  • Using the KStreams API
  • Filtering, transforming, and splitting data into multiple streams
  • Working with the KTable API
  • Working with the Processor API
  • Testing and debugging
  • Integration with external systems

About the reader

This book is suitable for all Java (or JVM language) developers looking to discover the world of stream processing and its many benefits. Knowledge of Kafka is not required, but would be a bonus.

About the author

Bill Bejeck is a Kafka Streams contributor with over 13 years of software development experience. Having spent 6 years working exclusively on the back end with large data volumes, Bill currently uses Kafka to improve data flow to downstream customers.

Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
MEAP combo $44.99 pBook + eBook
MEAP eBook $35.99 pdf + ePub + kindle
