Kafka Streams in Action
Real-time apps and microservices with the Kafka Streams API
William P. Bejeck Jr.
Foreword by Neha Narkhede
  • August 2018
  • ISBN 9781617294471
  • 280 pages
  • printed in black & white

A great way to learn about Kafka Streams and how it is a key enabler of event-driven applications.

From the Foreword by Neha Narkhede, Cocreator of Apache Kafka

Kafka Streams in Action teaches you everything you need to know to implement stream processing on data flowing into your Kafka platform, allowing you to focus on getting more from your data without sacrificing time or effort.

Table of Contents

preface

acknowledgments

about this book

Who should read this book

How this book is organized: a roadmap

About the code

Book forum

Other online resources

about the author

about the cover illustration

Part 1: Getting Started with Kafka Streams

1 Welcome to Kafka Streams

1.1 The big data movement, and how it changed the programming landscape

1.1.1 The genesis of big data

1.1.2 Important concepts from MapReduce

1.1.3 Batch processing is not enough

1.2 Introducing stream processing

1.2.1 When to use stream processing, and when not to use it

1.3 Handling a purchase transaction

1.3.1 Weighing the stream-processing option

1.3.2 Deconstructing the requirements into a graph

1.4 Changing perspective on a purchase transaction

1.4.1 Source node

1.4.2 Credit card masking node

1.4.3 Patterns node

1.4.4 Rewards node

1.4.5 Storage node

1.5 Kafka Streams as a graph of processing nodes

1.6 Applying Kafka Streams to the purchase transaction flow

1.6.1 Defining the source

1.6.2 The first processor: masking credit card numbers

1.6.3 The second processor: purchase patterns

1.6.4 The third processor: customer rewards

1.6.5 The fourth processor: writing purchase records

Summary

2 Kafka quickly

2.1 The data problem

2.2 Using Kafka to handle data

2.2.1 ZMart’s original data platform

2.2.2 A Kafka sales transaction data hub

2.3 Kafka architecture

2.3.1 Kafka is a message broker

2.3.2 Kafka is a log

2.3.3 How logs work in Kafka

2.3.4 Kafka and partitions

2.3.5 Partitions group data by key

2.3.6 Writing a custom partitioner

2.3.7 Specifying a custom partitioner

2.3.8 Determining the correct number of partitions

2.3.9 The distributed log

2.3.10 ZooKeeper: leaders, followers, and replication

2.3.11 Apache ZooKeeper

2.3.12 Electing a controller

2.3.13 Replication

2.3.14 Controller responsibilities

2.3.15 Log management

2.3.16 Deleting logs

2.3.17 Compacting logs

2.4 Sending messages with producers

2.4.1 Producer properties

2.4.2 Specifying partitions and timestamps

2.4.3 Specifying a partition

2.4.4 Timestamps in Kafka

2.5 Reading messages with consumers

2.5.1 Managing offsets

2.5.2 Automatic offset commits

2.5.3 Manual offset commits

2.5.4 Creating the consumer

2.5.5 Consumers and partitions

2.5.6 Rebalancing

2.5.7 Finer-grained consumer assignment

2.5.8 Consumer example

2.6 Installing and running Kafka

2.6.1 Kafka local configuration

2.6.2 Running Kafka

2.6.3 Sending your first message

Summary

Part 2: Kafka Streams Development

3 Developing Kafka Streams

3.1 The Streams Processor API

3.2 Hello World for Kafka Streams

3.2.1 Creating the topology for the Yelling App

3.2.2 Kafka Streams configuration

3.2.3 Serde creation

3.3 Working with customer data

3.3.1 Constructing a topology

3.3.2 Creating a custom Serde

3.4 Interactive development

3.5 Next steps

3.5.1 New requirements

3.5.2 Writing records outside of Kafka

Summary

4 Streams and state

4.1 Thinking of events

4.1.1 Streams need state

4.2 Applying stateful operations to Kafka Streams

4.2.1 The transformValues processor

4.2.2 Stateful customer rewards

4.2.3 Initializing the value transformer

4.2.4 Mapping the Purchase object to a RewardAccumulator using state

4.2.5 Updating the rewards processor

4.3 Using state stores for lookups and previously seen data

4.3.1 Data locality

4.3.2 Failure recovery and fault tolerance

4.3.3 Using state stores in Kafka Streams

4.3.4 Additional key/value store suppliers

4.3.5 StateStore fault tolerance

4.3.6 Configuring changelog topics

4.4 Joining streams for added insight

4.4.1 Data setup

4.4.2 Generating keys containing customer IDs to perform joins

4.4.3 Constructing the join

4.4.4 Other join options

4.5 Timestamps in Kafka Streams

4.5.1 Provided TimestampExtractor implementations

4.5.2 WallclockTimestampExtractor

4.5.3 Custom TimestampExtractor

4.5.4 Specifying a TimestampExtractor

Summary

5 The KTable API

5.1 The relationship between streams and tables

5.1.1 The record stream

5.1.2 Updates to records or the changelog

5.1.3 Event streams vs. update streams

5.2 Record updates and KTable configuration

5.2.1 Setting cache buffering size

5.2.2 Setting the commit interval

5.3 Aggregations and windowing operations

5.3.1 Aggregating share volume by industry

5.3.2 Windowing operations

5.3.3 Joining KStreams and KTables

5.3.4 GlobalKTables

5.3.5 Queryable state

Summary

6 The Processor API

6.1 The trade-offs of higher-level abstractions vs. more control

6.2 Working with sources, processors, and sinks to create a topology

6.2.1 Adding a source node

6.2.2 Adding a processor node

6.2.3 Adding a sink node

6.3 Digging deeper into the Processor API with a stock analysis processor

6.3.1 The stock-performance processor application

6.3.2 The process() method

6.3.3 The punctuator execution

6.4 The co-group processor

6.4.1 Building the co-grouping processor

6.5 Integrating the Processor API and the Kafka Streams API

Summary

Part 3: Administering Kafka Streams

7 Monitoring and performance

7.1 Basic Kafka monitoring

7.1.1 Measuring consumer and producer performance

7.1.2 Checking for consumer lag

7.1.3 Intercepting the producer and consumer

7.2 Application metrics

7.2.1 Metrics configuration

7.2.2 How to hook into the collected metrics

7.2.3 Using JMX

7.2.4 Viewing metrics

7.3 More Kafka Streams debugging techniques

7.3.1 Viewing a representation of the application

7.3.2 Getting notification on various states of the application

7.3.3 Using the StateListener

7.3.4 State restore listener

7.3.5 Uncaught exception handler

Summary

8 Testing a Kafka Streams application

8.1 Testing a topology

8.1.1 Building the test

8.1.2 Testing a state store in the topology

8.1.3 Testing processors and transformers

8.2 Integration testing

8.2.1 Building an integration test

Summary

Part 4: Advanced Concepts with Kafka Streams

9 Advanced applications with Kafka Streams

9.1 Integrating Kafka with other data sources

9.1.1 Using Kafka Connect to integrate data

9.1.2 Setting up Kafka Connect

9.1.3 Transforming data

9.2 Kicking your database to the curb

9.2.1 How interactive queries work

9.2.2 Distributing state stores

9.2.3 Setting up and discovering a distributed state store

9.2.4 Coding interactive queries

9.2.5 Inside the query server

9.3 KSQL

9.3.1 KSQL streams and tables

9.3.2 KSQL architecture

9.3.3 Installing and running KSQL

9.3.4 Creating a KSQL stream

9.3.5 Writing a KSQL query

9.3.6 Creating a KSQL table

9.3.7 Configuring KSQL

Summary

Appendixes

Appendix A: Additional configuration information

A.1 Limiting the number of rebalances on startup

A.2 Resilience to broker outages

A.3 Handling deserialization errors

A.4 Scaling up your application

A.5 RocksDB configuration

A.6 Creating repartitioning topics ahead of time

A.7 Configuring internal topics

A.8 Resetting your Kafka Streams application

A.9 Cleaning up local state

Appendix B: Exactly-once semantics

index

About the Technology

Not all stream-based applications require a dedicated processing cluster. The lightweight Kafka Streams library provides exactly the power and simplicity you need for message handling in microservices and real-time event processing. With the Kafka Streams API, you filter and transform data streams with just Kafka and your application.
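
To give a flavor of what that looks like, here is a minimal sketch of a Kafka Streams application that filters out empty values and transforms the rest, in the style of the book's "Yelling App" from chapter 3. The topic names, application ID, and broker address are placeholders, not values taken from the book:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterTransformApp {
    public static void main(String[] args) {
        // Basic configuration: every Kafka Streams app needs an
        // application ID and the address of a Kafka broker.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-transform-app"); // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");    // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build the topology: read from a source topic, drop empty
        // records, uppercase the values, and write to a sink topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic"); // placeholder topic
        source.filter((key, value) -> value != null && !value.isEmpty())
              .mapValues(value -> value.toUpperCase())
              .to("output-topic"); // placeholder topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note that the whole pipeline runs inside the application's own JVM: there is no separate processing cluster to deploy or manage.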

About the book

Kafka Streams in Action teaches you to implement stream processing within the Kafka platform. In this easy-to-follow book, you’ll explore real-world examples to collect, transform, and aggregate data, work with multiple processors, and handle real-time events. You’ll even dive into streaming SQL with KSQL! Practical to the very end, it finishes with testing and operational aspects, such as monitoring and debugging.

What's inside

  • Using the Kafka Streams API
  • Filtering, transforming, and splitting data
  • Working with the Processor API
  • Integrating with external systems

About the reader

Assumes some experience with distributed systems. No knowledge of Kafka or streaming applications required.

About the author

Bill Bejeck is a Kafka Streams contributor and Confluent engineer with over 15 years of software development experience.


combo $44.99 pBook + eBook + liveBook
eBook $35.99 pdf + ePub + kindle + liveBook


A comprehensive guide to Kafka Streams—from introduction to production!

Bojan Djurkovic, Cvent

Bridges the gap between message brokering and real-time streaming analytics.

Jim Mantheiy Jr., Next Century

Valuable both as an introduction to streams as well as an ongoing reference.

Robin Coe, TD Bank