Kafka in Action
Dylan Scott
  • MEAP began November 2017
  • Publication in Spring 2021 (estimated)
  • ISBN 9781617295232
  • 375 pages (estimated)
  • printed in black & white

"For someone looking to increase their depth of knowledge with Kafka, this sets the bar."
— Joshua Horwitz
In systems that handle big data, streaming data, or fast data, it's important to get your data pipelines right. Apache Kafka is a wicked-fast distributed streaming platform that operates as more than just a persistent log or a flexible message queue. With Kafka, you can build the powerful real-time data processing pipelines required by modern distributed systems. Kafka in Action is a fast-paced introduction to every aspect of working with Kafka that you need to really reap its benefits.

About the Technology

Apache Kafka is a distributed streaming platform for logging and streaming data between services or applications. With Kafka, it's easy to build applications that can act on or react to data streams as they flow through your system. Operational data monitoring, large-scale message processing, website activity tracking, log aggregation, and more are all possible with Kafka. Open-source, easily scalable, durable, and fast even when demand gets heavy - Kafka is perfect for developers who need total control of the data flowing into and through their applications. The demand for Kafka developers is at an all-time high, as companies like LinkedIn, The New York Times, and Netflix rely on Kafka where fast data is essential.

About the book

Kafka in Action is a practical, hands-on guide to building Kafka-based data pipelines. Filled with real-world use cases and scenarios, this book probes Kafka's most common use cases, ranging from simple logging through managing streaming data systems for message routing, analytics, and more. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to record and consume streaming data. As you move through the examples, you'll learn the skills you need to work in a Kafka-focused team, handling both developer and admin tasks. By the end of this book, you'll be more than ready to dig into even more advanced Kafka topics on your own, and happy to use Kafka in your day-to-day workflow.
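As a taste of the "Hello World" workflow the early chapters walk through, here is a minimal sketch using the console tools that ship with the Kafka distribution. It assumes a broker is already running locally on the default port 9092; the topic name `hello-kafka` is purely illustrative.

```shell
# Create a topic to hold our first messages
# (single partition and replica, suitable only for local experiments)
bin/kafka-topics.sh --create --topic hello-kafka \
  --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1

# Produce: each line typed at the prompt becomes a message on the topic
bin/kafka-console-producer.sh --topic hello-kafka \
  --bootstrap-server localhost:9092

# Consume: in a second terminal, read the topic from the earliest offset
bin/kafka-console-consumer.sh --topic hello-kafka \
  --from-beginning \
  --bootstrap-server localhost:9092
```

The book's examples go well beyond the console tools, covering the Java producer and consumer clients that real applications use.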
Table of Contents

Part 1: Getting Started

1 Introduction to Kafka

1.1 What is Kafka?

1.2 Why the need for Kafka?

1.2.1 Why Kafka for the Developer

1.2.2 Explaining Kafka to your manager

1.3 Kafka Myths

1.3.1 Kafka only works with Hadoop

1.3.2 Kafka is the same as other message brokers

1.4 Real World Use Cases

1.4.1 Messaging

1.4.2 Website Activity Tracking

1.4.3 Log Aggregation

1.4.4 Stream Processing

1.4.5 Internet of Things

1.4.6 When Kafka might not be the right fit

1.5 Online resources to get started

1.6 Summary

2 Getting to know Kafka

2.1 Kafka’s Hello World: Sending and retrieving our first message

2.2 A quick tour of Kafka

2.2.1 The what and why of ZooKeeper

2.2.2 Kafka’s high-level architecture

2.2.3 The Commit Log

2.3 Various source code packages and what they do

2.3.1 Kafka Stream Package

2.3.2 Connect Package

2.3.3 AdminClient Package

2.3.4 KSQL

2.4 What sort of clients can I use for my own language of choice?

2.5 Terminology of Kafka

2.5.1 What is a streaming process

2.5.2 What exactly once means in our context

2.6 Summary

Part 2: Applying Kafka

3 Designing a Kafka project

3.1 Designing a Kafka project

3.1.1 Taking over an existing data architecture

3.1.2 Kafka Connect

3.1.3 Connect Features

3.1.4 When to use Connect vs Kafka Clients

3.2 Sensor Event Design

3.2.1 Existing issues

3.2.2 Why Kafka is a correct fit

3.2.3 Thought starters on our design

3.2.4 User data requirements

3.2.5 High-Level Plan for applying our questions

3.2.6 Reviewing our blueprint

3.3 Data Format

3.3.1 Why Schemas

3.3.2 Why Avro

3.4 Summary

4 Producers: sourcing data

4.1 Introducing the Producer

4.1.1 Key Producer Write Path

4.2 Important Configuration

4.2.1 Producer Configuration

4.2.2 Configuring the Broker list

4.2.3 How to go Fast (or Safer)

4.2.4 Timestamps

4.2.5 Adding compression to our messages

4.2.6 Custom Serializer

4.2.7 Creating custom partition code

4.2.8 Producer Interceptor

4.3 Generating data for our requirements

4.3.1 Client and Broker Versions

4.4 Summary

5 Consumers: unlocking data

5.1 Introducing the Consumer

5.2 Important Configuration

5.2.1 Understanding Tracking Offsets

5.3 Consumer Groups

5.4 The Need for Offsets

5.4.1 GroupCoordinator

5.4.2 ConsumerRebalanceListener

5.4.3 Partition Assignment Strategy

5.4.4 Standalone Consumer

5.4.5 Manual Partition Assignment

5.5 Auto or Manual Commit of Offsets

5.6 Reading From a Compacted Topic

5.7 Reading for a Specific Offset

5.7.1 Start at the beginning

5.7.2 Going to the end

5.7.3 Seek to an Offset

5.7.4 Offsets For Times

5.8 Reading Concerns

5.8.1 Broker use of Consumers

5.9 Summary

6 Brokers

6.1 Introducing the Broker

6.2 Why Kafka needs ZooKeeper

6.3 What does it mean to be a message broker

6.4 Configuration at the Broker Level

6.4.1 Kafka’s Core: The Log

6.4.2 Application Logs

6.5 What Controllers are for

6.6 Leaders and their role

6.6.1 Inter-Broker Communications

6.6.2 The Role of Replicas

6.7 In-Sync Replicas (ISR) Defined

6.8 Unclean Leader Election

6.9 Seeing Metrics from Kafka

6.9.1 Cluster Maintenance

6.9.2 Adding a Broker

6.9.3 Upgrading your Cluster

6.9.4 Upgrading your clients

6.9.5 Backups

6.10 A Note on Stateful Systems

6.11 Exercise

6.12 Summary

7 Topics and partitions

7.1 Topics

7.1.1 Topic Creation Options

7.1.2 Removing a Topic

7.1.3 Replication Factors

7.2 Partitions

7.2.1 Partition Location

7.2.2 Viewing Segments

7.3 More Topic and Partition Maintenance

7.3.1 Replica Assignment Changes

7.3.2 Altering the Number of Replicas

7.3.3 Preferred Replica Elections

7.3.4 Editing ZooKeeper Directly

7.4 Topic Compaction

7.4.1 Compaction Cleaning

7.4.2 Can Compaction Cause 'Deletes'

7.5 Summary

8 Kafka storage

8.1 How Long to Store Data

8.2 Data Pipelines

8.2.1 Keeping the original event

8.2.2 Moving away from a batch mindset

8.3 Tools

8.3.1 Apache Flume

8.3.2 Debezium

8.3.3 Secor

8.4 Bringing data back into Kafka

8.5 Architectures with Kafka

8.5.1 Lambda Architecture

8.5.2 Kappa Architecture

8.6 Multiple Cluster setups

8.6.1 Scaling by adding Clusters

8.6.2 Active-Active

8.6.3 Active-Passive

8.7 Cloud and Container Based Storage Options

8.7.1 Amazon Elastic Block Store

8.7.2 Kubernetes Clusters

8.8 Summary

9 Administration: cluster tools, logging, and monitoring

9.1 Administration clients

9.1.1 Administration in code with AdminClient

9.1.2 Kafkacat

9.1.3 Confluent REST Proxy API

9.2 Running Kafka as a systemd Service

9.3 Logging

9.3.1 Kafka application logs

9.3.2 ZooKeeper logs

9.4 Firewall

9.4.1 Advertised listeners

9.5 Metrics

9.5.1 JMX Console

9.5.2 Important broker metrics

9.5.3 Important producer metrics

9.5.4 Important consumer metrics

9.5.5 Burrow for Consumer Lag Logic

9.6 Tracing messages across a cluster

9.6.1 Producer interceptor

9.6.2 Consumer Interceptor

9.6.3 Overriding Clients

9.7 General monitoring tools

9.7.1 Kafka Manager

9.7.2 Cruise Control

9.7.3 Confluent Control Center

9.7.4 General OS monitoring needs

9.8 Summary

Part 3: Going Further

10 Protecting Kafka

10.1 Security Basics

10.1.1 Encryption with SSL

10.1.2 SSL Between Brokers and Clients

10.1.3 SSL Between Brokers

10.2 Simple Authentication and Security Layer (SASL)

10.2.1 Kerberos

10.2.2 HTTP Basic Auth

10.3 Authorization in Kafka

10.3.1 Access Control Lists

10.3.2 Role-based access control

10.4 ZooKeeper

10.4.1 Kerberos Setup

10.5 Quotas

10.5.1 Network Bandwidth Quota

10.5.2 Request Rate Quotas

10.6 Data at Rest

10.6.1 Managed Options

10.7 Summary

11 Schema registry

11.1 Schema Registry

11.1.1 Installing Schema Registry

11.1.2 Registry Configuration

11.2 Defining a Schema

11.2.1 A New Schema

11.3 Schema Features

11.3.1 REST API

11.3.2 Client library

11.4 Compatibility Rules

11.4.1 Validate Schema Modifications

11.5 Alternative to a Schema Registry

11.5.1 Create new Topics

11.5.2 Confluent Cloud Schema Registry

11.6 Summary

12 A tour of Kafka Streams and KSQL/ksqlDB


Appendix A: Installation

A.1 Which Operating System to use

A.2 Installing Prerequisite: Java

A.3 Installing Prerequisite: ZooKeeper

A.4 Installing Kafka

A.5 Confluent CLI

What's inside

  • Understanding Kafka's concepts
  • Implementing Kafka as a message queue
  • Setting up and executing basic ETL tasks
  • Recording and consuming streaming data
  • Working with Kafka producers and consumers from Java applications
  • Using Kafka as part of a large data project team
  • Performing Kafka developer and admin tasks

About the reader

Written for intermediate Java developers or data engineers. No prior knowledge of Kafka is required.

About the author

Dylan Scott is a software developer with over ten years of experience in Java and Perl. His experience includes implementing Kafka as a messaging system for a large data migration, and he uses Kafka in his work in the insurance industry.
