Kafka in Action
Dylan Scott
  • MEAP began November 2017
  • Publication in Spring 2020 (estimated)
  • ISBN 9781617295232
  • 375 pages (estimated)
  • printed in black & white

"Lays a great foundation for learning to operate and manage Kafka in production settings."

William Rudenmalm
In systems that handle big data, streaming data, or fast data, it's important to get your data pipelines right. Apache Kafka is a wicked-fast distributed streaming platform that operates as more than just a persistent log or a flexible message queue. With Kafka, you can build the powerful real-time data processing pipelines required by modern distributed systems. Kafka in Action is a fast-paced introduction to every aspect of working with Kafka that you need to really reap its benefits.
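The "persistent log" idea mentioned above is worth making concrete: records are appended in order, each record gets a sequential offset, and readers track their own position, so many consumers can read the same data independently. The following is a minimal in-memory sketch of that concept in Java; it is purely illustrative and is not Kafka's actual implementation (class and method names here are invented for the example).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of an append-only commit log: writes append and
// return an offset, reads never mutate the log, and any number of
// readers can read the same offsets independently.
class CommitLog {
    private final List<String> records = new ArrayList<>();

    // Append a record and return the offset it was stored at.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Read the record at a given offset; reads do not consume data.
    String read(long offset) {
        return records.get((int) offset);
    }

    // One past the last written offset, i.e. where the next append lands.
    long endOffset() {
        return records.size();
    }
}

public class CommitLogDemo {
    public static void main(String[] args) {
        CommitLog log = new CommitLog();
        long first = log.append("hello");
        log.append("kafka");

        // Two independent "consumers" could each start at offset 0
        // and read the same records without interfering.
        System.out.println(log.read(first));               // hello
        System.out.println(log.read(log.endOffset() - 1)); // kafka
    }
}
```

This is what distinguishes the log model from a classic queue, where a message is typically removed once one consumer receives it.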
Table of Contents

Part 1: Getting Started

1. Introduction to Kafka

1.1. What is Kafka?

1.2. Why the need for Kafka?

1.2.1. Why Kafka for the Developer

1.2.2. Explaining Kafka to your manager

1.3. Kafka Myths

1.3.1. Kafka only works with Hadoop

1.3.2. Kafka is the same as other message brokers

1.4. Real World Use Cases

1.4.1. Messaging

1.4.2. Website Activity Tracking

1.4.3. Log Aggregation

1.4.4. Stream Processing

1.4.5. Internet of Things

1.4.6. When Kafka might not be the right fit

1.5. Online resources to get started

1.6. Summary

2. Getting to know Kafka

2.1. Kafka’s Hello World: Sending and retrieving our first message

2.2. A quick tour of Kafka

2.2.1. The what and why of ZooKeeper

2.2.2. Kafka’s high-level architecture

2.2.3. The Commit Log

2.3. Various source code packages and what they do

2.3.1. Kafka Stream Package

2.3.2. Connect Package

2.3.3. AdminClient Package

2.3.4. KSQL

2.4. What sort of clients can I use for my own language of choice?

2.5. Terminology of Kafka

2.5.1. What is a streaming process

2.5.2. What exactly once means in our context

2.6. Summary

Part 2: Applying Kafka

3. Designing a Kafka project

3.1. Designing a Kafka project

3.1.1. Taking over an existing data architecture

3.1.2. Kafka Connect

3.1.3. Connect Features

3.1.4. When to use Connect vs Kafka Clients

3.2. Sensor Event Design

3.2.1. Existing issues

3.2.2. Why Kafka is a correct fit

3.2.3. Thought starters on our design

3.2.4. User data requirements

3.2.5. High-Level Plan for applying our questions

3.2.6. Reviewing our blueprint

3.3. Data Format

3.3.1. Why Schemas

3.3.2. Why Avro

3.4. Summary

4. Producers: Sourcing Data

4.1. Introducing the Producer

4.1.1. Key Producer Write Path

4.2. Important Configuration

4.2.1. Producer Configuration

4.2.2. Configuring the Broker list

4.2.3. How to go Fast (or Safer)

4.2.4. Timestamps

4.2.5. Adding compression to our messages

4.2.6. Custom Serializer

4.2.7. Creating custom partition code

4.2.8. Producer Interceptor

4.3. Generating data for our requirements

4.3.1. Client and Broker Versions

4.4. Summary

5. Consumers: Unlocking Data

5.1. Introducing the Consumer

5.2. Important Configuration

5.2.1. Understanding Tracking Offsets

5.3. Consumer Groups

5.4. The Need for Offsets

5.4.1. GroupCoordinator

5.4.2. ConsumerRebalanceListener

5.4.3. Partition Assignment Strategy

5.4.4. Standalone Consumer

5.4.5. Manual Partition Assignment

5.5. Auto or Manual Commit of Offsets

5.6. Reading From a Compacted Topic

5.7. Reading for a Specific Offset

5.7.1. Start at the beginning

5.7.2. Going to the end

5.7.3. Seek to an Offset

5.7.4. Offsets For Times

5.8. Reading Concerns

5.8.1. Broker use of Consumers

5.8.2. Summary

6. Brokers

6.1. Introducing the Broker

6.2. Why Kafka needs ZooKeeper

6.3. What does it mean to be a message broker

6.4. Configuration at the Broker Level

6.4.1. Kafka’s Core: The Log

6.4.2. Application Logs

6.5. What Controllers are for

6.6. Leaders and their role

6.6.1. Inter-Broker Communications

6.6.2. The Role of Replicas

6.7. In-Sync Replicas (ISR) Defined

6.8. Unclean Leader Election

6.9. Seeing Metrics from Kafka

6.9.1. Cluster Maintenance

6.9.2. Adding a Broker

6.9.3. Upgrading your Cluster

6.9.4. Upgrading your clients

6.9.5. Backups

6.10. A Note on Stateful Systems

6.11. Exercise

6.12. Summary

7. Topics and Partitions

7.1. Topics

7.1.1. Topic Creation Options

7.1.2. Removing a Topic

7.1.3. Replication Factors

7.2. Partitions

7.2.1. Partition Location

7.2.2. Viewing Segments

7.3. More Topic and Partition Maintenance

7.3.1. Replica Assignment Changes

7.3.2. Altering the Number of Replicas

7.3.3. Preferred Replica Elections

7.3.4. Editing ZooKeeper Directly

7.4. Topic Compaction

7.4.1. Compaction Cleaning

7.4.2. Can Compaction Cause 'Deletes'

7.5. Summary

8. Kafka Storage

8.1. How Long to Store Data

8.2. Data Pipelines

8.2.1. Keeping the original event

8.2.2. Moving away from a batch mindset

8.3. Tools

8.3.1. Apache Flume

8.3.2. Debezium

8.3.3. Secor

8.4. Bringing data back into Kafka

8.5. Architectures with Kafka

8.5.1. Lambda Architecture

8.5.2. Kappa Architecture

8.6. Multicluster setups

8.6.1. Scaling by adding Clusters

8.6.2. Hub and Spoke Integration

8.6.3. Active-Active

8.6.4. Active-Passive

8.7. Cloud and Container Based Storage Options

8.7.1. Amazon Elastic Block Store

8.7.2. Kubernetes Clusters

8.8. Summary

9. Administration

Part 3: Going Further

10. Protecting Kafka

11. Schema Registry

12. Kafka in the Wild (and getting involved)


Appendix A: The Kafka Codebase

Appendix B: Installation

B.1. Which Operating System to use

B.2. Installing Prerequisite: Java

B.3. Installing Prerequisite: ZooKeeper

B.4. Installing Kafka

B.5. Confluent CLI

About the Technology

Apache Kafka is a distributed streaming platform for logging and streaming data between services or applications. With Kafka, it's easy to build applications that can act on or react to data streams as they flow through your system. Operational data monitoring, large-scale message processing, website activity tracking, log aggregation, and more are all possible with Kafka. Open source, easily scalable, durable when demand gets heavy, and fast: Kafka is perfect for developers who need total control of the data flowing into and through their applications. The demand for Kafka developers is at an all-time high, as companies like LinkedIn, The New York Times, and Netflix rely on Kafka where fast data is essential.

About the book

Kafka in Action is a practical, hands-on guide to building Kafka-based data pipelines. Filled with real-world use cases and scenarios, this book probes Kafka's most common use cases, ranging from simple logging through managing streaming data systems for message routing, analytics, and more. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to record and consume streaming data. As you move through the examples, you'll build the skills you need to work on a Kafka-focused team, handling both developer and admin tasks. By the end of this book, you'll be ready to dig into even more advanced Kafka topics on your own, and comfortable using Kafka in your day-to-day workflow.
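Chapters such as "Producers: Sourcing Data" cover configuration topics like broker lists, acknowledgments, compression, and serializers. In the official Java client these are supplied as key/value properties; the keys below are standard client settings, but the values (including the broker hostnames) are illustrative placeholders, not recommendations.

```properties
# Where the producer first connects; hostnames here are placeholders.
bootstrap.servers=broker1:9092,broker2:9092

# Speed vs. safety trade-off: acks=all waits for in-sync replicas to confirm.
acks=all

# Optional compression applied to message batches.
compression.type=gzip

# How keys and values are turned into bytes on the wire.
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=org.apache.kafka.common.serialization.StringSerializer
```

Consumers are configured the same way, with their own set of keys (group id, offset handling, deserializers), which the consumer chapters walk through.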

What's inside

  • Understanding Kafka's concepts
  • Implementing Kafka as a message queue
  • Setting up and executing basic ETL tasks
  • Recording and consuming streaming data
  • Working with Kafka producers and consumers from Java applications
  • Using Kafka as part of a large data project team
  • Performing Kafka developer and admin tasks

About the reader

Written for intermediate Java developers or data engineers. No prior knowledge of Kafka is required.

About the author

Dylan Scott is a software developer with over ten years of experience in Java and Perl. His experience includes implementing Kafka as a messaging system for a large data migration, and he uses Kafka in his work in the insurance industry.

Manning Early Access Program (MEAP)

Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.
MEAP combo $44.99 pBook + eBook + liveBook
MEAP eBook $35.99 pdf + ePub + kindle + liveBook