Kafka in Action
In systems that handle big data, streaming data, or fast data, it's important to get your data pipelines right. Apache Kafka is a wicked-fast distributed streaming platform that operates as more than just a persistent log or a flexible message queue. With Kafka, you can build the powerful real-time data processing pipelines required by modern distributed systems. Kafka in Action is a fast-paced introduction to every aspect of working with Kafka you need to really reap its benefits.
Right from the beginning, it explains how everything works!
Table of Contents
Part 1: Getting Started
1. Introduction to Kafka
1.1. What is Kafka?
1.2. Why the need for Kafka?
1.2.1. Why Kafka for the Developer
1.2.2. Explaining Kafka to your manager
1.3. Kafka Myths
1.3.1. Kafka only works with Hadoop
1.3.2. Kafka is the same as other message brokers
1.4. Real World Use Cases
1.4.2. Website Activity Tracking
1.4.3. Log Aggregation
1.4.4. Stream Processing
1.4.5. Internet of Things
1.4.6. When Kafka might not be the right fit
1.5. Online resources to get started
2. Getting to know Kafka
2.1. Kafka’s Hello World: Sending and retrieving our first message
2.2. A quick tour of Kafka
2.2.1. The what and why of ZooKeeper
2.2.2. Kafka’s high-level architecture
2.2.3. The Commit Log
2.3. Various source code packages and what they do
2.3.1. Kafka Stream Package
2.3.2. Connect Package
2.3.3. AdminClient Package
2.4. What sort of clients can I use for my own language of choice?
2.5. Terminology of Kafka
2.5.1. What is a streaming process
2.5.2. What exactly once means in our context
Part 2: Applying Kafka
3. Designing a Kafka project
3.1. Designing a Kafka project
3.1.1. Taking over an existing data architecture
3.1.2. Kafka Connect
3.1.3. Connect Features
3.1.4. When to use Connect vs Kafka Clients
3.2. Sensor Event Design
3.2.1. Existing issues
3.2.2. Why Kafka is a correct fit
3.2.3. Thought starters on our design
3.2.4. User data requirements
3.2.5. High-Level Plan for applying our questions
3.2.6. Reviewing our blueprint
3.3. Data Format
3.3.1. Why Schemas
3.3.2. Why Avro
4. Sourcing Data
5. Unlocking Data
7. Topics and Partitions
8. Kafka Storage
Part 3: Going Further
10. Protecting Kafka
11. Schema Registry
12. Kafka in the Wild (and getting involved)
Appendix A: The Kafka Codebase
Appendix B: Installation
B.1. Which Operating System to use
B.2. Installing Prerequisite: Java
B.3. Installing Prerequisite: ZooKeeper
B.4. Installing Kafka
B.5. Confluent CLI
About the Technology
Apache Kafka is a distributed streaming platform for logging and streaming data between services or applications. With Kafka, it's easy to build applications that can act on or react to data streams as they flow through your system. Operational data monitoring, large-scale message processing, website activity tracking, log aggregation, and more are all possible with Kafka. Open source, easily scalable, durable when demand gets heavy, and fast, Kafka is perfect for developers who need total control of the data flowing into and through their applications. The demand for Kafka developers is at an all-time high, as companies like LinkedIn, The New York Times, and Netflix rely on Kafka where fast data is essential.
About the book
Kafka in Action is a practical, hands-on guide to building Kafka-based data pipelines. Filled with real-world scenarios, this book probes Kafka's most common use cases, ranging from simple logging through managing streaming data systems for message routing, analytics, and more. Starting with an overview of Kafka's core concepts, you'll immediately learn how to set up and execute basic data movement tasks and how to record and consume streaming data. As you move through the examples in this book, you'll learn the skills you need to work in a Kafka-focused team, with the ability to handle both developer and admin tasks. At the end of this book, you'll be more than ready to dig into even more advanced Kafka topics on your own, and happily able to use Kafka in your day-to-day workflow.
- Understanding Kafka's concepts
- Implementing Kafka as a message queue
- Setting up and executing basic ETL tasks
- Recording and consuming streaming data
- Working with Kafka producers and consumers from Java applications
- Using Kafka as part of a large data project team
- Performing Kafka developer and admin tasks
About the reader
Written for intermediate Java developers or data engineers. No prior knowledge of Kafka is required.
Manning Early Access Program (MEAP)
Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.
A good high level overview.
Great for beginners.