Event Streams in Action
Unified log processing with Kafka and Kinesis
Alexander Dean
  • MEAP began July 2014
  • Publication in Fall 2017 (estimated)
  • ISBN 9781617292347
  • 350 pages (estimated)
  • printed in black & white

Event Streams in Action is a foundational book introducing the unified log processing (ULP) paradigm and presenting techniques to use it effectively in data-rich environments. The book begins with an architectural overview, illustrating how ULP addresses the thorny issues associated with processing data from multiple sources, including simultaneous event streams. It then guides you through examples using the unified log technologies Apache Kafka and Amazon Kinesis, along with a variety of stream processing frameworks and analytics databases. You'll learn to aggregate events from multiple sources, store them in a unified log, and build data processing applications on the resulting event streams.
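
To give a flavor of what working with a unified log looks like, here is a minimal sketch (not taken from the book) of appending a single event to Kafka from Java. It assumes a local broker on localhost:9092 and a hypothetical raw-events topic:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventEmitter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumes a local Kafka broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // An example shopper event, serialized as a JSON string by hand for brevity
        String event = "{\"event\":\"SHOPPER_VIEWED_PRODUCT\",\"shopper\":\"123\",\"timestamp\":\"2017-01-01T12:00:00Z\"}";

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Append the event to the hypothetical "raw-events" topic; the key controls partitioning
            producer.send(new ProducerRecord<>("raw-events", "123", event));
        }
    }
}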

As you progress through the book, you'll see how to validate, filter, enrich, and store event streams; master key stream processing approaches; and explore important patterns such as the lambda architecture, stream aggregation, and event re-processing. The book also covers the methods and tools available for event modeling and event analytics, along with scaling, resiliency, and advanced stream patterns.
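
As a rough illustration of the single-event processing the book covers, the following sketch (again not from the book) reads raw events from one Kafka topic, applies a crude validation filter, and forwards the survivors to a downstream topic. The topic names and the validation rule are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ValidatingWorker {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");  // assumes a local Kafka broker
        consumerProps.put("group.id", "validating-worker");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("raw-events"));  // hypothetical input topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String event = record.value();
                    // Crude validation: drop anything that is not a JSON object
                    if (event != null && event.trim().startsWith("{")) {
                        // Forward valid events to a downstream topic for further processing
                        producer.send(new ProducerRecord<>("validated-events", record.key(), event));
                    }
                }
            }
        }
    }
}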

Table of Contents

Part 1 EVENT STREAMS AND UNIFIED LOGS

1. Introducing event streams

1.1. Defining our terms

1.1.1. Events

1.1.2. Continuous event streams

1.2. Some familiar event streams

1.2.1. Application-level logging

1.2.2. Web analytics

1.2.3. Publish/subscribe messaging

1.3. Unifying continuous event streams

1.3.1. The classic era

1.3.2. The hybrid era

1.3.3. The unified era

1.4. Use cases for the unified log

1.4.1. Customer feedback loops

1.4.2. Holistic systems monitoring

1.4.3. Hot swapping data application versions

1.5. Conclusion

2. The unified log

2.1. Anatomy of a unified log

2.1.1. Unified

2.1.2. Append-only

2.1.3. Distributed

2.1.4. Ordered

2.2. Introducing our application

2.2.1. Identifying our key events

2.2.2. Unified log, e-commerce style

2.2.3. Modeling our first event

2.3. Setting up our unified log

2.3.1. Download and install Apache Kafka

2.3.2. Creating our stream

2.3.3. Sending and receiving some events

2.4. Summary

3. Event stream processing

3.1. Event stream processing 101

3.1.1. Why process event streams?

3.1.2. Single event processing

3.1.3. Multiple event processing

3.2. Designing our first stream processing app

3.2.1. Using Kafka as our company's glue

3.2.2. Locking down our requirements

3.3. Writing a simple Kafka worker

3.3.1. Setting up our development environment

3.3.2. Configuring our application

3.3.3. Reading from Kafka

3.3.4. Writing to Kafka

3.3.5. Stitching it all together

3.3.6. Testing

3.4. Writing a single event processor

3.4.1. Writing our event processor

3.4.2. Updating our main function

3.4.3. Testing, redux

3.5. Conclusion

4. Stateful stream processing

4.1. Abandoned carts and state

4.1.1. On shopping cart abandonment

4.1.2. Introducing state

4.1.3. Stream windowing

4.2. Stream processing frameworks

4.2.1. Framework capabilities

4.2.2. A flock of frameworks

4.2.3. Why Apache Samza?

4.3. Designing our job

4.3.1. The basic design

4.3.2. Modeling our derived event

4.3.3. Designing our Samza job

4.3.4. Partitioning our raw stream

4.4. Writing our Samza job

4.4.1. Creating our job’s build file

4.4.2. Configuring our job

4.4.3. Representing our events in Java

4.4.4. Handling our state

4.5. Building and testing our job

4.5.1. Bootstrapping our environment

4.5.2. Building and submitting our job

4.5.3. Testing our job

4.5.4. Improving our job

4.6. Conclusion

Part 2 DATA ENGINEERING WITH STREAMS

5. Schemas

5.1. An introduction to schemas

5.1.1. Introducing Plum

5.1.2. Event schemas as contracts

5.1.3. Capabilities of schema technologies

5.1.4. Some schema technologies

5.1.5. Choosing a schema technology for Plum

5.2. Modeling our event in Avro

5.2.1. Setting up a development harness

5.2.2. Writing our health check event schema

5.2.3. From Avro to Java, and back again

5.2.4. Testing

5.3. Associating events with their schemas

5.3.1. Some modest proposals

5.3.2. A self-describing event for Plum

5.3.3. Plum’s schema registry

5.4. Summary

6. Archiving events

6.1. The archivist’s manifesto

6.1.1. Resilience

6.1.2. Reprocessing

6.1.3. Refinement

6.2. A design for archiving

6.2.1. What to archive

6.2.2. Where to archive

6.2.3. How to archive

6.3. Archiving Kafka with Secor

6.3.1. Warming up Kafka

6.3.2. Creating our event archive

6.3.3. Setting up Secor

6.4. Batch processing our archive

6.4.1. Batch processing 101

6.4.2. Designing our batch-processing job

6.4.3. Writing our job in Apache Spark

6.4.4. Running our job on Elastic MapReduce

6.5. Conclusion

7. Railway-oriented processing

7.1. Leaving the happy path

7.1.1. Failure and Unix programs

7.1.2. Failure and Java

7.1.3. Failure and the log-industrial complex

7.2. Failure and the unified log

7.2.1. A design for failure

7.2.2. Modeling failures as events

7.2.3. Composing our happy path across jobs

7.3. Failure composition with scalaz

7.3.1. Planning for failure

7.3.2. Setting up our Scala project

7.3.3. From Java to Scala

7.3.4. Better failure handling through scalaz

7.3.5. Composing failures

7.4. Railway-oriented processing

7.4.1. Introducing railway-oriented processing

7.4.2. Building the railway

7.5. Summary

8. Commands

8.1. Commands and the unified log

8.1.1. Events and commands

8.1.2. Implicit versus explicit commands

8.1.3. Working with commands in a unified log

8.2. Making decisions

8.2.1. Introducing Plum

8.2.2. Modeling commands

8.2.3. Writing our alert schema

8.3. Consuming our commands

8.3.1. The right tool for the job

8.3.2. Reading our commands

8.3.3. Parsing our commands

8.3.4. Stitching it all together

8.3.5. Testing

8.4. Executing our commands

8.4.1. Signing up for Mailgun

8.4.2. Completing our executor

8.4.3. Final testing

8.5. Scaling up commands

8.5.1. One stream of commands, or many?

8.5.2. Handling command execution failures

8.5.3. Command hierarchies

8.6. Summary

Part 3 EVENT ANALYTICS

9. Analytics on read

9.1. Analytics on read, analytics on write

9.1.1. Analytics on read

9.1.2. Analytics on write

9.1.3. Choosing an approach

9.2. The OOPS event stream

9.2.1. Delivery truck events and entities

9.2.2. Delivery driver events and entities

9.2.3. The OOPS event model

9.2.4. The OOPS events archive

9.3. Getting started with Amazon Redshift

9.3.1. Introducing Redshift

9.3.2. Setting up Redshift

9.3.3. Designing an event warehouse

9.3.4. Creating our fat events table

9.4. ETL, ELT

9.4.1. Loading our events

9.4.2. Dimension widening

9.4.3. A detour on data volatility

9.5. Finally, some analysis

9.5.1. Analysis 1: Who does the most oil changes?

9.5.2. Analysis 2: Who is our most unreliable customer?

9.6. Conclusion

10. Analytics on write

10.1. Back to OOPS

10.1.1. Kinesis setup

10.1.2. Requirements gathering

10.1.3. Our analytics-on-write algorithm

10.2. Building our Lambda function

10.2.1. Setting up DynamoDB

10.2.2. Introduction to AWS Lambda

10.2.3. Lambda setup and event modeling

10.2.4. Revisiting our analytics-on-write algorithm

10.2.5. Conditional writes to DynamoDB

10.2.6. Finalizing our Lambda

10.3. Running our Lambda function

10.3.1. Deploying our Lambda function

10.3.2. Testing our Lambda function

10.4. Conclusion

About the Technology

Writing real-world applications in a data-rich environment can feel like being caught in the crossfire of a paintball battle. Any action may require you to combine event streams, batch archives, and live user or system requests in real time. Unified Log Processing is a coherent data processing architecture that encompasses batch and near-real-time stream data, event logging and aggregation, and data processing on the resulting unified event stream. By efficiently creating a single log of events from multiple data sources, Unified Log Processing makes it possible to build large-scale data-driven applications that are easier to design, deploy, and maintain.

What's inside

  • Why unified logs are crucial to modern event processing architectures
  • Using Apache Kafka as an event processing log
  • Data engineering with Amazon Kinesis
  • Processing and monitoring event streams
  • Event analytics with Amazon Redshift, Apache Giraph, and Amazon DynamoDB

About the reader

This book assumes that the reader has written some Java code. Some Scala or Python experience is helpful but not required.

About the author

Alexander Dean is co-founder and technical lead of Snowplow Analytics, an open source event processing and analytics platform.


Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.