Event Streams in Action
Alexander Dean, Valentin Crettaz
  • MEAP began July 2014
  • Publication in June 2019 (estimated)
  • ISBN 9781617292347
  • 350 pages (estimated)
  • printed in black & white

Event Streams in Action is a foundational book introducing the unified log processing (ULP) paradigm and presenting techniques for using it effectively in data-rich environments. The book begins with an architectural overview, illustrating how ULP addresses the thorny issues associated with processing data from multiple sources, including simultaneous event streams. It then guides you through examples using the unified log technologies Apache Kafka and Amazon Kinesis, along with a variety of stream processing frameworks and analytics databases. You'll learn to aggregate events from multiple sources, store them in a unified log, and build data processing applications on the resulting event streams.

As you progress through the book, you'll see how to validate, filter, enrich, and store event streams, master key stream processing approaches, and explore important patterns such as the lambda architecture, stream aggregation, and event reprocessing. The book also dives into the methods and tools for event modeling and event analytics, along with scaling, resiliency, and advanced stream patterns.
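
To give a flavor of the hands-on material, here is a minimal sketch of the kind of code the early chapters build toward: a small Java producer appending a JSON-encoded shopper event to an Apache Kafka topic. The broker address, topic name, and event fields below are illustrative assumptions, not the book's own example code.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OrderEventProducer {
        public static void main(String[] args) {
            // Point the producer at a local broker and serialize keys and values as strings.
            // localhost:9092 is Kafka's default listener; adjust for your own setup.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            // A hypothetical "shopper places order" event, serialized as JSON.
            String event = "{\"event\":\"SHOPPER_PLACES_ORDER\","
                    + "\"shopper\":{\"id\":\"123\"},"
                    + "\"timestamp\":\"2018-10-30T12:01:35Z\"}";

            // Append the event to a topic (the stream); the topic name is made up here.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("raw-events", "shopper-123", event));
            }
        }
    }

Consumers, stateful stream processing jobs, and Kinesis-based variants of the same ideas are developed step by step in the chapters listed below.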

Table of Contents

Part 1 Event streams and unified logs

1 Introducing event streams

1.1 Defining our terms

1.1.1 Events

1.1.2 Continuous event streams

1.2 Exploring familiar event streams

1.2.1 Application-level logging

1.2.2 Web analytics

1.2.3 Publish/subscribe messaging

1.3 Unifying continuous event streams

1.3.1 The classic era

1.3.2 The hybrid era

1.3.3 The unified era

1.4 Introducing use cases for the unified log

1.4.1 Customer feedback loops

1.4.2 Holistic systems monitoring

1.4.3 Hot-swapping data application versions

1.5 Summary

2 The unified log

2.1 Understanding the anatomy of a unified log

2.1.1 Unified

2.1.2 Append-only

2.1.3 Distributed

2.1.4 Ordered

2.2 Introducing our application

2.2.1 Identifying our key events

2.2.2 Unified log, e-commerce style

2.2.3 Modeling our first event

2.3 Setting up our unified log

2.3.1 Downloading and installing Apache Kafka

2.3.2 Creating our stream

2.3.3 Sending and receiving events

2.4 Summary

3 Event stream processing with Apache Kafka

3.1 Event stream processing 101

3.1.1 Why process event streams?

3.1.2 Single-event processing

3.1.3 Multiple-event processing

3.2 Designing our first stream-processing app

3.2.1 Using Kafka as our company’s glue

3.2.2 Locking down our requirements

3.3 Writing a simple Kafka worker

3.3.1 Setting up our development environment

3.3.2 Configuring our application

3.3.3 Reading from Kafka

3.3.4 Writing to Kafka

3.3.5 Stitching it all together

3.3.6 Testing

3.4 Writing a single-event processor

3.4.1 Writing our event processor

3.4.2 Updating our main function

3.4.3 Testing, redux

3.5 Summary

4 Event stream processing with Amazon Kinesis

4.1 Writing events to Kinesis

4.1.1 Systems monitoring and the unified log

4.1.2 Terminology differences from Kafka

4.1.3 Setting up our stream

4.1.4 Modeling our events

4.1.5 Writing our agent

4.2 Reading from Kinesis

4.2.1 Kinesis frameworks and SDKs

4.2.2 Reading events with the AWS CLI

4.2.3 Monitoring our stream with boto

4.3 Summary

5 Stateful stream processing

5.1 Detecting abandoned shopping carts

5.1.1 What management wants

5.1.2 Defining our algorithm

5.1.3 Introducing our derived events stream

5.2 Modeling our new events

5.2.1 Shopper adds item to cart

5.2.2 Shopper places order

5.2.3 Shopper abandons cart

5.3 Stateful stream processing

5.3.1 Introducing state management

5.3.2 Stream windowing

5.3.3 Stream processing frameworks and their capabilities

5.3.4 Stream processing frameworks

5.3.5 Choosing a stream processing framework for Nile

5.4 Detecting abandoned carts

5.4.1 Designing our Samza job

5.4.2 Preparing our project

5.4.3 Configuring our job

5.4.4 Writing our job’s Java task

5.5 Running our Samza job

5.5.1 Introducing YARN

5.5.2 Submitting our job

5.5.3 Testing our job

5.5.4 Improving our job

5.6 Summary

Part 2 Data engineering with streams

6 Schemas

6.1 An introduction to schemas

6.1.1 Introducing Plum

6.1.2 Event schemas as contracts

6.1.3 Capabilities of schema technologies

6.1.4 Some schema technologies

6.1.5 Choosing a schema technology for Plum

6.2 Modeling our event in Avro

6.2.1 Setting up a development harness

6.2.2 Writing our health check event schema

6.2.3 From Avro to Java, and back again

6.2.4 Testing

6.3 Associating events with their schemas

6.3.1 Some modest proposals

6.3.2 A self-describing event for Plum

6.3.3 Plum’s schema registry

6.4 Summary

7 Archiving events

7.1 The archivist’s manifesto

7.1.1 Resilience

7.1.2 Reprocessing

7.1.3 Refinement

7.2 A design for archiving

7.2.1 What to archive

7.2.2 Where to archive

7.2.3 How to archive

7.3 Archiving Kafka with Secor

7.3.1 Warming up Kafka

7.3.2 Creating our event archive

7.3.3 Setting up Secor

7.4 Batch processing our archive

7.4.1 Batch processing 101

7.4.2 Designing our batch processing job

7.4.3 Writing our job in Apache Spark

7.4.4 Running our job on Elastic MapReduce

7.5 Summary

8 Railway-oriented processing

8.1 Leaving the happy path

8.1.1 Failure and Unix programs

8.1.2 Failure and Java

8.1.3 Failure and the log-industrial complex

8.2 Failure and the unified log

8.2.1 A design for failure

8.2.2 Modeling failures as events

8.2.3 Composing our happy path across jobs

8.3 Failure composition with scalaz

8.3.1 Planning for failure

8.3.2 Setting up our Scala project

8.3.3 From Java to Scala

8.3.4 Better failure handling through scalaz

8.3.5 Composing failures

8.4 Railway-oriented processing

8.4.1 Introducing railway-oriented processing

8.4.2 Building the railway

8.5 Summary

9 Commands

9.1 Commands and the unified log

9.1.1 Events and commands

9.1.2 Implicit vs. explicit commands

9.1.3 Working with commands in a unified log

9.2 Making decisions

9.2.1 Introducing Plum

9.2.2 Modeling commands

9.2.3 Writing our alert schema

9.3 Consuming our commands

9.3.1 The right tool for the job

9.3.2 Reading our commands

9.3.3 Parsing our commands

9.3.4 Stitching it all together

9.3.5 Testing

9.4 Executing our commands

9.4.1 Signing up for Mailgun

9.4.2 Completing our executor

9.4.3 Final testing

9.5 Scaling up commands

9.5.1 One stream of commands, or many?

9.5.2 Handling command-execution failures

9.5.3 Command hierarchies

9.6 Summary

Part 3 Event analytics

10 Analytics-on-read

10.1 Analytics-on-read, analytics-on-write

10.1.1 Analytics-on-read

10.1.2 Analytics-on-write

10.1.3 Choosing an approach

10.2 The OOPS event stream

10.2.1 Delivery truck events and entities

10.2.2 Delivery driver events and entities

10.2.3 The OOPS event model

10.2.4 The OOPS events archive

10.3 Getting started with Amazon Redshift

10.3.1 Introducing Redshift

10.3.2 Setting up Redshift

10.3.3 Designing an event warehouse

10.3.4 Creating our fat events table

10.4 ETL, ELT

10.4.1 Loading our events

10.4.2 Dimension widening

10.4.3 A detour on data volatility

10.5 Finally, some analysis

10.5.1 Analysis 1: Who does the most oil changes?

10.5.2 Analysis 2: Who is our most unreliable customer?

10.6 Summary

11 Analytics-on-write

11.1 Back to OOPS

11.1.1 Kinesis setup

11.1.2 Requirements gathering

11.1.3 Our analytics-on-write algorithm

11.2 Building our Lambda function

11.2.1 Setting up DynamoDB

11.2.2 Introduction to AWS Lambda

11.2.3 Lambda setup and event modeling

11.2.4 Revisiting our analytics-on-write algorithm

11.2.5 Conditional writes to DynamoDB

11.2.6 Finalizing our Lambda

11.3 Running our Lambda function

11.3.1 Deploying our Lambda function

11.3.2 Testing our Lambda function

11.4 Summary

Appendixes

Appendix A: AWS Primer

A.1 Setting up the AWS account

A.2 Creating a user

A.3 Setting up the AWS CLI

Index

What's inside

  • Why unified logs are crucial to modern event processing architectures
  • Using Apache Kafka as an event processing log
  • Data engineering with Amazon Kinesis
  • Processing and monitoring event streams
  • Event analytics with Amazon Redshift and Amazon DynamoDB

About the reader

This book assumes that the reader has written some Java code. Some Scala or Python experience is helpful but not required.

About the authors

Alexander Dean is co-founder and technical lead of Snowplow Analytics, an open source event processing and analytics platform. Valentin Crettaz is an independent IT consultant.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.
MEAP combo $44.99 pBook + eBook + liveBook
MEAP eBook $35.99 pdf + ePub + kindle + liveBook
