Event Streams in Action
Real-time event systems with Kafka and Kinesis
Alexander Dean, Valentin Crettaz
  • May 2019
  • ISBN 9781617292347
  • 344 pages
  • printed in black & white

"Clear, precise, detailed, and well written. A must read."

Thorsten P. Weber, Mercateo

Event Streams in Action is a foundational book introducing the unified log processing (ULP) paradigm and presenting techniques for using it effectively in data-rich environments.

Table of Contents

Part 1 Event streams and unified logs

1 Introducing event streams

1.1 Defining our terms

1.1.1 Events

1.1.2 Continuous event streams

1.2 Exploring familiar event streams

1.2.1 Application-level logging

1.2.2 Web analytics

1.2.3 Publish/subscribe messaging

1.3 Unifying continuous event streams

1.3.1 The classic era

1.3.2 The hybrid era

1.3.3 The unified era

1.4 Introducing use cases for the unified log

1.4.1 Customer feedback loops

1.4.2 Holistic systems monitoring

1.4.3 Hot-swapping data application versions

Summary

2 The unified log

2.1 Understanding the anatomy of a unified log

2.1.1 Unified

2.1.2 Append-only

2.1.3 Distributed

2.1.4 Ordered

2.2 Introducing our application

2.2.1 Identifying our key events

2.2.2 Unified log, e-commerce style

2.2.3 Modeling our first event

2.3 Setting up our unified log

2.3.1 Downloading and installing Apache Kafka

2.3.2 Creating our stream

2.3.3 Sending and receiving events

Summary
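
To give a flavor of what chapter 2 builds up to, here is a minimal sketch of creating a Kafka topic and writing an event to it with the kafka-clients Java library. The topic name raw-events and the event payload are illustrative, not taken from the book.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CreateAndSend {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");

    // Create the stream: a Kafka topic with one partition, unreplicated.
    try (AdminClient admin = AdminClient.create(props)) {
      admin.createTopics(
          Collections.singleton(new NewTopic("raw-events", 1, (short) 1)))
          .all().get();
    }

    // Send a single JSON event to the topic.
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      producer.send(new ProducerRecord<>("raw-events",
          "shopper-123",
          "{\"event\":\"SHOPPER_VIEWED_PRODUCT\",\"sku\":\"AB123\"}"));
      producer.flush();
    }
  }
}
```

The chapter itself drives these steps with Kafka's bundled console scripts; the Java client shown here is one equivalent programmatic route.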

3 Event stream processing with Apache Kafka

3.1 Event stream processing 101

3.1.1 Why process event streams?

3.1.2 Single-event processing

3.1.3 Multiple-event processing

3.2 Designing our first stream-processing app

3.2.1 Using Kafka as our company’s glue

3.2.2 Locking down our requirements

3.3 Writing a simple Kafka worker

3.3.1 Setting up our development environment

3.3.2 Configuring our application

3.3.3 Reading from Kafka

3.3.4 Writing to Kafka

3.3.5 Stitching it all together

3.3.6 Testing

3.4 Writing a single-event processor

3.4.1 Writing our event processor

3.4.2 Updating our main function

3.4.3 Testing, redux

Summary
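
The shape of a single-event processor like the one chapter 3 develops can be sketched as a loop that reads records from one topic, transforms each event, and writes the result to another. Topic names, the group id, and the placeholder enrichment below are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class SingleEventProcessor {
  public static void main(String[] args) {
    Properties consumerProps = new Properties();
    consumerProps.put("bootstrap.servers", "localhost:9092");
    consumerProps.put("group.id", "event-enricher");
    consumerProps.put("key.deserializer", StringDeserializer.class.getName());
    consumerProps.put("value.deserializer", StringDeserializer.class.getName());

    Properties producerProps = new Properties();
    producerProps.put("bootstrap.servers", "localhost:9092");
    producerProps.put("key.serializer", StringSerializer.class.getName());
    producerProps.put("value.serializer", StringSerializer.class.getName());

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
         KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
      consumer.subscribe(Collections.singleton("raw-events"));
      while (true) {
        for (ConsumerRecord<String, String> record :
            consumer.poll(Duration.ofMillis(500))) {
          // Placeholder transformation: a real worker would validate and
          // enrich the event here before writing it downstream.
          String enriched = record.value()
              .replaceFirst("\\{", "{\"processed\":true,");
          producer.send(new ProducerRecord<>("enriched-events",
              record.key(), enriched));
        }
      }
    }
  }
}
```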

4 Event stream processing with Amazon Kinesis

4.1 Writing events to Kinesis

4.1.1 Systems monitoring and the unified log

4.1.2 Terminology differences from Kafka

4.1.3 Setting up our stream

4.1.4 Modeling our events

4.1.5 Writing our agent

4.2 Reading from Kinesis

4.2.1 Kinesis frameworks and SDKs

4.2.2 Reading events with the AWS CLI

4.2.3 Monitoring our stream with boto

Summary
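
Chapter 4 drives Kinesis with the AWS CLI and Python's boto; for consistency with the Java examples above, here is the equivalent write path using the AWS SDK for Java (v1). The stream name, partition key, and payload are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

public class KinesisAgent {
  public static void main(String[] args) {
    AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

    String event = "{\"event\":\"SERVER_CPU_HIGH\",\"host\":\"web-01\"}";
    PutRecordRequest request = new PutRecordRequest()
        .withStreamName("systems-monitoring")  // illustrative stream name
        .withPartitionKey("web-01")            // groups one host's events
        .withData(ByteBuffer.wrap(event.getBytes(StandardCharsets.UTF_8)));

    PutRecordResult result = kinesis.putRecord(request);
    // Kinesis's sequence number plays the role Kafka's offset plays.
    System.out.println("Written at sequence " + result.getSequenceNumber());
  }
}
```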

5 Stateful stream processing

5.1 Detecting abandoned shopping carts

5.1.1 What management wants

5.1.2 Defining our algorithm

5.1.3 Introducing our derived events stream

5.2 Modeling our new events

5.2.1 Shopper adds item to cart

5.2.2 Shopper places order

5.2.3 Shopper abandons cart

5.3 Stateful stream processing

5.3.1 Introducing state management

5.3.2 Stream windowing

5.3.3 Stream processing frameworks and their capabilities

5.3.4 Stream processing frameworks

5.3.5 Choosing a stream processing framework for Nile

5.4 Detecting abandoned carts

5.4.1 Designing our Samza job

5.4.2 Preparing our project

5.4.3 Configuring our job

5.4.4 Writing our job’s Java task

5.5 Running our Samza job

5.5.1 Introducing YARN

5.5.2 Submitting our job

5.5.3 Testing our job

5.5.4 Improving our job

Summary
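
Chapter 5 implements abandoned-cart detection as an Apache Samza job; the core stateful idea can be sketched framework-free. Keep per-shopper state, and when a timeout passes with no order, emit a derived event. The 30-minute threshold and event names below are assumptions, not the book's exact values.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class AbandonedCartDetector {
  private static final Duration ABANDON_AFTER = Duration.ofMinutes(30);

  // Per-shopper state: when the shopper last touched their cart.
  private final Map<String, Instant> lastCartActivity = new HashMap<>();

  // Called for each incoming event on the stream.
  public void onEvent(String shopperId, String eventType, Instant at) {
    if (eventType.equals("SHOPPER_ADDED_ITEM_TO_CART")) {
      lastCartActivity.put(shopperId, at);
    } else if (eventType.equals("SHOPPER_PLACED_ORDER")) {
      lastCartActivity.remove(shopperId); // cart resolved, clear state
    }
  }

  // Called periodically; Samza models this as a windowable task.
  public void onTimer(Instant now) {
    Iterator<Map.Entry<String, Instant>> it =
        lastCartActivity.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Instant> entry = it.next();
      if (Duration.between(entry.getValue(), now).compareTo(ABANDON_AFTER) > 0) {
        emit(entry.getKey());
        it.remove();
      }
    }
  }

  private void emit(String shopperId) {
    // A real job would write a derived event back to the unified log.
    System.out.println("SHOPPER_ABANDONED_CART for " + shopperId);
  }
}
```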

Part 2 Data engineering with streams

6 Schemas

6.1 An introduction to schemas

6.1.1 Introducing Plum

6.1.2 Event schemas as contracts

6.1.3 Capabilities of schema technologies

6.1.4 Some schema technologies

6.1.5 Choosing a schema technology for Plum

6.2 Modeling our event in Avro

6.2.1 Setting up a development harness

6.2.2 Writing our health check event schema

6.2.3 From Avro to Java, and back again

6.2.4 Testing

6.3 Associating events with their schemas

6.3.1 Some modest proposals

6.3.2 A self-describing event for Plum

6.3.3 Plum’s schema registry

Summary
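
A taste of chapter 6's schema work: defining an Avro schema and building a record against it with the Avro Java library. The health-check fields shown are invented for illustration, not the book's actual schema.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class HealthCheckSchema {
  // An Avro schema is itself JSON; these field names are illustrative.
  private static final String SCHEMA_JSON =
      "{\"type\":\"record\",\"name\":\"HealthCheck\","
      + "\"namespace\":\"com.example.plum\",\"fields\":["
      + "{\"name\":\"host\",\"type\":\"string\"},"
      + "{\"name\":\"status\",\"type\":{\"type\":\"enum\","
      + "\"name\":\"Status\",\"symbols\":[\"HEALTHY\",\"UNHEALTHY\"]}},"
      + "{\"name\":\"timestamp\",\"type\":\"long\"}]}";

  public static void main(String[] args) {
    Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

    // The schema acts as a contract between event producers and consumers.
    GenericRecord event = new GenericData.Record(schema);
    event.put("host", "web-01");
    event.put("status", new GenericData.EnumSymbol(
        schema.getField("status").schema(), "HEALTHY"));
    event.put("timestamp", System.currentTimeMillis());

    System.out.println(event);
  }
}
```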

7 Archiving events

7.1 The archivist’s manifesto

7.1.1 Resilience

7.1.2 Reprocessing

7.1.3 Refinement

7.2 A design for archiving

7.2.1 What to archive

7.2.2 Where to archive

7.2.3 How to archive

7.3 Archiving Kafka with Secor

7.3.1 Warming up Kafka

7.3.2 Creating our event archive

7.3.3 Setting up Secor

7.4 Batch processing our archive

7.4.1 Batch processing 101

7.4.2 Designing our batch processing job

7.4.3 Writing our job in Apache Spark

7.4.4 Running our job on Elastic MapReduce

Summary
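
Chapter 7's batch-processing step, sketched with Spark's Java API: read the archived events from S3 and count them by type. This assumes the archive is newline-delimited JSON under an illustrative S3 prefix; the book's actual job and paths differ.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ArchiveBatchJob {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("event-archive-counts")
        .getOrCreate();

    // Secor writes the Kafka topic out as files; we assume
    // newline-delimited JSON under a hypothetical bucket prefix.
    Dataset<Row> events = spark.read().json("s3a://plum-archive/raw-events/");

    // Count events by their type field and write the result back to S3.
    events.groupBy("event")
        .count()
        .write()
        .csv("s3a://plum-archive/event-counts/");

    spark.stop();
  }
}
```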

8 Railway-oriented processing

8.1 Leaving the happy path

8.1.1 Failure and Unix programs

8.1.2 Failure and Java

8.1.3 Failure and the log-industrial complex

8.2 Failure and the unified log

8.2.1 A design for failure

8.2.2 Modeling failures as events

8.2.3 Composing our happy path across jobs

8.3 Failure composition with Scalaz

8.3.1 Planning for failure

8.3.2 Setting up our Scala project

8.3.3 From Java to Scala

8.3.4 Better failure handling through Scalaz

8.3.5 Composing failures

8.4 Implementing railway-oriented processing

8.4.1 Introducing railway-oriented processing

8.4.2 Building the railway

Summary
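
Chapter 8 builds its railway with Scala and Scalaz's disjunction type, but the pattern itself can be sketched in plain Java with a minimal Either: each processing step returns success or failure, and flatMap keeps failures on the failure track so they short-circuit the rest of the pipeline. The steps below are illustrative stand-ins.

```java
import java.util.function.Function;

public class Railway {
  // Minimal Either: left is the failure track, right is the success track.
  static final class Either<L, R> {
    final L left; final R right;
    private Either(L left, R right) { this.left = left; this.right = right; }
    static <L, R> Either<L, R> fail(L l) { return new Either<>(l, null); }
    static <L, R> Either<L, R> ok(R r) { return new Either<>(null, r); }

    // Only run the next step on the success track; a failure derails
    // the remainder of the railway untouched.
    <R2> Either<L, R2> flatMap(Function<R, Either<L, R2>> f) {
      return left != null ? Either.<L, R2>fail(left) : f.apply(right);
    }
  }

  // Two illustrative processing steps that can each fail.
  static Either<String, String> parse(String raw) {
    return raw.startsWith("{") ? Either.ok(raw)
                               : Either.fail("Not JSON: " + raw);
  }
  static Either<String, String> enrich(String json) {
    return Either.ok(json.replaceFirst("\\{", "{\"enriched\":true,"));
  }

  public static void main(String[] args) {
    Either<String, String> good =
        parse("{\"event\":\"x\"}").flatMap(Railway::enrich);
    Either<String, String> bad =
        parse("oops").flatMap(Railway::enrich);
    System.out.println(good.right); // {"enriched":true,"event":"x"}
    System.out.println(bad.left);   // Not JSON: oops
  }
}
```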

9 Commands

9.1 Commands and the unified log

9.1.1 Events and commands

9.1.2 Implicit vs. explicit commands

9.1.3 Working with commands in a unified log

9.2 Making decisions

9.2.1 Introducing Plum

9.2.2 Modeling commands

9.2.3 Writing our alert schema

9.3 Consuming our commands

9.3.1 The right tool for the job

9.3.2 Reading our commands

9.3.3 Parsing our commands

9.3.4 Stitching it all together

9.3.5 Testing

9.4 Executing our commands

9.4.1 Signing up for Mailgun

9.4.2 Completing our executor

9.4.3 Final testing

9.5 Scaling up commands

9.5.1 One stream of commands, or many?

9.5.2 Handling command-execution failures

9.5.3 Command hierarchies

Summary
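
Chapter 9's consumers read commands off the stream, parse them, and execute them. A rough sketch of that parse-and-dispatch shape, using Jackson for parsing: the command envelope here is invented (the book defines its alert commands as Avro schemas), and the Mailgun call is stubbed out.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CommandExecutor {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static void main(String[] args) throws Exception {
    // An illustrative command envelope read from the unified log.
    String raw = "{\"command\":\"send_email\","
        + "\"recipient\":\"ops@example.com\","
        + "\"message\":\"Server web-01 is unhealthy\"}";

    JsonNode command = MAPPER.readTree(raw);
    switch (command.get("command").asText()) {
      case "send_email":
        sendEmail(command.get("recipient").asText(),
                  command.get("message").asText());
        break;
      default:
        // Unknown commands should be routed to a failure stream,
        // not silently dropped.
        System.err.println("Unknown command: " + raw);
    }
  }

  private static void sendEmail(String recipient, String message) {
    // Stub: the chapter wires this step to Mailgun's HTTP API.
    System.out.printf("Emailing %s: %s%n", recipient, message);
  }
}
```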

Part 3 Event analytics

10 Analytics-on-read

10.1 Analytics-on-read, analytics-on-write

10.1.1 Analytics-on-read

10.1.2 Analytics-on-write

10.1.3 Choosing an approach

10.2 The OOPS event stream

10.2.1 Delivery truck events and entities

10.2.2 Delivery driver events and entities

10.2.3 The OOPS event model

10.2.4 The OOPS events archive

10.3 Getting started with Amazon Redshift

10.3.1 Introducing Redshift

10.3.2 Setting up Redshift

10.3.3 Designing an event warehouse

10.3.4 Creating our fat events table

10.4 ETL, ELT

10.4.1 Loading our events

10.4.2 Dimension widening

10.4.3 A detour on data volatility

10.5 Finally, some analysis

10.5.1 Analysis 1: Who does the most oil changes?

10.5.2 Analysis 2: Who is our most unreliable customer?

Summary
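
Chapter 10's analyses run as SQL against Redshift after the events have been loaded; that is the essence of analytics-on-read. A sketch of the oil-change question via JDBC, assuming the Redshift JDBC driver is on the classpath; the endpoint, credentials, table, and column names are all hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class OilChangeAnalysis {
  public static void main(String[] args) throws Exception {
    // Hypothetical Redshift endpoint and fat-events table layout.
    String url = "jdbc:redshift://example.cluster.us-east-1"
        + ".redshift.amazonaws.com:5439/oops";
    try (Connection conn = DriverManager.getConnection(url, "user", "pass");
         Statement stmt = conn.createStatement()) {
      // Analytics-on-read: the question is asked of the event archive
      // after the fact, rather than being pre-computed on write.
      ResultSet rs = stmt.executeQuery(
          "SELECT employee_id, COUNT(*) AS oil_changes "
          + "FROM events "
          + "WHERE event = 'MECHANIC_CHANGED_OIL' "
          + "GROUP BY employee_id "
          + "ORDER BY oil_changes DESC "
          + "LIMIT 1");
      while (rs.next()) {
        System.out.println(rs.getString(1) + ": " + rs.getLong(2));
      }
    }
  }
}
```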

11 Analytics-on-write

11.1 Back to OOPS

11.1.1 Kinesis setup

11.1.2 Requirements gathering

11.1.3 Our analytics-on-write algorithm

11.2 Building our Lambda function

11.2.1 Setting up DynamoDB

11.2.2 Introduction to AWS Lambda

11.2.3 Lambda setup and event modeling

11.2.4 Revisiting our analytics-on-write algorithm

11.2.5 Conditional writes to DynamoDB

11.2.6 Finalizing our Lambda

11.3 Running our Lambda function

11.3.1 Deploying our Lambda function

11.3.2 Testing our Lambda function

Summary
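
At the heart of chapter 11's analytics-on-write is a conditional write: only advance a truck's "latest known" value in DynamoDB if the incoming event is newer, so late or duplicate events cannot overwrite fresher state. A sketch with the AWS SDK for Java (v1); the table and attribute names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ConditionalCheckFailedException;
import com.amazonaws.services.dynamodbv2.model.UpdateItemRequest;

public class ConditionalWrite {
  public static void main(String[] args) {
    AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    Map<String, AttributeValue> key = new HashMap<>();
    key.put("truck_id", new AttributeValue("truck-17"));

    Map<String, AttributeValue> values = new HashMap<>();
    values.put(":ts", new AttributeValue().withN("1554731600"));
    values.put(":loc", new AttributeValue("51.5074,-0.1278"));

    UpdateItemRequest request = new UpdateItemRequest()
        .withTableName("truck-locations")
        .withKey(key)
        .withUpdateExpression("SET last_seen = :ts, last_loc = :loc")
        // Only succeed if this event is newer than what we already hold.
        .withConditionExpression(
            "attribute_not_exists(last_seen) OR last_seen < :ts")
        .withExpressionAttributeValues(values);

    try {
      dynamo.updateItem(request);
    } catch (ConditionalCheckFailedException e) {
      // An older or duplicate event arrived out of order; ignore it.
    }
  }
}
```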

Appendixes

Appendix A: AWS Primer

A.1 Setting up the AWS account

A.2 Creating a user

A.3 Setting up the AWS CLI

About the technology

Many high-profile applications, like LinkedIn and Netflix, deliver nimble, responsive performance by reacting to user and system events as they occur. In large-scale systems, this requires efficiently monitoring, managing, and reacting to multiple event streams. Tools like Kafka, along with innovative patterns like unified log processing, help create a coherent data processing architecture for event-based applications.

About the book

Event Streams in Action teaches you techniques for aggregating, storing, and processing event streams using the unified log processing pattern. In this hands-on guide, you’ll discover important application designs like the lambda architecture, stream aggregation, and event reprocessing. You’ll also explore scaling, resiliency, advanced stream patterns, and much more! By the time you’re finished, you’ll be designing large-scale data-driven applications that are easier to build, deploy, and maintain.

What's inside

  • Validating and monitoring event streams
  • Event analytics
  • Methods for event modeling
  • Examples using Apache Kafka and Amazon Kinesis

About the reader

For readers with experience coding in Java, Scala, or Python.

About the authors

Alexander Dean developed Snowplow, an open source event processing and analytics platform. Valentin Crettaz is an independent IT consultant with 25 years of experience.


combo $44.99 pBook + eBook + liveBook
eBook $35.99 pdf + ePub + kindle + liveBook