Unified Log Processing
Integrating and processing event streams
Alexander Dean
  • MEAP began July 2014
  • Publication in May 2017 (estimated)
  • ISBN 9781617292347
  • 350 pages (estimated)
  • printed in black & white

Unified Log Processing is a foundational book introducing the unified log processing (ULP) paradigm and presenting techniques to use it effectively in data-rich environments. The book begins with an architectural overview, illustrating how ULP addresses the thorny issues associated with processing data from multiple sources, including simultaneous event streams. It then guides you through examples using the unified log technologies Apache Kafka and Amazon Kinesis and a variety of stream processing frameworks and analytics databases. You'll learn to aggregate events from multiple sources, store them in a unified log, and build data processing applications on the resulting event streams.
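
To give a flavor of the workflow, here is a minimal sketch (not code from the book) of appending a single event to a unified log with the standard Kafka Java client. The broker address, topic name, and event JSON are illustrative assumptions:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class HelloUnifiedLog {
    public static void main(String[] args) {
        // Point the producer at a local Kafka broker (address is an assumption)
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Append one shopper event to the "raw-events" topic; Kafka keeps it
            // as an ordered, replayable record that any downstream app can read
            producer.send(new ProducerRecord<>("raw-events",
                "{\"event\": \"SHOPPER_ADDED_ITEM_TO_CART\", \"sku\": \"123\"}"));
        }
    }
}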

As you progress through the book, you'll see how to validate, filter, enrich, and store event streams; master key stream processing approaches; and explore important patterns such as the lambda architecture, stream aggregation, and event reprocessing. The book also covers methods and tools for event modeling and event analytics, along with scaling, resiliency, and advanced stream patterns.
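
As a hedged illustration of the single-event processing pattern the early chapters teach, the sketch below reads raw events from one Kafka topic, applies a transformation, and writes the results to a second topic. The topic names and the trivial "enrichment" step are placeholders, not the book's own code:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SingleEventProcessor {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "single-event-processor");
        consumerProps.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("raw-events"));
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Placeholder transformation: a real job would validate the
                    // event against its schema and enrich it here (chapters 3-5)
                    String enriched = record.value().trim();
                    producer.send(new ProducerRecord<>("enriched-events", enriched));
                }
            }
        }
    }
}

Because the processor only reads from one stream and writes to another, it stays stateless; the stateful variants (for example, the abandoned-cart detection in chapter 4) layer windowing and state stores on top of this same read-transform-write loop.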

Table of Contents

Part 1 EVENT STREAMS AND UNIFIED LOGS

1. Introducing event streams

1.1. Defining our terms

1.1.1. Events

1.1.2. Continuous event streams

1.2. Some familiar event streams

1.2.1. Application-level logging

1.2.2. Web analytics

1.2.3. Publish/subscribe messaging

1.3. Unifying continuous event streams

1.3.1. The classic era

1.3.2. The hybrid era

1.3.3. The unified era

1.4. Use cases for the unified log

1.4.1. Customer feedback loops

1.4.2. Holistic systems monitoring

1.4.3. Hot swapping data application versions

1.5. Conclusion

2. The unified log

2.1. Anatomy of a unified log

2.1.1. Unified

2.1.2. Append-only

2.1.3. Distributed

2.1.4. Ordered

2.2. Introducing our application

2.2.1. Identifying our key events

2.2.2. Unified log, e-commerce style

2.2.3. Modeling our first event

2.3. Setting up our unified log

2.3.1. Download and install Apache Kafka

2.3.2. Creating our stream

2.3.3. Sending and receiving some events

2.4. Summary

3. Event stream processing

3.1. Event stream processing 101

3.1.1. Why process event streams?

3.1.2. Single event processing

3.1.3. Multiple event processing

3.2. Designing our first stream processing app

3.2.1. Using Kafka as our company's glue

3.2.2. Locking down our requirements

3.3. Writing a simple Kafka worker

3.3.1. Setting up our development environment

3.3.2. Configuring our application

3.3.3. Reading from Kafka

3.3.4. Writing to Kafka

3.3.5. Stitching it all together

3.3.6. Testing

3.4. Writing a single event processor

3.4.1. Writing our event processor

3.4.2. Updating our main function

3.4.3. Testing, redux

3.5. Conclusion

4. Stateful stream processing

4.1. Abandoned carts and state

4.1.1. On shopping cart abandonment

4.1.2. Introducing state

4.1.3. Stream windowing

4.2. Stream processing frameworks

4.2.1. Framework capabilities

4.2.2. A flock of frameworks

4.2.3. Why Apache Samza?

4.3. Designing our job

4.3.1. The basic design

4.3.2. Modeling our derived event

4.3.3. Designing our Samza job

4.3.4. Partitioning our raw stream

4.4. Writing our Samza job

4.4.1. Creating our job’s build file

4.4.2. Configuring our job

4.4.3. Representing our events in Java

4.4.4. Handling our state

4.5. Building and testing our job

4.5.1. Bootstrapping our environment

4.5.2. Building and submitting our job

4.5.3. Testing our job

4.5.4. Improving our job

4.6. Conclusion

Part 2 DATA ENGINEERING WITH STREAMS

5. Schemas

5.1. An introduction to schemas

5.1.1. Introducing Plum

5.1.2. Event schemas as contracts

5.1.3. Capabilities of schema technologies

5.1.4. Some schema technologies

5.1.5. Choosing a schema technology for Plum

5.2. Modeling our event in Avro

5.2.1. Setting up a development harness

5.2.2. Writing our health check event schema

5.2.3. From Avro to Java, and back again

5.2.4. Testing

5.3. Associating events with their schemas

5.3.1. Some modest proposals

5.3.2. A self-describing event for Plum

5.3.3. Plum’s schema registry

5.4. Summary

6. Archiving events

6.1. The archivist’s manifesto

6.1.1. Resilience

6.1.2. Reprocessing

6.1.3. Refinement

6.2. A design for archiving

6.2.1. What to archive

6.2.2. Where to archive

6.2.3. How to archive

6.3. Archiving Kafka with Secor

6.3.1. Warming up Kafka

6.3.2. Creating our event archive

6.3.3. Setting up Secor

6.4. Batch processing our archive

6.4.1. Batch processing 101

6.4.2. Designing our batch-processing job

6.4.3. Writing our job in Apache Spark

6.4.4. Running our job on Elastic MapReduce

6.5. Conclusion

7. Railway-oriented processing

7.1. Leaving the happy path

7.1.1. Failure and Unix programs

7.1.2. Failure and Java

7.1.3. Failure and the log-industrial complex

7.2. Failure and the unified log

7.2.1. A design for failure

7.2.2. Modeling failures as events

7.2.3. Composing our happy path across jobs

7.3. Failure composition with scalaz

7.3.1. Planning for failure

7.3.2. Setting up our Scala project

7.3.3. From Java to Scala

7.3.4. Better failure handling through scalaz

7.3.5. Composing failures

7.4. Railway-oriented processing

7.4.1. Introducing railway-oriented processing

7.4.2. Building the railway

7.5. Summary

8. Idempotency

9. Commands

9.1. Commands and the unified log

9.1.1. Events and commands

9.1.2. Implicit versus explicit commands

9.1.3. Working with commands in a unified log

9.2. Making decisions

9.2.1. Introducing Plum

9.2.2. Modeling commands

9.2.3. Writing our alert schema

9.3. Consuming our commands

9.3.1. The right tool for the job

9.3.2. Reading our commands

9.3.3. Parsing our commands

9.3.4. Stitching it all together

9.3.5. Testing

9.4. Executing our commands

9.4.1. Signing up for Mailgun

9.4.2. Completing our executor

9.4.3. Final testing

9.5. Scaling up commands

9.5.1. One stream of commands, or many?

9.5.2. Handling command execution failures

9.5.3. Command hierarchies

9.6. Summary

Part 3 EVENT ANALYTICS

10. Analytics on read

10.1. Analytics on read, analytics on write

10.1.1. Analytics on read

10.1.2. Analytics on write

10.1.3. Choosing an approach

10.2. The OOPS event stream

10.2.1. Delivery truck events and entities

10.2.2. Delivery driver events and entities

10.2.3. The OOPS event model

10.2.4. The OOPS events archive

10.3. Getting started with Amazon Redshift

10.3.1. Introducing Redshift

10.3.2. Setting up Redshift

10.3.3. Designing an event warehouse

10.3.4. Creating our fat events table

10.4. ETL, ELT

10.4.1. Loading our events

10.4.2. Dimension widening

10.4.3. A detour on data volatility

10.5. Finally, some analysis

10.5.1. Analysis 1: Who does the most oil changes?

10.5.2. Analysis 2: Who is our most unreliable customer?

10.6. Conclusion

11. Analytics on write

11.1. Back to OOPS

11.1.1. Kinesis setup

11.1.2. Requirements gathering

11.1.3. Our analytics-on-write algorithm

11.2. Building our Lambda function

11.2.1. Setting up DynamoDB

11.2.2. Introduction to AWS Lambda

11.2.3. Lambda setup and event modeling

11.2.4. Revisiting our analytics-on-write algorithm

11.2.5. Conditional writes to DynamoDB

11.2.6. Finalizing our Lambda

11.3. Running our Lambda function

11.3.1. Deploying our Lambda function

11.3.2. Testing our Lambda function

11.4. Conclusion

About the Technology

Writing real-world applications in a data-rich environment can feel like being caught in the crossfire of a paintball battle. Any action may require you to combine event streams, batch archives, and live user or system requests in real time. Unified Log Processing is a coherent data processing architecture designed to encompass batch and near-real-time stream data, event logging and aggregation, and data processing on the resulting unified event stream. By efficiently creating a single log of events from multiple data sources, Unified Log Processing makes it possible to build large-scale data-driven applications that are easier to design, deploy, and maintain.
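
To make the "single log, many sources" idea concrete, here is a minimal, assumption-laden sketch using the AWS SDK for Java (v1): two unrelated producers, a web tracker and a systems monitor, append to the same Amazon Kinesis stream. The stream name, partition keys, and event payloads are illustrative, not taken from the book:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;

public class AppendToUnifiedLog {
    public static void main(String[] args) {
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        // A web event and a systems-monitoring event land in the same
        // stream; "unified-log" is a hypothetical stream name
        putEvent(kinesis, "web", "{\"event\": \"PAGE_VIEW\"}");
        putEvent(kinesis, "ops", "{\"event\": \"CPU_OVERLOADED\"}");
    }

    static void putEvent(AmazonKinesis kinesis, String source, String json) {
        kinesis.putRecord(new PutRecordRequest()
            .withStreamName("unified-log")
            .withPartitionKey(source)
            .withData(ByteBuffer.wrap(json.getBytes(StandardCharsets.UTF_8))));
    }
}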

What's inside

  • Why unified logs are crucial to modern event processing architectures
  • Using Apache Kafka as an event processing log
  • Data engineering with Amazon Kinesis
  • Processing and monitoring event streams
  • Event analytics with Amazon Redshift, Apache Giraph, and Amazon DynamoDB

About the reader

This book assumes that the reader has written some Java code. Some Scala or Python experience is helpful but not required.

About the author

Alexander Dean is co-founder and technical lead of Snowplow Analytics, an open source event processing and analytics platform.

