Streaming Data
Understanding the real-time pipeline
Andrew G. Psaltis
  • May 2017
  • ISBN 9781617292286
  • 216 pages
  • printed in black & white

The definitive book if you want to master the architecture of an enterprise-grade streaming application.

Sergio Fernandez Gonzalez, Accenture

Streaming Data introduces the concepts and requirements of streaming and real-time data systems. The book is an idea-rich tutorial that teaches you to think about how to efficiently interact with fast-flowing data.

Table of Contents

Part 1: A new holistic approach

1. Introducing Streaming Data

1.1. What is a real-time system

1.2. Differences between real-time and streaming systems

1.3. The architectural blueprint

1.4. Security for streaming systems

1.5. How do we scale?

1.6. Summary

2. Getting data from clients: data ingestion

2.1. Common interaction patterns

2.1.1. Request/Response

2.1.2. Publish/Subscribe

2.1.3. Request/Acknowledge

2.1.4. One-Way

2.1.5. Stream

2.2. Scaling the interaction patterns

2.2.1. Request/Response Optional

2.2.2. Scaling the Stream Pattern

2.3. Fault-Tolerance

2.3.1. Receiver-Based Message Logging (RBML)

2.3.2. Sender-Based Message Logging (SBML)

2.3.3. Hybrid Message Logging (HML)

2.4. A dose of reality

2.5. Summary

3. Transporting the data from collection tier: decoupling the data pipeline

3.1. Do we really need a message queuing tier?

3.2. Core concepts

3.2.1. The Producer, The Broker, and the Consumer

3.2.2. Isolating Producers from Consumers

3.2.3. Durable Messaging

3.3. Message Delivery Semantics

3.4. Security

3.5. Fault tolerance

3.6. Applying the core concepts to business problems

3.7. Summary

4. Analyzing streaming data

4.1. Understanding in-flight data analysis

4.2. Distributed Stream Processing Architecture

4.3. Key Features of Stream-Processing Frameworks

4.4. Summary

5. Algorithms for data analysis

5.1. Accepting constraints and relaxing

5.2. Thinking about time

5.2.1. Sliding Window

5.2.2. Tumbling Window

5.3. Summarization techniques

5.3.1. Random Sampling

5.3.2. Counting Distinct Elements

5.3.3. Frequency

5.3.4. Membership

5.4. Summary

6. Storing the analyzed or collected data

6.1. When you need long-term storage

6.2. Keeping it In-Memory

6.2.1. Embedded In-Memory / Flash Optimized

6.2.2. Caching system

6.2.3. In-Memory Database (IMDB) and In-Memory Data Grid (IMDG)

6.3. Use case exercises

6.3.1. In-Session Personalization

6.3.2. Next Generation Energy Company

6.4. Summary

7. Making the data available

7.1. Communications Patterns

7.1.1. Data Sync

7.1.2. Remote Method Invocation (RMI) / Remote Procedure Call (RPC)

7.1.3. Simple Messaging

7.1.4. Publish-Subscribe

7.2. Protocols to use to send data to the client

7.2.1. Webhooks

7.2.2. HTTP Long Polling

7.2.3. Server-Sent Events

7.2.4. WebSockets

7.3. Filtering the stream

7.3.1. Where to filter

7.3.2. Static vs. Dynamic Filtering

7.4. Use Case: Building a 1.USA.gov Streaming API

7.5. Summary

8. Consumer device capabilities and limitations accessing the data

8.1. The core concepts

8.1.1. Reading fast enough

8.1.2. Maintaining state

8.1.3. Mitigating data loss

8.1.4. Exactly Once Processing

8.2. Introducing the Web Client

8.2.1. Integrating with the Streaming API Service

8.3. The move towards a query language

8.4. Summary

Part 2: Taking it Real World

9. Analyzing Meetup RSVPs in Real-Time

9.1. The Collection Tier

9.1.1. Collection Service Data Flow

9.2. Message Queueing Tier

9.2.1. Installing and Configuring Kafka

9.2.2. Integrating the Collection Service and Kafka

9.3. Analysis Tier

9.3.1. Installing Storm and Preparing Kafka

9.3.2. Building the TopN Storm Topology

9.3.3. Integrating Analysis

9.4. In-Memory Data Store

9.5. Data Access Tier

9.5.1. Taking it to production

9.6. Summary

About the Technology

As humans, we're constantly filtering and deciphering the information streaming toward us. In the same way, streaming data applications can accomplish amazing tasks like reading live location data to recommend nearby services, tracking faults with machinery in real time, and sending digital receipts before your customers leave the shop. Recent advances in streaming data technology and techniques make it possible for any developer to build these applications if they have the right mindset. This book will let you join them.

About the book

Streaming Data is an idea-rich tutorial that teaches you to think about efficiently interacting with fast-flowing data. Through relevant examples and illustrated use cases, you'll explore designs for applications that read, analyze, share, and store streaming data. Along the way, you'll discover the roles of key technologies like Spark, Storm, Kafka, Flink, RabbitMQ, and more. This book offers the perfect balance between big-picture thinking and implementation details.

What's inside

  • The right way to collect real-time data
  • Architecting a streaming pipeline
  • Analyzing the data
  • Which technologies to use and when

About the reader

Written for developers familiar with relational database concepts. No experience with streaming or real-time applications required.

About the author

Andrew Psaltis is a software engineer focused on massively scalable real-time analytics.



A thorough explanation and examination of the different systems, strategies, and tools for streaming data implementations.

Kosmas Chatzimichalis, Mach 7x

A well-structured way to learn about streaming data and how to put it into practice in modern real-time systems.

Giuliano Araujo Bertoti, FATEC

This book is all you need to understand what streaming is all about!

Carlos Curotto, Globant