Apache Pulsar in Action
David Kjerrumgaard
  • MEAP began July 2019
  • Publication in Early 2022 (estimated)
  • ISBN 9781617296888
  • 300 pages (estimated)
  • printed in black & white

Yes! Yes! Yes! This book is fantastic!

Kent Spillner

Distributed applications demand reliable, high-performance messaging. The Apache Pulsar server-to-server messaging system provides a secure, stable platform without the need for a stream processing engine like Spark. Contributed by Yahoo to the Apache Software Foundation, Pulsar is mature and battle-tested, having handled millions of messages per second for over three years at Yahoo. Apache Pulsar in Action is a comprehensive and practical guide to building high-traffic applications with Pulsar that deliver extreme levels of speed and durability.

About the Technology

Pulsar is a streaming messaging system designed for high-performance server-to-server messaging. Built and tested under intense conditions at Yahoo, Pulsar has been proven in production and can handle millions of messages per second. Now free and open source, Pulsar has a unique architecture that helps solve some of the challenges of modern development. Pulsar keeps latency low in streaming data transmission, making it a powerful tool for IoT edge analytics. Its unified messaging model improves the performance of microservices architectures, and its tiered storage capabilities allow larger volumes of data to be handled without fear of data loss. Pulsar’s flexible API works with Java, C++, Python, and Go, making it easy to incorporate Pulsar into your stack.

About the book

Apache Pulsar in Action is a hands-on guide to building scalable streaming messaging systems for distributed applications and microservices. You’ll start with Pulsar’s fundamentals, each illustrated by real-world examples, as you get to grips with Pulsar’s unique architecture. Pulsar contributor David Kjerrumgaard teaches the skills you need to deploy a Pulsar server, ingest data from third-party systems, and deploy lightweight computing logic with simple functions. You’ll learn to employ Pulsar’s seamless scalability through relatable case studies, including an IoT analytics application that can be deployed within a resource-constrained environment and a microservices application based on Pulsar functions. By the end of this practical book, you’ll be ready to take full advantage of Pulsar to create high-traffic, message-driven applications.

Table of Contents

Part 1: Getting Started with Apache Pulsar

1 Introduction to Apache Pulsar

1.1 Unified Messaging

1.1.1 Publish-Subscribe Messaging

1.1.2 Message Queuing

1.2 Stream Native Processing

1.2.1 Traditional Batching

1.2.2 Micro-Batching

1.2.3 Stream Processing

1.3 Scalable Storage

1.4 Comparison to Apache Kafka

1.4.1 Partition-Centric Storage in Kafka

1.4.2 Segment-Centric Storage in Pulsar

1.5 Why Do I Need Pulsar?

1.5.1 Zero Data Loss

1.5.2 Guaranteed Message Delivery

1.5.3 Infinite Scalability

1.5.4 Resilient to Failure

1.5.5 Support for Millions of Topics

1.5.6 Geo-Replication and Active Failover

1.5.7 Message Deduplication

1.6 Real World Use Cases

1.6.1 Unified Messaging System

1.6.2 Microservices

1.6.3 Connected Car

1.7 Additional Resources

1.8 Summary

2 Getting to Know Pulsar

2.1 Problem Statement

2.1.1 Collecting the Data

2.2 Pulsar Concepts and Terminology

2.2.1 Brokers, Bookies, and Proxies

2.2.2 Producers, Consumers, and Subscriptions

2.2.3 Tenants, Namespaces, and Topics

2.2.4 Message Retention and Expiration

2.3 Stream Storage

2.3.1 BookKeeper Terminology

2.3.2 Data Access Patterns

2.4 Getting Started with Pulsar

2.4.1 Pulsar Admin

2.4.2 Pulsar Client

2.5 Pulsar Java Client

2.5.1 Accessing the Pulsar Client Libraries

2.5.2 Pulsar Client Configuration

2.5.3 Pulsar Producers

2.5.4 Pulsar Consumers

2.6 Summary

Part 2: Apache Pulsar Development Essentials

3 Pulsar Functions

3.1 What are Pulsar Functions?

3.1.1 Pulsar Functions Overview

3.1.2 Programming Model

3.1.3 Processing Guarantees

3.1.4 State Storage

3.2 Developing Pulsar Functions

3.2.1 Native Functions

3.2.2 The Pulsar SDK

3.2.3 Stateful Functions

3.3 Deploying Pulsar Functions

3.3.1 Configuration

3.3.2 Runtime Environments

3.3.3 Deploying Your First Pulsar Function

3.4 Summary

4 Pulsar IO Connectors

4.1 What are Pulsar IO Connectors?

4.1.1 Sources and Sinks

4.1.2 Programming Model

4.1.3 Pulsar’s Built-In Connectors

4.1.4 Using the Built-In Connectors

4.2 Developing Pulsar IO Connectors

4.2.1 Developing a Source Connector

4.2.2 Developing a Sink Connector

4.2.3 Packaging and Deploying Pulsar IO Connectors

4.3 Deploying Pulsar IO Connectors

4.4 Administering Pulsar IO Connectors

4.4.1 Creating and Deleting Connectors

4.4.2 Listing Connectors

4.4.3 Monitoring Connectors

4.5 Summary

5 Pulsar Security

5.1 Transport Encryption

5.1.1 How Does TLS Work?

5.1.2 Enabling TLS on Pulsar

5.2 Authentication

5.2.1 TLS Authentication

5.2.2 JSON Web Token Authentication

5.3 Authorization

5.3.1 Roles

5.3.2 An Example Scenario

5.4 Message Encryption

5.5 Summary

6 Pulsar Schema Registry

6.1 Microservice Communication

6.1.1 Microservice APIs

6.1.2 The Need for a Schema Registry

6.2 The Pulsar Schema Registry

6.2.1 Architecture

6.2.2 Schema Versioning

6.2.3 Schema Compatibility

6.2.4 Schema Compatibility Check Strategies

6.3 Using the Schema Registry

6.3.1 Modelling the Food Order Event in Avro

6.3.2 Producing Food Order Events

6.3.3 Consuming the Food Order Events

6.3.4 Complete Example

6.4 Evolving the Schema

6.5 Summary

Part 3: Hands-On Application Development with Apache Pulsar

7 Pulsar Function Patterns

7.1 Application Design

7.1.1 Composition

7.1.2 Topologies

7.2 Message Routing Patterns

7.2.1 Splitter

7.2.2 Dynamic Router

7.2.3 Content Based Router

7.3 Message Transformation Patterns

7.3.1 Message Translator

7.3.2 Content Enricher

7.3.3 Content Filter

7.4 Summary

8 Resiliency Patterns

9 Data Access Patterns

10 Machine Learning in Pulsar

11 IoT Edge Analytics

Appendixes

Appendix A: Running Pulsar in Containerized Environment

What's inside

  • Publish from Apache Pulsar into third-party data repositories and platforms
  • Design and develop Apache Pulsar functions
  • Perform interactive SQL queries against data stored in Apache Pulsar
  • Examples of Pulsar-based microservices that you can download and try yourself

About the reader

Written for experienced Java developers. No prior knowledge of Pulsar is needed.

About the author

David Kjerrumgaard is the Director of Solution Architecture at Streamlio, and a contributor to the Apache Pulsar and Apache NiFi projects.


Manning Early Access Program (MEAP) Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it's in bookstores.