Designing Cloud Data Platforms
Danil Zburivsky and Lynda Partner
  • MEAP began December 2019
  • Publication in Summer 2020 (estimated)
  • ISBN 9781617296444
  • 400 pages (estimated)
  • printed in black & white

A must-have for anyone building a data platform or looking to move their existing data warehouse to the cloud.

Christopher E. Phillips
Centralized data warehouses, the long-time de facto standard for housing data for analytics, are rapidly giving way to multi-faceted cloud data platforms. Companies that embrace modern cloud data platforms benefit from an integrated view of their business using all of their data, and can take advantage of advanced analytic practices to drive predictions and as-yet-unimagined data services. Designing Cloud Data Platforms is a hands-on guide to envisioning and designing a modern, scalable data platform that takes full advantage of the flexibility of the cloud. As you read, you'll learn the core components of a cloud data platform design, along with the role of key technologies like Spark and Kafka Streams. You'll also explore how to set up processes to manage cloud-based data and keep it secure, and how to use advanced analytics and BI tools to analyze it.
Table of Contents

1 Introducing the Data Platform

1.1 The back story

1.2 Data warehouses struggle with data Variety, Volume and Velocity

1.2.1 Variety

1.2.2 Volume

1.2.3 Velocity

1.2.4 All the V’s at once

1.3 Data Lakes to the rescue?

1.4 Along came Cloud

1.5 Cloud, data lakes and data warehouses belong together - the emergence of cloud data platforms

1.6 Building blocks of a cloud data platform

1.6.1 Ingestion layer

1.6.2 Storage layer

1.6.3 Processing layer

1.6.4 Serving layer

1.7 How the Cloud Data Platform deals with the 3 V’s

1.7.1 Variety

1.7.2 Volume

1.7.3 Velocity

1.7.4 Two More V’s

1.8 Common Use Cases

1.9 Summary

2 Why a Data Platform and not just a Data Warehouse

2.1 Cloud Data Platforms and Cloud Warehouses: the practical aspects

2.1.1 A closer look at the data sources

2.1.2 An example cloud data warehouse-only architecture

2.1.3 An example cloud data platform architecture

2.2 Ingesting data

2.2.1 Ingesting data directly into an Azure SQL Warehouse

2.2.2 Ingesting data into an Azure data platform

2.2.3 Managing changes in upstream data sources

2.3 Processing data

2.3.1 Processing data in the warehouse

2.3.2 Processing data in the data platform

2.4 Accessing data

2.5 Cloud costs considerations

2.6 Summary

3 Getting bigger and leveraging the Big 3 — Google, Amazon and Microsoft

3.1 Cloud data platform layered architecture

3.1.1 Data ingestion layer

3.1.2 Fast and slow storage

3.1.3 Processing layer

3.1.4 Technical Metadata layer

3.1.5 The Serving Layer and data consumers

3.1.6 Orchestration and ETL overlay layers

3.2 The importance of layers in a data platform architecture

3.3 Mapping cloud data platform layers to specific tools

3.3.1 AWS

3.3.2 Google Cloud Platform

3.3.3 Azure

3.4 Open Source and commercial alternatives

3.4.1 Batch data ingestion

3.4.2 Streaming data ingestion and real time analytics

3.4.3 Orchestration layer

3.5 Summary

4 Getting data into the platform

4.1 Databases, files, APIs and streams

4.1.1 Relational databases

4.1.2 Files

4.1.3 SaaS data via API

4.1.4 Streams

4.2 Ingesting data from relational databases

4.2.1 Ingesting data from RDBMS using an SQL interface

4.2.2 Full table ingestion

4.2.3 Incremental table ingestion

4.2.4 Change Data Capture (CDC)

4.2.5 CDC Vendors Overview

4.2.6 Data Types Conversion

4.2.7 Ingesting data from NoSQL databases

4.2.8 Capturing important metadata for RDBMS or NoSQL ingestion pipeline

4.3 Ingesting data from files

4.3.1 Tracking ingested files

4.3.2 Capturing file ingestion metadata

4.4 Ingesting data from streams

4.4.1 Differences between batch and streaming ingestion

4.4.2 Capturing streaming pipeline metadata

4.5 Ingesting data from SaaS applications

4.6 Network and security considerations for data ingestion into the cloud

4.6.1 Connecting other networks to your cloud data platform

4.7 Summary

5 Organizing and processing data

6 Real time data processing and analytics

7 Metadata

8 Schema management

9 Cloud data warehouses

10 Serving and Orchestration Layers — Applications, BI, and ML

11 Cloud cost optimizations

About the Technology

Access to affordable, dependable, serverless cloud services has revolutionized the way organizations can approach data management, and companies both big and small are raring to migrate to the cloud. But without a properly designed data platform, data in the cloud can remain just as siloed and inaccessible as it is today for most organizations. Designing Cloud Data Platforms lays out the principles of a well-designed platform that uses the scalable resources of the public cloud to manage all of an organization's data and present it as useful business insights.

About the book

In Designing Cloud Data Platforms, you'll learn how to integrate data from multiple sources into a single, cloud-based, modern data platform. Drawing on their real-world experiences designing cloud data platforms for dozens of organizations, cloud data experts Danil Zburivsky and Lynda Partner take you through a six-layer approach to creating cloud data platforms that maximizes flexibility and manageability and reduces costs. Starting with foundational principles, you'll learn how to get data into your platform from different databases, files, and APIs, the essential practices for organizing and processing that raw data, and how to best take advantage of the services offered by major cloud vendors. As you progress past the basics, you'll take a deep dive into advanced topics to get the most out of your data platform, including real-time data management, machine learning analytics, schema management, and more.

What's inside

  • Tools from different public cloud providers for implementing data platforms
  • Best practices for managing structured and unstructured data sets
  • Machine learning tools that can be used on top of the cloud
  • Cost optimization techniques

About the reader

For data professionals familiar with the basics of cloud computing and distributed data processing systems like Hadoop and Spark.

About the authors

Danil Zburivsky has over 10 years of experience designing and supporting large-scale data infrastructure for enterprises across the globe. Lynda Partner is the VP of Analytics-as-a-Service at Pythian, and has been on the business side of data for over 20 years.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it's ready, and receive the pBook long before it's in bookstores.
MEAP combo $59.99 pBook + eBook + liveBook
MEAP eBook $47.99 pdf + ePub + kindle + liveBook
Prices displayed in rupees will be charged in USD when you check out.

