Azure Data Engineering
Real-time, streaming, and batch analytics
Richard L. Nuckolls
  • MEAP began April 2019
  • Publication in Early 2020 (estimated)
  • ISBN 9781617296307
  • 400 pages (estimated)
  • printed in black & white
The Microsoft Azure cloud is an ideal platform for data-intensive applications. Designed for productivity, Azure provides pre-built services that make collection, storage, and analysis much easier to implement and manage. Azure Data Engineering teaches you how to design a reliable, performant, and cost-effective data infrastructure in Azure by progressively building a complete working analytics system.
Table of Contents

1 What is data engineering

1.1 What is data engineering?

1.2 What do data engineers do?

1.3 How does Microsoft define data engineering?

1.3.1 Data acquisition

1.3.2 Data storage

1.3.3 Data processing

1.3.4 Data queries

1.3.5 Orchestration

1.3.6 Data retrieval

1.4 What tools does Azure provide for data engineering?

1.5 Azure Data Engineers

1.6 Example Application

1.7 Summary

2 Building an analytics system in Azure

2.1 Lambda architecture

2.2 Azure cloud services

2.2.1 Event Hubs

2.2.2 Stream Analytics

2.2.3 Data Lake Store

2.2.4 Data Lake Analytics

2.2.5 SQL Data Warehouse

2.2.6 Data Factory

2.2.7 Azure Powershell

2.3 Azure analytics system architecture

2.4 Walkthrough of processing a series of event data records

2.4.1 Hot Path

2.4.2 Cold Path

2.4.3 Choosing abstract Azure services

2.5 Calculating cloud hosting costs

2.5.1 Event Hubs

2.5.2 Stream Analytics

2.5.3 Data Lake Storage

2.5.4 Data Lake Analytics

2.5.5 SQL Data Warehouse

2.5.6 Data Factory

2.6 Summary

3 Azure Storage Blob service

3.1 Azure naming conventions

3.1.1 Resource group

3.2 Searching for services

3.3 Cloud storage services

3.3.1 Problem definition: backup IIS logs

3.3.2 Create an Azure Storage account

3.3.3 Selecting a Storage account container

3.3.4 Create a Storage account container

3.3.5 Copy tools for Blob service

3.3.6 Blob tiering

3.4 Storage access

3.4.1 Problem definition: Backup files from two departments to common cloud storage. Maintain separate security access.

3.4.2 Designing Storage account access

3.5 Summary

4 Azure Data Lake storage

4.1 Storage services compared

4.1.1 Problem definition: backup IIS logs

4.1.2 Create an Azure Data Lake store

4.1.3 Copy tools for Data Lake store

4.2 Storage access

4.2.1 Access schemes

4.2.2 Problem definition: Backup files from two departments to common cloud storage. Maintain separate security access.

4.2.3 Configuring ADL access

4.2.4 Hierarchy structure in Data Lake store

4.3 Storage folder structure and data drift

4.3.1 Problem definition: IIS logging configuration is adding fields

4.3.2 Data drift

4.3.3 Hierarchy structure revisited

4.4 Summary

5 Queueing with Event Hubs

6 Real-time queries with Azure Stream Analytics

7 Batch queries with Azure Data Lake Analytics

8 Integrating with Azure Data Lake Analytics

9 U-SQL for complex analytics

10 Service integration with Azure Data Factory

11 Distributed SQL with Azure SQL Data Warehouse

12 Data movement in Azure SQL Data Warehouse

Appendixes

Appendix A: Set up of Azure resources through Powershell

A.1 Setup Azure Powershell

A.2 Create a subscription

A.3 Azure naming conventions

A.4 Setup common Azure resources using Powershell

A.4.1 Create a new resource group

A.4.2 Create new Azure Active Directory user

A.4.3 Create new Azure Active Directory group

A.5 Setup Azure services using Powershell

A.5.1 Create new Storage account

A.5.2 Create new Data Lake store

A.6 Summary

About the Technology

The Microsoft Azure cloud platform can host virtually any sort of computing task, from simple web applications to full-scale enterprise systems. With many pre-built services for everything from data storage to advanced machine learning, Azure offers all the building blocks for scalable big data analysis systems including ingestion, processing, querying, and migration.

About the book

Azure Data Engineering teaches you to build high-capacity data analytics systems using Azure cloud services for collecting, storing, and analyzing data. In it, seasoned IT professional and author Richard Nuckolls starts you off with an overview of core data engineering tasks and the Azure tools that support them. Then, you’ll dive right into building your analytics system, starting with Data Lake Store for data retention, Azure Event Hubs for high-throughput ingestion, and Stream Analytics for real-time query processing.

For batch scheduling and aggregate data movement, you’ll add Data Factory and Data Lake Analytics, along with SQL Data Warehouse for interactive queries. With Azure Active Directory, you’ll manage security by applying permissions and access roles. And because your design is based on the Lambda architecture, it pairs a hot path for low-latency, real-time queries with a cold path for batch processing of large data volumes.

What's inside

  • Azure cloud services architecture
  • Building a data warehouse in Azure
  • How to choose the right Azure technology for your task
  • Calculating fixed and variable costs
  • Hot and cold path analytics
  • Stream processing with Azure Stream Analytics and Event Hubs integration
  • Giving structure to distributed storage
  • Practical examples leading up to a fully functioning analytics system

About the reader

Readers should be comfortable with relational database systems like SQL Server and with scripting in a language like PowerShell, Bash, or Python. Book examples use PowerShell and C#.

About the author

Richard Nuckolls is a senior developer building a big data analytics and reporting system in Azure. In nearly 20 years in IT, he has done server and database administration and desktop and web development, and more recently has led teams building a production content management system in Azure.

Manning Early Access Program (MEAP): Read chapters as they are written, get the finished eBook as soon as it’s ready, and receive the pBook long before it’s in bookstores.
MEAP combo: $49.99 (pBook + eBook + liveBook)
MEAP eBook: $39.99 (pdf + ePub + kindle + liveBook)
