Object Storage Across the Cloud
AWS, Azure and GCP
With chapters selected by J.T. Wolohan
  • February 2020
  • ISBN 9781617297786
  • 125 pages
The amount of data being collected these days is growing exponentially. Traditional file storage systems, like the ones you may be familiar with on your local machine, lack the scalability necessary to manage all that data. Object storage solves the scalability problem. In fact, in this big data era, object storage in the cloud is fast becoming the standard. And with cloud providers—Amazon, Google, and Microsoft—offering affordable, dependable, massively scalable cloud storage, you can leave all your storage concerns to the cloud and focus on your application!

About the book

Object Storage Across the Cloud is a collection of chapters from four Manning books, chosen by data scientist JT Wolohan, with the goal of helping you become comfortable developing with object storage, no matter which provider you choose. This mini ebook explores choosing the right storage class, access control and lifecycle configuration, and several common use cases. You’ll delve into the internals of the AWS S3 object store and use this highly popular system to host a website. In a chapter that examines Microsoft Azure’s Blob Storage, you’ll learn about Azure naming conventions, choosing an Azure storage service, creating an Azure storage account, and designing storage account access. And to put your newfound object storage knowledge to the test, you’ll use object storage to power big data analytics as you run both Hadoop and Spark jobs in the cloud using Amazon EMR. This packed primer is an excellent showcase of object storage, its many uses and benefits, and how to choose the best platform for your task!
Table of Contents


Cloud Storage: object storage



Storing data in Cloud Storage

Choosing the right storage class

Multiregional storage

Regional storage

Nearline storage

Coldline storage

Access control

Limiting access with ACLs

Signed URLs

Logging access to your data

Object versions

Object lifecycles

Change notifications

URL restrictions

Common use cases

Hosting user content

Data archival

Understanding pricing

Amount of data stored

Amount of data transferred

Number of operations executed

Nearline and Coldline pricing

When should I use Cloud Storage?


Query complexity


Speed (latency)



To-do list


Storing your data on hard drives: EBS and instance store


Network-attached storage

Creating an EBS volume and attaching it to your server

Using Elastic Block Store

Tweaking performance

Backing up your data

Instance stores

Using an instance store

Testing performance

Backing up your data

Comparing block-level storage solutions

Hosting a shared file system backed by an instance store and EBS

Security groups for NFS

NFS server and volume

NFS server installation and configuration script

NFS clients

Sharing files via NFS

Azure Storage Blob service


Azure naming conventions

Resource group

Searching for services

Cloud storage services

Problem definition: Backup IIS logs

Create an Azure Storage account

Selecting a Storage account container

Create a Storage account container

Copy tools for Blob service

Blob tiering

Storage access

Problem definition: Backup files from two departments to common cloud storage. Maintain separate security access.

Designing Storage account access

MapReduce in the cloud with Amazon’s Elastic MapReduce


Running Hadoop on EMR with mrjob

Convenient cloud clusters with EMR

Starting EMR clusters with mrjob

The AWS EMR browser interface

Machine learning in the cloud with Spark on EMR

Writing our machine learning model

Setting up an EMR cluster for Spark

Running PySpark jobs from our cluster


R-series cluster

Back-to-back Hadoop jobs

Instance types


What's inside

  • “Cloud Storage: object storage” from Google Cloud Platform in Action by JJ Geewax
  • “Storing your objects: S3 and Glacier” from Amazon Web Services in Action, Second Edition by Michael Wittig and Andreas Wittig
  • “Azure Storage Blob service” from Azure Data Engineering by Richard L. Nuckolls
  • “Large datasets in the cloud with Amazon Web Services and S3” from Mastering Large Datasets with Python by JT Wolohan

About the author

J.T. Wolohan is a lead data scientist at Booz Allen Hamilton and a PhD researcher at Indiana University, Bloomington, affiliated with the Department of Information and Library Science and the School of Informatics and Computing. His professional work focuses on rapid prototyping and scalable AI. His research focuses on computational analysis of social uses of language online. He is the author of Mastering Large Datasets with Python.
