Spark in Motion
Jason Kolter
  • Course duration: 1h 57m
    Estimated full duration: 2h 40m
  • MEAP began July 2017
  • Publication in October 2017 (estimated)

See it. Do it. Learn it! Spark in Motion teaches you to use Spark for big data analytics through high-quality video-based lessons and built-in exercises, so you can put what you learn into practice.

Spark in Motion teaches you how to use Spark for batch and streaming data analytics. In nearly 3 hours of hands-on video lessons, you'll get up and running with Spark, starting with the basic architecture of a Spark application. You'll explore data partitioning and accessing common application state, and then you'll dive deep into using Spark SQL and DataFrames for structured analytics. Finally, you'll use Spark Streaming to handle and process real-time data flowing into your application.

"Quick, no nonsense. What more can you wish?"

~ Jonathan Rioux, Senior Analyst

"Best course I have seen so far."

~ Peter J. Hampton, AI Researcher

"Spark is a very valuable library, but it's very hard to use (the learning step is very steep). This video course makes the learning smoother, and takes the users to a place where they can experiment by themselves."

~ Alberto Boschetti, Data Scientist

Table of Contents

An Introduction to Apache Spark

What is Spark

Exploring the Spark Ecosystem

Functional Programming Using the Spark Shell

Rich Programming Using Notebooks

Using RDDs Part 1: Features and Creating/Loading

Using RDDs Part 2: Transformations and Actions

Spark Application Architecture

Summary

Building Realistic Spark Applications

Deploying Spark on a Cluster

Scaling Spark Applications

Making Iterative Applications Fly

Accessing Common Application State

Configuring the Spark Runtime

Monitoring and Metrics with the Spark Web UI

Summary

Advanced Analytics with Spark SQL and Datasets

Creating and Using Datasets

Structured Processing Using Spark SQL

Bringing SQL to Spark with the DataFrame API

Working with Spark SQL Data Sources

Interactive Queries with the Spark SQL Server

Summary

Real-Time Applications Using Spark Streaming

Spark Streaming: Overview and Architecture

Understanding DStream Operations

Streaming Data Sources and Basic Tuning

Building Custom Streaming Pipelines

Summary

About the subject

When you're doing analytics on big data systems, it can be a challenge to efficiently query, stream, filter, and consolidate data sharded across a cluster. Built especially for efficiently operating over large distributed datasets, the Spark data processing engine takes some of the weight off your shoulders. Spark features an easy-to-use interface, near-limitless upgrade potential, and performance that will knock your socks off. Spark simplifies your data infrastructure so you can focus on creating top-notch analytics.

Prerequisites

This course is designed for software engineers, architects, data scientists, and data analysts interested in getting started with Spark. No prior experience is needed.

What you will learn

  • Exploring the Spark ecosystem
  • Deploying Spark on a cluster
  • Analytics with Spark SQL
  • Real-time applications with Spark Streaming

About the instructor

Jason Kolter is an instructor for the University of Washington certificate program in Big Data Technologies. He has also worked at a wide range of technology companies, gaining extensive experience leading teams that build large-scale, production distributed analytics systems.


Manning Early Access Program (MEAP): Watch raw videos as they are added, and get the entire course, complete with transcript and exercises, when it is finished.
MEAP liveVideo $49.99