Resources

📖 Guide to static functions for Apache Spark 3.0.0 Preview
📖 Guide to static functions for Apache Spark 2.3.4
📖 Guide to static functions for Apache Spark 2.4.3
📖 Guide to static functions for Apache Spark 2.3.3
📖 Guide to static functions for Apache Spark 2.2.3
Source code - part 1
Source code - part 2
Book forum
Source code on GitHub
Slideshare: Using Apache Spark with Java
What Happens behind the Scenes with Spark
The Majestic Role of the Dataframe in Spark
Ingesting Data from Files with Spark, Part 1
Ingesting Data from Files with Spark, Part 2
Article: Ingesting Data from Files with Spark, Part 3
Article: Ingesting Data from Files with Spark, Part 4
Spark in Action’s Chapter Eleven on Working with SQL is in MEAP
Mental Model Graphic: Spark in Action 2E
Data Engineer Podcast
Aggregating Your Data
Consuming records with Spark
Register your pBook for a free eBook
📺 DataFriday Youtube channel
🎙️ Jean-Georges Perrin interviewed
🎙️ Jean-Georges Perrin on Spark and Data Quality
🎙️ Spark in Action with Jean-Georges Perrin
🎙️ Processing Covid-19 Data with Apache Spark
🎙️ Mastering Data Pipelines with Apache Spark with Jean-Georges Perrin

Spark in Action, Second Edition

Covers Apache Spark 3 with Examples in Java, Python, and Scala

Jean-Georges Perrin
Foreword by Rob Thomas

May 2020
ISBN 9781617295522
576 pages

Included with a Manning Online subscription

printed in black & white

Available translations: Russian, Simplified Chinese

eBook

$47.99 $28.79

you save $19.20 (40%)

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.

about the technology

Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this volume and variety like a champ, delivering speeds up to 100 times faster than Hadoop MapReduce for some workloads. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.

about the book

Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.

what's inside

Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL

about the reader

This book does not assume previous experience with Spark, Scala, or Hadoop.

about the author

Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.

This book reveals the tools and secrets you need to drive innovation in your company or community.

Rob Thomas, IBM

An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing.

Anupam Sengupta, GuardHat Inc.

This book will help spark a love affair with distributed processing.

Conor Redmond, InComm Product Control

Currently the best book on the subject!

Markus Breuer, Materna IPS
