The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
about the technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds up to 100 times faster than Hadoop MapReduce on in-memory workloads. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
about the book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
what's inside
Writing Spark applications in Java
Spark application architecture
Ingestion through files, databases, streaming, and Elasticsearch
Querying distributed datasets with Spark SQL
about the reader
This book does not assume previous experience with Spark, Scala, or Hadoop.
about the author
Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.
This book reveals the tools and secrets you need to drive innovation in your company or community.
An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing.
This book will help spark a love affair with distributed processing.
Currently the best book on the subject!