The Ultimate Introduction to Big Data

Frank Kane
  • Course duration: 14h 30m

pro $24.99 per month

  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose one free eBook per month to keep
  • exclusive 50% discount on all purchases

lite $19.99 per month

  • access to all Manning books, including MEAPs!

team

5, 10, or 20+ seats for your team


See it. Do it. Learn it! Businesses rely on data for decision-making, success, and survival. The volume of data companies can capture is growing every day, and big data platforms like Hadoop help store, manage, and analyze it. In The Ultimate Introduction to Big Data, big data guru Frank Kane introduces you to big data processing systems and shows you how they fit together. This liveVideo spotlights over 25 different technologies in over 14 hours of video instruction.


Distributed by Manning Publications

This course was created independently by big data expert Frank Kane and is distributed by Manning through our exclusive liveVideo platform.

about the subject

Designed for data storage and processing, Hadoop is a reliable, fault-tolerant open source framework. The most celebrated features of this Apache project are HDFS, Hadoop's highly scalable distributed file system, and the MapReduce data processing engine. Together, they can store and process vast amounts of data across large clusters. An ecosystem of hundreds of technologies has sprung up around Hadoop to answer the ever-growing demand for large-scale data processing solutions. Understanding the architecture of massive-scale data processing applications is an increasingly important and desirable skill, and you'll have it when you complete this liveVideo course!
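To make the MapReduce idea concrete, here is a minimal sketch of the classic word-count job, simulated in plain Python on a single machine. It is not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical. It only shows the three stages a MapReduce framework runs across a cluster: map each input record to key/value pairs, group the pairs by key, then reduce each group.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all values by key, as the framework does
    # between the map and reduce stages
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all counts for a single word
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver: chains the three stages in one process
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(data))
```

On a real cluster, HDFS would hold the input split into blocks across many nodes, and the map and reduce functions would run in parallel close to that data; the logic of each stage stays the same.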

about the video

The Ultimate Introduction to Big Data teaches you how to design powerful distributed data applications. With lots of hands-on exercises, instructor Frank Kane goes beyond Hadoop to cover many related technologies, giving you valuable firsthand experience with modern data processing applications. You'll learn to choose an appropriate data storage technology for your application and discover how work on Hadoop clusters is managed and executed by YARN, Tez, Mesos, and other technologies. You'll also experience the combined power of HDFS and MapReduce for storing and analyzing data at scale.

Using Hive from the Hadoop ecosystem together with MySQL, you'll analyze relational data, and then tackle non-relational data analysis using HBase, Cassandra, and MongoDB. With Kafka, Sqoop, and Flume, you'll make short work of publishing data to your Hadoop cluster. When you're done, you'll have a deep understanding of data processing applications built on Hadoop and the distributed systems around it.

prerequisites

Suitable for software engineers, program managers, data analysts, database administrators, system architects, and everyone else with an interest in learning about Hadoop, its ecosystem, and how it relates to their work. Familiarity with the Linux command line would be helpful, along with some programming experience in Python or Scala.

what you will learn

  • Using HDFS and MapReduce for storing and analyzing data at scale
  • Analyzing relational data using Hive and MySQL
  • Creating scripts to process data on a Hadoop cluster using Pig and Spark
  • Using HBase, Cassandra, and MongoDB to analyze non-relational data
  • Querying data interactively with Drill, Phoenix, and Presto
  • Choosing an appropriate data storage technology for your application
  • Understanding how Hadoop clusters are managed and operated with YARN, Tez, Mesos, ZooKeeper, Zeppelin, Hue, and Oozie
  • Publishing data to your Hadoop cluster using Kafka, Sqoop, and Flume
  • Consuming streaming data using Spark Streaming, Flink, and Storm
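Stream processors such as Spark Streaming, Flink, and Storm commonly aggregate incoming events over time windows. The sketch below is a plain Python illustration of a tumbling-window count, assuming events arrive as `(timestamp, key)` pairs with integer timestamps. It uses none of those frameworks' APIs, and the function name `tumbling_window_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[int, str]], window_size: int
) -> Dict[int, Counter]:
    # Assign each event to the non-overlapping window that starts at
    # timestamp - (timestamp % window_size), then count keys per window
    windows: Dict[int, Counter] = {}
    for timestamp, key in events:
        start = timestamp - timestamp % window_size
        windows.setdefault(start, Counter())[key] += 1
    return windows


if __name__ == "__main__":
    events = [(1, "click"), (4, "view"), (9, "click"), (12, "click"), (19, "view")]
    for start, counts in sorted(tumbling_window_counts(events, 10).items()):
        print(start, dict(counts))
```

This version consumes a finite list in one pass. A real streaming engine applies the same window assignment to an unbounded input and must also decide when a window is complete and how to handle late-arriving events.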

about the instructor

Frank Kane holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. He spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to millions of customers every day. Sundog Software, his own company specializing in virtual reality environment technology and teaching others about big data analysis, is his pride and joy.

I love that the author demonstrates how to use each tool and technology in the course, and provides great examples. The comparison (pros & cons) of tools really helps to decide what to use in a project.

Dmytro Bekuzarov, Java tech lead, GlobalLogic

Good source to get to know the big data tools better.

Felipe Esteban Vildoso Castillo

choose your plan

team

monthly: $49.99
annual: $499.99 (only $41.67 per month)
  • five seats for your team
  • access to all Manning books, MEAPs, liveVideos, liveProjects, and audiobooks!
  • choose another free product every time you renew
  • choose twelve free products per year
  • exclusive 50% discount on all purchases
  • The Ultimate Introduction to Big Data liveVideo for free