See it. Do it. Learn it! Businesses rely on data for decision-making, success, and survival. The volume of data companies can capture is growing every day, and big data platforms like Hadoop help store, manage, and analyze it. In The Ultimate Introduction to Big Data, big data guru Frank Kane introduces you to big data processing systems and shows you how they fit together. This liveVideo spotlights over 25 different technologies in over 14 hours of video instruction.
Distributed by Manning Publications
This course was created independently by big data expert Frank Kane and is distributed by Manning through our exclusive liveVideo platform.
about the subject
Designed for data storage and processing, Hadoop is a reliable, fault-tolerant software framework. The most celebrated features of this open source Apache project are HDFS, Hadoop’s highly scalable distributed file system, and the MapReduce data processing engine. Together, they can process vast amounts of data across large clusters. An ecosystem of hundreds of technologies has sprung up around Hadoop to answer the ever-growing demand for large-scale data processing solutions. Understanding the architecture of massive-scale data processing applications is an increasingly important and desirable skill, and you’ll have it when you complete this liveVideo course!
about the video
The Ultimate Introduction to Big Data teaches you how to design powerful distributed data applications. With lots of hands-on exercises, instructor Frank Kane goes beyond Hadoop to cover many related technologies, giving you valuable firsthand experience with modern data processing applications. You’ll learn to choose an appropriate data storage technology for your application and discover how Hadoop clusters are managed by YARN, Tez, Mesos, and other technologies. You’ll also experience the combined power of HDFS and MapReduce for storing and analyzing data at scale.
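To give a flavor of the MapReduce model the course works with, here is a minimal sketch in plain Python: a mapper emits key/value pairs and a reducer sums them by key, the same contract Hadoop Streaming jobs follow. The function names and sample lines are illustrative assumptions, not material from the course.

```python
from collections import defaultdict

# Mapper: emit a (word, 1) pair for every word in a line,
# as a Hadoop Streaming mapper script would.
def map_line(line):
    for word in line.lower().split():
        yield word, 1

# Shuffle + reduce: group values by key and sum them, mimicking
# MapReduce's shuffle phase followed by a summing reducer.
def reduce_counts(pairs):
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["big data big ideas", "data at scale"]
pairs = [pair for line in lines for pair in map_line(line)]
print(reduce_counts(pairs))  # {'big': 2, 'data': 2, 'ideas': 1, 'at': 1, 'scale': 1}
```

On a real cluster, the mapper and reducer run on many machines in parallel and the framework handles the shuffle between them; the programming contract stays the same.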
Using other key parts of the Hadoop ecosystem like Hive and MySQL, you’ll analyze relational data, and then tackle non-relational data analysis using HBase, Cassandra, and MongoDB. With Kafka, Sqoop, and Flume, you’ll make short work of publishing data to your Hadoop cluster. When you’re done, you’ll have a deep understanding of data processing applications on Hadoop and its distributed systems.
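Hive exposes a SQL-like interface over data stored in Hadoop, so relational analysis there looks much like ordinary SQL. As a rough sketch of that style of query, here is a GROUP BY aggregation run against Python's built-in sqlite3 module; the table name and sample rows are made up for illustration.

```python
import sqlite3

# An in-memory relational table standing in for a Hive table of ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (movie TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?)",
    [("Alpha", 5), ("Alpha", 3), ("Beta", 4)],
)

# The same kind of GROUP BY aggregation you would issue through
# Hive's SQL dialect against data on the cluster.
rows = conn.execute(
    "SELECT movie, AVG(rating) FROM ratings GROUP BY movie ORDER BY movie"
).fetchall()
print(rows)  # [('Alpha', 4.0), ('Beta', 4.0)]
```

The point is the shape of the workflow, not the engine: Hive translates queries like this into distributed jobs over HDFS rather than running them against a local database file.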
This course is suitable for software engineers, program managers, data analysts, database administrators, system architects, and anyone else with an interest in learning about Hadoop, its ecosystem, and how it relates to their work. Familiarity with the Linux command line is helpful, along with some programming experience in Python or Scala.
what you will learn
Using HDFS and MapReduce for storing and analyzing data at scale
Analyzing relational data using Hive and MySQL
Creating scripts to process data on a Hadoop cluster using Pig and Spark
Using HBase, Cassandra, and MongoDB to analyze non-relational data
Querying data interactively with Drill, Phoenix, and Presto
Choosing an appropriate data storage technology for your application
Understanding how Hadoop clusters are managed by YARN, Tez, Mesos, Zookeeper, Zeppelin, Hue, and Oozie
Publishing data to your Hadoop cluster using Kafka, Sqoop, and Flume
Consuming streaming data using Spark Streaming, Flink, and Storm
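The streaming model behind Spark Streaming, Flink, and Storm — updating results as data arrives in small batches rather than all at once — can be sketched in plain Python with a running count. The micro-batches below are invented sample data, and real engines add distribution, fault tolerance, and checkpointing on top of this pattern.

```python
from collections import Counter

# Running word counts updated one micro-batch at a time: the pattern a
# Spark Streaming word-count job applies, at much larger scale, to live data.
def run_stream(batches):
    totals = Counter()
    for batch in batches:
        for line in batch:
            totals.update(line.split())
        # In a real streaming job, the updated state would be emitted
        # or checkpointed after each batch here.
    return totals

batches = [["error warn"], ["warn warn info"]]
print(run_stream(batches))  # Counter({'warn': 3, 'error': 1, 'info': 1})
```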
about the instructor
Frank Kane holds 17 issued patents in the fields of distributed computing, data mining, and machine learning. He spent 9 years at Amazon and IMDb, developing and managing the technology that automatically delivers product and movie recommendations to millions of customers every day. His pride and joy is his own company, Sundog Software, which specializes in virtual reality environment technology and in teaching others about big data analysis.
I love that the author demonstrates how to use each tool and technology in the course, and provides great examples. The comparison (pros & cons) of tools really helps to decide what to use in a project.
Good source to get to know the big data tools better.