O’Reilly – Analyzing Big Data with Hadoop, AWS, and EMR

English | Size: 375.16 MB
Category: Tutorial

Analyzing Big Data with Hadoop, AWS, and EMR
Understanding how to use Hadoop on Amazon’s Elastic MapReduce Service
By Frank Kane
Publisher: O’Reilly Media
Final Release Date: March 2017
Run time: 1 hour 3 minutes

Hadoop is today’s most widely used Big Data technology for distributing the processing of massive data sets across clusters of commodity computers. With Amazon’s Elastic MapReduce service (EMR), you can rent capacity through Amazon Web Services (AWS) to store and analyze data on a real Hadoop cluster at minimal cost.

This course shows you how to use an EMR Hadoop cluster through a real-life example in which you analyze movie ratings data using Hive, Pig, and Oozie. It focuses on practical tips for using an EMR cluster efficiently, integrating the cluster with Amazon’s S3 service, and choosing a cluster size that keeps costs down. You’ll learn how to interact with your cluster through the Hue web interface, from a terminal prompt, and through EMR steps that can run your scripts automatically.

• Gain experience with three high-value skill sets: Hadoop, AWS, and EMR
• Save time and money by learning about the undocumented "gotchas" of AWS and EMR
• See how the experts provision EMR clusters and connect to them via SSH and web UIs
• Learn to import data into a cluster and to access external data stored on Amazon’s S3
• Explore three different ways to query data using Hive and Pig
• Discover the Tez engine and see how it accelerates Hive and Pig queries
• Learn how to schedule workflows using Oozie
