A course to maximize your development and data analysis skills on massive data sets in a Hadoop cluster, using SQL and familiar scripting languages.
Target Audience: BI Analysts, BI Developers, Data Analysts, Business Analysts, Quality Analysts, Programmers and Beginners.
MODULE-1: BIG DATA & HADOOP INTRODUCTION
- State of Data
- Big Data Evolution (Volume, Velocity & Variety)
- The Motivation for Hadoop
- Hadoop Distributions (Cloudera, Hortonworks, MapR, IBM, Microsoft, Amazon etc.)
- Enterprise, Cloud & Local Hadoop
MODULE-2: HADOOP ECO-SYSTEM
- Hadoop Evolution (Hadoop Gen 1 vs Hadoop Gen 2)
- Hadoop Technology Stack
- Hadoop Core (Common) / Projects / Incubator
- Modern Data Architecture with Existing Data Repositories
MODULE-3: HADOOP LOCAL INSTALLATION
- Cloudera's Distribution Including Apache Hadoop (CDH VM)
- Hortonworks Data Platform (HDP)
- Apache Hadoop Overview
MODULE-4: HADOOP DISTRIBUTED FILE SYSTEM (HDFS)
- Data Storage: HDFS
- HDFS Architecture (NameNode, DataNode & Secondary NameNode)
- HDFS Features & Internals
- HDFS Interaction & Management
- HDFS LAB Sessions
MODULE-5: MAPREDUCE & YARN
- Distributed Data Processing: MapReduce, Yet Another Resource Negotiator (YARN)
- MapReduce Architecture (JobClient, JobTracker, TaskTracker)
- MapReduce Internals (Input, Split, Map, Combine, Shuffle, Sort, Reduce, Output)
- Classic MapReduce (MapReduce 1) vs YARN (MapReduce 2)
- YARN Architecture (ResourceManager, NodeManager, ApplicationMaster)
- MapReduce LAB Sessions with Java Programs
MODULE-6: HIVE INTRODUCTION
- Hive Architecture Overview
- Installing and Running Hive
- Schema and Data Storage
- Hive Principles - Schema on Read & The Hive Warehouse
- Hive vs. Traditional Relational Databases
- Hive Access Tools (Shell, Web UI, Thrift Client, JDBC/ODBC Driver)
- Hive Services
- Hive Metastore
- Use Cases
MODULE-7: DEVELOPING WITH HIVE
- Hive Query Language (HiveQL)
- Using Command Line Interface (CLI) and Hue UI to Execute Queries
- Data Types & Type Conversions
- Data Storage & Managing Metadata
- Creating / Altering Databases and Tables
- Loading Data - Hive Managed and External Tables
- Simplifying Queries with Views
- Joining Datasets (Inner, Outer, Semi & Map)
- Built-In Functions
- Aggregation, Windowing and Analytics Functions
- User-Defined Functions (Java) & Streaming (Python) from HiveQL
- SerDe, Performance & Security
- HIVE LAB Sessions
MODULE-8: PIG INTRODUCTION
- Pig Overview
- Installing and Running Pig
- Pig's Features & Use Cases
- Pig Data Model, Execution Modes and Methods
- Pig (Procedural) vs. Hive (Declarative)
MODULE-9: DEVELOPING WITH PIG LATIN
- Pig Latin Basics
- Data Types and Storage Formats
- Loading and Storing Data
- Filtering, Sorting, Grouping & Iterating Grouped Data
- Joining and Splitting Data Sets
- Set Operations
- Commonly-Used Built-In Functions
- Developing User-Defined Functions (Java) and Macros and Invoking Them from Pig
- Parameter Substitution Methods
- Troubleshooting, Debugging and Logging Pig
- PIG LAB Sessions
MODULE-10: HADOOP DATA INTEGRATION, SCHEDULING & OPERATIONS (SQOOP & OOZIE) INTRODUCTION
- Data Import / Export between Relational Databases and HDFS / HIVE
- Workflow Development
- Use Cases
MODULE-11: CHOOSING THE BEST TOOL FOR THE REQUIREMENT
- Comparing MapReduce, Pig, Hive, Impala and Relational Databases
Classes: 22-25 Hours
Lab Sessions: 25 Hours
Duration: 6 Weeks
LIVE Session FEE: $450 (Special Discount for Students)
Self Paced On-Demand Videos & Material FEE: $150
***At the tutor's discretion, some of the course content may be altered or omitted to suit the needs of the class***
**Images and logos used are trademarks of their respective companies**
*The individual course fee listed does not apply to corporate customers or students*