Big Data Hadoop Training Course

The Big Data Hadoop Professional Online Training Program is curated by Hadoop experts and covers in-depth knowledge of Big Data and Hadoop ecosystem tools such as HDFS, YARN, MapReduce, Hive, and Pig.

  • Fee: 35000 (regular 40000)
  • Course Includes
  • Live Class Practical Oriented Training
  • 80+ Hrs Instructor-Led Training
  • 45+ Hrs Practical Exercises
  • 25+ Hrs Project Work & Assignments
  • Timely Doubt Resolution
  • Dedicated Student Success Mentor
  • Certification & Job Assistance
  • Free Access to Workshop & Webinar
  • No Cost EMI Option


What you will learn

  • Role of relational database management systems (RDBMS) and grid computing
  • Concepts of MapReduce and HDFS
  • Using Hadoop I/O to write MapReduce programs
  • Developing MapReduce applications to solve real problems
  • Setting up and administering a Hadoop cluster
  • Hive, a data warehouse software for querying and managing large datasets residing in distributed storage
  • Using Sqoop to control data import and ensure consistency
  • Writing Spark applications with Spark SQL, Streaming, DataFrames, RDDs, GraphX, and MLlib
  • Testing Hadoop applications using MRUnit and other automation tools
  • Configuring ETL tools like Pentaho/Talend to work with MapReduce, Hive, Pig, etc.

Requirements

  • Candidates with a basic understanding of computers, SQL, and elementary programming skills in Python are ideal for this training.

Description

About Big Data Hadoop Training Course

Big Data and Hadoop online training is essential to understanding the power of Big Data. The training introduces Hadoop, MapReduce, and the Hadoop Distributed File System (HDFS), and walks you through developing distributed processing of large data sets across clusters of computers and administering Hadoop. Participants will learn how to handle heterogeneous data coming from different sources; this data may be structured or unstructured, including communication records, log files, audio files, pictures, and videos.


BIT’s Big Data Hadoop training program helps you master Big Data and Hadoop administration. In this course you will master MapReduce, Hive, Pig, Sqoop, Oozie, and Flume, and work with Amazon EC2 for cluster setup, the Spark framework and RDDs, Scala, and Spark SQL. The training is designed to give you in-depth knowledge of the Big Data framework using Hadoop and Spark, and in this hands-on course you will execute real-life, industry-based projects in an integrated lab. Big Data is one of the fastest-growing and most promising fields in the IT market today. To take advantage of these opportunities, you need structured training with a curriculum that reflects current industry requirements and best practices. Beyond a strong theoretical understanding, you need to work on real-world Big Data projects using different Big Data and Hadoop tools as part of a solution strategy, and you need the guidance of a Hadoop expert who works on real-world Big Data projects and troubleshoots the day-to-day challenges of implementing them.

Course Content

Live Lecture

  • Understanding Big Data
  • Types of Big Data
  • Big Data Challenges
  • Limitations & Solutions of Big Data Architecture
  • Hadoop & Its Features
  • Hadoop Ecosystem
  • Different Hadoop Distributions
  • Difference Between Traditional Data and Big Data
  • Hadoop 2.x Core Components Preview
  • Hadoop Storage: HDFS (Hadoop Distributed File System)
  • Hadoop Processing: MapReduce Framework
  • Distributed Data Storage in Hadoop: HDFS and HBase
  • Hadoop Data Processing & Analysis Services: MapReduce, Spark, Hive, Pig, and Storm
  • Data Integration Tools in Hadoop
  • Resource Management and Cluster Management Services
  • Practical Exercise

Live Lecture

  • Hadoop 2.x Cluster Architecture
  • Federation and High Availability Architecture
  • Typical Production Hadoop Cluster
  • Hadoop Cluster Modes
  • Common Hadoop Shell Commands
  • Hadoop 2.x Configuration Files
  • Single-Node & Multi-Node Cluster Setup
  • Basic Hadoop Administration
  • Need for Hadoop in Big Data
  • The MapReduce Framework
  • What is YARN?
  • Understanding Big Data Components
  • Monitoring, Management and Orchestration Components of the Hadoop Ecosystem
  • Different Distributions of Hadoop
  • Practical Exercise

Live Lecture

  • Hortonworks Sandbox Installation & Configuration
  • Hadoop Configuration Files
  • Working with Hadoop Services Using Ambari
  • Hadoop Daemons
  • Browsing Hadoop UI Consoles
  • Basic Hadoop Shell Commands
  • Eclipse & WinSCP Installation & Configuration on a VM
  • Practical Exercise

Live Lecture

  • Running a MapReduce Application in MR2
  • MapReduce Framework on YARN
  • Fault Tolerance in YARN
  • Map, Reduce & Shuffle Phases
  • Understanding Mapper, Reducer & Driver Classes
  • Writing a MapReduce WordCount Program
  • Executing & Monitoring a MapReduce Job
  • Counters
  • Distributed Cache
  • MRUnit
  • Reduce-Side Join
  • Custom Input Format
  • Sequence Input Format
  • XML File Parsing Using MapReduce
  • Practical Exercise
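
The Map, Shuffle, and Reduce phases listed above can be illustrated with a small local simulation in plain Python. This is not actual Hadoop code (real MapReduce jobs in this course are written against the Hadoop API); the function names here are illustrative only:

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework
    # does between the map and reduce phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reducer: sum the counts emitted for each word
    return key, sum(values)

lines = ["hadoop stores data in hdfs",
         "hadoop processes data with mapreduce"]
mapped = [pair for line in lines for pair in map_phase(line)]
reduced = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(reduced["hadoop"], reduced["data"])  # 2 2
```

On a real cluster, the same three phases run distributed across nodes, with the shuffle moving intermediate pairs between mappers and reducers over the network.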

Live Lecture

  • Introduction to Apache Pig
  • MapReduce vs Pig
  • Pig Components & Pig Execution
  • Pig Architecture
  • Pig Data Types & Data Models in Pig
  • Pig Latin Programs
  • Shell and Utility Commands
  • Pig Processing: Loading and Transforming Data
  • Pig Built-in Functions
  • Filtering, Grouping, and Sorting Data
  • Relational Join Operators
  • Pig UDFs & Pig Streaming
  • Testing Pig Scripts with PigUnit
  • Aviation Use Case in Pig
  • Pig Demo on a Healthcare Dataset
  • Practical Exercise
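
The load–filter–group dataflow style of Pig Latin covered above can be sketched in plain Python. The Pig statements appear as comments; the records are hypothetical flight-delay rows loosely echoing the aviation use case, not course data:

```python
from itertools import groupby

# flights = LOAD 'flights.csv' AS (airline, delay);  -- hypothetical input
flights = [("AA", 12), ("BA", 0), ("AA", 45), ("BA", 30), ("AA", 0)]

# delayed = FILTER flights BY delay > 0;
delayed = [f for f in flights if f[1] > 0]

# grouped = GROUP delayed BY airline;
delayed.sort(key=lambda f: f[0])
grouped = {airline: [d for _, d in rows]
           for airline, rows in groupby(delayed, key=lambda f: f[0])}

# result = FOREACH grouped GENERATE group, COUNT(delayed), AVG(delayed.delay);
result = {airline: (len(ds), sum(ds) / len(ds)) for airline, ds in grouped.items()}
print(result)  # {'AA': (2, 28.5), 'BA': (1, 30.0)}
```

Each Pig relational operator (FILTER, GROUP, FOREACH) compiles to one or more MapReduce (or Tez/Spark) stages on the cluster; the Python above only mirrors the logical dataflow.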

Live Lecture

  • Background of Hive
  • Hive vs Pig
  • Hive Architecture and Components
  • Hive Metastore
  • Comparison with Traditional Databases
  • Limitations of Hive
  • Hive Query Language
  • Migrating the Metastore from Derby to MySQL
  • Managed & External Tables
  • Data Processing: Loading Data into Tables
  • Using Hive Built-in Functions
  • Hive Data Types and Data Models
  • Partitioning Data Using Hive
  • Bucketing Data
  • Hive Scripting
  • Using Hive UDFs
  • Importing Data
  • Querying Data & Managing Outputs
  • Hive Demo on a Healthcare Dataset
  • Practical Exercise
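
The partitioning and bucketing topics above can be illustrated with a short plain-Python sketch. This is not Hive code: in Hive, each partition value becomes its own HDFS directory and each bucket a file, and Hive's hash function differs from Python's, so the bucket numbers here are only illustrative:

```python
# Partitioning: rows are physically separated by partition-column value
# (in Hive, one HDFS directory per distinct country here).
rows = [
    {"id": 1, "country": "IN"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "IN"},
]

partitions = {}
for row in rows:
    partitions.setdefault(row["country"], []).append(row["id"])
print(partitions)  # {'IN': [1, 3], 'US': [2]}

# Bucketing: within a partition, rows are spread over a fixed number of
# buckets by hashing the bucketing column (CLUSTERED BY ... INTO 4 BUCKETS).
NUM_BUCKETS = 4
def bucket_of(key):
    return hash(key) % NUM_BUCKETS

print({r["id"]: bucket_of(r["id"]) for r in rows})
```

Partition pruning lets queries that filter on the partition column skip entire directories; bucketing additionally enables efficient sampling and bucketed map-side joins.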

Live Lecture

  • Hive QL: Joining Tables, Dynamic Partitioning
  • Custom MapReduce Scripts
  • Hive Indexes and Views
  • Hive Query Optimizers
  • Hive Thrift Server
  • Hive UDFs
  • Apache HBase: Introduction to NoSQL Databases and HBase
  • HBase vs RDBMS
  • HBase Components
  • HBase Architecture
  • HBase Shell
  • HBase Client API
  • HBase Data Loading Techniques
  • HBase Run Modes
  • HBase Configuration
  • Creating Tables
  • Creating Column Families
  • CLI Commands: get, put, delete & scan
  • Scan Filter Operations
  • ZooKeeper & Its Role in the HBase Environment
  • Apache ZooKeeper Introduction
  • ZooKeeper Data Model
  • ZooKeeper Service
  • HBase Bulk Loading
  • Getting and Inserting Data
  • HBase Filters
  • Practical Exercise
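
The HBase data model behind the get/put/scan commands above can be sketched as nested maps: table → row key → "family:qualifier" → value. Real access goes through the HBase shell or client API; this toy model only mirrors the shape and the sorted-scan behaviour:

```python
# Toy HBase-like table: row key -> {"family:qualifier": value}
table = {}

def put(row_key, family, qualifier, value):
    # Like `put 'table', 'row1', 'info:name', 'alice'` in the HBase shell
    table.setdefault(row_key, {})[f"{family}:{qualifier}"] = value

def get(row_key):
    # Like `get 'table', 'row1'`: all cells for one row
    return table.get(row_key, {})

def scan(start_row="", stop_row="\uffff"):
    # Like `scan`: rows come back sorted by row key,
    # within the half-open range [start_row, stop_row)
    return [(k, table[k]) for k in sorted(table) if start_row <= k < stop_row]

put("row1", "info", "name", "alice")
put("row1", "info", "city", "pune")
put("row2", "info", "name", "bob")
print(get("row1")["info:name"])              # alice
print([k for k, _ in scan("row1", "row2")])  # ['row1']
```

Because rows are stored sorted by row key, row-key design drives scan performance in real HBase; ZooKeeper, covered above, coordinates the region servers that actually hold these sorted ranges.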

Live Lecture

  • What is Spark?
  • Spark Ecosystem
  • Spark Components
  • What is Scala?
  • Why Scala?
  • Spark Context
  • Spark RDDs
  • A Short Introduction to Streaming
  • Spark Streaming
  • Discretized Streams (DStreams)
  • Stateful and Stateless Transformations
  • Checkpointing
  • Operating with Other Streaming Platforms (such as Apache Kafka)
  • Structured Streaming
  • Practical Exercise
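
Spark Streaming's discretized-stream model, and the stateful vs stateless distinction listed above, can be sketched without Spark itself. In the DStream model, data arrives as a sequence of micro-batches; a stateless transformation looks at one batch at a time, while a stateful one (e.g. `updateStateByKey` in the real API) carries state across batches. A plain-Python sketch with made-up log data:

```python
from collections import Counter

batches = [
    ["error", "info", "error"],   # micro-batch at t=1
    ["warn", "error"],            # micro-batch at t=2
]

state = Counter()                  # running totals, kept across batches
for batch in batches:
    batch_counts = Counter(batch)  # stateless: per-batch word count
    state.update(batch_counts)     # stateful: merge into the running state

print(state["error"], state["warn"])  # 3 1
```

In real Spark Streaming, the running state is what checkpointing protects: it is periodically persisted so a restarted job can resume from the saved state rather than reprocessing every batch.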

Live Lecture

  • Importing Data from an RDBMS to HDFS Using Sqoop
  • Exporting Data from HDFS to an RDBMS
  • Importing & Exporting Data Between RDBMS & Hive Tables
  • Practical Exercise
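
Conceptually, a Sqoop import maps each RDBMS row to a delimited text line in HDFS (comma-separated by default), and an export reverses that mapping. This sketch simulates the idea with an in-memory SQLite table standing in for the RDBMS and a list of lines standing in for an HDFS file; it is not Sqoop itself, which runs the transfer as parallel MapReduce tasks:

```python
import sqlite3

# "RDBMS": an in-memory SQLite table with two sample rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "asha"), (2, "ravi")])

# "Import": SELECT every row and serialize it as comma-delimited text
hdfs_file = [",".join(str(col) for col in row)
             for row in conn.execute("SELECT id, name FROM employees")]
print(hdfs_file)  # ['1,asha', '2,ravi']

# "Export" reverses the mapping: parse each line and INSERT it back
for line in hdfs_file:
    emp_id, name = line.split(",")
    conn.execute("INSERT INTO employees VALUES (?, ?)", (int(emp_id), name))
```

Real Sqoop splits the source table on a key column so several mappers import disjoint row ranges in parallel, which is what makes it practical for large tables.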

Live Lecture

  • Overview of Oozie
  • Oozie Workflow Architecture
  • Creating Workflows with Oozie
  • Introduction to Flume
  • Flume Architecture
  • Flume Demo
  • Practical Exercise

Live Lecture

  • Introduction
  • Tableau
  • Chart Types
  • Data Visualization Tools
  • Practical Exercise

Live Lecture

  • Cloud Computing Basics
  • Concepts and Terminology
  • Goals and Benefits
  • Risks and Challenges
  • Roles and Boundaries
  • Cloud Characteristics
  • Cloud Delivery Models
  • Cloud Deployment Models
  • Practical Exercise

Fees

Offline Training @ Vadodara

  • Classroom Based Training
  • Practical Based Training
  • No Cost EMI Option
Fee: 40000 (regular 45000)

Online Training (Preferred)

  • Live Virtual Classroom Training
  • 1:1 Doubt Resolution Sessions
  • Recorded Live Lectures*
  • Flexible Schedule
Fee: 35000 (regular 40000)

Corporate Training

  • Customized Learning
  • Onsite Based Corporate Training
  • Online Corporate Training
  • Certified Corporate Training

Certification

  • Upon completion of the classroom training, you will take an offline exam that helps you prepare for the professional certification exam and score top marks. The BIT Certification is awarded after the offline exam is successfully completed and reviewed by experts.
  • Upon completion of the online training, you will take an online exam that helps you prepare for the professional certification exam and score top marks. The BIT Certification is awarded after the online exam is successfully completed and reviewed by experts.
  • This course is also designed to help you clear the Cloudera certification exam: CCA Spark and Hadoop Developer (CCA175).