Lecture-1 Introduction to Scala
· Introducing Scala
· Deployment of Scala for Big Data applications and Apache Spark analytics
· Scala REPL, lazy values, and control structures in Scala
· Directed Acyclic Graph (DAG)
· First Spark application using SBT/Eclipse
· Spark Web UI
· Spark in the Hadoop ecosystem
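As a companion to the REPL topics above, here is a minimal sketch of lazy values and a few control structures in plain Scala (2.13 or 3), with no Spark dependency. The object `LazyDemo` and its members are hypothetical names chosen for illustration. A `lazy val` is not evaluated until it is first accessed, and its result is then cached.

```scala
object LazyDemo {
  // Counts how many times the lazy initializer actually runs.
  var initCount = 0

  // The right-hand side runs only on first access to `config`;
  // later accesses reuse the cached value.
  lazy val config: String = {
    initCount += 1
    "loaded"
  }

  // `if` is an expression in Scala, so it produces a value.
  def classify(n: Int): String = if (n % 2 == 0) "even" else "odd"

  // A for-comprehension with a guard: squares of the even numbers in 1..n.
  def evenSquares(n: Int): List[Int] =
    for (i <- (1 to n).toList if i % 2 == 0) yield i * i

  // An imperative while loop: the sum of 1..n.
  def sumTo(n: Int): Int = {
    var total = 0
    var i = 1
    while (i <= n) {
      total += i
      i += 1
    }
    total
  }
}
```

Pasting the object into the REPL and reading `LazyDemo.config` twice should leave `initCount` at 1.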

Lecture-2 Pattern Matching
· The importance of Scala
· The concept of REPL (Read Evaluate Print Loop)
· Deep dive into Scala pattern matching
· Type inference, higher-order functions, currying, traits, Scala's application space, and Scala for data analysis
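The next sketch illustrates type inference, higher-order functions, and currying in plain Scala. `FunctionalDemo`, `twice`, `add`, and `addTen` are hypothetical names used only for this example.

```scala
object FunctionalDemo {
  // Type inference: the compiler infers List[Int] without an annotation.
  val numbers = List(1, 2, 3, 4)

  // A higher-order function: it takes a function `f` and applies it twice.
  def twice(f: Int => Int)(x: Int): Int = f(f(x))

  // A curried method with two parameter lists.
  def add(a: Int)(b: Int): Int = a + b

  // Supplying only the first parameter list, with an expected function type,
  // yields a new one-argument function.
  val addTen: Int => Int = add(10)

  // Collection methods such as filter and map are themselves higher-order.
  val doubledEvens: List[Int] = numbers.filter(_ % 2 == 0).map(_ * 2)
}
```

For example, `FunctionalDemo.twice(_ + 3)(1)` evaluates to 7 and `FunctionalDemo.addTen(5)` to 15.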

Lecture-3 Executing the Scala Code
· Learning about the Scala Interpreter
· A static object timer and testing string equality in Scala
· Implicit classes in Scala
· The concept of currying in Scala
· Various classes in Scala
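The sketch below touches three topics from this lecture: an implicit class that adds an extension method to `String`, a singleton object standing in for Java-style static members, and the difference between `==` and `eq` on strings. `RichWords`, `wordCount`, `Timer.timed`, and `StringEquality` are hypothetical names for illustration, not a standard API.

```scala
object ImplicitsDemo {
  // An implicit class adds methods to an existing type. It must be
  // defined inside an object, class, or trait.
  implicit class RichWords(s: String) {
    def wordCount: Int = s.trim.split("\\s+").count(_.nonEmpty)
  }
}

// A singleton object plays the role of Java's static members.
// `timed` is a hypothetical helper: it runs a by-name block once and
// returns its result together with the elapsed time in nanoseconds.
object Timer {
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, System.nanoTime() - start)
  }
}

object StringEquality {
  // `==` on strings compares contents (it delegates to equals),
  // while `eq` compares object references.
  val a = "spark"
  val b = new String("spark")
}
```

After `import ImplicitsDemo._`, the expression `"  hello   world ".wordCount` evaluates to 2.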

Lecture-4 Classes Concept in Scala
· Learning about the concept of classes
· Understanding constructor overloading
· Abstract classes
· The type hierarchy in Scala
· The concept of object equality
· val and var declarations in Scala
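The following sketch combines several of these topics: an abstract class, an auxiliary constructor (Scala's form of constructor overloading), a custom `equals`/`hashCode` pair for structural equality, and `val` versus `var` fields. The classes `Shape`, `Rect`, and `Counter` are hypothetical examples.

```scala
// An abstract class declares members that subclasses must implement.
abstract class Shape {
  def area: Double
}

// Constructor parameters declared with `val` become public, immutable fields.
class Rect(val width: Double, val height: Double) extends Shape {
  // Auxiliary constructor: it must first call another constructor.
  def this(side: Double) = this(side, side)

  def area: Double = width * height

  // Structural equality: two rectangles are equal if their dimensions match.
  override def equals(other: Any): Boolean = other match {
    case that: Rect => width == that.width && height == that.height
    case _          => false
  }

  // Keep hashCode consistent with equals.
  override def hashCode(): Int = (width, height).##
}

// A `var` field can be reassigned; a `val` field cannot.
class Counter {
  var count: Int = 0
  def increment(): Unit = count += 1
}
```

With this definition, `new Rect(2.0, 3.0) == new Rect(2.0, 3.0)` is `true`, whereas the default reference equality would make it `false`.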

Lecture-5 Case Classes and Pattern Matching
· Understanding sealed traits and wildcard, constructor, tuple, variable, and constant patterns
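Here is a small sketch of a sealed trait hierarchy with case classes, followed by matches that use each pattern kind listed above. `Expr`, `Num`, `Add`, `Zero`, and `Patterns` are hypothetical names.

```scala
// A sealed trait: all direct subtypes live in the same file, so the
// compiler can check matches on Expr for exhaustiveness.
sealed trait Expr
case class Num(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case object Zero extends Expr

object Patterns {
  def eval(e: Expr): Int = e match {
    case Num(v)    => v                   // constructor pattern binding v
    case Add(l, r) => eval(l) + eval(r)   // nested constructor pattern
    case Zero      => 0                   // constant (stable identifier) pattern
  }

  def describe(x: Any): String = x match {
    case 0         => "zero"                      // constant pattern
    case (a, b)    => s"tuple of $a and $b"       // tuple pattern
    case s: String => s"string $s"                // typed pattern
    case other     => s"something else: $other"   // variable pattern
  }

  def isPair(x: Any): Boolean = x match {
    case (_, _) => true    // wildcards inside a tuple pattern
    case _      => false   // wildcard pattern
  }
}
```

For example, `Patterns.eval(Add(Num(2), Add(Num(3), Zero)))` evaluates to 5.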

Lecture-6 Concepts of Traits with Examples
· Understanding traits in Scala
· The advantages of traits
· Linearization of traits
· The Java equivalent
· Avoiding boilerplate code
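Trait linearization can be seen with stackable traits. In the sketch below, each trait's `abstract override` method calls `super.greet`, and linearization decides what `super` refers to. The rightmost trait in a `with` chain runs first and delegates leftward. `Greeter`, `BasicGreeter`, `Exclaim`, and `Bracket` are hypothetical names.

```scala
abstract class Greeter {
  def greet(name: String): String
}

class BasicGreeter extends Greeter {
  def greet(name: String): String = s"Hello, $name"
}

// `abstract override` lets a trait wrap a concrete implementation
// that is supplied later, at mix-in time.
trait Exclaim extends Greeter {
  abstract override def greet(name: String): String = super.greet(name) + "!"
}

trait Bracket extends Greeter {
  abstract override def greet(name: String): String = "[" + super.greet(name) + "]"
}
```

Mix-in order changes the result. `(new BasicGreeter with Exclaim with Bracket).greet("Ann")` yields `"[Hello, Ann!]"`, while swapping the two traits yields `"[Hello, Ann]!"`.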

Lecture-7 Scala–Java Interoperability
· Implementation of traits in Scala and Java
· Handling classes that extend multiple traits

Lecture-8 Scala Collections
· Introduction to Scala collections
· Classification of collections
· The difference between iterator and iterable in Scala
· Examples of lists and sequences in Scala
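The sketch below shows the practical difference between an `Iterable` and an `Iterator`. An `Iterable` such as a `List` can be traversed repeatedly, but an `Iterator` is a single-pass cursor that is empty once consumed. `IterDemo` and its methods are hypothetical names.

```scala
object IterDemo {
  // A List is an Iterable (and a Seq): each traversal starts fresh.
  val xs: Iterable[Int] = List(1, 2, 3)

  def sumTwice(it: Iterable[Int]): (Int, Int) = (it.sum, it.sum)

  // An Iterator can be consumed only once.
  def consumeTwice(it: Iterator[Int]): (Int, Int) = {
    val first = it.sum
    val second = it.sum // already exhausted, so this sums nothing
    (first, second)
  }
}
```

`IterDemo.sumTwice(IterDemo.xs)` returns `(6, 6)`, while `IterDemo.consumeTwice(List(1, 2, 3).iterator)` returns `(6, 0)`.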

Lecture-9 Mutable Collections Vs. Immutable Collections
· The two types of collections in Scala
· Mutable and immutable collections
· Understanding lists and arrays in Scala
· ListBuffer and ArrayBuffer
· Queues in Scala
· Double-ended queues (deques), stacks, sets, maps, and tuples in Scala
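The following sketch walks through the collections named above. It assumes Scala 2.13 or later, since `mutable.ArrayDeque` first appeared in 2.13. `CollectionsDemo` and its field names are hypothetical.

```scala
import scala.collection.mutable

object CollectionsDemo {
  // Immutable List: "adding" returns a new list; `base` is unchanged.
  val base = List(1, 2, 3)
  val extended = 0 :: base

  // Mutable buffers grow in place.
  val lb = mutable.ListBuffer(1, 2)
  lb += 3
  val ab = mutable.ArrayBuffer("a")
  ab ++= Seq("b", "c")

  // FIFO queue: dequeue removes the oldest element.
  val q = mutable.Queue(1, 2, 3)
  val firstOut = q.dequeue()

  // LIFO stack: pop returns the most recently pushed element.
  val st = mutable.Stack(1, 2)
  st.push(3)
  val top = st.pop()

  // Double-ended queue: add or remove at either end.
  val dq = mutable.ArrayDeque(2, 3)
  dq.prepend(1)
  dq.append(4)
  val removed = dq.removeLast()

  // Immutable Set and Map: `+` returns a new collection.
  val nums = Set(1, 2) + 2 + 3
  val counts = Map("a" -> 1) + ("b" -> 2)

  // Tuples group a fixed number of values of possibly different types.
  val pair: (String, Int) = ("spark", 3)
}
```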

Lecture-10 Use Case Bobsrockets Package
· Introduction to Scala packages and imports
· Selective imports
· ScalaTest classes
· Introduction to JUnit test classes
· Using the JUnit interface through ScalaTest's JUnitSuite
· Packaging Scala applications in a directory structure
· Examples of Spark Split and Spark Scala
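The sketch below illustrates selective imports: importing only chosen names, renaming one on import, and hiding a name from a wildcard import. `ImportsDemo` and its members are hypothetical, and the alias `Buf` is an arbitrary choice.

```scala
// Selective import with a rename: ArrayBuffer is visible only as `Buf`.
import scala.collection.mutable.{ListBuffer, ArrayBuffer => Buf}
// Hiding import: everything from scala.util except Random.
import scala.util.{Random => _, _}

object ImportsDemo {
  // `Buf` is a local alias for scala.collection.mutable.ArrayBuffer.
  val buf: Buf[Int] = Buf(1, 2)
  val lb: ListBuffer[String] = ListBuffer("x")

  // `Try` comes from the wildcard part of the scala.util import.
  def parse(s: String): Option[Int] = Try(s.toInt).toOption
}
```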

Lecture-11 Introduction to Spark
· Introduction to Spark
· How Spark overcomes the drawbacks of MapReduce
· Understanding in-memory MapReduce
· Interactive operations on MapReduce
· The Spark stack, fine- vs. coarse-grained updates, Spark on Hadoop YARN, and a review of HDFS and YARN
· An overview of Spark and its advantages over Hadoop MapReduce
· Deploying Spark without Hadoop
· Spark history server and Cloudera distribution

Lecture-12 Spark Basics
· Spark installation guide
· Spark configuration
· Memory management
· Executor memory vs driver memory
· Working with Spark Shell
· The concept of resilient distributed datasets (RDDs)
· Learning to do functional programming in Spark
· The architecture of Spark

Lecture-13 Working with RDDs in Spark
· Spark RDD
· Creating RDDs
· RDD partitioning
· RDD operations and transformations
· Deep dive into Spark RDDs
· General RDD operations
· RDDs as read-only, partitioned collections of records
· Using RDDs for fast, efficient data processing
· RDD actions such as collect, count, collectAsMap, and saveAsTextFile, plus pair RDD functions

Lecture-14 Aggregating Data with Pair RDDs
· Understanding the concept of key-value pairs in RDDs
· Learning how Spark makes MapReduce operations faster
· Various RDD operations
· MapReduce interactive operations
· Fine- and coarse-grained updates
· Spark stack
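In Spark, per-key aggregation on an `RDD[(String, Int)]` is typically written as `rdd.reduceByKey(_ + _)`. The sketch below imitates that word-count logic on an ordinary Scala collection so it runs without Spark. It assumes Scala 2.13 or later for `groupMapReduce` and does not model partitions, shuffles, or map-side combining. `PairAggregation` and its methods are hypothetical names, not Spark's API.

```scala
object PairAggregation {
  // "Map" step: split lines into words and emit (word, 1) pairs,
  // the same shape an RDD[(String, Int)] would have.
  def toPairs(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(word => (word, 1))

  // "Reduce" step: combine the values for each key with an associative
  // function. This is what rdd.reduceByKey(f) computes, done locally here.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupMapReduce(_._1)(_._2)(f)
}
```

For example, applying `reduceByKey` with `_ + _` to the pairs from `Seq("a b a", "b a")` gives `Map("a" -> 3, "b" -> 2)`.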

Lecture-15 Writing and Deploying Spark Applications
· Comparing Spark applications with the Spark shell
· Creating a Spark application using Scala or Java
· Deploying a Spark application
· Building Scala applications
· Creating mutable lists, sets and set operations, lists, tuples, and list concatenation
· Creating an application using SBT
· Deploying an application using Maven
· The web user interface of Spark application
· A real-world example of Spark
· Configuring Spark

Lecture-16 Parallel Processing
· Learning about Spark parallel processing
· Deploying on a cluster
· Introduction to Spark partitions
· File-based partitioning of RDDs
· Understanding HDFS and data locality
· Mastering the technique of parallel operations
· Comparing repartition and coalesce
· RDD actions

Lecture-17 Spark RDD Persistence
· The execution flow in Spark
· An overview of RDD persistence
· Spark terminology
· Distributed shared memory vs. RDDs
· RDD limitations
· Spark shell arguments
· Distributed persistence
· RDD lineage
· Key-value pair operations made available through implicit conversions, such as countByKey, reduceByKey, sortByKey, and aggregateByKey
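Spark's `aggregateByKey(zeroValue)(seqOp, combOp)` works in two stages. First, `seqOp` folds each key's values within a partition, starting from `zeroValue`. Then `combOp` merges the partial results for each key across partitions. The sketch below reproduces those semantics on partitions simulated as plain Scala `Seq`s. It is not Spark code, it assumes Scala 2.13 or later for `groupMapReduce`, and `AggregateByKeySketch` is a hypothetical name.

```scala
object AggregateByKeySketch {
  // Simulates aggregateByKey(zero)(seqOp, combOp) over local "partitions".
  def aggregateByKey[K, V, U](partitions: Seq[Seq[(K, V)]], zero: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U
  ): Map[K, U] = {
    // Stage 1: inside each partition, fold each key's values starting
    // from `zero`. The zero value is used once per key per partition.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.groupBy(_._1).map { case (k, kvs) =>
        k -> kvs.map(_._2).foldLeft(zero)(seqOp)
      }
    }
    // Stage 2: merge the per-partition results for each key with combOp.
    perPartition.flatten.groupMapReduce(_._1)(_._2)(combOp)
  }
}
```

A common use is computing a per-key (sum, count) pair for averages. The zero value `(0, 0)`, a `seqOp` that adds a value and increments the count, and a `combOp` that adds both components together give `("a", (9, 3))` for the values 1, 3, and 5 of key `"a"` spread over two partitions.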

Lecture-18 Spark MLlib
· Introduction to Machine Learning
· Types of Machine Learning
· Introduction to MLlib
· Various ML algorithms supported by MLlib
· Linear regression, logistic regression, decision tree, random forest, and K-means clustering techniques

Lecture-19 Integrating Apache Flume and Apache Kafka
· Why Kafka and what is Kafka?
· Kafka architecture
· Kafka workflow
· Configuring Kafka cluster
· Kafka operations
· Kafka monitoring tools
· Integrating Apache Flume and Apache Kafka

Lecture-20 Spark Streaming
· Introduction to Spark Streaming
· Features of Spark Streaming
· Spark Streaming workflow
· Initializing a StreamingContext, discretized streams (DStreams), input DStreams, and receivers
· Transformations on DStreams, output operations on DStreams, windowed operators, and why they are useful
· Important windowed operators and stateful operators
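To show the idea behind windowed and stateful operators without a running StreamingContext, the sketch below models each DStream micro-batch as a `Seq[Int]`. It then computes a sum over a sliding window of batches, in the spirit of `reduceByWindow`, and a running total carried across batches, in the spirit of a stateful operator such as `updateStateByKey` with a single key. This is a simplification: it emits only full windows, whereas Spark's early windows cover fewer batches. `WindowSketch` and its methods are hypothetical names.

```scala
object WindowSketch {
  // Sum over a window of `windowBatches` batches that advances by
  // `slideBatches` batches. Both are counted in batches, not durations.
  def windowedSums(batches: Seq[Seq[Int]], windowBatches: Int, slideBatches: Int): List[Int] =
    batches.sliding(windowBatches, slideBatches).map(_.flatten.sum).toList

  // Stateful running total: the state after each batch is the previous
  // state plus that batch's sum.
  def runningTotals(batches: Seq[Seq[Int]]): List[Int] =
    batches.map(_.sum).scanLeft(0)(_ + _).tail.toList
}
```

With the batches `Seq(Seq(1), Seq(2, 3), Seq(4), Seq(5))`, a window of 3 batches that slides by 1 yields `List(10, 14)`, and the running totals are `List(1, 6, 10, 15)`.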

Lecture-21 Improving Spark Performance
· Introduction to shared variables in Spark, such as broadcast variables
· Learning about accumulators
· Common performance issues
· Troubleshooting performance problems

Lecture-22 Spark SQL and Data Frames
· Learning about Spark SQL
· The SQL context in Spark and how it provides structured data processing
· JSON support in Spark SQL
· Working with XML data
· Parquet files
· Creating Hive context
· Writing data frame to Hive
· Reading data from JDBC sources
· Understanding the data frames in Spark
· Creating Data Frames
· Specifying a schema manually
· Working with CSV files
· Reading JDBC tables
· Writing a data frame to JDBC
· User-defined functions in Spark SQL
· Shared variables and accumulators
· Learning to query and transform data in data frames
· How data frames combine the benefits of Spark RDDs and Spark SQL
· Deploying Hive on Spark as the execution engine

Lecture-23 Scheduling/Partitioning
· Learning about scheduling and partitioning in Spark
· Hash partitioning
· Range partitioning
· Scheduling within and around applications
· Static partitioning, dynamic sharing, and fair scheduling
· mapPartitionsWithIndex, zip, and groupByKey
· Spark master high availability, standby masters with ZooKeeper, single-node recovery with the local file system, and higher-order functions
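The two partitioning schemes can be sketched as plain functions from a key to a partition index. The hash version follows the logic of Spark's `HashPartitioner`: a non-negative modulus of the key's `hashCode`, with `null` keys sent to partition 0. The range version is a simplification. It assumes the sorted upper bounds are already known, whereas Spark's `RangePartitioner` derives them by sampling the data. `PartitionSketch` and its methods are hypothetical names.

```scala
object PartitionSketch {
  // Hash partitioning: non-negative (hashCode mod numPartitions).
  def hashPartition(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case k =>
      val raw = k.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw // shift negative remainders
  }

  // Range partitioning: `upperBounds` holds sorted, inclusive upper bounds
  // for every partition except the last, so there are
  // upperBounds.length + 1 partitions in total.
  def rangePartition(key: Int, upperBounds: IndexedSeq[Int]): Int = {
    val idx = upperBounds.indexWhere(key <= _)
    if (idx == -1) upperBounds.length else idx
  }
}
```

With the bounds `Vector(10, 20)`, keys up to 10 go to partition 0, keys 11 to 20 go to partition 1, and larger keys go to partition 2. Range partitioning keeps neighboring keys together, which suits sorting. Hash partitioning spreads keys evenly but ignores their order.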