Lesson 1: Getting started with Talend
o How Talend works; introduction to Talend Open Studio and its usability
o What is metadata?
Lesson 2: Jobs
o Creating a new Job; concept and creation of a delimited file; using metadata and its significance
o What is propagation?
o Data integration schemas; creating Jobs using tFilterRow and string filters
o Creating an input delimited file.
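As a rough illustration of what this lesson covers, the sketch below reads a delimited file and filters rows the way a tFilterRow component would. Plain Python stands in for the Talend component here; the schema, sample data, and filter rule are invented for illustration and are not part of the course material.

```python
import csv
import io

# Sample semicolon-delimited input, standing in for a file on disk.
RAW = "id;name;city\n1;Alice;Paris\n2;Bob;London\n3;Carol;Paris\n"

# The header row plays the role of the Talend schema/metadata definition.
reader = csv.DictReader(io.StringIO(RAW), delimiter=";")

# Filter rule (in the spirit of tFilterRow): keep rows where city == "Paris".
matches = [row for row in reader if row["city"] == "Paris"]

for row in matches:
    print(row["id"], row["name"])
```

In Talend the same flow would be drawn as tFileInputDelimited → tFilterRow → an output component, with the filter condition configured in the component settings rather than written in code.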
Lesson 3: Overview of Schema and Aggregation
o Job design and its features; what is a tMap?
o Data aggregation; introduction to tReplicate and how it works
o Significance and working of tLogRow; tMap and its properties.
Lesson 4: Connectivity with Data Source
o Extracting data from the source; source and target in a database (MySQL)
o Creating a connection
o Importing a schema or metadata.
Lesson 5: Getting started with Routines/Functions
o What are Routines? Calling and using functions
o Using XML files in Talend; how the format-data functions work
o What is type casting?
Lesson 6: Data Transformation
o Defining Context variable
o Parameterization in ETL; writing an example using tRowGenerator
o Defining and implementing sorting
o What is an aggregator?
o Using tFlow to publish data
o Running a Job in a loop.
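The transformation steps in this lesson — a context variable parameterizing the Job, row generation, sorting, and aggregation — can be sketched together in a few lines. This is a plain-Python stand-in for the Talend components (tRowGenerator, tSortRow, tAggregateRow); the regions, row count, and seed are made-up example values.

```python
import random
from collections import defaultdict

# "Context variables": parameters injected into the Job at run time.
context = {"row_count": 10, "seed": 42}
random.seed(context["seed"])

# Row generation (tRowGenerator-style): random (region, amount) rows.
regions = ["north", "south", "east", "west"]
rows = [(random.choice(regions), random.randint(1, 100))
        for _ in range(context["row_count"])]

# Sorting (tSortRow-style): order rows by region, then by amount.
rows.sort()

# Aggregation (tAggregateRow-style): sum the amounts per region.
totals = defaultdict(int)
for region, amount in rows:
    totals[region] += amount

print(dict(totals))
```

Changing the values in `context` reruns the same flow with different parameters, which is the point of parameterization in ETL: the Job logic stays fixed while the inputs vary per execution.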
Lesson 7: Connectivity with Hadoop
o Starting the Thrift Server; connecting the ETL tool to Hadoop
o Defining the ETL method
o Implementing Hive; importing data into Hive with an example; an example of partitioning in Hive
o Why the customer table is not overwritten; components of ETL; Hive vs. Pig; loading data using the demo customer table
o ETL tools; parallel data execution.
Lesson 8: Introduction to Hadoop and its Ecosystem, Map Reduce and HDFS
o Big Data and the factors constituting Big Data; Hadoop and the Hadoop ecosystem; MapReduce concepts: map, reduce, ordering, shuffle, and concurrency
o Hadoop Distributed File System (HDFS) concepts and their importance; deep dive into MapReduce: execution framework, partitioner, and combiner
o Data types and key-value pairs; HDFS deep dive: architecture, data replication, NameNode, DataNode, and data flow
o Parallel copying with DistCp; Hadoop Archives.
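The map, shuffle, and reduce phases listed above can be sketched as a tiny word-count program. This is pure Python imitating what the Hadoop framework does (a real MapReduce job would be written against the Hadoop Java API); the input lines are invented for illustration.

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big hadoop", "hadoop big"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

The key idea is that map and reduce run in parallel across many machines, while the shuffle in between is the only step that moves data across the cluster; a combiner would apply the reduce function early, on each mapper's output, to shrink that shuffle.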
Lesson 9: Hands on Exercises
o Installing Hadoop in Pseudo Distributed Mode
o Understanding the important configuration files, their properties, and daemon threads
o Accessing HDFS from the command line
o MapReduce: basic exercises
o Understanding the Hadoop ecosystem
o Introduction to Sqoop: use cases and installation
o Introduction to Hive: use cases and installation
o Introduction to Pig: use cases and installation
o Introduction to Oozie: use cases and installation
o Introduction to Flume: use cases and installation
o Introduction to YARN
o Mini project: importing MySQL data using Sqoop and querying it using Hive.
Lesson 10: Deep Dive in Map Reduce
o How to develop a MapReduce application
o Writing unit tests; best practices for developing and writing MapReduce applications
o Debugging MapReduce applications
o Joining data sets in MapReduce.
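Joining data sets in MapReduce is usually done as a reduce-side join: each record is tagged with the data set it came from, the shuffle groups records by the join key, and the reducer pairs records from the two sides. The sketch below imitates that pattern in plain Python; the customer and order data are made-up examples.

```python
from collections import defaultdict

# Two data sets sharing a join key (customer id).
customers = [(1, "Alice"), (2, "Bob")]
orders = [(1, "book"), (1, "pen"), (2, "lamp")]

# Map: tag each record with its source ("C" or "O"), keyed by the join key.
tagged = [(cid, ("C", name)) for cid, name in customers] + \
         [(cid, ("O", item)) for cid, item in orders]

# Shuffle: group all tagged records by key.
grouped = defaultdict(list)
for key, record in tagged:
    grouped[key].append(record)

# Reduce: within each key, pair every customer record with every order record.
joined = []
for key, records in grouped.items():
    names = [value for tag, value in records if tag == "C"]
    items = [value for tag, value in records if tag == "O"]
    joined.extend((key, n, i) for n in names for i in items)

print(sorted(joined))
```

When one side of the join is small enough to fit in memory, a map-side join (broadcasting the small data set to every mapper) avoids the shuffle entirely and is usually much faster.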
Lesson 11: Hive
o Introduction to Hive
o What is Hive? Hive schema and data storage; comparing Hive to traditional databases; Hive vs. Pig; Hive use cases; interacting with Hive
o Relational Data Analysis with Hive
o Hive databases and tables; basic HiveQL syntax; data types; joining data sets; common built-in functions; hands-on exercise: running Hive queries from the shell, scripts, and Hue
o Hive data management: Hive data formats; creating databases and Hive-managed tables; loading data into Hive; altering databases and tables; self-managed tables; simplifying queries with views; storing query results; controlling access to data; hands-on exercise: data management with Hive
o Hive optimization: understanding query performance; partitioning; bucketing; indexing data
o Extending Hive: user-defined functions
o Hands-on exercises: working with large data sets and querying extensively
o User-defined functions; optimizing queries; tips and tricks for performance tuning.
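Of the optimization topics above, partitioning is the easiest to picture: Hive stores each partition of a table under a `key=value` directory, and a query that filters on the partition column only reads the matching directories (partition pruning). The sketch below models that layout in plain Python; the rows and the `country` column are invented examples, not course data.

```python
from collections import defaultdict

# Rows with a partition column ("country"), as in a Hive-partitioned table.
rows = [("US", "alice"), ("FR", "bob"), ("US", "carol")]

# Partitioning: each distinct key lands in its own key=value "directory".
partitions = defaultdict(list)
for country, name in rows:
    partitions[f"country={country}"].append(name)

# A query filtering on the partition column (WHERE country = 'US') reads only
# the matching directory instead of scanning the whole table.
us_only = partitions["country=US"]
print(us_only)  # ['alice', 'carol']
```

Bucketing works on the same principle but hashes a column into a fixed number of files per partition, which helps with sampling and joins rather than with pruning.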
Lesson 12: Pig
o Introduction to Pig: what is Pig? Pig's features; Pig use cases; interacting with Pig
o Basic data analysis with Pig: Pig Latin syntax; loading data; simple data types; field definitions; data output; viewing the schema; filtering and sorting data; commonly used functions; hands-on exercise: using Pig for ETL processing
o Processing complex data with Pig: complex/nested data types; grouping; iterating over grouped data; hands-on exercise: analyzing data with Pig
o Multi-data-set operations with Pig: techniques for combining data sets; joining data sets in Pig; set operations; splitting data sets; hands-on exercise
o Extending Pig: macros and imports; UDFs; using other languages to process data with Pig; hands-on exercise: extending Pig with streaming and UDFs.
Lesson 13: Impala
o Introduction to Impala
o What is Impala?
o How Impala differs from Hive and Pig; how Impala differs from relational databases; limitations and future directions; using the Impala shell
o Choosing the best tool (Hive, Pig, or Impala)
o Major project – putting it all together and connecting the dots; working with large data sets
o Steps involved in analyzing large data sets.
Lesson 14: ETL Connectivity with Hadoop Ecosystem, Job and Certification Support
o How ETL tools work in the big data industry; connecting to HDFS from an ETL tool and moving data from the local system to HDFS
o Moving data from a DBMS to HDFS; working with Hive from the ETL tool; creating a MapReduce job in the ETL tool; an end-to-end ETL PoC showing Hadoop integration with the ETL tool
o Major project: Hadoop development
o Cloudera certification tips and guidance, and mock interview preparation
o Practical development tips and techniques
o Certification preparation.