Big Data

Big data refers to very large volumes of structured, semi-structured, and unstructured data that keep growing rapidly over time. Data at this scale cannot be stored or processed efficiently with a traditional DBMS. In this course, you will learn how Big Data is handled and processed in industry.

COURSE OVERVIEW

What is Big Data?
5Vs of Big Data
Big Data Architecture Overview
Batch vs Stream Processing
Introduction to Hadoop
HDFS and its Components
Working with Hadoop Commands
Introduction to YARN
Introduction to Hive
Creating Tables and Writing Hive Queries
Partitioning and Bucketing in Hive
Introduction to Pig
Writing Pig Scripts using Pig Latin
Pig vs Hive Comparison
Introduction to HBase
Creating and Querying Data in HBase
HBase vs RDBMS
Introduction to Sqoop
Importing and Exporting Data using Sqoop
Introduction to Flume
Using Flume for Data Ingestion
Introduction to Apache Spark
Spark vs Hadoop MapReduce
Working with RDDs and DataFrames
Writing SQL in Spark (Spark SQL)
Basics of PySpark
Introduction to Spark Streaming
Big Data on Cloud (AWS, Azure, GCP)
Big Data Use Cases

Course Outcome

By the end of this course, you’ll be able to:

  • Understand Big Data concepts, challenges, and applications.

  • Analyze and process large datasets using Hadoop/Spark.

  • Apply tools like Hive, Pig, and NoSQL stores such as HBase for data management.

  • Implement data pipelines for structured and unstructured data.

  • Evaluate performance and scalability of Big Data systems.

  • Develop end-to-end Big Data solutions and projects.

  • Integrate Big Data with Cloud, ML, and AI.