Azure Data Engineering Full Stack
Azure Data Engineering
Azure Data Engineering Full Stack prepares learners to manage, process, and analyze massive datasets using Microsoft Azure. The course covers Azure Data Factory, Databricks, Synapse Analytics, and cloud data architecture, enabling professionals to become skilled data engineers.
Contact for More Information
+91 96660 64406
Course Curriculum
Day 1
- What is Big Data Analytics
- Data Analytics Platform
- Storage
- Compute
- Data Processing Paradigms
- Monolithic Computing
- Distributed Computing
Day 2
- Distributed Computing Frameworks
- Hadoop MapReduce
- Apache Spark
- Big Data Analytics: Data Lakes
- Tightly Coupled Data Lake
- Loosely Coupled Data Lake
Day 3
- Big Data File Formats
- Row Storage Format
- Columnar Storage Format
- Scalability
- Scale-Up (Vertical Scalability)
- Scale-Out (Horizontal Scalability)
Day 4: Introduction to Azure Databricks
- Core Databricks Concepts
- Workspace
- Notebooks
- Library
- Folder
- Repos
- Data
- Compute
- Workflows
Day 5: Introducing Spark Fundamentals
- What is Apache Spark
- Why Choose Apache Spark
- What are the Spark use cases
Day 6: Spark Architecture
- Spark Components
- Spark Driver
- SparkSession
- Cluster Manager
- Spark Executors
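To tie these components together, here is a minimal, hypothetical sketch of creating a SparkSession in local mode (on Databricks the session is already provided as `spark`, and the cluster manager and executors are handled by the platform):

```python
from pyspark.sql import SparkSession

# The driver program builds a SparkSession; the cluster manager then allocates executors.
# `local[4]` is an assumption for a local sketch: 4 worker threads on one machine.
spark = (
    SparkSession.builder
    .appName("spark-architecture-demo")   # name shown in the Spark UI
    .master("local[4]")
    .getOrCreate()
)

print(spark.version)
```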
Day 7: Create Databricks Workspace
- Workspace Assets
Day 8: Creating Spark Cluster
- All-Purpose Cluster
- Single Node Cluster
- Multi Node Cluster
Day 9: Databricks - Internal Storage
- Databricks File System (DBFS)
- Uploading Files to DBFS
Day 10: DBUTILS Module
- Interaction with DBFS
- %fs Magic Command
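As a rough illustration of Day 10, the snippet below drives DBFS from a notebook with `dbutils.fs`; the paths are placeholders, and `%fs ls /FileStore/tables` is the magic-command equivalent of the first call:

```python
# List files previously uploaded to DBFS (placeholder path)
for f in dbutils.fs.ls("dbfs:/FileStore/tables"):
    print(f.name, f.size)

# Create a folder and copy a file within DBFS
dbutils.fs.mkdirs("dbfs:/tmp/demo")
dbutils.fs.cp("dbfs:/FileStore/tables/sales.csv", "dbfs:/tmp/demo/sales.csv")
```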
Day 11: Spark Data APIs
- RDD (Resilient Distributed Dataset)
- DataFrame
- Dataset
Day 12: Create Data Frame
- Using Python Collection
- Converting RDD to DataFrame
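A minimal sketch of both approaches from Day 12, assuming the Databricks-provided `spark` session; the names and values are made up:

```python
from pyspark.sql import Row

# From a Python collection
people = [("Asha", 31), ("Ravi", 28)]
df = spark.createDataFrame(people, schema=["name", "age"])

# Converting an RDD of Row objects to a DataFrame
rdd = spark.sparkContext.parallelize([Row(name="Meena", age=45)])
df_from_rdd = rdd.toDF()

df.show()
```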
Day 13: Reading CSV data with Apache Spark
- Inferred Schema
- Explicit Schema
- Parsing Modes
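A hedged sketch of Day 13's options, assuming a notebook where `spark` exists; the file path and column names are placeholders:

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Inferred schema: Spark scans the file and guesses column types
df_inferred = (spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("dbfs:/FileStore/tables/sales.csv"))

# Explicit schema plus a parsing mode (PERMISSIVE, DROPMALFORMED or FAILFAST)
schema = StructType([
    StructField("order_id", IntegerType(), True),
    StructField("region",   StringType(),  True),
])
df_explicit = (spark.read
    .schema(schema)
    .option("header", "true")
    .option("mode", "DROPMALFORMED")   # drop rows that do not match the schema
    .csv("dbfs:/FileStore/tables/sales.csv"))
```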
Day 14: Reading JSON data with Apache Spark
- SingleLine JSON
- Multiline JSON
- Complex JSON
- explode() Function
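A small sketch of multiline JSON and explode(); the path, `order_id` column, and `items` array column are assumptions:

```python
from pyspark.sql.functions import col, explode

# Multiline (and possibly nested) JSON
df = spark.read.option("multiLine", "true").json("dbfs:/FileStore/tables/orders.json")

# explode() produces one output row per element of an array column
flat = df.select(col("order_id"), explode(col("items")).alias("item"))
flat.show(truncate=False)
```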
Day 15: Reading XML Data with Apache Spark
- Install Spark-xml Library
- User Defined Schema
- DDL String Approach
- StructType() with StructFields()
Day 16: Reading Excel File With Apache Spark
- Single Sheet Reading
- Multiple Sheet Reading Using a List Object
Day 17: Reading Excel File With Apache Spark
- Multiple Excel Sheets with Same Structure
- Multiple Excel Sheets with Different Structures
Day 18: Reading Parquet Data with Apache Spark
- Uploading Parquet Data
- View the Data in the DataFrame
- View the Schema of the DataFrame
- Limitations of the Parquet Format
- Schema Evolution
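A minimal sketch for Day 18, with a placeholder folder of Parquet files whose schema has changed over time:

```python
# mergeSchema reconciles column differences across older and newer Parquet files
df = spark.read.option("mergeSchema", "true").parquet("dbfs:/tmp/sales_parquet/")

df.printSchema()   # view the (merged) schema of the DataFrame
df.show(5)         # view the data
```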
Day 19: Introduction to Delta Lake
- Delta Lake Features
- Delta Lake Components
Day 20: Delta Lake Features
- DML Operations
- Time Travel Operations
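A hedged sketch of DML and time travel on a Delta table; the path, columns, and predicates are placeholders:

```python
from delta.tables import DeltaTable

tbl = DeltaTable.forPath(spark, "dbfs:/tmp/delta/customers")

# DML operations
tbl.update(condition="country = 'IN'", set={"region": "'APAC'"})
tbl.delete("is_active = false")

# Time travel: read an earlier version of the same table
old = (spark.read.format("delta")
    .option("versionAsOf", 0)
    .load("dbfs:/tmp/delta/customers"))
```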
Day 21: Delta Lake Features
- Schema Validation and Enforcement
- Schema Evolution
Day 22: Access Data from Azure Blob Storage
- Account Access Key
- Windows Azure Storage Blob driver (WASB)
- Read Operations
- Write Operation
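A sketch of account-key access through the WASB driver; the storage account, container, secret scope, and file names are placeholders:

```python
# Supply the account access key (ideally from a secret scope rather than plain text)
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    dbutils.secrets.get(scope="demo-scope", key="storage-key"))

# Read from and write back to the container over wasbs://
df = spark.read.csv(
    "wasbs://<container>@<storage-account>.blob.core.windows.net/raw/sales.csv",
    header=True)
df.write.mode("overwrite").parquet(
    "wasbs://<container>@<storage-account>.blob.core.windows.net/curated/sales/")
```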
Day 23: Access Data from Azure Data Lake Gen2
- Azure Service Principal
- Azure Blob Filesystem driver (ABFS)
- Read Operations
- Write Operation
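A sketch of service-principal (OAuth) access through the ABFS driver; every identifier below is a placeholder:

```python
account = "<storage-account>"
base = f"{account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", "<application-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}",
               dbutils.secrets.get(scope="demo-scope", key="sp-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

df = spark.read.parquet(f"abfss://<container>@{base}/raw/")
```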
Day 24: Access Data from Azure Data Lake Gen2
- Shared access signatures (SAS)
- Azure Blob Filesystem driver (ABFS)
- Read Operations
- Write Operation
Day 25: Access Data from Azure SQL Database
- Configure a connection to SQL server
Day 26: Access Data from Synapse Dedicated SQL Pool
- Configure storage account access key
- Read data from an Azure Synapse table
- Write Data to Azure Synapse table
Day 27: Access Data from Snowflake
- Reading Data
- Writing Data
Day 28: Create Mount Point to Azure Cloud Storages
- Azure Blob Storage
- Azure Data Lake Storage
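A hedged sketch of mounting an ADLS Gen2 container with a service principal; the secret scope, IDs, and mount point are placeholders:

```python
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="demo-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs)

display(dbutils.fs.ls("/mnt/datalake"))
```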
Day 29: Introduction to Spark SQL Module
- Hive Metastore
- Spark Catalog
Day 30: Spark SQL - Create Global Managed Tables
- DataFrame API
- SQL API
Day 31: Spark SQL - Create Global Unmanaged Tables
- DataFrame API
- SQL API
Day 32: Spark SQL - Create Views
- Temporary Views
- Global Temporary Views
- DataFrame API
- SQL API
- Dropping Views
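A minimal sketch of Day 32, assuming an existing DataFrame `df`:

```python
# Session-scoped vs. global temporary views (DataFrame API)
df.createOrReplaceTempView("sales_tmp")
df.createOrReplaceGlobalTempView("sales_gtmp")

# SQL API: global temporary views live under the global_temp database
spark.sql("SELECT COUNT(*) FROM sales_tmp").show()
spark.sql("SELECT COUNT(*) FROM global_temp.sales_gtmp").show()

# Dropping views
spark.catalog.dropTempView("sales_tmp")
spark.catalog.dropGlobalTempView("sales_gtmp")
```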
Day 33: Spark Batch Processing
- Reading Batch Data
- Writing Batch Data
Day 34: Spark Structured Streaming API
- Reading Streaming Data
- Writing Streaming Data
- Checkpoint Location
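A hedged sketch of a streaming read and write with a checkpoint; the schema, columns, and paths are placeholders:

```python
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Streaming file sources require an explicit schema
schema = StructType([
    StructField("order_id", StringType(), True),
    StructField("amount",   DoubleType(), True),
])

stream_df = spark.readStream.schema(schema).json("dbfs:/tmp/stream/input/")

query = (stream_df.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "dbfs:/tmp/stream/_checkpoint")  # lets the query resume where it left off
    .start("dbfs:/tmp/stream/output/"))
```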
Day 35: Spark Structured Streaming API - outputModes
- Append
- Complete
- Update
Day 36: Spark Structured Streaming API - Triggers
- Unspecified Trigger (Default Behavior)
- trigger(availableNow = True)
- trigger(processingTime = "n minutes")
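The same write can be re-run under the different trigger settings from Day 36 (continuing the placeholder `stream_df` from the Day 34 sketch):

```python
(stream_df.writeStream
    .format("delta")
    .option("checkpointLocation", "dbfs:/tmp/stream/_checkpoint_triggered")
    .trigger(availableNow=True)                # process everything available, then stop
    # .trigger(processingTime="5 minutes")     # or: start a micro-batch every 5 minutes
    # omitting .trigger() gives the default: micro-batches as soon as the previous one finishes
    .start("dbfs:/tmp/stream/output_triggered/"))
```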
Day 37: Spark Structured Streaming API
- Data Processing
- Joins
- Aggregation
Day 38: Code Modularity of Notebooks
- %run Magic Command
Day 39: dbutils.notebook Utility
- run()
- exit()
Day 40: Widgets - Types of Widgets
- text
- dropdown
- multiselect
- combobox
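A quick sketch of the four widget types; the names, defaults, and choices are made up:

```python
dbutils.widgets.text("load_date", "2024-01-01", "Load Date")
dbutils.widgets.dropdown("env", "dev", ["dev", "test", "prod"], "Environment")
dbutils.widgets.multiselect("regions", "APAC", ["APAC", "EMEA", "AMER"], "Regions")
dbutils.widgets.combobox("source", "sales", ["sales", "orders"], "Source")

# Read a widget value inside the notebook
print(dbutils.widgets.get("env"))
```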
Day 41: Parameterization of Notebooks
- History Load
- Incremental Load
Day 42: Trigger Notebook from Data Factory Pipeline
- Notebook Parameters
Day 43: Databricks Workflow
- Orchestration of Tasks
Day 44: Databricks Workflow
- Job Trigger
Day 45: Delta Lake Implementation
- SCD Type 0 Dimension
Day 46: Delta Lake Implementation
- SCD Type 1 Dimension
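A hedged sketch of an SCD Type 1 upsert with a Delta MERGE; the dimension path, key, columns, and the incoming `updates_df` DataFrame are assumptions:

```python
from delta.tables import DeltaTable

dim = DeltaTable.forPath(spark, "dbfs:/tmp/delta/dim_customer")

# Type 1: overwrite changed attributes in place, insert brand-new keys
(dim.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdate(set={"email": "s.email", "city": "s.city"})
    .whenNotMatchedInsertAll()
    .execute())
```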
Day 47: Delta Lake Implementation
- SCD Type 2 Dimension
Day 48: Delta Lake Implementation
- SCD Type 3 Dimension
Day 49: PySpark Performance Optimization
- cache()
- persist()
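A small sketch assuming an existing DataFrame `df`:

```python
from pyspark import StorageLevel

df.cache()                          # caches with the default storage level (memory, spilling to disk)
df.count()                          # an action materializes the cache
df.unpersist()                      # release it before choosing a different level

df.persist(StorageLevel.DISK_ONLY)  # persist() lets you pick the storage level explicitly
df.count()
```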
Day 50: PySpark Performance Optimization
- repartition()
- coalesce()
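A sketch assuming an existing DataFrame `df`; the partition counts are arbitrary:

```python
# repartition() performs a full shuffle to the requested number of partitions
df_wide = df.repartition(16)

# coalesce() only merges existing partitions (no full shuffle), so it is the cheaper
# way to reduce the partition count, e.g. before writing out fewer files
df_narrow = df_wide.coalesce(4)

print(df_narrow.rdd.getNumPartitions())
```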
Day 51: PySpark Performance Optimization
- Column Predicate Pushdown
- partitionBy()
Day 52: PySpark Performance Optimization
- bucketBy()
Day 53: PySpark Performance Optimization
- Broadcast Join
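A one-line sketch assuming a large `fact_df` and a small `dim_df` sharing a `customer_id` key:

```python
from pyspark.sql.functions import broadcast

# Broadcasting the small table ships it to every executor, avoiding a shuffle of the large table
joined = fact_df.join(broadcast(dim_df), on="customer_id", how="left")
```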
Day 54: Delta Lake - Performance Optimization
- OPTIMIZE
- ZORDER
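A sketch using a placeholder Delta table name and column:

```python
# Compact small files and co-locate rows by a frequently filtered column
spark.sql("OPTIMIZE sales_delta ZORDER BY (customer_id)")
```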
Day 55: Delta Lake - Performance Optimization
- Delta Cache
Day 57: Delta Lake - Performance Optimization
- Partitioning
- Liquid Clustering
Day 58: Databricks Unity Catalog
- Metastore
- Catalog
- Schema
- Tables
- Volumes
- Views
Day 59: Databricks Unity Catalog
- Managed Tables
- External Tables
Day 60: Databricks Unity Catalog
- Managed Volumes
- External Volumes
Day 61: Databricks - Auto Loader
- Auto Loader file detection modes
- Directory Listing mode
- File Notification mode
- Schema Evolution with Auto Loader
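A hedged sketch of Auto Loader in directory-listing mode with schema tracking; the paths and options reflect assumptions, not a fixed recipe:

```python
stream = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "dbfs:/tmp/autoloader/_schema")  # where the inferred schema is tracked
    .load("abfss://<container>@<storage-account>.dfs.core.windows.net/landing/"))

(stream.writeStream
    .option("checkpointLocation", "dbfs:/tmp/autoloader/_checkpoint")
    .option("mergeSchema", "true")     # let new columns evolve into the Delta target
    .trigger(availableNow=True)
    .start("dbfs:/tmp/autoloader/bronze"))
```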
Day 62: Delta Live Tables
- Simple Declarative SQL & Python APIs
- Automated Pipeline Creation
- Data Quality Checks
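A minimal sketch of the declarative Python API with one data-quality expectation; the table name, source path, and constraint are placeholders:

```python
import dlt

@dlt.table(comment="Bronze orders ingested from cloud storage")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop rows that fail the check
def bronze_orders():
    return spark.read.format("json").load("dbfs:/tmp/landing/orders/")
```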