The Data Engineering Bootcamp: Zero to Mastery
Section 00 - Introduction
The Data Engineering Bootcamp: Zero to Mastery (1:34)
Exercise: Meet Your Classmates and Instructor
Course Resources
ZTM Plugin + Understanding Your Video Player
Set Your Learning Streak Goal
Section 01 - Introduction to Data Engineering
Introduction to Data Engineering (4:16)
Who Are Data Engineers? (4:42)
Prerequisites (3:18)
Source Code for This Bootcamp (1:18)
Plan for This Bootcamp (4:37)
[Optional] What Is a Virtualenv? (6:36)
[Optional] What Is Docker? (11:02)
Section 02 - Big Data Processing with Apache Spark: Process & Analyze Real-World Airbnb Data
Introduction (4:08)
Apache Spark (3:43)
How Spark Works (4:23)
Spark Application (7:40)
DataFrames (6:42)
Installing Spark (5:50)
Inside Airbnb Data (7:01)
Writing Your First Spark Job (7:04)
Lazy Processing (2:16)
[Exercise] Basic Functions (1:28)
[Exercise] Basic Functions - Solution (6:41)
Aggregating Data (3:59)
Joining Data (4:39)
Aggregations and Joins with Spark (6:09)
Complex Data Types (5:08)
[Exercise] Aggregate Functions (0:49)
[Exercise] Aggregate Functions - Solution (5:53)
User-Defined Functions (3:25)
Data Shuffle (6:13)
Data Accumulators (3:41)
Optimizing Spark Jobs (7:38)
Submitting Spark Jobs (4:28)
Other Spark APIs (5:15)
Spark SQL (4:32)
[Exercise] Advanced Spark (2:10)
[Exercise] Advanced Spark - Solution (5:25)
Summary (3:07)
Let's Have Some Fun (+ More Resources)
Section 03 - Creating a Data Lake with AWS
Introduction (4:25)
What Is a Data Lake? (9:08)
Amazon Web Services (AWS) (7:46)
Simple Storage Service (S3) (5:44)
Setting Up an AWS Account (9:29)
Data Partitioning (3:23)
Using S3 (7:48)
EMR Serverless (2:58)
IAM Roles (2:51)
Running a Spark Job (8:48)
Parquet Data Format (7:41)
Implementing a Data Catalog (5:31)
Data Catalog Demo (6:41)
Querying a Data Lake (3:59)
Summary (3:38)
Unlimited Updates
Section 04 - Implementing Data Pipelines with Apache Airflow
Introduction (5:52)
What Is Apache Airflow? (5:18)
Airflow’s Architecture (3:14)
Installing Airflow (6:32)
Defining an Airflow DAG (8:02)
Error Handling (3:37)
Idempotent Tasks (4:53)
Creating a DAG - Part 1 (4:58)
Creating a DAG - Part 2 (4:41)
Handling Failed Tasks (4:08)
[Exercise] Data Validation (4:30)
[Exercise] Data Validation - Solution (3:26)
Spark with Airflow (3:01)
Using Spark with Airflow - Part 1 (7:38)
Using Spark with Airflow - Part 2 (5:51)
Sensors in Airflow (4:45)
Using File Sensors (4:07)
Data Ingestion (5:49)
Reading Data from Postgres - Part 1 (6:02)
Reading Data from Postgres - Part 2 (5:39)
[Exercise] Average Customer Review (3:52)
[Exercise] Average Customer Review - Solution (4:32)
Advanced DAGs (4:25)
Summary (2:26)
Course Check-In
Section 05 - Machine Learning with Spark ML: Create a Data Pipeline, Train a Model + more
Introduction (5:27)
What Is Machine Learning? (6:05)
Regression Algorithms (5:37)
Building a Regression Model (5:03)
Training a Model (9:45)
Model Evaluation (7:25)
Testing a Regression Model (3:56)
Model Lifecycle (2:11)
Feature Engineering (8:43)
Improving a Regression Model (7:33)
Machine Learning Pipelines (3:55)
Creating a Pipeline (2:40)
[Exercise] House Price Estimation (1:58)
[Exercise] House Price Estimation - Solution (3:12)
[Exercise] Imposter Syndrome (2:55)
Classification (7:36)
Classifier Evaluation (4:26)
Training a Classifier (8:30)
Hyperparameters (8:05)
Optimizing a Model (3:01)
[Exercise] Loan Approval (2:33)
[Exercise] Loan Approval - Solution (2:32)
Deep Learning (6:55)
Summary (3:23)
Implement a New Life System
Section 06 - Using AI with Data Engineering: LLMs, HuggingFace + more
Introduction (5:06)
Natural Language Processing (NLP) before LLMs (6:10)
Transformers (6:20)
Types of LLMs (7:39)
Hugging Face (2:18)
Databricks Setup (10:37)
Using an LLM (7:35)
Structured Output (3:41)
Producing JSON Output (5:09)
LLMs with Apache Spark (5:19)
Summary (2:47)
Section 07 - Real-Time Data Processing ("Stream Processing") with Apache Kafka
Introduction (6:05)
What Is Apache Kafka? (6:59)
Partitioning Data (8:55)
Kafka API (7:41)
Kafka Architecture (3:14)
Set Up Kafka (5:52)
Writing to Kafka (6:06)
Reading from Kafka (7:36)
Data Durability (6:38)
Kafka vs Queues (2:10)
[Exercise] Processing Records (3:43)
[Exercise] Processing Records - Solution (2:58)
Delivery Semantics (5:52)
Kafka Transactions (4:33)
Log Compaction (3:22)
Kafka Connect (6:58)
Using Kafka Connect (9:44)
Outbox Pattern (4:30)
Schema Registry (8:00)
Using Schema Registry (8:09)
Tiered Storage (3:27)
[Exercise] Track Order Status Changes (4:26)
[Exercise] Track Order Status Changes - Solution (5:05)
Summary (4:40)
Section 08 - Stream Processing with Apache Flink
Introduction (5:40)
What Is Apache Flink? (5:23)
Kafka Application (8:10)
Multiple Streams (3:10)
Installing Apache Flink (5:45)
Processing Individual Records (7:21)
[Exercise] Stream Processing (4:01)
[Exercise] Stream Processing - Solution (2:39)
Time Windows (6:48)
Keyed Windows (2:39)
Using Time Windows (5:17)
Watermarks (10:05)
Advanced Window Operations (6:16)
Stateful Stream Processing (7:49)
Using Local State (4:41)
[Exercise] Anomaly Detection (4:34)
[Exercise] Anomaly Detection - Solution (3:33)
Joining Streams (5:49)
Summary (3:09)
Where To Go From Here?
Thank You! (1:17)
Review This Course!
Become An Alumni
Learning Guideline
ZTM Events Every Month
LinkedIn Endorsements