AI Engineering: Customizing LLMs for Business (Fine-Tuning LLMs with QLoRA & AWS)
Introduction
Course Introduction (What We're Building) (5:19)
Exercise: Meet Your Classmates and Instructor
Course Resources
ZTM Plugin + Understanding Your Video Player
Set Your Learning Streak Goal
Setting up our AWS Account
Signing in to AWS (4:30)
Creating an IAM User (5:29)
Using our new IAM User (3:12)
What To Do In Case You Get Hacked! (1:30)
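The lessons above create an IAM user and move day-to-day work away from the root account. As a quick sanity check, here is a minimal sketch (my own illustration, not part of the course materials) of confirming which identity your configured credentials actually resolve to, using boto3's STS client:

```python
# Print the AWS account and ARN behind the currently configured credentials.
# After this section, the ARN should point at the new IAM user, not the root account.
import boto3

sts = boto3.client("sts")
identity = sts.get_caller_identity()
print("Account:", identity["Account"])
print("ARN:    ", identity["Arn"])
```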
Setting Up AWS SageMaker Environment
Creating a SageMaker Domain (2:28)
Logging in to our SageMaker Environment (4:53)
Introduction to JupyterLab (7:37)
Let's Have Some Fun (+ More Resources)
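With the SageMaker domain and JupyterLab environment running, the later training lessons lean on a handful of objects: the session, its region, the default S3 bucket, and the execution role. A minimal sketch (not the course notebook) of checking them from inside JupyterLab, assuming the sagemaker SDK that ships with SageMaker images:

```python
# Sanity-check the SageMaker session, region, default bucket, and execution role
# from a JupyterLab notebook inside the domain.
import sagemaker

session = sagemaker.Session()
role = sagemaker.get_execution_role()

print("Region:        ", session.boto_region_name)
print("Default bucket:", session.default_bucket())
print("Execution role:", role)
```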
Gathering, Chunking, Tokenizing and Uploading our Dataset
SageMaker Sessions, Regions, and IAM Roles (7:50)
Examining Our Dataset from HuggingFace (13:29)
Tokenization and Word Embeddings (9:08)
HuggingFace Authentication with SageMaker (4:21)
Applying the Templating Function to our Dataset (8:43)
Attention Masks and Padding (15:55)
Star Unpacking with Python (4:03)
Chain Iterator, List Constructor and Attention Mask example with Python (10:22)
Understanding Batching (8:11)
Slicing and Chunking our Dataset (7:31)
Creating our Custom Chunking Function (16:06)
Tokenizing our Dataset (9:30)
Running our Chunking Function (4:30)
Understanding the Entire Chunking Process (8:32)
Uploading the Training Data to AWS S3 (5:53)
Course Check-In
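The section above builds one data-prep pipeline: pull a dataset from HuggingFace, apply a prompt template, tokenize, pack the token sequences into fixed-length chunks, and upload the result to S3 for training. Below is a minimal sketch of that flow; the dataset, model ID, template, and chunk size are placeholders rather than the course's exact choices:

```python
# Rough data-prep flow: load -> template -> tokenize -> chunk -> upload to S3.
# Requires datasets, transformers, sagemaker, and s3fs (for the s3:// save path).
from itertools import chain

import sagemaker
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_ID = "meta-llama/Llama-2-7b-hf"   # placeholder; gated models need the HuggingFace login covered above
CHUNK_SIZE = 2048                       # placeholder context length

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")  # placeholder dataset
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def template(sample):
    # Toy prompt template; the course writes its own templating function.
    return {"text": f"### Instruction:\n{sample['instruction']}\n\n"
                    f"### Response:\n{sample['response']}{tokenizer.eos_token}"}

def tokenize(batch):
    # Produces input_ids and attention_mask for each templated example.
    return tokenizer(batch["text"])

templated = dataset.map(template, remove_columns=dataset.column_names)
tokenized = templated.map(tokenize, batched=True, remove_columns=["text"])

def chunk(batch):
    # Concatenate each field across the batch, then slice into CHUNK_SIZE pieces,
    # dropping the ragged remainder -- the same idea as the custom chunking lesson.
    joined = {k: list(chain(*batch[k])) for k in batch.keys()}
    total = (len(joined["input_ids"]) // CHUNK_SIZE) * CHUNK_SIZE
    return {
        k: [v[i : i + CHUNK_SIZE] for i in range(0, total, CHUNK_SIZE)]
        for k, v in joined.items()
    }

chunked = tokenized.map(chunk, batched=True)

# Upload so the SageMaker training job can read it.
session = sagemaker.Session()
train_path = f"s3://{session.default_bucket()}/llm-finetune/train"
chunked.save_to_disk(train_path)
print("Training data uploaded to:", train_path)
```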
Understanding LoRA and Setting up HuggingFace Estimator
Setting Up Hyperparameters for the Training Job (6:47)
Creating our HuggingFace Estimator in SageMaker (6:45)
Introduction to Low-Rank Adaptation (LoRA) (8:11)
LoRA Numerical Example (10:55)
LoRA Summary and Cost-Saving Calculation (9:08)
(Optional) Matrix Multiplication Refresher (4:45)
Understanding LoRA Programmatically Part 1 (12:32)
Understanding LoRA Programmatically Part 2 (5:48)
Unlimited Updates
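The LoRA lessons above come down to one identity: instead of learning a full update to a d x k weight matrix W, freeze W and learn two small matrices B (d x r) and A (r x k), with r much smaller than d and k, using W + BA at inference. A minimal numerical sketch with illustrative sizes (my own numbers, not the course's worked example):

```python
# LoRA in one calculation: a rank-r product B @ A stands in for a full update of W,
# so the trainable parameter count per layer drops from d*k to d*r + r*k.
import numpy as np

d, k, r = 4096, 4096, 8                  # illustrative projection size and LoRA rank

W = np.random.randn(d, k)                # frozen pretrained weight
A = np.random.randn(r, k) * 0.01         # trainable
B = np.zeros((d, r))                     # trainable, zero-initialized so B @ A starts as a no-op

W_effective = W + B @ A                  # the weight actually used after fine-tuning

full_params = d * k                      # 16,777,216 if we fine-tuned W directly
lora_params = d * r + r * k              # 65,536 trained by LoRA
print(f"Full fine-tune: {full_params:,} params per layer")
print(f"LoRA (r={r}):   {lora_params:,} params per layer")
print(f"~{full_params // lora_params}x fewer trainable parameters")
```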
Improving Training Speed with Bfloat16
Bfloat16 vs Float32 (8:10)
Comparing Bfloat16 vs Float32 Programmatically (6:32)
Implement a New Life System
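The bfloat16 lessons above compare number formats: bfloat16 keeps float32's eight exponent bits, so it covers the same range, but stores only seven mantissa bits, so values are coarser, and it uses half the memory. A minimal PyTorch sketch of that trade-off (illustrative values, not the course notebook):

```python
# bfloat16 vs float32 vs float16: same range as float32, less precision, half the bytes.
import torch

x = torch.tensor([1.0000001, 3.0e38], dtype=torch.float32)

print("float32 :", x, "|", x.element_size(), "bytes/element")
print("bfloat16:", x.to(torch.bfloat16), "|", x.to(torch.bfloat16).element_size(), "bytes/element")
print("float16 :", x.to(torch.float16))   # 3.0e38 overflows to inf; float16 tops out near 65504
```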
Setting up the QLoRA Training Script with Mixed Precision & Double Quantization
Setting up Imports and Libraries for the Train Script (7:19)
Argument Parsing Function Part 1 (7:56)
Argument Parsing Function Part 2 (10:54)
Understanding Trainable Parameter Caveats (14:30)
Introduction to Quantization (7:35)
Identifying Trainable Layers for LoRA (7:19)
Setting up Parameter-Efficient Fine-Tuning (4:36)
Implement LoRA Configuration and Mixed Precision Training (10:34)
Understanding Double Quantization (4:21)
Creating the Training Function Part 1 (14:14)
Creating the Training Function Part 2 (7:16)
Exercise: Imposter Syndrome (2:55)
Finishing our SageMaker Script (5:09)
Gaining Access to Powerful GPUs with AWS Quotas (5:10)
Final Fixes Before Training (3:54)
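The training-script section above wires together three ideas: loading the base model in 4-bit with double quantization (QLoRA), attaching a LoRA adapter to selected layers, and training in bfloat16 mixed precision. A minimal sketch of those pieces with transformers, bitsandbytes, and peft; the model ID, target modules, and hyperparameters are placeholders rather than the course's exact values:

```python
# QLoRA building blocks: 4-bit double-quantized base model + LoRA adapter + bf16 training args.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_ID = "meta-llama/Llama-2-7b-hf"          # placeholder model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,            # double quantization: quantize the quantization constants too
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,     # compute in bf16 while weights sit in 4 bit
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],       # assumption: which layers to adapt varies by model
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()             # shows how few parameters LoRA actually trains

training_args = TrainingArguments(             # passed to a Trainer in the full training script
    output_dir="/opt/ml/model",                # where SageMaker collects model artifacts
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    bf16=True,                                 # bfloat16 mixed precision
    num_train_epochs=1,
    logging_steps=10,
)
```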
Running our Fine-Tuning Script for our LLM
Starting our Training Job (7:15)
Inspecting the Results of our Training Job and Monitoring with CloudWatch (11:23)
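Launching the job ties the earlier pieces together: the training script becomes the entry point of a SageMaker HuggingFace Estimator, which runs it on a GPU instance against the data uploaded to S3 and streams logs to CloudWatch. A minimal sketch; the instance type, framework versions, and hyperparameters are placeholders, not the course's exact configuration:

```python
# Launch the QLoRA training script as a SageMaker training job.
import sagemaker
from sagemaker.huggingface import HuggingFace

session = sagemaker.Session()
role = sagemaker.get_execution_role()

hyperparameters = {
    "model_id": "meta-llama/Llama-2-7b-hf",    # placeholder
    "epochs": 1,
    "per_device_train_batch_size": 2,
    "lr": 2e-4,
}

estimator = HuggingFace(
    entry_point="train.py",                    # the training script built in the previous section
    source_dir="scripts",                      # placeholder folder holding the script and requirements
    instance_type="ml.g5.4xlarge",             # assumption: needs the GPU quota increase covered above
    instance_count=1,
    role=role,
    transformers_version="4.28",               # placeholder framework versions
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters=hyperparameters,
)

# Start training against the chunked dataset uploaded to S3; logs also appear in CloudWatch.
estimator.fit({"training": f"s3://{session.default_bucket()}/llm-finetune/train"})
```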
Deploying our Fine-Tuned LLM
Deploying our LLM to a SageMaker Endpoint (17:57)
Testing our LLM in SageMaker Locally (8:18)
Creating the Lambda Function to Invoke our Endpoint (8:55)
Creating API Gateway to Deploy the Model Through the Internet (2:36)
Implementing our Streamlit App (5:11)
Streamlit App Correction (3:26)
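The deployment section exposes the fine-tuned model over the internet as API Gateway -> Lambda -> SageMaker endpoint. A minimal sketch of the Lambda handler in that chain; the endpoint name and payload shape are assumptions, not the course's exact code:

```python
# Lambda handler that forwards a prompt from API Gateway to the SageMaker endpoint.
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "my-finetuned-llm-endpoint")  # placeholder name

def lambda_handler(event, context):
    body = json.loads(event["body"])                        # API Gateway proxy integration payload
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": body["prompt"]}),        # assumption: the endpoint accepts this shape
    )
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```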
Cleaning up Resources
Congratulations and Cleaning up AWS Resources (2:38)
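Cleanup is mostly about stopping the hourly charges for the real-time endpoint. A minimal sketch (resource names are placeholders) of deleting the endpoint, its configuration, and the model with boto3:

```python
# Tear down the billable inference resources created during deployment.
import boto3

sm = boto3.client("sagemaker")
sm.delete_endpoint(EndpointName="my-finetuned-llm-endpoint")                      # placeholder names
sm.delete_endpoint_config(EndpointConfigName="my-finetuned-llm-endpoint-config")
sm.delete_model(ModelName="my-finetuned-llm-model")
```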
Where To Go From Here?
Thank You! (1:17)
Review This Course!
Become An Alumni
Learning Guideline
ZTM Events Every Month
LinkedIn Endorsements