AI Engineering: Customizing LLMs for Business (Fine-Tuning LLMs with QLoRA & AWS)
Introduction
Course Introduction (What We're Building) (5:19)
Exercise: Meet Your Classmates and Instructor
Course Resources
ZTM Plugin + Understanding Your Video Player
Set Your Learning Streak Goal
Setting up our AWS Account
Signing in to AWS (4:30)
Creating an IAM User (5:29)
Using our new IAM User (3:12)
What To Do In Case You Get Hacked! (1:30)
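The IAM lectures above work through the AWS console; for reference, here is a minimal boto3 sketch of the same setup. The user name "ztm-admin" and the broad AdministratorAccess policy are illustrative assumptions, not the course's exact choices.

```python
# A minimal boto3 sketch of the console steps above. The user name
# "ztm-admin" and the AdministratorAccess policy are illustrative assumptions.
import boto3

iam = boto3.client("iam")

# Create an IAM user so we stop using the root account day to day.
iam.create_user(UserName="ztm-admin")

# Attach a managed policy; scope this down for real workloads.
iam.attach_user_policy(
    UserName="ztm-admin",
    PolicyArn="arn:aws:iam::aws:policy/AdministratorAccess",
)

# Give the user a console password so it can sign in to the AWS console.
iam.create_login_profile(
    UserName="ztm-admin",
    Password="REPLACE-WITH-A-STRONG-PASSWORD",
    PasswordResetRequired=True,
)
```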
Setting Up AWS SageMaker Environment
Creating a SageMaker Domain (2:28)
Logging in to our SageMaker Environment (4:53)
Introduction to JupyterLab (7:37)
Let's Have Some Fun (+ More Resources)
Gathering, Chunking, Tokenizing, and Uploading our Dataset
SageMaker Sessions, Regions, and IAM Roles (7:50)
Examining Our Dataset from HuggingFace (13:29)
Tokenization and Word Embeddings (9:08)
HuggingFace Authentication with SageMaker (4:21)
Applying the Templating Function to our Dataset (8:43)
Attention Masks and Padding (15:55)
Star Unpacking with Python (4:03)
Chain Iterator, List Constructor and Attention Mask example with Python (10:22)
Understanding Batching (8:11)
Slicing and Chunking our Dataset (7:31)
Creating our Custom Chunking Function (16:06)
Tokenizing our Dataset (9:30)
Running our Chunking Function (4:30)
Understanding the Entire Chunking Process (8:32)
Uploading the Training Data to AWS S3 (5:53)
Course Check-In
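As a companion to the dataset lectures above, here is a condensed sketch of the tokenize-chunk-upload pipeline. The model id, dataset, field name, chunk size, and bucket path are illustrative assumptions; the course's templating and chunking functions are more thorough.

```python
# Condensed sketch of the pipeline this section builds. Model id, dataset,
# field name, chunk size, and bucket path are illustrative assumptions.
from itertools import chain

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")  # assumed dataset

chunk_size = 2048  # tokens per training example

def tokenize(batch):
    # The lectures apply a templating function first; we tokenize one
    # field here for brevity.
    return tokenizer(batch["instruction"], truncation=False)

def chunk(batch):
    # Flatten every column with a chain iterator, then slice the stream
    # into fixed-size blocks -- the custom chunking function idea.
    concatenated = {k: list(chain(*batch[k])) for k in batch.keys()}
    total = (len(concatenated["input_ids"]) // chunk_size) * chunk_size
    result = {
        k: [v[i : i + chunk_size] for i in range(0, total, chunk_size)]
        for k, v in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()  # causal LM: labels mirror inputs
    return result

lm_dataset = (
    dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
           .map(chunk, batched=True)
)

# Persist to S3 for the training job (requires s3fs; bucket is a placeholder).
lm_dataset.save_to_disk("s3://YOUR-BUCKET/processed/train")
```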
Understanding LoRA and Setting up HuggingFace Estimator
Setting Up Hyperparameters for the Training Job (6:47)
Creating our HuggingFace Estimator in SageMaker (6:45)
Introduction to Low-Rank Adaptation (LoRA) (8:11)
LoRA Numerical Example (10:55)
LoRA Summary and Cost-Saving Calculation (9:08)
(Optional) Matrix Multiplication Refresher (4:45)
Understanding LoRA Programmatically Part 1 (12:32)
Understanding LoRA Programmatically Part 2 (5:48)
Unlimited Updates
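For quick reference, here is a worked version of the LoRA arithmetic covered in the lectures above, using NumPy and illustrative layer sizes; the real dimensions depend on the model's layers.

```python
# Worked LoRA arithmetic with illustrative sizes; real dimensions
# depend on the model's layers.
import numpy as np

d, k, r = 4096, 4096, 8            # frozen weight is d x k; LoRA rank r
alpha = 16                         # LoRA scaling hyperparameter

W = np.random.randn(d, k)          # pretrained weight, frozen during training
A = np.random.randn(r, k) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # trainable; zero-init so the update starts at 0

# Effective weight in the forward pass: W' = W + (alpha / r) * B @ A
W_prime = W + (alpha / r) * (B @ A)

full = W.size                      # params a full fine-tune would update
lora = A.size + B.size             # params LoRA actually trains
print(f"full fine-tune:     {full:,}")           # 16,777,216
print(f"LoRA (r={r}):        {lora:,}")          # 65,536
print(f"trainable fraction: {lora / full:.4%}")  # 0.3906%
```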
Improving Training Speed with Bfloat16
Bfloat16 vs Float32 (8:10)
Comparing Bfloat16 vs Float32 Programmatically (6:32)
Implement a New Life System
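A small PyTorch sketch of the trade-off this section compares: bfloat16 keeps float32's 8 exponent bits (so the same range) but only 7 mantissa bits, so precision drops.

```python
# bfloat16 keeps float32's 8 exponent bits but only 7 mantissa bits,
# so range is preserved while precision drops.
import torch

print(torch.finfo(torch.float32).eps)   # ~1.19e-07: fine-grained precision
print(torch.finfo(torch.bfloat16).eps)  # ~7.81e-03: much coarser

pi = torch.tensor(3.141592653589793, dtype=torch.float32)
print(pi.to(torch.bfloat16).item())     # 3.140625 -- precision loss

# Range survives: this value overflows float16 but not bfloat16.
big = torch.tensor(1e38, dtype=torch.float32)
print(big.to(torch.bfloat16))           # finite
print(big.to(torch.float16))            # inf (float16 max is ~65504)
```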
Setting up the QLoRA Training Script with Mixed Precision & Double Quantization
Setting up Imports and Libraries for the Train Script (7:19)
Argument Parsing Function Part 1 (7:56)
Argument Parsing Function Part 2 (10:54)
Understanding Trainable Parameters Caveats (14:30)
Introduction to Quantization (7:35)
Identifying Trainable Layers for LoRA (7:19)
Setting up Parameter-Efficient Fine-Tuning (4:36)
Implement LoRA Configuration and Mixed Precision Training (10:34)
Understanding Double Quantization (4:21)
Creating the Training Function Part 1 (14:14)
Creating the Training Function Part 2 (7:16)
Exercise: Imposter Syndrome (2:55)
Finishing our SageMaker Script (5:09)
Gaining Access to Powerful GPUs with AWS Quotas (5:10)
Final Fixes Before Training (3:54)
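The training-script lectures above assemble quantization, LoRA, and mixed precision piece by piece; below is a condensed sketch of how those parts fit together with transformers and peft. The model id and hyperparameter values are illustrative assumptions, not the course's exact values.

```python
# Condensed sketch of the QLoRA setup this section builds. Model id and
# hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit base weights
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
    bnb_4bit_compute_dtype=torch.bfloat16,  # mixed precision for the matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # assumption: swap in the course's model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # trainable layers, identified as in the lectures
    bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # the "trainable parameters caveats" check
```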
Running our Fine-Tuning Script for our LLM
Starting our Training Job (7:15)
Inspecting the Results of our Training Job and Monitoring with CloudWatch (11:23)
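A hedged sketch of launching the job with the SageMaker HuggingFace estimator, as this section does; the entry point, source directory, instance type, container versions, and hyperparameters are illustrative assumptions.

```python
# Sketch of launching the training job; script name, instance type,
# versions, and hyperparameters are illustrative assumptions.
import sagemaker
from sagemaker.huggingface import HuggingFace

sess = sagemaker.Session()
role = sagemaker.get_execution_role()  # resolves inside SageMaker

hyperparameters = {
    "model_id": "meta-llama/Llama-2-7b-hf",  # assumed model
    "epochs": 1,
    "per_device_train_batch_size": 2,
    "lr": 2e-4,
}

estimator = HuggingFace(
    entry_point="train.py",          # the script the previous section wrote
    source_dir="scripts",            # assumed layout
    instance_type="ml.g5.4xlarge",   # the GPU class the quota lecture unlocks
    instance_count=1,
    role=role,
    transformers_version="4.28",     # pick versions the SageMaker DLCs support
    pytorch_version="2.0",
    py_version="py310",
    hyperparameters=hyperparameters,
)

# Kick off training; logs stream to the notebook and to CloudWatch.
estimator.fit({"training": f"s3://{sess.default_bucket()}/processed/train"})
```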
Deploying our Fine-Tuned LLM
Deploying our LLM to a SageMaker Endpoint (17:57)
Testing our LLM in SageMaker Locally (8:18)
Creating the Lambda Function to Invoke our Endpoint (8:55)
Creating an API Gateway to Deploy the Model over the Internet (2:36)
Implementing our Streamlit App (5:11)
Streamlit App Correction (3:26)
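The deployment lectures wire API Gateway to a Lambda function that invokes the SageMaker endpoint; here is a minimal sketch of that handler pattern, with placeholder names throughout.

```python
# Minimal Lambda handler pattern: API Gateway -> Lambda -> SageMaker endpoint.
# The endpoint name and payload shape are placeholders.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "YOUR-ENDPOINT-NAME"  # set via an environment variable in practice

def lambda_handler(event, context):
    body = json.loads(event["body"])  # API Gateway proxy integration payload
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps({"inputs": body["prompt"]}),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}
```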
Cleaning up Resources
Congratulations and Cleaning up AWS Resources (2:38)
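A sketch of the teardown step with boto3; all names are placeholders, and the lecture also covers console cleanup. A running endpoint bills continuously, so this step matters.

```python
# Teardown sketch: delete the endpoint so it stops accruing charges.
# All names are placeholders; S3 data and the SageMaker domain are
# cleaned up separately in the console.
import boto3

sm = boto3.client("sagemaker")
sm.delete_endpoint(EndpointName="YOUR-ENDPOINT-NAME")
sm.delete_endpoint_config(EndpointConfigName="YOUR-ENDPOINT-CONFIG")
sm.delete_model(ModelName="YOUR-MODEL-NAME")
```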
Where To Go From Here?
Thank You! (1:17)
Review This Course!
Become An Alumni
Learning Guideline
ZTM Events Every Month
LinkedIn Endorsements