Learning Tracks

Structured deep-dives on LLMs, RAG, multimodal models, and more. Pick a track and start exploring.


Math Essentials

The core math behind deep learning — activation functions, softmax, loss functions, entropy, and KL divergence — explained visually with code.

ReLU · Softmax · Cross-Entropy · KL Divergence · Sigmoid · Loss Functions
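As a taste of what this track covers, the softmax and cross-entropy it mentions fit in a few lines of plain Python (an illustrative sketch; the function names are ours, not from the track):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    # so large logits cannot overflow exp().
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # Negative log-likelihood of the true class under the predicted distribution.
    return -math.log(probs[target_index])

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])       # probabilities sum to 1
print(round(cross_entropy(probs, 0), 3))  # low loss when the true class dominates
```

The loss shrinks as the model puts more probability on the correct class, which is exactly the gradient signal classification training runs on.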

PyTorch Cheatsheet

From tensors to transistors — memory layout, autograd, the compilation stack, torch.compile, and how your Python code becomes GPU microcode.

PyTorch · Tensors · Autograd · CUDA · Triton · torch.compile · cuBLAS

GPU Architecture & CUDA

From transistors to kernels — streaming multiprocessors, the CUDA programming model, the software stack, and the roofline model for performance analysis.

GPU · CUDA · SM · Tensor Core · Roofline · cuBLAS · Warp · SASS

Transformers

From the attention mechanism to full encoder and decoder architectures — how transformers process sequences, why each component exists, and how to build one from scratch.

Attention · Multi-Head · Encoder · Decoder · BERT · GPT · Pre-training · SFT

NanoGPT Speedrun

Incremental improvements that push GPT pre-training efficiency to its limits — from baseline to SOTA in hours.

GPT-2 · Pre-training · Optimization · CUDA

Fine-tuning

From full fine-tuning to LoRA and QLoRA — how to adapt foundation models to your task, build instruction datasets, run distributed training, evaluate results, and merge models.

LoRA · QLoRA · PEFT · SFT · Instruction Tuning · DeepSpeed · Model Merging · Evaluation
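The core of LoRA, which this track builds up to, is a small idea: keep the pretrained weight W frozen and learn a low-rank update (alpha/r) · B·A on top of it. A toy sketch with hypothetical shapes (pure Python, no training loop):

```python
def matmul(A, B):
    # Minimal dense matrix multiply for the sketch.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

d_out, d_in, r, alpha = 4, 4, 2, 4          # illustrative sizes; r << d
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen
A = [[0.1] * d_in for _ in range(r)]        # trainable, shape r x d_in
B = [[0.0] * r for _ in range(d_out)]       # trainable, zero-init so the
                                            # adapter starts as a no-op

delta = matmul(B, A)                        # d_out x d_in, but rank <= r
W_eff = [[w + (alpha / r) * d for w, d in zip(wr, dr)]
         for wr, dr in zip(W, delta)]
print(W_eff == W)  # True at init: B = 0 means W is unchanged before training
```

Only A and B receive gradients, which is why LoRA trains a tiny fraction of the parameters of full fine-tuning and why adapters can later be merged back into W.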

RLHF & Alignment

From SFT's rigidity to reinforcement learning — how PPO, DPO, and GRPO align language models with human preferences, with formulas, implementations, and the HuggingFace TRL ecosystem.

SFT · PPO · RLHF · DPO · GRPO · Reward Model · Alignment · TRL

RAG Pipelines

From TF-IDF and BM25 to dense bi-encoders, hybrid fusion, rerankers, HNSW indexing, and fine-tuning with contrastive losses — everything you need to build and understand production-grade retrieval-augmented generation systems.

BM25 · SPLADE · Dense Retrieval · ColBERT · RRF · Rerankers · HNSW · Fine-tuning
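The RRF tag above refers to reciprocal rank fusion, the standard trick for merging a lexical ranking with a dense one. It is small enough to sketch directly (document IDs and k=60 are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document's fused score is the sum of 1 / (k + rank) over every
    # ranked list it appears in; k damps the influence of top positions.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25  = ["d1", "d2", "d3"]   # lexical ranking
dense = ["d3", "d1", "d4"]   # embedding ranking
print(reciprocal_rank_fusion([bm25, dense]))  # d1 wins: ranked high by both
```

Because RRF only looks at ranks, not raw scores, it needs no score normalization between the two retrievers, which is why hybrid pipelines reach for it first.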

Inference Optimization

KV-cache, speculative decoding, quantization (GPTQ, AWQ, GGUF), continuous batching, and the engineering behind serving LLMs at scale with low latency.

KV-Cache · Quantization · Speculative · Batching
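The KV-cache idea can be shown in miniature: at each decoding step only the new token's key and value are computed and appended, while attention reads the whole cache. A toy single-head sketch (hand-picked 2-d vectors, not a production implementation):

```python
import math

def attend(q, keys, values):
    # Scaled dot-product attention for a single query over the cached pairs.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    # Weighted sum of cached values.
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]

# Decode loop: per new token, compute one (k, v) pair and append it,
# instead of re-projecting the entire prefix at every step.
k_cache, v_cache = [], []
for q, k, v in [([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
                ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])]:
    k_cache.append(k)
    v_cache.append(v)
    out = attend(q, k_cache, v_cache)
print(len(k_cache))  # cache holds one (k, v) entry per generated token
```

Trading this memory for recomputation is what turns per-token decoding cost from quadratic in sequence length into linear, and managing that memory is where quantization and batching enter.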

Vision-Language Models

From CLIP to LLaVA — contrastive pre-training, Vision Transformers, SigLIP, DINOv2, multimodal fusion, and visual instruction tuning.

CLIP · SigLIP · DINOv2 · ViT · LLaVA · Multimodal · VQA · Contrastive

Vision-Language-Action Models

VLAs for robotics — grounding language and vision into motor policies, from OpenVLA to diffusion-based action prediction.

Robotics · OpenVLA · Motor Policy · Embodied AI

Diffusion Models

Score-based generative models, DDPM, DDIM, classifier-free guidance, and latent diffusion — the architecture behind modern image and video generation.

DDPM · Latent Diffusion · CFG · Image Gen

Mixture of Experts

Sparse MoE layers, routing algorithms, load balancing, and how models like Mixtral (and, reportedly, GPT-4) scale to hundreds of billions of parameters efficiently.

MoE · Routing · Mixtral · Sparse

State Space Models

Mamba, S4, and the family of structured SSMs that achieve linear-time sequence modelling as a competitive alternative to the Transformer attention mechanism.

Mamba · S4 · Linear-Time · SSM

Agents & Tool Use

LLM-powered agents, function calling, ReAct, multi-agent systems, and the infrastructure for autonomous task execution.

ReAct · Function Calling · Multi-Agent · Autonomy

Benchmarks & Evaluation

MMLU, HumanEval, HELM, lm-evaluation-harness, and the methodology behind measuring, comparing, and stress-testing language model capabilities.

MMLU · HumanEval · HELM · Evaluation