Learning Tracks
Structured deep-dives on LLMs, RAG, multimodal models, and more. Pick a track and start exploring.
Math Essentials
The core math behind deep learning — activation functions, softmax, loss functions, entropy, KL divergence, and probability distributions — explained visually with code and interactive plots.
PyTorch Cheatsheet
From tensors to transistors — memory layout, autograd, the compilation stack, torch.compile, and how your Python code becomes GPU microcode.
GPU Architecture & CUDA
From transistors to kernels — streaming multiprocessors, the CUDA programming model, the software stack, and the roofline model for performance analysis.
Transformers
From the attention mechanism to full encoder and decoder architectures — how transformers process sequences, why each component exists, and how to build one from scratch.
NanoGPT Speedrun
Incremental improvements that push GPT pre-training efficiency to its limits — from baseline to SOTA in hours.
Fine-tuning
From full fine-tuning to LoRA and QLoRA — how to adapt foundation models to your task, build instruction datasets, run distributed training, evaluate results, and merge models.
RLHF & Alignment
From SFT's rigidity to reinforcement learning — how PPO, DPO, and GRPO align language models with human preferences, with formulas, implementations, and the HuggingFace TRL ecosystem.
RAG Pipelines
From TF-IDF and BM25 to dense bi-encoders, hybrid fusion, rerankers, HNSW indexing, and fine-tuning with contrastive losses — everything you need to build and understand production-grade retrieval-augmented generation systems.
Inference Optimization
KV-cache, speculative decoding, quantization (GPTQ, AWQ, GGUF), continuous batching, and the engineering behind serving LLMs at scale with low latency.
Vision-Language Models
From CLIP to LLaVA — contrastive pre-training, Vision Transformers, SigLIP, DINOv2, multimodal fusion, and visual instruction tuning.
Vision-Language-Action Models
VLAs for robotics — grounding language and vision into motor policies, from OpenVLA to diffusion-based action prediction.
Audio & Omni Models
From waveforms and Fourier transforms to neural audio codecs, speech synthesis, and omni-modal models — how machines hear, speak, and reason about sound.
Image & Video Generation
From DDPM to latent diffusion, flow matching, and Diffusion Transformers — how Stable Diffusion, Flux, DALL-E, and Sora generate images and video from text.
Long Context & Memory
From the quadratic attention wall to RoPE scaling, sparse attention, and memory-augmented transformers — how models handle long sequences and remember beyond the context window.
Agents & Tool Use
From function calling and ReAct to MCP, computer use, and multi-agent orchestration — how LLMs go from generating text to taking actions in the real world.