Learning Tracks
Structured deep-dives on LLMs, RAG, multimodal models, and more. Pick a track and start exploring.
Math Essentials
The core math behind deep learning — activation functions, softmax, loss functions, entropy, and KL divergence — explained visually with code.
PyTorch Cheatsheet
From tensors to transistors — memory layout, autograd, the compilation stack, torch.compile, and how your Python code becomes GPU microcode.
GPU Architecture & CUDA
From transistors to kernels — streaming multiprocessors, the CUDA programming model, the software stack, and the roofline model for performance analysis.
Transformers
From the attention mechanism to full encoder and decoder architectures — how transformers process sequences, why each component exists, and how to build one from scratch.
NanoGPT Speedrun
Incremental improvements that push GPT pre-training efficiency to its limits — from baseline to SOTA in hours.
Fine-tuning
From full fine-tuning to LoRA and QLoRA — how to adapt foundation models to your task, build instruction datasets, run distributed training, evaluate results, and merge models.
RLHF & Alignment
From SFT's rigidity to reinforcement learning — how PPO, DPO, and GRPO align language models with human preferences, with formulas, implementations, and the Hugging Face TRL ecosystem.
RAG Pipelines
From TF-IDF and BM25 to dense bi-encoders, hybrid fusion, rerankers, HNSW indexing, and fine-tuning with contrastive losses — everything you need to build and understand production-grade retrieval-augmented generation systems.
Inference Optimization
KV-cache, speculative decoding, quantization (GPTQ, AWQ, GGUF), continuous batching, and the engineering behind serving LLMs at scale with low latency.
Vision-Language Models
From CLIP to LLaVA — contrastive pre-training, Vision Transformers, SigLIP, DINOv2, multimodal fusion, and visual instruction tuning.
Vision-Language-Action Models
VLAs for robotics — grounding language and vision into motor policies, from OpenVLA to diffusion-based action prediction.
Diffusion Models
Score-based generative models, DDPM, DDIM, classifier-free guidance, and latent diffusion — the architecture behind modern image and video generation.
Mixture of Experts
Sparse MoE layers, routing algorithms, load balancing, and how models like Mixtral and GPT-4 scale to hundreds of billions of parameters efficiently.
State Space Models
Mamba, S4, and the family of structured SSMs that achieve linear-time sequence modeling as a competitive alternative to Transformer attention.
Agents & Tool Use
LLM-powered agents, function calling, ReAct, multi-agent systems, and the infrastructure for autonomous task execution.
Benchmarks & Evaluation
MMLU, HumanEval, HELM, lm-evaluation-harness, and the methodology behind measuring, comparing, and stress-testing language model capabilities.