Learning Tracks

Structured deep-dives on LLMs, RAG, multimodal models, and more. Pick a track and start exploring.

NanoGPT Speedrun

Incremental improvements that push GPT pre-training efficiency to its limits — from baseline to SOTA in hours.

GPT-2 · Pre-training · Optimization · CUDA
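
A taste of the kind of change this track walks through, as a minimal sketch: compiling the model and training in bfloat16 autocast. The tiny MLP is purely a placeholder for GPT-2; real speedruns stack many more levers (fused optimizers, schedules, architecture tweaks).

```python
import torch
import torch.nn.functional as F

# Placeholder "model": a tiny MLP standing in for GPT-2.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU(), torch.nn.Linear(256, 50257))
model = torch.compile(model)  # kernel fusion, less Python overhead
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    with torch.autocast("cpu", dtype=torch.bfloat16):  # use "cuda" on GPU
        loss = F.cross_entropy(model(x), y)            # forward in low precision
    loss.backward()                                    # backward in full precision
    opt.step()
    opt.zero_grad()
    return loss

print(train_step(torch.randn(8, 256), torch.randint(0, 50257, (8,))))
```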

RAG Pipelines

From TF-IDF and BM25 to dense bi-encoders, hybrid fusion, rerankers, HNSW indexing, and fine-tuning with contrastive losses — everything you need to build and understand production-grade retrieval-augmented generation systems.

BM25 · SPLADE · Dense Retrieval · ColBERT · RRF · Rerankers · HNSW · Fine-tuning
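
As a flavour of the hybrid-fusion step this track covers, here is a minimal reciprocal rank fusion (RRF) sketch; the two input rankings are made-up placeholders.

```python
# Reciprocal Rank Fusion: each retriever contributes 1 / (k + rank) per
# document; k = 60 is the constant from the original RRF paper.

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical_ranking = ["d3", "d1", "d7"]  # e.g. from BM25
dense_ranking = ["d1", "d7", "d3"]    # e.g. from a bi-encoder
print(rrf_fuse([lexical_ranking, dense_ranking]))  # ['d1', 'd3', 'd7']
```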

Decoder Models

Autoregressive language models from the ground up — architecture, attention, decoding strategies, and scaling laws.

Transformers · Attention · Autoregressive · Scaling
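
The core decoding loop this track builds up to, sketched with a stub model that returns random logits over a 100-token vocabulary (an assumption; any decoder fits the same loop).

```python
import torch

def toy_model(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for a real decoder: (batch, vocab) logits for the last position.
    return torch.randn(tokens.shape[0], 100)

@torch.no_grad()
def sample(prompt: torch.Tensor, max_new_tokens: int = 8, temperature: float = 0.8) -> torch.Tensor:
    tokens = prompt
    for _ in range(max_new_tokens):
        logits = toy_model(tokens) / temperature              # sharpen or flatten the distribution
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample one token per sequence
        tokens = torch.cat([tokens, next_token], dim=1)       # append and feed back in
    return tokens

print(sample(torch.zeros(1, 4, dtype=torch.long)))
```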

Encoder Fine-tuning

BERT-style bidirectional models, masked language modelling, and efficient fine-tuning techniques like LoRA and adapters.

BERT · LoRA · Adapters · MLM
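
A minimal LoRA layer as a sketch, using the paper's usual names (rank r, scale alpha): the frozen base weight is augmented with a trainable low-rank update B @ A, with B zero-initialised so training starts from the pretrained behaviour.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)                          # freeze the pretrained layer
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768)
print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```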

Vision-Language Models

Multimodal architectures that align visual and textual representations — CLIP, contrastive pre-training, and VQA.

CLIP · Multimodal · VQA · Contrastive
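
The symmetric contrastive objective behind CLIP-style pre-training, sketched with random placeholder embeddings: matched image/text pairs are pulled together, all other pairs in the batch pushed apart.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(len(logits))         # the diagonal holds the true pairs
    # Cross-entropy in both directions: image-to-text and text-to-image.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

print(clip_loss(torch.randn(8, 512), torch.randn(8, 512)))
```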

Vision-Language-Action Models

VLAs for robotics — grounding language and vision into motor policies, from OpenVLA to diffusion-based action prediction.

Robotics · OpenVLA · Motor Policy · Embodied AI
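
A deliberately toy sketch of the grounding idea: pooled vision and language features fused and mapped to a continuous action vector (here a 7-DoF command). Every module and dimension is a placeholder, not how OpenVLA or diffusion policies actually work.

```python
import torch
import torch.nn as nn

class ToyActionHead(nn.Module):
    def __init__(self, vision_dim: int = 768, text_dim: int = 768, action_dim: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim + text_dim, 512), nn.GELU(),
            nn.Linear(512, action_dim),  # continuous motor command
        )

    def forward(self, vision_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([vision_feat, text_feat], dim=-1))

head = ToyActionHead()
print(head(torch.randn(1, 768), torch.randn(1, 768)).shape)  # torch.Size([1, 7])
```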

RLHF & Alignment

Reinforcement learning from human feedback, reward modelling, PPO, DPO, and the techniques that align language models with human preferences.

PPO · DPO · Reward Model · Alignment
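
The DPO objective in a few lines, sketched with placeholder values: inputs are the summed token log-probs of the chosen and rejected responses under the policy and under a frozen reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1) -> torch.Tensor:
    # How much more the policy (vs. the reference) likes each response.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Push the chosen response's margin above the rejected one's.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```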

Agents & Tool Use

LLM-powered agents, function calling, ReAct, multi-agent systems, and the infrastructure for autonomous task execution.

ReAct · Function Calling · Multi-Agent · Autonomy
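
A minimal tool-dispatch step of the kind agents are built on; the tool registry and the stubbed model output are assumptions for the sketch. In a real agent the JSON call comes from the LLM and the result is fed back for the next reasoning step.

```python
import json

# Hypothetical tool registry.
TOOLS = {
    "get_weather": lambda city: f"18°C and cloudy in {city}",
    "add": lambda a, b: a + b,
}

def dispatch(call_json: str) -> str:
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]             # look up the requested tool
    return str(fn(**call["arguments"]))  # run it with the model-supplied args

# Stubbed model output; a real one would come from function-calling decoding.
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch(model_output))
```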

Diffusion Models

Score-based generative models, DDPM, DDIM, classifier-free guidance, and latent diffusion — the architecture behind modern image and video generation.

DDPM · Latent Diffusion · CFG · Image Gen
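
Two one-liners at this track's core, sketched with placeholder tensors: the DDPM forward (noising) process and the classifier-free guidance combination of conditional and unconditional noise predictions.

```python
import torch

def ddpm_forward(x0: torch.Tensor, alpha_bar_t: float) -> torch.Tensor:
    """q(x_t | x_0): interpolate between the clean image and Gaussian noise."""
    eps = torch.randn_like(x0)
    return (alpha_bar_t ** 0.5) * x0 + ((1 - alpha_bar_t) ** 0.5) * eps

def cfg(eps_uncond: torch.Tensor, eps_cond: torch.Tensor, w: float = 7.5) -> torch.Tensor:
    """Classifier-free guidance: push the prediction toward the condition."""
    return eps_uncond + w * (eps_cond - eps_uncond)

x0 = torch.randn(1, 3, 64, 64)  # placeholder "image"
xt = ddpm_forward(x0, alpha_bar_t=0.5)
print(xt.shape, cfg(torch.randn_like(xt), torch.randn_like(xt)).shape)
```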

Mixture of Experts

Sparse MoE layers, routing algorithms, load balancing, and how models like Mixtral and GPT-4 scale to hundreds of billions of parameters efficiently.

MoE · Routing · Mixtral · Sparse
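
Top-k token routing in miniature: a small gate picks k experts per token and their outputs are combined with softmax weights. The gate, expert MLPs, and sizes are placeholders; real systems add load-balancing losses on top.

```python
import torch
import torch.nn as nn

num_experts, k, d = 8, 2, 64
gate = nn.Linear(d, num_experts)
experts = nn.ModuleList([nn.Linear(d, d) for _ in range(num_experts)])

def moe_forward(x: torch.Tensor) -> torch.Tensor:     # x: (tokens, d)
    logits = gate(x)                                  # (tokens, num_experts)
    weights, idx = torch.topk(logits, k, dim=-1)      # keep the k best experts per token
    weights = torch.softmax(weights, dim=-1)          # renormalise over the chosen k
    out = torch.zeros_like(x)
    for slot in range(k):
        for e in range(num_experts):
            mask = idx[:, slot] == e                  # tokens routed to expert e in this slot
            if mask.any():
                out[mask] += weights[mask, slot].unsqueeze(-1) * experts[e](x[mask])
    return out

print(moe_forward(torch.randn(16, d)).shape)  # torch.Size([16, 64])
```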

State Space Models

Mamba, S4, and the family of structured SSMs that achieve linear-time sequence modelling as a competitive alternative to the Transformer attention mechanism.

Mamba · S4 · Linear-Time · SSM
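
The linear recurrence these models are built on, sketched step by step: h_t = A h_{t-1} + B x_t, y_t = C h_t. Real SSMs replace this Python loop with parallel scans or convolutions; all sizes here are placeholders.

```python
import torch

d_state, seq_len = 16, 32
A = torch.rand(d_state) * 0.9  # stable diagonal state matrix
B = torch.randn(d_state)
C = torch.randn(d_state)

def ssm_scan(x: torch.Tensor) -> torch.Tensor:  # x: (seq_len,)
    h = torch.zeros(d_state)
    ys = []
    for x_t in x:            # one pass over the sequence: linear time, unlike attention
        h = A * h + B * x_t  # update the hidden state
        ys.append(C @ h)     # read out a scalar per step
    return torch.stack(ys)

print(ssm_scan(torch.randn(seq_len)).shape)  # torch.Size([32])
```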

Benchmarks & Evaluation

MMLU, HumanEval, HELM, lm-evaluation-harness, and the methodology behind measuring, comparing, and stress-testing language model capabilities.

MMLU · HumanEval · HELM · Evaluation
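
A bare-bones multiple-choice scorer, the kernel of what harnesses like lm-evaluation-harness automate; the two questions and the stubbed model are placeholders.

```python
# Toy MMLU-style items: question, choices, index of the correct answer.
dataset = [
    {"question": "2 + 2 = ?", "choices": ["3", "4", "5", "6"], "answer": 1},
    {"question": "Capital of France?", "choices": ["Lyon", "Nice", "Paris", "Lille"], "answer": 2},
]

def toy_model(question: str, choices: list[str]) -> int:
    # Stub: a real harness would score each choice by log-likelihood
    # and return the argmax.
    return 0

correct = sum(toy_model(ex["question"], ex["choices"]) == ex["answer"] for ex in dataset)
print(f"accuracy: {correct / len(dataset):.2f}")  # 0.00 for this stub
```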

Inference Optimization

KV-cache, speculative decoding, quantization (GPTQ, AWQ, GGUF), continuous batching, and the engineering behind serving LLMs at scale with low latency.

KV-Cache · Quantization · Speculative · Batching
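
The KV-cache idea in miniature: keys and values for past tokens are stored once, so each decode step attends over the cache instead of re-encoding the whole prefix. Shapes here (one head, d = 64) are placeholders.

```python
import torch

d = 64
k_cache = torch.empty(0, d)
v_cache = torch.empty(0, d)

def decode_step(q_new: torch.Tensor, k_new: torch.Tensor, v_new: torch.Tensor) -> torch.Tensor:
    global k_cache, v_cache
    k_cache = torch.cat([k_cache, k_new])  # append this token's key/value once...
    v_cache = torch.cat([v_cache, v_new])
    attn = torch.softmax(q_new @ k_cache.T / d ** 0.5, dim=-1)
    return attn @ v_cache                  # ...and attend over everything cached

for _ in range(5):                         # five decode steps, no prefix recompute
    out = decode_step(torch.randn(1, d), torch.randn(1, d), torch.randn(1, d))
print(out.shape, k_cache.shape)            # torch.Size([1, 64]) torch.Size([5, 64])
```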