Learning Tracks

Structured deep-dives on LLMs, RAG, multimodal models, and more. Pick a track and start exploring.


Math Essentials

The core math behind deep learning — activation functions, softmax, loss functions, entropy, and KL divergence — explained visually with code.

ReLU · Softmax · Cross-Entropy · KL Divergence · Sigmoid · Loss Functions
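As a taste of what this track covers, the softmax and cross-entropy it mentions fit in a few lines of plain Python (an illustrative sketch; the function names are ours, not from the track):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    # so large logits cannot overflow exp().
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    # Negative log-likelihood of the true class under the predicted distribution.
    return -math.log(probs[target_index])

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])       # probabilities sum to 1
print(round(cross_entropy(probs, 0), 3))  # low loss when the true class dominates
```

The loss shrinks as the model puts more probability on the correct class, which is exactly the gradient signal classification training runs on.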

PyTorch Cheatsheet

From tensors to transistors — memory layout, autograd, the compilation stack, torch.compile, and how your Python code becomes GPU microcode.

PyTorch · Tensors · Autograd · CUDA · Triton · torch.compile · cuBLAS

GPU Architecture & CUDA

From transistors to kernels — streaming multiprocessors, the CUDA programming model, the software stack, and the roofline model for performance analysis.

GPU · CUDA · SM · Tensor Core · Roofline · cuBLAS · Warp · SASS

Transformers

From the attention mechanism to full encoder and decoder architectures — how transformers process sequences, why each component exists, and how to build one from scratch.

Attention · Multi-Head · Encoder · Decoder · BERT · GPT · Pre-training · SFT

NanoGPT Speedrun

Incremental improvements that push GPT pre-training efficiency to its limits — from baseline to SOTA in hours.

GPT-2 · Pre-training · Optimization · CUDA

Fine-tuning

From full fine-tuning to LoRA and QLoRA — how to adapt foundation models to your task, build instruction datasets, run distributed training, evaluate results, and merge models.

LoRA · QLoRA · PEFT · SFT · Instruction Tuning · DeepSpeed · Model Merging · Evaluation
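The core of LoRA, which this track builds up to, is a small idea: keep the pretrained weight W frozen and learn a low-rank update (alpha/r) · B·A on top of it. A toy sketch with hypothetical shapes (pure Python, no training loop):

```python
def matmul(A, B):
    # Minimal dense matrix multiply for the sketch.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

d_out, d_in, r, alpha = 4, 4, 2, 4          # illustrative sizes; r << d
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen
A = [[0.1] * d_in for _ in range(r)]        # trainable, shape r x d_in
B = [[0.0] * r for _ in range(d_out)]       # trainable, zero-init so the
                                            # adapter starts as a no-op

delta = matmul(B, A)                        # d_out x d_in, but rank <= r
W_eff = [[w + (alpha / r) * d for w, d in zip(wr, dr)]
         for wr, dr in zip(W, delta)]
print(W_eff == W)  # True at init: B = 0 means W is unchanged before training
```

Only A and B receive gradients, which is why LoRA trains a tiny fraction of the parameters of full fine-tuning and why adapters can later be merged back into W.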

RLHF & Alignment

From SFT's rigidity to reinforcement learning — how PPO, DPO, and GRPO align language models with human preferences, with formulas, implementations, and the HuggingFace TRL ecosystem.

SFT · PPO · RLHF · DPO · GRPO · Reward Model · Alignment · TRL

RAG Pipelines

From TF-IDF and BM25 to dense bi-encoders, hybrid fusion, rerankers, HNSW indexing, and fine-tuning with contrastive losses — everything you need to build and understand production-grade retrieval-augmented generation systems.

BM25 · SPLADE · Dense Retrieval · ColBERT · RRF · Rerankers · HNSW · Fine-tuning
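The RRF tag above refers to reciprocal rank fusion, the standard trick for merging a lexical ranking with a dense one. It is small enough to sketch directly (document IDs and k=60 are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    # Each document's fused score is the sum of 1 / (k + rank) over every
    # ranked list it appears in; k damps the influence of top positions.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25  = ["d1", "d2", "d3"]   # lexical ranking
dense = ["d3", "d1", "d4"]   # embedding ranking
print(reciprocal_rank_fusion([bm25, dense]))  # d1 wins: ranked high by both
```

Because RRF only looks at ranks, not raw scores, it needs no score normalization between the two retrievers, which is why hybrid pipelines reach for it first.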

Inference Optimization

KV-cache, speculative decoding, quantization (GPTQ, AWQ, GGUF), continuous batching, and the engineering behind serving LLMs at scale with low latency.

KV-Cache · Quantization · Speculative · Batching
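The KV-cache idea can be shown in miniature: at each decoding step only the new token's key and value are computed and appended, while attention reads the whole cache. A toy single-head sketch (hand-picked 2-d vectors, not a production implementation):

```python
import math

def attend(q, keys, values):
    # Scaled dot-product attention for a single query over the cached pairs.
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
              for k in keys]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    w = [x / z for x in w]
    # Weighted sum of cached values.
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]

# Decode loop: per new token, compute one (k, v) pair and append it,
# instead of re-projecting the entire prefix at every step.
k_cache, v_cache = [], []
for q, k, v in [([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
                ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0])]:
    k_cache.append(k)
    v_cache.append(v)
    out = attend(q, k_cache, v_cache)
print(len(k_cache))  # cache holds one (k, v) entry per generated token
```

Trading this memory for recomputation is what turns per-token decoding cost from quadratic in sequence length into linear, and managing that memory is where quantization and batching enter.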

Vision-Language Models

From CLIP to LLaVA — contrastive pre-training, Vision Transformers, SigLIP, DINOv2, multimodal fusion, and visual instruction tuning.

CLIP · SigLIP · DINOv2 · ViT · LLaVA · Multimodal · VQA · Contrastive

Vision-Language-Action Models

VLAs for robotics — grounding language and vision into motor policies, from OpenVLA to diffusion-based action prediction.

Robotics · OpenVLA · Motor Policy · Embodied AI

Diffusion Models

Score-based generative models, DDPM, DDIM, classifier-free guidance, and latent diffusion — the architecture behind modern image and video generation.

DDPM · Latent Diffusion · CFG · Image Gen

Mixture of Experts

Sparse MoE layers, routing algorithms, load balancing, and how models like Mixtral (and, reportedly, GPT-4) scale to hundreds of billions of parameters efficiently.

MoE · Routing · Mixtral · Sparse

State Space Models

Mamba, S4, and the family of structured SSMs that achieve linear-time sequence modelling as a competitive alternative to the Transformer attention mechanism.

Mamba · S4 · Linear-Time · SSM

Agents & Tool Use

LLM-powered agents, function calling, ReAct, multi-agent systems, and the infrastructure for autonomous task execution.

ReAct · Function Calling · Multi-Agent · Autonomy

Benchmarks & Evaluation

MMLU, HumanEval, HELM, lm-evaluation-harness, and the methodology behind measuring, comparing, and stress-testing language model capabilities.

MMLU · HumanEval · HELM · Evaluation