Decomposing Weight Updates
LoRA (Hu et al., 2021) freezes the pre-trained weights $W_0 \in \mathbb{R}^{d \times k}$ and injects trainable low-rank matrices $A \in \mathbb{R}^{r \times k}$ and $B \in \mathbb{R}^{d \times r}$ such that the effective weight update is:
$$W = W_0 + \Delta W = W_0 + B A$$
With $r \ll \min(d, k)$, the number of trainable parameters drops dramatically (by up to 10,000× in the original GPT-3 175B experiments) while matching full fine-tuning performance on many downstream tasks.
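The decomposition can be sketched in a few lines of NumPy. This is a minimal illustration, not a training loop: the dimensions `d`, `k`, `r` are hypothetical, `B` is zero-initialized and `A` randomly initialized as in the LoRA paper (so $\Delta W = 0$ at the start of fine-tuning), and the forward pass avoids ever materializing the full $d \times k$ update.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 32, 4  # output dim, input dim, rank (hypothetical sizes)

W0 = rng.standard_normal((d, k))        # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable, small random init
B = np.zeros((d, r))                    # trainable, zero init so BA = 0 at start

def lora_forward(x):
    """Compute y = (W0 + B A) x as W0 x + B (A x), never forming the d*k update."""
    return W0 @ x + B @ (A @ x)

x = rng.standard_normal(k)
# At initialization B = 0, so the adapted layer matches the frozen one exactly.
assert np.allclose(lora_forward(x), W0 @ x)

# Trainable-parameter comparison for this layer:
full_params = d * k        # full fine-tuning updates all of W0
lora_params = r * (d + k)  # LoRA trains only A and B
```

Even at these toy sizes, LoRA trains $r(d + k) = 384$ parameters against $dk = 2048$ for full fine-tuning; the gap widens as $d$ and $k$ grow while $r$ stays small.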