Decomposing Weight Updates
LoRA (Hu et al., 2021) freezes the pre-trained weights $W_0 \in \mathbb{R}^{d \times k}$ and injects trainable low-rank matrices $A \in \mathbb{R}^{r \times k}$ and $B \in \mathbb{R}^{d \times r}$ such that the effective weight update is:
$$W = W_0 + \Delta W = W_0 + B A$$
With $r \ll \min(d, k)$, the number of trainable parameters drops dramatically (by up to 10,000× in the original GPT-3 175B experiments) while matching full fine-tuning performance on many downstream tasks.
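The decomposition can be sketched in a few lines of NumPy. This is a minimal illustration, not a training loop: the dimensions `d`, `k`, `r` are hypothetical, `B` is zero-initialized and `A` randomly initialized as in the LoRA paper (so $\Delta W = 0$ at the start of fine-tuning), and the forward pass avoids ever materializing the full $d \times k$ update.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 32, 4  # output dim, input dim, rank (hypothetical sizes)

W0 = rng.standard_normal((d, k))        # frozen pre-trained weight
A = rng.standard_normal((r, k)) * 0.01  # trainable, small random init
B = np.zeros((d, r))                    # trainable, zero init so BA = 0 at start

def lora_forward(x):
    """Compute y = (W0 + B A) x as W0 x + B (A x), never forming the d*k update."""
    return W0 @ x + B @ (A @ x)

x = rng.standard_normal(k)
# At initialization B = 0, so the adapted layer matches the frozen one exactly.
assert np.allclose(lora_forward(x), W0 @ x)

# Trainable-parameter comparison for this layer:
full_params = d * k        # full fine-tuning updates all of W0
lora_params = r * (d + k)  # LoRA trains only A and B
```

Even at these toy sizes, LoRA trains $r(d + k) = 384$ parameters against $dk = 2048$ for full fine-tuning; the gap widens as $d$ and $k$ grow while $r$ stays small.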