Before we dive into Singular Value Decomposition, let's quickly review some foundational concepts in linear algebra.
Linear Transformation:
A matrix can be thought of as a linear transformation. When you multiply a column vector $v$ by a matrix $L$ (left matrix), you are applying a transformation that stretches, shrinks, rotates, or reflects the vector. Now, if you stack several column vectors in a matrix $R$ (right matrix), when you multiply $L$ and $R$, this applies the linear transformation $L$ to every column vector of $R$.
Matrix $L$ can be thought of as a linear transformation that affects any column vectors multiplied by it.
It must be noted that, while the convention is to apply linear transformations to column vectors, $LR$ can equally be read as $R$ applying a linear transformation to the row vectors of $L$; those row vectors can be turned into column vectors, if desired, via the following property:
$$LR = (R^T L^T)^T$$
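As a quick sanity check, the identity above can be verified numerically. This is a minimal sketch; the example matrices are arbitrary choices:

import numpy as np

L = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # a 90-degree rotation, used here as the example transformation
R = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

# (R^T L^T)^T should equal L R
print(np.allclose(L @ R, (R.T @ L.T).T))  # True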
Inverse:
A (square) matrix $M$ is invertible (and $M^{-1}$ is its inverse), if:
$$M M^{-1} = M^{-1} M = I$$
where $I$ is the identity matrix, i.e., a square matrix with ones in its diagonal and zeros everywhere else. See for example the following $M$ and $M^{-1}$ matrices, inverses of each other:
$$M M^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$$$M^{-1} M = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
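The same pair of matrices can be checked with NumPy, a small sketch using the example above:

import numpy as np

M = np.array([[1.0, 1.0],
              [1.0, 2.0]])
M_inv = np.linalg.inv(M)

print(M_inv)                                # [[ 2. -1.] [-1.  1.]]
print(np.allclose(M @ M_inv, np.eye(2)))    # True
print(np.allclose(M_inv @ M, np.eye(2)))    # True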
Symmetric:
A (square) matrix $M$ is symmetric if:
$$M = M^T$$
Thus, the elements of a symmetric matrix must be mirrored across the main diagonal, e.g.:
$$\begin{pmatrix} 1 & 7 & 3 \\ 7 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix}$$
Orthogonal:
A (square) matrix $O$ is orthogonal if:
$$O^T O = O O^T = I$$
For this to hold, the dot product of any two different columns (and likewise for rows) of $O$ has to be 0, and the dot product of any column (likewise for a row) with itself has to be 1. A useful property of orthogonal matrices is that their inverse is exactly their transpose (which is convenient because transposing is easier than inverting):
$$O^{-1} = O^T$$
And another quick fact: for any orthogonal matrix $O$, $|\det O| = 1$, with the sign —in 2D— classifying the motion, i.e., $\det O = +1$ meaning a rotation, and $\det O = -1$ meaning a reflection. As a quick reminder, given a $2\times 2$ matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, the determinant is $ad - bc$.
Now, multiplying by an orthogonal matrix preserves the length of every vector, and the angle between any pair of vectors is also preserved. So what does it do? In 2D, orthogonal matrices come in exactly two flavors: proper (determinant $+1$) and improper (determinant $-1$) rotations. A proper rotation matrix in 2D is defined as follows (with its orthogonality verifiable via dot products):
$$R_{\alpha} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$$
See how: Column 1 · Column 1 is 1 (unit-length):
$$(\cos\alpha)^2 + (\sin\alpha)^2 = 1$$
Column 2 · Column 2 is 1 (unit-length):
$$(-\sin\alpha)^2 + (\cos\alpha)^2 = 1$$
Column 1 · Column 2 is 0 (orthogonal):
$$(\cos\alpha)(-\sin\alpha) + (\sin\alpha)(\cos\alpha) = 0$$
In other words, if a vector makes angle $\theta$ with the $x$-axis (measured counterclockwise), it will end up at angle $(\theta+\alpha) \pmod{2\pi}$, and it will keep its order, in terms of counterclockwise proximity to the $x$-axis, relative to any other non-collinear vector also rotated by $\alpha$.
On the other hand, an improper rotation (in 2D, a reflection) is defined as follows:
$$F_{\phi} = \begin{pmatrix} \cos 2\phi & \sin 2\phi \\ \sin 2\phi & -\cos 2\phi \end{pmatrix}$$
In other words, if a vector sits $\theta$ radians away (counterclockwise) from the $x$-axis, and the reflection axis is chosen $\phi$ radians away (again, counterclockwise) from the $x$-axis, the vector will end up at $\theta' = 2\phi - \theta \pmod{2\pi}$ radians, and it will flip its order, in terms of counterclockwise proximity to the $x$-axis, relative to any other non-collinear vector reflected across the same axis.
Vector $v$ is rotated and reflected along a reflection axis.
In everyday language reflections are often lumped in with rotations, but mathematically the two operations are distinct, hence the terms proper and improper rotations. In general, just remember that the sign of the determinant determines the 'properness' of the rotation:
$$\det R_{\alpha} = (\cos\alpha)(\cos\alpha) - (-\sin\alpha)(\sin\alpha) = \cos^2\alpha + \sin^2\alpha = 1$$$$\det F_{\phi} = (\cos 2\phi)(-\cos 2\phi) - (\sin 2\phi)(\sin 2\phi) = -\bigl(\cos^2 2\phi + \sin^2 2\phi\bigr) = -1$$
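Both determinants, and the reflection-angle formula above, can be checked numerically. A minimal sketch; the angles below are arbitrary example choices:

import numpy as np

alpha = np.deg2rad(30)   # rotation angle (arbitrary example)
phi = np.deg2rad(50)     # reflection-axis angle (arbitrary example)

R_alpha = np.array([[np.cos(alpha), -np.sin(alpha)],
                    [np.sin(alpha),  np.cos(alpha)]])
F_phi = np.array([[np.cos(2 * phi),  np.sin(2 * phi)],
                  [np.sin(2 * phi), -np.cos(2 * phi)]])

print(np.round(np.linalg.det(R_alpha), 6))  #  1.0 (proper rotation)
print(np.round(np.linalg.det(F_phi), 6))    # -1.0 (improper rotation / reflection)

# A vector at angle theta reflects to angle 2*phi - theta
theta = np.deg2rad(20)
v = np.array([np.cos(theta), np.sin(theta)])
w = F_phi @ v
print(np.round(np.arctan2(w[1], w[0]), 6))  # angle of F_phi v
print(np.round(2 * phi - theta, 6))         # 2*phi - theta (same value)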
Lastly (on orthogonal matrices), it must be noted that non-square (or rectangular) matrices cannot be orthogonal, but they can have orthonormal rows or orthonormal columns (often called row-orthonormal or column-orthonormal matrices); just not both (or they would be orthogonal).
Understanding Eigenvectors
A (square) matrix can be used as a (linear) transformation to a vector. This often changes the vector's direction... except for a special set of vectors, called the matrix's eigenvectors, which can be scaled but keep direction!
First of all, eigenvectors are only defined for square matrices. An $n \times n$ matrix can have at most $n$ linearly independent eigenvectors, but infinitely many eigenvectors in total, since any scaled version of an eigenvector is also an eigenvector (so if there is at least one eigenvector, there are infinitely many).
Visualizing how applying a symmetric matrix scales along its eigenvector directions (no shearing).
So, each matrix, when used as a (linear) transformation, knocks every vector 'off balance/direction' (apart from scaling) except for some vectors called the matrix's eigenvectors. All eigenvectors in the same direction share a single eigenvalue, which tells how much the eigenvector is scaled during the linear transformation.
The equation for an eigenvector and its corresponding eigenvalue is:
$$A v = \lambda v$$
which means the eigenvector $v$ is such that, when the linear transformation $A$ is applied to it, the result is $v$ itself (hence, same direction) scaled by the eigenvalue $\lambda$. For a $2 \times 2$ matrix with 2 linearly independent eigenvectors, we would have:
$A v_1 = \lambda_1 v_1$
$A v_2 = \lambda_2 v_2$
Since $\lambda$ can be negative, eigenvectors may be flipped by the transformation, but they still remain on the same line/direction (just pointing the other way). Two classic cases where a matrix does not have as many real, linearly independent eigenvectors as dimensions are a shear matrix (which has a repeated eigenvalue but a deficient number of linearly independent eigenvectors) and a pure rotation matrix (if you rotate, no vector ends up in the same direction, unless the rotation is a multiple of 180 degrees).
The code below verifies the eigenvector equation numerically and also illustrates inverse and orthogonal matrix properties:
import numpy as np

# Linear transformation L applied to the columns of R
L = np.array([[2.0, 1.0],
              [0.5, 1.5]])
R = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, 2.0]])
LR = L @ R
print("L @ R =\n", LR)

# Inverse: L @ L^{-1} should equal I
if np.linalg.det(L) != 0:
    L_inv = np.linalg.inv(L)
    I = L @ L_inv
    print("\nL @ L^{-1} =\n", np.round(I, 10))

# Orthogonal matrix Q (rotation by 30°): Q^T Q = I
theta = np.deg2rad(30)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
print("\nQ^T Q =\n", np.round(Q.T @ Q, 10))

# Eigenvectors of a symmetric matrix
S = np.array([[4.0, 2.0],
              [2.0, 3.0]])
eigenvalues, eigenvectors = np.linalg.eigh(S)
print("\nEigenvalues of S:", eigenvalues)
print("Eigenvectors (columns):\n", eigenvectors)

# Verify: S v = lambda v for the first eigenvector
v1 = eigenvectors[:, 0]
lam1 = eigenvalues[0]
print("\nS @ v1 =", S @ v1)
print("lambda1 * v1 =", lam1 * v1)
print("Residual:", np.max(np.abs(S @ v1 - lam1 * v1)))
Spectral Decomposition
A property of symmetric matrices is that their eigenvectors can always be chosen to be orthogonal (perpendicular). Note here that two orthogonal vectors need not be orthonormal (they can be perpendicular while each has non-unit length), but an orthogonal matrix must have orthonormal columns and rows.
Also as a refresher, a vector's length is the square root of its dot product with itself: each component is squared (because it is multiplied by itself), the squares are added up, and the sum is square-rooted. The dot product of a vector with itself is therefore the square of its magnitude or length (e.g., $(4,3)$ has magnitude 5, from $\sqrt{4^2 + 3^2} = \sqrt{16+9} = \sqrt{25} = 5$, and its dot product with itself is 25). Writing the magnitude as $\|q_i\|$, the dot product can be expressed as $\|q_i\|^2$, and if this dot product equals 1, so does the magnitude (a unit-length vector). For the 'normalization' part, each row vector and each column vector needs to be unit-length (dot product with itself equals 1).
For a matrix:
The diagonal entries of $Q^T Q$ are the dot products of each column with itself. For these to equal 1 (the diagonal of $I$), each column must be unit-length (normalized).
The off-diagonal entries of $Q^T Q$ are the dot products of each column with every other column. For these to equal 0 (the off-diagonals of $I$), each column must be perpendicular to the others (orthogonal).
Note that for e.g., a $3 \times 3$ matrix: The main diagonal is composed of the elements $a_{11}$, $a_{22}$, and $a_{33}$. The off-diagonal elements are all the others: $a_{12}$, $a_{13}$, $a_{21}$, $a_{23}$, $a_{31}$, and $a_{32}$.
Because there exists a rotation matrix that would rotate the standard basis to align with the perpendicular and unit-length eigenvectors, its inverse would do the reverse, i.e., align the eigenvectors to the standard basis. The theorem of spectral decomposition says: whenever you have a symmetric matrix $S$, you can always decompose the linear transformation it will produce into $S = Q \Lambda Q^T$, with $Q$ an orthogonal matrix whose columns are the eigenvectors of $S$ (applying rotation of the standard basis to align with the eigenvectors), $\Lambda$ a diagonal matrix whose diagonal values are the eigenvalues of $S$ (applying a potentially different scaling along each axis), and $Q^T$, also an orthogonal matrix, the transpose of the eigenvectors (applying rotation of the eigenvectors to align with the standard basis).
$$S = Q \Lambda Q^T$$
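A small numerical sketch of the theorem, using an arbitrary symmetric example matrix:

import numpy as np

S = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# eigh returns eigenvalues in ascending order and orthonormal eigenvectors as columns
eigenvalues, Q = np.linalg.eigh(S)
Lambda = np.diag(eigenvalues)

print(np.allclose(Q @ Lambda @ Q.T, S))   # True: S = Q Lambda Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))    # True: Q is orthogonal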
Because truly symmetric matrices are not the norm in applications, spectral decomposition is powerful but limited in scope. This motivates Singular Value Decomposition (SVD), which imposes no restriction on symmetry, dimension, or rank.
The Power of SVD
To get there, first note that we can artificially create a symmetric matrix from any matrix (e.g., a rectangular matrix $R$) by multiplying it by its transpose. Thus $R R^T = S_L$ ('$S$ left') and $R^T R = S_R$ ('$S$ right'). Note that $S_L$ and $S_R$ are different (e.g., if $R$ is $3 \times 2$, then $R R^T$ is a $3 \times 3$ symmetric matrix and $R^T R$ a $2 \times 2$ symmetric matrix).
Because $S_L$ and $S_R$ are symmetric, they have, in this case, 3 and 2 perpendicular eigenvectors, respectively (living in $\mathbb{R}^3$ and $\mathbb{R}^2$), called the left and right singular vectors of $R$. Because $S_L$ and $S_R$ are PSD (positive semi-definite), the eigenvalue corresponding to each eigenvector is non-negative. Also, if we sort the eigenvalues, the largest eigenvalue of $S_L$ equals the largest eigenvalue of $S_R$, the second largest of $S_L$ equals the second largest of $S_R$, and so forth if there are more dimensions. The leftover eigenvalue(s) will always be 0. The number of non-zero eigenvalues of $S_L$ and $S_R$ is always the same and equals the rank of the original matrix $R$. If the matrix is square and invertible, there are no zero-valued eigenvalues.
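This eigenvalue-sharing can be checked numerically; the rectangular matrix below is an arbitrary example:

import numpy as np

R = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

S_L = R @ R.T   # 3 x 3
S_R = R.T @ R   # 2 x 2

# Sort eigenvalues in descending order
eig_L = np.sort(np.linalg.eigvalsh(S_L))[::-1]
eig_R = np.sort(np.linalg.eigvalsh(S_R))[::-1]

print(np.round(eig_L, 6))  # two shared non-zero eigenvalues plus a (numerically) zero one
print(np.round(eig_R, 6))  # the same two non-zero eigenvalues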
Singular Value Decomposition (SVD)
SVD is a matrix factorization that decomposes any matrix $R$ into the product of 3 matrices:
$$R = U \Sigma V^T$$
Multiplication by $R$ (e.g., $R A$) can thus be decomposed into a series of linear transformations applied to (the column vectors of) $A$: $U \Sigma V^T A$.
$V$ is an orthogonal matrix whose columns are the right singular vectors of $R$, arranged so the first column holds the singular vector corresponding to the highest singular value, the second column the one corresponding to the second-highest, etc. These are the normalized (unit-length) eigenvectors of the symmetric matrix $R^T R$ (i.e., of $S_R$). The right singular vectors form an orthonormal basis for the input space (domain) of the transformation.
Because the first step in the transformation is multiplication by $V^T$ (i.e., $V^T A$), the transformation starts with a rigid rotation (and possibly a reflection) of the input space. It aligns the principal axes of the transformation with the standard coordinate axes. In other words, it rotates the right singular vector corresponding to the largest singular value to align with the positive $x$-axis, the one corresponding to the second largest to align with the positive $y$-axis, and so on.
$\Sigma$ is a rectangular, diagonal matrix that holds the non-negative singular values, i.e., the square roots of the (non-zero) eigenvalues of $R R^T$ and $R^T R$, on its diagonal (in descending order). If some of the singular values are 0, $\Sigma$ also acts as a dimension-erasing matrix: while it stretches along the $x$-, $y$-, etc. axes according to the singular values, it also removes dimensions.
$U$ is an orthogonal matrix that holds the normalized eigenvectors of the symmetric matrix $R R^T$ (i.e., of $S_L$), arranged in columns in descending order of their eigenvalues. In other words, $U$ contains the normalized left singular vectors of $R$. Multiplication by $U$ rotates the standard basis to align with the left singular vectors.
📌Reading R = U Σ Vᵀ right to left: Vᵀ first rotates the input to align the principal directions with the axes; Σ then stretches each axis by its singular value; U finally rotates the result into the output space. Every linear map is exactly this: rotate, scale, rotate.
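The rotate-scale-rotate reading can be traced step by step in code; the matrix and vector below are arbitrary examples:

import numpy as np

R = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])
x = np.array([1.0, -1.0, 2.0])

U, s, Vt = np.linalg.svd(R, full_matrices=True)  # U is 2x2, Vt is 3x3

Sigma = np.zeros((2, 3))
np.fill_diagonal(Sigma, s)

step1 = Vt @ x          # rotate/reflect the input (no change in length)
step2 = Sigma @ step1   # scale each axis by its singular value (and drop the extra dimension)
step3 = U @ step2       # rotate into the output space

print(np.allclose(step3, R @ x))  # True: U Sigma V^T x equals R x
print(np.round(np.linalg.norm(x), 6), np.round(np.linalg.norm(step1), 6))  # equal lengths after V^T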
A Final Note on SVD
It is important to note that in SVD the singular values, which define how much scaling happens along the singular-vector directions when applying $R$, are the square roots of the eigenvalues of $S_L$ and $S_R$ (not the eigenvalues themselves).
For comparison with spectral decomposition: in the spectral decomposition of a symmetric matrix $S$, the diagonal matrix $\Lambda$ contains the eigenvalues of $S$ (for a positive semi-definite $S$, these coincide with its singular values). In the singular value decomposition of a matrix $R$, the diagonal matrix $\Sigma$ contains the singular values of $R$, which are the square roots of the eigenvalues of $R^T R$ ($S_R$) and $R R^T$ ($S_L$).
Computing SVD with NumPy
NumPy exposes SVD through numpy.linalg.svd. The code below decomposes a small matrix, verifies the reconstruction $U \Sigma V^T \approx A$, and confirms the relationship between singular values and eigenvalues of $A^T A$.
import numpy as np

A = np.array([[3, 2, 2],
              [2, 3, -2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=True)
print("Singular values (sigma):", np.round(s, 6))
print("U shape:", U.shape, " Vt shape:", Vt.shape)

# Reconstruct A = U Sigma V^T
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)
A_reconstructed = U @ Sigma @ Vt
print("\nMax reconstruction error:", np.max(np.abs(A - A_reconstructed)))

# Verify: singular values = sqrt of eigenvalues of A^T A
eigenvalues_AtA = np.linalg.eigvalsh(A.T @ A)
eigenvalues_AtA = np.sort(eigenvalues_AtA)[::-1]  # descending
print("\nEigenvalues of A^T A: ", np.round(eigenvalues_AtA[eigenvalues_AtA > 1e-10], 6))
print("Sigma^2 (sigma_i^2): ", np.round(s**2, 6))
💡The singular values are exactly the square roots of the non-zero eigenvalues of $A^T A$ (and of $A A^T$). This is the bridge between spectral decomposition and SVD.
Low-Rank Approximation and LoRA
The most practically important consequence of SVD is the Eckart–Young theorem: the best rank-$r$ approximation to $A$ (in both the Frobenius and spectral norms) is obtained by keeping only the top $r$ singular values and their corresponding singular vectors:
$$A_r = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$
where $u_i$ and $v_i$ are the $i$-th left and right singular vectors and $\sigma_i$ the $i$-th singular value.
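As a minimal illustration of the "best approximation" claim, the sketch below (with an arbitrary random example matrix) compares the truncated SVD against the best fit within a randomly chosen rank-$r$ column space; by Eckart–Young, the truncated SVD always achieves the smaller error:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 12))
r = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_svd = (U[:, :r] * s[:r]) @ Vt[:r, :]            # truncated SVD (rank r)

# Best approximation of A within a random rank-r column space, for comparison
B = rng.standard_normal((20, r))
coeffs, *_ = np.linalg.lstsq(B, A, rcond=None)
A_rand = B @ coeffs

print("Truncated-SVD error:", np.round(np.linalg.norm(A - A_svd, 'fro'), 4))
print("Random rank-r error:", np.round(np.linalg.norm(A - A_rand, 'fro'), 4))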
This is exactly the insight behind LoRA (Low-Rank Adaptation). Instead of fine-tuning the full weight matrix $W \in \mathbb{R}^{d \times k}$, LoRA parameterises the update as $\Delta W = B A$ where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ with $r \ll \min(d, k)$. The hypothesis is that the meaningful weight changes during fine-tuning live in a low-dimensional subspace, exactly the space captured by the top singular vectors.
import numpy as np

# Simulate a weight update matrix
np.random.seed(42)
delta_W = np.random.randn(64, 64) * 0.01

U, s, Vt = np.linalg.svd(delta_W)

print(f"{'rank':>6} {'rel. error':>12} {'full params':>12} {'LoRA params':>12} {'compression':>12}")
print("-" * 62)
for r in [1, 2, 4, 8, 16, 32]:
    approx = (U[:, :r] * s[:r]) @ Vt[:r, :]
    error = np.linalg.norm(delta_W - approx, 'fro') / np.linalg.norm(delta_W, 'fro')
    params_full = delta_W.size
    params_lora = 64 * r + r * 64
    print(f"{r:>6} {error:>12.4f} {params_full:>12} {params_lora:>12} {100*params_lora/params_full:>11.1f}%")
💡At rank 8, the LoRA factorization uses only 25% of the parameters of a full update. A purely random matrix like the one simulated above is not low-rank, so its relative error stays high, but weight updates during real fine-tuning are empirically low-rank, which is why a small $r$ captures most of the meaningful update.
Practice Quiz
Test your understanding of Singular Value Decomposition with this quick quiz:
Which of the following matrices in the SVD factorization R = UΣVᵀ is a rotation and/or reflection of the input space?
In the context of SVD for a matrix R, what is the relationship between the singular values and the eigenvalues of the constructed symmetric matrices RᵀR and RRᵀ?
What is the primary advantage of SVD over Spectral Decomposition?