Linear Algebra: Singular Value Decomposition (SVD)

A Geometric Explanation of SVD

Fundamental Concepts

Before we dive into Singular Value Decomposition, let's quickly review some foundational concepts in linear algebra.

  • Linear Transformation: A matrix can be thought of as a linear transformation. When you multiply a column vector $v$ by a matrix $L$ (left matrix), you are applying a transformation that stretches, shrinks, rotates, or reflects the vector. Now, if you stack several column vectors in a matrix $R$ (right matrix), when you multiply $L$ and $R$, this applies the linear transformation $L$ to every column vector of $R$.
    Matrix $L$ can be thought of as a linear transformation that affects any column vectors multiplied by it.
    It must be noted that, while the convention is to apply linear transformations to column vectors, $L R$ could also be seen as $R$ applying a linear transformation to the row vectors of $L$, which can be made column vectors if wanted to via the following property: $$L R = (R^T L^T)^T$$
  • Inverse: A (square) matrix $M$ is invertible (and $M^{-1}$ is its inverse), if: $$M M^{-1} = M^{-1} M = I$$ where $I$ is the identity matrix, i.e., a square matrix with ones in its diagonal and zeros everywhere else. See for example the following $M$ and $M^{-1}$ matrices, inverses of each other: $$M M^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$ $$M^{-1} M = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
  • Symmetric: A (square) matrix $M$ is symmetric if: $$M = M^T$$ Thus, the elements of a symmetric matrix must be mirrored across the main diagonal, e.g.: $$ \begin{pmatrix} 1 & 7 & 3 \\ 7 & 4 & 5 \\ 3 & 5 & 6 \end{pmatrix} $$
  • Orthogonal: A (square) matrix $O$ is orthogonal if: $$O^T O = O O^T = I$$ For this to hold, the dot product of any two different columns (and likewise for rows) of $O$ has to be 0, and the dot product of any column (likewise for a row) with itself has to be 1. A useful property of orthogonal matrices is that their inverse is exactly their transpose (which is convenient because transposing is easier than inverting): $$ O^{-1} = O^T $$ Another quick fact: for any orthogonal matrix $O$, $|\det O| = 1$, with the sign (in 2D) classifying the motion, i.e., $\det O = +1$ meaning a rotation and $\det O = -1$ meaning a reflection. As a quick reminder, given a $2\times 2$ matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, the determinant is $ad - bc$. Now, multiplying by an orthogonal matrix preserves the length of every vector, and the angle between any pair of vectors is also preserved. So what does it do? In 2D, orthogonal matrices come in exactly two flavors: proper (determinant $+1$) and improper (determinant $-1$) rotations. A proper rotation matrix in 2D is defined as follows (with its orthogonality verifiable via dot products): $$ R_{\alpha} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}$$ See how: Column 1 · Column 1 is 1 (unit-length): $$(\cos\alpha)^2 + (\sin\alpha)^2 = 1$$ Column 2 · Column 2 is 1 (unit-length): $$(-\sin\alpha)^2 + (\cos\alpha)^2 = 1$$ Column 1 · Column 2 is 0 (orthogonal): $$(\cos\alpha)(-\sin\alpha) + (\sin\alpha)(\cos\alpha) = 0$$ In other words, if a vector makes angle $\theta$ with the $x$-axis (counterclockwise), it will go to $(\theta+\alpha) \pmod{2\pi}$ radians, and it will keep its order, in terms of counterclockwise proximity to the $x$-axis, with any other non-collinear vector also rotated by $\alpha$. On the other hand, an improper rotation (in 2D, a reflection) is defined as follows: $$ F_{\phi} = \begin{pmatrix} \cos 2\phi & \sin 2\phi \\ \sin 2\phi & -\cos 2\phi \end{pmatrix}$$ In other words, if a vector is $\theta$ radians away (counterclockwise) from the $x$-axis, and a reflection axis $\phi$ radians away (again, counterclockwise) from the $x$-axis is chosen, the vector will go to $\theta' = (\theta + 2(\phi - \theta)) \pmod{2\pi} = (2\phi - \theta) \pmod{2\pi}$ radians and will flip its order, in terms of counterclockwise proximity to the $x$-axis, with any other non-collinear vector also reflected across the axis at angle $\phi$.
    Vector $v$ is rotated and reflected across a reflection axis.
    Reflections can mostly be seen as rotations in an everyday sense, but mathematically these operations are slightly different, hence the terms proper and improper rotations. In general, just remember that the sign of the determinant determines the 'properness' of the rotation: $$\det R_{\alpha} = (\cos\alpha)(\cos\alpha) - (-\sin\alpha)(\sin\alpha) = \cos^2\alpha + \sin^2\alpha = 1$$ $$\det F_{\phi} = (\cos 2\phi)(-\cos 2\phi) - (\sin 2\phi)(\sin 2\phi) = -\big(\cos^2 2\phi + \sin^2 2\phi\big) = -1$$ Lastly (on orthogonal matrices), it must be noted that non-square (or rectangular) matrices cannot be orthogonal, but they can have orthonormal rows or orthonormal columns (often called row-orthonormal or column-orthonormal matrices); just not both (or they would be orthogonal).
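
As a quick numeric sanity check of the properties above, here is a minimal NumPy sketch (the matrices $L$ and $R$ and the angles $\alpha$, $\phi$ are arbitrary illustrative choices, not taken from the text):

import numpy as np

# Transpose property: L R = (R^T L^T)^T
L = np.array([[2.0, 1.0],
              [0.5, 1.5]])
R = np.array([[1.0, -1.0],
              [0.0,  2.0]])
print(np.allclose(L @ R, (R.T @ L.T).T))            # True

# Proper rotation (det = +1) vs. improper rotation / reflection (det = -1)
a = np.deg2rad(30)      # rotation angle alpha
p = np.deg2rad(45)      # reflection-axis angle phi
R_a = np.array([[np.cos(a), -np.sin(a)],
                [np.sin(a),  np.cos(a)]])
F_p = np.array([[np.cos(2*p),  np.sin(2*p)],
                [np.sin(2*p), -np.cos(2*p)]])
print(np.linalg.det(R_a))                           # ~ +1.0
print(np.linalg.det(F_p))                           # ~ -1.0

# Orthogonal matrices preserve lengths: ||R_a v|| equals ||v||
v = np.array([3.0, 4.0])
print(np.linalg.norm(R_a @ v), np.linalg.norm(v))   # both 5.0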

Understanding Eigenvectors

To understand spectral decomposition, we need to understand eigenvectors. A (square) matrix can be used as a (linear) transformation to a vector. This often changes the vector's direction... except for a special set of vectors, called the matrix's eigenvectors, which can be scaled but keep direction!

First of all, eigenvectors are only defined for square matrices. An n×n matrix can have at most n linearly independent eigenvectors, but an infinite number of eigenvectors in total (as any scaled version of an eigenvector is also an eigenvector). So, if there is at least one eigenvector, there will be an infinite number of them (all the scaled versions of that eigenvector).

Manim: Visualizing how applying a symmetric matrix scales along its eigenvector directions (no shearing).
import numpy as np

# Linear transformation L applied to the columns of R
L = np.array([[2.0, 1.0],
              [0.5, 1.5]])
R = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0,  2.0]])

LR = L @ R
print('L @ R =\n', LR)

# Inverse (if square & invertible); here L is 2x2, check the determinant is nonzero
if not np.isclose(np.linalg.det(L), 0.0):
    L_inv = np.linalg.inv(L)
    I = L @ L_inv
    print('L @ L^{-1} =\n', I)

# Orthogonal example Q: Q^T Q = I
# (rotation by theta)
theta = np.deg2rad(30)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print('Q^T Q =\n', Q.T @ Q)

So, each matrix, when used as a (linear) transformation, would knock every vector 'off balance/direction' (apart from scaling it), except for some vectors called the matrix's eigenvectors. All eigenvectors in the same direction share a single eigenvalue, which tells how much the eigenvector is scaled during the linear transformation.

The equation for an eigenvector and its corresponding eigenvalue is:

$$A v = \lambda v$$

which means the eigenvector $v$ is such that when receiving a linear transformation $A$, the resulting vector is $v$ (hence, same direction) scaled by the eigenvalue $\lambda$. For a 2x2 matrix that has 2 linearly independent eigenvectors, we would have:

  • $A v_1 = \lambda_1 v_1$
  • $A v_2 = \lambda_2 v_2$

Since $\lambda$ can be negative, eigenvectors may be flipped by the transformation, but they will still remain in the same line/direction (just pointing the other way). Two common cases where a matrix does not have as many real linearly independent eigenvectors as dimensions are a shear matrix (which has a repeated eigenvalue but a deficient number of linearly independent eigenvectors) and a pure rotation matrix (if you rotate, no vector ends up in the same direction, unless the rotation is 180 or 360 degrees).
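
A minimal NumPy sketch of these special cases (the shear, rotation, and symmetric matrices below are standard illustrative examples, not taken from the text):

import numpy as np

# Shear: eigenvalue 1 is repeated, but there is only one independent eigenvector
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
vals, vecs = np.linalg.eig(shear)
print(vals)                       # [1. 1.]  (repeated eigenvalue)
print(vecs)                       # both columns are (numerically) parallel to [1, 0]

# Pure rotation by 90 degrees: no real eigenvectors (eigenvalues are complex)
rot = np.array([[0.0, -1.0],
                [1.0,  0.0]])
vals, vecs = np.linalg.eig(rot)
print(vals)                       # [0.+1.j  0.-1.j] -> complex, no real eigen-directions

# Symmetric matrix: real eigenvalues and perpendicular eigenvectors
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eig(S)
print(vals)                       # [3. 1.]
print(vecs[:, 0] @ vecs[:, 1])    # ~0: the eigenvectors are perpendicular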

Spectral Decomposition

A property of symmetric matrices is that their eigenvectors are orthogonal (perpendicular). Note that two orthogonal vectors need not be orthonormal (they can be perpendicular while each has non-unit length), but an orthogonal matrix must have orthonormal columns (and rows).

Also, as a refresher, the length (magnitude) of a vector is the square root of its dot product with itself: each component is squared (because it is multiplied by itself), the squares are added up, and the sum is square-rooted. So the dot product of a vector with itself is the square of its magnitude (e.g., (4,3) has magnitude 5, from $\sqrt{4^2 + 3^2}=\sqrt{16+9}=\sqrt{25}=5$, and its dot product with itself is 25). Writing the magnitude as $||q_i||$, the dot product can be expressed as $||q_i||^2$, and if this dot product equals 1 (i.e., $||q_i||^2 = 1$), so does the magnitude (i.e., a unit-length vector). Now, for the 'normalization' part, each row vector and each column vector needs to be unit-length (so its dot product with itself needs to be 1).

For a matrix $Q$ to be orthogonal (i.e., $Q^T Q = I$):

  • The diagonal entries of $Q^T Q$ are the dot products of each column with itself. For these to equal 1 (the diagonal of $I$), each column must be unit-length (normalized).
  • The off-diagonal entries of $Q^T Q$ are the dot products of each column with every other column. For these to equal 0 (the off-diagonals of $I$), each column must be perpendicular to the others (orthogonal).

Note that, e.g., for a 3x3 matrix: the main diagonal is composed of the elements $a_{11}$, $a_{22}$, and $a_{33}$. The off-diagonal elements (i.e., anything not on the diagonal) are all the others: $a_{12}$, $a_{13}$, $a_{21}$, $a_{23}$, $a_{31}$, and $a_{32}$.
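
A small NumPy sketch of this check (the matrix below, with perpendicular but non-unit-length columns, is an arbitrary example):

import numpy as np

# Columns are perpendicular but not unit-length
Q = np.array([[3.0, -4.0],
              [4.0,  3.0]])
print(Q.T @ Q)
# [[25.  0.]
#  [ 0. 25.]]  -> off-diagonals 0 (perpendicular), diagonals 25 (length 5, not 1)

# Normalize each column to unit length -> now Q_norm is orthogonal
Q_norm = Q / np.linalg.norm(Q, axis=0)
print(Q_norm.T @ Q_norm)          # identity (up to floating-point error)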

Because there exists a rotation matrix that would rotate the standard basis to align with the perpendicular, unit-length eigenvectors, its inverse would do the reverse, i.e., align the eigenvectors with the standard basis. The theorem of spectral decomposition says: whenever you have a symmetric matrix $S$, you can always decompose the linear transformation it produces into $$S = Q \Lambda Q^T,$$ with $Q$ an orthogonal matrix whose columns are the eigenvectors of $S$ (applying a rotation of the standard basis to align with the eigenvectors), $\Lambda$ a diagonal matrix whose diagonal values are the eigenvalues of $S$ (applying a possibly different scaling along each axis), and $Q^T$, also an orthogonal matrix, the transpose of the eigenvector matrix (applying a rotation of the eigenvectors to align with the standard basis).
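
A minimal NumPy sketch of the theorem, reusing the symmetric example matrix from the fundamentals section (np.linalg.eigh is NumPy's routine for symmetric matrices):

import numpy as np

# Spectral decomposition of a symmetric matrix: S = Q Lambda Q^T
S = np.array([[1.0, 7.0, 3.0],
              [7.0, 4.0, 5.0],
              [3.0, 5.0, 6.0]])
eigvals, Q = np.linalg.eigh(S)                 # eigh: eigen-decomposition for symmetric matrices
Lam = np.diag(eigvals)                         # diagonal matrix of eigenvalues

print(np.allclose(Q.T @ Q, np.eye(3)))         # True: the eigenvectors are orthonormal
print(np.allclose(Q @ Lam @ Q.T, S))           # True: S is reconstructed exactly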

The Power of SVD

Finally, because it is rare to find a symmetric matrix in practice, spectral decomposition works but is limited in scope. Singular value decomposition, on the other hand, imposes no restriction on symmetry, dimension, or rank. For this, we first note that we can artificially create a symmetric matrix from any other matrix (e.g., a rectangular matrix $R$) by multiplying it by its transpose. Thus $R R^T = S_L$ ($S$ 'left') and $R^T R = S_R$ ($S$ 'right'). Note that $S_L$ and $S_R$ are different (e.g., if $R$ is 3x2, $R R^T$ is a 3x3 symmetric matrix and $R^T R$ a 2x2 symmetric matrix).
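
A quick NumPy sketch of this construction (the 3x2 matrix $R$ below is an arbitrary example, not taken from the text):

import numpy as np

R = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])                               # 3x2 rectangular matrix

S_L = R @ R.T                                            # 3x3
S_R = R.T @ R                                            # 2x2
print(S_L.shape, S_R.shape)                              # (3, 3) (2, 2)
print(np.allclose(S_L, S_L.T), np.allclose(S_R, S_R.T))  # True True: both symmetric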

Because $S_L$ and $S_R$ are symmetric, they have, in this case, 3 and 2 perpendicular eigenvectors, respectively (thus in $R^3$ and in $R^2$), called the left and right singular vectors of $R$. Because $S_L$ and $S_R$ are PSD (positive semi-definite), the eigenvalue corresponding to each eigenvector is non-negative. Also, if we sort the eigenvalues, the largest eigenvalue of $S_L$ equals the largest eigenvalue of $S_R$, and the second largest eigenvalue of $S_L$ equals the second largest eigenvalue of $S_R$ (and so forth if there are more dimensions, noting the example here is for a 3x2 matrix). The leftover eigenvalue(s) will always be 0. The number of non-zero eigenvalues of $S_L$ and $S_R$ is always the same and is equal to the rank of the original matrix $R$. Since the rank of a 3x2 matrix is at most 2, there will be at most 2 non-zero eigenvalues for both $S_L$ and $S_R$. If the matrix is square and invertible, there will be no zero (i.e., 0-valued) eigenvalues.
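
Continuing with the same arbitrary 3x2 example, a short sketch checking that the non-zero eigenvalues of $S_L$ and $S_R$ coincide and that their count equals the rank of $R$:

import numpy as np

R = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])                               # same 3x2 example as above
eig_L = np.sort(np.linalg.eigvalsh(R @ R.T))[::-1]       # eigenvalues of S_L, descending
eig_R = np.sort(np.linalg.eigvalsh(R.T @ R))[::-1]       # eigenvalues of S_R, descending
print(eig_L)                                             # two positive values, then ~0
print(eig_R)                                             # the same two positive values
print(np.allclose(eig_L[:2], eig_R))                     # True: shared non-zero eigenvalues
print(np.linalg.matrix_rank(R))                          # 2 = number of non-zero eigenvalues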

Singular Value Decomposition (SVD)

SVD is a matrix factorization that decomposes any matrix $R$ into the (matrix) product (i.e., matmul, or @) of 3 matrices:

$$R = U \Sigma V^T$$

Multiplication by $R$ -e.g., $R A$- can thus be decomposed into a series of linear transformations applied to (the column vectors of) $A$ -e.g., $U \Sigma V^T A$-.

$V$ is an orthogonal matrix whose columns are the right singular vectors of $R$ -arranged so the first column holds the singular vector corresponding to the highest eigenvalue, the second column the singular vector corresponding to the second-highest eigenvalue, etc.-. These (singular vectors) are the normalized (i.e., unit length) eigenvectors of the symmetric matrix $R^T R$ -i.e., of $S_R$-. The right singular vectors form an orthonormal basis for the input space (domain) of the transformation.

Because the first step in the transformation is the multiplication by $V^T$ -i.e., $V^T A$-, the transformation begins with a rigid rotation (and possibly a reflection) of the input space. It aligns the principal axes of the transformation with the standard coordinate axes. In other words, it rotates the right singular vector corresponding to the largest singular value to align with the positive x-axis, the second largest to align with the positive y-axis, and so on. The singular values of $R$ -which define how much scaling happens along the singular vector directions- are the square roots of the eigenvalues of $R^T R$ (equivalently, of $R R^T$). In spectral decomposition, the singular values are simply the absolute values of the eigenvalues.

$\Sigma$ is a rectangular, diagonal matrix that holds the singular values -i.e., the square roots of the eigenvalues of $R R^T$ and $R^T R$- on its diagonal, arranged in descending order. If some of the singular values are 0, $\Sigma$ also acts as a dimension-eraser matrix (i.e., while it stretches along the x-, y-, etc. axes according to the singular values, it also collapses the directions with zero singular value).

$U$ is an orthogonal matrix that holds the normalized eigenvectors of the symmetric matrix $R R^T$ -i.e., of $S_L$- (arranged in columns, in descending order of their eigenvalues). In other words, $U$ contains the normalized left singular vectors of $R$. Multiplication by $U$ rotates the standard basis to align with the left singular vectors.
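
A minimal NumPy sketch tying these pieces together (the 3x2 matrix $R$ is the same arbitrary example used in the earlier sketches; np.linalg.svd returns $U$, the singular values, and $V^T$):

import numpy as np

R = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, 1.0]])                     # 3x2 example matrix

U, s, Vt = np.linalg.svd(R)                    # full SVD: U is 3x3, Vt is 2x2
Sigma = np.zeros(R.shape)
Sigma[:len(s), :len(s)] = np.diag(s)           # rectangular diagonal matrix

print(np.allclose(U @ Sigma @ Vt, R))          # True: R is reconstructed

# Singular values are the square roots of the eigenvalues of R^T R (and of R R^T)
eig_R = np.sort(np.linalg.eigvalsh(R.T @ R))[::-1]
print(np.allclose(s, np.sqrt(eig_R)))          # True

# U and V are orthogonal
print(np.allclose(U.T @ U, np.eye(3)), np.allclose(Vt @ Vt.T, np.eye(2)))   # True True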

A Final Note on SVD

It is important to note that, in SVD, the singular values -which define how much scaling happens along the singular vector directions of $R$ when $R$ is applied- are the square roots of the eigenvalues of $R^T R$ and $R R^T$.

For comparison with Spectral Decomposition:

In the spectral decomposition of a symmetric matrix $S$, the diagonal matrix $\Lambda$ contains the eigenvalues of $S$ (whose absolute values are the singular values of $S$). In the singular value decomposition of a matrix $R$, the diagonal matrix $\Sigma$ contains the singular values of $R$, which are the square roots of the eigenvalues of $R^T R$ ($S_R$) and $R R^T$ ($S_L$).
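
A small NumPy sketch of this comparison, using an arbitrary symmetric matrix with a negative eigenvalue to show that its singular values are the absolute values of its eigenvalues:

import numpy as np

# Symmetric matrix with one negative eigenvalue
S = np.array([[0.0, 2.0],
              [2.0, 3.0]])
eigvals = np.linalg.eigvalsh(S)                # [-1.  4.]  (ascending)
svals = np.linalg.svd(S, compute_uv=False)     # [4.  1.]   (descending)

print(eigvals)
print(svals)
print(np.allclose(np.sort(np.abs(eigvals))[::-1], svals))   # True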

Practice Quiz

Test your understanding of Singular Value Decomposition with this quick quiz:

Which of the following matrices in the SVD factorization, $R = UΣV^T$, is a rotation and/or reflection of the input space?

In the context of SVD for a matrix $R$, what is the relationship between the singular values and the eigenvalues of the constructed symmetric matrices, $R^T R$ and $R R^T$?

What is the primary advantage of SVD over Spectral Decomposition?