Complex Numbers and Euler's Formula

Numbers as Points on a Line... and Beyond

Every number you've used so far (integers, fractions, decimals, even irrational numbers like $\pi$) lives on a single line: the real number line. You can slide left (negative) or right (positive), but you're always stuck in one dimension. For most of arithmetic, that's fine. But when we need to describe rotations (and rotation turns out to be central to how modern transformers encode position), one dimension isn't enough. We need a number system that lives in a plane.

The key idea is deceptively simple: define a new number $i$ whose square is $-1$.

$$i^2 = -1$$

No real number has this property (any real number squared is non-negative), so $i$ is called the imaginary unit. A complex number is then any expression of the form:

$$z = a + bi$$

where $a$ and $b$ are real numbers. We call $a$ the real part and $b$ the imaginary part. When $b = 0$ we recover ordinary real numbers. When $a = 0$ we get a purely imaginary number like $3i$ or $-2i$.

Here's the geometric payoff: since every complex number has two components ($a$ and $b$), we can plot it as a point on a 2D plane. The horizontal axis is the real part, the vertical axis is the imaginary part. This is the complex plane (also called the Argand diagram). The number $z = 3 + 2i$ sits at the point $(3, 2)$. The number $z = -1 - i$ sits at $(-1, -1)$. Real numbers live on the horizontal axis, and purely imaginary numbers live on the vertical axis.
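Python has complex numbers built in, so the definitions above are easy to experiment with. A minimal sketch using only the core language: note that Python writes the imaginary unit as `j` (the engineering convention) rather than $i$, and that `to_point` is a hypothetical helper introduced here for illustration, not a library function.

```python
# Python's built-in complex type writes the imaginary unit as 1j
i = 1j
print(i * i)  # (-1+0j): the defining property i^2 = -1

z = 3 + 2j  # same value as complex(3, 2)
print(z.real, z.imag)  # real part a, imaginary part b


def to_point(w):
    """Hypothetical helper: map a complex number a + bi to the plane point (a, b)."""
    return (w.real, w.imag)


for w in [3 + 2j, -1 - 1j, 1j, -2 + 0j]:
    print(w, "->", to_point(w))
```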

The plot below shows several complex numbers in the plane. Notice how each one is uniquely identified by its position — a complex number is a 2D point.

import json, js

# Complex numbers to plot: (real part, imaginary part, label)
points = [
    (3, 2, "3 + 2i"),
    (-1, -1, "-1 - i"),
    (0, 1, "i"),
    (-2, 0, "-2"),
    (1, -1.5, "1 - 1.5i"),
    (0, 0, "0"),
]

# The plotting widget shares a single x_data array across all series, so we emulate a
# scatter plot: every point gets its own series that is None everywhere except at the
# grid position matching its real part.
x_vals = [round(i * 0.1, 1) for i in range(-40, 41)]  # -4.0 to 4.0 in steps of 0.1

# Reference line along the real axis (y = 0)
lines = [{"label": "Real axis", "data": [0 for _ in x_vals], "color": "#d1d5db"}]

colors = ["#3b82f6", "#ef4444", "#10b981", "#f59e0b", "#8b5cf6", "#6b7280"]
for idx, (real, imag, label) in enumerate(points):
    # Show the point's imaginary part only at the x-grid position of its real part
    data = [imag if abs(x - real) < 0.05 else None for x in x_vals]
    lines.append({"label": label, "data": data, "color": colors[idx % len(colors)]})

plot_data = [
    {
        "title": "The Complex Plane",
        "x_label": "Real part",
        "y_label": "Imaginary part",
        "x_data": x_vals,
        "lines": lines
    }
]
js.window.py_plot_data = json.dumps(plot_data)

print("Each complex number z = a + bi is a point (a, b) in the plane.")
print()
for real, imag, label in points:
    print(f"  {label:>10}  =>  point ({real}, {imag})")

💡 Why invent a number whose square is negative? Because it unlocks a beautiful connection between algebra and geometry. Adding complex numbers is vector addition. Multiplying them, as we're about to see, is rotation and scaling. This geometric meaning of multiplication is why complex numbers show up in deep learning.
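A quick numerical check of the two claims in this tip, using Python's built-in complex type and the standard `cmath` module (the sample values $3 + 2i$ and $1 - i$ are arbitrary): addition works component by component, like vectors, while multiplication multiplies the magnitudes (`abs`) and adds the angles (`cmath.phase`).

```python
import cmath
import math

z = 3 + 2j
w = 1 - 1j

# Addition is vector addition: (3, 2) + (1, -1) = (4, 1)
s = z + w
print(s)

# Multiplication scales and rotates: magnitudes multiply, angles add
p = z * w
print(p)
print(math.isclose(abs(p), abs(z) * abs(w)))
# Angles add modulo 2*pi in general; for these inputs no wrap-around past +/-pi occurs
print(math.isclose(cmath.phase(p), cmath.phase(z) + cmath.phase(w)))
```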

Multiplying by i Rotates 90°

Now for the key geometric insight that makes complex numbers powerful. Start with the number $z = 1$, which sits at the point $(1, 0)$ on the positive real axis. What happens when we multiply by $i$?

$$1 \cdot i = i$$

The result is $i$, which sits at $(0, 1)$. We moved from the positive real axis to the positive imaginary axis — a 90° counterclockwise rotation. Multiply by $i$ again:

$$i \cdot i = i^2 = -1$$

Now we're at $(-1, 0)$. Another 90° rotation. Again:

$$-1 \cdot i = -i$$

We're at $(0, -1)$. And one more time:

$$-i \cdot i = -i^2 = -(-1) = 1$$

Back to where we started. Four multiplications by $i$, four 90° rotations, one full 360° circle. This is not a coincidence or a trick — it's the geometric meaning of complex multiplication. Multiplying any complex number by $i$ rotates it 90° counterclockwise around the origin. The powers of $i$ trace out the four compass points of the unit circle:

$i^0 = 1$ — angle 0° — the point $(1, 0)$
$i^1 = i$ — angle 90° — the point $(0, 1)$
$i^2 = -1$ — angle 180° — the point $(-1, 0)$
$i^3 = -i$ — angle 270° — the point $(0, -1)$
$i^4 = 1$ — angle 360° — back to $(1, 0)$
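The quarter-turn rule holds for any complex number, not just $1$. A short Python sketch, starting from the arbitrary point $3 + 2i$: each multiplication by `1j` sends $(a, b)$ to $(-b, a)$, and after four steps the point is back where it started, with its distance from the origin unchanged throughout.

```python
z = 3 + 2j
orbit = [z]
for _ in range(4):
    z = z * 1j  # (a + bi) * i = -b + ai, i.e. (a, b) -> (-b, a)
    orbit.append(z)

for w in orbit:
    # abs(w) is the distance from the origin, which a pure rotation leaves unchanged
    print(f"({w.real:+.0f}, {w.imag:+.0f})   |z| = {abs(w):.4f}")
```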

The plot below marks these four points on the unit circle. Each multiplication by $i$ advances the angle by 90°, moving counterclockwise from one marked point to the next.

import math, json, js

# Sample the unit circle; 316 steps of 0.02 rad run slightly past 2*pi so the curve closes
theta_vals = [i * 0.02 for i in range(316)]
circle_x = [math.cos(t) for t in theta_vals]
circle_y = [math.sin(t) for t in theta_vals]

# The four powers of i as (real part, imaginary part, label)
powers = [
    (1, 0, "i⁰ = 1"),
    (0, 1, "i¹ = i"),
    (-1, 0, "i² = -1"),
    (0, -1, "i³ = -i"),
]

# The plotting widget shares one x_data array across all series, so the circle's
# x-coordinates serve as x_data. Each power of i is highlighted as a short arc: a series
# that is None everywhere except at circle samples within 0.05 of that point.
x_data = circle_x
lines = [
    {"label": "Unit circle", "data": circle_y, "color": "#d1d5db"}
]

colors = ["#3b82f6", "#10b981", "#ef4444", "#f59e0b"]
for idx, (px, py, label) in enumerate(powers):
    marker_data = [
        circle_y[j] if math.hypot(circle_x[j] - px, circle_y[j] - py) < 0.05 else None
        for j in range(len(circle_x))
    ]
    lines.append({"label": label, "data": marker_data, "color": colors[idx]})

plot_data = [
    {
        "title": "Powers of i on the Unit Circle: Each Multiplication Rotates 90°",
        "x_label": "Real part",
        "y_label": "Imaginary part",
        "x_data": x_data,
        "lines": lines
    }
]
js.window.py_plot_data = json.dumps(plot_data)

print("Multiplying by i rotates 90° counterclockwise:")
print("  1  ->  i  ->  -1  ->  -i  ->  1")
print("  0°    90°    180°    270°    360°")

This raises a natural question: $i$ gives us 90° rotations, but what if we want to rotate by an arbitrary angle — say 30° or 45° or $\theta$ radians? For that, we need Euler's formula.

Euler's Formula: The Bridge Between Exponentials and Rotations

Euler's formula is one of the most beautiful equations in all of mathematics. It connects the exponential function — which governs growth and decay — to trigonometry — which governs circles and rotations:

$$e^{i\theta} = \cos\theta + i\sin\theta$$

Let's unpack what this says geometrically. The right-hand side, $\cos\theta + i\sin\theta$, is a complex number with real part $\cos\theta$ and imaginary part $\sin\theta$. Where does this point sit in the complex plane? The real component is the $x$-coordinate on the unit circle at angle $\theta$, and the imaginary component is the $y$-coordinate. So $e^{i\theta}$ is the point on the unit circle at angle $\theta$ from the positive real axis.

This is extraordinary: raising $e$ to an imaginary power doesn't give exponential growth; it gives rotation. The exponent $i\theta$ tells you the angle. The base $e \approx 2.718$ is the usual mathematical constant, but with an imaginary exponent it traces a circle instead of shooting off to infinity.
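We can check Euler's formula numerically with Python's standard `cmath` module, whose `cmath.exp` accepts complex arguments. This sketch compares $e^{i\theta}$ with $\cos\theta + i\sin\theta$ at a few arbitrary angles, confirms the result always has modulus $1$ (it stays on the unit circle), and prints the real exponential $e^{\theta}$ alongside for contrast, since that one grows without bound.

```python
import cmath
import math

for theta in [0.0, 0.5, 1.0, 2.0, 4.0, 10.0]:
    euler = cmath.exp(1j * theta)                      # e^{i*theta}
    trig = complex(math.cos(theta), math.sin(theta))   # cos(theta) + i*sin(theta)
    print(
        f"theta={theta:5.1f}  e^(i*theta)={euler:.4f}  "
        f"|e^(i*theta)|={abs(euler):.6f}  e^theta={math.exp(theta):10.3f}"
    )
    # Same number either way, and always on the unit circle
    assert cmath.isclose(euler, trig)
    assert math.isclose(abs(euler), 1.0)
```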

Let's verify this with the rotations we already know from the powers of $i$:

$e^{i \cdot 0} = \cos(0) + i\sin(0) = 1 + 0 = 1$: no rotation. Matches $i^0 = 1$. Correct.
$e^{i\pi/2} = \cos(\pi/2) + i\sin(\pi/2) = 0 + i \cdot 1 = i$: 90° rotation. Matches $i^1 = i$. Correct.
$e^{i\pi} = \cos(\pi) + i\sin(\pi) = -1 + 0 = -1$: 180° rotation. This is Euler's identity, often written $e^{i\pi} + 1 = 0$, connecting five fundamental constants ($e$, $i$, $\pi$, $1$, $0$) in a single equation.
$e^{i \cdot 2\pi} = \cos(2\pi) + i\sin(2\pi) = 1 + 0 = 1$: full 360° rotation, back to the start.
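In floating-point arithmetic these identities hold only up to rounding, because $\pi$ itself cannot be stored exactly. A small Python sketch of how that shows up, and why comparisons should use a tolerance such as `cmath.isclose` with `abs_tol` rather than `==`:

```python
import cmath
import math

residue = cmath.exp(1j * math.pi) + 1   # Euler's identity: ideally exactly 0
print(residue)                          # a tiny imaginary part (around 1e-16), not exactly 0
print(residue == 0)                     # False
print(cmath.isclose(residue, 0, abs_tol=1e-12))  # True

# The quarter turns from the list above, checked with a tolerance instead of ==
expected = {0: 1, math.pi / 2: 1j, math.pi: -1, 2 * math.pi: 1}
for theta, value in expected.items():
    assert cmath.isclose(cmath.exp(1j * theta), value, abs_tol=1e-12)
print("all key values match within tolerance")
```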

As $\theta$ sweeps from $0$ to $2\pi$, the point $e^{i\theta}$ traces out the entire unit circle. The plot below shows this: the real and imaginary parts oscillate as cosine and sine respectively, and together they trace a perfect circle.

import math, json, js

PI = math.pi
N_STEPS = 60
theta_steps = [2 * PI * i / N_STEPS for i in range(N_STEPS + 1)]

# Full unit circle (always visible reference)
n_circle = 200
circle_t = [2 * PI * i / n_circle for i in range(n_circle + 1)]
circle_x = [math.cos(t) for t in circle_t]
circle_y = [math.sin(t) for t in circle_t]

# Full cos/sin curves (always visible reference)
curve_t = [round(t, 4) for t in circle_t]
cos_curve = [math.cos(t) for t in circle_t]
sin_curve = [math.sin(t) for t in circle_t]

traces = []

# --- Always-visible traces (indices 0-3) ---
# 0: unit circle reference (left subplot)
traces.append({
    "x": circle_x, "y": circle_y, "mode": "lines",
    "line": {"color": "rgba(156,163,175,0.45)", "width": 1.5},
    "name": "unit circle", "showlegend": False,
    "xaxis": "x", "yaxis": "y", "visible": True
})
# 1: cos curve (right subplot)
traces.append({
    "x": curve_t, "y": cos_curve, "mode": "lines",
    "line": {"color": "#3b82f6", "width": 2},
    "name": "cos(\u03b8) [real]",
    "xaxis": "x2", "yaxis": "y2", "visible": True
})
# 2: sin curve (right subplot)
traces.append({
    "x": curve_t, "y": sin_curve, "mode": "lines",
    "line": {"color": "#ef4444", "width": 2},
    "name": "sin(\u03b8) [imag]",
    "xaxis": "x2", "yaxis": "y2", "visible": True
})
# 3: axes cross-hairs on left subplot
traces.append({
    "x": [-1.35, 1.35, None, 0, 0],
    "y": [0, 0, None, -1.35, 1.35],
    "mode": "lines",
    "line": {"color": "rgba(156,163,175,0.3)", "width": 1},
    "name": "axes", "showlegend": False,
    "xaxis": "x", "yaxis": "y", "visible": True
})

N_ALWAYS = len(traces)  # 4

# --- Per-step traces (3 per step): marker, vector line, vertical indicator ---
for i, th in enumerate(theta_steps):
    cx = math.cos(th)
    sy = math.sin(th)
    is_default = (i == 0)

    # marker point on unit circle (left)
    traces.append({
        "x": [cx], "y": [sy], "mode": "markers",
        "marker": {"color": "#6366f1", "size": 10},
        "name": "e^(i\u03b8)", "showlegend": False,
        "xaxis": "x", "yaxis": "y", "visible": is_default
    })
    # vector from origin to point (left)
    traces.append({
        "x": [0, cx], "y": [0, sy], "mode": "lines",
        "line": {"color": "#6366f1", "width": 2.5},
        "name": "vector", "showlegend": False,
        "xaxis": "x", "yaxis": "y", "visible": is_default
    })
    # vertical indicator line at current theta (right)
    traces.append({
        "x": [round(th, 4), round(th, 4)], "y": [-1.1, 1.1],
        "mode": "lines",
        "line": {"color": "#6366f1", "width": 1.5, "dash": "dash"},
        "name": "current \u03b8", "showlegend": False,
        "xaxis": "x2", "yaxis": "y2", "visible": is_default
    })

TRACES_PER_STEP = 3

# --- Slider steps ---
steps = []
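for i, th in enumerate(theta_steps):
    # Round away floating-point residue (e.g. cos(pi/2) ~ 6e-17) so the readout never
    # shows "-0.000"; adding 0.0 turns a negative zero into +0.0
    cx = round(math.cos(th), 12) + 0.0
    sy = round(math.sin(th), 12) + 0.0
    deg = th * 180 / PI

    # Keep the always-visible traces on; show only this step's marker, vector and indicator
    visibility = [True] * N_ALWAYS
    for j in range(len(theta_steps)):
        visibility.extend([j == i] * TRACES_PER_STEP)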

    label = f"{th:.2f}"
    annotation_text = (
        f"\u03b8 = {deg:.0f}\u00b0<br>"
        f"cos\u03b8 = {cx:+.3f}<br>"
        f"sin\u03b8 = {sy:+.3f}"
    )
    steps.append({
        "label": label,
        "method": "update",
        "args": [
            {"visible": visibility},
            {
                "annotations": [
                    {
                        "x": 0.22, "y": 1.02,
                        "xref": "paper", "yref": "paper",
                        "text": annotation_text,
                        "showarrow": False,
                        "font": {"size": 12},
                        "align": "left",
                        "bgcolor": "rgba(255,255,255,0.85)",
                        "bordercolor": "#6366f1",
                        "borderwidth": 1,
                        "borderpad": 4
                    }
                ]
            }
        ]
    })

# Initial annotation
init_ann = {
    "x": 0.22, "y": 1.02,
    "xref": "paper", "yref": "paper",
    "text": "\u03b8 = 0\u00b0<br>cos\u03b8 = +1.000<br>sin\u03b8 = +0.000",
    "showarrow": False,
    "font": {"size": 12},
    "align": "left",
    "bgcolor": "rgba(255,255,255,0.85)",
    "bordercolor": "#6366f1",
    "borderwidth": 1,
    "borderpad": 4
}

layout = {
    "title": "e^(i\u03b8) on the unit circle",
    "grid": {"rows": 1, "columns": 2, "pattern": "independent"},
    "xaxis":  {"title": "Real", "range": [-1.4, 1.4],
               "scaleanchor": "y", "scaleratio": 1, "domain": [0, 0.44]},
    "yaxis":  {"title": "Imaginary", "range": [-1.4, 1.4]},
    "xaxis2": {"title": "\u03b8 (radians)", "range": [0, 6.35], "domain": [0.56, 1]},
    "yaxis2": {"title": "Value", "range": [-1.3, 1.3], "anchor": "x2"},
    "sliders": [{
        "active": 0,
        "currentvalue": {"prefix": "\u03b8 = ", "suffix": " rad"},
        "pad": {"t": 50},
        "steps": steps
    }],
    "annotations": [init_ann],
    "legend": {"x": 0.72, "y": 1.0},
    "margin": {"t": 90, "b": 60}
}

js.window.py_plotly_data = json.dumps({"data": traces, "layout": layout})

# Print key values
key_angles = [0, PI/2, PI, 3*PI/2, 2*PI]
key_labels = ["0", "\u03c0/2", "\u03c0", "3\u03c0/2", "2\u03c0"]
print("Key values of e^(i\u03b8):")
for angle, lbl in zip(key_angles, key_labels):
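    # Round away floating-point residue (e.g. sin(pi) ~ 1.2e-16) so values print as clean
    # zeros instead of "-0.000"; adding 0.0 turns a negative zero into +0.0
    r = round(math.cos(angle), 12) + 0.0
    im = round(math.sin(angle), 12) + 0.0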
    sign = "+" if im >= 0 else "-"
    print(f"  \u03b8 = {lbl:>5}  =>  e^(i\u03b8) = {r:+.3f} {sign} {abs(im):.3f}i")

📌 Where does Euler's formula come from? One elegant derivation uses Taylor series. The exponential has the series $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots$. When you substitute $x = i\theta$, the powers of $i$ cycle ($i^2 = -1$, $i^3 = -i$, $i^4 = 1$, ...), which sorts the terms into a real group and an imaginary group. The real group is exactly the Taylor series for $\cos\theta$, and the imaginary group is exactly the series for $\sin\theta$. So the formula is not an extra definition or a trick: it follows directly from the series definitions of $e^x$, $\cos x$, and $\sin x$.
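Written out term by term, the substitution $x = i\theta$ looks like this (regrouping the terms into real and imaginary parts is allowed because the exponential series converges absolutely for every complex argument):

$$
\begin{aligned}
e^{i\theta} &= \sum_{n=0}^{\infty} \frac{(i\theta)^n}{n!} = 1 + i\theta - \frac{\theta^2}{2!} - i\frac{\theta^3}{3!} + \frac{\theta^4}{4!} + i\frac{\theta^5}{5!} - \cdots \\
&= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right) \\
&= \cos\theta + i\sin\theta
\end{aligned}
$$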

Rotation as Complex Multiplication

Now we can answer the question that motivates this entire article: how do you rotate a 2D vector by an arbitrary angle? Take any 2D point $(x_1, x_2)$ and represent it as a complex number $z = x_1 + ix_2$. To rotate this point by angle $\theta$ counterclockwise around the origin, multiply by $e^{i\theta}$:

$$z' = z \cdot e^{i\theta} = (x_1 + ix_2)(\cos\theta + i\sin\theta)$$

Expanding this product using the distributive law (and remembering that $i^2 = -1$):

$$z' = (x_1\cos\theta - x_2\sin\theta) + i(x_1\sin\theta + x_2\cos\theta)$$

Reading off the real and imaginary parts of the result:

New real part: $x_1' = x_1\cos\theta - x_2\sin\theta$
New imaginary part: $x_2' = x_1\sin\theta + x_2\cos\theta$

Now write this in matrix form:

$$\begin{pmatrix} x_1' \\ x_2' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

This is the standard 2D rotation matrix. We've just proved that complex multiplication by $e^{i\theta}$ is exactly equivalent to matrix rotation. Two completely different-looking operations, multiplying complex numbers and multiplying matrices, produce the same geometric transformation. This equivalence is the mathematical foundation of RoPE (Rotary Position Embedding), which encodes token positions in transformers by rotating query and key vectors.

Let's verify this numerically. The code below takes a 2D vector and rotates it by several angles using complex multiplication, then confirms that the rotation matrix gives the same result.

import cmath, math, json, js

# Original vector (x1, x2), represented as the complex number z = x1 + i*x2
x1, x2 = 3.0, 1.0
z = complex(x1, x2)

angles_deg = [0, 30, 45, 90, 180, 270]
rows = []

for deg in angles_deg:
    theta = math.radians(deg)

    # Method 1: complex multiplication with Python's built-in complex type,
    # z' = z * e^(i*theta); cmath.exp evaluates Euler's formula directly
    z_rot = z * cmath.exp(1j * theta)

    # Method 2: the 2x2 rotation matrix [[cos, -sin], [sin, cos]] applied to (x1, x2)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    mat_x1 = cos_t * x1 - sin_t * x2
    mat_x2 = sin_t * x1 + cos_t * x2

    close = abs(z_rot.real - mat_x1) < 1e-10 and abs(z_rot.imag - mat_x2) < 1e-10
    match = "Yes" if close else "No"

    rows.append([
        f"{deg}\u00b0",
        f"({z_rot.real:.3f}, {z_rot.imag:.3f})",
        f"({mat_x1:.3f}, {mat_x2:.3f})",
        match
    ])

js.window.py_table_data = json.dumps({
    "headers": ["Angle", "Complex mult.", "Matrix mult.", "Match?"],
    "rows": rows
})

print(f"Original vector: ({x1}, {x2})")
print(f"At 90\u00b0: ({x1}, {x2}) -> ({-x2}, {x1}) [swapped and negated, as expected]")
print(f"At 180\u00b0: ({x1}, {x2}) -> ({-x1}, {-x2}) [negated both]")

The table confirms what the algebra promised: complex multiplication and matrix rotation are the same operation, for every angle. The deep learning connection is direct: RoPE takes a query or key vector, groups its dimensions into consecutive pairs $(x_1, x_2)$, treats each pair as a complex number, and multiplies by $e^{im\theta}$ where $m$ is the token's position in the sequence. Because the dot product of two rotated vectors depends only on the difference in their rotation angles (i.e. the difference in positions), the attention score naturally captures relative position — exactly the inductive bias we want.

💡 Why does the dot product only depend on relative position? If vector $\mathbf{q}$ is rotated by angle $m\theta$ and vector $\mathbf{k}$ is rotated by angle $n\theta$, their dot product equals the dot product of $\mathbf{q}$ rotated by $(m - n)\theta$ with the unrotated $\mathbf{k}$. This is because rotation preserves lengths, and rotating both vectors changes the angle between them only by the difference of their individual rotations. So position $m$ attending to position $n$ gives the same geometry as position $m+5$ attending to position $n+5$: only the gap $m - n$ matters.
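Complex numbers turn this into a two-line calculation. For a single 2D pair, write $\mathbf{q} = (a, b)$ as $z_q = a + ib$ and $\mathbf{k} = (c, d)$ as $z_k = c + id$, and let $\bar{z}$ denote the complex conjugate (which flips the sign of the imaginary part, and therefore of any rotation angle). The dot product is the real part of $z_q \bar{z}_k$:

$$\mathbf{q}\cdot\mathbf{k} = ac + bd = \operatorname{Re}\left(z_q\,\bar{z}_k\right)$$

$$\operatorname{Re}\left[\left(z_q e^{im\theta}\right)\overline{\left(z_k e^{in\theta}\right)}\right] = \operatorname{Re}\left[z_q\,\bar{z}_k\,e^{im\theta}e^{-in\theta}\right] = \operatorname{Re}\left[z_q\,\bar{z}_k\,e^{i(m-n)\theta}\right]$$

The positions $m$ and $n$ enter only through $e^{i(m-n)\theta}$, so shifting both by the same amount leaves the score unchanged.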

Why This Matters for Deep Learning

Complex numbers and Euler's formula aren't just abstract mathematics — they appear throughout modern deep learning, sometimes explicitly and sometimes behind the scenes.

RoPE (Rotary Position Embedding). This is the most direct application and the reason this article exists in the math essentials track. Given a $d$-dimensional query or key vector, RoPE splits it into $d/2$ consecutive pairs. Each pair $(x_1, x_2)$ is treated as a complex number $x_1 + ix_2$, then multiplied by $e^{im\theta_k}$, where $m$ is the token position and $\theta_k$ is a frequency that differs for each pair (typically $\theta_k = 10000^{-2k/d}$ for pair index $k = 0, 1, \ldots, d/2 - 1$). The result: each pair gets rotated by a position-dependent angle. When we compute the dot product between a query at position $m$ and a key at position $n$, the two rotations in each pair combine into a single rotation by $(m - n)\theta_k$, so the score depends on the relative distance $m - n$ rather than on the absolute positions. This elegant encoding of relative position through rotation is why complex numbers matter for transformers. We build the full picture in the position encodings article.
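A minimal pure-Python sketch of the pairing-and-rotating step described above, assuming the consecutive-pair layout $(x_0, x_1), (x_2, x_3), \ldots$ and frequencies $\theta_k = 10000^{-2k/d}$. The function names `rope_rotate` and `dot` and the toy vectors are hypothetical. Real implementations operate on batched tensors, and some pair dimension $j$ with $j + d/2$ instead of with its neighbour, so treat this as an illustration of the idea rather than any particular library's code.

```python
import cmath
import math


def rope_rotate(vec, position, base=10000.0):
    """Hypothetical helper: rotate each pair (vec[2k], vec[2k+1]) by position * theta_k,
    where theta_k = base ** (-2k / d), by treating the pair as a complex number."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for k in range(d // 2):
        theta_k = base ** (-2 * k / d)
        z = complex(vec[2 * k], vec[2 * k + 1])          # pair -> x1 + i*x2
        z_rot = z * cmath.exp(1j * position * theta_k)   # rotate by position * theta_k
        out.extend([z_rot.real, z_rot.imag])             # complex -> pair
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [0.5, -1.0, 2.0, 0.3]  # toy query vector, d = 4
key = [1.5, 0.2, -0.7, 1.1]    # toy key vector

# Query at position 7 and key at position 3 (gap 4) ...
score_a = dot(rope_rotate(query, 7), rope_rotate(key, 3))
# ... versus query at position 107 and key at position 103 (same gap 4)
score_b = dot(rope_rotate(query, 107), rope_rotate(key, 103))

print(score_a, score_b)
print("same score up to rounding:", math.isclose(score_a, score_b, rel_tol=1e-9))
```

Because each pair is rotated independently, the score is a sum of per-pair terms $\operatorname{Re}[z_q \bar{z}_k e^{i(m-n)\theta_k}]$, and every one of them depends only on the gap $m - n$.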

The Fourier Transform. The discrete Fourier transform decomposes a signal into its constituent frequencies using the formula $X_k = \sum_{n=0}^{N-1} x_n \cdot e^{-i2\pi kn/N}$. Each term $e^{-i2\pi kn/N}$ is a rotation in the complex plane, and the sum measures how much the signal "resonates" at each frequency. Fourier transforms underpin spectrogram-based audio processing, the frequency components used in some positional encodings (like the sinusoidal encodings in the original Transformer), and efficient convolution via the FFT algorithm.
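The DFT formula translates almost line for line into Python with `cmath.exp`. This is a deliberately naive $O(N^2)$ sketch (the FFT computes the same $X_k$ much faster). The function name `dft` and the test signal, a pure cosine completing 2 cycles over $N = 8$ samples, are our own illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform: X_k = sum_n x_n * e^{-i 2 pi k n / N}."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


N = 8
# Test signal: a pure cosine that completes 2 full cycles over the N samples
signal = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
magnitudes = [abs(X) for X in dft(signal)]

for k, m in enumerate(magnitudes):
    print(f"k={k}: |X_k| = {m:.6f}")
```

For a real-valued cosine, the magnitude splits evenly between bin $k = 2$ and its mirror $k = N - 2 = 6$, each equal to $N/2 = 4$, with every other bin zero up to rounding.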

Signal processing and diffusion models. While diffusion models primarily use real-valued Gaussian noise, the underlying noise schedules and the theory connecting different noise levels draw on exponential decay ($e^{-\lambda t}$), which is the real-axis cousin of Euler's formula. More broadly, any technique that involves periodic functions, phase shifts, or frequency decomposition is built on $e^{i\theta} = \cos\theta + i\sin\theta$.

The core takeaway: whenever you see rotations, frequencies, or periodic patterns in deep learning, complex numbers and Euler's formula are the mathematical language behind them. The chain of ideas we've built — $i$ rotates 90°, $e^{i\theta}$ rotates by arbitrary $\theta$, complex multiplication equals the rotation matrix — is the foundation that makes RoPE and Fourier-based methods work.

Quiz

Test your understanding of complex numbers, Euler's formula, and their connection to rotations.

What is the geometric effect of multiplying a complex number by $i$?

It doubles the magnitude of the number

It reflects the number across the real axis

It rotates the number 90° counterclockwise around the origin

It moves the number one unit to the right

What does $e^{i\pi}$ equal?

$1$

$i$

$-1$

$-i$

To rotate a 2D vector $(x_1, x_2)$ by angle $\theta$, we represent it as a complex number $z = x_1 + ix_2$ and multiply by:

$i\theta$

$\cos\theta + \sin\theta$

$e^{i\theta} = \cos\theta + i\sin\theta$

$e^{\theta}$

Why does RoPE use complex multiplication (rotation) to encode position?

Because complex numbers compress the embedding dimension by half

Because rotation is computationally cheaper than addition

Because the dot product of two rotated vectors depends only on their relative position difference, giving the model a relative-position inductive bias

Because Euler's formula only works with even-dimensional vectors