Erasure codes appear all over storage systems. In large-scale systems, they are essential for keeping costs low compared to pure replication for redundancy. Modern codes, for example, can achieve an overhead of just 1.2x or even lower with failure tolerance similar to 4x replication. Erasure codes let you lay out shards so that you can tolerate disk, node, rack, and even whole availability-zone failures. Some codes even trade extra storage overhead for reduced network bandwidth.

At their core, they are just linear equations. This post builds up from simple linear systems to Reed-Solomon-style codes, using tiny Python snippets along the way. We’ll keep dependencies minimal and just use NumPy (plus Matplotlib for one optional plot).

Warmup: linear equations Link to heading

A linear equation in two variables looks like:

$$ 2x + 3y = 7 $$

Two independent equations define a single point (x, y). For example:

$$ 2x + 3y = 7 \\ x - y = 1 $$

You can solve this by hand, but it is also just a matrix equation:

$$ A c = b $$

where A is the coefficient matrix, c is the unknowns, and b is the result vector.

Here is a tiny NumPy example:

import numpy as np

A = np.array([
    [2, 3],
    [1, -1],
], dtype=float)

b = np.array([7, 1], dtype=float)

c = np.linalg.solve(A, b)
print(c)  # [2. 1.]

Solving systems of equations Link to heading

Here is a simple 3-variable system:

$$ x + y + z = 6 \\ 2x - y + z = 6 \\ x + 2y - z = 1 $$

You can solve it by substitution, but in practice we let the computer do the linear algebra:

import numpy as np

A = np.array([
    [1, 1, 1],
    [2, -1, 1],
    [1, 2, -1],
], dtype=float)

b = np.array([6, 6, 1], dtype=float)

c = np.linalg.solve(A, b)
print(c)  # [2. 1. 3.]

A polynomial view Link to heading

Okay, here is where the ‘trick’ comes in. The key is to encode data as coefficients of a polynomial. For example:

$$ M(x) = 2 + 3x + 4x^2 $$

The coefficients [2, 3, 4] are the data. If we evaluate M(x) at multiple points, we get more equations than unknowns, which gives redundancy.

This is the idea behind Reed-Solomon codes.

Encoding by evaluation Link to heading

We will:

  1. Define a message polynomial.
  2. Evaluate it at 5 points to create the codeword.
  3. Lose two points and recover the coefficients from the remaining points.

1) Define the message polynomial Link to heading

Let:

$$ M(x) = 2 + 3x + 4x^2 $$

The coefficients [2, 3, 4] are the data.

import numpy as np
import matplotlib.pyplot as plt

coeffs = np.array([2, 3, 4], dtype=float)  # degree < 3

2) Choose evaluation points Link to heading

We will use 5 distinct points: 1, 2, 3, 4, 5.

n = 5
eval_points = np.arange(1, n + 1, dtype=float)
print("Evaluation points:", eval_points)

3) Encode: evaluate the polynomial Link to heading

We use Horner’s method to evaluate the polynomial (at the given points) and generate the codeword, which is just the list of outputs at those points. It’s a fast, standard way to evaluate polynomials.

def poly_eval(coeffs, x):
    result = 0.0
    for c in coeffs[::-1]:
        result = result * x + c
    return result

codeword = np.array([poly_eval(coeffs, x) for x in eval_points])
print("Encoded codeword (evaluations):", codeword)

4) Plot the polynomial and points (optional) Link to heading

This part is optional, but we can plot the polynomial at the different selected points in order to visualize what we’re doing.

x_dense = np.linspace(0, 6, 400)
y_dense = np.array([poly_eval(coeffs, x) for x in x_dense])

plt.figure(figsize=(8, 5))
plt.plot(x_dense, y_dense, label="Polynomial curve", color='blue')
plt.scatter(eval_points, codeword, color='red', zorder=5, label="Evaluation points")
plt.xlabel("x")
plt.ylabel("M(x)")
plt.title("Polynomial Evaluation and Codeword Points")
plt.legend()
plt.grid(True)
plt.show()

Polynomial evaluation and codeword points

5) Simulate erasures Link to heading

In this example, we are going to simulate losing part of our codeword: the evaluations at x = 2 and x = 4 (indices 1 and 3). Assume we only receive the evaluations at indices 0, 2, 4.

known_indices = [0, 2, 4]
x_known = eval_points[known_indices]
y_known = codeword[known_indices]

6) Recover the polynomial Link to heading

To recover the coefficients, we write the equations in matrix form. A Vandermonde matrix is just the table of powers of the x-values (1, x, x^2, ...) so that A * coeffs = y. Solving that system gives us the original coefficients.

For the system to have a unique solution, the matrix A must be invertible. When it is, we can do:

$$ \text{coeffs} = A^{-1} y $$

In our example, that means we can solve for the unknowns in one step. The x-values must be distinct for this to work.
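We can also see what goes wrong without distinct points: a repeated x-value produces two identical rows in A, so the matrix is singular and the solve fails. A quick sketch (the b values here are just M(1) and M(3) from our example polynomial):

```python
import numpy as np

# Repeating x = 1 makes the first two rows of A identical, so A is singular.
x_bad = np.array([1.0, 1.0, 3.0])
A = np.vander(x_bad, N=3, increasing=True)

try:
    np.linalg.solve(A, np.array([9.0, 9.0, 47.0]))
except np.linalg.LinAlgError:
    print("A is singular: repeated x-values give no unique solution")
```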

Where do the linear equations come from? Pick an x, plug it into the polynomial, and you get an equation like:

$$ c_0 + c_1 x + c_2 x^2 = y $$

The unknowns are the coefficients c0, c1, c2, .... If there are k coefficients, you need at least k distinct equations (i.e., k different x values) to solve for them. The encoder’s job is to pick distinct x values, evaluate the polynomial, and send those outputs (the codeword).

k = len(coeffs)  # degree < k, here k = 3
A = np.vander(x_known, N=k, increasing=True)

recovered_coeffs = np.linalg.solve(A, y_known)
print("Recovered polynomial coefficients:", recovered_coeffs)

if np.allclose(recovered_coeffs, coeffs):
    print("Successfully recovered the original polynomial coefficients!")
else:
    print("Recovery failed. Original coefficients:", coeffs)

Two erasure examples:

  1. Lose two points (keep indices 0, 2, 4). This is the example above.
  2. Lose two different points (keep indices 1, 2, 3). This also works because we still have 3 distinct points.

The rule is simple: a degree-2 polynomial needs any 3 distinct points.
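The second case is easy to check end-to-end; this self-contained snippet reruns the recovery keeping indices 1, 2, 3 instead of 0, 2, 4:

```python
import numpy as np

def poly_eval(coeffs, x):
    # Horner's method, as before.
    result = 0.0
    for c in coeffs[::-1]:
        result = result * x + c
    return result

coeffs = np.array([2, 3, 4], dtype=float)
eval_points = np.arange(1, 6, dtype=float)
codeword = np.array([poly_eval(coeffs, x) for x in eval_points])

# Keep indices 1, 2, 3 (x = 2, 3, 4) this time.
known_indices = [1, 2, 3]
A = np.vander(eval_points[known_indices], N=3, increasing=True)
recovered = np.linalg.solve(A, codeword[known_indices])
print(recovered)  # [2. 3. 4.]
```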

This is exactly the erasure model: we lose symbols, but we know which ones.

Erasure vs. error correction (quick note): erasures mean we know which symbols are missing, while errors mean unknown symbols might be wrong. This post only covers erasures.

From linear equations to Reed-Solomon Link to heading

Reed-Solomon codes are polynomial evaluation codes over a finite field. The encoder chooses k data symbols (the coefficients) and outputs n evaluated symbols. Any k of those n are enough to reconstruct the polynomial, so you can tolerate n - k erasures.

Key point: the decoder is just solving a linear system.
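Over the reals, both directions fit in a few lines. This is a toy sketch (the `encode`/`decode` names are mine, and real codes do this over a finite field):

```python
import numpy as np

def encode(data, n):
    """Evaluate the data polynomial at x = 1..n to get n codeword symbols."""
    x = np.arange(1, n + 1, dtype=float)
    A = np.vander(x, N=len(data), increasing=True)
    return x, A @ np.asarray(data, dtype=float)

def decode(x_known, y_known, k):
    """Recover the k coefficients from any k surviving points."""
    A = np.vander(x_known[:k], N=k, increasing=True)
    return np.linalg.solve(A, y_known[:k])

x, y = encode([2, 3, 4], n=5)
keep = [1, 2, 4]  # lose indices 0 and 3; any 3 of the 5 points suffice
print(decode(x[keep], y[keep], k=3))  # [2. 3. 4.]
```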

Finite fields (Galois fields) Link to heading

The example above used real numbers, but storage systems use finite fields so that all arithmetic stays within a fixed symbol size (like a byte).

A Galois field GF(2^m) has exactly 2^m elements. Arithmetic wraps around inside this field, so every non-zero element has a multiplicative inverse, and all linear systems behave nicely for coding.

In practice, Reed-Solomon uses GF(2^8) for byte symbols. The same linear algebra applies, but the math is done modulo a field polynomial instead of with reals.

Tiny GF(2^3) example Link to heading

We can illustrate the idea with a tiny field. Suppose we work in GF(2^3), which has 8 elements. Addition is XOR, and multiplication is done modulo an irreducible polynomial like $x^3 + x + 1$.

You can think of each element as a 3-bit value. For example, 0b101 means x^2 + 1.

Here is a minimal example of multiplication in GF(2^3):

# Multiply two 3-bit values in GF(2^3) with modulus x^3 + x + 1 (0b1011).

def gf_mul(a, b):
    mod = 0b1011
    result = 0
    for _ in range(3):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & 0b100
        a = (a << 1) & 0b111
        if carry:
            a ^= mod & 0b111
    return result

print(bin(gf_mul(0b010, 0b111)))  # 0b101, i.e. x^2 + 1

The important thing is not the bit twiddling, but that the same linear algebra still works, now over a finite field.
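As a quick check of the "every non-zero element has an inverse" claim, we can brute-force the inverses in this tiny field (gf_mul is repeated here so the snippet runs standalone):

```python
def gf_mul(a, b):
    # Multiplication in GF(2^3), modulus x^3 + x + 1, as above.
    mod = 0b1011
    result = 0
    for _ in range(3):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & 0b100
        a = (a << 1) & 0b111
        if carry:
            a ^= mod & 0b111
    return result

def gf_inv(a):
    # Brute-force search: the field has only 7 non-zero elements.
    for b in range(1, 8):
        if gf_mul(a, b) == 1:
            return b
    raise ValueError("a must be non-zero")

for a in range(1, 8):
    print(a, "*", gf_inv(a), "=", gf_mul(a, gf_inv(a)))  # always 1
```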

A simple erasure code end-to-end Link to heading

Let k = 3 data symbols d0, d1, d2. We create n = 5 encoded symbols by choosing 5 distinct points and evaluating the polynomial:

$$ M(x) = d_0 + d_1 x + d_2 x^2 $$

The encoder sends M(1), M(2), M(3), M(4), M(5).

If any two are erased, the decoder picks any 3 remaining points, builds the Vandermonde matrix, and solves for [d0, d1, d2].

That is the entire idea: erasure codes are linear equations plus redundancy.

A full “hello” example (lose symbols, recover the message) Link to heading

Okay, our “hello” test ties all these concepts together. Here is a minimal example that encodes the string "hello", simulates losing some encoded symbols, and recovers the full message by solving for the coefficients again.

We will use a degree-4 polynomial code (five byte coefficients) over the reals. We treat the bytes as the polynomial coefficients, evaluate the polynomial at x = 1..7, and use the extra points as parity. This is still a toy example (not true Reed-Solomon over a finite field), but the mechanics are the same.

import numpy as np

def poly_eval(coeffs, x):
    result = 0.0
    for c in coeffs[::-1]:
        result = result * x + c
    return result

def fit_poly(x, y):
    A = np.vander(x, N=5, increasing=True)
    return np.linalg.solve(A, y)

msg = "hello"
data = np.frombuffer(msg.encode("utf-8"), dtype=np.uint8)

block = data  # b"hello"

x_all = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)

# Encode: treat bytes as polynomial coefficients, then evaluate.
coeffs = block.astype(float)
codeword = np.array([poly_eval(coeffs, x) for x in x_all])

# Simulate losing two encoded symbols.
known_indices = [1, 2, 4, 5, 6]  # x = 2, 3, 5, 6, 7
x_known = x_all[known_indices]
y_known = codeword[known_indices]

recovered_coeffs = fit_poly(x_known, y_known)
recovered_block = np.round(recovered_coeffs).astype(np.uint8)
print(bytes(recovered_block).decode("utf-8"))  # "hello"

Warning: this demo uses floating point + rounding, which is not exact. Real Reed-Solomon uses finite field arithmetic so every step is exact and stays in a fixed-width byte.
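For completeness, here is a sketch of the same demo done exactly in GF(2^8), using the modulus x^8 + x^4 + x^3 + x + 1 (a common choice, also used by AES). Addition is XOR, and the Vandermonde solve becomes Gaussian elimination over the field. This is illustrative code, not an optimized Reed-Solomon implementation:

```python
# GF(2^8) arithmetic, modulus x^8 + x^4 + x^3 + x + 1 (0x11B).

def gf_mul(a, b):
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # low 8 bits of the modulus 0x11B
    return result

def gf_pow(a, e):
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def gf_inv(a):
    return gf_pow(a, 254)  # a^(2^8 - 2) = a^-1

def gf_poly_eval(coeffs, x):
    # Horner's method with field operations; addition is XOR.
    result = 0
    for c in reversed(coeffs):
        result = gf_mul(result, x) ^ c
    return result

def gf_solve(A, y):
    """Gauss-Jordan elimination over GF(2^8); subtraction is XOR."""
    n = len(y)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = gf_inv(M[col][col])
        M[col] = [gf_mul(inv, v) for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [v ^ gf_mul(f, p) for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

data = list(b"hello")   # 5 data symbols = polynomial coefficients
xs = list(range(1, 8))  # 7 distinct non-zero evaluation points
codeword = [gf_poly_eval(data, x) for x in xs]

# Lose two symbols; keep any 5 of the 7.
keep = [1, 2, 4, 5, 6]
A = [[gf_pow(xs[i], j) for j in range(5)] for i in keep]
recovered = gf_solve(A, [codeword[i] for i in keep])
print(bytes(recovered))  # b'hello'
```

Every intermediate value stays in 0..255, so there is no rounding and the recovery is exact.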

Closing Link to heading

This is the simplest path from linear equations to Reed-Solomon. Of course, there is a lot we didn’t cover and there are many ways to approach this subject, but this is one that I think most people with a little exposure to linear algebra and programming can relate to. In a follow-up, we can explore other types of codes and self-correcting codes.

If you want to suggest edits or improvements, feel free to open a pull request.