Reed-Solomon Error Correction


An interactive demo of Reed-Solomon error correction. The code for the encoder and decoder can be found on GitHub.

Reed-Solomon codes are a family of error correction codes which encode a message of length \(k\) into a codeword of length \(n\) using \(t = n - k\) redundant symbols (here, a symbol will just be a byte). They can:

  • Detect up to \(t\) errors (altered symbols).
  • Correct up to \(t\) erasures (errors where the location is known).
  • Correct up to \(t/2\) errors where the locations are not known. Intuitively, the factor of 2 is because we need to determine both the locations and the magnitudes of the errors.

Reed-Solomon codes work by representing the message as a polynomial with degree less than \(k\). A codeword is constructed such that any \(k\) symbols can be used to reconstruct the original polynomial.

The original Reed-Solomon code took the symbols of the message as coefficients of a polynomial, and evaluated the polynomial at \(n\) different points. If the locations of the errors are known, then polynomial interpolation can be used to recover the message. Handling unknown error locations is harder, and the original paper offers the very inefficient method of choosing the most popular out of all \(\binom{n}{k}\) decodings (more efficient algorithms have since been developed).

Instead, here we implement the more common BCH variant, which constructs the codeword directly from the polynomial coefficients. Redundancy is introduced by constructing the codeword such that \(t\) roots of the polynomial are fixed. Decoding is conceptually trickier, but can be implemented efficiently as described below.

Codec Configuration

The encoder and decoder need to agree on all configuration parameters.

The amount of redundancy \(t\) must be specified when encoding a message.


\(n = t + k\) must be less than 256.

The general idea behind a Reed-Solomon code would work using real numbers as the symbols. However, computers cannot store infinitely precise real numbers, so instead we use a finite field (or Galois field).

The relevant thing to understand about finite fields is that they follow the same rules of arithmetic as the real numbers: addition, subtraction, multiplication, and division by non-zero elements all behave as expected.

We use \({\rm GF}(2^8)\) because its 256 elements map nicely to bytes. Elements in \({\rm GF}(2^8)\) will be displayed in hexadecimal to keep them distinct from normal integers.

Field: \({\rm GF}(2^8)\)

Implementation detail that is not important for understanding the algorithm:

To fully specify operations in \({\rm GF}(2^8)\) we must select a primitive polynomial. Specifically, we define \({\rm GF}(2^8) = {\rm GF}(2)[z]/(z^8+z^4+z^3+z^2+1)\).

Primitive polynomial: \(z^8+z^4+z^3+z^2+1 = \texttt{11D}\)
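As a concrete sketch of what this choice means (illustrative Python, not the demo's actual code): addition in \({\rm GF}(2^8)\) is a bitwise XOR of the byte representations, and multiplication is a carry-less multiply reduced modulo the primitive polynomial \(\texttt{11D}\).

```python
# Illustrative GF(2^8) arithmetic using the primitive polynomial 0x11D.
# Bytes are polynomials over GF(2): bit i is the coefficient of z^i.

def gf_add(a: int, b: int) -> int:
    """Addition (and subtraction) is XOR: coefficients add mod 2."""
    return a ^ b

def gf_mul(a: int, b: int) -> int:
    """Carry-less (XOR) multiplication, reduced modulo z^8+z^4+z^3+z^2+1."""
    p = 0
    while b:
        if b & 1:
            p ^= a          # add a * (current power of z)
        a <<= 1
        if a & 0x100:
            a ^= 0x11D      # replace z^8 with z^4+z^3+z^2+1
        b >>= 1
    return p
```

For example \(\texttt{02} \cdot \texttt{80} = \texttt{1D}\): multiplying by \(z\) overflows into \(z^8\), which the primitive polynomial reduces back into the field.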

Define a codeword polynomial \(c(x)\) as valid if it is divisible by a generator polynomial \(g(x)\), i.e. \(c(x) \equiv 0 \pmod{g(x)}\).

To have \(t\) degrees of redundancy, construct a \(g(x)\) with \(t\) roots. To choose these roots, we must use a generator element \(\alpha \in {\rm GF}(2^8)\) — an element whose powers generate all non-zero elements in \({\rm GF}(2^8)\). i.e. \({\rm GF}(2^8) = \{\texttt{00}, \alpha^0, \alpha, \alpha^2, \alpha^3, ...\}\)

Generator element: \(\alpha = \texttt{02}\)

Generator polynomial: \(g(x) = \prod_{j=1}^{t} (x - \alpha^j) =\)
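The construction of \(g(x)\) can be sketched as follows (illustrative Python, with the \({\rm GF}(2^8)\) multiply repeated so the snippet is self-contained): build \(g(x)\) up one root at a time, remembering that subtraction is the same as addition here.

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reduced modulo the primitive polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

def poly_mul(p, q):
    """Multiply two polynomials over GF(2^8), coefficients highest degree first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r

def generator_poly(t):
    """g(x) = (x - α)(x - α²)···(x - α^t) with α = 0x02; minus is XOR here."""
    g = [1]
    a_j = 1
    for _ in range(t):
        a_j = gf_mul(a_j, 0x02)   # next power of the generator element
        g = poly_mul(g, [1, a_j])
    return g
```

For \(t = 2\) this gives \(g(x) = x^2 + \texttt{06}\,x + \texttt{08}\), whose roots are \(\alpha = \texttt{02}\) and \(\alpha^2 = \texttt{04}\).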

Encoding

Message \(m\)

The message must be encoded as a sequence of bytes. Here the input string is UTF-8 encoded.

Message was truncated because it is too long.

Bytes (hex) \([a_k, \cdots, a_1] = \text{utf8}(m)\)

Take the series of bytes as representing elements in the finite field \({\rm GF}(2^8)\). Define a polynomial whose coefficients are the elements of the message: \(p(x) = \sum_{j=1}^k a_j x^{j-1} = a_k x^{k-1} + \cdots + a_2 x + a_1\)

Note: This is purely a conceptual step. In this implementation the byte string and the polynomial share the same representation.

\(p(x) = \sum_{j=1}^k a_j x^{j-1}\)

By our definition, a valid codeword is divisible by the generator polynomial \(g(x)\). We must map \(p(x)\) to a valid codeword.

A simple way of doing this is to multiply \(p(x) \cdot g(x)\). However, this has the disadvantage that the original message bytes are not present in the result (the code is not systematic).

Instead, we will find the appropriate symbols to append to the message such that the result is a valid codeword:

Shift \(p(x)\) to make room for \(t\) symbols at the end: \(p(x) \cdot x^t\)

Find the remainder after dividing by \(g(x)\): \(s_r(x) = p(x) \cdot x^t \mod g(x)\)
Either polynomial long-division or synthetic division will give the remainder.

\(s_r(x) = p(x) \cdot x^t\) \(\mod g(x)\)

Now \(s(x) = p(x) \cdot x^t - s_r(x)\) is divisible by \(g(x)\) and thus is a valid codeword.

Note: In \({\rm GF}(2^8)\) addition is the same as subtraction (in the byte representation it is an xor of the symbols). Thus \(s_r(x)\) is unaltered as a suffix of \(s(x)\).

\(s(x) = p(x) \cdot x^t - s_r(x)\)
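The whole encoding step can be sketched as follows (illustrative Python, not the demo's actual code; the division is the synthetic-division variant, which works because the leading coefficient of \(g(x)\) is 1):

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reduced modulo the primitive polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

def poly_mul(p, q):
    """Multiply two polynomials over GF(2^8), coefficients highest degree first."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] ^= gf_mul(a, b)
    return r

def generator_poly(t):
    """g(x) = (x - α)···(x - α^t), coefficients highest degree first."""
    g = [1]
    a_j = 1
    for _ in range(t):
        a_j = gf_mul(a_j, 0x02)
        g = poly_mul(g, [1, a_j])
    return g

def rs_encode(msg, t):
    """Systematic encode: append t check symbols so g(x) divides the result."""
    g = generator_poly(t)
    buf = list(msg) + [0] * t        # p(x) · x^t
    # Synthetic division by g(x): after the loop the last t entries of buf
    # hold the remainder s_r(x).
    for i in range(len(msg)):
        coef = buf[i]
        if coef:
            for j in range(1, len(g)):
                buf[i + j] ^= gf_mul(g[j], coef)
    # s(x) = p(x)·x^t - s_r(x): subtraction is XOR, so the message bytes
    # pass through unchanged and the remainder becomes the suffix.
    return list(msg) + buf[len(msg):]
```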

Transmission

Encoded

Corrupter


Change the text to corrupt the message in transit!

Received

Decoding

Interpret the data we receive as a polynomial \(r(x)\). This data may not be the same as the transmitted data \(s(x)\) as it could have been corrupted by errors, which we represent as \(e(x)\):

\[r(x) = s(x) + e(x)\]

To be able to reason about the errors, define a few more variables:

  • \(\nu\) is the (unknown) number of errors
  • \(i_k\) is the position of the \(k\)-th error, for \(1 \le k \le \nu\)
  • \(e_{i_k}\) is the magnitude of the error at \(i_k\)

Thus \(e(x) = \sum_{k=1}^\nu e_{i_k} x^{i_k} = e_{i_1} x^{i_1} + \cdots + e_{i_\nu} x^{i_\nu}\)

\[r(x) = s(x) + e(x)\]

where \(e(x) = \sum_{k=1}^\nu e_{i_k} x^{i_k}\)

Next determine whether \(r(x)\) is a valid codeword, or whether it has been corrupted. For a valid codeword \(c(x)\), we know that all the roots of \(g(x)\) must also be roots of \(c(x)\).

Define a syndrome as \(r(x)\) evaluated at a root of \(g(x)\). There are \(t\) syndromes:

\(S_j = r(\alpha^j)\) for \(1 \le j \le t\)

Syndromes have several useful properties:

  • \[S_j = s(\alpha^j) + e(\alpha^j) = 0 + e(\alpha^j) = e(\alpha^j)\]
  • Each syndrome depends only on the error \(e(x)\)
  • If there are no errors then all syndromes are zero

Note: When evaluating \(r(x)\), terms \(x^{255}\) and greater are indistinguishable from lower terms because \(x^{255} = 1\) for every non-zero \(x\) in \({\rm GF}(2^8)\), so \(x^{k+255}\) evaluates the same as \(x^k\). Thus we must limit our codewords to 255 bytes to be able to detect and correct errors. More generally, the codeword length must be less than the number of elements in the field.

Syndromes \(S_j = r(\alpha^j) = e(\alpha^j)\)
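Computing the syndromes is just \(t\) polynomial evaluations (illustrative Python, with the \({\rm GF}(2^8)\) multiply repeated for self-containment):

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reduced modulo the primitive polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

def poly_eval(p, x):
    """Evaluate p (coefficients highest degree first) at x by Horner's method."""
    y = 0
    for c in p:
        y = gf_mul(y, x) ^ c
    return y

def syndromes(received, t):
    """S_j = r(α^j) for 1 <= j <= t; all zero iff r(x) is a valid codeword."""
    S = []
    a_j = 1
    for _ in range(t):
        a_j = gf_mul(a_j, 0x02)   # α^j
        S.append(poly_eval(received, a_j))
    return S
```

For a single error \(e\) at position \(i\), the syndromes come out as \(S_j = e \cdot (\alpha^i)^j\), which is exactly the structure the decoder exploits below.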

The message was not corrupted: \(r(x)\) is a valid codeword.

All syndromes are zero, thus \(e(x) = 0\).

The message was corrupted: \(r(x)\) is not a valid codeword because there was at least one non-zero syndrome.

If we stop here, then we can be certain of having caught all corruptions with up to \(t\) errors. However, we are going to try to repair the error, and we may incorrectly repair it if there were over \(t/2\) errors.

The syndromes \(S_1 \cdots S_t\) define a set of equations where \(i_k\) and \(e_{i_k}\) are unknown, \(S_j = e(\alpha^j) = \sum_{k=1}^\nu e_{i_k} X_k^j\):

\(S_j = e_{i_1}X_1^j + \cdots + e_{i_\nu}X_\nu^j\) for \(1 \le j \le t\) where \(X_k = \alpha^{i_k}\)

Unfortunately, this set of equations is not linear (i.e. hard to solve) and we don’t even know how many unknowns there are. We want to convert this to a set of linear equations:

Define the error locator \(\Lambda(x)\) as a polynomial with roots \(X_k^{-1}\) (i.e. one root per error):

\[\Lambda(x) = \prod_{k=1}^\nu (1 - x X_k ) = 1 + \Lambda_1 x^1 + \Lambda_2 x^2 + \cdots + \Lambda_\nu x^\nu\]

Combining with \(S_j\) we can derive a system of \(\nu\) linear equations:

\(S_j \Lambda_{\nu} + S_{j+1}\Lambda_{\nu-1} + \cdots + S_{j+\nu-1} \Lambda_1 = - S_{j + \nu}\) for \(1 \leq j \leq \nu\)

This requires us to have \(2\nu\) syndromes. Given we have \(t\) syndromes, we can only solve for at most \(t/2\) errors.

If we knew \(\nu\) we could solve this directly, but we don’t. We can still solve this by trying values of \(\nu\) (from \(t/2\) down) until we find one for which the system is solvable — this is the method used by the PGZ decoder.

The Berlekamp-Massey algorithm will more efficiently find the solution.
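A sketch of Berlekamp-Massey over \({\rm GF}(2^8)\) (illustrative Python, with a brute-force inverse for brevity): it maintains a candidate \(\Lambda(x)\), stored lowest-degree coefficient first, and fixes it up whenever it fails to predict the next syndrome.

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reduced modulo the primitive polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

def gf_inv(a):
    """Brute-force inverse; fine for a 256-element field."""
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("00 has no inverse")

def berlekamp_massey(S):
    """Return (Λ, ν): the error locator (lowest degree first) and its degree."""
    C = [1]          # current candidate Λ(x)
    B = [1]          # copy of C from before the last length change
    L = 0            # current assumed number of errors
    m = 1            # number of steps since the last length change
    b = 1            # discrepancy from the last length change
    for n in range(len(S)):
        # Discrepancy: how badly C fails to predict syndrome S[n].
        d = S[n]
        for i in range(1, L + 1):
            d ^= gf_mul(C[i], S[n - i])
        if d == 0:
            m += 1
            continue
        # Fix up: C(x) -= (d/b) x^m B(x)
        coef = gf_mul(d, gf_inv(b))
        T = C + [0] * (max(len(C), len(B) + m) - len(C))
        for i, Bi in enumerate(B):
            T[i + m] ^= gf_mul(coef, Bi)
        if 2 * L <= n:
            # Λ is too short to explain the syndromes seen so far: grow it.
            B, b, L, m = C, d, n + 1 - L, 1
        else:
            m += 1
        C = T
    return C, L
```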

Error locator \(\Lambda(x) = \prod_{k=1}^\nu (1 - x X_k )\)

where \(X_{k} = \alpha^{i_k}\)

Determine the error positions \(i_k\) by finding the roots of \(\Lambda(x)\). By construction \(\Lambda (X_{k}^{-1}) = 0\), and given the roots we have: \(i_k = \log_{\alpha}(\alpha^{i_k}) = \log_{\alpha}(X_k)\).

\({\rm GF}(2^8)\) only has 256 elements, making it feasible to brute force the solution by evaluating \(\Lambda(x)\) at every possible value. Chien search is an efficient way to implement the evaluations.

Note: If we don’t find \(\nu\) different roots of \(\Lambda(x)\) or if the positions are outside the message, then there were over \(t/2\) errors and we can’t recover.
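A brute-force version of this search (illustrative Python; a real Chien search restructures the evaluations incrementally, but for 256 elements plain evaluation is plenty):

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reduced modulo the primitive polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

def gf_pow(a, k):
    """a^k by repeated multiplication (k >= 0)."""
    y = 1
    for _ in range(k):
        y = gf_mul(y, a)
    return y

def find_error_positions(lam, n):
    """Positions i with Λ(α^{-i}) = 0, for a codeword of length n.

    lam holds Λ's coefficients lowest degree first: [1, Λ1, ..., Λν].
    """
    positions = []
    for i in range(n):
        x = gf_pow(0x02, (255 - i) % 255)   # α^{-i}, since α^255 = 1
        y = 0
        for c in reversed(lam):             # Horner, lowest degree first
            y = gf_mul(y, x) ^ c
        if y == 0:
            positions.append(i)
    return positions
```

If this returns fewer than \(\deg \Lambda\) positions, there were too many errors and decoding fails.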

Error positions \(i_k\)

Number of errors \(\nu\)

To find the error magnitudes \(e_{i_k}\) we can solve the system of \(\nu\) linear equations given by the definition of \(S_j\):

\(S_j = e_{i_1}X_1^j + \cdots + e_{i_\nu}X_\nu^j\) for \(1 \le j \le \nu\)

This can be computed more efficiently with the Forney algorithm, which provides a closed form solution for each \(e_{i_k}\).
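A sketch of the Forney computation (illustrative Python; in one common form, with the roots of \(g(x)\) starting at \(\alpha^1\), the closed form is \(e_{i_k} = \Omega(X_k^{-1}) / \Lambda'(X_k^{-1})\), where \(\Omega(x) = S(x)\Lambda(x) \bmod x^t\) and \(\Lambda'\) is the formal derivative):

```python
def gf_mul(a, b):
    """GF(2^8) multiply, reduced modulo the primitive polynomial 0x11D."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return p

def gf_inv(a):
    """Brute-force inverse; fine for a 256-element field."""
    for x in range(1, 256):
        if gf_mul(a, x) == 1:
            return x
    raise ZeroDivisionError("00 has no inverse")

def gf_pow(a, k):
    y = 1
    for _ in range(k):
        y = gf_mul(y, a)
    return y

def poly_eval_low(p, x):
    """Evaluate p at x, coefficients lowest degree first."""
    y = 0
    for c in reversed(p):
        y = gf_mul(y, x) ^ c
    return y

def forney(S, lam, positions):
    """Error magnitudes e_{i_k} from syndromes S, locator lam, positions i_k."""
    t = len(S)
    # Ω(x) = S(x)·Λ(x) mod x^t, where S(x) = S_1 + S_2 x + ... (lowest first)
    omega = [0] * t
    for i, s in enumerate(S):
        for j, l in enumerate(lam):
            if i + j < t:
                omega[i + j] ^= gf_mul(s, l)
    # Formal derivative: in characteristic 2 only odd-degree terms survive,
    # so Λ'(x) = Λ1 + Λ3·x² + Λ5·x⁴ + ... (a polynomial in x²).
    dlam = lam[1::2]
    mags = []
    for i in positions:
        x_inv = gf_pow(0x02, (255 - i) % 255)   # X_k^{-1} = α^{-i_k}
        num = poly_eval_low(omega, x_inv)
        den = poly_eval_low(dlam, gf_mul(x_inv, x_inv))
        mags.append(gf_mul(num, gf_inv(den)))
    return mags
```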

\(e(x) = \sum_{k=1}^\nu e_{i_k} x^{i_k}\)

Calculate the repaired codeword \(s'(x) = r(x) - e(x)\). If there weren’t too many errors (\(\le t/2\)) then it will match the original message — i.e. \(s'(x) = s(x)\). Otherwise it’s possible that we have decoded incorrectly!

We check that the syndromes of \(s'(x)\) are zero to verify that it is indeed a valid codeword.

\(s'(x) = r(x) - e(x)\)

\([s'(\alpha), s'(\alpha^2), \cdots, s'(\alpha^t)]\)

Recover the message bytes by truncating the \(t\) check symbols: \(p'(x) = \lfloor \frac{s'(x)}{x^t} \rfloor\) — then recasting the result as a byte string.

\(p'(x) = \lfloor \frac{s'(x)}{x^t} \rfloor\) \(= \sum_{j=1}^k a'_j x^{j-1}\)

Recovered bytes \([a'_k, \cdots, a'_1]\)

Convert the UTF-8 encoded bytes back to a string.

\(m' = \text{utf8}^{-1}([a'_k, \cdots, a'_1])\)