I am trying to implement a Reed-Solomon encoder-decoder in Python supporting the decoding of both erasures and errors, and that's driving me crazy.
The implementation currently supports decoding only errors or only erasures, but not both at the same time (even if it's below the theoretical bound of 2*errors+erasures <= (n-k) ).
From Blahut's papers (here and here), it seems we only need to initialize the error locator polynomial with the erasure locator polynomial to implicitly compute the errata locator polynomial inside Berlekamp-Massey.
This approach partially works for me: when I have 2*errors+erasures < (n-k)/2 it works, but in fact after debugging it only works because BM compute an errors locator polynomial that gets the exact same value as the erasure locator polynomial (because we are below the limit for errors-only correction), and thus it is truncated via galois fields and we end up with the correct value of the erasure locator polynomial (at least that's how I understand it, I may be wrong).
However, when we go above (n-k)/2, for example if n = 20 and k = 11, thus we have (n-k)=9 erased symbols we can correct, if we feed in 5 erasures then BM just goes wrong. If we feed in 4 erasures + 1 error (we are still well below the bound since we have 2*errors+erasures = 2+4 = 6 < 9), the BM still goes wrong.
The exact algorithm of Berlekamp-Massey I implemented can be found in this presentation (pages 15-17), but a very similar description can be found here and here, and here I attach a copy of the mathematical description:
Now, I have an almost exact reproduction of this mathematical algorithm into a Python code. What I would like is to extend it to support erasures, which I tried by initializing the error locator sigma with the erasure locator:
def _berlekamp_massey(self, s, k=None, erasures_loc=None):
'''Computes and returns the error locator polynomial (sigma) and the
error evaluator polynomial (omega).
If the erasures locator is specified, we will return an errors-and-erasures locator polynomial and an errors-and-erasures evaluator polynomial.
The parameter s is the syndrome polynomial (syndromes encoded in a
generator function) as returned by _syndromes. Don't be confused with
the other s = (n-k)/2
Notes:
The error polynomial:
E(x) = E_0 + E_1 x + ... + E_(n-1) x^(n-1)
j_1, j_2, ..., j_s are the error positions. (There are at most s
errors)
Error location X_i is defined: X_i = a^(j_i)
that is, the power of a corresponding to the error location
Error magnitude Y_i is defined: E_(j_i)
that is, the coefficient in the error polynomial at position j_i
Error locator polynomial:
sigma(z) = Product( 1 - X_i * z, i=1..s )
roots are the reciprocals of the error locations
( 1/X_1, 1/X_2, ...)
Error evaluator polynomial omega(z) is here computed at the same time as sigma, but it can also be constructed afterwards using the syndrome and sigma (see _find_error_evaluator() method).
'''
# For errors-and-erasures decoding, see: Blahut, Richard E. "Transform techniques for error control codes." IBM Journal of Research and development 23.3 (1979): 299-315. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.600&rep=rep1&type=pdf and also a MatLab implementation here: http://www.mathworks.com/matlabcentral/fileexchange/23567-reed-solomon-errors-and-erasures-decoder/content//RS_E_E_DEC.m
# also see: Blahut, Richard E. "A universal Reed-Solomon decoder." IBM Journal of Research and Development 28.2 (1984): 150-158. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.84.2084&rep=rep1&type=pdf
# or alternatively see the reference book by Blahut: Blahut, Richard E. Theory and practice of error control codes. Addison-Wesley, 1983.
# and another good alternative book with concrete programming examples: Jiang, Yuan. A practical guide to error-control coding using Matlab. Artech House, 2010.
n = self.n
if not k: k = self.k
# Initialize:
if erasures_loc:
sigma = [ Polynomial(erasures_loc.coefficients) ] # copy erasures_loc by creating a new Polynomial
B = [ Polynomial(erasures_loc.coefficients) ]
else:
sigma = [ Polynomial([GF256int(1)]) ] # error locator polynomial. Also called Lambda in other notations.
B = [ Polynomial([GF256int(1)]) ] # this is the error locator support/secondary polynomial, which is a funky way to say that it's just a temporary variable that will help us construct sigma, the error locator polynomial
omega = [ Polynomial([GF256int(1)]) ] # error evaluator polynomial. We don't need to initialize it with erasures_loc, it will still work, because Delta is computed using sigma, which itself is correctly initialized with erasures if needed.
A = [ Polynomial([GF256int(0)]) ] # this is the error evaluator support/secondary polynomial, to help us construct omega
L = [ 0 ] # necessary variable to check bounds (to avoid wrongly eliminating the higher order terms). For more infos, see https://www.cs.duke.edu/courses/spring11/cps296.3/decoding_rs.pdf
M = [ 0 ] # optional variable to check bounds (so that we do not mistakenly overwrite the higher order terms). This is not necessary, it's only an additional safe check. For more infos, see the presentation decoding_rs.pdf by Andrew Brown in the doc folder.
# Polynomial constants:
ONE = Polynomial(z0=GF256int(1))
ZERO = Polynomial(z0=GF256int(0))
Z = Polynomial(z1=GF256int(1)) # used to shift polynomials, simply multiply your poly * Z to shift
s2 = ONE + s
# Iteratively compute the polynomials 2s times. The last ones will be
# correct
for l in xrange(0, n-k):
K = l+1
# Goal for each iteration: Compute sigma[K] and omega[K] such that
# (1 + s)*sigma[l] == omega[l] in mod z^(K)
# For this particular loop iteration, we have sigma[l] and omega[l],
# and are computing sigma[K] and omega[K]
# First find Delta, the non-zero coefficient of z^(K) in
# (1 + s) * sigma[l]
# This delta is valid for l (this iteration) only
Delta = ( s2 * sigma[l] ).get_coefficient(l+1) # Delta is also known as the Discrepancy, and is always a scalar (not a polynomial).
# Make it a polynomial of degree 0, just for ease of computation with polynomials sigma and omega.
Delta = Polynomial(x0=Delta)
# Can now compute sigma[K] and omega[K] from
# sigma[l], omega[l], B[l], A[l], and Delta
sigma.append( sigma[l] - Delta * Z * B[l] )
omega.append( omega[l] - Delta * Z * A[l] )
# Now compute the next B and A
# There are two ways to do this
# This is based on a messy case analysis on the degrees of the four polynomials sigma, omega, A and B in order to minimize the degrees of A and B. For more infos, see https://www.cs.duke.edu/courses/spring10/cps296.3/decoding_rs_scribe.pdf
# In fact it ensures that the degree of the final polynomials aren't too large.
if Delta == ZERO or 2*L[l] > K \
or (2*L[l] == K and M[l] == 0):
# Rule A
B.append( Z * B[l] )
A.append( Z * A[l] )
L.append( L[l] )
M.append( M[l] )
elif (Delta != ZERO and 2*L[l] < K) \
or (2*L[l] == K and M[l] != 0):
# Rule B
B.append( sigma[l] // Delta )
A.append( omega[l] // Delta )
L.append( K - L[l] )
M.append( 1 - M[l] )
else:
raise Exception("Code shouldn't have gotten here")
return sigma[-1], omega[-1]
Polynomial and GF256int are generic implementation of, respectively, polynomials and galois fields over 2^8. These classes are unit tested and they are, normally, bug proof. Same goes for the rest of the encoding/decoding methods for Reed-Solomon such as Forney and Chien search. The full code with a quick test case for the issue I am talking here can be found here: http://codepad.org/l2Qi0y8o
Here's an example output:
Encoded message:
hello world�ꐙ�Ī`>
-------
Erasures decoding:
Erasure locator: 189x^5 + 88x^4 + 222x^3 + 33x^2 + 251x + 1
Syndrome: 149x^9 + 113x^8 + 29x^7 + 231x^6 + 210x^5 + 150x^4 + 192x^3 + 11x^2 + 41x
Sigma: 189x^5 + 88x^4 + 222x^3 + 33x^2 + 251x + 1
Symbols positions that were corrected: [19, 18, 17, 16, 15]
('Decoded message: ', 'hello world', '\xce\xea\x90\x99\x8d\xc4\xaa`>')
Correctly decoded: True
-------
Errors+Erasures decoding for the message with only erasures:
Erasure locator: 189x^5 + 88x^4 + 222x^3 + 33x^2 + 251x + 1
Syndrome: 149x^9 + 113x^8 + 29x^7 + 231x^6 + 210x^5 + 150x^4 + 192x^3 + 11x^2 + 41x
Sigma: 101x^10 + 139x^9 + 5x^8 + 14x^7 + 180x^6 + 148x^5 + 126x^4 + 135x^3 + 68x^2 + 155x + 1
Symbols positions that were corrected: [187, 141, 90, 19, 18, 17, 16, 15]
('Decoded message: ', '\xf4\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00.\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00P\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xe3\xe6\xffO> world', '\xce\xea\x90\x99\x8d\xc4\xaa`>')
Correctly decoded: False
-------
Errors+Erasures decoding for the message with erasures and one error:
Erasure locator: 77x^4 + 96x^3 + 6x^2 + 206x + 1
Syndrome: 49x^9 + 107x^8 + x^7 + 109x^6 + 236x^5 + 15x^4 + 8x^3 + 133x^2 + 243x
Sigma: 38x^9 + 98x^8 + 239x^7 + 85x^6 + 32x^5 + 168x^4 + 92x^3 + 225x^2 + 22x + 1
Symbols positions that were corrected: [19, 18, 17, 16]
('Decoded message: ', "\xda\xe1'\xccA world", '\xce\xea\x90\x99\x8d\xc4\xaa`>')
Correctly decoded: False
Here, the erasure decoding is always correct since it doesn't use BM at all to compute the erasure locator. Normally, the other two test cases should output the same sigma, but they simply don't.
The fact that the problem comes from BM is blatant here when you compare the first two test cases: the syndrome and the erasure locator are the same, but the resulting sigma is totally different (in the second test, BM is used, while in the first test case with erasures only BM is not called).
Thank you very much for any help or any idea on how I could debug this out. Note that your answers can be mathematical or code, but please explain what has gone wrong with my approach.
/EDIT: still didn't find how to correctly implement an errata BM decoder (see my answer below). The bounty is offered to anyone who can fix the issue (or at least guide me to the solution).
/EDIT2: silly me, sorry, I just re-read the schema and found that I missed the change in the assignment L = r - L - erasures_count
... I have updated the code to fix that and re-accepted my answer.
l
is not such a good variable name, due to its resemblance to1
.i
is typically used as a loop counter. – BregerK
to avoid usingl
in the rest of the code. – Barhorst