This is a short tutorial on Linear Codes, using Sage. To follow along with the code, you must either download Sage or use it online. Both options are free and you can find the links here: sagemath.org
Size of input word: k
Size of coded word: n
Number of coded words: 2k
Generator Matrix: [I;P] where I is kxk and P=kxn–k
I : Identity Matrix
P : An Arbitrary Matrix
The Minimum Hamming Distance (MHD) is very important in Code Theory because it is closely related to the code’s ability to correct errors. A code with MHD=d can only correct (d-1)/2 errors.
The MHD depends heavily on P, so deciding on its values is of the utmost importance. A safe strategy to build around is by cycling a string of bits k times.
We cycled the binary word 11100 three times and produced the above matrix.
Choosing all 1s or all 0s is an especially bad practise and will result in the shortest possible MHD. Increasing n generally increases MHD, but one will still have some experimenting to do.
Let’s define two coded words c1 and c2. Their modulo 2 addition is also a coded word. c1 + c2 = c3, where c3 a valid coded word. Because of that, the code will always include the word 0n (n zeros in a row – 000…000). We can see that by adding the same word c twice. c + c = 0.
This is how a coded word is calculated:
d = d1,d2,d3…dk – Input Word
c = c1,c2,c3…cn – Coded Word (output)
The first k bits of the coded word c are the same with the input word d:
c1 = d1
c2 = d2
c3 = d3
ck = dk
The rest bits of the coded word are computed like this:
ck+1 = p11*d1 + p21*d2 + p31*d3 + pk1*dk
ck+2 = p12*d1 + p22*d2 + p32*d3 + pk2*dk
cn = p1,n-k*d1 + p2,n-k*d2 + p3,n-k*d3 + pk,n-k*dk
All multiplications are modulo 2.
If we represent the coded word with the 1xn matrix C, the input word with the 1xk matrix D and our generator matrix with G, we can write the above with the formula:
Linear codes are systematic codes, which means the input data can be found in the coded word. In this case, the input will be at the front of the word (ci = di for i in [i,k]).
Let’s say we want to encode a message of m bits, with m > k. We simply divide the message into parts of k and encode them seperately, joining them all together in the end to form our coded message.
Because of that, it is better if we choose k as a number that divides m perfectly. For example, if m is in ASCII, which is represented by 8-bits (although the first is not used), we can choose k to be 2, 4 or 8 (or even 1).
After you open up a new Sage project, add this code in the editor:
MS = MatrixSpace(GF(2),8,16); G = MS([ [1,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0], [0,1,0,0,0,0,0,0,0,1,0,1,1,1,0,0], [0,0,1,0,0,0,0,0,0,0,1,0,1,1,1,0], [0,0,0,1,0,0,0,0,0,0,0,1,0,1,1,1], [0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,1], [0,0,0,0,0,1,0,0,1,1,0,0,0,1,0,1], [0,0,0,0,0,0,1,0,1,1,1,0,0,0,1,0], [0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,1] ]);
MS is the Matrix Space the Generator Matrix will be in. GF(2) means that the values in the space will be binary (0 or 1). The dimension of the matrices in the space will be 8. That also means the input word will be 8 bits long. As a result, there will be 28 coded words in total. The legnth of a coded word is 16.
G is our generator matrix. It comes from MS (our Matrix Space). We simply give it the values we want. If you pay close attention, you will notice that G is in the form of [I;P] (where I is 8x8 and P=8x16-8). In every line, the first 8 values come from the Identity Matrix, and the last 8 are some arbitrary values. Notice though that we haven’t put just some random values, but we have cycled a 5-bit word into a space of 8 bits. Because the dimension of G is 8, we can only cycle those 5 bits 8 times.
Now it is time to actually create our Linear Code. Add:
And voila. Our Linear Code is C. We can quickly check the code’s minimum distance (MHD) by writing:
This code has a minimum distance of 5. That means it can correct (5-1)/2=2 errors. (NOTE: If MHD is even, then the number of errors the code can correct is x.5. The .5 means that it can correct all errors of size x, and it can correct half of the errors of size x+1, but not any for bigger sizes.)
Let’s say we have a message that we want to encode. We will call this message m, and it will be of length l. Keep in mind that m is in binary and l is the number of bits in the message. If you have a non-binary message you want to encode, you’d have to convert it to binary.
As previously mentioned, we need to divide our word into parts of length k. Assuming we chose k as a number that perfectly divides l, we can divide l into l/k parts, all of which will have a length of k.
Now we go through each part and encode it, joining all the coded parts to form our coded word. The coded word will have a length of n*l/k.
In general, we can write something like this:
l = len(m); c = ; for i in range(l/k): w = C.encode(m[i*k:i*k+k]); c.append(w);
The above code divides m into l/k parts and encodes them one-by-one, adding each separate result into one final coded word. The part of the message from i*k to i*k+k is the part that will be encoded on each loop.
For our example, we replace k with 8 and we have:
l = len(m); c = ; for i in range(l/8): w = C.encode(m[i*8:i*8+8]); c.append(w);
Keep in mind that every time we encode a part, a vector is created. So, c is in fact a list of vectors, where each vector is a coded word on its own. If we join the vectors together one after the other, we will end up with our true coded word in string (or anything we want) from.
To decode a word we do the exact opposite of encoding. We have our list of coded vectors, c, and we need to decode them one-by-one and join them together to form our original message. Let s be the size of c, aka the number of vectors in the list. We will store the decoded message to decodedMessage. Because we want the decoded word as a string, we will also have to do some convertion. After we decode each vector, we get another vector of k bits. We will be converting these bits to string one-by-one and we will add them to decodedMessage. In the end, we have this piece of code:
decodedMessage = ""; for i in range(s): decodedPart = C.decode_to_message(c[i]); for j in range(k): decodedMessage += str(decodedPart[j]);
The temporary variable decodedPart is used for storing the decoded vector. We then need to parse the k bits in decodedPart to string and add them to decodedMessage.
After the above code, decodedMessage will hold our original message in binary form.
MS = MatrixSpace(GF(2),8,16); G = MS([ [1,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0], [0,1,0,0,0,0,0,0,0,1,0,1,1,1,0,0], [0,0,1,0,0,0,0,0,0,0,1,0,1,1,1,0], [0,0,0,1,0,0,0,0,0,0,0,1,0,1,1,1], [0,0,0,0,1,0,0,0,1,0,0,0,1,0,1,1], [0,0,0,0,0,1,0,0,1,1,0,0,0,1,0,1], [0,0,0,0,0,0,1,0,1,1,1,0,0,0,1,0], [0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,1] ]); C=LinearCode(G); C.minimum_distance(); message = "Hello World!"; #Our message to encode m = binascii.hexlify(message); #Turn to hex message = ""; #We will put the message here, converted to binary #Convert message to binary for i in range(0, len(m), 2): binary = bin(int(m[i:i+2], 16))[2:]; message += (8 - len(binary)) * '0' + binary; m = vector(map(int, message));#Turn into vector #Encoding the message l = len(m); c = ; for i in range(l/8): w = C.encode(m[i*8:i*8+8]); c.append(w); #Decoding the message decodedMessage = ""; for i in range(s): decodedPart = C.decode_to_message(c[i]); for j in range(k): decodedMessage += str(decodedPart[j]); decodedMessage.replace(' ','').strip(); #Remove any rogue space n = int(decodedMessage, 2); #Binary to decimal #From binary to string decodedMessage = binascii.unhexlify('%x' % n); #Our message, decoded. It's the same as the original. decodedMessage;