Dakota Nelson* //
Part 4: Resilient Steganography
We’ve covered the basics of hiding data in images. We’ve covered the math. We’ve covered error detection and correction using hamming codes. There have been Venn diagrams, and puppies, and secret messages, and it’s all been very exciting.
Where do we end?
With code, of course! We’ll tie everything together, slap a nice bow on it, write some documentation, and declare our steganography project complete. I left you after part 3 with a bunch of math about error correcting codes, and a promise to put that math to work, and that’s exactly what we’ll do – steganographically hide data which we’ve run through a hamming encoder so that we can correct errors caused by compression (or anything else).
All of the code for this article can be found on Github – I’ll walk you through it below, but you can always go take a look for yourself.
Let’s get started.
When you read part 2, we covered the Python code used to steganographically insert data into images. The final steganography software, in Python, which does this is the same as before (except slightly cleaned up), so I’ll skip over a full explanation of it. As a refresher – the least significant bits of each pixel in the image are basically random, so we can change them to whatever we want and the image will look the same to a human observer. This code just flips the correct bits in order to output an image with our message in it.
What happens, though, if the image is then compressed, or damaged in some way? In the last post, we walked through the math behind hamming codes, an error-correcting code that allows us to fix errors in our message. Here’s the code that puts that math into action:
def encode(msg): """ passed a list of bits (integers, 1 or 0), returns a hamming(8,4)-coded list of bits """ while len(msg) % 4 != 0: # pad the message to length msg.append(0) msg = np.reshape(np.array(msg), (-1, 4)) # create parity bits using transition matrix transition = np.mat('1,0,0,0,0,1,1,1;\ 0,1,0,0,1,0,1,1;\ 0,0,1,0,1,1,0,1;\ 0,0,0,1,1,1,1,0') result = np.dot(msg, transition) # mod 2 the matrix multiplication return np.mod(result, 2)
This is some pretty dense code, so let’s walk through it one piece at a time. First, we add zeros to the end of the message until it’s the proper length so that the matrix multiplication will work out right (the number of bits in the message must be a multiple of 4).
Once the message is the right length, we create an nx4 array of the message’s bits (where n is whatever it has to be to fit the whole message). This array is then multiplied by a transition matrix.
“Matrices,” you say, “where did they come from? We haven’t talked about any stinkin’ matrices.”
Well, astute reader, you caught me. I didn’t mention the matrices, and I’m going to mostly ignore them here, except to say this: remember when we had to count up the number of bits in each circle of the Venn diagram and then create a parity bit for each group based on what the data bits were? That’s what this matrix does. We multiply the message by this hamming code matrix, then mod the result by two (that is, take the remainder of each entry in the resulting array divided by 2) and we have ourselves a hamming-encoded message.
(If you still don’t like me not explaining the matrix multiplication, here’s the mathy version: the left half of the matrix is the identity matrix (preserving our original message), while the right half’s columns are entirely linearly independent from each other such that every generated parity bit is based on 3 data bits with no redundancy in parity. Given 4 bits of data, this matrix outputs 8 bits of “data plus parity,” known to error correcting code people as a codeword.)
Now that we’ve got a hamming-encoded message that will tolerate some errors, we insert it into an image, as usual, exactly how we’ve discussed in previous posts. We then extract it on the other end – again, as usual. The mechanics of actual steganography should be pretty familiar to you by now. (If not, go back and read part 2 for a discussion of image steganography techniques.)
Once we’ve retrieved our message from the image that our steganography algorithm put it into, how do we fix errors? That’s the whole point of this hamming error correction code thing, after all.
It turns out that the answer is more matrices. This next piece of code acts as a hamming code decoder in three parts. We’ll break each down individually.
def syndrome(msg): """ passed a list of hamming(8,4)-encoded bits (integers, 1 or 0), returns an error syndrome for that list """ msg = np.reshape(np.array(msg), (-1, 8)).T
# syndrome generation matrix transition = np.mat('0,1,1,1,1,0,0,0;\ 1,0,1,1,0,1,0,0;\ 1,1,0,1,0,0,1,0;\ 1,1,1,0,0,0,0,1') result = np.dot(transition, msg) # mod 2 the matrix multiplication return np.mod(result, 2)
The first task is to calculate a syndrome for this hamming code. This error syndrome, as it’s called, is basically a record of what’s wrong with the message – much like the syndrome of a disease, it can tell us what’s wrong with the message so we can tell how to apply our error corrections to fix it. The mechanics here are the same as the hamming encoding – get the array in the right shape, multiply with the proper hamming code syndrome matrix, then mod everything by 2.
def correct(msg, syndrome): """ passed a syndrome and a message (as received, presumably with some errors), will use the syndrome to correct the message as best possible """ # the syndrome for any incorrect bit will match the column of the syndrome # generation matrix that corresponds to the incorrect bit; a syndrome of # (1, 1, 0, 1) would indicate that the third bit has been flipped, since it # corresponds to the third column of the matrix # syndrome generation matrix (copy/pasted from above) transition = np.mat('0,1,1,1,1,0,0,0;\ 1,0,1,1,0,1,0,0;\ 1,1,0,1,0,0,1,0;\ 1,1,1,0,0,0,0,1') for synd in range(syndrome.shape): if not np.any(syndrome[:,synd]): # all zeros - no error! continue # otherwise we have an error syndrome for col in range(transition.shape): # not very pythonic iteration, but we need the index if np.array_equal(transition[:,col], syndrome[:,synd]): current_val = msg[synd,col] new_val = (current_val + 1) % 2 msg.itemset((synd,col), new_val) return msg
Once we have a syndrome, we know how to correct the message. This code above matches each syndrome to the bit that needs to be flipped by comparing each error syndrome to the syndrome generation matrix from above. If we find a match, we flip the corresponding bit – if it doesn’t match, we use the continue keyword to skip to the next iteration of the loop.
def decode(msg): r = np.mat('1,0,0,0,0,0,0,0;\ 0,1,0,0,0,0,0,0;\ 0,0,1,0,0,0,0,0;\ 0,0,0,1,0,0,0,0') res = np.dot(r, msg.T) # convert to a regular python list, which is a pain return res.T.reshape((1,-1)).tolist()
Now that we’ve corrected our message, we can decode it! Using the decoding matrix above, we do the same ol’ matrix multiplication in order to get our final message out.
So, what’s the outcome of all this? Continuing our image steganography example from part 3, this is the final result:
Here’s our original message, with text on the right and the corresponding hex values on the left:
[[email protected] hamming-stego]$ xxd -s 7 -l 45 output.txt 00000007: 7765 2772 6520 676f 696e 6720 746f 2068 we're going to h 00000017: 6964 6520 7468 6973 206d 6573 7361 6765 ide this message 00000027: 2069 6e20 616e 2069 6d61 6765 21 in an image!
This is the data after some errors have been inserted into it and it’s been padding a little bit:
[[email protected] hamming-stego]$ xxd -s 70 -l 100 output.txt 00000046: 7765 2772 6520 656b 696e 6720 746f 2068 we're eking to h 00000056: 6964 6520 7468 6973 206d 6573 7361 6765 ide this message 00000066: 2069 6e20 616e 2069 6d61 6765 0100 0400 in an image.... 00000076: 0000 0000 0000 0000 0002 0000 0000 0000 ................ 00000086: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 00000096: 0000 0800 0000 0000 0000 1000 0000 0000 ................ 000000a6: 0000 0000 ....
And this is our final, corrected output:
[[email protected] hamming-stego]$ xxd -s 54765 -l 100 output.txt 0000d5ed: 7765 2772 6520 676f 696e 6720 746f 2068 we're going to h 0000d5fd: 6964 6520 7468 6973 206d 6573 7361 6765 ide this message 0000d60d: 2069 6e20 616e 2069 6d61 6765 2100 0000 in an image!... 0000d61d: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000d62d: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000d63d: 0000 0000 0000 0000 0000 0000 0000 0000 ................ 0000d64d: 0000 0000 ....
That’s it – we corrected a message using a hamming code! Whereas before we had to repeat each character 9 times, this hamming code fits 4 bytes of data into each 8 byte code word – a 1:2 ratio of data to total instead of our 1:9 from before. A pretty sizable improvement!
Note, however, that the error correction capabilities of a hamming code are only so good. The “damaged” message up above is still pretty readable to a human – much more than that, and errors start to sneak through to the end. Maybe that’s okay… but maybe it’s not. As always, there are tradeoffs.
Want to see the full image steganography example in action? Visit https://github.com/DakotaNelson/hamming-stego and check it out! That link heads straight to Github, where you’ll find a free steganography tool in Python, usable on Linux, Mac, Windows, or anywhere else you can run Python.
If you’d like a single PDF of this entire four part series, you can head over to the Striker Security Blog to download one.
*Dakota runs Striker Security – you can find more of his writing at https://strikersecurity.com/blog