Fork off: Three ways to deal with forking processes

Have you ever tested a Linux application that forks into multiple processes? Isn’t it a pain? Whether you’re debugging, trying to see a process crash, or trying to write an exploit, it can be super duper annoying!

In a few days, I’m giving a talk at NorthSec in Montreal. I asked some co-workers to review my slides, and they commented that I have some neat techniques to deal with forking, so I thought I’d share a couple!

Spoiler alert: The last one is the best, so you can just skip to that. :)

Targets

I wrote two simple apps, one that forks and one that doesn’t, and put them in a GitHub repo along with everything else I use in this post.

To check out the project and follow along, go ahead and clone the repo:

$ git clone https://github.com/iagox86/forktest.git
Cloning into 'forktest'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 8 (delta 1), reused 7 (delta 0), pack-reused 0
Receiving objects: 100% (8/8), done.
Resolving deltas: 100% (1/1), done.

$ cd forktest

I’ve included built versions of all the files, but they aren’t built to be portable so they might not work cleanly. If you need to build them yourself, I’ve included a basic Makefile:

$ make clean && make
rm -f *.o forkapp noforkapp onlyyoucanpreventforking.so patch
gcc -g -Wall -fno-stack-protector -o forkapp forkapp.c
gcc -g -Wall -fno-stack-protector -o noforkapp noforkapp.c
gcc -shared -fPIC -o onlyyoucanpreventforking.so onlyyoucanpreventforking.c
nasm -o patch patch.asm

This should work more or less the same on any 64-bit Intel Linux system.

The problem

When you run either test app, it copies the first argument into a string (unsafely) then prints it to the screen:

$ ./forkapp test
You entered: test

Let’s say you want to use strace to view system calls. In a process that prints a string, you’d expect to see a call to write or something similar, which is the system call that writes to, say, stdout (your terminal). Here’s what it looks like without forking:

$ strace ./noforkapp test
execve("./noforkapp", ["./noforkapp", "test"], 0x7fffd7acc8f8 /* 72 vars */) = 0

[...]

write(1, "You entered: test\n", 18You entered: test
)     = 18
exit_group(0)                           = ?
+++ exited with 0 +++

But once you add forking into the equation, you no longer see the write syscall in strace by default:

$ strace ./forkapp test
execve("./forkapp", ["./forkapp", "test"], 0x7ffd1b7f02c8 /* 52 vars */) = 0

[...]

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f82376b4a10) = 133314
wait4(133314, You entered: test
NULL, 0, NULL)            = 133314
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=133314, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0)                           = ?
+++ exited with 0 +++

We see the string, but that’s all!

Likewise, if we overflow the stack, we should get some sorta feedback like this:

$ ./noforkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
fish: Job 1, './noforkapp AAAAAAAAAAAAAAAAAAA…' terminated by signal SIGSEGV (Address boundary error)

But when it forks, we get nothing:

$ ./forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$
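If you’re wondering why the shell stays quiet: it only ever sees the parent’s exit status, and the parent exits normally. The child’s SIGSEGV is only visible to whoever waits on the child. Here’s a minimal sketch of that, assuming forkapp’s parent simply waits and exits (which matches the wait4 and exit_group(0) in the strace output above). The helper names are made up for illustration and are not from the repo:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run child_fn in a forked child and return the signal
 * that killed it, 0 if it exited normally, or -1 if fork/waitpid failed. */
int child_death_signal(void (*child_fn)(void)) {
  pid_t pid = fork();
  if (pid < 0)
    return -1;

  if (pid == 0) {
    /* Child: do the risky work, then leave without running parent cleanup. */
    child_fn();
    _exit(0);
  }

  /* Parent: the wait status is the only place the crash shows up. */
  int status = 0;
  if (waitpid(pid, &status, 0) < 0)
    return -1;

  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Stand-in for the stack overflow: die from SIGSEGV on purpose. */
void crash_like_forkapp(void) {
  signal(SIGSEGV, SIG_DFL); /* make sure the default action (terminate) applies */
  raise(SIGSEGV);
}

/* Stand-in for a child that finishes normally. */
void exit_cleanly(void) {
}
```

Calling child_death_signal(crash_like_forkapp) from a parent reports the signal, while the parent itself can still return 0 to the shell, which is exactly the silence we saw.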

If you know to look in dmesg or journalctl, they should catch the crash, but it’s so easy to forget that:

$ dmesg | tail -n1
[1034540.509003] traps: forkapp[133518] general protection fault ip:40120f sp:7ffd5e41b668 error:0 in forkapp[401000+1000]

$ journalctl | grep forkapp

[...]
May 12 13:06:50 ronlab kernel: traps: forkapp[140097] general protection fault ip:40120f sp:7ffd5e41b668 error:0 in forkapp[401000+1000]
[...]

So forking is a pain when reverse engineering, fuzzing, exploit testing, and basically everything else. It’s widely believed to have been a mistake (at least by exploit devs).

Technique 1: Explaining forking to your tools

The most common way to handle this, and also what I’d call the worst way (okay, upon review, Technique 2 is worse), is by configuring your tools correctly. The first problem with this is that it requires you to RTFM, which is something I’m not a fan of. The second problem is that it’s easy to forget, and then you miss stuff.

strace has a -f or --follow-forks option, which will follow the child processes:

$ strace -f ./forkapp test
execve("./forkapp", ["./forkapp", "test"], 0x7ffd203a57c0 /* 72 vars */) = 0

[...]

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 133742 attached
, child_tidptr=0x7fa1dfd5ca10) = 133742
[pid 133742] set_robust_list(0x7fa1dfd5ca20, 24) = 0
[pid 133741] wait4(133742,  <unfinished ...>

[...]

[pid 133742] write(1, "You entered: test\n", 18You entered: test
) = 18
[pid 133742] exit_group(0)              = ?
[pid 133742] +++ exited with 0 +++
<... wait4 resumed>NULL, 0, NULL)       = 133742
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=133742, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0)                           = ?
+++ exited with 0 +++

When the -f option is specified, you can once again see the write syscall! If you’re looking to see it crash, you can run the process in gdb and set the follow-fork-mode option to child, which tells gdb to follow the child after a fork and detach from the parent:

$ gdb -q ./forkapp
Reading symbols from ./forkapp...
(gdb) set follow-fork-mode child
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/ron/tmp/forktest/forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Attaching after Thread 0x7ffff7dcb740 (LWP 134979) fork to child process 135430]
[New inferior 2 (process 135430)]
[Detaching after fork from parent process 134979]
[Inferior 1 (process 134979) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Thread 2.1 "forkapp" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7dcb740 (LWP 135430)]
0x000000000040120f in main (argc=2, argv=0x7fffffffda98) at forkapp.c:27
27      }

That’s great, but also a pain!

Surely there’s a better way!

Technique 2: Globally killing fork with LD_PRELOAD

I wanted to talk about using LD_PRELOAD for two reasons: first, it’s a neat technique that applies to a whole bunch of other stuff; second, I wanted to have three techniques for a better blog title!

With the LD_PRELOAD environment variable, you can override functions from libraries with your own implementations! I wrote some CTF challenges last year called loadit, which use this technique; you can see writeups here.

To implement fork yourself, you write a small shared library that defines your own version of fork that does what you want - basically, nothing. It should return 0, which tells the process that fork worked and that it is the child process. The fun part is, since we didn’t actually fork, there is no parent process!
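Why does returning 0 work? It comes down to how callers usually interpret fork’s return value. Here’s a tiny sketch of that convention; role_after_fork is a hypothetical helper I made up for illustration, and forkapp’s own check may differ in detail:

```c
#include <sys/types.h>

/* Hypothetical helper: classify a fork() return value the way a typical
 * caller's if/else chain does. */
const char *role_after_fork(pid_t result) {
  if (result < 0)
    return "error";  /* fork failed, no child was created */
  if (result == 0)
    return "child";  /* we are the new process */
  return "parent";   /* result is the child's pid */
}
```

A fake fork that always returns 0 sends every caller down the child branch, so the interesting code runs in the one and only process.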

Here’s an empty fork implementation (which you can also grab from the repo):

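#include <stdio.h>
#include <unistd.h>

// Loaded via LD_PRELOAD, this definition is found before libc's fork
pid_t fork(void) {
  printf("The process tried to fork!\n");
  // 0 is what a real fork returns in the child, so the caller takes its child path
  return 0;
}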

Here’s the command to compile it:

$ gcc -shared -fPIC -o onlyyoucanpreventforking.so onlyyoucanpreventforking.c

Then set the environment variable, and you can freely test the app:

$ export LD_PRELOAD=./onlyyoucanpreventforking.so

$ ./forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The process tried to fork!
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)

That works until, of course, your tools stop working. I’m testing this as I write, and both strace and gdb seem to depend on forking, so an exported LD_PRELOAD breaks them too. That means you have to set LD_PRELOAD for the target process, but not for the tool that launches it. Sometimes that’s easy, sometimes not.

Here’s how you can use it with strace:

$ strace -E LD_PRELOAD=./onlyyoucanpreventforking.so ./forkapp test

[...]

write(1, "The process tried to fork!\n", 27The process tried to fork!
) = 27
write(1, "You entered: test\n", 18You entered: test
)     = 18
exit_group(0)                           = ?
+++ exited with 0 +++

And gdb:

$ gdb -q ./forkapp
Reading symbols from ./forkapp...

(gdb) set environment LD_PRELOAD=./onlyyoucanpreventforking.so

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Starting program: /home/ron/tmp/forktest/forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
The process tried to fork!
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x000000000040120f in main (argc=2, argv=0x7fffffffd8a8) at forkapp.c:27
27      }

Now that I am actually messing with this, it’s kind of a terrible technique and you probably shouldn’t use it. Upon review, I’m calling this the worst of the three techniques (but I refuse to re-order the blog!)

It’s interesting, though!

Technique 3: Kill it with fire (edit the binary)

My favourite technique is “burn it down”. I edit the binary’s hex code, and literally remove the call to fork(). That sorta thing will fail on Windows, because of how relocations work (see my writeup for the CTF challenge “loca”), but works great on Linux! Note that if the parent process actually does something, this will fail in some spectacular way. Use at your own peril. :)

Disassembling

First, you want to disassemble the binary to find where fork() is called. You can use whatever disassembler you’re comfortable with; IDA or Ghidra are common choices, but I’ll use objdump since it’s always handy on Linux.

When I run objdump, I pass two flags: -M intel to emit the more familiar Intel syntax (default is AT&T, which you don’t see much anymore); and -d to disassemble the file.

Here’s what the output looks like (note that it’s quite long, so be prepared to pipe into grep or less, or redirect into a file you can search):

$ objdump -M intel -d ./forkapp | grep -C3 'fork'

[...]

  4011a6:       e8 a5 fe ff ff          call   401050 <fprintf@plt>
  4011ab:       bf 01 00 00 00          mov    edi,0x1
  4011b0:       e8 bb fe ff ff          call   401070 <exit@plt>
  4011b5:       e8 c6 fe ff ff          call   401080 <fork@plt>
  4011ba:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  4011bd:       83 7d fc 00             cmp    DWORD PTR [rbp-0x4],0x0
  4011c1:       75 32                   jne    4011f5 <main+0x7f>

  [...]

Note that there might be more than one call to fork - if that’s the case, you can try replacing them one at a time, or just replace all of them to see what happens!

The left-most column is the virtual address where the code will load. It isn’t the in-file address, and as far as I can tell there’s no good way to get the in-file address with objdump (although --file-offsets looked promising!). In this binary it shares the last 4 hex digits with the in-file address, which can help disambiguate things (the last 3 digits should match in general, since segments are page-aligned).
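If you want the exact in-file address rather than eyeballing digits, the ELF program headers (readelf -l prints them) carry the mapping. Here’s a rough sketch of the arithmetic, assuming a 64-bit ELF and glibc’s elf.h. The function name is my own, and the segment values in the note below are assumptions that happen to be consistent with the dmesg line and hex dump in this post, not something I read out of the real headers:

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Translate a virtual address into a file offset using the PT_LOAD program
 * headers. Returns -1 if no loadable segment's file-backed part covers it. */
int64_t vaddr_to_file_offset(const Elf64_Phdr *phdrs, size_t count,
                             uint64_t vaddr) {
  for (size_t i = 0; i < count; i++) {
    const Elf64_Phdr *ph = &phdrs[i];
    if (ph->p_type != PT_LOAD)
      continue;
    /* Only the first p_filesz bytes of a segment exist in the file. */
    if (vaddr >= ph->p_vaddr && vaddr - ph->p_vaddr < ph->p_filesz)
      return (int64_t)(ph->p_offset + (vaddr - ph->p_vaddr));
  }
  return -1;
}
```

With a code segment at p_vaddr 0x401000 and p_offset 0x1000 (assumed), address 0x4011b5 works out to 0x1000 + 0x1b5 = 0x11b5 in the file.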

The second column is the machine code - it’s important to note how many bytes the call to fork takes up, and which bytes they are so we can recognize them later. A direct call like this one is 5 bytes (e8 followed by a 4-byte relative offset), but the offset bytes will change in each app; in our case, it’s e8 c6 fe ff ff. You CANNOT add or subtract bytes without a world of problems, so you’re going to need to replace those five bytes with something else that’s exactly 5 bytes. That’s super important!
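The reason the bytes differ from app to app is that the four bytes after e8 are a signed, little-endian offset measured from the end of the call instruction, not an absolute address. Here’s a small sketch that decodes one; near_call_target is a name I made up, and it only handles this one call encoding:

```c
#include <stdint.h>

/* Decode a 5-byte near call (opcode e8 + little-endian rel32) located at
 * insn_addr. Returns the call target, or 0 if the opcode is not e8. */
uint64_t near_call_target(uint64_t insn_addr, const uint8_t insn[5]) {
  if (insn[0] != 0xe8)
    return 0;

  uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                 (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
  int32_t rel = (int32_t)raw; /* the displacement is signed */

  /* The displacement is relative to the address of the next instruction. */
  return insn_addr + 5 + (uint64_t)(int64_t)rel;
}
```

Feeding it e8 c6 fe ff ff at address 0x4011b5 gives 0x4011ba - 0x13a = 0x401080, which is the fork@plt address objdump printed.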

Building a patch

Now we know we need to replace 5 bytes, and what the original bytes are, but what do we replace them with?

There are lots of options, but let’s use nasm to create the simplest patch we can. Here’s some 64-bit assembly code that simulates a function that does nothing but return 0 (note that since it’s never actually called, we don’t actually have to return):

bits 64

mov rax, 0

We can assemble that with nasm, then use hexdump to check what it becomes:

$ nasm -o patch patch.asm
$ hexdump -C patch
00000000  b8 00 00 00 00                                    |.....|
00000005

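It’s handy that the naive solution is already 5 bytes! (Judging by the bytes, nasm assembled mov rax, 0 as the shorter mov eax, 0, which is equivalent because writing eax zeroes the upper half of rax.) If you need to take up more space, you can add nop (which is one byte - 90) as many times as you want, before or after the instruction. If you need to take up less space, you need to get creative and find shorter ways to do things. Replacing mov rax, 0 with xor rax, rax (3 bytes) or xor eax, eax (2 bytes) is one such optimization.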
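To make the padding idea concrete, here’s a sketch that fits a shorter instruction into the 5-byte slot by filling the rest with nop. The helper and constant names are my own invention; the byte encodings are the standard ones for these two instructions:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_SIZE 5 /* size of the e8 + rel32 call being replaced */

/* Copy code into out and fill the rest of the 5-byte slot with nop (0x90).
 * Returns 0 on success, -1 if the code is too long to fit. */
int pad_patch(uint8_t out[CALL_SIZE], const uint8_t *code, size_t len) {
  if (len > CALL_SIZE)
    return -1;
  memcpy(out, code, len);
  memset(out + len, 0x90, CALL_SIZE - len);
  return 0;
}

/* Two ways to put 0 in eax (writing eax also clears the top half of rax). */
const uint8_t MOV_EAX_0[] = {0xb8, 0x00, 0x00, 0x00, 0x00}; /* mov eax, 0 */
const uint8_t XOR_EAX_EAX[] = {0x31, 0xc0};                 /* xor eax, eax */
```

Padding XOR_EAX_EAX this way yields 31 c0 90 90 90, which does the same job as b8 00 00 00 00 here. One difference worth knowing: xor changes the flags and mov doesn’t, which is harmless in forkapp because the very next comparison sets them again.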

Insert the patch

We’re going to literally change the binary to replace call fork with our patch, using a hex editor! You can use whatever hex editor you like (I often use xvi32, which is super old and janky). For demo purposes, I’ll use xxd to convert the binary to hex, and xxd -r to convert back.

To convert forkapp to hex using xxd, run xxd and redirect the binary into it on stdin. I use -g1 since that format is slightly more familiar to me:

$ xxd -g1 < forkapp > forkapp.hex

Then I open the file in whatever text editor I like, and find the 5 bytes that we noted earlier - e8 c6 fe ff ff. Hopefully they only appear once; if they appear multiple times, look for an offset in the file that looks similar to the address in objdump. Here’s what it looks like in forkapp.hex (the bytes we want start at offset 0x11b5, on the 000011b0 row):

000011a0: c7 b8 00 00 00 00 e8 a5 fe ff ff bf 01 00 00 00  ................
000011b0: e8 bb fe ff ff e8 c6 fe ff ff 89 45 fc 83 7d fc  ...........E..}.
000011c0: 00 75 32 48 8b 45 d0 48 83 c0 08 48 8b 10 48 8d  .u2H.E.H...H..H.

Replace those bytes with our patch - b8 00 00 00 00:

000011a0: c7 b8 00 00 00 00 e8 a5 fe ff ff bf 01 00 00 00  ................
000011b0: e8 bb fe ff ff b8 00 00 00 00 89 45 fc 83 7d fc  ...........E..}.
000011c0: 00 75 32 48 8b 45 d0 48 83 c0 08 48 8b 10 48 8d  .u2H.E.H...H..H.

Then use xxd -r to convert back to binary:

$ xxd -r < forkapp.hex > forkapp.patched
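If you’d rather not hand-edit a hex dump, the same replacement can be scripted. This is a minimal sketch, not something from the repo: patch_file is a hypothetical helper that overwrites bytes in place, and it refuses to touch the file unless the bytes at that offset are the ones you expect:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Overwrite len bytes at offset in path with patch, but only if the bytes
 * currently there match expect. Returns 0 on success, -1 on any failure. */
int patch_file(const char *path, long offset, const uint8_t *expect,
               const uint8_t *patch, size_t len) {
  uint8_t current[16];
  if (len == 0 || len > sizeof current)
    return -1;

  FILE *f = fopen(path, "r+b");
  if (!f)
    return -1;

  /* Read, verify, seek back, then write; any failed step stops the chain. */
  int ok = fseek(f, offset, SEEK_SET) == 0 &&
           fread(current, 1, len, f) == len &&
           memcmp(current, expect, len) == 0 &&
           fseek(f, offset, SEEK_SET) == 0 &&
           fwrite(patch, 1, len, f) == len;

  if (fclose(f) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```

For this binary you’d copy forkapp to forkapp.patched first, then call patch_file on the copy with offset 0x11b5, the expected bytes e8 c6 fe ff ff, and the patch b8 00 00 00 00. Running it a second time fails on purpose, since the expected bytes are no longer there.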

Optionally, disassemble again to ensure it worked:

$ objdump  -M intel -d forkapp.patched | grep -C2 '4011b5:'
  4011ab:       bf 01 00 00 00          mov    edi,0x1
  4011b0:       e8 bb fe ff ff          call   401070 <exit@plt>
  4011b5:       b8 00 00 00 00          mov    eax,0x0
  4011ba:       89 45 fc                mov    DWORD PTR [rbp-0x4],eax
  4011bd:       83 7d fc 00             cmp    DWORD PTR [rbp-0x4],0x0

Go go go!

Then make the newly patched binary executable with chmod +x, and do your testing:

$ chmod +x forkapp.patched 

$ ./forkapp.patched AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
fish: Job 1, './forkapp.patched AAAAAAAAAAAAA…' terminated by signal SIGSEGV (Address boundary error)

$ strace ./forkapp.patched test

[...]

write(1, "You entered: test\n", 18You entered: test
)     = 18
exit_group(0)                           = ?

$ gdb -q ./forkapp.patched 

(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

[...]

You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x000000000040120f in main (argc=2, argv=0x7fffffffda88) at forkapp.c:27
27      }

Pretty much everything will work as if there was never a fork!