Have you ever tested a Linux application that forks into multiple processes? Isn’t it a pain? Whether you’re debugging, trying to see a process crash, or trying to write an exploit, it can be super duper annoying!
In a few days, I’m giving a talk at NorthSec in Montreal. I asked some co-workers to review my slides, and they commented that I have some neat techniques to deal with forking, so I thought I’d share a couple!
Spoiler alert: The last one is the best, so you can just skip to that. :)
Targets
I wrote two simple apps, one that forks and one that doesn’t. I’ll hopefully remember to edit in a GitHub repo for them later - and did! You can grab them here! I included everything else I use for this blog, as well.
To check out the project and follow along, go ahead and clone the repo:
$ git clone https://github.com/iagox86/forktest.git
Cloning into 'forktest'...
remote: Enumerating objects: 8, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 8 (delta 1), reused 7 (delta 0), pack-reused 0
Receiving objects: 100% (8/8), done.
Resolving deltas: 100% (1/1), done.
$ cd forktest
I’ve included built versions of all the files, but they aren’t built to be portable so they might not work cleanly. If you need to build them yourself, I’ve included a basic Makefile:
$ make clean && make
rm -f *.o forkapp noforkapp onlyyoucanpreventforking.so patch
gcc -g -Wall -fno-stack-protector -o forkapp forkapp.c
gcc -g -Wall -fno-stack-protector -o noforkapp noforkapp.c
gcc -shared -fPIC -o onlyyoucanpreventforking.so onlyyoucanpreventforking.c
nasm -o patch patch.asm
This should work more or less the same on any 64-bit Intel Linux system.
The problem
When you run either test app, it copies the first argument into a string (unsafely) then prints it to the screen:
$ ./forkapp test
You entered: test
Let’s say you want to use strace to view system calls. In a process that prints a string, you’d expect to see a call to write or something similar, which is the system call that writes to, say, stdout (your terminal). Here’s what it looks like without forking:
$ strace ./noforkapp test
execve("./noforkapp", ["./noforkapp", "test"], 0x7fffd7acc8f8 /* 72 vars */) = 0
[...]
write(1, "You entered: test\n", 18You entered: test
) = 18
exit_group(0) = ?
+++ exited with 0 +++
But once you add forking into the equation, you no longer see the write syscall in strace by default:
$ strace ./forkapp test
execve("./forkapp", ["./forkapp", "test"], 0x7ffd1b7f02c8 /* 52 vars */) = 0
[...]
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f82376b4a10) = 133314
wait4(133314, You entered: test
NULL, 0, NULL) = 133314
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=133314, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0) = ?
+++ exited with 0 +++
We see the string, but that’s all!
Likewise, if we overflow the stack, we should get some sorta feedback like this:
$ ./noforkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
fish: Job 1, './noforkapp AAAAAAAAAAAAAAAAAAA…' terminated by signal SIGSEGV (Address boundary error)
But when it forks, we get nothing:
$ ./forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$
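The shell stays quiet because it only sees the parent's exit status, and the parent exits cleanly after waiting on the child. Here's a minimal sketch of how a parent could surface what happened to its child, using the standard waitpid status macros. The `child_fate` helper and its `crash` parameter are hypothetical names for illustration, not code from the repo, and the child fakes the overflow by raising SIGSEGV directly:

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that optionally crashes, then report
 * how it ended. Returns the terminating signal number, 0 for a normal
 * exit, or -1 if fork/waitpid failed. */
int child_fate(int crash) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: stand in for the overflow by raising SIGSEGV directly. */
        if (crash)
            raise(SIGSEGV);
        _exit(0);
    }

    /* Parent: this status is the only place the crash is visible. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

A parent that throws this status away and exits normally gives the shell nothing to report, which matches the silent run above.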
If you know to look in dmesg or journalctl, they should catch the crash, but it’s so easy to forget that:
$ dmesg | tail -n1
[1034540.509003] traps: forkapp[133518] general protection fault ip:40120f sp:7ffd5e41b668 error:0 in forkapp[401000+1000]
$ journalctl | grep forkapp
[...]
May 12 13:06:50 ronlab kernel: traps: forkapp[140097] general protection fault ip:40120f sp:7ffd5e41b668 error:0 in forkapp[401000+1000]
[...]
So basically, forking is a pain when reverse engineering, fuzzing, exploit testing, and pretty much everything else. It’s widely believed to have been a mistake (at least by exploit devs).
Technique 1: Explaining forking to your tools
The most common way to handle this, and also what I’d call the worst way (okay, upon review, Technique 2 is worse), is by configuring your tools correctly. The first problem with this is that it requires you to RTFM, which is something I’m not a fan of. The second problem is that it’s easy to forget, and then you miss stuff.
strace has a -f or --follow-forks option, which will follow the child processes:
$ strace -f ./forkapp test
execve("./forkapp", ["./forkapp", "test"], 0x7ffd203a57c0 /* 72 vars */) = 0
[...]
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLDstrace: Process 133742 attached
, child_tidptr=0x7fa1dfd5ca10) = 133742
[pid 133742] set_robust_list(0x7fa1dfd5ca20, 24) = 0
[pid 133741] wait4(133742, <unfinished ...>
[...]
[pid 133742] write(1, "You entered: test\n", 18You entered: test
) = 18
[pid 133742] exit_group(0) = ?
[pid 133742] +++ exited with 0 +++
<... wait4 resumed>NULL, 0, NULL) = 133742
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=133742, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
exit_group(0) = ?
+++ exited with 0 +++
When the -f option is specified, you can once again see the write syscall!
If you’re looking to see it crash, you can run the process in gdb and set the follow-fork-mode option to child, which tells gdb to attach to the first child process spawned:
$ gdb -q ./forkapp
Reading symbols from ./forkapp...
(gdb) set follow-fork-mode child
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/ron/tmp/forktest/forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Attaching after Thread 0x7ffff7dcb740 (LWP 134979) fork to child process 135430]
[New inferior 2 (process 135430)]
[Detaching after fork from parent process 134979]
[Inferior 1 (process 134979) detached]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Thread 2.1 "forkapp" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7dcb740 (LWP 135430)]
0x000000000040120f in main (argc=2, argv=0x7fffffffda98) at forkapp.c:27
27 }
That’s great, but also a pain!
Surely there’s a better way!
Technique 2: Globally killing fork with LD_PRELOAD
I wanted to talk about using LD_PRELOAD for two reasons: first, it’s a neat technique that applies to a whole bunch of other stuff; second, I wanted to have three techniques for a better blog title!
With the LD_PRELOAD environment variable, you can override functions from libraries with your own implementations! I wrote some CTF challenges last year called loadit, which use this technique; you can see writeups here.
To implement fork yourself, you create your own shared library that defines your own version of fork that does what you want - basically, nothing. It should return 0, which tells the process that fork worked and the process is the child process. The fun part is, since we didn’t actually fork, there is no parent process!
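To see why returning 0 is enough, it helps to recall how callers conventionally branch on fork's return value. A tiny sketch (the `fork_role` helper is a made-up name for illustration, not something from the repo or libc):

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical helper: classify a fork() return value the way callers
 * conventionally do. */
const char *fork_role(pid_t ret) {
    if (ret < 0)
        return "error";   /* fork failed; no child was created */
    if (ret == 0)
        return "child";   /* what a fake fork that returns 0 reports */
    return "parent";      /* ret is the new child's PID */
}
```

Since the fake fork only ever returns 0, the caller always lands in the "child" branch and the parent-side code (typically a wait) never runs.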
Here’s an empty fork implementation (which you can also grab from the repo):
#include <stdio.h>
#include <unistd.h>
/* Fake fork(): never creates a process; returning 0 sends the caller down the child path. */
pid_t fork(void) {
    printf("The process tried to fork!\n");
    return 0;
}
Here’s the command to compile it:
$ gcc -shared -fPIC -o onlyyoucanpreventforking.so onlyyoucanpreventforking.c
Then set the environment variable, and you can freely test the app:
$ export LD_PRELOAD=./onlyyoucanpreventforking.so
$ ./forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The process tried to fork!
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
That works until, of course, your tools stop working. I’m testing this as I write, and both strace and gdb seem to depend on forking. That means you have to specify the LD_PRELOAD environment variable for the child, but not the parent. Sometimes that’s easy, sometimes not.
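If you're writing your own launcher or harness, one way to get "child only" behaviour is to set the variable after forking and before exec. This is a rough sketch under that assumption; `run_with_env` is a hypothetical helper name, not part of the repo or of any tool mentioned here:

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with name=value added to the child's
 * environment only. Returns the child's exit code, or -1 on failure or
 * abnormal exit. */
int run_with_env(const char *name, const char *value, char *const argv[]) {
    assert(name != NULL && value != NULL && argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the variable is set after the fork, so the launching
         * process never sees it in its own environment. */
        if (setenv(name, value, 1) != 0)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it with "LD_PRELOAD" and "./onlyyoucanpreventforking.so" would preload the library into the target only, while the launcher keeps its real fork.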
Here’s how you can use it with strace:
$ strace -E LD_PRELOAD=./onlyyoucanpreventforking.so ./forkapp test
[...]
write(1, "The process tried to fork!\n", 27The process tried to fork!
) = 27
write(1, "You entered: test\n", 18You entered: test
) = 18
exit_group(0) = ?
+++ exited with 0 +++
And gdb:
$ gdb -q ./forkapp
Reading symbols from ./forkapp...
(gdb) set environment LD_PRELOAD=./onlyyoucanpreventforking.so
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Starting program: /home/ron/tmp/forktest/forkapp AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.fedoraproject.org/>
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
The process tried to fork!
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x000000000040120f in main (argc=2, argv=0x7fffffffd8a8) at forkapp.c:27
27 }
Now that I am actually messing with this, it’s kind of a terrible technique and you probably shouldn’t use it. Upon review, I’m calling this the worst of the three techniques (but I refuse to re-order the blog!)
It’s interesting, though!
Technique 3: Kill it with fire (edit the binary)
My favourite technique is “burn it down”. I edit the binary’s hex code, and literally remove the call to fork(). That sorta thing will fail on Windows, because of how relocations work (see my writeup for the CTF challenge “loca”), but works great on Linux! Note that if the parent process actually does something, this will fail in some spectacular way. Use at your own peril. :)
Disassembling
First, you want to disassemble the binary to find where fork() is called. You can use whatever disassembler you’re comfortable with; IDA or Ghidra are common choices, but I’ll use objdump since it’s always handy on Linux.
When I run objdump, I pass two flags: -M intel to emit the more familiar Intel syntax (default is AT&T, which you don’t see much anymore); and -d to disassemble the file.
Here’s what the output looks like (note that it’s quite long, so be prepared to pipe into grep or less, or redirect into a file you can search):
$ objdump -M intel -d ./forkapp | grep -C3 'fork'
[...]
4011a6: e8 a5 fe ff ff call 401050 <fprintf@plt>
4011ab: bf 01 00 00 00 mov edi,0x1
4011b0: e8 bb fe ff ff call 401070 <exit@plt>
4011b5: e8 c6 fe ff ff call 401080 <fork@plt>
4011ba: 89 45 fc mov DWORD PTR [rbp-0x4],eax
4011bd: 83 7d fc 00 cmp DWORD PTR [rbp-0x4],0x0
4011c1: 75 32 jne 4011f5 <main+0x7f>
[...]
Note that there might be more than one call to fork - if that’s the case, you can try replacing them one at a time, or just replace all of them to see what happens!
The left-most column is the virtual address where the code will load - the actual address doesn’t matter, since it’s not the in-file address, but as far as I can tell there’s no good way to get the in-file address with objdump (although --file-offsets looked promising!). In this binary it does, however, share the last 4 hex digits with the in-file address (in general only the offset within a page, the last 3 hex digits, is guaranteed to match), which can help disambiguate things.
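objdump aside, the mapping itself is simple arithmetic over the ELF program headers. Here's a rough sketch, assuming a 64-bit ELF and the structures from `<elf.h>`. The name `vaddr_to_file_offset` is made up, and the caller is assumed to have already read the program header table out of the file:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: translate a virtual address into a file offset
 * using the PT_LOAD program headers. Returns 0 on success, -1 if no
 * segment's file-backed range contains vaddr. */
int vaddr_to_file_offset(const Elf64_Phdr *phdrs, size_t count,
                         uint64_t vaddr, uint64_t *offset_out) {
    assert(phdrs != NULL && offset_out != NULL);
    for (size_t i = 0; i < count; i++) {
        const Elf64_Phdr *ph = &phdrs[i];
        if (ph->p_type != PT_LOAD)
            continue;
        if (vaddr >= ph->p_vaddr && vaddr - ph->p_vaddr < ph->p_filesz) {
            /* Same distance into the segment, measured in the file. */
            *offset_out = vaddr - ph->p_vaddr + ph->p_offset;
            return 0;
        }
    }
    return -1;
}
```

With a text segment loaded at 0x401000 from file offset 0x1000 (plausible given the dmesg line earlier, but an assumption; `readelf -l` would show the real values), 0x4011b5 maps to file offset 0x11b5.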
The second column is the machine code - it’s important to note how many bytes the call to fork takes up, and which bytes they are so we can recognize them later. It should always be 5 bytes, but the values will change in each app; in our case, it’s e8 c6 fe ff ff. You CANNOT add or subtract bytes without a world of problems, so you’re going to need to replace those five bytes with something else that’s exactly 5 bytes. That’s super important!
Building a patch
Now we know that we need to replace 5 bytes, and what the original bytes are, but what do we replace them with?
There are lots of options, but let’s use nasm to create the simplest patch we can. Here’s some 64-bit assembly code that simulates a function that does nothing but return 0 (note that since it’s never actually called, we don’t actually have to return):
bits 64
mov rax, 0
We can assemble that with nasm, then use hexdump to check what it becomes:
$ nasm -o patch patch.asm
$ hexdump -C patch
00000000 b8 00 00 00 00 |.....|
00000005
It’s handy that the naive solution is already 5 bytes! If you need to take up more space, you can add nop (which is one byte - 90) as many times as you want, before or after the instruction. If you need to take up less space, you need to get creative and find shorter ways to do things. Replacing mov rax, 0 with xor rax, rax is one such optimization.
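The padding step is mechanical enough to sketch in a few lines of C. This is illustrative only: `pad_patch` is a hypothetical helper, and it pads after the patch rather than before it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `patch` into `out` and fill the rest of the
 * slot with one-byte NOPs (0x90). Returns 0 on success, -1 if the patch
 * is too big to fit. */
int pad_patch(uint8_t *out, size_t slot_len,
              const uint8_t *patch, size_t patch_len) {
    assert(out != NULL && patch != NULL);
    if (patch_len > slot_len)
        return -1;
    memcpy(out, patch, patch_len);
    memset(out + patch_len, 0x90, slot_len - patch_len);
    return 0;
}
```

For example, the two-byte xor eax, eax (31 c0), which also zeroes rax, padded into a 5-byte slot becomes 31 c0 90 90 90.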
Insert the patch
We’re going to literally change the binary to replace call fork with our patch, using a hex editor! You can use whatever hex editor you like (I often use xvi32, which is super old and janky). For demo purposes, I’ll use xxd to convert the binary to hex, and xxd -r to convert back.
To convert forkapp to hex using xxd, run xxd and redirect the binary into it on stdin. I use -g1 since that format is slightly more familiar to me:
$ xxd -g1 < forkapp > forkapp.hex
Then I open the file in whatever text editor I like, and find the 5 bytes that we noted earlier - e8 c6 fe ff ff. Hopefully they only appear once; if they appear multiple times, look for an offset in the file that looks similar to the offset in objdump. Here’s what it looks like in forkapp.hex:
000011a0: c7 b8 00 00 00 00 e8 a5 fe ff ff bf 01 00 00 00 ................
000011b0: e8 bb fe ff ff*e8 c6 fe ff ff*89 45 fc 83 7d fc ...........E..}.
000011c0: 00 75 32 48 8b 45 d0 48 83 c0 08 48 8b 10 48 8d .u2H.E.H...H..H.
Replace those bytes with our patch - b8 00 00 00 00:
000011a0: c7 b8 00 00 00 00 e8 a5 fe ff ff bf 01 00 00 00 ................
000011b0: e8 bb fe ff ff b8 00 00 00 00 89 45 fc 83 7d fc ...........E..}.
000011c0: 00 75 32 48 8b 45 d0 48 83 c0 08 48 8b 10 48 8d .u2H.E.H...H..H.
Then use xxd -r to convert back to binary:
$ xxd -r < forkapp.hex > forkapp.patched
Optionally, disassemble again to ensure it worked:
$ objdump -M intel -d forkapp.patched | grep -C2 '4011b5:'
4011ab: bf 01 00 00 00 mov edi,0x1
4011b0: e8 bb fe ff ff call 401070 <exit@plt>
4011b5: b8 00 00 00 00 mov eax,0x0
4011ba: 89 45 fc mov DWORD PTR [rbp-0x4],eax
4011bd: 83 7d fc 00 cmp DWORD PTR [rbp-0x4],0x0
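If you'd rather script the edit than round-trip through a text editor, the same find-and-replace can be done on the raw bytes. This is a sketch of that idea rather than anything from the repo; `patch_unique` is a hypothetical helper that refuses to patch when the pattern is ambiguous, and loading and saving the file is left to the caller:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replace the single occurrence of `old_bytes` in
 * `buf` with `new_bytes` (both `n` bytes long). Returns the offset
 * patched, -1 if the pattern is absent, or -2 if it appears more than
 * once (nothing is changed in that case). */
long patch_unique(uint8_t *buf, size_t len, const uint8_t *old_bytes,
                  const uint8_t *new_bytes, size_t n) {
    assert(buf != NULL && old_bytes != NULL && new_bytes != NULL && n > 0);
    long found = -1;
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, old_bytes, n) != 0)
            continue;
        if (found >= 0)
            return -2;  /* ambiguous: refuse to guess */
        found = (long)i;
    }
    if (found >= 0)
        memcpy(buf + found, new_bytes, n);
    return found;
}
```

Because the old and new patterns have the same length, the file size and every other offset stay exactly as they were, which is the whole point of the 5-byte rule.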
Go go go!
Then make the newly patched binary executable with chmod +x, and do your testing:
$ chmod +x forkapp.patched
$ ./forkapp.patched AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
fish: Job 1, './forkapp.patched AAAAAAAAAAAAA…' terminated by signal SIGSEGV (Address boundary error)
$ strace ./forkapp.patched test
[...]
write(1, "You entered: test\n", 18You entered: test
) = 18
exit_group(0) = ?
$ gdb -q ./forkapp.patched
(gdb) run AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
[...]
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Program received signal SIGSEGV, Segmentation fault.
0x000000000040120f in main (argc=2, argv=0x7fffffffda88) at forkapp.c:27
27 }
Pretty much everything will work as if there was never a fork!