Understanding buffer overflows (I)

In this series of articles I will try to explain the basics behind buffer overflow exploits. I will try to make this as easy and accessible as possible, but I cannot start by explaining what a computer is or how it works. So the knowledge required to follow this series is as follows:
  • Basic computer (Von Neumann) architecture - knowing how a computer works, what a CPU is, what memory is, what the stack is, what an address is.
  • Basic assembly - knowing what machine code is and how it translates to assembly.
  • Basic C (with pointers) - comfortable with C and pointers.
  • Comfortable user level on Linux/UNIX - experience working in a UNIX-like environment.
And the needed tools are:
  • Your favorite 32-bit Linux distro (I will be using Ubuntu 12.04).
  • GCC C compiler (I will be using 4.6.3) (sudo apt-get install build-essential). If you have never used GCC don't worry, I'll cover its basic usage in this series.
  • GDB debugger (I will be using 7.4-2012.04) (sudo apt-get install gdb). I will also cover GDB usage.
  • NASM assembler (I will be using 2.07) (sudo apt-get install nasm). Small, fast, and with support for various syntaxes and output formats. I can't live without it.
  • Execstack (I will be using 1.0) (sudo apt-get install execstack). Only needed for compiling, you don't need to worry about this one.
If you don't have (or don't want to use) a 32-bit Linux distro, you can still follow this article by compiling with the -m32 flag, which makes GCC produce 32-bit executables.

Here is a quick reference on Intel instructions for the curious.

The vulnerable program

First we're going to build a very simple exploitable program and use it to understand how buffer overflows work.

buffoverflow1.c

#include <stdio.h>

int main()
{
    char name[10];
    printf("Please input your name: ");
    scanf("%s", name);
    printf("Your name is %s\n", name);
    return 0;
}

As you can see, this program does not do much: it waits for a string and then prints it out. Nothing fancy. So let's try compiling it. I will first remove the buffer overflow protections from both the OS and the C compiler. First, turn off ASLR (this has to be run as root; don't forget to turn it on again after you've finished):

echo 0 > /proc/sys/kernel/randomize_va_space

And then compile with the stack protector disabled and stack execution enabled:

cc -fno-stack-protector -z execstack -g -o buffoverflow1 buffoverflow1.c

Let's execute this:

./buffoverflow1
Please input your name: David
Your name is David

Looks like it's working correctly. What's the problem with this program?
  • The buffer for the input string (name) is too small.
  • The buffer has a fixed size and lives on the stack.
  • It uses scanf() with a bare %s, which does not check the length of the input against the size of the buffer.
Let's see an example of this vulnerability in the name buffer. What happens when we input a name larger than the program's buffer?

Please input your name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Your name is aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Segmentation fault (core dumped)

Hmmm looks interesting. Let's see what GDB can show us about this.

gdb -q buffoverflow1
Reading symbols from /some/path/buffoverflow1...done.
(gdb) r
Starting program: /some/path/buffoverflow1
Please input your name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Your name is aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Program received signal SIGSEGV, Segmentation fault.
0x61616161 in ?? ()

As you can see, the program crashed because of some strange value, 0x61616161. To see what's happening let's put a breakpoint after the second printf(), that is, at line 9.

(gdb) b 9
Breakpoint 1 at 0x8048474: file buffoverflow1.c, line 9.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /some/path/buffoverflow1 
Please input your name: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Your name is aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

Breakpoint 1, main () at buffoverflow1.c:9
9  return 0;

Now we stopped at the return. Let's see the assembly for the main function.
 
(gdb) disas main
Dump of assembler code for function main:
   0x08048434 <+0>: push   %ebp
   0x08048435 <+1>: mov    %esp,%ebp
   0x08048437 <+3>: and    $0xfffffff0,%esp
   0x0804843a <+6>: sub    $0x20,%esp
   0x0804843d <+9>: mov    $0x8048550,%eax
   0x08048442 <+14>: mov    %eax,(%esp)
   0x08048445 <+17>: call   0x8048340 <printf@plt>
   0x0804844a <+22>: mov    $0x8048569,%eax
   0x0804844f <+27>: lea    0x16(%esp),%edx
   0x08048453 <+31>: mov    %edx,0x4(%esp)
   0x08048457 <+35>: mov    %eax,(%esp)
   0x0804845a <+38>: call   0x8048370 <__isoc99_scanf@plt>
   0x0804845f <+43>: mov    $0x804856c,%eax
   0x08048464 <+48>: lea    0x16(%esp),%edx
   0x08048468 <+52>: mov    %edx,0x4(%esp)
   0x0804846c <+56>: mov    %eax,(%esp)
   0x0804846f <+59>: call   0x8048340 <printf@plt>
=> 0x08048474 <+64>: mov    $0x0,%eax
   0x08048479 <+69>: leave  
   0x0804847a <+70>: ret    
End of assembler dump.

As you can see, we're about to execute the mov $0x0,%eax instruction (EAX is the register used to return values from functions). Let's step through the code to see where the segmentation fault happens. GDB's s command steps one source line at a time, and line 9 happens to compile to this single instruction.

(gdb) s
10 }
(gdb) disas main
Dump of assembler code for function main:
   0x08048434 <+0>: push   %ebp
   0x08048435 <+1>: mov    %esp,%ebp
   0x08048437 <+3>: and    $0xfffffff0,%esp
   0x0804843a <+6>: sub    $0x20,%esp
   0x0804843d <+9>: mov    $0x8048550,%eax
   0x08048442 <+14>: mov    %eax,(%esp)
   0x08048445 <+17>: call   0x8048340 <printf@plt>
   0x0804844a <+22>: mov    $0x8048569,%eax
   0x0804844f <+27>: lea    0x16(%esp),%edx
   0x08048453 <+31>: mov    %edx,0x4(%esp)
   0x08048457 <+35>: mov    %eax,(%esp)
   0x0804845a <+38>: call   0x8048370 <__isoc99_scanf@plt>
   0x0804845f <+43>: mov    $0x804856c,%eax
   0x08048464 <+48>: lea    0x16(%esp),%edx
   0x08048468 <+52>: mov    %edx,0x4(%esp)
   0x0804846c <+56>: mov    %eax,(%esp)
   0x0804846f <+59>: call   0x8048340 <printf@plt>
   0x08048474 <+64>: mov    $0x0,%eax
=> 0x08048479 <+69>: leave  
   0x0804847a <+70>: ret    
End of assembler dump.

We stepped over line 9, which is a single instruction, so we're now at the LEAVE instruction. Let's step once again.

(gdb) s
Warning:
Cannot insert breakpoint 0.
Error accessing memory address 0x61616161: Input/output error.

0x61616161 in ?? ()

Ok, line 10 contains both LEAVE and RET, so it's one of those two instructions that causes the segmentation fault when trying to access address 0x61616161. If the program is trying to access memory address 0x61616161, then some register has to contain this value. Let's check:

(gdb) i r esp
esp            0xbffff120 0xbffff120
(gdb) i r ebp
ebp            0x61616161 0x61616161

Bingo!

Stack is fun!

As we saw above, EBP holds the value 0x61616161. But first, what is the EBP register? EBP points to the base of the current function's stack frame. If you check the disassembly of main you can see how EBP is saved on the stack with PUSH, and then set to the value of ESP when entering the function.

   0x08048434 <+0>: push   %ebp
   0x08048435 <+1>: mov    %esp,%ebp

ESP is the stack pointer, which points to the current top of the stack. EBP thus marks the base of the stack frame for this function, that is, where the stack for this function starts. Generally EBP >= ESP, since the stack grows downward in memory. This is called the function entry protocol (more commonly known as the function prologue).

The LEAVE instruction does the opposite of the above two instructions: it first sets ESP = EBP and then restores EBP from the stack, leaving ESP and EBP with the values they had when the function was entered. This is called the function exit protocol (or function epilogue).

The RET instruction returns from a function previously called with the CALL instruction. CALL pushes the return address (the address where execution must continue once the function has finished) onto the top of the stack. RET just pops this return address from the top of the stack and loads it into EIP, the instruction pointer (the register that holds the address of the next instruction to be executed). By the way, what value does EIP hold?

(gdb) i r eip
eip            0x61616161 0x61616161

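And bingo number 2. This is what is causing the segmentation fault: the CPU tries to fetch its next instruction from 0x61616161, an address that cannot be accessed. This means we somehow managed to overwrite the return address of the main function. But how did this happen?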

Local variables are reserved (or allocated) on the stack. So if you define a variable inside a function, the space to hold this variable will be allocated from the stack. This means the name variable in our vulnerable program lives on the stack. We can work out the address of name by taking a look at main's code.

   0x08048453 <+31>: mov    %edx,0x4(%esp)
   0x08048457 <+35>: mov    %eax,(%esp)
   0x0804845a <+38>: call   0x8048370 <__isoc99_scanf@plt>

This is the call to scanf, as you can guess from the CALL instruction. The two preceding MOV instructions pass the function's parameters. With the C calling convention used on 32-bit x86 (cdecl), all parameters are passed on the stack in reverse order, so that the first parameter ends up at the top: (%esp) holds the first one and 0x4(%esp) the second. For scanf we have 2 parameters, so the first MOV stores the second parameter (the pointer to our buffer). This is what we need. We will run the program again, put a breakpoint there, and check the value stored at ESP + 4.

(gdb) b *0x0804845a
Breakpoint 1 at 0x804845a: file buffoverflow1.c, line 7.
(gdb) r
Starting program: /some/path/buffoverflow1 

Breakpoint 1, 0x0804845a in main () at buffoverflow1.c:7
7  scanf("%s", name);
(gdb) p /x $esp + 4
$1 = 0xbffff0f4
(gdb) x /xw 0xbffff0f4
0xbffff0f4: 0xbffff106

So first I set the breakpoint as described, then run the program. Once it stops at the scanf call, I calculate ESP + 4 = 0xbffff0f4. This is the address of the stack slot that holds scanf's second parameter. That parameter is a pointer to name, so the slot contains the address of the memory area where our input is going to be stored. Reading the slot gives 0xbffff106, which is therefore the address of name (it matches the lea 0x16(%esp),%edx in the disassembly). We can check what's in the buffer before the scanf call. Here's a dump of 20 bytes starting at the buffer:

(gdb) x /20xb 0xbffff106
0xbffff106: 0x00 0x00 0x89 0x84 0x04 0x08 0xf4 0xff
0xbffff10e: 0xfb 0xb7 0x80 0x84 0x04 0x08 0x00 0x00
0xbffff116: 0x00 0x00 0x00 0x00

Just garbage: meaningless values left over from earlier use of the stack. Now we can execute the scanf and check this buffer again.

(gdb) s
Please input your name: Jason
8  printf("Your name is %s\n", name);
(gdb) x /20xb 0xbffff106
0xbffff106: 0x4a 0x61 0x73 0x6f 0x6e 0x00 0xf4 0xff
0xbffff10e: 0xfb 0xb7 0x80 0x84 0x04 0x08 0x00 0x00
0xbffff116: 0x00 0x00 0x00 0x00

Now it contains our input:

0x4a    0x61    0x73    0x6f    0x6e    0x00
J       a       s       o       n       \0

Now as we said before, the return address is also stored on the stack. To find out where, we just have to stop right before the RET instruction is executed and check the top of the stack. For this I set a second breakpoint at the RET instruction (0x0804847a).

(gdb) c
Continuing.
Your name is Jason

Breakpoint 2, 0x0804847a in main () at buffoverflow1.c:10
10 }
(gdb) disas main
Dump of assembler code for function main:
   0x08048434 <+0>: push   %ebp
   0x08048435 <+1>: mov    %esp,%ebp
   0x08048437 <+3>: and    $0xfffffff0,%esp
   0x0804843a <+6>: sub    $0x20,%esp
   0x0804843d <+9>: mov    $0x8048550,%eax
   0x08048442 <+14>: mov    %eax,(%esp)
   0x08048445 <+17>: call   0x8048340 <printf@plt>
   0x0804844a <+22>: mov    $0x8048569,%eax
   0x0804844f <+27>: lea    0x16(%esp),%edx
   0x08048453 <+31>: mov    %edx,0x4(%esp)
   0x08048457 <+35>: mov    %eax,(%esp)
   0x0804845a <+38>: call   0x8048370 <__isoc99_scanf@plt>
   0x0804845f <+43>: mov    $0x804856c,%eax
   0x08048464 <+48>: lea    0x16(%esp),%edx
   0x08048468 <+52>: mov    %edx,0x4(%esp)
   0x0804846c <+56>: mov    %eax,(%esp)
   0x0804846f <+59>: call   0x8048340 <printf@plt>
   0x08048474 <+64>: mov    $0x0,%eax
   0x08048479 <+69>: leave  
=> 0x0804847a <+70>: ret    
End of assembler dump.
(gdb) i r esp
esp            0xbffff11c 0xbffff11c
(gdb) x /wx 0xbffff11c
0xbffff11c: 0xb7e334d3

So the return address is stored at 0xbffff11c and its value is 0xb7e334d3. We can check which function this address belongs to:
(gdb) disas 0xb7e334d3
Dump of assembler code for function __libc_start_main:
   0xb7e333e0 <+0>: push   %ebp
   0xb7e333e1 <+1>: push   %edi
   0xb7e333e2 <+2>: push   %esi
   0xb7e333e3 <+3>: push   %ebx
   0xb7e333e4 <+4>: call   0xb7f44ee3
   0xb7e333e9 <+9>: add    $0x18cc0b,%ebx
   0xb7e333ef <+15>: sub    $0x5c,%esp
   [Removed for brevity]
---Type <return> to continue, or q <return> to quit---q
Quit

Bingo, this really is the return address: it points into __libc_start_main, the libc function that calls main.

Mixing apples with pears

As we saw before, our buffer is stored at 0xbffff106 and the return address is at 0xbffff11c. These addresses are pretty close to one another: only 22 bytes apart (0xbffff11c - 0xbffff106 = 0x16). And given that scanf does no boundary/overflow check, we should be able to overwrite this return address if we write enough data to the stack at 0xbffff106. To be precise, we need 22 bytes of filler plus 4 bytes that will overwrite the return address. Let's check this.

(gdb) b *0x0804847a
Breakpoint 1 at 0x804847a: file buffoverflow1.c, line 10.
(gdb) r
Starting program: /some/path/buffoverflow1 
Please input your name: 1234567890123456789012aaaa   
Your name is 1234567890123456789012aaaa

Breakpoint 1, 0x0804847a in main () at buffoverflow1.c:10
10 }
(gdb) x /32bx 0xbffff106
0xbffff106: 0x31 0x32 0x33 0x34 0x35 0x36 0x37 0x38
0xbffff10e: 0x39 0x30 0x31 0x32 0x33 0x34 0x35 0x36
0xbffff116: 0x37 0x38 0x39 0x30 0x31 0x32 0x61 0x61
0xbffff11e: 0x61 0x61 0x00 0x00 0x00 0x00 0xb4 0xf1

The name we entered is exactly 26 characters long, that is 26 bytes, as we specified before. The last four characters are a, whose ASCII code is 0x61, so four a's are 0x61616161. As you can see in the memory dump, the value stored at 0xbffff11c is now 0x61616161. Just continuing the program will make it crash with the same segmentation fault we got earlier.

(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x61616161 in ?? ()

We can try with another value instead of aaaa so you can see this better. Let's try with ABCD (bytes 0x41 0x42 0x43 0x44).

(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /some/path/buffoverflow1 
Please input your name: 1234567890123456789012ABCD
Your name is 1234567890123456789012ABCD

Breakpoint 1, 0x0804847a in main () at buffoverflow1.c:10
10 }
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x44434241 in ?? ()

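(0x44434241 is how a little-endian CPU such as x86 reads the bytes of ABCD, 0x41 0x42 0x43 0x44, as a 32-bit value: the first byte in memory is the least significant one.)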

As you can see, we can now freely manipulate main's return address: we can set it to any value we want just by changing the input given to the program. This is a very serious security problem, as we will see.
