Hello once again! After completing the buff.htb challenge on Hack The Box, I felt inspired to dig deeper into the subject and explore the fundamentals of Buffer Overflow. This is the first part of a two-part series dedicated to Buffer Overflow, where I aim to cover its basics as thoroughly as possible.
We’ve all encountered the term ‘Buffer’ in various contexts, but what exactly is a buffer? And why does it matter so much in software applications that attackers try to overflow it? These are the kinds of questions I pondered before I properly understood Buffer Overflow.
Introduction
What is a Buffer?
A buffer is a region of memory that a program uses to store data temporarily. Imagine a simple program that prompts the user to enter their name and then prints “Hello <name>”. In this scenario, the user’s name is stored in a buffer until the program reaches the print instruction, reads the name back from the buffer, and displays it on the screen. Essentially, a buffer holds temporary data.
What is a Buffer Overflow or Buffer Overrun?
A buffer overflow occurs when a program attempts to write more data into a buffer than the buffer was allocated to hold. For example, if an array sized to hold 10 bytes is given more than 10 bytes, the extra bytes overwrite whatever sits in memory just past the array.
Let’s start by examining our simple C program, and then we’ll explore how it executes in memory and how we can leverage buffer overflow to gain shell control.
1 #include <stdio.h>
2 #include <string.h>
3 int main(int argc, char const *argv[])
4 {
5 char buff[500];
6 strcpy(buff, argv[1]);
7 printf("%s\n", buff);
8 return 0;
9 }
In the above C program, we accept one command-line argument and declare a 500-byte character array named ‘buff’. The strcpy function then copies the argument into ‘buff’, and the program prints it. strcpy does not check the size of its destination, so nothing stops a longer argument from being written past the end of ‘buff’. Languages like C and C++ provide no built-in protection against buffer overflows or against overwriting other parts of memory; the programmer has to add it. Bounds checking can prevent a buffer overflow, and operating systems and compilers employ further mitigations, such as randomizing the memory layout (Address Space Layout Randomization, or ASLR) and inserting canaries. Canaries, also known as canary words or stack cookies, are known values placed between the buffer and the control data on the stack. The value is checked before the function returns, so an overflow that overwrites it is detected before the corrupted control data is used.
Inside the Memory
Let’s explore what happens in memory when this program is executed. When the operating system starts a program, it loads the program into memory and control eventually reaches the program’s main function. The program and its data are organized in memory in a specific layout, and every process follows the same general layout. Take a look at the memory layout depicted below.
The diagram above represents the memory structure. The stack is one of the basic data structures and follows LIFO (Last In, First Out) order. It is a linear data structure, which means items are always inserted and removed in a fixed sequence. Memory on the stack is allocated at runtime, as functions are called. The stack supports two operations: PUSH inserts an item and POP removes one. On common architectures such as x86, the stack grows downwards, toward lower memory addresses. It holds the local variables of each of your functions.
Let us break down the stack using our C program.
When the program is executed using the “./Buffer HelloStringCopyMe” command, the operating system transfers control to the main function. In other words, ‘main()’ is the called function (the subroutine), and the code that started it plays the role of the parent routine, which pauses until ‘main()’ returns. First, the parameters passed on the command line (./Buffer HelloStringCopyMe) are loaded. After the parameters, the return address and the base pointer are pushed, in that order. The return address is the location in the parent routine where execution resumes once the subroutine returns. The saved base pointer records where the parent routine’s stack frame began before the call, so that it can be restored afterwards.
Looking at our C program, the main function begins execution at line 3, and the first thing it does is reserve space for our variable ‘buff’, declared on line 5. The size of 500 bytes is fixed at compile time, but the space itself is reserved on the stack at runtime, when main is entered, as depicted in the image above.
Although the stack itself grows toward lower addresses, data written into a buffer fills it from lower addresses toward higher ones, that is, toward the saved base pointer and the return address. When the data provided as the command-line parameter is smaller than 500 bytes (as in our case, “HelloStringCopyMe”), the strcpy function copies it into ‘buff’ without any problem. However, if the data plus its terminating null byte exceeds 500 bytes, strcpy keeps writing past the end of ‘buff’ and overwrites the adjacent memory, which typically ends in a segmentation fault. Overwriting saved register values or other memory locations by writing past the end of a buffer in this way is what we call a buffer overflow.
On line 7, we simply print the contents of ‘buff’. On line 8, we return control to the parent routine, using the return address saved on the stack.