Implementing Functions in x86 Assembly
When we use functions in high-level languages, many of the details of what is happening are abstracted away from us. When we work in assembly, we need to do more work to implement the structure properly. In general, there are five problems we need to solve in order to create a function:
- Call/Return: How do we get to the function, and how do we know where to return to once the function execution is completed?
- Parameters: If the function requires parameters, where can they be stored so that the function can access them?
- Storing local variables: Where can the called function store variables declared in a local context?
- Handling registers: How do we preserve registers from the caller while using the registers in the called function?
- Returning a value: How does the function return a value back to the calling function?
The first problem that we will address is where to store parameters. One idea is to store them in registers, but this has several drawbacks. Mainly, if we want to call a function inside of a function, we will quickly run out of registers to hold parameters. To solve this problem, we turn to the stack: a region of memory, provided by the system, that we can use as last-in, first-out storage alongside the registers.
To properly utilize the stack, we first need to discuss a few special registers and operations in x86. The first register is esp, the stack pointer, which holds the address of the top of the stack (the most recently pushed value). The ebp register is known as the stack base register, and it gives us a fixed reference point when entering a function so that we can easily find data relative to it. To push data onto the stack, we can use pushl, which pushes a long (32-bit, 4-byte) value onto the stack. To pop data from the stack, we use popl.
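The effect of pushl and popl can be seen in a short sketch. It assumes a 32-bit Linux target and GNU assembler (AT&T) syntax; the values 5 and 7 are arbitrary, and the exit sequence at the end (system call number 1 in eax, exit status in ebx) is only there so the result can be observed.

```asm
# push_pop.s: minimal sketch of pushl and popl (32-bit Linux, AT&T syntax)
.section .text
.globl _start
_start:
    pushl $5            # esp decreases by 4, then 5 is stored at (%esp)
    pushl $7            # esp decreases by 4 again, then 7 is stored at (%esp)
    popl %ebx           # ebx = 7 (the last value pushed), esp increases by 4
    popl %ecx           # ecx = 5, esp is back where it started
    movl $1, %eax       # system call number 1 = exit
    int $0x80           # exit with the status held in ebx
```

On a typical Linux system this can be assembled with `as --32 push_pop.s -o push_pop.o` and linked with `ld -m elf_i386 push_pop.o -o push_pop`. Running it and then typing `echo $?` should print 7, since values come off the stack in the reverse of the order they went on.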
The stack is also a helpful tool for calling functions, as it lets us store information such as the return location. The eip register tracks the next instruction to be executed, so if we store its value on the stack before jumping to a function, we will know which instruction comes after the function execution. To do this, we typically use a special instruction, call, which pushes the address of the instruction following the call onto the stack, then jumps to the function provided as the argument. Once we are done executing the function, we can use the ret instruction, which pops that address back into eip and so returns us to where we started.
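As a small illustration of call and ret on their own, the sketch below calls a function that takes no parameters. The label set_status and the value 42 are hypothetical names chosen for this example, and a 32-bit Linux target is assumed.

```asm
# call_ret.s: minimal sketch of call and ret (32-bit Linux, AT&T syntax)
.section .text
.globl _start
_start:
    call set_status     # push the address of the next instruction, jump to set_status
    movl $1, %eax       # execution resumes here after ret; 1 = exit system call
    int $0x80           # exit with the status held in ebx

set_status:             # hypothetical function used only for this sketch
    movl $42, %ebx      # leave a recognizable value in ebx
    ret                 # pop the return address into eip
```

After running the program, `echo $?` should print 42, which shows that execution reached the function and then came back to the instruction after the call.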
As an example, suppose we wanted to have parameters 3 and 2 passed to a function called add2. This function will add the two numbers provided and return their sum. We could use the code below to do this.
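A minimal sketch of the call site follows. It assumes the parameters are pushed in reverse order, so that the first parameter (3) ends up closest to the top of the stack. The add2 shown here is only a placeholder that returns immediately, so that the sketch can be assembled and run by itself; the real body is developed in the rest of the text.

```asm
# call_site.s: sketch of passing 3 and 2 to add2 (32-bit Linux, AT&T syntax)
.section .text
.globl _start
_start:
    pushl $2            # second parameter, pushed first
    pushl $3            # first parameter, pushed last
    call add2           # push the return address and jump to add2
    movl $0, %ebx       # exit status 0; the placeholder produces no result
    movl $1, %eax       # system call number 1 = exit
    int $0x80

add2:
    ret                 # placeholder body: return immediately
```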
When we enter the function add2, the stack holds, from the top down: the return address pushed by call, followed by the parameters we pushed before the call.
Once we have called the function, we have a few more items to take care of. First, we need to be able to pull the parameters off the stack. In addition to this, we need to make space for local variables, preserve any registers the caller is still using, and set up a return value. The first thing we do when we enter a function is push the caller's value of ebp onto the stack, then copy esp into ebp. From then on, ebp points at the saved value and acts as a reference point for finding our parameters and related data on the stack. Once this is done, we subtract 4 from esp, moving the stack pointer one slot above the saved ebp to create space for a local variable.
The reason we subtract 4 from esp is due to how the stack grows in memory. Since we are working with long values, each push and pop moves the stack pointer by 4 bytes, so moving the stack pointer by 4 places it one slot above the data we just wrote. We subtract because the stack starts at a high memory address and grows toward address 0, so moving up the stack means moving closer to 0, which is done with subtraction.
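The entry sequence described above can be sketched as a small program. The function name use_local and the value 9 are hypothetical; the function reserves one 4-byte local variable, stores a value in it, and reads it back. To make the sketch runnable, it also undoes the setup before returning (copying ebp back into esp and popping the saved ebp). A 32-bit Linux target is assumed.

```asm
# prologue.s: sketch of saving ebp and reserving a local (32-bit Linux, AT&T syntax)
.section .text
.globl _start
_start:
    call use_local
    movl $1, %eax       # system call number 1 = exit
    int $0x80           # exit with the status held in ebx

use_local:              # hypothetical function used only for this sketch
    pushl %ebp          # save the caller's ebp on the stack
    movl %esp, %ebp     # ebp now points at the saved value: our reference point
    subl $4, %esp       # reserve 4 bytes (one long) above the saved ebp
    movl $9, -4(%ebp)   # store 9 in the local variable
    movl -4(%ebp), %ebx # read the local variable back into ebx
    movl %ebp, %esp     # discard the local variable space
    popl %ebp           # restore the caller's ebp
    ret
```

Running this and then `echo $?` should print 9. Note that the local variable sits at a negative offset from ebp, because the space reserved for it lies at a lower address than the saved ebp.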
Once this is done, we can place any registers we wish to preserve onto the stack using the pushl instruction. In this case, we don't have any registers we want to preserve, so we can move on to the next step, which is dealing with the return value. Typically, we place the return value in a predefined register (in this case, we will use eax). At this point in our program, the stack holds, from the top down: the space reserved for the local variable, the saved ebp (which ebp now points to), the return address, and the two parameters.
To get our parameters off the stack, we just need to reference them relative to the location of ebp. The saved ebp is at 0(%ebp) and the return address is at 4(%ebp), so the first parameter is two slots below ebp on the stack (8 bytes higher in memory), at 8(%ebp). Similarly, the second parameter is three slots below ebp, at 12(%ebp). We can place these values into registers to use in our function. From here we can start executing our function logic. Once the function logic is done, we move ebp back into esp, which discards the local variable space and leaves esp pointing at the stored ebp value on the stack. We then pop, moving the stored ebp value back into the ebp register. At this point we can return, as esp now points directly at the return address that ret expects.
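Putting these steps together, one possible body for add2 is sketched below. The choice of ecx as the scratch register for the second parameter is an assumption made for this sketch. The fragment assembles by itself into an object file, but it needs a caller that pushes two parameters and calls add2 before it can be run.

```asm
# add2 function only (32-bit, AT&T syntax); requires a caller to run
.section .text
.globl add2
add2:
    pushl %ebp           # save the caller's ebp
    movl %esp, %ebp      # set ebp as our reference point
    subl $4, %esp        # reserve space for one local variable (unused here)
    movl 8(%ebp), %eax   # first parameter
    movl 12(%ebp), %ecx  # second parameter (ecx chosen as scratch for this sketch)
    addl %ecx, %eax      # eax = first + second; eax carries the return value
    movl %ebp, %esp      # discard the local variable space
    popl %ebp            # restore the caller's ebp
    ret                  # esp now points at the return address
```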
When this function returns to our call location, the result of our addition is stored in eax. Since eax must hold the system call number (1 for exit) when we end the program, we first move the result to ebx, which holds the exit status that is displayed when you run echo $?. After this is done, we can trigger the system call with the interrupt int $0x80, and see the results of our code. The full program and function are shown below.
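What follows is a sketch of the complete program under the same assumptions as the rest of the walkthrough: a 32-bit Linux target, GNU assembler (AT&T) syntax, parameters pushed in reverse order, and ecx as a scratch register inside add2. The file name add2.s is hypothetical.

```asm
# add2.s: pass 3 and 2 to add2 and exit with their sum (32-bit Linux, AT&T syntax)
.section .text
.globl _start
_start:
    pushl $2             # second parameter
    pushl $3             # first parameter
    call add2            # on return, eax holds the sum
    movl %eax, %ebx      # exit status = sum
    movl $1, %eax        # system call number 1 = exit
    int $0x80            # end the program

add2:
    pushl %ebp           # save the caller's ebp
    movl %esp, %ebp      # set ebp as our reference point
    subl $4, %esp        # reserve space for one local variable (unused here)
    movl 8(%ebp), %eax   # first parameter (3)
    movl 12(%ebp), %ecx  # second parameter (2)
    addl %ecx, %eax      # eax = 3 + 2
    movl %ebp, %esp      # discard the local variable space
    popl %ebp            # restore the caller's ebp
    ret                  # return to the instruction after call
```

On a typical Linux system this can be assembled with `as --32 add2.s -o add2.o`, linked with `ld -m elf_i386 add2.o -o add2`, and run with `./add2`. Typing `echo $?` afterwards should print 5.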
When we assemble, link, and run this, echo $? will give us the result of our addition function. You can try playing with the parameters to see how the value changes. Keep in mind that an exit status is a single byte, so sums above 255 will wrap around.