Tutorials on Advanced Math and Computer Science Concepts

Writing Your First x86 Program

x86 assembly is the assembly language for the x86 family of processors, which began with Intel's 8086. The goal of assembly programming is to let developers write code as close to the hardware as possible: each instruction you write corresponds closely to one the processor executes. In this article, I will show you how to write, assemble, link, and run a basic x86 assembly program. Understanding x86 assembly is valuable because it gives you insight into how programs written in high-level languages actually run once they are compiled. It is also valuable when you need to interface directly with hardware.

When learning a new programming language, we often write a simple "hello world" program to understand the basics of running a program and getting output. In assembly, printing output to the screen is a bit more complex, so instead of "hello world" we will create a program that sets its exit status to 0 and then exits.

This code may seem complex and abstract at first, but once we walk through it, you will see that its structure has parallels in the high-level languages you already know.

The first two lines of our program are .section .data and .section .text. The .section keyword is an assembler directive, or pseudo-operation: an instruction to the assembler itself that is not translated into machine instructions. Each .section directive starts a new section, so these two lines divide our program into two sections. The .data section is where we set up any memory storage the program needs, much like declaring variables before using them in other languages. This program doesn't need any data, so we leave the section empty.

The .text section holds the instructions that the processor will execute. It begins with .globl _start, which is another directive rather than an instruction: it makes the symbol _start visible to the linker. The next line, _start:, is a label. A label names the address of whatever follows it, and labels let us divide a program into components. The linker uses _start as the program's entry point by default, so execution begins with the first instruction after the _start: label, much like a main function in other languages.

After the _start label comes the actual logic of the program. The movl instruction copies the value given as its first operand into the location given as its second operand. In AT&T syntax the source comes first and the destination second, and the l suffix means a 32-bit ("long") value is moved. In the first line, movl $1, %eax, the value 1 is moved into the register %eax. The second line moves 0 into the register %ebx. The final line, int $0x80, triggers a software interrupt, which transfers control to the operating system so that it can carry out a system call. On Linux, interrupt 0x80 is the entry point for system calls. The kernel reads the system call number from %eax, and here that number is 1, the exit system call, which terminates the program.

There are a few important details to learn from this basic program. The first is the set of registers the processor provides. On 32-bit x86, six registers are typically used as general-purpose registers for storing data:

  1. %eax
  2. %ebx
  3. %ecx
  4. %edx
  5. %edi
  6. %esi

Sometimes these general-purpose registers serve a special purpose for system calls. In our example, %eax holds the system call number (1, for exit), and %ebx holds the exit status. Our program explicitly moves 0 into %ebx, so it exits with status 0. If we moved a different value into %ebx, the program would exit with that value as its status. Only the low 8 bits are reported, so the shell sees a status between 0 and 255.

In addition to these registers, there are four special-purpose registers (%ebp, %esp, %eip, and %eflags), which you will encounter in more advanced programs. Another important detail is how we refer to the values we move into registers. Each time we use a constant value (immediate mode addressing), we prefix it with a $ character. This tells the assembler that the operand is the number itself. Without the $, the assembler would treat the number as a memory address to read from. The system interrupt line uses a slightly different number format, $0x80. The 0x prefix indicates that the number is written in hexadecimal rather than decimal, so 0x80 is 128.

Now that we understand the program's syntax, let's look at how to assemble, link, and run it. All of this is done with a few commands in the terminal.

In this example, I've named my assembly program "exit.s". The first command I run is as exit.s -o exit.o, which tells the assembler, as, to assemble the source file exit.s and write the result to the object file exit.o. Next, we link the object file into a file the system can execute, using ld exit.o -o exit. Finally, we run the program with ./exit. To test that the program ran correctly, we use echo $?, which prints the exit status of the last command that ran, in this case 0.