Disassembly examples: Control-flow statements

In this and the remaining subsections of this chapter we will look at the assembly code generated for different C statements and constructs. When debugging low-level code we will quite often be dealing with assembly, so the objective of this section is to demystify the translation from a high-level language to assembly. By the end, readers should be comfortable compiling and disassembling code and should have developed some intuition about what assembly the compiler is likely to generate for typical C code. Do note that if the program is compiled with a high level of compiler optimization, the resulting assembly can be completely different from what one expects.

Let us now look at the disassembly of the examples we introduced in the Control flow subsection of C Language Syntax.

If-Else

Here is the code for the if-else statement example from the C language syntax section:

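#include "uart.h"

// declared here because main calls the function before its definition
int find_lower_int(int x, int y);

int main()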
{
    int lower_int_val = find_lower_int(2, 6);
    uart_print_num(lower_int_val);
}

// function returns the lower of two integers
int find_lower_int(int x, int y) {
  if (x <= y) {
    // if the value of x is less than or equal to y
    // execute the code in this block
    return x;
  } else {
    // if the condition above is false
    // execute code in this block
    return y;
  }
}

Path of example:

exercises/c_functions/if_else_example.c

The command for compiling the if_else_example.c file is:

aarch64-none-elf-gcc -O0 -ffreestanding -nostdinc -nostdlib -nostartfiles -I../include/ \
-c if_else_example.c -o if_else_example.o

The -O0 option in the above command tells the compiler not to optimize the code. Compiler optimization tries to improve the generated code so that it runs faster or takes less space, usually at the cost of a longer compile time. We disable optimization here to make it easier to relate the C statements to the disassembly. GCC supports several optimization levels, including -O0 through -O3: -O0 turns off most optimizations and is the default, while -O3 enables the most aggressive set.

The linker command to produce the final executable for this example is:

aarch64-none-elf-ld -nostdlib -nostartfiles start.o if_else_example.o ../common/uart.o -T link.ld -o if_else_example.elf

To generate the disassembly for the examples in this chapter we will add the --source option to the objdump command. This switch tells objdump to interleave the source code with the disassembly, which helps in associating C lines with assembly. Note that objdump can only do this when the file contains debug information, so add -g to the compile command if the source lines do not appear. Here is the full command to produce the disassembly output for this example:

aarch64-none-elf-objdump --source -d if_else_example.elf > if_else_example.disass

Let us now look at the output in more detail in small logical blocks. We will start with the main function.

First, registers x29 and x30 are stored on the stack and SP is copied to x29.

0000000000080088 <main>:

int main()
{
   80088:       a9be7bfd        stp     x29, x30, [sp, #-32]!
   8008c:       910003fd        mov     x29, sp

Then the arguments are prepared for calling the function find_lower_int. Before the call, the values 6 and 2 are moved into registers w1 and w0. If you recall from Chapter 9: Functions, registers w0 through w7 are used to pass arguments to functions and w0 is used to return a value from the function.

    int lower_int_val = find_lower_int(2, 6);
   80090:       528000c1        mov     w1, #0x6                        // #6
   80094:       52800040        mov     w0, #0x2                        // #2

The function find_lower_int is called using the bl instruction. The return value, which the function leaves in w0, is then stored on the stack at offset 28 (0x1c) from SP.

   80098:       97fffff0        bl      80058 <find_lower_int>
   8009c:       b9001fe0        str     w0, [sp, #28]

The following code loads the value back from the stack into GPR x0, sign-extending it to 64 bits with ldrsw, and calls the uart_print_num function to print the value on the UART.

    uart_print_num(lower_int_val);
   800a0:       b9801fe0        ldrsw   x0, [sp, #28]
   800a4:       940000db        bl      80410 <uart_print_num>
}

The remaining instructions form the function epilogue. The nop does nothing, the ldp instruction restores x29 and x30 from the stack and releases the 32 bytes reserved on entry, and ret returns to the caller:

   800a8:       d503201f        nop
   800ac:       a8c27bfd        ldp     x29, x30, [sp], #32
   800b0:       d65f03c0        ret

Let us now look at the function find_lower_int, which returns the lower of two integers. The sub instruction reserves 16 bytes of stack space, and the following str instructions save w0 (x) and w1 (y) on the stack at offsets 12 and 8.

// function returns the lower of two integers
int find_lower_int(int x, int y) {
   80058:       d10043ff        sub     sp, sp, #0x10
   8005c:       b9000fe0        str     w0, [sp, #12]
   80060:       b9000be1        str     w1, [sp, #8]

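This code loads the values back from the stack, but into the opposite registers: x, stored from w0 at offset 12, is loaded into w1, and y, stored from w1 at offset 8, is loaded into w0. It then compares w1 with w0 and branches to location 0x8007c if the value in w1 (x) is greater than the value in w0 (y). Note that the branch tests the inverse of the C condition x <= y, because it is taken to skip the if block.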

  if (x <= y) {
   80064:       b9400fe1        ldr     w1, [sp, #12]
   80068:       b9400be0        ldr     w0, [sp, #8]
   8006c:       6b00003f        cmp     w1, w0
   80070:       5400006c        b.gt    8007c <find_lower_int+0x24>

This loads the value of x from the stack at offset 12 into w0, the return value register. It then branches unconditionally to location 0x80080.

    // if the value of x is less than or equal to y
    // execute the code in this block
    return x;
   80074:       b9400fe0        ldr     w0, [sp, #12]
   80078:       14000002        b       80080 <find_lower_int+0x28>

This loads the value of y from the stack at offset 8 into w0, the return value register.

  } else {
    // if the condition above is false
    // execute code in this block
    return y;
   8007c:       b9400be0        ldr     w0, [sp, #8]
  }
}

Finally, it adds 0x10 back to SP to release the space it had allocated for itself at the start of the function, and returns.

   80080:       910043ff        add     sp, sp, #0x10
   80084:       d65f03c0        ret
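The -O0 walkthrough above can be restated as C that follows the listing instruction by instruction. This is only a host-side sketch for an ordinary C compiler: the function name, the slot array standing in for the stack frame, and the labels are illustrative and are not produced by the compiler.

```c
/* Sketch: find_lower_int written the way the -O0 listing executes it.
 * slot[] stands in for the 16 bytes reserved by "sub sp, sp, #0x10";
 * slot[3] is offset 12 and slot[2] is offset 8. All names are illustrative. */
int find_lower_int_unoptimized(int x, int y)
{
    int slot[4];
    int result;

    slot[3] = x;                /* str w0, [sp, #12] */
    slot[2] = y;                /* str w1, [sp, #8]  */

    if (slot[3] > slot[2])      /* cmp w1, w0 ; b.gt 8007c */
        goto else_block;

    result = slot[3];           /* ldr w0, [sp, #12] : return x */
    goto epilogue;              /* b 80080 */

else_block:
    result = slot[2];           /* ldr w0, [sp, #8]  : return y */

epilogue:
    return result;              /* add sp, sp, #0x10 ; ret */
}
```

Calling find_lower_int_unoptimized(2, 6) follows the path that does not take the b.gt branch. The point of the sketch is that the branch condition (greater than) is the inverse of the condition written in the if statement.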

As you may have noticed, the -O0 option that we used to disable optimizations makes the generated code very verbose. It forces the generated assembly code to use many unnecessary stack operations to save and restore intermediate results. At the same time, a higher optimization level like -O3 can make it hard to relate the C statements to the generated disassembly. For example, here is the if-else example from above compiled with -O1.

Use the command line below to compile:

aarch64-none-elf-gcc -O1 -ffreestanding -nostdinc -nostdlib -nostartfiles -I../include/ \
-c if_else_example.c -o if_else_example.o

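Let us first look at the disassembly output of the main function. This listing comes from the object file if_else_example.o before linking, which is why the addresses start at 0 and the bl target is not yet resolved: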

0000000000000000 <main>:

int main()
{
   0:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!

    int lower_int_val = find_lower_int(2, 6);
    uart_print_num(lower_int_val);
   4:	d2800040 	mov	x0, #0x2                   	// #2
   8:	910003fd 	mov	x29, sp
    uart_print_num(lower_int_val);
   c:	94000000 	bl	0 <uart_print_num>
  10:	a8c17bfd 	ldp	x29, x30, [sp], #16
  14:	d65f03c0 	ret

Compilers are capable of performing fairly advanced optimizations. Here, the compiler has worked out that 2 is lower than 6 and moves the value 2 directly into register x0, completely avoiding the call to find_lower_int. It was able to do this because we passed 2 and 6 as literal values and the body of find_lower_int is visible in the same source file, so the compiler could determine that in this instance the result of find_lower_int() will always be 2.
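The effect of this optimization can be pictured as a source-level rewrite. The sketch below is a host-side illustration only; value_as_written and value_after_folding are hypothetical names introduced here, and the compiler does not actually produce C.

```c
/* Sketch: the value main passes to uart_print_num, before and after
 * the compiler folds the call. Function names other than
 * find_lower_int are illustrative. */
static int find_lower_int(int x, int y)
{
    if (x <= y) {
        return x;
    } else {
        return y;
    }
}

/* What the C source asks for: call the function with 2 and 6. */
int value_as_written(void)
{
    return find_lower_int(2, 6);
}

/* What the -O1 listing does instead: load the constant 2
 * ("mov x0, #0x2") without calling find_lower_int at all. */
int value_after_folding(void)
{
    return 2;
}
```

Both functions return the same value, which is what allows the compiler to replace the first form with the second.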

Let us now see the code for find_lower_int function:

0000000000000000 <find_lower_int>:
  } else {
    // if the condition above is false
    // execute code in this block
    return y;
  }
}
   0:	6b01001f 	cmp	w0, w1
   4:	1a81d000 	csel	w0, w0, w1, le
   8:	d65f03c0 	ret

Although the main function completely avoided the call to find_lower_int, the compiler still generates code for find_lower_int so that it can serve calls from other functions, for example in other source files, that could not be optimized in the same way. It implements the entire function with a cmp followed by a csel instruction, which selects w0 (x) if the le condition holds and w1 (y) otherwise, and returns the result in w0 to its caller.
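In C terms, the cmp and csel pair behaves like a conditional expression with no branch. A minimal host-side sketch, with an illustrative function name, might look like this:

```c
/* Sketch: the -O1 body of find_lower_int expressed as a select.
 * The function name is illustrative. */
int find_lower_int_select(int x, int y)
{
    int le = (x <= y);      /* cmp w0, w1 sets the flags that "le" reads */
    return le ? x : y;      /* csel w0, w0, w1, le */
}
```

Whether a compiler turns a given if-else or conditional expression into csel depends on the compiler version and options, so treat this as a way to read the listing rather than a guarantee about code generation.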

As an exercise, you can try generating just the disassembly output without interleaving source code for this example.

At -O0 the compiler generates very verbose code with a lot of unnecessary loads and stores. At higher optimization levels it may remove parts of the C code entirely. This is a good thing in the real world, but it is not useful when we want to show what C code looks like when compiled. Therefore, for the remaining examples, we will use whichever optimization level best illustrates the point we want to convey.

Switch statement example

Let us now look at the switch statement from the C language syntax section:

#include "uart.h"
int SAMPLE_VALUE = 25;

void main() {
  int x = SAMPLE_VALUE;

  switch (x) {
    case 5:
      uart_puts("X is 5");
      break;
    case 10:
    case 15:
      uart_puts("X is 10 or 15");
      break;
    case 25:
      uart_puts("X is 25");
      break;
    default:
      uart_puts("I don't know what X is...");
      break;
  }
}

Path of example:

exercises/c_functions/switch_example.c

Compiler/Linker commands:

aarch64-none-elf-gcc -O1 -g -ffreestanding -nostdinc -nostdlib -nostartfiles -I../include/ \
-c switch_example.c -o switch_example.o
aarch64-none-elf-ld -nostdlib -nostartfiles start.o switch_example.o ../common/uart.o \
-T link.ld -o switch_example.elf

Disassembly command:

aarch64-none-elf-objdump --source -d switch_example.elf > switch_example.disass

Disassembly output:

#include "uart.h"
int SAMPLE_VALUE = 25;

void main() {
   80058:	a9bf7bfd 	stp	x29, x30, [sp, #-16]!
   8005c:	910003fd 	mov	x29, sp
  int x = SAMPLE_VALUE;
   80060:	90000000 	adrp	x0, 80000 <_start>
   80064:	b945bc00 	ldr	w0, [x0, #1468]

  switch (x) {
   80068:	71003c1f 	cmp	w0, #0xf
   8006c:	540000c0 	b.eq	80084 <main+0x2c>  // b.none
   80070:	5400012c 	b.gt	80094 <main+0x3c>
   80074:	7100141f 	cmp	w0, #0x5
   80078:	540001a0 	b.eq	800ac <main+0x54>  // b.none
   8007c:	7100281f 	cmp	w0, #0xa
   80080:	54000201 	b.ne	800c0 <main+0x68>  // b.any
    case 5:
      uart_puts("X is 5");
      break;
    case 10:
    case 15:
      uart_puts("X is 10 or 15");
   80084:	90000000 	adrp	x0, 80000 <_start>
   80088:	91162000 	add	x0, x0, #0x588
   8008c:	94000062 	bl	80214 <uart_puts>
      break;
   80090:	1400000a 	b	800b8 <main+0x60>
  switch (x) {
   80094:	7100641f 	cmp	w0, #0x19
   80098:	54000141 	b.ne	800c0 <main+0x68>  // b.any
    case 25:
      uart_puts("X is 25");
   8009c:	90000000 	adrp	x0, 80000 <_start>
   800a0:	91166000 	add	x0, x0, #0x598
   800a4:	9400005c 	bl	80214 <uart_puts>
      break;
   800a8:	14000004 	b	800b8 <main+0x60>
      uart_puts("X is 5");
   800ac:	90000000 	adrp	x0, 80000 <_start>
   800b0:	91160000 	add	x0, x0, #0x580
   800b4:	94000058 	bl	80214 <uart_puts>
      break;
   800b8:	a8c17bfd 	ldp	x29, x30, [sp], #16
   800bc:	d65f03c0 	ret
    default:
      uart_puts("I don't know what X is...");
      break;
   800c0:	90000000 	adrp	x0, 80000 <_start>
   800c4:	91168000 	add	x0, x0, #0x5a0
   800c8:	94000053 	bl	80214 <uart_puts>
}
   800cc:	17fffffb 	b	800b8 <main+0x60>

Let us now look in more detail at how the compiler has transformed the switch and case statements.

The compiler has converted the switch statement into a series of compare and conditional branch instructions. Rather than testing the case values in source order, it compares x (held in w0) with 15 first: an equal result branches to the shared code for cases 10 and 15, and a greater result branches to 0x80094, where only 25 remains to be checked. Otherwise it tests 5 and then 10, and any value that matches nothing branches to the default code at 0x800c0:

  switch (x) {
   80068:	71003c1f 	cmp	w0, #0xf
   8006c:	540000c0 	b.eq	80084 <main+0x2c>  // b.none
   80070:	5400012c 	b.gt	80094 <main+0x3c>
   80074:	7100141f 	cmp	w0, #0x5
   80078:	540001a0 	b.eq	800ac <main+0x54>  // b.none
   8007c:	7100281f 	cmp	w0, #0xa
   80080:	54000201 	b.ne	800c0 <main+0x68>  // b.any
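The branch order in this listing can be written back as nested if statements and checked against the original switch. The sketch below is a host-side illustration: the enum and both function names are hypothetical, and the return codes stand in for the four uart_puts calls.

```c
/* Sketch: the switch from the example and the compare-and-branch
 * order seen in the listing. Names are illustrative. */
enum message { MSG_5, MSG_10_OR_15, MSG_25, MSG_DEFAULT };

/* The switch as written in the C source. */
enum message classify_switch(int x)
{
    switch (x) {
    case 5:  return MSG_5;
    case 10:
    case 15: return MSG_10_OR_15;
    case 25: return MSG_25;
    default: return MSG_DEFAULT;
    }
}

/* The same decisions in the order the listing makes them. */
enum message classify_branches(int x)
{
    if (x == 15) return MSG_10_OR_15;       /* cmp w0, #0xf  ; b.eq 80084 */
    if (x > 15) {                           /* b.gt 80094 */
        if (x != 25) return MSG_DEFAULT;    /* cmp w0, #0x19 ; b.ne 800c0 */
        return MSG_25;                      /* falls into 8009c */
    }
    if (x == 5) return MSG_5;               /* cmp w0, #0x5  ; b.eq 800ac */
    if (x != 10) return MSG_DEFAULT;        /* cmp w0, #0xa  ; b.ne 800c0 */
    return MSG_10_OR_15;                    /* falls into 80084 */
}
```

Starting with the middle value means a value such as 25 is resolved after two comparisons instead of four.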

The code for each case builds the address of its string literal in x0 with an adrp and add pair and then calls the uart_puts function to print the string:

      uart_puts("X is 5");
   800ac:	90000000 	adrp	x0, 80000 <_start>
   800b0:	91160000 	add	x0, x0, #0x580
   800b4:	94000058 	bl	80214 <uart_puts>
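The adrp instruction produces the address of a 4 KB page relative to the page that contains the instruction itself, and the add that follows supplies the offset within that page. The host-side sketch below models that arithmetic. The function names are illustrative, and the page delta of 0 used in the usage note is read from the listing, where the code and the strings share the page at 0x80000.

```c
#include <stdint.h>

/* Sketch: model of "adrp xN, target". Clears the low 12 bits of the
 * instruction address and adds a signed number of 4 KB pages.
 * Function names are illustrative. */
uint64_t adrp_result(uint64_t pc, int64_t page_delta)
{
    return (pc & ~UINT64_C(0xfff)) + (uint64_t)(page_delta * 4096);
}

/* Sketch: model of the adrp and add pair, where the add supplies the
 * low 12 bits of the target address. */
uint64_t adrp_add_address(uint64_t pc, int64_t page_delta, uint64_t low12)
{
    return adrp_result(pc, page_delta) + (low12 & UINT64_C(0xfff));
}
```

For the instruction at 0x800ac, adrp_add_address(0x800ac, 0, 0x580) gives 0x80580, which under this model is the address of the string passed to uart_puts.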

The code for the default statement loads its string in the same way:

    default:
      uart_puts("I don't know what X is...");
      break;
   800c0:	90000000 	adrp	x0, 80000 <_start>
   800c4:	91168000 	add	x0, x0, #0x5a0
   800c8:	94000053 	bl	80214 <uart_puts>

For Loop

Let us now look at the for loop example from the C language syntax section:

#include "uart.h"
int NUM = 5;
void main() {
  int i;
  for(i = 0; i < NUM; i++) {
    uart_puts("Value of i: ");
    uart_print_num(i);
  }
}

Path of example:

exercises/c_functions/for_loop_example.c

Compiler/Linker commands:

aarch64-none-elf-gcc -O1 -ffreestanding -nostdinc -nostdlib -nostartfiles -I../include/ \
-c for_loop_example.c -o for_loop_example.o
aarch64-none-elf-ld -nostdlib -nostartfiles start.o for_loop_example.o ../common/uart.o -T link.ld -o for_loop_example.elf

Disassembly command:

aarch64-none-elf-objdump --source -d for_loop_example.elf > for_loop_example.disass

Disassembly output:

void main() {
   80068:	a9bd7bfd 	stp	x29, x30, [sp, #-48]!
   8006c:	910003fd 	mov	x29, sp
   80070:	a90153f3 	stp	x19, x20, [sp, #16]
   80074:	f90013f5 	str	x21, [sp, #32]
  for(i = 0; i < NUM; i++) {
   80078:	d2800013 	mov	x19, #0x0                   	// #0
    uart_puts("Value of i: ");
   8007c:	90000015 	adrp	x21, 80000 <_start>
   80080:	9115c2b5 	add	x21, x21, #0x570
  for(i = 0; i < NUM; i++) {
   80084:	90000014 	adrp	x20, 80000 <_start>
   80088:	91160294 	add	x20, x20, #0x580
    uart_puts("Value of i: ");
   8008c:	aa1503e0 	mov	x0, x21
   80090:	9400005d 	bl	80204 <uart_puts>
    uart_print_num(i);
   80094:	aa1303e0 	mov	x0, x19
   80098:	940000de 	bl	80410 <uart_print_num>
  for(i = 0; i < NUM; i++) {
   8009c:	91000673 	add	x19, x19, #0x1
   800a0:	b9400280 	ldr	w0, [x20]
   800a4:	6b13001f 	cmp	w0, w19
   800a8:	54ffff2c 	b.gt	8008c <main+0x34>
  }
}
   800ac:	a94153f3 	ldp	x19, x20, [sp, #16]
   800b0:	f94013f5 	ldr	x21, [sp, #32]
   800b4:	a8c37bfd 	ldp	x29, x30, [sp], #48
   800b8:	d65f03c0 	ret

Let us now look in more detail at how the for loop gets converted to assembly instructions.

As we recall from the for loop section of C language syntax, the for loop has three expressions. expr1 is the initial expression and is executed just once. expr2 is the loop condition and is evaluated before every iteration of the loop; the loop ends when it becomes false. expr3 typically updates the variable that counts the current iteration. The code below, which moves 0 to x19, corresponds to the initial expression that initializes i with 0:

  for(i = 0; i < NUM; i++) {
   80078:	d2800013 	mov	x19, #0x0                   	// #0

It then initializes x21 with the address of the string passed to uart_puts inside the loop. Because this address does not change, computing it once before the loop avoids repeating the adrp and add instructions in every iteration:

    uart_puts("Value of i: ");
   8007c:	90000015 	adrp	x21, 80000 <_start>
   80080:	9115c2b5 	add	x21, x21, #0x570

It similarly initializes x20 with the address of global variable NUM:

   80084:	90000014 	adrp	x20, 80000 <_start>
   80088:	91160294 	add	x20, x20, #0x580

The following assembly instructions correspond to expr3 and expr2 of the for loop. The add instruction increments x19 by 1. The ldr instruction loads the value of the global variable NUM through the x20 register that was initialized before the loop, and cmp compares it with w19. NUM is reloaded in every iteration because it is a global variable that the called functions could modify. The result of the compare decides whether to jump back to the start of the loop body (address 0x8008c) when NUM is greater than i, or to exit the loop by not taking the branch (b.gt):

  for(i = 0; i < NUM; i++) {
   8009c:	91000673 	add	x19, x19, #0x1
   800a0:	b9400280 	ldr	w0, [x20]
   800a4:	6b13001f 	cmp	w0, w19
   800a8:	54ffff2c 	b.gt	8008c <main+0x34>
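Placing the loop test after the body, as the listing does, corresponds to rewriting the for loop as a do-while loop. The host-side sketch below compares the two shapes by counting iterations. The function names and the counter are illustrative, and the guard before the do-while is an addition of this sketch so that both forms agree when the bound is zero or negative.

```c
/* Sketch: the loop as written and the bottom-tested form.
 * Names are illustrative; the counter stands in for the two uart calls. */
int count_for_loop(int num)
{
    int iterations = 0;
    for (int i = 0; i < num; i++) {
        iterations++;
    }
    return iterations;
}

int count_bottom_tested(int num)
{
    int iterations = 0;
    int i = 0;                      /* mov x19, #0x0 */

    if (i < num) {                  /* guard added by this sketch */
        do {
            iterations++;           /* loop body at 8008c */
            i++;                    /* add x19, x19, #0x1 */
        } while (num > i);          /* cmp w0, w19 ; b.gt 8008c */
    }
    return iterations;
}
```

With the test at the bottom, each iteration needs only one branch instruction, the conditional jump back to the top of the body.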

It is left as an exercise to the reader to generate the disassembly for the while and do-while loop examples from the C Language Syntax section and analyze the output.