Writing Assembly Programs
This was a very fun and interesting lab for the SPO600 course. I apologize now for the length of this blog post, but I wanted to document everything I learned, even some of the less important information. Writing in assembler is very different from other programming languages, mostly because of how simple each instruction is. The complexity in assembly seems to come only from the length of the code, whereas in other languages you can create very complex data structures and objects with single-line commands. It is almost relaxing to do things one step at a time in assembly; however, debugging it can be a nightmare, since it is more difficult to read. I would expect large programs to be nearly impossible to manage if they were written in assembly, with so many lines and so many tiny things that could go wrong. It was, however, very fun to learn and to write in this language.
The goal was to write an x86_64 assembly program (using gas, the GNU Assembler) with a loop that increments an integer and prints out a string containing the incremented number. There are multiple stages of adding features, and ways to change and improve the program as you go along. You can find the lab here. Another fun part of this lab is that at each step you complete, you port the assembly code from x86_64 over to aarch64. Here I will be going over many of the issues I had and the solutions I found. Below is the output we needed from our program:
Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9
Loop: 10
Loop: 11
... and so on
Help and Tips
When I ran into trouble and errors with assembly code, three things helped a lot. First, writing a C program that performed a similar function and then analysing the resulting binary and the instructions within; this helped me see both where different syntax was used and how it was used. Second, reading some of the instruction set reference and some of the information in the ARMv8_ISA_Overview. Finally, reading the quick start guides on the zenit wiki several times, for both AARCH64 and X86_64.
First I will post the final assembly program, then slowly take it apart and explain what is going on and, in some cases, why it differs between x86_64 and aarch64.
.text
.globl _start

start = 0
max = 31

_start:
    mov     $start,%r15         /* starting value in register */
    and     $0,%r12             /* value of 0 */
    add     $0x30,%r12          /* convert to ascii 0 */

loop:
    // division for 2-digit number
    and     $0,%rdx             /* clear remainder */
    mov     %r15,%rax           /* set dividend */
    mov     $10,%r10            /* set divisor */
    div     %r10                /* divide */
    mov     %rax,%r14           /* store first digit */
    mov     %rdx,%r13           /* store second digit */

    // modify msg
    add     $0x30,%r14          /* convert increment to ascii */
    add     $0x30,%r13          /* convert increment to ascii */
    mov     %r13b,msg+7         /* modify single byte in msg */

    // skip if first digit is 0
    cmp     %r12,%r14
    je      continue
    mov     %r14b,msg+6         /* modify single byte in msg */

continue:
    // write
    mov     $len,%rdx           /* length of string */
    mov     $msg,%rsi           /* string */
    mov     $1,%rdi             /* file descriptor 1 = stdout */
    mov     $1,%rax             /* syscall 1 = write */
    syscall

    // loop
    inc     %r15                /* increment register */
    cmp     $max,%r15           /* compare max to increment value */
    jne     loop                /* jump to loop if not equal */

    // exit
    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall 60 = exit */
    syscall

.data
msg:    .ascii  "Loop:    \n"   /* 10 bytes; offsets 6 and 7 receive the digits */
len = . - msg
In the following snippet we are simply preparing for the later parts of the program. We put the starting point into a register so we can use it later, then we put the value 0 into another register and convert that 0 to its ASCII character. This could have been done in a single instruction by moving the ASCII code for 0 (0x30) straight into the register. We will need the ASCII value of zero at a later point in the program.
.text
.globl _start

start = 0
max = 31

_start:
    mov     $start,%r15         /* starting value in register */
    and     $0,%r12             /* value of 0 */
    add     $0x30,%r12          /* convert to ascii 0 */
In this next part we divide the loop's incrementing value by 10 to split it into separate digits. This is necessary because a single ASCII digit only covers 0-9; there is no single ASCII character for “10”, which takes two bytes, the character “1” and the character “0”. First we clear rdx, because div treats rdx and rax together as the dividend (rdx is the upper half), so leftover data in rdx gives incorrect results. Next, we move the dividend into register rax; this is the number we want to divide. Now we set the divisor, which can be placed in any valid register; here it is the value 10 in r10. Then we run the div instruction, which divides the dividend by the value in the register we give it, which is r10. The values returned are the quotient (stored in rax), which is the first digit, and the remainder (stored in rdx), which is the second digit. Finally, I move both results into safe registers to keep them for later (some safe registers are: rsp, rbp, rbx, r12, r13, r14, and r15).
loop:
    // division for 2-digit number
    and     $0,%rdx             /* clear remainder */
    mov     %r15,%rax           /* set dividend */
    mov     $10,%r10            /* set divisor */
    div     %r10                /* divide */
    mov     %rax,%r14           /* store first digit */
    mov     %rdx,%r13           /* store second digit */
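When I get stuck on a step like this, it helps me to write the same idea in C first. Below is a rough sketch of the digit split, assuming the value stays below 100; the function name split_digits is made up for illustration and is not part of the lab.

```c
#include <stdint.h>

/* Hypothetical helper: splits a value in the range 0-99 into two digits,
   the way div leaves the quotient in rax and the remainder in rdx. */
void split_digits(uint64_t value, uint64_t *tens, uint64_t *ones)
{
    const uint64_t divisor = 10;   /* mov $10,%r10 */

    *tens = value / divisor;       /* quotient  -> first digit  (rax) */
    *ones = value % divisor;       /* remainder -> second digit (rdx) */
}
```

Calling split_digits(27, &t, &o) leaves 2 in t and 7 in o, which matches what ends up in r14 and r13 in the assembly.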
Assembly Ascii Conversion
The first two add instructions just convert the two digits from their numeric values to their ASCII counterparts. This is one of the tricky spots I found in both x86_64 and aarch64: we now need to modify a specific byte in a string we create in our data section (this string is at the end of the file). To do this we use special syntax within the mov instruction. Register r13 holds the ASCII value of our second digit, but it is a 64-bit register, so you need to use %r13b. The added “b” means byte; it moves only 1 byte to the memory location specified. We are putting the byte into the memory address of msg, but we specify msg+7. Our string is 10 bytes long, and msg+7 puts the byte at offset 7 from the start of the msg string (counting from 0), which just happens to be a space.
    // modify msg
    add     $0x30,%r14          /* convert increment to ascii */
    add     $0x30,%r13          /* convert increment to ascii */
    mov     %r13b,msg+7         /* modify single byte in msg */
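The same conversion and single-byte store can be sketched in C. This assumes the 10-byte layout described above ("Loop:" followed by four spaces and a newline, which is my reading of the byte count); put_digit is a made-up name.

```c
#include <stddef.h>

/* Hypothetical helper: stores one decimal digit (0-9) as an ASCII
   character at a byte offset inside buf, like mov %r13b,msg+7. */
void put_digit(char *buf, size_t offset, unsigned digit)
{
    buf[offset] = (char)(digit + 0x30);   /* 0x30 is ASCII '0' */
}
```

With a buffer declared as char msg[] = "Loop:    \n", calling put_digit(msg, 7, 3) replaces the space at offset 7 with the character '3' and leaves every other byte alone.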
This next part is fairly similar to the last step of placing an ASCII value into the string. This ASCII byte is the first digit of the number; however, we do not want to show this digit if it is a “0”. So we compare it with the ASCII “0” value we created at the start of the program. If it is equal to “0”, we jump to the label continue, which skips the single instruction that would place the first digit in the string.
    // skip if first digit is 0
    cmp     %r12,%r14
    je      continue
    mov     %r14b,msg+6         /* modify single byte in msg */
continue:
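In C the compare-and-jump pair turns into a plain if. This is only a sketch under the same assumptions as before (value below 100, digits at offsets 6 and 7); fill_number is a hypothetical name.

```c
/* Hypothetical helper mirroring the cmp/je pair: the second digit is
   always written, the first digit only when it is not '0'.
   Assumes value < 100 and that msg has at least 8 bytes. */
void fill_number(char *msg, unsigned value)
{
    char tens = (char)('0' + value / 10);
    char ones = (char)('0' + value % 10);

    msg[7] = ones;
    if (tens != '0') {      /* cmp %r12,%r14 ; je continue */
        msg[6] = tens;      /* mov %r14b,msg+6 */
    }
}
```

One thing this makes easy to see: the first digit is never cleared again once written. That is fine here because the counter only goes up, but calling fill_number with 12 and then with 3 on the same buffer would leave "13" behind.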
Assembly System Call
This next part prints out the string we created, including the digits we wrote into it. To use the sys_write call we need a file descriptor, a string, and the length of the string. We put the length into the third argument register (rdx), the address of msg into the second argument register (rsi), and the file descriptor “1” (1 = stdout) into the first argument register (rdi). We then put “1” into the rax register; on x86_64, when using the syscall instruction, “1” means sys_write. The syscall instruction then invokes the system call.
    // write
    mov     $len,%rdx           /* length of string */
    mov     $msg,%rsi           /* string */
    mov     $1,%rdi             /* file descriptor 1 = stdout */
    mov     $1,%rax             /* syscall 1 = write */
    syscall
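For comparison, here is roughly what the same call looks like from C through the libc write() function, which ends up issuing the same system call and picks the syscall number for you. The wrapper name write_stdout is made up for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical wrapper around libc write(). Returns the number of bytes
   written, or -1 on error. */
ssize_t write_stdout(const char *buf, size_t len)
{
    /* Argument order matches the registers above:
       fd -> rdi, buf -> rsi, len -> rdx. */
    return write(STDOUT_FILENO, buf, len);
}
```

Disassembling a small program that calls a function like this is a handy way to check which registers the arguments land in on each architecture.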
This is how the loop functions for this assembly program. We have a register containing a value that starts at 0 and increments each time the inc instruction runs. Then the cmp instruction compares the incrementing value against max, the value at which we want to stop. Finally, if the values are not equal, it jumps back to the label “loop”, which is at the top of the loop body.
    // loop
    inc     %r15                /* increment register */
    cmp     $max,%r15           /* compare max to increment value */
    jne     loop                /* jump to loop if not equal */
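Because the comparison sits at the bottom, the closest C equivalent is a do-while rather than a for loop. A small sketch, with count_iterations as a made-up helper that only counts how often the body would run:

```c
/* Hypothetical helper: counts how many times the loop body runs.
   The test is at the bottom, so the body always runs at least once,
   and the loop ends when the index equals max. */
unsigned count_iterations(unsigned start, unsigned max)
{
    unsigned index = start;   /* mov $start,%r15 */
    unsigned runs = 0;

    do {
        runs++;               /* stands in for the loop body */
        index++;              /* inc %r15 */
    } while (index != max);   /* cmp $max,%r15 ; jne loop */

    return runs;
}
```

With start = 0 and max = 31 the body runs 31 times, for the values 0 through 30. Since the test is "not equal" rather than "less than", a start value above max would keep looping until the counter wraps around.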
This final bit of code is a syscall used to exit the program. Below it is the .data directive/section, which is used to create two labels/“variables”. One of them, msg, holds the string that we print, and the other, len, holds the length of the string, computed as the current location (.) minus the address of msg.
    // exit
    mov     $0,%rdi             /* exit status */
    mov     $60,%rax            /* syscall 60 = exit */
    syscall

.data
msg:    .ascii  "Loop:    \n"   /* 10 bytes; offsets 6 and 7 receive the digits */
len = . - msg
The aarch64 version of the above code does the same thing as the X86_64 code, but goes about it a little differently. One of the main differences with aarch64 is that it yells at you every time you try to use a label or value as the first argument of an instruction. That, along with the reversed direction of arguments (the destination comes first), makes things a little confusing sometimes. For example:
    // move value from register 1 to register 0
    mov     x0,x1               /* AARCH64 */
    // move value from register 14 to register 13
    mov     %r14,%r13           /* X86_64 */
Another thing to note is that X86_64 gas assembly uses % to mark registers and $ to mark immediate values. There are a few other things that aarch64 does differently, such as different ways of modifying memory, different instructions, and new syntax. I found that aarch64 instructions seem much simpler and more powerful. Aarch64 has very few ways to write each instruction, and if it’s not written the right way, the assembler will complain and tell you it’s wrong. On x86_64 a mistake can assemble without complaint and then just not work properly, because each instruction can be written in so many ways, with many different functions. Here is the same program as above in aarch64:
.text
.globl _start

start = 0
max = 31

_start:
    mov     x28,start           /* start value */
    mov     w20,0               /* get value 0 */
    add     w26,w20,0x30        /* convert to ascii 0 */

loop:
    // div
    mov     x20, 10             /* use value 10 */
    udiv    x21,x28,x20         /* divide by 10 */
    msub    x22,x20,x21,x28     /* get remainder */

    // modify msg
    add     w23,w21,0x30        /* convert increment to ascii */
    add     w24,w22,0x30        /* convert increment to ascii */
    adr     x25,msg             /* save address of msg in register */
    strb    w24,[x25,7]         /* store byte in msg, offset 7 */
    cmp     w23,w26             /* compare if it is ascii 0 */
    beq     continue            /* skip next instruction if above is ascii 0 */
    strb    w23,[x25,6]         /* store byte in msg, offset 6 */

continue:
    // write
    mov     x2,len              /* length of string */
    adr     x1,msg              /* save address of msg */
    mov     x0,1                /* file descriptor 1 = stdout */
    mov     x8,64               /* syscall 64 = write */
    svc     0

    // loop
    add     x28,x28,1           /* increment register */
    cmp     x28,max             /* check max size */
    bne     loop                /* branch to loop if not equal */

    // exit
    mov     x0,0                /* exit status */
    mov     x8,93               /* syscall 93 = exit */
    svc     0

.data
msg:    .ascii  "Loop:    \n"   /* 10 bytes; offsets 6 and 7 receive the digits */
len = . - msg
Aarch64 Assembly Instruction Differences
I really liked a few of the instructions for aarch64, such as the add instruction. It functions almost like an add and a mov instruction combined. It takes the second and third arguments, adds them together, and puts the result in the first argument.
Aarch64 Assembly Division
Another interesting difference is with division. The udiv instruction divides the second argument by the third and places the quotient in the first argument. However, udiv does not produce a remainder. To get it you must use an msub instruction, which computes the following formula:
remainder = dividend - (divisor * quotient)
    mov     x20, 10             /* use value 10 */
    udiv    x21,x28,x20         /* divide by 10 */
    msub    x22,x20,x21,x28     /* get remainder */
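To convince myself the operand order is right, the two instructions can be written out in C. This is just a sketch; remainder_via_msub is a made-up name, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Hypothetical helper: computes a remainder the way udiv + msub do.
   The divisor must not be zero (division by zero is undefined in C). */
uint64_t remainder_via_msub(uint64_t dividend, uint64_t divisor)
{
    uint64_t quotient = dividend / divisor;   /* udiv x21,x28,x20 */

    /* msub x22,x20,x21,x28 computes x28 - (x20 * x21) */
    return dividend - divisor * quotient;
}
```

For 27 and 10 the quotient is 2, so the result is 27 - (10 * 2) = 7, the same answer the C % operator gives.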
Aarch64 Assembly Memory Addresses
One of the main problems I had on aarch64 was trying to change a single byte inside a string. It is not the same as x86_64, because it requires you to use the adr instruction instead of mov. First you use the adr instruction with the label msg, which saves the address into a register. You have to do this because the assembler does not allow you to put msg directly inside the strb instruction (for some reason?). Next you use the strb (store byte) instruction, which requires a “w” register for the first argument. The second argument contains the address, which we saved to the register, and the final number in the brackets is the offset from that address, that is, which byte of the string you’d like to change.
    adr     x25,msg             /* save address of msg in register */
    strb    w24,[x25,7]         /* store byte in msg, offset 7 */
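In C terms this is a pointer plus an offset: the pointer plays the part of x25 after the adr, and the store only takes the lowest byte of the 32-bit value. A minimal sketch, with store_byte as a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: base plays the role of x25 after adr, and w is
   the 32-bit "w" register. Only the low 8 bits of w reach memory. */
void store_byte(char *base, size_t offset, uint32_t w)
{
    base[offset] = (char)(w & 0xFF);   /* strb w24,[x25,7] */
}
```

Passing 0x131 as the value stores 0x31, the character '1', because everything above the low byte is dropped.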