You may need to install gdb (on my RPi, running Arch, the command was: “pacman -S gdb”). When you have done so, you can use it to look at what is happening as your program is running. This may help you to correct something that is going wrong. For now, though, we are going to look at a sample program that is working correctly.
To get the most out of gdb, invoke the assembler with an extra option that adds debugging information (such as which source line each instruction came from) to the object file. This makes your executable larger, so you won’t want to do it unless you are actually planning a gdb session. The as command will look like this:
as -gstabs -o filename.o filename.s
Okay, here’s the sample program we are going to look at. It is very simple!
@ use_gdb.s
@ demo program
.section .data
.section .text
.globl  _start
_start:
nop  @ no operation
mov r1, $5 @ load r1 with 5
cmp r1, $4 @ compare r1 with 4
sub r1, r1, $1 @ subtract 1 
cmp r1, $4      @ r1 now DOES equal 4
sub r1, r1, $1
cmp r1, $4

mov r7, $1 @ exit syscall
svc $0  @ wake kernel
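Assemble the program with the extra option, link it as usual and start gdb with the name of the executable:
as -gstabs -o use_gdb.o use_gdb.s
ld -o use_gdb use_gdb.o
gdb use_gdb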
When gdb starts, we need to set a break-point. The execution of the program will stop there and we can step forwards one instruction at a time from that point. Here, I am setting the break-point at the _start label.
(gdb) break *_start
(gdb) run
After you have given the run command, gdb will execute the program up to the break-point and await instructions. In this example, I typed “next” twice to execute the next two instructions (“next” runs one source line, and each line of this program holds a single instruction). Then I typed “info registers” to see the contents of my registers.
As you can see, r1 holds the value 5. The stack pointer has an address in memory and the program counter shows where we are up to in the program.
The other register I am interested in is “cpsr”, the current program status register, which shows which flags have been set. At this point, its contents are not very interesting as we have not done any comparisons.
After one more instruction, “info registers” shows that the program counter has moved 4 bytes forward (each ARM instruction is 4 bytes long). The program status register now also shows the result of comparing r1 with 4.
If we move on a couple of instructions, r1 holds 4 and the “zero” flag in the cpsr has been set. This is because “cmp” actually performs a subtraction, discards the result and sets the flags accordingly. Since 4 – 4 = 0, the zero flag is set.
If we move on two more instructions, we are almost at the end of the program. r1 no longer equals 4, so the last “cmp” instruction has cleared the zero flag.
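If it helps to see the flag logic outside the debugger, here is a rough C model of what “cmp a, b” does: it works out a – b, throws the result away and keeps only the four condition flags. The struct and function names are made up for illustration, and the model assumes 32-bit registers; on the processor itself, N, Z, C and V live in bits 31 to 28 of the cpsr.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition flags (bits 31 to 28 of the cpsr). */
typedef struct {
    bool n;   /* Negative: bit 31 of the result is set          */
    bool z;   /* Zero: the result is zero                         */
    bool c;   /* Carry: the subtraction did not need to borrow    */
    bool v;   /* oVerflow: the signed result does not fit 32 bits */
} Flags;

/* Model of "cmp a, b": compute a - b, discard it, keep the flags. */
static Flags cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t result = a - b;   /* unsigned arithmetic wraps modulo 2^32 */
    Flags f;
    f.n = (result >> 31) != 0;
    f.z = result == 0;
    f.c = a >= b;              /* ARM sets C when there is no borrow */
    /* Signed overflow: a and b have different signs and the result's
       sign differs from the sign of a. */
    f.v = (((a ^ b) & (a ^ result)) >> 31) != 0;
    return f;
}
```

In this model cmp_flags(4, 4) has z set, matching what gdb showed once r1 had been decremented to 4, while cmp_flags(3, 4) leaves z clear, as after the final comparison.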
That’s the end of this brief look at gdb.

Use a template

This post ought to have come earlier …
When writing asm, it’s a good idea to have a simple template upon which to build your code. Here’s an example:
@ dummy.s
@ a template for asm programs
.section .data
.section .text
.globl  _start
_start:
nop  @ no operation
mov r7, $1 @ exit syscall
svc $0  @ wake kernel
.end
This will help you to remember to include the important sections of the program and the exit instructions.
Anything that starts with a full stop is a directive to the assembler rather than an actual instruction. The final directive (“.end”) marks the end of the source file. It is optional, but I think it is useful. For instance, if the last line of your program has a comment on it, the assembler may complain, and “.end” gives you a final line without one.
Commenting your code is important. We can use ‘@’ to begin a comment in asm programs. I like to use the first line to record the name of the program.

Printing a number

In a couple of our recent examples, we created programs which performed some kind of mathematical operation. The only way we had to find the result of these operations was to check the program’s return value. We noted that this was unsatisfactory in that only the least significant byte of r0 is available in this way.
This program looks at one way to print a number to the screen using assembly. For demonstration purposes, the number is hard-coded into the program.
Since the computer stores numbers in binary, this program converts the number into a series of denary digits and then displays these digits to the screen.
The program also makes use of hexadecimal since this is a convenient way for programmers to represent binary data.
When we print to the screen, we use ascii codes. Just as a reminder, the ascii code for 0 is 48 (or 0x30 in hex). We find the ascii code for a digit by adding 48. For instance, the ascii code for ‘7’ is 55 (or 0x37).
When we perform the conversion from binary to denary, the digits are produced in reverse order. It is convenient to push each digit (or its ascii code) onto the stack. We can then pop them off and print them in the correct order.
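Here is a C sketch of that idea, assuming an unsigned 32-bit number. It divides by 10 using repeated subtraction, as the assembly program does, pushes each digit’s ascii code onto a small array used as a stack, and then pops them off most significant digit first. The function name to_denary is just a label for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Write n as denary text into buf (needs room for 10 digits + '\0').
   Digits are produced least significant first, so they go onto a
   stack and come back off in the correct order. */
static void to_denary(uint32_t n, char *buf)
{
    char stack[10];          /* a 32-bit value has at most 10 digits */
    size_t top = 0;          /* number of items on the stack */

    do {
        uint32_t quotient = 0;
        /* Divide by 10 using repeated subtraction. */
        while (n > 9) {
            n -= 10;
            quotient++;
        }
        stack[top++] = (char)('0' + n);  /* n is now the remainder 0..9 */
        n = quotient;
    } while (n != 0);

    size_t i = 0;
    while (top > 0)
        buf[i++] = stack[--top];         /* pop: most significant first */
    buf[i] = '\0';
}
```

Calling to_denary(12345, buf) leaves the text 12345 in buf; printing the digits straight from the division loop, without the stack, would give them in reverse.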
The Stack
Here’s the command we use to push something onto the stack in this program:
stmfd sp!, {r0}
The mnemonic stands for STore Multiple, Full Descending: we are storing registers to memory on a stack that grows downwards, towards lower addresses.
The ‘f’ indicates a “Full” stack. This means that the stack pointer (sp) points to the last occupied address. This suits us, as we can load r1 with the sp in order to print the character whose ascii code is held there.
The exclamation mark in “stmfd sp!” means that we are writing back the new stack pointer value, thus keeping track of the correct position of the top of the stack. This is similar to the write-back we used when walking over a list of numbers in our maximum program.
The curly braces hold the registers that we want to push onto the stack. In this simple example, we are only pushing the contents of r0.
The command to pop items off the stack is almost identical, but now we are LoaDing Multiple registers from memory:
ldmfd sp!, {r0}
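To make “full” and “descending” concrete, here is a toy C model of such a stack. It is a simplified sketch: memory is an array of 32-bit words indexed from 0, so sp counts words rather than bytes (on the real processor sp is a byte address and moves by 4 per register). The names Stack, push and pop are invented for the example: push moves sp down and then stores, like stmfd sp!, {r0}; pop loads and then moves sp up, like ldmfd sp!, {r0}. Updating s->sp plays the role of the ‘!’ write-back.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* "Descending": the stack grows towards lower indices.
   "Full": sp is the index of the last item pushed (an occupied slot). */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;              /* STACK_WORDS means the stack is empty */
} Stack;

static void stack_init(Stack *s) { s->sp = STACK_WORDS; }

/* Like "stmfd sp!, {r0}": move sp down first, then store. */
static void push(Stack *s, uint32_t value)
{
    assert(s->sp > 0);              /* no room left */
    s->sp -= 1;
    s->mem[s->sp] = value;
}

/* Like "ldmfd sp!, {r0}": load from sp first, then move sp up. */
static uint32_t pop(Stack *s)
{
    assert(s->sp < STACK_WORDS);    /* nothing to pop */
    uint32_t value = s->mem[s->sp];
    s->sp += 1;
    return value;
}
```

Because the stack is full, s->mem[s->sp] is always the most recently pushed item, which is why loading r1 with sp points straight at the character to print.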
There are one or two other new commands in this program. When we decrement the character counter with “subs”, the ‘s’ suffix also sets the status flags. This means we can do without a “cmp” instruction and the “ble exit” line will take us to the exit label if the subtraction caused the zero flag to be set.
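We use a logical OR as an alternative way to add 48 to our digits. This works because 48 is 110000 in binary while a digit from 0 to 9 only uses the lowest four bits, so the OR never has to carry. For example: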
110000 (48)
001001 (9)
111001 result of logical OR (57)
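A quick C check that OR-ing with 48 and adding 48 agree for every digit (the function names are made up for this example):

```c
#include <assert.h>

/* Turn a digit 0..9 into its ascii code by adding 48 (0x30, the code for '0'). */
static char digit_add(unsigned d)
{
    assert(d <= 9);
    return (char)(d + 48);
}

/* Same result with a logical OR: 48 is 110000 in binary and a digit only
   uses the low four bits (0000 to 1001), so no bit positions overlap and
   no carry is ever needed. */
static char digit_or(unsigned d)
{
    assert(d <= 9);
    return (char)(d | 0x30);
}
```

The trick relies on the digit being below 16; for larger values the two functions would no longer agree.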
@ number.s
@ test program to print a number.
.section .data
num:                        @ create a variable
 .long 12345

.section .text
.globl _start
_start:
 ldr r3, =num        @ load r3 with address of num
 ldr r4, [r3]        @ load r4 with the number
 mov r6, $0          @ r6 counts the digits on the stack
next_digit:
 mov r5, $0          @ r5 will hold r4 / 10
loop:
 cmp r4, $9          @ if r4 <= 9 ...
 ble store_digit     @ ... r4 holds the next digit
 sub r4, r4, $10     @ subtract 10 from r4
 add r5, r5, $1      @ add one to the quotient
 bal loop            @ back to top of loop
store_digit:
 orr r0, r4, $0x30   @ ascii code = digit OR 48
 stmfd sp!, {r0}     @ push the ascii code onto the stack
 add r6, r6, $1      @ one more digit on the stack
 movs r4, r5         @ carry on with the quotient ...
 bne next_digit      @ ... until it is zero
print:
 ldmfd sp!, {r0}     @ pop the next ascii code (most significant first)
 bl PrintChar        @ call PrintChar function
 subs r6, r6, $1     @ decrement the character counter (sets the flags)
 ble exit            @ zero flag set: all digits printed
 bal print           @ otherwise print the next digit
exit:
 mov r0, $0xA        @ print a newline
 bl PrintChar
 mov r0, $0          @ exit
 mov r7, $1
 svc $0

PrintChar:                  @ print the char whose ascii code is in r0
 stmfd sp!, {r0-r5, r7, lr} @ push registers; r0 ends up at sp
 mov r1, sp          @ stack pointer holds char value
 mov r0, $1          @ stdout
 mov r2, r0          @ one char
 mov r7, $4          @ write syscall
 svc $0              @ wake kernel
 ldmfd sp!, {r0-r5, r7, pc} @ restore registers and return

Try typing in this code and seeing what the output is.
To assemble, link and run on your RPi:
as -o number.o number.s
ld -o number number.o
./number
Happy hacking!

Euclid’s gcd algorithm

Euclid’s algorithm for finding the greatest common divisor of two numbers is probably the oldest known algorithm.
The code that follows is based on Donald Knuth’s explanation of the algorithm in The Art of Computer Programming (Vol. i, pg. 9).
@ euclid.s
@ GCD algorithm (example taken from Knuth TAOCP, p.9)

@ r1 holds the first value
@ r0 holds the second 

.section .data
vals:
    .long 6099, 2166
.section .text
.globl  _start

_start:
    ldr r2, =vals @ make r2 point to start of vals
    ldr r1, [r2] @ load first number into r1.
    ldr r0, [r2, #4] @ load second number into r0.

gcd:
    cmp r0, r1  @ compare r0 and r1
    subgt r0, r0, r1 @ if r0 > r1, r0 = r0 - r1
    sublt r1, r1, r0 @ if r0 < r1, r1 = r1 - r0
    bne gcd   @ if r0 != r1, repeat

    mov     r7, $1 @ prepare to exit (r0 holds the gcd)
    svc     $0  @ wake kernel

In this example, I have omitted the ‘%’ signs from the register names. The assembler copes just fine with this.
as -o euclid.o euclid.s
ld -o euclid euclid.o
./euclid
echo $?
I’ll leave you to find out what result your program returns. Make sure it gives the correct result and if something is wrong, you’ll have to debug!
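If you want something to check your answer against, here is a C sketch of the same loop. Note that it follows euclid.s, which repeatedly subtracts the smaller value from the larger; Knuth’s Algorithm E takes remainders instead, which reaches the same answer in fewer steps. The function name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Subtractive Euclid, step for step like euclid.s: keep subtracting
   the smaller value from the larger until the two are equal. */
static uint32_t gcd_subtract(uint32_t r1, uint32_t r0)
{
    assert(r0 > 0 && r1 > 0);   /* with a zero operand the loop never ends */
    while (r0 != r1) {          /* cmp r0, r1 / bne gcd */
        if (r0 > r1)
            r0 -= r1;           /* subgt r0, r0, r1 */
        else
            r1 -= r0;           /* sublt r1, r1, r0 */
    }
    return r0;                  /* euclid.s leaves the result in r0 */
}
```

Call it as gcd_subtract(6099, 2166) to mirror the values in vals. The same zero caveat applies to the assembly version: with a zero in vals it would loop forever.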

Find the maximum from a series of numbers

In this program, we look through a list of numbers and find the largest value.
@ max.s
@ find the maximum from a list of numbers.
@ nicked & refactored for ARM from "Programming from the Ground Up."

@ r1 - used to hold address of data items
@ r0 - used for the largest item
@ r3 - used for current data item

.section .data

numbers:                       @ the data we are going to use.
.long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

.section .text
.globl _start

_start:
ldr %r1, =numbers  @ set r1 to start address of "numbers"
ldr %r3, [%r1]          @ load r3 with first number
mov %r0, %r3   @ at start the current number must be the largest.

loop:
  cmp %r3, $0   @ check if r3 holds zero (end of list)
  beq exit   @ if so, exit
  ldr %r3, [%r1,#4]!  @ load next item into r3
  cmp %r3, %r0   @ compare r3 with r0
  ble loop   @ goto start of loop if r3 <= r0
  mov %r0, %r3   @ otherwise put r3 into r0
  bal loop   @ goto start of loop (Branch ALways)

exit:    @ largest value is now in r0
mov %r7, $1   @ prepare to exit
svc $0    @ wake kernel
.end    @ marks the end of the code
Hopefully the comments make what is happening clear enough.
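For comparison, here is the same loop as a C sketch. Like max.s, it assumes the list ends with a 0 and that the values are positive, since the terminating 0 is compared too. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest value in a 0-terminated list, following max.s. */
static int32_t max_of_list(const int32_t *numbers)
{
    size_t i = 0;
    int32_t current = numbers[i];   /* ldr r3, [r1]          */
    int32_t largest = current;      /* mov r0, r3            */

    while (current != 0) {          /* cmp r3, $0 / beq exit */
        current = numbers[++i];     /* ldr r3, [r1, #4]!     */
        if (current > largest)      /* cmp r3, r0 / ble loop */
            largest = current;      /* mov r0, r3            */
    }
    return largest;
}
```

The pre-increment in numbers[++i] does the same job as the ‘!’ write-back in ldr r3, [r1, #4]!: the pointer moves on before the load.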
Happy hacking!

Hello ARM!

Here is some assembly code for a version of the traditional “Hello World” program.
@ hello.s - "Hello ARM!" program

.data                @ the string, plus an assembler symbol for its length
msg:
    .ascii  "Hello ARM!\n"
len = . - msg

.text
.globl   _start
_start:
    mov r0, $1         @ file descriptor 1 is stdout
    ldr r1, =msg       @ pointer to the string to be printed in r1
    ldr r2, =len       @ length of string in r2
    mov r7, $4         @ write syscall
    svc $0             @ wake kernel

    mov r0, $0         @ set the return value to 0
    mov r7, $1         @ prepare to exit
    svc $0             @ wake kernel
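The same program as a C sketch, assuming a POSIX system where write() is available from unistd.h, shows what the assembler symbols stand for. The function name hello is made up for this example. Note that .ascii does not add a terminating zero byte (.asciz would), whereas a C string literal does, hence the - 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* msg plays the part of the msg label. */
static const char msg[] = "Hello ARM!\n";

/* Print msg; returns the number of bytes written, or -1 on error. */
static ssize_t hello(void)
{
    size_t len = sizeof msg - 1;   /* the count that "len = . - msg" works out */
    return write(1, msg, len);     /* fd 1 is stdout, as in "mov r0, $1" */
}
```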

Using the GNU assembler

In the examples on this blog, I am going to be using the GNU assembler.
Unless otherwise stated, the commands I use to assemble and link asm programs will be as follows:

as -o filename.o filename.s
ld -o filename filename.o
The first program just adds two numbers.
@ add.s - simple addition code

@ lines that start with a dot are directives for the assembler.
.section    .data    @ a section for data (unused)
.section    .text    @ a section for text (the code)
.globl      _start   @ sets the global label "_start"

_start:              @ code starts here.
mov %r1, $7          @ puts the value 7 in r1
mov %r0, $8  @ puts the value 8 in r0
add %r0, %r0, %r1 @ add r0 to r1 and store result in r0

mov %r7, $1          @ this is how to set an exit syscall
svc $0               @ the interrupt to wake the kernel.
To run the program, assemble and link like so:

as -o add.o add.s
ld -o add add.o
Then you can run the program as follows:

./add
The following command will display the program’s return value (which is where the result of the addition is stored).

echo $?
Note: only the lowest byte of the return value reaches the shell, so this can only display results up to 255. This is purely a demonstration example.
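To see the one-byte limit directly, here is a C sketch, assuming a POSIX system, that starts a child process, lets it exit with a given value and reports what the parent sees, which is the same number the shell stores in $?. The function name is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child that exits with `value`, wait for it, and return the exit
   status the parent sees. Returns -1 if the child could not be started
   or did not exit normally. */
static int status_seen_by_parent(int value)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0)
        _exit(value);               /* child: exit at once with `value` */

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);     /* only the low 8 bits survive */
}
```

A child exiting with 7 + 8 is seen as 15, but one exiting with 300 is reported as 44, because only 300 mod 256 survives.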