Assembly Programming Guide — x86-64 and ARM Architecture Basics


DodaTech · Updated Jun 7, 2026 · 10 min read

Assembly language is the lowest-level human-readable programming language — a direct mnemonic representation of the machine instructions that a CPU executes, giving you complete control over hardware.

What You’ll Learn

  • CPU registers and memory addressing modes
  • x86-64 and ARM instruction sets
  • Stack operations and calling conventions
  • NASM vs GAS syntax differences
  • Disassembling and understanding compiled code

Why It Matters

Assembly is the language of reverse engineering, malware analysis, operating systems, embedded firmware, and performance-critical code. Durga Antivirus Pro uses assembly-level signature detection to identify malware patterns that higher-level scanners miss. Understanding assembly also makes you a better C programmer: you can read what the compiler generates, see how the stack works, and reason about why certain code is fast or slow. Security researchers, game engine developers, and compiler engineers routinely read or write assembly.

Learning Path

CPU Architecture (you are here) → Registers & Memory → x86-64 Instructions → ARM Architecture → Calling Conventions & Disassembly

CPU Registers

Registers are the fastest storage in the CPU. x86-64 exposes 16 general-purpose registers and ARM64 exposes 31.

x86-64 Registers (64-bit)

General Purpose (64-bit):  RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP
                          R8-R15
  32-bit subset:          EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, R8D-R15D
  16-bit subset:          AX, BX, CX, DX, SI, DI, BP, SP, R8W-R15W
   8-bit subset:          AL, AH, BL, BH, CL, CH, DL, DH,
                          SIL, DIL, BPL, SPL, R8B-R15B
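
The sub-registers alias the low bits of their 64-bit parent, and the write rules differ by width: an 8- or 16-bit write leaves the upper bits untouched, while a 32-bit write zeroes bits 63..32. The C sketch below models this with masks. The helper names (`write_al`, `write_ax`, `write_eax`) are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Model of x86-64 sub-register writes applied to a 64-bit register value.
   All helper names are hypothetical; they are not part of any library. */

/* mov al, v : replaces bits 7..0, keeps bits 63..8 */
uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~UINT64_C(0xFF)) | v;
}

/* mov ax, v : replaces bits 15..0, keeps bits 63..16 */
uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~UINT64_C(0xFFFF)) | v;
}

/* mov eax, v : replaces bits 31..0 and zeroes bits 63..32 */
uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;              /* nothing of the old value survives */
    return (uint64_t)v;
}
```

For example, starting from 0x1122334455667788, `write_al(..., 0xAA)` keeps every byte except the lowest, while `write_eax(..., 0xDEADBEEF)` discards the upper half entirely.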

ARM64 Registers

General Purpose:          X0-X30 (64-bit)
  32-bit subset:          W0-W30
Special:                  SP (stack pointer), LR (X30, link register),
                          PC (program counter), FP (X29, frame pointer)

Your First Assembly Program (NASM, x86-64 Linux)

; hello.asm
section .data
    msg db 'Hello, Assembly!', 0xa
    len equ $ - msg

section .text
    global _start

_start:
    ; sys_write (syscall 1)
    mov rax, 1        ; syscall number
    mov rdi, 1        ; fd = stdout
    mov rsi, msg      ; buffer
    mov rdx, len      ; length
    syscall

    ; sys_exit (syscall 60)
    mov rax, 60       ; syscall number
    xor rdi, rdi      ; exit code 0
    syscall

nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello
# Hello, Assembly!

Memory Addressing Modes

x86-64 supports several addressing modes for accessing memory:

section .data
    array dq 10, 20, 30, 40, 50
    index dq 2

section .text
    global _start

_start:
    ; Immediate addressing: value in instruction
    mov rax, 42

    ; Register addressing: value in register
    mov rbx, rax

    ; Direct memory addressing: [address]
    mov rcx, [array]       ; loads 10

    ; Register indirect: [register]
    mov rsi, array
    mov rdx, [rsi]         ; loads 10

    ; Base+displacement: [register + offset]
    mov r8, [rsi + 8]      ; loads 20 (8 bytes per qword)

    ; Indexed: [base + index*scale]
    mov rdi, [index]       ; rdi = 2 (set the index before using it)
    mov r9, [rsi + rdi*8]  ; loads 30 (array[2])

    ; Exit
    mov rax, 60
    xor rdi, rdi
    syscall
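
Every memory operand above is a special case of one formula: address = base + index*scale + displacement. As a rough C model of that calculation (the function names are illustrative assumptions, and the integer-to-pointer round trip relies on the flat address space of mainstream platforms):

```c
#include <stdint.h>

/* Effective address for [base + index*scale + disp].
   On x86-64 the scale must be 1, 2, 4, or 8.
   Both function names are hypothetical, chosen for this sketch. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t disp) {
    return base + index * scale + (uintptr_t)disp;
}

/* Rough equivalent of: mov rax, [base + index*8 + disp] */
int64_t load_qword(const int64_t *base, uintptr_t index, intptr_t disp) {
    uintptr_t addr = effective_address((uintptr_t)base, index, 8, disp);
    return *(const int64_t *)addr;   /* read 8 bytes at the computed address */
}
```

With `array` holding 10, 20, 30, 40, 50, the call `load_qword(array, 2, 0)` reads the same element as `mov r9, [rsi + rdi*8]` with rdi = 2, and `load_qword(array, 0, 8)` matches `mov r8, [rsi + 8]`.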

Stack Operations

The stack grows downward (toward lower addresses). RSP is the stack pointer.

section .text
    global _start

_start:
    ; Push values onto stack
    mov rax, 10
    mov rbx, 20
    push rax         ; rsp -= 8, [rsp] = 10
    push rbx         ; rsp -= 8, [rsp] = 20

    ; Pop values (reverse order, LIFO)
    pop rcx          ; rcx = 20, rsp += 8
    pop rdx          ; rdx = 10, rsp += 8

    ; _start has no return address, so the frame demo lives in a function
    call with_frame  ; pushes the return address, then jumps

    ; Exit
    mov rax, 60
    xor rdi, rdi
    syscall

; Stack frame setup (typical in functions)
with_frame:
    push rbp         ; save old base pointer
    mov rbp, rsp     ; set new base pointer
    sub rsp, 32      ; allocate 32 bytes for locals

    ; ... function body ...

    mov rsp, rbp     ; release the locals
    pop rbp          ; restore base pointer
    ret              ; pop the return address and jump to it
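
The order of operations inside push and pop is what makes the stack LIFO: push moves the pointer down and then stores, pop loads and then moves the pointer up. The toy C model below sketches that, assuming 8-byte slots and a slot index standing in for RSP. The type and function names are hypothetical.

```c
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy model of a downward-growing stack of 8-byte slots.
   `sp` plays the role of RSP, counted in slots rather than bytes.
   Like the hardware, this model performs no overflow checks. */
typedef struct {
    uint64_t mem[STACK_SLOTS];
    int sp;                 /* index of the current top; STACK_SLOTS means empty */
} toy_stack;

void stack_init(toy_stack *s) { s->sp = STACK_SLOTS; }

/* push: move the pointer down first, then store (rsp -= 8; [rsp] = v) */
void stack_push(toy_stack *s, uint64_t v) {
    s->sp -= 1;
    s->mem[s->sp] = v;
}

/* pop: load first, then move the pointer up (v = [rsp]; rsp += 8) */
uint64_t stack_pop(toy_stack *s) {
    uint64_t v = s->mem[s->sp];
    s->sp += 1;
    return v;
}
```

Pushing 10 and then 20 and popping twice returns 20 first and 10 second, and leaves `sp` where it started, which is the balance that `ret` depends on.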

Instructions: mov, add, sub, jmp, call

section .data
    result dq 0

section .text
    global _start

_start:
    ; mov: copy data
    mov rax, 100           ; rax = 100
    mov rbx, 50            ; rbx = 50

    ; add: addition
    add rax, rbx           ; rax = 150
    add rax, 10            ; rax = 160

    ; sub: subtraction
    sub rax, 60            ; rax = 100

    ; imul: signed multiply
    mov rax, 7
    mov rbx, 6
    imul rax, rbx          ; rax = 42

    ; idiv: signed divide (dividend in RDX:RAX)
    mov rax, 100           ; dividend low
    cqo                    ; sign-extend RAX into RDX:RAX
    mov rbx, 3             ; divisor
    idiv rbx               ; rax = 33 (quotient), rdx = 1 (remainder)

    ; jmp: unconditional jump
    mov rax, 1
    jmp skip
    mov rax, 999           ; skipped
skip:
    ; rax is now 1

    ; call/ret: function call
    call my_function       ; push return address, jump to label
    jmp done

my_function:
    push rbp
    mov rbp, rsp
    ; function body
    pop rbp
    ret

done:
    mov [result], rax

    ; Exit
    mov rax, 60
    xor rdi, rdi
    syscall
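
idiv divides the 128-bit value in RDX:RAX and truncates the quotient toward zero, so the remainder takes the sign of the dividend. For a negative dividend, RDX must hold the sign extension of RAX (the cqo instruction produces it) rather than zero. C99 integer division follows the same truncation rule, so the standard `lldiv` function can serve as a reference when checking results by hand. The wrapper name `idiv_model` is a hypothetical label for this sketch.

```c
#include <stdlib.h>

/* Signed division that truncates toward zero, like idiv.
   lldiv is standard C99; idiv_model is a hypothetical wrapper name. */
lldiv_t idiv_model(long long dividend, long long divisor) {
    return lldiv(dividend, divisor);   /* .quot maps to RAX, .rem maps to RDX */
}
```

`idiv_model(100, 3)` gives quotient 33 and remainder 1, matching the listing above; `idiv_model(-100, 3)` gives quotient -33 and remainder -1.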

NASM vs GAS Syntax

Feature            NASM (Intel)                GAS (AT&T)
Operand order      mov rax, rbx (dest, src)    mov %rbx, %rax (src, dest)
Register prefix    rax                         %rax
Immediate prefix   42                          $42
Memory syntax      [rax + rbx*8]               (%rax, %rbx, 8)
Directive prefix   section .text               .text
Size suffixes      mov qword                   movq

; NASM (Intel) style
mov rax, [rsi + rdi*8 + 16]   ; array[rdi + 2]

; GAS (AT&T) equivalent
movq 16(%rsi, %rdi, 8), %rax  ; array[rdi + 2]

Most Linux tools (GCC, GDB, objdump) default to AT&T syntax. To get Intel syntax, use -M intel with objdump, -masm=intel with GCC, or set disassembly-flavor intel in GDB.

ARM Assembly Basics

ARM uses a load-store architecture — only load and store instructions access memory.

// ARM64 (AArch64) example, GAS syntax (comments start with //)
.section .data
msg:    .ascii "Hello, ARM!\n"   // .ascii adds no NUL, so len counts only the text
len = . - msg

.section .text
.global _start

_start:
    // sys_write (64)
    mov x0, #1          // fd = stdout
    ldr x1, =msg        // buffer address
    ldr x2, =len        // length
    mov x8, #64         // syscall number (write)
    svc #0              // supervisor call

    // sys_exit (93)
    mov x0, #0          // exit code
    mov x8, #93         // syscall number (exit)
    svc #0

Key ARM differences from x86:

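  • Load-store architecture: only load/store instructions (ldr/str, ldp/stp) access memory
  • Conditional execution: 32-bit ARM can make most instructions conditional (addeq, subne); AArch64 replaces this with conditional branches and selects (b.eq, csel)
  • Link register: bl/blr store the return address in X30 (LR) instead of on the stack
  • No push/pop: use stp/ldp (store/load pair)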

Calling Conventions

x86-64 Linux (System V AMD64)

Parameter   Register
1st         RDI
2nd         RSI
3rd         RDX
4th         RCX
5th         R8
6th         R9
Return      RAX
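
One way to see the convention in practice is to compile a small C function and read the generated assembly. The function below is a made-up example (`sum6` is not from any library). On x86-64 Linux its six integer arguments should arrive in the registers listed above, and the result should be returned in RAX.

```c
/* Six integer arguments: under the System V AMD64 ABI they arrive in
   RDI, RSI, RDX, RCX, R8, R9 (in that order) and the result leaves in RAX.
   sum6 is a hypothetical example function. */
long sum6(long a, long b, long c, long d, long e, long f) {
    return a + b + c + d + e + f;
}
```

Saving this as a .c file and running gcc -S -O1 -masm=intel on it produces a .s file. The exact instructions depend on the compiler version, but the body of sum6 should reference those six argument registers.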

ARM64 Linux

Parameter   Register
1st         X0
2nd         X1
3rd         X2
4th–8th     X3–X7
Return      X0

Disassembly with objdump and GDB

# Disassemble a binary
objdump -d program        # AT&T syntax
objdump -d -M intel program  # Intel syntax

# Interactive debugging
gdb ./program
(gdb) break main
(gdb) run
(gdb) info registers
(gdb) disas
(gdb) x/10gx $rsp      # examine stack memory
(gdb) stepi             # single-step one instruction

Common Mistakes

1. Forgetting operand order differences

NASM: mov dest, src. GAS: mov src, dest. Mixing them up is a common source of bugs when you switch between assemblers or read disassembly in the other syntax.

2. Not preserving callee-saved registers

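Under the System V AMD64 ABI, a function that uses RBX, RBP, or R12–R15 must save them first and restore them before returning. Violating this silently corrupts the caller's state, which often shows up as a crash far from the real bug.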

3. Using 32-bit registers for addresses in 64-bit code

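Writing a 32-bit register zeroes the upper 32 bits of its 64-bit parent, so an address held in EAX or ESI loses its high half and may point somewhere else. Keep pointers in 64-bit registers (mov rsi, msg or lea rsi, [rel msg]). Use a 32-bit destination such as mov eax, [address] only to load a 32-bit value, never to hold an address unless you know it fits in 32 bits.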

4. Mismatched call/ret

Every call must have a matching ret. Unbalanced pushes/pops cause the ret to jump to the wrong address.

5. Buffer overflow from fixed-size buffers

Assembly has no bounds checking. Writing past the end of a buffer corrupts adjacent data — a classic security vulnerability.

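6. Wrong registers for syscall arguments

Linux syscalls expect the syscall number in RAX and the arguments in specific registers (RDI, RSI, RDX, R10, R8, R9). The fourth argument goes in R10, not RCX as in the function calling convention, and the syscall instruction itself overwrites RCX and R11. Passing an argument in the wrong register causes EFAULT or wrong results.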

Practice Questions

  1. What is the difference between mov rax, [rbx] and mov rax, rbx? mov rax, [rbx] loads the VALUE AT the address stored in RBX (memory access). mov rax, rbx copies the VALUE in RBX to RAX (register-to-register).

  2. Why does the stack grow downward? Mostly historical convention. In the classic process layout the heap grows upward from the end of the data segment and the stack grows downward from the top of the address space, so the two share the free space between them without a fixed split.

  3. What is a calling convention? A set of rules defining how functions pass arguments, return values, and save registers. The System V AMD64 ABI is standard on Linux x86-64.

  4. How does ARM differ from x86 in memory access? ARM is load-store: only load/store instructions such as ldr/str access memory, and arithmetic works on registers. x86 allows ALU operations (add, sub) directly on memory operands. 32-bit ARM can also execute most instructions conditionally; AArch64 cannot.

  5. What does objdump -d do? Disassembles a binary file, showing machine code addresses, instruction bytes, and assembly mnemonics. Essential for reverse engineering and understanding compiler output.

Challenge: Write an x86-64 NASM program that reads two integers from stdin, computes their GCD using Euclid’s algorithm, and prints the result to stdout.
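
Before writing the assembly, it can help to have a reference to test against. The C function below sketches the core of Euclid's algorithm on unsigned 64-bit values. The name `gcd_u64` is an assumption made for this example, and reading and printing the integers is left to the challenge.

```c
#include <stdint.h>

/* Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), repeated until b is 0.
   gcd_u64 is a hypothetical name for this reference sketch. */
uint64_t gcd_u64(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t r = a % b;   /* in assembly: zero RDX, then div leaves the remainder in RDX */
        a = b;
        b = r;
    }
    return a;
}
```

The loop maps closely onto the instructions covered earlier: one div per iteration, two register moves, and a test/jnz pair to decide whether to continue.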

Mini Project — Simple CPU Benchmark

Measure loop overhead by timing a tight, empty loop:

; bench.asm
section .data
    iterations dq 100000000
    msg  db "Completed "
    msg_len equ $ - msg
    msg2 db " iterations in "
    msg2_len equ $ - msg2
    ms   db " ms", 0xa
    ms_len equ $ - ms

section .bss
    buffer resb 32

section .text
    global _start

; Convert the unsigned integer in RAX to decimal text.
; Returns RSI = first digit, RDX = length. Clobbers RAX, RCX, RDI.
itoa:
    mov rcx, 10
    mov rdi, buffer + 32     ; one past the end of the buffer
.loop:
    xor edx, edx             ; clear the high half of the dividend
    div rcx                  ; rax = quotient, rdx = remainder
    add dl, '0'              ; remainder to ASCII digit
    dec rdi
    mov [rdi], dl            ; digits are written right to left
    test rax, rax
    jnz .loop
    mov rsi, rdi             ; start of the text
    mov rdx, buffer + 32
    sub rdx, rdi             ; length of the text
    ret

; Print RDX bytes starting at RSI to stdout
print:
    mov rax, 1        ; sys_write
    mov rdi, 1        ; stdout
    syscall           ; clobbers RAX, RCX, R11
    ret

_start:
    sub rsp, 16       ; room for one timespec struct

    ; Get start time (clock_gettime)
    mov rax, 228      ; clock_gettime syscall
    mov rdi, 1        ; CLOCK_MONOTONIC (0 would be CLOCK_REALTIME)
    mov rsi, rsp
    syscall
    mov r12, [rsp]    ; start tv_sec
    mov r13, [rsp+8]  ; start tv_nsec

    ; Run empty loop
    mov r14, [iterations]
.loop:
    dec r14
    jnz .loop

    ; Get end time
    mov rax, 228
    mov rdi, 1
    mov rsi, rsp
    syscall
    mov r14, [rsp]    ; end tv_sec
    mov r15, [rsp+8]  ; end tv_nsec

    ; Elapsed ms = sec_diff * 1000 + nsec_diff / 1000000
    sub r14, r12      ; sec diff
    imul r14, 1000    ; to ms
    sub r15, r13      ; nsec diff (may be negative)
    mov rax, r15
    cqo               ; sign-extend into RDX:RAX
    mov rcx, 1000000
    idiv rcx          ; nsec to ms
    add r14, rax

    ; Print "Completed "
    mov rsi, msg
    mov rdx, msg_len
    call print

    ; Print iterations
    mov rax, [iterations]
    call itoa
    call print

    ; Print " iterations in "
    mov rsi, msg2
    mov rdx, msg2_len
    call print

    ; Print elapsed ms
    mov rax, r14
    call itoa
    call print

    ; Print " ms" and newline
    mov rsi, ms
    mov rdx, ms_len
    call print

    ; Exit
    mov rax, 60
    xor rdi, rdi
    syscall

nasm -f elf64 bench.asm -o bench.o
ld bench.o -o bench
./bench
# Completed 100000000 iterations in 312 ms   (the time varies by machine)
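
The elapsed-time arithmetic is the easiest part of the benchmark to get wrong: the nanosecond field can be smaller at the end than at the start, so the difference may be negative and needs a signed division. The C sketch below shows the calculation on its own, taking the two clock_gettime readings as plain integers. `elapsed_ms` and `NSEC_PER_MSEC` are names assumed for this example.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/* Milliseconds between two (tv_sec, tv_nsec) readings.
   elapsed_ms is a hypothetical helper name for this sketch. */
int64_t elapsed_ms(int64_t start_sec, int64_t start_nsec,
                   int64_t end_sec, int64_t end_nsec) {
    int64_t sec_diff  = end_sec - start_sec;
    int64_t nsec_diff = end_nsec - start_nsec;   /* may be negative */
    return sec_diff * 1000 + nsec_diff / NSEC_PER_MSEC;
}
```

For a start reading of 1 s + 900,000,000 ns and an end reading of 2 s + 200,000,000 ns, the seconds contribute 1000 ms and the nanoseconds contribute -700 ms, giving 300 ms.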

FAQ

Do I need to learn assembly today?
For most application developers, no — compilers generate excellent assembly. For security researchers, reverse engineers, embedded developers, and performance engineers, assembly is essential. Understanding it makes you a better programmer regardless of your field.
Which assembly should I learn first?
x86-64 is the most widely used (desktops, servers) and has the best tooling and documentation. ARM is more common in mobile and embedded devices. Start with x86-64, then learn ARM — the concepts transfer.
Is NASM or GAS better for beginners?
NASM’s Intel syntax is more readable — mov rax, 42 vs movq $42, %rax. NASM also has better error messages. Most tutorials and textbooks use NASM or Intel syntax.
How does C code become assembly?
The C compiler transforms your code through: preprocessing → parsing → optimization → code generation → assembly output. Run gcc -S file.c to see the generated assembly.
What is a buffer overflow in assembly terms?
Writing more data to a memory region than was allocated. In assembly, there are no bounds checks — mov [rbx+rcx], rax will happily write past the buffer end, corrupting adjacent data on the stack or heap.
Can I write a whole program in assembly?
Yes, and people do for bootloaders, OS kernels, embedded firmware, and demoscene productions. But for most applications, the effort-to-result ratio favors a higher-level language with assembly only in hot paths.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
