Assembly Programming Guide — x86-64 and ARM Architecture Basics
Assembly language is the lowest-level human-readable programming language — a direct mnemonic representation of the machine instructions that a CPU executes, giving you complete control over hardware.
What You’ll Learn
- CPU registers and memory addressing modes
- x86-64 and ARM instruction sets
- Stack operations and calling conventions
- NASM vs GAS syntax differences
- Disassembling and understanding compiled code
Why It Matters
Assembly is the language of reverse engineering, malware analysis, operating systems, embedded firmware, and performance-critical code. Durga Antivirus Pro uses assembly-level signature detection to identify malware patterns that higher-level scanners miss. Understanding assembly makes you a better C programmer — you’ll understand what the compiler generates, how the stack works, and why certain code is fast or slow. Security researchers, game engine developers, and compiler engineers all work directly with assembly daily.
Learning Path
flowchart LR
A[CPU Architecture<br/>You are here] --> B[Registers & Memory]
B --> C[x86-64 Instructions]
C --> D[ARM Architecture]
D --> E[Calling Conventions & Disassembly]
CPU Registers
Registers are the CPU’s fastest storage. x86-64 provides 16 general-purpose 64-bit registers, and AArch64 provides 31.
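The 32-, 16-, and 8-bit names listed below are not separate storage. They are views of the same 64-bit registers, and writes to them behave differently.

The hypothetical C helpers below are a minimal sketch that models one register as a `uint64_t`:
- A 32-bit write such as `mov eax, v` zero-extends into the full register.
- 16-bit and 8-bit writes leave the remaining bits untouched.

```c
#include <stdint.h>

/* Hypothetical helpers modeling writes to sub-registers of RAX. */

/* mov eax, v: a 32-bit write zero-extends, clearing bits 32..63. */
uint64_t set_eax(uint64_t rax, uint32_t v) {
    (void)rax;                 /* the old upper half is discarded */
    return (uint64_t)v;
}

/* mov ax, v: a 16-bit write keeps bits 16..63 unchanged. */
uint64_t set_ax(uint64_t rax, uint16_t v) {
    return (rax & ~(uint64_t)0xFFFF) | v;
}

/* mov al, v: writes bits 0..7 only. */
uint64_t set_al(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xFF) | v;
}

/* mov ah, v: writes bits 8..15 only. */
uint64_t set_ah(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xFF00) | ((uint64_t)v << 8);
}
```

The zero-extension rule for 32-bit writes is why compilers often use `xor eax, eax` or `mov eax, imm` even when the full 64-bit register is needed.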
x86-64 Registers (64-bit)
General Purpose (64-bit): RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, R8-R15
32-bit subset: EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, R8D-R15D
16-bit subset: AX, BX, CX, DX, SI, DI, BP, SP, R8W-R15W
8-bit subset: AL, AH, BL, BH, CL, CH, DL, DH, SIL, DIL, BPL, SPL, R8B-R15B
ARM64 Registers
General Purpose: X0-X30 (64-bit)
32-bit subset: W0-W30
Special: SP (stack pointer), LR (X30, link register), FP (X29, frame pointer),
PC (program counter), XZR/WZR (zero register)
Your First Assembly Program (NASM, x86-64 Linux)
; hello.asm
section .data
msg db 'Hello, Assembly!', 0xa
len equ $ - msg
section .text
global _start
_start:
; sys_write (syscall 1)
mov rax, 1 ; syscall number
mov rdi, 1 ; fd = stdout
mov rsi, msg ; buffer
mov rdx, len ; length
syscall
; sys_exit (syscall 60)
mov rax, 60 ; syscall number
xor rdi, rdi ; exit code 0
syscall
nasm -f elf64 hello.asm -o hello.o
ld hello.o -o hello
./hello
# Hello, Assembly!
Memory Addressing Modes
x86-64 supports several addressing modes for accessing memory:
section .data
array dq 10, 20, 30, 40, 50
index dq 2
section .text
global _start
_start:
; Immediate addressing: value in instruction
mov rax, 42
; Register addressing: value in register
mov rbx, rax
; Direct memory addressing: [address]
mov rcx, [array] ; loads 10
; Register indirect: [register]
mov rsi, array
mov rdx, [rsi] ; loads 10
; Base+displacement: [register + offset]
mov r8, [rsi + 8] ; loads 20 (8 bytes per qword)
; Indexed: [base + index*scale]
mov rdi, [index] ; set the index first: rdi = 2
mov r9, [rsi + rdi*8] ; loads array[2] = 30
; Exit
mov rax, 60
xor rdi, rdi
syscall
Stack Operations
The stack grows downward (toward lower addresses). RSP is the stack pointer.
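Here is a toy C model of push and pop that makes the pointer arithmetic concrete.

The `Stack` type and its functions are hypothetical illustrations. They treat memory as an array of 8-byte slots and keep `rsp` as a byte offset that starts one past the top. Like the real instructions, they do no bounds checking.

```c
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy stack: mem holds 8-byte slots; rsp is a byte offset into mem. */
typedef struct {
    uint64_t mem[STACK_SLOTS];
    uint64_t rsp;
} Stack;

/* Start with rsp one past the highest slot (an empty stack). */
void stack_init(Stack *s) {
    s->rsp = STACK_SLOTS * 8;
}

/* push value: rsp -= 8, then [rsp] = value */
void push(Stack *s, uint64_t value) {
    s->rsp -= 8;
    s->mem[s->rsp / 8] = value;
}

/* pop: value = [rsp], then rsp += 8 */
uint64_t pop(Stack *s) {
    uint64_t value = s->mem[s->rsp / 8];
    s->rsp += 8;
    return value;
}
```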
section .text
global _start
_start:
; Push values onto stack
mov rax, 10
mov rbx, 20
push rax ; rsp -= 8, [rsp] = 10
push rbx ; rsp -= 8, [rsp] = 20
; Pop values (reverse order — LIFO)
pop rcx ; rcx = 20, rsp += 8
pop rdx ; rdx = 10, rsp += 8
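; Stack frame setup (typical function prologue/epilogue)
push rbp ; save old base pointer
mov rbp, rsp ; set new base pointer
sub rsp, 32 ; allocate 32 bytes for locals
; ... function body ...
mov rsp, rbp ; free the locals
pop rbp ; restore base pointer
; A real function would end with ret here. _start has no caller
; ([rsp] holds argc, not a return address), so fall through to exit.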
; Exit
mov rax, 60
xor rdi, rdi
syscall
Instructions: mov, add, sub, imul, idiv, jmp, call
section .data
result dq 0
section .text
global _start
_start:
; mov: copy data
mov rax, 100 ; rax = 100
mov rbx, 50 ; rbx = 50
; add: addition
add rax, rbx ; rax = 150
add rax, 10 ; rax = 160
; sub: subtraction
sub rax, 60 ; rax = 100
; imul: signed multiply
mov rax, 7
mov rbx, 6
imul rax, rbx ; rax = 42
; idiv: signed divide of RDX:RAX by the operand
mov rax, 100 ; dividend
cqo ; sign-extend RAX into RDX, so RDX:RAX = 100
mov rbx, 3 ; divisor
idiv rbx ; rax = 33 (quotient), rdx = 1 (remainder)
; jmp: unconditional jump
mov rax, 1
jmp skip
mov rax, 999 ; skipped
skip:
; rax is now 1
; call/ret: function call
call my_function ; push return address, jump to label
jmp done
my_function:
push rbp
mov rbp, rsp
; function body
pop rbp
ret
done:
mov [result], rax
; Exit
mov rax, 60
xor rdi, rdi
syscall
NASM vs GAS Syntax
| Feature | NASM (Intel) | GAS (AT&T) |
|---|---|---|
| Operand order | mov rax, rbx (dest, src) | mov %rbx, %rax (src, dest) |
| Register prefix | rax | %rax |
| Immediate prefix | 42 | $42 |
| Memory syntax | [rax + rbx*8] | (%rax, %rbx, 8) |
| Section directive | section .text | .text |
| Operand size | mov qword [rax], 1 | movq $1, (%rax) |
; NASM (Intel) style
mov rax, [rsi + rdi*8 + 16] ; array[rdi + 2]
; GAS (AT&T) equivalent
movq 16(%rsi, %rdi, 8), %rax # array[rdi + 2]
Most Linux tools (GCC, GDB, objdump) default to AT&T syntax. To get Intel syntax, use -M intel with objdump, -masm=intel with GCC, or set disassembly-flavor intel in GDB.
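Both spellings describe the same effective-address arithmetic: base + index*scale + displacement.

The small C sketch below performs that computation. The helper names are hypothetical. It shows why `[rsi + rdi*8 + 16]` reads `array[rdi + 2]` from an array of qwords.

```c
#include <stdint.h>

/* Effective address = base + index*scale + disp.
 * On x86-64, scale must be 1, 2, 4, or 8. */
uintptr_t effective_address(uintptr_t base, uintptr_t index,
                            unsigned scale, intptr_t disp) {
    return base + index * scale + (uintptr_t)disp;
}

/* Models mov rax, [rsi + rdi*8 + 16] on an int64_t array. */
int64_t load_qword(const int64_t *rsi, uint64_t rdi) {
    uintptr_t addr = effective_address((uintptr_t)rsi, (uintptr_t)rdi, 8, 16);
    return *(const int64_t *)addr;
}
```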
ARM Assembly Basics
ARM uses a load-store architecture — only load and store instructions access memory.
// ARM64 (AArch64) example, GNU as syntax (comments use //)
.section .data
msg: .ascii "Hello, ARM!\n"
len = . - msg
.section .text
.global _start
_start:
// sys_write (64)
mov x0, #1 // fd = stdout
ldr x1, =msg // buffer address
ldr x2, =len // length
mov x8, #64 // syscall number (write)
svc #0 // supervisor call
// sys_exit (93)
mov x0, #0 // exit code
mov x8, #93 // syscall number (exit)
svc #0
Key ARM differences from x86:
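- Load-store architecture: only load/store instructions such as ldr/str access memory.
- Conditional execution: in 32-bit ARM (A32), most instructions can be conditional (addeq, subne). AArch64 dropped this in favor of conditional branches (b.eq) and conditional-select instructions (csel, cset).
- Link register: bl and blr store the return address in X30 (LR) instead of pushing it onto the stack.
- No push/pop in AArch64: use stp/ldp (store/load pair) with SP instead.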
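In AArch64 code, the usual way to choose between two values without branching is the conditional-select family (csel, csinc, cset).

As an illustration, a C function like the one below is typically compiled at -O2 to a cmp followed by csel. The exact output depends on the compiler and flags, so verify it with objdump -d. `max64` is a hypothetical name.

```c
#include <stdint.h>

/* Branch-free maximum. On AArch64 this pattern is commonly lowered
 * to cmp + csel instead of a conditional branch. */
int64_t max64(int64_t a, int64_t b) {
    return a > b ? a : b;
}
```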
Calling Conventions
x86-64 Linux (System V AMD64)
| Parameter | Register |
|---|---|
| 1st | RDI |
| 2nd | RSI |
| 3rd | RDX |
| 4th | RCX |
| 5th | R8 |
| 6th | R9 |
| Return | RAX |
ARM64 Linux
| Parameter | Register |
|---|---|
| 1st | X0 |
| 2nd | X1 |
| 3rd | X2 |
| 4th–8th | X3–X7 |
| Return | X0 |
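To see these registers in real compiler output, compile a function that takes more integer parameters than x86-64 passes in registers (for example with gcc -O2 -S) and read the generated assembly.

The function below is a hypothetical example. Following the tables above:
- On x86-64, a through f arrive in RDI, RSI, RDX, RCX, R8, R9; g and h are passed on the stack.
- On ARM64, all eight fit in X0 through X7.

```c
#include <stdint.h>

/* x86-64 System V: a..f in RDI, RSI, RDX, RCX, R8, R9; g, h on the stack.
 * ARM64: a..h in X0..X7. Result in RAX (x86-64) or X0 (ARM64). */
int64_t weighted_sum(int64_t a, int64_t b, int64_t c, int64_t d,
                     int64_t e, int64_t f, int64_t g, int64_t h) {
    return a * 1 + b * 2 + c * 3 + d * 4 + e * 5 + f * 6 + g * 7 + h * 8;
}
```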
Disassembly with objdump and GDB
# Disassemble a binary
objdump -d program # AT&T syntax
objdump -d -M intel program # Intel syntax
# Interactive debugging
gdb ./program
(gdb) break main
(gdb) run
(gdb) info registers
(gdb) disas
(gdb) x/10gx $rsp # examine stack memory
(gdb) stepi # single-step one instruction
Common Mistakes
1. Forgetting operand order differences
NASM (Intel syntax): mov dest, src. GAS (AT&T syntax): mov src, dest. Mixing them up when switching tools or reading disassembly is one of the most common assembly bugs.
2. Not preserving callee-saved registers
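In the System V AMD64 ABI, RBX, RBP, and R12–R15 are callee-saved. A function that uses them must save them first and restore them before returning. Violating this corrupts the caller's state, often causing wrong results or a crash far from the real bug.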
3. Using 32-bit registers for addresses in 64-bit code
mov eax, [ptr] loads only 32 bits and zero-extends them into RAX, so a 64-bit pointer stored at ptr loses its upper half. Use mov rax, [ptr] to load a full 64-bit address, and use 32-bit registers only for values you know fit in 32 bits.
4. Mismatched call/ret
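Every call needs a matching ret, and every push inside a function needs a matching pop (or an equivalent RSP adjustment) before that ret. Otherwise ret pops a data value instead of the return address and jumps to the wrong place.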
5. Buffer overflow from fixed-size buffers
Assembly has no bounds checking. Writing past the end of a buffer corrupts adjacent data — a classic security vulnerability.
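6. Using the function-call registers for syscall arguments
Linux x86-64 syscalls take the syscall number in RAX and arguments in RDI, RSI, RDX, R10, R8, R9. This matches the function-call order except that the 4th argument goes in R10, not RCX, because the syscall instruction overwrites RCX (and R11). Passing an argument in the wrong register causes errors such as EFAULT or silently wrong results.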
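Here is a sketch of making a raw system call from C, assuming Linux with glibc.

glibc's syscall() wrapper moves the syscall number into RAX and the arguments into the kernel's registers: RDI, RSI, RDX, and R10 for a fourth argument. That register mapping is easy to get wrong by hand. `raw_write` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Calls write(2) directly: SYS_write goes in RAX, and fd, buffer, and
 * length go in RDI, RSI, RDX. Returns bytes written, or -1 on error. */
long raw_write(int fd, const char *msg) {
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```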
Practice Questions
What is the difference between mov rax, [rbx] and mov rax, rbx? mov rax, [rbx] loads the VALUE AT the address stored in RBX (a memory access). mov rax, rbx copies the VALUE in RBX to RAX (register to register).

Why does the stack grow downward? On x86 it is built into the hardware: push and call decrement RSP. The convention is historical. In the classic process layout, the heap grows upward from the end of the data segment while the stack grows down from high addresses, so both can use the free space between them without a fixed split.

What is a calling convention? A set of rules defining how functions pass arguments, return values, and save registers. The System V AMD64 ABI is standard on Linux x86-64.

How does ARM differ from x86 in memory access? ARM is load-store: only load/store instructions such as ldr/str access memory, while x86 allows ALU operations (add, sub) directly on memory operands. Separately, 32-bit ARM (A32) supports conditional execution on most instructions; AArch64 replaces it with conditional branches and conditional select (csel).

What does objdump -d do? It disassembles the executable sections of a binary, showing each instruction's address, raw bytes, and assembly mnemonic. It is essential for reverse engineering and for understanding compiler output.
Challenge: Write an x86-64 NASM program that reads two integers from stdin, computes their GCD using Euclid’s algorithm, and prints the result to stdout.
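Before writing the assembly, it helps to have a reference to check results against.

The C function below is a sketch of Euclid's algorithm only, with no stdin/stdout handling. Each `a % b` corresponds to the remainder that an unsigned DIV leaves in RDX.

```c
#include <stdint.h>

/* Euclid's algorithm: replace (a, b) with (b, a mod b) until b is 0. */
uint64_t gcd(uint64_t a, uint64_t b) {
    while (b != 0) {
        uint64_t r = a % b;   /* in x86-64: div leaves the remainder in RDX */
        a = b;
        b = r;
    }
    return a;
}
```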
Mini Project — Simple CPU Benchmark
Measure how long a tight, empty loop takes by reading CLOCK_MONOTONIC before and after it:
; bench.asm
section .data
iterations dq 100000000
msg1 db "Completed "
msg1_len equ $ - msg1
msg2 db " iterations in "
msg2_len equ $ - msg2
msg3 db " ms", 0xa
msg3_len equ $ - msg3
section .bss
buffer resb 32
ts resq 2 ; struct timespec { tv_sec, tv_nsec }
section .text
global _start
; utoa: convert the unsigned integer in RAX to decimal text in buffer
; returns RSI = pointer to first digit, RDX = number of digits
utoa:
mov rcx, 10
lea rsi, [buffer + 32] ; start one past the end, fill backwards
.loop:
xor rdx, rdx
div rcx ; rax = rax / 10, rdx = rax % 10
add dl, '0'
dec rsi
mov [rsi], dl
test rax, rax
jnz .loop
lea rdx, [buffer + 32]
sub rdx, rsi ; length = end - first digit
ret
; print: write RDX bytes starting at RSI to stdout
print:
mov rax, 1 ; sys_write
mov rdi, 1 ; fd = stdout
syscall
ret
_start:
; Get start time: clock_gettime(CLOCK_MONOTONIC, &ts)
mov rax, 228 ; sys_clock_gettime
mov rdi, 1 ; CLOCK_MONOTONIC (0 would be CLOCK_REALTIME)
mov rsi, ts
syscall
mov r12, [ts] ; start tv_sec
mov r13, [ts + 8] ; start tv_nsec
; Run empty loop
mov rcx, [iterations]
.loop:
dec rcx
jnz .loop
; Get end time
mov rax, 228
mov rdi, 1
mov rsi, ts
syscall
; elapsed ns = (end_sec - start_sec) * 1e9 + (end_nsec - start_nsec)
mov rax, [ts]
sub rax, r12
imul rax, rax, 1000000000
add rax, [ts + 8]
sub rax, r13
; Convert ns to ms
xor rdx, rdx
mov rcx, 1000000
div rcx
mov r14, rax ; r14 = elapsed ms (syscall does not clobber r14)
; Print "Completed "
mov rsi, msg1
mov rdx, msg1_len
call print
; Print the iteration count
mov rax, [iterations]
call utoa
call print
; Print " iterations in "
mov rsi, msg2
mov rdx, msg2_len
call print
; Print elapsed ms
mov rax, r14
call utoa
call print
; Print " ms" and a newline
mov rsi, msg3
mov rdx, msg3_len
call print
; Exit
mov rax, 60
xor rdi, rdi
syscall
nasm -f elf64 bench.asm -o bench.o
ld bench.o -o bench
./bench
# Completed 100000000 iterations in 312 ms
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.