Assembler (x86/x64)

Instructions

Instructions have zero to three explicit operands, most often two:

not  eax
add  eax, ebx
imul eax, edx, 64

mov

mov cannot move data directly from one memory location to another; the data always has to go via a register. (Other instructions, such as the string instruction movs, can copy memory to memory.)

cs, eip and ip cannot be the destination.

The following mov instruction is illegal because the operands are not the same size (al is 8 bits wide, esi 32 bits):

mov al, esi

The following mov instruction, however, is legal: it assigns the byte that is pointed at by esi to al:

mov al, [esi]         ; think  char al = *((char*) esi)

Assign a constant to the destination:

mov al, 0ABh

Memory reference (pointers)

add [esp], eax ; add value of eax to value of memory pointed at by esp (top of the stack).

lea

lea stands for load effective address. It does not dereference the address; it stores the calculated address itself in the destination register.

lea does not alter flags.

lea eax, [ eax + ebx   + 42 ] ; eax := eax + ebx + 42
lea eax, [ eax + ecx        ] ; eax := eax + ecx
lea eax, [ ebx + ebx*N      ] ; eax := ebx + N*ebx = (N+1)*ebx, N ∈ {1,2,4,8}
lea eax, [ rdi + rsi*4 - 8  ] ; …
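The arithmetic that lea performs in the examples above can be modelled in a few lines of Python. This is only a sketch: the function name lea32 and its parameter names are made up for illustration, and it assumes a 32-bit destination register, so the result is truncated to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def lea32(base=0, index=0, scale=1, disp=0):
    """Hypothetical model of lea with a 32-bit destination.

    Only the address is computed; memory is never read.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK32

# lea eax, [eax + ebx + 42] with eax = 10, ebx = 20
print(lea32(base=10, index=20, disp=42))   # 72
# lea eax, [ebx + ebx*4] with ebx = 7, i.e. (4+1)*ebx
print(lea32(base=7, index=7, scale=4))     # 35
```

The mask mirrors the fact that a 32-bit register simply drops any carry out of bit 31.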

Both of the following statements increment eax by one, but lea leaves eflags untouched, while inc always updates OF, SF, ZF, AF and PF (it leaves only CF unchanged):

lea eax, [eax + 1]
inc eax

LEA vs MOV

lea eax, <memory-expression>  ; eax := address of memory expression 
mov eax, <memory-expression>  ; eax := value   at memory expression

Calculating 37 * a

lea makes it possible to calculate 37*a with only two instructions (AT&T syntax, a is passed in rdi):

leal    (%rdi,%rdi,8), %eax      # eax = a * 9
leal    (%rdi,%rax,4), %eax      # eax = a + 4*(a*9)
ret
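To check that the two leal instructions really yield 37*a, their effect can be replayed step by step. This is a sketch in Python; the name times37 is invented for this note, and the mask models the truncation to the 32-bit register eax.

```python
MASK32 = 0xFFFFFFFF

def times37(a):
    """Replay the two leal instructions (hypothetical helper)."""
    eax = (a + a * 8) & MASK32      # leal (%rdi,%rdi,8), %eax  -> 9*a
    eax = (a + eax * 4) & MASK32    # leal (%rdi,%rax,4), %eax  -> a + 4*(9*a)
    return eax

print(times37(3))   # 111
```

Because all arithmetic is modulo 2^32, the result agrees with 37*a even when the intermediate 9*a overflows.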

div / idiv

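With a 32-bit operand, div performs an unsigned division of the 64-bit value in edx:eax by the operand; idiv is the signed counterpart. The quotient is stored in eax and the remainder (rest of division) in edx.

xor edx, edx ; clear the upper half of the dividend
mov eax, 42  ; load dividend
mov ecx,  5  ; load divisor
div ecx      ; eax := 8 (quotient), edx := 2 (remainder)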
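Since div with a 32-bit operand divides the 64-bit value edx:eax, the content of edx matters even when only eax was loaded. A small Python model makes this visible. It is a sketch, not a complete emulation: div32 is a made-up name, and the two Python exceptions stand in for the divide error (#DE) the CPU raises.

```python
def div32(edx, eax, divisor):
    """Hypothetical model of `div r/m32`: returns (new_eax, new_edx)."""
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    dividend = (edx << 32) | eax          # the dividend is edx:eax
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFFFFFF:
        raise OverflowError("divide error: quotient does not fit in eax")
    return quotient, remainder

print(div32(0, 42, 5))   # (8, 2)
print(div32(1, 42, 5))   # stale edx = 1 changes the result completely
```

The second call shows why edx is normally cleared (or sign-extended with cdq before idiv) ahead of the division.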

Jump instructions (branching)

A branching instruction changes the eip (instruction pointer) register if the given conditions are met.

The call instruction jumps unconditionally.

It pushes the return address (the eip of the instruction following the call) onto the stack so that the callee can return to the caller.

push / pop

push first decrements(!) the stack pointer esp, then stores the pushed value at the memory location that esp points at.

pop first loads the indicated destination with the value that esp points at, then increments esp.
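The order of operations in push and pop can be sketched with a tiny Python model. The class name Stack32, the start value of esp and the dictionary used as memory are assumptions for illustration; 4-byte stack slots (32-bit code) are assumed as well.

```python
class Stack32:
    """Hypothetical model of a 32-bit stack that grows downward."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}            # address -> 32-bit value

    def push(self, value):
        self.esp -= 4            # first decrement esp ...
        self.mem[self.esp] = value & 0xFFFFFFFF   # ... then store at [esp]

    def pop(self):
        value = self.mem[self.esp]   # first load from [esp] ...
        self.esp += 4                # ... then increment esp
        return value

s = Stack32()
s.push(0x11)
s.push(0x22)
print(hex(s.esp))      # 0xff8: two slots below the start value
print(hex(s.pop()))    # 0x22: last in, first out
```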

Stack frame operations

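The stack pointer esp points to the top of the stack, which is its lowest used address because the stack grows downward.

The base pointer (ebp) holds the value that the stack pointer had right after the caller's ebp was saved on function entry; it stays fixed while esp moves.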

Thus, the first instructions of a function are commonly:

pushl   %ebp            # Save base pointer of calling function
movl    %esp, %ebp      # Load base pointer with current stack pointer
subl    $40, %esp       # Make room for a few local variables.

After setting up the stack frame like this, various values can be accessed via the base pointer (whose value does not change within the function):

ebp+ 0: the previous value of the ebp register.
ebp+ 4: the return address.
ebp+ 8: the first parameter.
ebp+12: the second parameter, etc.
ebp- 4: the first local variable.
ebp- 8: the second local variable, etc.

A function is then left with

leave                ; FIRST set esp = ebp THEN pop ebp.
ret                  ; esp now points to the return address: pop it into eip.

leave is equivalent to:

mov esp, ebp
pop ebp
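The frame layout and the effect of leave and ret can be traced with a short simulation. This is a sketch under assumptions: 32-bit code, arguments pushed right to left by the caller (as in cdecl), at least two arguments, and a dictionary standing in for memory. The function name simulate_call and all addresses are made up.

```python
def simulate_call(esp, ebp, args, return_address, locals_size=40):
    """Hypothetical trace of call, prologue, leave and ret."""
    mem = {}

    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    def pop():
        nonlocal esp
        value = mem[esp]
        esp += 4
        return value

    # Caller: push the arguments right to left, then call.
    for arg in reversed(args):
        push(arg)
    push(return_address)     # call pushes the return address

    # Callee prologue.
    push(ebp)                # push ebp
    ebp = esp                # mov  ebp, esp
    esp -= locals_size       # sub  esp, 40

    frame = {
        "saved_ebp": mem[ebp],
        "return_address": mem[ebp + 4],
        "first_param": mem[ebp + 8],
        "second_param": mem[ebp + 12],
    }

    # Callee epilogue.
    esp = ebp                # leave, part 1: mov esp, ebp
    ebp = pop()              # leave, part 2: pop ebp
    eip = pop()              # ret: pop the return address into eip
    return frame, esp, ebp, eip

frame, esp, ebp, eip = simulate_call(0x1000, 0x2000, [11, 22], 0x401000)
print(frame["first_param"], frame["second_param"])   # 11 22
print(hex(eip))                                      # 0x401000
```

After ret, esp still points at the pushed arguments; under the assumed convention the caller removes them.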

Sections

Sections group portions of code and data that have a similar purpose or should have the same memory permissions.

Common names:

.text: code (in Windows kernel drivers: code that is never paged out).
.data: read/write (global variables)
.rdata: read only data (e.g. strings)
.bss: "block started by symbol" (sometimes read as "block storage start"). Uninitialized data: only the size of the objects is specified. The linker may merge .bss into the .data section. Because it holds no initial values, it helps to reduce the size of the object file and is »expanded« in memory when the executable is loaded.
.reloc: relocation information, used to modify addresses.
.idata: import address table (seems to be merged into .text or .rdata).
.edata: export information
.rsrc: resources
PAGE*: pageable code. Apparently mainly used for kernel drivers.

Unneeded sections can be removed with strip.

Data for operands

The data for an operand can be stored

as an immediate (a constant value encoded in the instruction itself)
in a register
in a memory location

The address of a memory location can be calculated by base + index*scale + displacement.

base: e{a,b,c,d}x, e{s,b}p, e{s,d}i

index: e{a,b,c,d}x, ebp, e{s,d}i

scale: 1, 2, 4 or 8

displacement: none, 8 or 32 bit (16 bit only with 16-bit addressing)

Intel syntax: [base + index*scale + disp]

AT&T syntax: disp(base, index, scale).
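The two notations describe the same four components, only in a different order. The following Python sketch prints one operand in both styles; the function names intel and att are invented for this note, and only the simple case with all four components present is handled.

```python
def intel(base, index, scale, disp):
    """Format a memory operand in Intel syntax (hypothetical helper)."""
    return f"[{base} + {index}*{scale} + {disp}]"

def att(base, index, scale, disp):
    """Format the same operand in AT&T syntax (hypothetical helper)."""
    return f"{disp}(%{base}, %{index}, {scale})"

print(intel("ebx", "esi", 4, 8))   # [ebx + esi*4 + 8]
print(att("ebx", "esi", 4, 8))     # 8(%ebx, %esi, 4)
```

Both strings refer to the address ebx + esi*4 + 8.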

AT&T: $ prefix

In AT&T syntax, a number or symbol without the $ prefix is always a memory operand: movw 0x0a, %ax loads the 16-bit value stored at ds:0x0a into %ax.

With the $ prefix, movw $0x0a, %ax loads %ax with the immediate value 0x0a.

WinREPL

WinREPL is a "read-eval-print loop" shell on Windows that is useful for testing/learning x86 and x64 assembly.

Misc

Setting a register to zero (clearing it):

xor eax, eax

Clear three registers (ebx, eax and edx) in four bytes; once ebx is zero, mul ebx sets edx:eax := eax * ebx = 0:

xor ebx, ebx
mul ebx
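Why mul ebx clears two more registers becomes clear when the instruction is modelled: the unsigned 64-bit product of eax and the operand is written to edx:eax. The sketch below is Python with a made-up function name mul32.

```python
def mul32(eax, operand):
    """Hypothetical model of `mul r/m32`: returns (new_eax, new_edx)."""
    product = eax * operand                    # unsigned 64-bit product
    return product & 0xFFFFFFFF, product >> 32

ebx = 0x12345678
ebx ^= ebx                                     # xor ebx, ebx -> 0
eax, edx = mul32(0xDEADBEEF, ebx)              # mul ebx
print(ebx, eax, edx)                           # 0 0 0
```

Whatever eax held before, multiplying by zero leaves zero in both halves of the product.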

Assemblers for x86

gas

nasm

yasm (a rewrite of nasm)

HLA: High Level Assembler, uses a syntax resembling a high-level language for declarations of functions and procedure calls and allows for control structures (if, while …).

As a high-level assembler, it requires a (real) low-level assembler such as as or masm.

First instruction of an x86

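An x86 begins execution with the instruction stored at the reset vector, 16 bytes below the top of the address space: ffff:0000 on the 8086, f000:fff0 with a hidden segment base that maps it to physical address 0xfffffff0 on the 386 and later.

Usually, there is a far jump to the BIOS start routine at this location; the target depends on the firmware, for example:

ljmp $0xf000, $0xe05b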
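A real-mode segment:offset pair such as the ljmp target above is translated into a physical address by shifting the segment left by four bits and adding the offset. The Python sketch below assumes the 20-bit address bus of the original 8086 (addresses wrap at 1 MiB); the function name is made up.

```python
def real_mode_address(segment, offset):
    """Physical address of segment:offset in real mode (8086-style wrap assumed)."""
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(real_mode_address(0xF000, 0xE05B)))   # 0xfe05b
print(hex(real_mode_address(0xF000, 0xFFF0)))   # 0xffff0
```

Both addresses lie in the top 64 KiB below 1 MiB, the region where the BIOS ROM is traditionally mapped.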