diff --git a/course/extra/nasm.md b/course/extra/nasm.md
new file mode 100644
index 0000000000000000000000000000000000000000..321527d129581eee845b9dd2fa9feb4419a6bf15
--- /dev/null
+++ b/course/extra/nasm.md
@@ -0,0 +1,256 @@
+---
+author: Florent Gluck - Florent.Gluck@hesge.ch
+
+title: NASM
+
+date: \vspace{.5cm} \footnotesize \today
+
+pandoc-latex-fontsize:
+  - classes: [tiny]
+    size: tiny
+  - classes: [verysmall]
+    size: scriptsize
+  - classes: [small]
+    size: footnotesize
+  - classes: [huge, important]
+    size: huge
+---
+
+[//]: # ----------------------------------------------------------------
+## NASM: Netwide Assembler
+
+- Syntax: `nasm [-f format] [-o outfile] filename`
+
+- Supported output formats (non-exhaustive):
+    ```{.tiny}
+    * bin       flat-form binary files (e.g. DOS .COM, .SYS)
+      aout      Linux a.out object files
+      aoutb     NetBSD/FreeBSD a.out object files
+      coff      COFF (i386) object files (e.g. DJGPP for DOS)
+      elf32     ELF32 (i386) object files (e.g. Linux)
+      elf64     ELF64 (x86_64) object files (e.g. Linux)
+      elfx32    ELFX32 (x86_64) object files (e.g. Linux)
+      as86      Linux as86 (bin86 version 0.3) object files
+      obj       MS-DOS 16-bit/32-bit OMF object files
+      win32     Microsoft Win32 (i386) object files
+      win64     Microsoft Win64 (x86-64) object files
+      rdf       Relocatable Dynamic Object File Format v2.0
+      ieee      IEEE-695 (LADsoft variant) object file format
+      macho32   NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (i386) object files
+      macho64   NeXTstep/OpenStep/Rhapsody/Darwin/MacOS X (x86_64) object files
+    ```
+
+[//]: # ----------------------------------------------------------------
+## General syntax
+
+\small
+
+- General syntax of a line of code:
+    ```{.verysmall .assembler}
+    label:    instruction operands        ; comment
+    ```
+    The label is optional, and anything after `;` is a comment
+
+- Numeric bases
+    ```{.verysmall .assembler}
+    100       ; decimal
+    0xCAFE    ; hexadecimal
+    777q      ; octal
+    10011b    ; binary
+    ```
+
+- Code example
+    ```{.verysmall .assembler}
+    my_func:
+        mov   eax,0x1F    ; eax = 0x1F
+        add   eax,ebx     ; eax = eax + ebx
+        ret               ; function return
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Allocating data (1/2)
+
+\small
+
+- Allocating anonymous data
+    ```{.verysmall .assembler}
+    db value      ; allocate 1 byte (8 bits)
+    dw value      ; allocate 1 word (16 bits)
+    dd value      ; allocate 1 double word (32 bits)
+    ```
+- Allocating named data (= a variable)
+    ```{.verysmall .assembler}
+    my_value db 77    ; allocate 1 byte, initialized to 77
+    ```
+- Allocating uninitialized data
+    ```{.verysmall .assembler}
+    buf: resb 64    ; reserve 64 bytes and make the label
+                    ; "buf" point to them
+    x:   resw 1     ; reserve 1 word (16 bits)
+    y:   resd 1     ; reserve 1 double word (32 bits)
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Allocating data (2/2)
+
+- Allocating C strings
+    ```{.verysmall .assembler}
+    db "abcd",0               ; equivalent to "abcd" in C
+    db 'a',"b",'cd',0         ; identical to the previous line
+    db '"x"'," and 'y'.",0    ; the string (without the []):
+                              ; ["x" and 'y'.]
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Useful keywords
+
+\small
+
+- The `times` keyword repeats a statement.\
+  For example, to declare 20 bytes, all initialized to 0:
+    ```{.verysmall .assembler}
+    times 20 db 0
+    ```
+- To align code or data to a given boundary size
+    ```{.verysmall .assembler}
+    align 4    ; pad so that the code or data that follows
+               ; starts on a 4-byte boundary
+    ```
+- Use `section` to declare a section
+    ```{.tiny .assembler}
+    section .text
+
+    func_return_77:
+        mov   eax,77
+        ret
+
+    section .bss
+
+    some_int: resd 1
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Symbol visibility
+
+- Default symbol visibility is the opposite of C's
+- By default, symbols are local (i.e. not visible outside the module)
+- Use the `global` keyword to make a symbol global
+    ```{.verysmall .assembler}
+    global main    ; main is now visible outside
+                   ; the current module
+    ```
+- The `extern` keyword indicates that the symbol is not defined in the current module and will be resolved at link time
+    ```{.verysmall .assembler}
+    extern printf
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Generating an executable
+
+- Usually `nasm` is used to generate relocatable object files
+    - Then `gcc` or `ld` is used to link the object files into an executable
+- Here is how to assemble with `nasm` and link with `gcc` for IA-32
+    ```{.verysmall}
+    nasm -f elf32 mycode.s
+    gcc -m32 mycode.o -o mycode
+    ```
+- Note that the following packages are required on Ubuntu
+    ```{.verysmall}
+    nasm
+    lib32gcc-9-dev (Ubuntu 20.04)
+    lib32gcc-7-dev (Ubuntu 18.04)
+    libc6-dev-i386
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Quiz: what does this code do?
+
+```{.tiny .assembler}
+extern printf
+global main
+
+section .data
+val   dd 5
+fmt   db "val=%d, eax=%d", 10, 0
+
+section .text
+main:
+    push  ebp
+    mov   ebp,esp
+    mov   eax,[val]
+    add   eax,2
+    push  eax
+    mov   eax,[val]
+    push  eax
+    mov   eax,fmt
+    push  eax
+    call  printf
+    add   esp,12
+    mov   esp,ebp
+    pop   ebp
+    mov   eax,0
+    ret
+```
+
+[//]: # ----------------------------------------------------------------
+## Quiz: what does this code do?
+
+```{.tiny .assembler}
+extern printf                       ; external symbol: the C function to be called
+global main                         ; export main (program entry point)
+
+section .data                       ; data section (initialized variables)
+val   dd 5                          ; int val = 5;
+fmt   db "val=%d, eax=%d", 10, 0    ; printf format, then newline (10) and NUL terminator (0)
+
+section .text                       ; code section
+main:                               ; entry point label (the linker looks for this symbol)
+    push  ebp                       ; set up stack frame
+    mov   ebp,esp
+    mov   eax,[val]                 ; eax = [val]
+    add   eax,2                     ; eax = [val]+2
+    push  eax                       ; push value ([val]+2) on the stack
+    mov   eax,[val]                 ; eax = [val]
+    push  eax                       ; push value of [val] on the stack
+    mov   eax,fmt                   ; eax = address of the fmt string
+    push  eax                       ; push address of fmt on the stack
+    call  printf                    ; call printf
+    add   esp,12                    ; restore stack pointer (3 pushed args x 4 bytes)
+    mov   esp,ebp                   ; "free" stack frame
+    pop   ebp
+    mov   eax,0                     ; status code: success
+    ret                             ; return to caller (exit main -> exit program)
+```
+
+[//]: # ----------------------------------------------------------------
+## Executable generation
+
+- An object file (`.o`) is the same kind of file whether it was compiled from C source code with `gcc` or assembled from `.s` source code with `nasm`
+- Linking works exactly as it does for C code
+- `gcc` or `ld` performs the linking
+- Example targeting the IA-32 architecture
+    ```{.verysmall}
+    nasm -f elf32 func.s
+    gcc -m32 -c prog.c
+    gcc -m32 func.o prog.o -o prog
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Debugging assembly code
+
+- The easiest way to debug assembly code is to use `kdbg` (or `gdb` for the most masochistic among you)
+- Always build your code with the `-g` option (both assembly and C code)
+- Don't forget to pass `-g` to the linker as well
+- For `nasm`, you **must specify** the "dwarf" debugging symbol format with the `-Fdwarf` option
+    ```{.verysmall}
+    gcc -c -g -m32 -Wall main.c
+    nasm -g -f elf32 -Fdwarf func.s
+    gcc -g -m32 main.o func.o -o prog
+    kdbg prog&
+    ```
+
+[//]: # ----------------------------------------------------------------
+## Bibliography
+
+- NASM documentation\
+\small [\textcolor{myblue}{https://www.nasm.us/doc/}](https://www.nasm.us/doc/)
diff --git a/course/extra/nasm.pdf b/course/extra/nasm.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..6dfc74bb28a9e892b04295f8d65b1b60b8fb0398
Binary files /dev/null and b/course/extra/nasm.pdf differ