Commit b6e543b7 authored by Florent Gluck

added 07-System_calls

parent c1f34184
---
author: Florent Gluck - Florent.Gluck@hesge.ch
title: System Calls
date: \vspace{.5cm} \footnotesize \today
pandoc-latex-fontsize:
  - classes: [tiny]
    size: tiny
  - classes: [verysmall]
    size: scriptsize
  - classes: [small]
    size: footnotesize
  - classes: [huge, important]
    size: huge
---
[//]: # ----------------------------------------------------------------
##
\centering
![](images/syscalls_julia_evans.png){ width=100% }
[//]: # ----------------------------------------------------------------
# Reminder: why a kernel?
[//]: # ----------------------------------------------------------------
## Kernel's purpose
- **Purpose** of the kernel:
    - to **provide services** to tasks (read a file, access devices such as the screen or keyboard, etc.)
    - to **multiplex hardware** resources (RAM, CPU, devices) among tasks (processes)
    - to **isolate** tasks from each other and from the kernel (protection)
- \textcolor{myred}{A kernel bug can \textbf{crash} the whole system!}
- However: a buggy application **should not** be able to crash the system or impact other applications!
[//]: # ----------------------------------------------------------------
## Kernel mode vs user mode
- \textcolor{myred}{\textbf{Kernel mode}} is **privileged**
    - kernel can do anything, without any restrictions

\vspace{.5cm}

- \textcolor{mygreen}{\textbf{User mode}} is **restricted**
    - user applications are **unprivileged**
    - what an application can or cannot do is **controlled by the kernel**
[//]: # ----------------------------------------------------------------
## Reminder: Protection mechanism
- IA-32 supports 4 privilege levels, called **rings**
- The most privileged level is \textcolor{myred}{\textbf{ring 0}}
- Usually, the kernel runs in \textcolor{myred}{\textbf{ring 0}} while user code (applications) runs in \textcolor{mygreen}{\textbf{ring 3}}
\vspace{.2cm}
\centering
![](images/protection_rings.png){ width=60% }
[//]: # ----------------------------------------------------------------
## CPU privileged mode
\textcolor{myred}{\textbf{Ring 0}} is the most privileged level

- Bootloader and kernel run in \textcolor{myred}{ring 0}
- \textcolor{myred}{Ring 0} is privileged because, in \textcolor{myred}{ring 0}, we can:
    - access the full instruction set and all registers
    - access the whole CPU address space
    - program the MMU (paging) and the interrupt descriptor table (IDT)
    - restrict a task's port-mapped I/O (PMIO) address space (`in/out` instructions)
[//]: # ----------------------------------------------------------------
## CPU restricted (unprivileged) mode
\small
\textcolor{mygreen}{\textbf{Ring 3}} is the least privileged level, the one in which applications run

- In \textcolor{mygreen}{ring 3}, we **\textcolor{myred}{CANNOT}**:
    - use the whole set of CPU instructions
    - read/write outside the task's address space (i.e. non-mapped pages)
    - access the task's page directory/tables (or read/write the `cr3` register)
    - access or load the IDT (i.e. execute the `lidt` instruction)
    - access restricted areas of the PMIO address space
    - mask/unmask hardware interrupts (`cli`/`sti` instructions)
- If a **task does any of the above**, the **CPU raises an exception**
    - it must be caught by the kernel $\rightarrow$ which typically kills the offending task!
[//]: # ----------------------------------------------------------------
## Services
::: incremental
- Given that a task in \textcolor{mygreen}{ring 3 (user mode)} is restricted, how can it access system resources such as the screen, disk, keyboard, etc.?
- Through **system calls**
    - the kernel exposes a limited number of functions callable from \textcolor{mygreen}{ring 3 (user mode)}
    - these functions are called "system calls" (syscalls)
    - during the execution of a system call, a **change of privilege** occurs
:::
[//]: # ----------------------------------------------------------------
## Change of privilege during syscall
\centering
![](images/syscalls.png){ width=100% }
[//]: # ----------------------------------------------------------------
## Why system calls?
- System calls (syscalls) are the kernel API
    - the set of functions the kernel exposes to user applications
    - syscalls are services to applications!
- \textcolor{myred}{Without syscalls, user tasks could do almost nothing}
    - can't access devices (I/O)
    - can't extend their address space (needed by e.g. `malloc()`)
    - etc.
- Consequently, user applications **require syscalls** to do anything meaningful!
[//]: # ----------------------------------------------------------------
## Examples of system calls
\textcolor{mygreen}{Syscalls are \textbf{required} for anything related to}:
- I/O (device) access
- dynamic memory allocation (`malloc` relies on `brk`/`mmap`)
- accessing files
- executing/terminating a process
- inter-process communication (IPC)
- timers
- etc.
[//]: # ----------------------------------------------------------------
## When system calls are not needed
\textcolor{myred}{However, syscalls are \textbf{not needed} for}:
- changing the content of a string
- calling a function
- copying memory within the task's address space
- etc.
[//]: # ----------------------------------------------------------------
## Linux system calls
\small
Examples of Linux system calls:

- `open, read, write, close, fork, mmap, getpid`, etc.

List of Linux system calls: `man syscalls`
```{.tiny}
System call                Kernel        Notes
--------------------------------------------------------------------
...
accept(2)                  2.0           See notes on socketcall(2)
accept4(2)                 2.6.28
...
bind(2)                    2.0           See notes on socketcall(2)
bpf(2)                     3.18
...
capset(2)                  2.2
chdir(2)                   1.0
chmod(2)                   1.0
...
clone2(2)                  2.4           IA-64 only
clone(2)                   1.0
clone3(2)                  5.3
...
```
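[//]: # ----------------------------------------------------------------
## Example: invoking a Linux syscall by number

\small

A minimal sketch, assuming a Linux system with glibc: the generic `syscall()` function takes a syscall number (here `SYS_getpid`) and returns the kernel's result. `raw_getpid` is a hypothetical helper name used only for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Ask the kernel for our PID through the generic syscall() entry point,
 * bypassing the dedicated getpid() library wrapper. */
pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);
    assert(ret > 0);   /* getpid never fails */
    return (pid_t)ret;
}
```

Calling `raw_getpid()` should return the same value as the `getpid()` library wrapper.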
[//]: # ----------------------------------------------------------------
## System calls and privilege levels
- As stated before, an unprivileged task (running in \textcolor{mygreen}{ring 3}) is extremely limited
- Such a task requires functions exposed by the kernel to do anything meaningful
    - these functions are called **system calls**
- System calls are functions **implemented in the kernel** (executing in \textcolor{myred}{ring 0})...
- But they **can be called by unprivileged tasks** (executing in \textcolor{mygreen}{ring 3})!
[//]: # ----------------------------------------------------------------
## System calls: how?
- How can code executing at a low privilege level (\textcolor{mygreen}{ring 3}) call code at a higher privilege level (\textcolor{myred}{ring 0})?
- We set up a special **software interrupt** that may be triggered from a lower privilege level
    - achieved by creating an interrupt descriptor whose privilege level makes it callable from \textcolor{mygreen}{ring 3}
- A more efficient, but less portable, way is to use dedicated CPU instructions:
    - `sysenter/sysexit` (introduced by Intel)
    - `syscall/sysret` (introduced by AMD; the standard mechanism in 64-bit mode)
[//]: # ----------------------------------------------------------------
## System call implementation example
- Implementation example using a software interrupt
- Using the code provided in lab3, `idt.c` is modified by adding a new interrupt handler for software interrupts:
```{.verysmall .c}
// IDT entry 123: system call
idt[123] = idt_build_entry(GDT_KERNEL_CODE_SELECTOR,
                           (uint32_t)&_syscall_handler,
                           TYPE_TRAP_GATE,
                           DPL_USER);
```
- Then, from unprivileged task code, a syscall is performed by triggering software interrupt 123 (in assembly):
```{.small .c}
int 123
```
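[//]: # ----------------------------------------------------------------
## Example: what an IDT gate entry contains

\small

The lab's `idt_build_entry` is not reproduced here. As a rough sketch of what such a function might do, the hypothetical `build_gate` below packs an IA-32 gate descriptor (8 bytes). All names and constants are assumptions made for illustration; the field layout follows the IA-32 gate descriptor format.

```c
#include <assert.h>
#include <stdint.h>

/* IA-32 interrupt/trap gate descriptor (hypothetical type name). */
typedef struct {
    uint16_t offset_low;   /* handler address, bits 0..15 */
    uint16_t selector;     /* code segment selector of the handler */
    uint8_t  reserved;     /* must be 0 */
    uint8_t  flags;        /* P (bit 7) | DPL (bits 5..6) | type (bits 0..3) */
    uint16_t offset_high;  /* handler address, bits 16..31 */
} gate_desc_t;

_Static_assert(sizeof(gate_desc_t) == 8, "a gate descriptor is 8 bytes");

enum { GATE_TYPE_INT32 = 0xE, GATE_TYPE_TRAP32 = 0xF };  /* 32-bit gates */
enum { DPL_RING0 = 0, DPL_RING3 = 3 };

/* Build a present gate; dpl is the least privileged ring allowed to
 * trigger it with the `int` instruction. */
gate_desc_t build_gate(uint16_t selector, uint32_t offset,
                       uint8_t type, uint8_t dpl)
{
    assert(dpl <= 3 && type <= 0xF);
    gate_desc_t g;
    g.offset_low  = (uint16_t)(offset & 0xFFFF);
    g.selector    = selector;
    g.reserved    = 0;
    g.flags       = (uint8_t)(0x80u | ((unsigned)dpl << 5) | type);
    g.offset_high = (uint16_t)(offset >> 16);
    return g;
}
```

With `DPL_RING3`, an `int` instruction executed in ring 3 is allowed to go through the gate; with `DPL_RING0`, the same instruction would raise a general protection fault.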
[//]: # ----------------------------------------------------------------
## Many system calls
- Typically, a kernel implements many syscalls (Linux has more than 350)
    - the IDT has only 256 entries, so there are not enough software interrupts for one per syscall
- Moreover, syscalls usually take arguments
- How to solve these two problems?
[//]: # ----------------------------------------------------------------
## How to handle many system calls?
Basic idea:

- All syscalls use the **same** software interrupt
- The syscall number is just an extra argument passed to the software interrupt
[//]: # ----------------------------------------------------------------
## How to handle system call arguments?
How to pass multiple arguments to the software interrupt?

- Three solutions:
    (1) use CPU registers
    (2) use the stack
    (3) use a dedicated area of memory shared between the kernel and the task

Here, we will use (1): CPU registers
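[//]: # ----------------------------------------------------------------
## Example: passing arguments in registers

\small

To see register passing on a real system, here is a sketch assuming Linux on x86-64 with GCC or Clang. It uses the `syscall` instruction rather than a software interrupt: the syscall number goes in `rax` and the first three arguments in `rdi`, `rsi` and `rdx`. `raw_syscall3` is a hypothetical name; on other platforms the sketch falls back to the C library's `syscall()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall(), pipe(), read() */

/* Perform a 3-argument syscall, placing number and arguments in registers.
 * On x86-64 Linux the raw result is returned (negative errno on failure);
 * the fallback returns -1 and sets errno instead. */
long raw_syscall3(long nb, long a1, long a2, long a3)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                           /* result comes back in rax */
        : "a"(nb), "D"(a1), "S"(a2), "d"(a3)  /* rax, rdi, rsi, rdx */
        : "rcx", "r11", "memory");            /* clobbered by the instruction */
    return ret;
#else
    return syscall(nb, a1, a2, a3);
#endif
}
```

For instance, `raw_syscall3(SYS_write, fd, (long)"abc", 3)` asks the kernel to write 3 bytes to `fd`.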
[//]: # ----------------------------------------------------------------
## System calls dispatch table
- How to handle the different syscalls given that a **single** software interrupt is triggered?
- Solution: by using a syscall **dispatch table**:
    - a table of functions, one per syscall, is implemented in the kernel (array of pointers to functions)
    - the syscall number is an index into this table
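[//]: # ----------------------------------------------------------------
## Example: a dispatch table in C

\small

The dispatch table idea can be sketched in plain C, runnable in user space. The syscall numbers, handler names and the three-argument signature below are all hypothetical; a real kernel would call something like `syscall_dispatch` from its software interrupt handler, with the values taken from the saved registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall numbers: indices into the dispatch table. */
enum { SYSCALL_ADD = 0, SYSCALL_MAX = 1, SYSCALL_COUNT };

/* Every handler has the same signature so they fit in one array. */
typedef int32_t (*syscall_fn_t)(uint32_t a1, uint32_t a2, uint32_t a3);

static int32_t sys_add(uint32_t a1, uint32_t a2, uint32_t a3)
{
    (void)a3;                       /* unused argument */
    return (int32_t)(a1 + a2);
}

static int32_t sys_max(uint32_t a1, uint32_t a2, uint32_t a3)
{
    (void)a3;
    return (int32_t)(a1 > a2 ? a1 : a2);
}

/* The dispatch table: one function pointer per syscall number. */
static const syscall_fn_t syscall_table[] = {
    [SYSCALL_ADD] = sys_add,
    [SYSCALL_MAX] = sys_max,
};

static_assert(sizeof(syscall_table) / sizeof(syscall_table[0]) == SYSCALL_COUNT,
              "the last syscall number must have a handler");

/* Validate the number coming from user space, then call the handler. */
int32_t syscall_dispatch(uint32_t nb, uint32_t a1, uint32_t a2, uint32_t a3)
{
    if (nb >= SYSCALL_COUNT || syscall_table[nb] == NULL)
        return -1;                  /* unknown syscall */
    return syscall_table[nb](a1, a2, a3);
}
```

The bounds check matters: the syscall number is chosen by unprivileged code, so the kernel must never index the table with an unchecked value.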
[//]: # ----------------------------------------------------------------
## System calls: overview
\centering
![](images/syscalls_dispatch_table.png){ width=100% }
[//]: # ----------------------------------------------------------------
## Full workflow of a syscall
\centering
![](images/syscall_full_workflow.png){ width=100% }
[//]: # ----------------------------------------------------------------
## System library
\small
- System calls resemble function calls
- However:
    - syscalls are not very readable
    - the programmer must remember every syscall number
    - they are usually too low-level
- Solution?
    - introduce a system library that abstracts system calls and provides a programmer-friendly API
    - under GNU/Linux, the system library is glibc
    - the system library is often a mixture of system calls and pure user space code
    - the system linker typically links every application to the system library
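[//]: # ----------------------------------------------------------------
## Example: a tiny "system library" function

\small

A sketch of how a library function can mix pure user space code with a single syscall, assuming Linux with glibc. `lib_put_string` is a hypothetical function, not part of any real library; it ignores partial writes for brevity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>        /* strlen() */
#include <sys/syscall.h>   /* SYS_write */
#include <sys/types.h>     /* ssize_t */
#include <unistd.h>        /* syscall() */

/* Write a NUL-terminated string to fd.
 * Returns the number of bytes written, or -1 on error. */
ssize_t lib_put_string(int fd, const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);                        /* pure user space work */
    return (ssize_t)syscall(SYS_write, fd, s, len); /* the only kernel crossing */
}
```

The caller writes `lib_put_string(1, "hello\n")` and never sees a syscall number or a byte count.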
[//]: # ----------------------------------------------------------------
## System calls overhead
\small
::: incremental
Compared to function calls, system calls are **very expensive**

- Why?
- Because each system call implies:
    1. saving the caller's context
    1. a change of privilege (with security checks)
    1. executing the kernel code
    1. a change of privilege back to user mode
    1. restoring the caller's context
:::
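[//]: # ----------------------------------------------------------------
## Example: measuring the overhead

\small

A rough way to observe the cost difference, assuming Linux with GCC or Clang. The function names are hypothetical, and the measured values depend heavily on the CPU, the kernel and its security mitigations, so no particular result is claimed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <time.h>          /* clock_gettime() */
#include <unistd.h>        /* syscall() */

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* noinline keeps a real call instruction in the measured loop. */
__attribute__((noinline)) long plain_function(long x)
{
    return x + 1;
}

/* Average duration, in nanoseconds, of one ordinary function call. */
double ns_per_function_call(int iters)
{
    assert(iters > 0);
    volatile long sink = 0;   /* volatile: the loop cannot be optimized away */
    double start = now_ns();
    for (int i = 0; i < iters; i++)
        sink = plain_function(sink);
    return (now_ns() - start) / iters;
}

/* Average duration, in nanoseconds, of one getpid syscall. */
double ns_per_syscall(int iters)
{
    assert(iters > 0);
    volatile long sink = 0;
    double start = now_ns();
    for (int i = 0; i < iters; i++)
        sink = syscall(SYS_getpid);   /* enters and leaves the kernel */
    return (now_ns() - start) / iters;
}
```

Comparing `ns_per_function_call(1000000)` with `ns_per_syscall(1000000)` gives an order of magnitude for the overhead on a given machine.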
[//]: # ----------------------------------------------------------------
## System calls performance consideration
::: incremental
Given system calls are expensive:
- Applications should minimize the number of system calls they make
- Kernels should avoid exposing more system calls than really necessary
\vspace{.5cm}
- Quiz: difference between `read` and `fread`?
:::
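[//]: # ----------------------------------------------------------------
## Example: fewer syscalls with larger requests

\small

The advice to minimize the number of system calls can be illustrated by counting `read` calls for different request sizes. This is a sketch assuming a POSIX system; `count_read_calls` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* read(), ssize_t */

/* Drain fd until end of file, asking for `chunk` bytes per read().
 * Returns the number of read() syscalls made (including the final one
 * that reports end of file), or -1 on error. */
long count_read_calls(int fd, size_t chunk)
{
    char buf[4096];
    assert(chunk > 0 && chunk <= sizeof buf);

    long calls = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        calls++;
        if (n < 0)
            return -1;      /* read error */
        if (n == 0)
            return calls;   /* end of file reached */
    }
}
```

Draining 100 bytes one byte at a time takes 101 calls, while a 4096-byte request typically needs only 2. Buffered I/O in a system library applies the same idea on the application's behalf.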
[//]: # ----------------------------------------------------------------
## Resources
\small
- Operating Systems: Three Easy Pieces, Remzi H. and Andrea C. Arpaci-Dusseau. Arpaci-Dusseau Books\
\footnotesize [\textcolor{myblue}{http://pages.cs.wisc.edu/~remzi/OSTEP/}](http://pages.cs.wisc.edu/~remzi/OSTEP/)