
sys calls

A program requests the execution of a service provided by the kernel through system calls (aka «kernel calls»).
These calls are identified by a number (see arch/x86/entry/syscalls/syscall_64.tbl and arch/x86/include/generated/asm/syscalls_64.h).

Executing syscalls and passing parameters

For 64 bit x86 applications, arch/x86/entry/entry_64.S contains the following useful comment:
64-bit SYSCALL instruction entry. Up to 6 arguments in registers.
This is the only entry point used for 64-bit system calls. The hardware interface is reasonably well designed and the register to argument mapping Linux uses fits well with the registers that are available when SYSCALL is used.
SYSCALL instructions can be found inlined in libc implementations as well as some other programs and libraries. There are also a handful of SYSCALL instructions in the vDSO used, for example, as a clock_gettimeofday fallback.
64-bit SYSCALL saves rip to rcx, clears rflags.RF, then saves rflags to r11, then loads new ss, cs, and rip from previously programmed MSRs. rflags gets masked by a value from another MSR (so CLD and CLAC are not needed). SYSCALL does not save anything on the stack and does not change rsp.
Registers on entry:
rax  system call number
rcx  return address
r11  saved rflags (note: r11 is callee-clobbered register in C ABI)
rdi  arg0
rsi  arg1
rdx  arg2
r10  arg3 (needs to be moved to rcx to conform to C ABI)
r8   arg4
r9   arg5
(note: r12-r15, rbp, rbx are callee-preserved in C ABI)
Only called from user space.
See also the note about x64 assembler dialects where the exit syscall is used to demonstrate some dialects.

Assembler code to call a syscall

.intel_syntax noprefix

.data

string_to_print:
.ascii "Hello, World\n"

.text
.globl _start

_start:
    # syscall write(int fd, const void *buf, size_t count)
    mov rax, 1                       # 1 is the syscall number for write. Store this number in register RAX.
    mov rdi, 1                       # The first parameter of a syscall is stored in RDI. For write, this is the file descriptor (1 is stdout).
    lea rsi, string_to_print[rip]    # Load the address of string_to_print relative to the instruction pointer (position independent code) into the second parameter (RSI).
    mov rdx, 13                      # The third parameter (RDX) is the number of bytes to write ("Hello, World\n" is 13 bytes).
    syscall                          # Invoke the syscall

    # syscall exit(int status)
    mov rax, 60        # syscall number for exit
    xor rdi, rdi       # exit with code 0
    syscall            # invoke kernel
This code can be compiled into an executable with
$ gcc -nostdlib -o /tmp/hello /tmp/hello.S
See also demonstration of X64 assembler dialects

Where some sys calls are defined

accept, defined in net/socket.c
accept4, defined in net/socket.c
access, defined in fs/open.c
acct, defined in kernel/acct.c
add_key, defined in security/keys/keyctl.c
adjtimex, defined in kernel/time/time.c
alarm, defined in kernel/time/timer.c
bdflush, defined in fs/buffer.c
bind, defined in net/socket.c
bpf, defined in kernel/bpf/syscall.c
brk, defined in mm/mmap.c and mm/nommu.c. TODO: brk.
capget, defined in kernel/capability.c
capset, defined in kernel/capability.c
chdir, defined in fs/open.c
chmod, defined in fs/open.c
chown, defined in fs/open.c
chown16, defined in kernel/uid16.c
chroot, defined in fs/open.c
clock_adjtime, defined in kernel/time/posix-timers.c
clock_getres, defined in kernel/time/posix-timers.c
clock_gettime, defined in kernel/time/posix-timers.c
clock_nanosleep, defined in kernel/time/posix-timers.c
clock_settime, defined in kernel/time/posix-timers.c
clone, defined in kernel/fork.c
close, defined in fs/open.c
connect, defined in net/socket.c
copy_file_range, defined in fs/read_write.c
creat, defined in fs/open.c
delete_module, defined in kernel/module.c
dup, defined in fs/file.c
dup2, defined in fs/file.c
dup3, defined in fs/file.c
epoll_create, defined in fs/eventpoll.c
epoll_create1, defined in fs/eventpoll.c
epoll_ctl, defined in fs/eventpoll.c
epoll_pwait, defined in fs/eventpoll.c
epoll_wait, defined in fs/eventpoll.c
eventfd, defined in fs/eventfd.c
eventfd2, defined in fs/eventfd.c
execve, defined in fs/exec.c
execveat, defined in fs/exec.c
exit, defined in kernel/exit.c
exit_group, defined in kernel/exit.c
faccessat, defined in fs/open.c
fadvise64, defined in mm/fadvise.c
fadvise64_64, defined in mm/fadvise.c
fallocate, defined in fs/open.c
fanotify_init, defined in fs/notify/fanotify/fanotify_user.c
fanotify_mark, defined in fs/notify/fanotify/fanotify_user.c
fchdir, defined in fs/open.c
fchmod, defined in fs/open.c
fchmodat, defined in fs/open.c
fchown, defined in fs/open.c
fchown16, defined in kernel/uid16.c
fchownat, defined in fs/open.c
fcntl, defined in fs/fcntl.c
fcntl64, defined in fs/fcntl.c
fdatasync, defined in fs/sync.c
fgetxattr, defined in fs/xattr.c
finit_module, defined in kernel/module.c
flistxattr, defined in fs/xattr.c
flock, defined in fs/locks.c
fork, defined in kernel/fork.c
fremovexattr, defined in fs/xattr.c
fsetxattr, defined in fs/xattr.c
fstat, defined in fs/stat.c
fstat64, defined in fs/stat.c
fstatat64, defined in fs/stat.c
fstatfs, defined in fs/statfs.c
fstatfs64, defined in fs/statfs.c
fsync, defined in fs/sync.c
ftruncate, defined in fs/open.c
ftruncate64, defined in fs/open.c
futex, defined in kernel/futex.c
futimesat, defined in fs/utimes.c
get_mempolicy, defined in mm/mempolicy.c
get_robust_list, defined in kernel/futex.c
get_thread_area, defined in arch/x86/um/tls_32.c and arch/x86/kernel/tls.c
getcpu, defined in kernel/sys.c
getcwd, defined in fs/dcache.c
getdents, defined in fs/readdir.c
getdents64, defined in fs/readdir.c
getegid, defined in kernel/sys.c
getegid16, defined in kernel/uid16.c
geteuid, defined in kernel/sys.c
geteuid16, defined in kernel/uid16.c
getgid, defined in kernel/sys.c
getgid16, defined in kernel/uid16.c
getgroups, defined in kernel/groups.c
getgroups16, defined in kernel/uid16.c
gethostname, defined in kernel/sys.c
getitimer, defined in kernel/time/itimer.c
getpeername, defined in net/socket.c
getpgid, defined in kernel/sys.c
getpgrp, defined in kernel/sys.c
getpid, defined in kernel/sys.c
getppid, defined in kernel/sys.c
getpriority, defined in kernel/sys.c
getrandom, defined in drivers/char/random.c
getresgid, defined in kernel/sys.c
getresgid16, defined in kernel/uid16.c
getresuid, defined in kernel/sys.c
getresuid16, defined in kernel/uid16.c
getrlimit, defined in kernel/sys.c
getrusage, defined in kernel/sys.c
getsid, defined in kernel/sys.c
getsockname, defined in net/socket.c
getsockopt, defined in net/socket.c
gettid, defined in kernel/sys.c
gettimeofday, defined in kernel/time/time.c
getuid, defined in kernel/sys.c
getuid16, defined in kernel/uid16.c
getxattr, defined in fs/xattr.c
init_module, defined in kernel/module.c
inotify_add_watch, defined in fs/notify/inotify/inotify_user.c
inotify_init, defined in fs/notify/inotify/inotify_user.c
inotify_init1, defined in fs/notify/inotify/inotify_user.c
inotify_rm_watch, defined in fs/notify/inotify/inotify_user.c
io_cancel, defined in fs/aio.c
io_destroy, defined in fs/aio.c
io_getevents, defined in fs/aio.c
io_setup, defined in fs/aio.c
io_submit, defined in fs/aio.c
ioctl, defined in fs/ioctl.c
iopl, defined in arch/x86/kernel/ioport.c. TODO: iopl.
ioprio_get, defined in block/ioprio.c
ioprio_set, defined in block/ioprio.c
ipc, defined in ipc/syscall.c
kcmp, defined in kernel/kcmp.c
kexec_file_load, defined in kernel/kexec_file.c
kexec_load, defined in kernel/kexec.c
keyctl, defined in security/keys/keyctl.c
kill, defined in kernel/signal.c. TODO: kill.
lchown, defined in fs/open.c
lchown16, defined in kernel/uid16.c
lgetxattr, defined in fs/xattr.c
link, defined in fs/namei.c
linkat, defined in fs/namei.c
listen, defined in net/socket.c
listxattr, defined in fs/xattr.c
llistxattr, defined in fs/xattr.c
llseek, defined in fs/read_write.c
lookup_dcookie, defined in fs/dcookies.c
lremovexattr, defined in fs/xattr.c
lseek, defined in fs/read_write.c
lsetxattr, defined in fs/xattr.c
lstat, defined in fs/stat.c
lstat64, defined in fs/stat.c
madvise, defined in mm/madvise.c
mbind, defined in mm/mempolicy.c
membarrier, defined in kernel/membarrier.c
memfd_create, defined in mm/shmem.c
migrate_pages, defined in mm/mempolicy.c
mincore, defined in mm/mincore.c
mkdir, defined in fs/namei.c
mkdirat, defined in fs/namei.c
mknod, defined in fs/namei.c
mknodat, defined in fs/namei.c
mlock, defined in mm/mlock.c
mlock2, defined in mm/mlock.c
mlockall, defined in mm/mlock.c
mmap, defined in arch/x86/kernel/sys_x86_64.c
mmap_pgoff, defined in mm/mmap.c and mm/nommu.c
mount, defined in fs/namespace.c
move_pages, defined in mm/migrate.c
mprotect, defined in mm/mprotect.c
mq_getsetattr, defined in ipc/mqueue.c
mq_notify, defined in ipc/mqueue.c
mq_open, defined in ipc/mqueue.c
mq_timedreceive, defined in ipc/mqueue.c
mq_timedsend, defined in ipc/mqueue.c
mq_unlink, defined in ipc/mqueue.c
mremap, defined in mm/nommu.c and mm/mremap.c
msgctl, defined in ipc/msg.c
msgget, defined in ipc/msg.c
msgrcv, defined in ipc/msg.c
msgsnd, defined in ipc/msg.c
msync, defined in mm/msync.c
munlock, defined in mm/mlock.c
munlockall, defined in mm/mlock.c
munmap, defined in mm/mmap.c and mm/nommu.c
name_to_handle_at, defined in fs/fhandle.c
nanosleep, defined in kernel/time/hrtimer.c
newfstat, defined in fs/stat.c
newfstatat, defined in fs/stat.c
newlstat, defined in fs/stat.c
newstat, defined in fs/stat.c
newuname, defined in kernel/sys.c
nice, defined in kernel/sched/core.c
old_getrlimit, defined in kernel/sys.c
old_mmap, defined in mm/mmap.c and mm/nommu.c
old_readdir, defined in fs/readdir.c
old_select, defined in fs/select.c
oldumount, defined in fs/namespace.c
olduname, defined in kernel/sys.c
open, defined in fs/open.c
open_by_handle_at, defined in fs/fhandle.c
openat, defined in fs/open.c
pause, defined in kernel/signal.c
pciconfig_read, defined in drivers/pci/syscall.c
pciconfig_write, defined in drivers/pci/syscall.c
perf_event_open, defined in kernel/events/core.c
personality, defined in kernel/exec_domain.c
pipe, defined in fs/pipe.c
pipe2, defined in fs/pipe.c
pivot_root, defined in fs/namespace.c
poll, defined in fs/select.c
ppoll, defined in fs/select.c
prctl, defined in kernel/sys.c
pread64, defined in fs/read_write.c
preadv, defined in fs/read_write.c
preadv2, defined in fs/read_write.c
prlimit64, defined in kernel/sys.c
process_vm_readv, defined in mm/process_vm_access.c
process_vm_writev, defined in mm/process_vm_access.c
pselect6, defined in fs/select.c
ptrace, defined in kernel/ptrace.c (TODO: See also ptrace).
pwrite64, defined in fs/read_write.c
pwritev, defined in fs/read_write.c
pwritev2, defined in fs/read_write.c
quotactl, defined in fs/quota/quota.c
read, defined in fs/read_write.c
readahead, defined in mm/readahead.c
readlink, defined in fs/stat.c
readlinkat, defined in fs/stat.c
readv, defined in fs/read_write.c
reboot, defined in kernel/reboot.c
recv, defined in net/socket.c
recvfrom, defined in net/socket.c
recvmmsg, defined in net/socket.c
recvmsg, defined in net/socket.c
remap_file_pages, defined in mm/mmap.c
removexattr, defined in fs/xattr.c
rename, defined in fs/namei.c
renameat, defined in fs/namei.c
renameat2, defined in fs/namei.c
request_key, defined in security/keys/keyctl.c
restart_syscall, defined in kernel/signal.c
rmdir, defined in fs/namei.c
rt_sigaction, defined in kernel/signal.c
rt_sigpending, defined in kernel/signal.c
rt_sigprocmask, defined in kernel/signal.c
rt_sigqueueinfo, defined in kernel/signal.c
rt_sigsuspend, defined in kernel/signal.c
rt_sigtimedwait, defined in kernel/signal.c
rt_tgsigqueueinfo, defined in kernel/signal.c
sched_get_priority_max, defined in kernel/sched/core.c
sched_get_priority_min, defined in kernel/sched/core.c
sched_getaffinity, defined in kernel/sched/core.c
sched_getattr, defined in kernel/sched/core.c
sched_getparam, defined in kernel/sched/core.c
sched_getscheduler, defined in kernel/sched/core.c
sched_rr_get_interval, defined in kernel/sched/core.c
sched_setaffinity, defined in kernel/sched/core.c
sched_setattr, defined in kernel/sched/core.c
sched_setparam, defined in kernel/sched/core.c
sched_setscheduler, defined in kernel/sched/core.c
sched_yield, defined in kernel/sched/core.c
seccomp, defined in kernel/seccomp.c
select, defined in fs/select.c
semctl, defined in ipc/sem.c
semget, defined in ipc/sem.c
semop, defined in ipc/sem.c
semtimedop, defined in ipc/sem.c
send, defined in net/socket.c
sendfile, defined in fs/read_write.c
sendfile64, defined in fs/read_write.c
sendmmsg, defined in net/socket.c
sendmsg, defined in net/socket.c
sendto, defined in net/socket.c
set_mempolicy, defined in mm/mempolicy.c
set_robust_list, defined in kernel/futex.c
set_thread_area, defined in arch/x86/um/tls_32.c and arch/x86/kernel/tls.c
set_tid_address, defined in kernel/fork.c
setdomainname, defined in kernel/sys.c
setfsgid, defined in kernel/sys.c
setfsgid16, defined in kernel/uid16.c
setfsuid, defined in kernel/sys.c
setfsuid16, defined in kernel/uid16.c
setgid, defined in kernel/sys.c
setgid16, defined in kernel/uid16.c
setgroups, defined in kernel/groups.c
setgroups16, defined in kernel/uid16.c
sethostname, defined in kernel/sys.c
setitimer, defined in kernel/time/itimer.c
setns, defined in kernel/nsproxy.c
setpgid, defined in kernel/sys.c
setpriority, defined in kernel/sys.c
setregid, defined in kernel/sys.c
setregid16, defined in kernel/uid16.c
setresgid, defined in kernel/sys.c
setresgid16, defined in kernel/uid16.c
setresuid, defined in kernel/sys.c
setresuid16, defined in kernel/uid16.c
setreuid, defined in kernel/sys.c
setreuid16, defined in kernel/uid16.c
setrlimit, defined in kernel/sys.c
setsid, defined in kernel/sys.c
setsockopt, defined in net/socket.c
settimeofday, defined in kernel/time/time.c
setuid, defined in kernel/sys.c
setuid16, defined in kernel/uid16.c
setxattr, defined in fs/xattr.c
sgetmask, defined in kernel/signal.c
shmat, defined in ipc/shm.c
shmctl, defined in ipc/shm.c
shmdt, defined in ipc/shm.c
shmget, defined in ipc/shm.c
shutdown, defined in net/socket.c
sigaction, defined in kernel/signal.c
sigaltstack, defined in kernel/signal.c
signal, defined in kernel/signal.c
signalfd, defined in fs/signalfd.c
signalfd4, defined in fs/signalfd.c
sigpending, defined in kernel/signal.c
sigprocmask, defined in kernel/signal.c
sigsuspend, defined (in two variants) in kernel/signal.c
socket, defined in net/socket.c
socketcall, defined in net/socket.c
socketpair, defined in net/socket.c
splice, defined in fs/splice.c
ssetmask, defined in kernel/signal.c
stat, defined in fs/stat.c
stat64, defined in fs/stat.c
statfs, defined in fs/statfs.c
statfs64, defined in fs/statfs.c
stime, defined in kernel/time/time.c
swapoff, defined in mm/swapfile.c
swapon, defined in mm/swapfile.c
symlink, defined in fs/namei.c
symlinkat, defined in fs/namei.c
sync, defined in fs/sync.c
sync_file_range, defined in fs/sync.c
sync_file_range2, defined in fs/sync.c
syncfs, defined in fs/sync.c
sysctl, defined in kernel/sysctl_binary.c
sysfs, defined in fs/filesystems.c
sysinfo, defined in kernel/sys.c
syslog, defined in kernel/printk/printk.c
tee, defined in fs/splice.c
tgkill, defined in kernel/signal.c
time, defined in kernel/time/time.c
timer_create, defined in kernel/time/posix-timers.c
timer_delete, defined in kernel/time/posix-timers.c
timer_getoverrun, defined in kernel/time/posix-timers.c
timer_gettime, defined in kernel/time/posix-timers.c
timer_settime, defined in kernel/time/posix-timers.c
timerfd_create, defined in fs/timerfd.c
timerfd_gettime, defined in fs/timerfd.c
timerfd_settime, defined in fs/timerfd.c
times, defined in kernel/sys.c
tkill, defined in kernel/signal.c
truncate, defined in fs/open.c
truncate64, defined in fs/open.c
umask, defined in kernel/sys.c
umount, defined in fs/namespace.c
uname, defined in kernel/sys.c
unlink, defined in fs/namei.c
unlinkat, defined in fs/namei.c
unshare, defined in kernel/fork.c
uselib, defined in fs/exec.c
userfaultfd, defined in fs/userfaultfd.c
ustat, defined in fs/statfs.c
utime, defined in fs/utimes.c
utimensat, defined in fs/utimes.c
utimes, defined in fs/utimes.c
vfork, defined in kernel/fork.c
vhangup, defined in fs/open.c
vm86, defined in arch/x86/kernel/vm86_32.c
vm86old, defined in arch/x86/kernel/vm86_32.c
vmsplice, defined in fs/splice.c
wait4, defined in kernel/exit.c
waitid, defined in kernel/exit.c
waitpid, defined in kernel/exit.c
write, defined in fs/read_write.c
writev, defined in fs/read_write.c

Alternatives to system calls

Besides system calls, Linux offers a few other ways for a userland program to interact with the kernel:

TODO

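Some time-related syscalls (for example time, gettimeofday, settimeofday, stime and adjtimex) are implemented in kernel/time/time.c. Others live elsewhere under kernel/time/, such as the clock_* and timer_* calls in kernel/time/posix-timers.c.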

Definition and implementation (source code)

Sys calls are declared in include/linux/syscalls.h with asmlinkage and a sys_ prefix to their names:
asmlinkage long sys_ioctl(unsigned int fd, unsigned int cmd, unsigned long arg);
They are implemented in various C files using the SYSCALL_DEFINE0 through SYSCALL_DEFINE6 macros (defined in include/linux/syscalls.h), where the digit is the number of arguments:
SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg) {
  …
}

Grepping sys calls

The following command, run from the top of the kernel source tree, lists the lines that start with SYSCALL_DEFINE and thus finds most syscall implementations:
grep -r '^SYSCALL_DEFINE' --include='*.c'

See also

System calls are documented in section 2 of the man pages (for example, man 2 write).
System calls can be traced with strace.
Documentation/adding-syscalls.txt
libc's syscall function (declared in unistd.h).
Relation to POSIX.
User vs kernel space
/sys/kernel/debug/tracing/available_events
scripts/checksyscalls.sh
/usr/include/sys/syscall.h
Windows Subsystem for Linux version 2 (WSL 2) runs a real Linux kernel, so the full range of sys calls is available under Windows (see also syscalls in WSL).
