This website documents Linux containers. As mentioned elsewhere, in a sense there
are no containers per se, only Linux kernel features such as namespaces and cgroups that are
bundled and used in different ways to provide an abstraction we call a container.
Examples of this bundling are Docker, CoreOS appc, OCI runc,
Canonical LXC/LXD, and OpenVZ.
Terminology
Conceptually, a Linux container is made up of three things:
- namespaces for providing compute isolation
- cgroups for throttling and accounting of resource consumption
- copy-on-write filesystems for state
A container's core is a process (group). The ER diagram for namespaces, cgroups and process (groups) looks as follows:
Read the ER diagram above as: a process (group) can be in one or more namespaces and can be controlled by one or more cgroups.
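To make this relationship concrete, here is a minimal Python sketch that lists the namespaces and cgroups a process belongs to. It assumes a Linux host with /proc mounted and returns empty results elsewhere. The function names (namespaces_of, cgroups_of, parse_cgroup_file) are made up for this illustration; only the /proc/$PID/ns and /proc/$PID/cgroup paths are kernel interfaces.

```python
import os


def parse_cgroup_file(text):
    """Parse the content of /proc/<pid>/cgroup.

    Each line has the form hierarchy-ID:controller-list:cgroup-path.
    Returns a list of (hierarchy_id, controllers, path) tuples.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append((int(hierarchy_id), names, path))
    return entries


def namespaces_of(pid="self"):
    """Map namespace type (e.g. 'net') to its identifier link, e.g. 'net:[4026531840]'."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, or the process is gone or not accessible
    for name in names:
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            result[name] = None
    return result


def cgroups_of(pid="self"):
    """Return the parsed cgroup membership of a process, or [] if unavailable."""
    try:
        with open(f"/proc/{pid}/cgroup") as fh:
            return parse_cgroup_file(fh.read())
    except OSError:
        return []


if __name__ == "__main__":
    for ns_type, ident in namespaces_of().items():
        print("namespace", ns_type, ident)
    for hierarchy_id, controllers, path in cgroups_of():
        print("cgroup", hierarchy_id, ",".join(controllers) or "(unified)", path)
```

Two processes that show the same identifier for a namespace type share that namespace; a process started inside a container typically shows different identifiers than a shell on the host.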
Linux namespaces
- Mount/CLONE_NEWNS (since Linux 2.4.19) via /proc/$PID/mounts: filesystem mount points
- UTS/CLONE_NEWUTS (since Linux 2.6.19) via uname -n, hostname -f: nodename/hostname and (NIS) domain name
- IPC/CLONE_NEWIPC (since Linux 2.6.19) via /proc/sys/fs/mqueue, /proc/sys/kernel, /proc/sysvipc: interprocess communication resource isolation: System V IPC objects, POSIX message queues
- PID/CLONE_NEWPID (since Linux 2.6.24) via /proc/$PID/status: process ID number space isolation: PID inside/PID outside the namespace; PID namespaces can be nested
- Network/CLONE_NEWNET (completed in Linux 2.6.29) via ip netns list, /proc/net, /sys/class/net: network system resources: network devices, IP addresses, IP routing tables, port numbers, etc.
- User/CLONE_NEWUSER (completed in Linux 3.8) via id, /proc/$PID/uid_map, /proc/$PID/gid_map: user and group ID number space isolation. UID+GIDs inside/outside the namespace
- Cgroup/CLONE_NEWCGROUP (since Linux 4.6) via /sys/fs/cgroup/, /proc/cgroups, /proc/$PID/cgroup: cgroups
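The inside/outside ID mapping of a user namespace can be sketched in a few lines of Python. Each line of /proc/$PID/uid_map (and gid_map) holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The names IdRange, parse_id_map and to_outside are hypothetical helpers for this illustration, and the sample mapping in the test values is made up.

```python
from typing import List, NamedTuple, Optional


class IdRange(NamedTuple):
    inside: int   # first ID as seen inside the user namespace
    outside: int  # first ID as seen from the parent namespace
    length: int   # number of consecutive IDs covered by this range


def parse_id_map(text: str) -> List[IdRange]:
    """Parse the content of a uid_map or gid_map file."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        ranges.append(IdRange(*(int(f) for f in fields)))
    return ranges


def to_outside(inside_id: int, ranges: List[IdRange]) -> Optional[int]:
    """Translate an ID inside the namespace to the ID outside, or None if unmapped."""
    for r in ranges:
        if r.inside <= inside_id < r.inside + r.length:
            return r.outside + (inside_id - r.inside)
    return None


def read_id_map(pid="self", kind="uid_map") -> List[IdRange]:
    """Read /proc/<pid>/uid_map or gid_map; returns [] when it is unavailable."""
    try:
        with open(f"/proc/{pid}/{kind}") as fh:
            return parse_id_map(fh.read())
    except OSError:
        return []
```

With a mapping line of 0 100000 65536, root (UID 0) inside the namespace corresponds to the unprivileged UID 100000 outside it, and IDs beyond the range are not mapped at all.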
Linux cgroups
- cpu/CONFIG_CGROUP_SCHED (since Linux 2.6.24)
- cpuacct/CONFIG_CGROUP_CPUACCT (since Linux 2.6.24)
- cpuset/CONFIG_CPUSETS (since Linux 2.6.24)
- memory/CONFIG_MEMCG (since Linux 2.6.25)
- devices/CONFIG_CGROUP_DEVICE (since Linux 2.6.26)
- freezer/CONFIG_CGROUP_FREEZER (since Linux 2.6.28)
- net_cls/CONFIG_CGROUP_NET_CLASSID (since Linux 2.6.29)
- blkio/CONFIG_BLK_CGROUP (since Linux 2.6.33)
- perf_event/CONFIG_CGROUP_PERF (since Linux 2.6.39)
- net_prio/CONFIG_CGROUP_NET_PRIO (since Linux 3.3)
- hugetlb/CONFIG_CGROUP_HUGETLB (since Linux 3.5)
- pids/CONFIG_CGROUP_PIDS (since Linux 4.3)
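Which of these controllers a running kernel actually provides can be checked in /proc/cgroups. The following Python sketch parses that file, assuming its usual four-column layout (subsys_name, hierarchy, num_cgroups, enabled). The function names parse_proc_cgroups and read_proc_cgroups are invented for this example, and the reader returns an empty result on systems where the file does not exist.

```python
def parse_proc_cgroups(text):
    """Parse /proc/cgroups content into {controller: {...}}.

    Expected columns: subsys_name, hierarchy, num_cgroups, enabled.
    The header line starts with '#' and is skipped.
    """
    controllers = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore lines that do not match the assumed layout
        name, hierarchy, num_cgroups, enabled = fields
        controllers[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        }
    return controllers


def read_proc_cgroups(path="/proc/cgroups"):
    """Read and parse /proc/cgroups; returns {} if the file cannot be read."""
    try:
        with open(path) as fh:
            return parse_proc_cgroups(fh.read())
    except OSError:
        return {}


if __name__ == "__main__":
    for name, info in sorted(read_proc_cgroups().items()):
        state = "enabled" if info["enabled"] else "disabled"
        print(f"{name}: {state}, hierarchy {info['hierarchy']}")
```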
COW filesystems
- AUFS
- btrfs
- Overlay Filesystem
- Unionfs
- ZFS on Linux
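To see whether any of these filesystems is in use on a host, one option is to scan /proc/mounts for the matching filesystem type. The Python sketch below does that. The set of type strings is an assumption: the names a kernel or FUSE driver reports can differ (for example, FUSE-based union filesystems usually carry a fuse. prefix), so adjust it as needed. The names COW_FSTYPES, cow_mounts and read_cow_mounts are hypothetical.

```python
# Filesystem type strings as they commonly appear in the third column of
# /proc/mounts. This set is an assumption and may need adjusting per system.
COW_FSTYPES = frozenset({"aufs", "btrfs", "overlay", "unionfs", "zfs"})


def cow_mounts(mounts_text, fstypes=COW_FSTYPES):
    """Return (mount_point, fstype) pairs for mounts whose type is in fstypes.

    Each /proc/mounts line has the fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        if fstype in fstypes:
            found.append((mount_point, fstype))
    return found


def read_cow_mounts(path="/proc/mounts"):
    """Scan the mount table of the current process; returns [] if unreadable."""
    try:
        with open(path) as fh:
            return cow_mounts(fh.read())
    except OSError:
        return []


if __name__ == "__main__":
    for mount_point, fstype in read_cow_mounts():
        print(f"{fstype}\t{mount_point}")
```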
Tooling
namespaces and cgroups
- cinf
- nsenter
- unshare
- man lsns (also: announcement lsns)
- systemd-cgtop
- cgroup-utils
- yadutaf/ctop
See also …
namespaces and cgroups
- The Unofficial Linux Perf Events Web-Page
- Netdev 1.1 - Namespaces and CGroups, the basis of Linux containers, Rami Rosen, video (2016)
- Hands on Linux sandbox with namespaces and cgroups, Tristan Cacqueray (2015)
- Namespaces in operation part 2: the namespaces API, Michael Kerrisk (2013)
- Namespaces in operation part 1: namespaces overview, Michael Kerrisk (2013)
- Resource management: Linux kernel Namespaces and cgroups, Rami Rosen (2013)
- The Linux Programming Interface, Michael Kerrisk (2010)
filesystems
- Docker storage drivers, Docker docs
- Deep dive into Docker storage drivers, Jérôme Petazzoni (2015)
- THE /proc FILESYSTEM, Terrehon Bowden et al (1999 - 2009)
- Unioning file systems: Architecture, features, and design choices, Valerie Aurora (2009)
- Copy-On-Write 101 – Part 1: What Is It?, Ville Laurikari (2009)