summaryrefslogtreecommitdiffstats
path: root/man/man7/namespaces.7
diff options
context:
space:
mode:
Diffstat (limited to 'man/man7/namespaces.7')
-rw-r--r--man/man7/namespaces.7417
1 files changed, 417 insertions, 0 deletions
diff --git a/man/man7/namespaces.7 b/man/man7/namespaces.7
new file mode 100644
index 000000000..bdcdde3c8
--- /dev/null
+++ b/man/man7/namespaces.7
@@ -0,0 +1,417 @@
+'\" t
+.\" Copyright (c) 2013, 2016, 2017 by Michael Kerrisk <mtk.manpages@gmail.com>
+.\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com>
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.\"
+.TH namespaces 7 (date) "Linux man-pages (unreleased)"
+.SH NAME
+namespaces \- overview of Linux namespaces
+.SH DESCRIPTION
+A namespace wraps a global system resource in an abstraction that
+makes it appear to the processes within the namespace that they
+have their own isolated instance of the global resource.
+Changes to the global resource are visible to other processes
+that are members of the namespace, but are invisible to other processes.
+One use of namespaces is to implement containers.
+.P
+This page provides pointers to information on the various namespace types,
+describes the associated
+.I /proc
+files, and summarizes the APIs for working with namespaces.
+.\"
+.SS Namespace types
+The following table shows the namespace types available on Linux.
+The second column of the table shows the flag value that is used to specify
+the namespace type in various APIs.
+The third column identifies the manual page that provides details
+on the namespace type.
+The last column is a summary of the resources that are isolated by
+the namespace type.
+.TS
+lB lB lB lB
+l1 lB1 l1 l.
+Namespace Flag Page Isolates
+Cgroup CLONE_NEWCGROUP \fBcgroup_namespaces\fP(7) T{
+Cgroup root directory
+T}
+IPC CLONE_NEWIPC \fBipc_namespaces\fP(7) T{
+System V IPC,
+POSIX message queues
+T}
+Network CLONE_NEWNET \fBnetwork_namespaces\fP(7) T{
+Network devices,
+stacks, ports, etc.
+T}
+Mount CLONE_NEWNS \fBmount_namespaces\fP(7) Mount points
+PID CLONE_NEWPID \fBpid_namespaces\fP(7) Process IDs
+Time CLONE_NEWTIME \fBtime_namespaces\fP(7) T{
+Boot and monotonic
+clocks
+T}
+User CLONE_NEWUSER \fBuser_namespaces\fP(7) T{
+User and group IDs
+T}
+UTS CLONE_NEWUTS \fButs_namespaces\fP(7) T{
+Hostname and NIS
+domain name
+T}
+.TE
+.\"
+.\" ==================== The namespaces API ====================
+.\"
+.SS The namespaces API
+As well as various
+.I /proc
+files described below,
+the namespaces API includes the following system calls:
+.TP
+.BR clone (2)
+The
+.BR clone (2)
+system call creates a new process.
+If the
+.I flags
+argument of the call specifies one or more of the
+.B CLONE_NEW*
+flags listed above, then new namespaces are created for each flag,
+and the child process is made a member of those namespaces.
+(This system call also implements a number of features
+unrelated to namespaces.)
+.TP
+.BR setns (2)
+The
+.BR setns (2)
+system call allows the calling process to join an existing namespace.
+The namespace to join is specified via a file descriptor that refers to
+one of the
+.IR /proc/ pid /ns
+files described below.
+.TP
+.BR unshare (2)
+The
+.BR unshare (2)
+system call moves the calling process to a new namespace.
+If the
+.I flags
+argument of the call specifies one or more of the
+.B CLONE_NEW*
+flags listed above, then new namespaces are created for each flag,
+and the calling process is made a member of those namespaces.
+(This system call also implements a number of features
+unrelated to namespaces.)
+.TP
+.BR ioctl (2)
+Various
+.BR ioctl (2)
+operations can be used to discover information about namespaces.
+These operations are described in
+.BR ioctl_ns (2).
+.P
+Creation of new namespaces using
+.BR clone (2)
+and
+.BR unshare (2)
+in most cases requires the
+.B CAP_SYS_ADMIN
+capability, since, in the new namespace,
+the creator will have the power to change global resources
+that are visible to other processes that are subsequently created in,
+or join the namespace.
+User namespaces are the exception: since Linux 3.8,
+no privilege is required to create a user namespace.
+.\"
+.\" ==================== The /proc/[pid]/ns/ directory ====================
+.\"
+.SS The \fI/proc/\fPpid\fI/ns/\fP directory
+Each process has a
+.IR /proc/ pid /ns/
+.\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f
+subdirectory containing one entry for each namespace that
+supports being manipulated by
+.BR setns (2):
+.P
+.in +4n
+.EX
+$ \fBls \-l /proc/$$/ns | awk \[aq]{print $1, $9, $10, $11}\[aq]\fP
+total 0
+lrwxrwxrwx. cgroup \-> cgroup:[4026531835]
+lrwxrwxrwx. ipc \-> ipc:[4026531839]
+lrwxrwxrwx. mnt \-> mnt:[4026531840]
+lrwxrwxrwx. net \-> net:[4026531969]
+lrwxrwxrwx. pid \-> pid:[4026531836]
+lrwxrwxrwx. pid_for_children \-> pid:[4026531834]
+lrwxrwxrwx. time \-> time:[4026531834]
+lrwxrwxrwx. time_for_children \-> time:[4026531834]
+lrwxrwxrwx. user \-> user:[4026531837]
+lrwxrwxrwx. uts \-> uts:[4026531838]
+.EE
+.in
+.P
+Bind mounting (see
+.BR mount (2))
+one of the files in this directory
+to somewhere else in the filesystem keeps
+the corresponding namespace of the process specified by
+.I pid
+alive even if all processes currently in the namespace terminate.
+.P
+Opening one of the files in this directory
+(or a file that is bind mounted to one of these files)
+returns a file handle for
+the corresponding namespace of the process specified by
+.IR pid .
+As long as this file descriptor remains open,
+the namespace will remain alive,
+even if all processes in the namespace terminate.
+The file descriptor can be passed to
+.BR setns (2).
+.P
+In Linux 3.7 and earlier, these files were visible as hard links.
+Since Linux 3.8,
+.\" commit bf056bfa80596a5d14b26b17276a56a0dcb080e5
+they appear as symbolic links.
+If two processes are in the same namespace,
+then the device IDs and inode numbers of their
+.IR /proc/ pid /ns/ xxx
+symbolic links will be the same; an application can check this using the
+.I stat.st_dev
+.\" Eric Biederman: "I reserve the right for st_dev to be significant
+.\" when comparing namespaces."
+.\" https://lore.kernel.org/lkml/87poky5ca9.fsf@xmission.com/
+.\" Re: Documenting the ioctl interfaces to discover relationships...
+.\" Date: Mon, 12 Dec 2016 11:30:38 +1300
+and
+.I stat.st_ino
+fields returned by
+.BR stat (2).
+The content of this symbolic link is a string containing
+the namespace type and inode number as in the following example:
+.P
+.in +4n
+.EX
+$ \fBreadlink /proc/$$/ns/uts\fP
+uts:[4026531838]
+.EE
+.in
+.P
+The symbolic links in this subdirectory are as follows:
+.TP
+.IR /proc/ pid /ns/cgroup " (since Linux 4.6)"
+This file is a handle for the cgroup namespace of the process.
+.TP
+.IR /proc/ pid /ns/ipc " (since Linux 3.0)"
+This file is a handle for the IPC namespace of the process.
+.TP
+.IR /proc/ pid /ns/mnt " (since Linux 3.8)"
+.\" commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e
+This file is a handle for the mount namespace of the process.
+.TP
+.IR /proc/ pid /ns/net " (since Linux 3.0)"
+This file is a handle for the network namespace of the process.
+.TP
+.IR /proc/ pid /ns/pid " (since Linux 3.8)"
+.\" commit 57e8391d327609cbf12d843259c968b9e5c1838f
+This file is a handle for the PID namespace of the process.
+This handle is permanent for the lifetime of the process
+(i.e., a process's PID namespace membership never changes).
+.TP
+.IR /proc/ pid /ns/pid_for_children " (since Linux 4.12)"
+.\" commit eaa0d190bfe1ed891b814a52712dcd852554cb08
+This file is a handle for the PID namespace of
+child processes created by this process.
+This can change as a consequence of calls to
+.BR unshare (2)
+and
+.BR setns (2)
+(see
+.BR pid_namespaces (7)),
+so the file may differ from
+.IR /proc/ pid /ns/pid .
+The symbolic link gains a value only after the first child process
+is created in the namespace.
+(Beforehand,
+.BR readlink (2)
+of the symbolic link will return an empty buffer.)
+.TP
+.IR /proc/ pid /ns/time " (since Linux 5.6)"
+This file is a handle for the time namespace of the process.
+.TP
+.IR /proc/ pid /ns/time_for_children " (since Linux 5.6)"
+This file is a handle for the time namespace of
+child processes created by this process.
+This can change as a consequence of calls to
+.BR unshare (2)
+and
+.BR setns (2)
+(see
+.BR time_namespaces (7)),
+so the file may differ from
+.IR /proc/ pid /ns/time .
+.TP
+.IR /proc/ pid /ns/user " (since Linux 3.8)"
+.\" commit cde1975bc242f3e1072bde623ef378e547b73f91
+This file is a handle for the user namespace of the process.
+.TP
+.IR /proc/ pid /ns/uts " (since Linux 3.0)"
+This file is a handle for the UTS namespace of the process.
+.P
+Permission to dereference or read
+.RB ( readlink (2))
+these symbolic links is governed by a ptrace access mode
+.B PTRACE_MODE_READ_FSCREDS
+check; see
+.BR ptrace (2).
+.\"
+.\" ==================== The /proc/sys/user directory ====================
+.\"
+.SS The \fI/proc/sys/user\fP directory
+The files in the
+.I /proc/sys/user
+directory (which is present since Linux 4.9) expose limits
+on the number of namespaces of various types that can be created.
+The files are as follows:
+.TP
+.I max_cgroup_namespaces
+The value in this file defines a per-user limit on the number of
+cgroup namespaces that may be created in the user namespace.
+.TP
+.I max_ipc_namespaces
+The value in this file defines a per-user limit on the number of
+ipc namespaces that may be created in the user namespace.
+.TP
+.I max_mnt_namespaces
+The value in this file defines a per-user limit on the number of
+mount namespaces that may be created in the user namespace.
+.TP
+.I max_net_namespaces
+The value in this file defines a per-user limit on the number of
+network namespaces that may be created in the user namespace.
+.TP
+.I max_pid_namespaces
+The value in this file defines a per-user limit on the number of
+PID namespaces that may be created in the user namespace.
+.TP
+.IR max_time_namespaces " (since Linux 5.7)"
+.\" commit eeec26d5da8248ea4e240b8795bb4364213d3247
+The value in this file defines a per-user limit on the number of
+time namespaces that may be created in the user namespace.
+.TP
+.I max_user_namespaces
+The value in this file defines a per-user limit on the number of
+user namespaces that may be created in the user namespace.
+.TP
+.I max_uts_namespaces
+The value in this file defines a per-user limit on the number of
+uts namespaces that may be created in the user namespace.
+.P
+Note the following details about these files:
+.IP \[bu] 3
+The values in these files are modifiable by privileged processes.
+.IP \[bu]
+The values exposed by these files are the limits for the user namespace
+in which the opening process resides.
+.IP \[bu]
+The limits are per-user.
+Each user in the same user namespace
+can create namespaces up to the defined limit.
+.IP \[bu]
+The limits apply to all users, including UID 0.
+.IP \[bu]
+These limits apply in addition to any other per-namespace
+limits (such as those for PID and user namespaces) that may be enforced.
+.IP \[bu]
+Upon encountering these limits,
+.BR clone (2)
+and
+.BR unshare (2)
+fail with the error
+.BR ENOSPC .
+.IP \[bu]
+For the initial user namespace,
+the default value in each of these files is half the limit on the number
+of threads that may be created
+.RI ( /proc/sys/kernel/threads\-max ).
+In all descendant user namespaces, the default value in each file is
+.BR MAXINT .
+.IP \[bu]
+When a namespace is created, the object is also accounted
+against ancestor namespaces.
+More precisely:
+.RS
+.IP \[bu] 3
+Each user namespace has a creator UID.
+.IP \[bu]
+When a namespace is created,
+it is accounted against the creator UIDs in each of the
+ancestor user namespaces,
+and the kernel ensures that the corresponding namespace limit
+for the creator UID in the ancestor namespace is not exceeded.
+.IP \[bu]
+The aforementioned point ensures that creating a new user namespace
+cannot be used as a means to escape the limits in force
+in the current user namespace.
+.RE
+.\"
+.SS Namespace lifetime
+Absent any other factors,
+a namespace is automatically torn down when the last process in
+the namespace terminates or leaves the namespace.
+However, there are a number of other factors that may pin
+a namespace into existence even though it has no member processes.
+These factors include the following:
+.IP \[bu] 3
+An open file descriptor or a bind mount exists for the corresponding
+.IR /proc/ pid /ns/*
+file.
+.IP \[bu]
+The namespace is hierarchical (i.e., a PID or user namespace),
+and has a child namespace.
+.IP \[bu]
+It is a user namespace that owns one or more nonuser namespaces.
+.IP \[bu]
+It is a PID namespace,
+and there is a process that refers to the namespace via a
+.IR /proc/ pid /ns/pid_for_children
+symbolic link.
+.IP \[bu]
+It is a time namespace,
+and there is a process that refers to the namespace via a
+.IR /proc/ pid /ns/time_for_children
+symbolic link.
+.IP \[bu]
+It is an IPC namespace, and a corresponding mount of an
+.I mqueue
+filesystem (see
+.BR mq_overview (7))
+refers to this namespace.
+.IP \[bu]
+It is a PID namespace, and a corresponding mount of a
+.BR proc (5)
+filesystem refers to this namespace.
+.SH EXAMPLES
+See
+.BR clone (2)
+and
+.BR user_namespaces (7).
+.SH SEE ALSO
+.BR nsenter (1),
+.BR readlink (1),
+.BR unshare (1),
+.BR clone (2),
+.BR ioctl_ns (2),
+.BR setns (2),
+.BR unshare (2),
+.BR proc (5),
+.BR capabilities (7),
+.BR cgroup_namespaces (7),
+.BR cgroups (7),
+.BR credentials (7),
+.BR ipc_namespaces (7),
+.BR network_namespaces (7),
+.BR pid_namespaces (7),
+.BR user_namespaces (7),
+.BR uts_namespaces (7),
+.BR lsns (8),
+.BR switch_root (8)