author | Alejandro Colomar <alx@kernel.org> | 2023-08-15 20:35:38 +0200 |
---|---|---|
committer | Alejandro Colomar <alx@kernel.org> | 2023-08-15 23:19:17 +0200 |
commit | bfc1299e7d6794265dc89806f601737d2bb958dc (patch) | |
tree | 160de735507a3f988d2dcc4d05a17e2c8c7d8bfa /man5 | |
parent | 30bcb8114ea32e33b99caacc018c68a5893468dc (diff) |
proc.5, proc_sys.5: Split /proc/sys/ from proc(5)
Signed-off-by: Alejandro Colomar <alx@kernel.org>
Diffstat (limited to 'man5')
-rw-r--r-- | man5/proc.5 | 1610 |
-rw-r--r-- | man5/proc_sys.5 | 1623 |
2 files changed, 1623 insertions, 1610 deletions
diff --git a/man5/proc.5 b/man5/proc.5 index d7d706e5f..de29415eb 100644 --- a/man5/proc.5 +++ b/man5/proc.5 @@ -1,7 +1,6 @@ '\" t .\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com> .\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com> -.\" and sysctl additions from Andries Brouwer (aeb@cwi.nl) .\" and System V IPC (as well as various other) additions from .\" Michael Kerrisk <mtk.manpages@gmail.com> .\" @@ -228,1615 +227,6 @@ hierarchy. .\" FIXME Document /proc/sched_debug (since Linux 2.6.23) .\" See also /proc/[pid]/sched .TP -.I /proc/sys -This directory (present since Linux 1.3.57) contains a number of files -and subdirectories corresponding to kernel variables. -These variables can be read and in some cases modified using -the \fI/proc\fP filesystem, and the (deprecated) -.BR sysctl (2) -system call. -.IP -String values may be terminated by either \[aq]\e0\[aq] or \[aq]\en\[aq]. -.IP -Integer and long values may be written either in decimal or in -hexadecimal notation (e.g., 0x3FFF). -When writing multiple integer or long values, these may be separated -by any of the following whitespace characters: -\[aq]\ \[aq], \[aq]\et\[aq], or \[aq]\en\[aq]. -Using other separators leads to the error -.BR EINVAL . -.TP -.IR /proc/sys/abi " (since Linux 2.4.10)" -This directory may contain files with application binary information. -.\" On some systems, it is not present. -See the Linux kernel source file -.I Documentation/sysctl/abi.rst -(or -.I Documentation/sysctl/abi.txt -before Linux 5.3) -for more information. -.TP -.I /proc/sys/debug -This directory may be empty. -.TP -.I /proc/sys/dev -This directory contains device-specific information (e.g., -.IR dev/cdrom/info ). -On -some systems, it may be empty. -.TP -.I /proc/sys/fs -This directory contains the files and subdirectories for kernel variables -related to filesystems. 
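The write rules quoted in the removed text for /proc/sys (integers in decimal or hexadecimal, separated only by space, tab, or newline) can be illustrated with a short Python sketch. This is not the kernel's parser: `parse_sysctl_ints` is a hypothetical helper, and raising `ValueError` merely stands in for the `EINVAL` error the text describes.

```python
import re

# Hypothetical helper: mimics the documented rules, not the kernel's real parser.
_INT_RE = re.compile(r"-?(?:0[xX][0-9a-fA-F]+|[0-9]+)")


def parse_sysctl_ints(text):
    """Split on space, tab, or newline and parse decimal or hex integers."""
    tokens = [t for t in re.split(r"[ \t\n]+", text) if t]
    values = []
    for tok in tokens:
        if not _INT_RE.fullmatch(tok):
            # Stand-in for EINVAL: a comma, for example, is not a valid separator.
            raise ValueError(f"invalid token: {tok!r}")
        base = 16 if tok.lstrip("-")[:2].lower() == "0x" else 10
        values.append(int(tok, base))
    return values
```

Calling `parse_sysctl_ints("0x3FFF 10")` yields the two integers, while a string such as `"1,2"` is rejected because the comma is not one of the accepted separators.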
-.TP -.IR /proc/sys/fs/aio\-max\-nr " and " /proc/sys/fs/aio\-nr " (since Linux 2.6.4)" -.I aio\-nr -is the running total of the number of events specified by -.BR io_setup (2) -calls for all currently active AIO contexts. -If -.I aio\-nr -reaches -.IR aio\-max\-nr , -then -.BR io_setup (2) -will fail with the error -.BR EAGAIN . -Raising -.I aio\-max\-nr -does not result in the preallocation or resizing -of any kernel data structures. -.TP -.I /proc/sys/fs/binfmt_misc -Documentation for files in this directory can be found -in the Linux kernel source in the file -.I Documentation/admin\-guide/binfmt\-misc.rst -(or in -.I Documentation/binfmt_misc.txt -on older kernels). -.TP -.IR /proc/sys/fs/dentry\-state " (since Linux 2.2)" -This file contains information about the status of the -directory cache (dcache). -The file contains six numbers, -.IR nr_dentry , -.IR nr_unused , -.I age_limit -(age in seconds), -.I want_pages -(pages requested by system) and two dummy values. -.RS -.IP \[bu] 3 -.I nr_dentry -is the number of allocated dentries (dcache entries). -This field is unused in Linux 2.2. -.IP \[bu] -.I nr_unused -is the number of unused dentries. -.IP \[bu] -.I age_limit -.\" looks like this is unused in Linux 2.2 to Linux 2.6 -is the age in seconds after which dcache entries -can be reclaimed when memory is short. -.IP \[bu] -.I want_pages -.\" looks like this is unused in Linux 2.2 to Linux 2.6 -is nonzero when the kernel has called shrink_dcache_pages() and the -dcache isn't pruned yet. -.RE -.TP -.I /proc/sys/fs/dir\-notify\-enable -This file can be used to disable or enable the -.I dnotify -interface described in -.BR fcntl (2) -on a system-wide basis. -A value of 0 in this file disables the interface, -and a value of 1 enables it. -.TP -.I /proc/sys/fs/dquot\-max -This file shows the maximum number of cached disk quota entries. -On some (2.4) systems, it is not present. 
-If the number of free cached disk quota entries is very low and -you have some awesome number of simultaneous system users, -you might want to raise the limit. -.TP -.I /proc/sys/fs/dquot\-nr -This file shows the number of allocated disk quota -entries and the number of free disk quota entries. -.TP -.IR /proc/sys/fs/epoll " (since Linux 2.6.28)" -This directory contains the file -.IR max_user_watches , -which can be used to limit the amount of kernel memory consumed by the -.I epoll -interface. -For further details, see -.BR epoll (7). -.TP -.I /proc/sys/fs/file\-max -This file defines -a system-wide limit on the number of open files for all processes. -System calls that fail when encountering this limit fail with the error -.BR ENFILE . -(See also -.BR setrlimit (2), -which can be used by a process to set the per-process limit, -.BR RLIMIT_NOFILE , -on the number of files it may open.) -If you get lots -of error messages in the kernel log about running out of file handles -(open file descriptions) -(look for "VFS: file\-max limit <number> reached"), -try increasing this value: -.IP -.in +4n -.EX -echo 100000 > /proc/sys/fs/file\-max -.EE -.in -.IP -Privileged processes -.RB ( CAP_SYS_ADMIN ) -can override the -.I file\-max -limit. -.TP -.I /proc/sys/fs/file\-nr -This (read-only) file contains three numbers: -the number of allocated file handles -(i.e., the number of open file descriptions; see -.BR open (2)); -the number of free file handles; -and the maximum number of file handles (i.e., the same value as -.IR /proc/sys/fs/file\-max ). -If the number of allocated file handles is close to the -maximum, you should consider increasing the maximum. -Before Linux 2.6, -the kernel allocated file handles dynamically, -but it didn't free them again. -Instead the free file handles were kept in a list for reallocation; -the "free file handles" value indicates the size of that list. 
-A large number of free file handles indicates that there was -a past peak in the usage of open file handles. -Since Linux 2.6, the kernel does deallocate freed file handles, -and the "free file handles" value is always zero. -.TP -.IR /proc/sys/fs/inode\-max " (only present until Linux 2.2)" -This file contains the maximum number of in-memory inodes. -This value should be 3\[en]4 times larger -than the value in -.IR file\-max , -since \fIstdin\fP, \fIstdout\fP -and network sockets also need an inode to handle them. -When you regularly run out of inodes, you need to increase this value. -.IP -Starting with Linux 2.4, -there is no longer a static limit on the number of inodes, -and this file is removed. -.TP -.I /proc/sys/fs/inode\-nr -This file contains the first two values from -.IR inode\-state . -.TP -.I /proc/sys/fs/inode\-state -This file -contains seven numbers: -.IR nr_inodes , -.IR nr_free_inodes , -.IR preshrink , -and four dummy values (always zero). -.IP -.I nr_inodes -is the number of inodes the system has allocated. -.\" This can be slightly more than -.\" .I inode\-max -.\" because Linux allocates them one page full at a time. -.I nr_free_inodes -represents the number of free inodes. -.IP -.I preshrink -is nonzero when the -.I nr_inodes -> -.I inode\-max -and the system needs to prune the inode list instead of allocating more; -since Linux 2.4, this field is a dummy value (always zero). -.TP -.IR /proc/sys/fs/inotify " (since Linux 2.6.13)" -This directory contains files -.IR max_queued_events ", " max_user_instances ", and " max_user_watches , -that can be used to limit the amount of kernel memory consumed by the -.I inotify -interface. -For further details, see -.BR inotify (7). -.TP -.I /proc/sys/fs/lease\-break\-time -This file specifies the grace period that the kernel grants to a process -holding a file lease -.RB ( fcntl (2)) -after it has sent a signal to that process notifying it -that another process is waiting to open the file. 
-If the lease holder does not remove or downgrade the lease within -this grace period, the kernel forcibly breaks the lease. -.TP -.I /proc/sys/fs/leases\-enable -This file can be used to enable or disable file leases -.RB ( fcntl (2)) -on a system-wide basis. -If this file contains the value 0, leases are disabled. -A nonzero value enables leases. -.TP -.IR /proc/sys/fs/mount\-max " (since Linux 4.9)" -.\" commit d29216842a85c7970c536108e093963f02714498 -The value in this file specifies the maximum number of mounts that may exist -in a mount namespace. -The default value in this file is 100,000. -.TP -.IR /proc/sys/fs/mqueue " (since Linux 2.6.6)" -This directory contains files -.IR msg_max ", " msgsize_max ", and " queues_max , -controlling the resources used by POSIX message queues. -See -.BR mq_overview (7) -for details. -.TP -.IR /proc/sys/fs/nr_open " (since Linux 2.6.25)" -.\" commit 9cfe015aa424b3c003baba3841a60dd9b5ad319b -This file imposes a ceiling on the value to which the -.B RLIMIT_NOFILE -resource limit can be raised (see -.BR getrlimit (2)). -This ceiling is enforced for both unprivileged and privileged process. -The default value in this file is 1048576. -(Before Linux 2.6.25, the ceiling for -.B RLIMIT_NOFILE -was hard-coded to the same value.) -.TP -.IR /proc/sys/fs/overflowgid " and " /proc/sys/fs/overflowuid -These files -allow you to change the value of the fixed UID and GID. -The default is 65534. -Some filesystems support only 16-bit UIDs and GIDs, although in Linux -UIDs and GIDs are 32 bits. -When one of these filesystems is mounted -with writes enabled, any UID or GID that would exceed 65535 is translated -to the overflow value before being written to disk. -.TP -.IR /proc/sys/fs/pipe\-max\-size " (since Linux 2.6.35)" -See -.BR pipe (7). -.TP -.IR /proc/sys/fs/pipe\-user\-pages\-hard " (since Linux 4.5)" -See -.BR pipe (7). -.TP -.IR /proc/sys/fs/pipe\-user\-pages\-soft " (since Linux 4.5)" -See -.BR pipe (7). 
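The translation described for overflowuid and overflowgid (any ID that would exceed 65535 is replaced by the overflow value before being written to a 16-bit-ID filesystem) can be sketched as follows. This is an illustration only: `to_16bit_id` is a hypothetical name, and the default of 65534 is the one stated in the text.

```python
OVERFLOW_DEFAULT = 65534  # default stated for overflowuid/overflowgid
MAX_16BIT_ID = 65535      # largest ID a 16-bit on-disk field can hold


def to_16bit_id(kernel_id, overflow=OVERFLOW_DEFAULT):
    """Return the ID that would be stored on a filesystem with 16-bit IDs."""
    if kernel_id > MAX_16BIT_ID:
        # The 32-bit ID does not fit, so the overflow value is stored instead.
        return overflow
    return kernel_id
```

The `overflow` parameter models a value changed through the two files; IDs that already fit in 16 bits pass through unchanged.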
-.TP -.IR /proc/sys/fs/protected_fifos " (since Linux 4.19)" -The value in this file is/can be set to one of the following: -.RS -.TP 4 -0 -Writing to FIFOs is unrestricted. -.TP -1 -Don't allow -.B O_CREAT -.BR open (2) -on FIFOs that the caller doesn't own in world-writable sticky directories, -unless the FIFO is owned by the owner of the directory. -.TP -2 -As for the value 1, -but the restriction also applies to group-writable sticky directories. -.RE -.IP -The intent of the above protections is to avoid unintentional writes to an -attacker-controlled FIFO when a program expected to create a regular file. -.TP -.IR /proc/sys/fs/protected_hardlinks " (since Linux 3.6)" -.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7 -When the value in this file is 0, -no restrictions are placed on the creation of hard links -(i.e., this is the historical behavior before Linux 3.6). -When the value in this file is 1, -a hard link can be created to a target file -only if one of the following conditions is true: -.RS -.IP \[bu] 3 -The calling process has the -.B CAP_FOWNER -capability in its user namespace -and the file UID has a mapping in the namespace. -.IP \[bu] -The filesystem UID of the process creating the link matches -the owner (UID) of the target file -(as described in -.BR credentials (7), -a process's filesystem UID is normally the same as its effective UID). -.IP \[bu] -All of the following conditions are true: -.RS 4 -.IP \[bu] 3 -the target is a regular file; -.IP \[bu] -the target file does not have its set-user-ID mode bit enabled; -.IP \[bu] -the target file does not have both its set-group-ID and -group-executable mode bits enabled; and -.IP \[bu] -the caller has permission to read and write the target file -(either via the file's permissions mask or because it has -suitable capabilities). -.RE -.RE -.IP -The default value in this file is 0. 
-Setting the value to 1 -prevents a longstanding class of security issues caused by -hard-link-based time-of-check, time-of-use races, -most commonly seen in world-writable directories such as -.IR /tmp . -The common method of exploiting this flaw -is to cross privilege boundaries when following a given hard link -(i.e., a root process follows a hard link created by another user). -Additionally, on systems without separated partitions, -this stops unauthorized users from "pinning" vulnerable set-user-ID and -set-group-ID files against being upgraded by -the administrator, or linking to special files. -.TP -.IR /proc/sys/fs/protected_regular " (since Linux 4.19)" -The value in this file is/can be set to one of the following: -.RS -.TP 4 -0 -Writing to regular files is unrestricted. -.TP -1 -Don't allow -.B O_CREAT -.BR open (2) -on regular files that the caller doesn't own in -world-writable sticky directories, -unless the regular file is owned by the owner of the directory. -.TP -2 -As for the value 1, -but the restriction also applies to group-writable sticky directories. -.RE -.IP -The intent of the above protections is similar to -.IR protected_fifos , -but allows an application to -avoid writes to an attacker-controlled regular file, -where the application expected to create one. -.TP -.IR /proc/sys/fs/protected_symlinks " (since Linux 3.6)" -.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7 -When the value in this file is 0, -no restrictions are placed on following symbolic links -(i.e., this is the historical behavior before Linux 3.6). 
-When the value in this file is 1, symbolic links are followed only -in the following circumstances: -.RS -.IP \[bu] 3 -the filesystem UID of the process following the link matches -the owner (UID) of the symbolic link -(as described in -.BR credentials (7), -a process's filesystem UID is normally the same as its effective UID); -.IP \[bu] -the link is not in a sticky world-writable directory; or -.IP \[bu] -the symbolic link and its parent directory have the same owner (UID) -.RE -.IP -A system call that fails to follow a symbolic link -because of the above restrictions returns the error -.B EACCES -in -.IR errno . -.IP -The default value in this file is 0. -Setting the value to 1 avoids a longstanding class of security issues -based on time-of-check, time-of-use races when accessing symbolic links. -.TP -.IR /proc/sys/fs/suid_dumpable " (since Linux 2.6.13)" -.\" The following is based on text from Documentation/sysctl/kernel.txt -The value in this file is assigned to a process's "dumpable" flag -in the circumstances described in -.BR prctl (2). -In effect, -the value in this file determines whether core dump files are -produced for set-user-ID or otherwise protected/tainted binaries. -The "dumpable" setting also affects the ownership of files in a process's -.IR /proc/ pid -directory, as described above. -.IP -Three different integer values can be specified: -.RS -.TP -\fI0\ (default)\fP -.\" In kernel source: SUID_DUMP_DISABLE -This provides the traditional (pre-Linux 2.6.13) behavior. -A core dump will not be produced for a process which has -changed credentials (by calling -.BR seteuid (2), -.BR setgid (2), -or similar, or by executing a set-user-ID or set-group-ID program) -or whose binary does not have read permission enabled. -.TP -\fI1\ ("debug")\fP -.\" In kernel source: SUID_DUMP_USER -All processes dump core when possible. -(Reasons why a process might nevertheless not dump core are described in -.BR core (5).) 
-The core dump is owned by the filesystem user ID of the dumping process -and no security is applied. -This is intended for system debugging situations only: -this mode is insecure because it allows unprivileged users to -examine the memory contents of privileged processes. -.TP -\fI2\ ("suidsafe")\fP -.\" In kernel source: SUID_DUMP_ROOT -Any binary which normally would not be dumped (see "0" above) -is dumped readable by root only. -This allows the user to remove the core dump file but not to read it. -For security reasons core dumps in this mode will not overwrite one -another or other files. -This mode is appropriate when administrators are -attempting to debug problems in a normal environment. -.IP -Additionally, since Linux 3.6, -.\" 9520628e8ceb69fa9a4aee6b57f22675d9e1b709 -.I /proc/sys/kernel/core_pattern -must either be an absolute pathname -or a pipe command, as detailed in -.BR core (5). -Warnings will be written to the kernel log if -.I core_pattern -does not follow these rules, and no core dump will be produced. -.\" 54b501992dd2a839e94e76aa392c392b55080ce8 -.RE -.IP -For details of the effect of a process's "dumpable" setting -on ptrace access mode checking, see -.BR ptrace (2). -.TP -.I /proc/sys/fs/super\-max -This file -controls the maximum number of superblocks, and -thus the maximum number of mounted filesystems the kernel -can have. -You need increase only -.I super\-max -if you need to mount more filesystems than the current value in -.I super\-max -allows you to. -.TP -.I /proc/sys/fs/super\-nr -This file -contains the number of filesystems currently mounted. -.TP -.I /proc/sys/kernel -This directory contains files controlling a range of kernel parameters, -as described below. -.TP -.I /proc/sys/kernel/acct -This file -contains three numbers: -.IR highwater , -.IR lowwater , -and -.IR frequency . -If BSD-style process accounting is enabled, these values control -its behavior. 
-If free space on filesystem where the log lives goes below -.I lowwater -percent, accounting suspends. -If free space gets above -.I highwater -percent, accounting resumes. -.I frequency -determines -how often the kernel checks the amount of free space (value is in -seconds). -Default values are 4, 2, and 30. -That is, suspend accounting if 2% or less space is free; resume it -if 4% or more space is free; consider information about amount of free space -valid for 30 seconds. -.TP -.IR /proc/sys/kernel/auto_msgmni " (Linux 2.6.27 to Linux 3.18)" -.\" commit 9eefe520c814f6f62c5d36a2ddcd3fb99dfdb30e (introduces feature) -.\" commit 0050ee059f7fc86b1df2527aaa14ed5dc72f9973 (rendered redundant) -From Linux 2.6.27 to Linux 3.18, -this file was used to control recomputing of the value in -.I /proc/sys/kernel/msgmni -upon the addition or removal of memory or upon IPC namespace creation/removal. -Echoing "1" into this file enabled -.I msgmni -automatic recomputing (and triggered a recomputation of -.I msgmni -based on the current amount of available memory and number of IPC namespaces). -Echoing "0" disabled automatic recomputing. -(Automatic recomputing was also disabled if a value was explicitly assigned to -.IR /proc/sys/kernel/msgmni .) -The default value in -.I auto_msgmni -was 1. -.IP -Since Linux 3.19, the content of this file has no effect (because -.I msgmni -.\" FIXME Must document the 3.19 'msgmni' changes. -defaults to near the maximum value possible), -and reads from this file always return the value "0". -.TP -.IR /proc/sys/kernel/cap_last_cap " (since Linux 3.2)" -See -.BR capabilities (7). -.TP -.IR /proc/sys/kernel/cap\-bound " (from Linux 2.2 to Linux 2.6.24)" -This file holds the value of the kernel -.I "capability bounding set" -(expressed as a signed decimal number). -This set is ANDed against the capabilities permitted to a process -during -.BR execve (2). 
-Starting with Linux 2.6.25, -the system-wide capability bounding set disappeared, -and was replaced by a per-thread bounding set; see -.BR capabilities (7). -.TP -.I /proc/sys/kernel/core_pattern -See -.BR core (5). -.TP -.I /proc/sys/kernel/core_pipe_limit -See -.BR core (5). -.TP -.I /proc/sys/kernel/core_uses_pid -See -.BR core (5). -.TP -.I /proc/sys/kernel/ctrl\-alt\-del -This file -controls the handling of Ctrl-Alt-Del from the keyboard. -When the value in this file is 0, Ctrl-Alt-Del is trapped and -sent to the -.BR init (1) -program to handle a graceful restart. -When the value is greater than zero, Linux's reaction to a Vulcan -Nerve Pinch (tm) will be an immediate reboot, without even -syncing its dirty buffers. -Note: when a program (like dosemu) has the keyboard in "raw" -mode, the Ctrl-Alt-Del is intercepted by the program before it -ever reaches the kernel tty layer, and it's up to the program -to decide what to do with it. -.TP -.IR /proc/sys/kernel/dmesg_restrict " (since Linux 2.6.37)" -The value in this file determines who can see kernel syslog contents. -A value of 0 in this file imposes no restrictions. -If the value is 1, only privileged users can read the kernel syslog. -(See -.BR syslog (2) -for more details.) -Since Linux 3.4, -.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8 -only users with the -.B CAP_SYS_ADMIN -capability may change the value in this file. 
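The suspend/resume rule given earlier for /proc/sys/kernel/acct is a hysteresis between the lowwater and highwater percentages. A minimal sketch, assuming the default values 4 and 2 from the text and the "2% or less" / "4% or more" reading of the thresholds; `accounting_active` is a hypothetical function, and the periodic check controlled by frequency is not modelled.

```python
def accounting_active(active, free_pct, highwater=4, lowwater=2):
    """Return whether accounting is active after one free-space check.

    active   -- whether accounting was active before the check
    free_pct -- free space, in percent, on the filesystem holding the log
    """
    if active and free_pct <= lowwater:
        return False  # too little free space: suspend
    if not active and free_pct >= highwater:
        return True   # enough free space again: resume
    return active     # between the two marks: keep the previous state
```

The gap between the two thresholds keeps accounting from flapping when free space hovers around a single value.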
-.TP -.IR /proc/sys/kernel/domainname " and " /proc/sys/kernel/hostname -can be used to set the NIS/YP domainname and the -hostname of your box in exactly the same way as the commands -.BR domainname (1) -and -.BR hostname (1), -that is: -.IP -.in +4n -.EX -.RB "#" " echo \[aq]darkstar\[aq] > /proc/sys/kernel/hostname" -.RB "#" " echo \[aq]mydomain\[aq] > /proc/sys/kernel/domainname" -.EE -.in -.IP -has the same effect as -.IP -.in +4n -.EX -.RB "#" " hostname \[aq]darkstar\[aq]" -.RB "#" " domainname \[aq]mydomain\[aq]" -.EE -.in -.IP -Note, however, that the classic darkstar.frop.org has the -hostname "darkstar" and DNS (Internet Domain Name Server) -domainname "frop.org", not to be confused with the NIS (Network -Information Service) or YP (Yellow Pages) domainname. -These two -domain names are in general different. -For a detailed discussion -see the -.BR hostname (1) -man page. -.TP -.I /proc/sys/kernel/hotplug -This file -contains the pathname for the hotplug policy agent. -The default value in this file is -.IR /sbin/hotplug . -.TP -.\" Removed in commit 87f504e5c78b910b0c1d6ffb89bc95e492322c84 (tglx/history.git) -.IR /proc/sys/kernel/htab\-reclaim " (before Linux 2.4.9.2)" -(PowerPC only) If this file is set to a nonzero value, -the PowerPC htab -.\" removed in commit 1b483a6a7b2998e9c98ad985d7494b9b725bd228, before Linux 2.6.28 -(see kernel file -.IR Documentation/powerpc/ppc_htab.txt ) -is pruned -each time the system hits the idle loop. -.TP -.I /proc/sys/kernel/keys/* -This directory contains various files that define parameters and limits -for the key-management facility. -These files are described in -.BR keyrings (7). -.TP -.IR /proc/sys/kernel/kptr_restrict " (since Linux 2.6.38)" -.\" 455cd5ab305c90ffc422dd2e0fb634730942b257 -The value in this file determines whether kernel addresses are exposed via -.I /proc -files and other interfaces. -A value of 0 in this file imposes no restrictions. 
-If the value is 1, kernel pointers printed using the -.I %pK -format specifier will be replaced with zeros unless the user has the -.B CAP_SYSLOG -capability. -If the value is 2, kernel pointers printed using the -.I %pK -format specifier will be replaced with zeros regardless -of the user's capabilities. -The initial default value for this file was 1, -but the default was changed -.\" commit 411f05f123cbd7f8aa1edcae86970755a6e2a9d9 -to 0 in Linux 2.6.39. -Since Linux 3.4, -.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8 -only users with the -.B CAP_SYS_ADMIN -capability can change the value in this file. -.TP -.I /proc/sys/kernel/l2cr -(PowerPC only) This file -contains a flag that controls the L2 cache of G3 processor -boards. -If 0, the cache is disabled. -Enabled if nonzero. -.TP -.I /proc/sys/kernel/modprobe -This file contains the pathname for the kernel module loader. -The default value is -.IR /sbin/modprobe . -The file is present only if the kernel is built with the -.B CONFIG_MODULES -.RB ( CONFIG_KMOD -in Linux 2.6.26 and earlier) -option enabled. -It is described by the Linux kernel source file -.I Documentation/kmod.txt -(present only in Linux 2.4 and earlier). -.TP -.IR /proc/sys/kernel/modules_disabled " (since Linux 2.6.31)" -.\" 3d43321b7015387cfebbe26436d0e9d299162ea1 -.\" From Documentation/sysctl/kernel.txt -A toggle value indicating if modules are allowed to be loaded -in an otherwise modular kernel. -This toggle defaults to off (0), but can be set true (1). -Once true, modules can be neither loaded nor unloaded, -and the toggle cannot be set back to false. -The file is present only if the kernel is built with the -.B CONFIG_MODULES -option enabled. -.TP -.IR /proc/sys/kernel/msgmax " (since Linux 2.2)" -This file defines -a system-wide limit specifying the maximum number of bytes in -a single message written on a System V message queue. 
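The three documented kptr_restrict values reduce to a small decision table. The sketch below only restates the text above in Python; `pk_pointer_zeroed` is a hypothetical name and does not correspond to any kernel function.

```python
def pk_pointer_zeroed(kptr_restrict, has_cap_syslog):
    """Return True if a pointer printed with %pK would be replaced with zeros."""
    if kptr_restrict == 0:
        return False               # no restrictions
    if kptr_restrict == 1:
        return not has_cap_syslog  # hidden unless the user has CAP_SYSLOG
    if kptr_restrict == 2:
        return True                # hidden regardless of capabilities
    raise ValueError("documented values are 0, 1, and 2")
```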
-.TP -.IR /proc/sys/kernel/msgmni " (since Linux 2.4)" -This file defines the system-wide limit on the number of -message queue identifiers. -See also -.IR /proc/sys/kernel/auto_msgmni . -.TP -.IR /proc/sys/kernel/msgmnb " (since Linux 2.2)" -This file defines a system-wide parameter used to initialize the -.I msg_qbytes -setting for subsequently created message queues. -The -.I msg_qbytes -setting specifies the maximum number of bytes that may be written to the -message queue. -.TP -.IR /proc/sys/kernel/ngroups_max " (since Linux 2.6.4)" -This is a read-only file that displays the upper limit on the -number of a process's group memberships. -.TP -.IR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)" -See -.BR pid_namespaces (7). -.TP -.IR /proc/sys/kernel/ostype " and " /proc/sys/kernel/osrelease -These files -give substrings of -.IR /proc/version . -.TP -.IR /proc/sys/kernel/overflowgid " and " /proc/sys/kernel/overflowuid -These files duplicate the files -.I /proc/sys/fs/overflowgid -and -.IR /proc/sys/fs/overflowuid . -.TP -.I /proc/sys/kernel/panic -This file gives read/write access to the kernel variable -.IR panic_timeout . -If this is zero, the kernel will loop on a panic; if nonzero, -it indicates that the kernel should autoreboot after this number -of seconds. -When you use the -software watchdog device driver, the recommended setting is 60. -.TP -.IR /proc/sys/kernel/panic_on_oops " (since Linux 2.5.68)" -This file controls the kernel's behavior when an oops -or BUG is encountered. -If this file contains 0, then the system -tries to continue operation. -If it contains 1, then the system -delays a few seconds (to give klogd time to record the oops output) -and then panics. -If the -.I /proc/sys/kernel/panic -file is also nonzero, then the machine will be rebooted. -.TP -.IR /proc/sys/kernel/pid_max " (since Linux 2.5.34)" -This file specifies the value at which PIDs wrap around -(i.e., the value in this file is one greater than the maximum PID). 
-PIDs greater than this value are not allocated; -thus, the value in this file also acts as a system-wide limit -on the total number of processes and threads. -The default value for this file, 32768, -results in the same range of PIDs as on earlier kernels. -On 32-bit platforms, 32768 is the maximum value for -.IR pid_max . -On 64-bit systems, -.I pid_max -can be set to any value up to 2\[ha]22 -.RB ( PID_MAX_LIMIT , -approximately 4 million). -.\" Prior to Linux 2.6.10, pid_max could also be raised above 32768 on 32-bit -.\" platforms, but this broke /proc/[pid] -.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=109513010926152&w=2 -.TP -.IR /proc/sys/kernel/powersave\-nap " (PowerPC only)" -This file contains a flag. -If set, Linux-PPC will use the "nap" mode of -powersaving, -otherwise the "doze" mode will be used. -.TP -.I /proc/sys/kernel/printk -See -.BR syslog (2). -.TP -.IR /proc/sys/kernel/pty " (since Linux 2.6.4)" -This directory contains two files relating to the number of UNIX 98 -pseudoterminals (see -.BR pts (4)) -on the system. -.TP -.I /proc/sys/kernel/pty/max -This file defines the maximum number of pseudoterminals. -.\" FIXME Document /proc/sys/kernel/pty/reserve -.\" New in Linux 3.3 -.\" commit e9aba5158a80098447ff207a452a3418ae7ee386 -.TP -.I /proc/sys/kernel/pty/nr -This read-only file -indicates how many pseudoterminals are currently in use. -.TP -.I /proc/sys/kernel/random -This directory -contains various parameters controlling the operation of the file -.IR /dev/random . -See -.BR random (4) -for further information. -.TP -.IR /proc/sys/kernel/random/uuid " (since Linux 2.4)" -Each read from this read-only file returns a randomly generated 128-bit UUID, -as a string in the standard UUID format. 
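Reading /proc/sys/kernel/random/uuid from a program might look like the following sketch. It assumes a Linux system where the file exists; `parse_kernel_uuid` and `read_kernel_uuid` are hypothetical helper names, and the standard-library `uuid` module is used only to validate the string.

```python
import uuid

UUID_PATH = "/proc/sys/kernel/random/uuid"


def parse_kernel_uuid(text):
    """Parse one value read from the file; raises ValueError if malformed."""
    return uuid.UUID(text.strip())


def read_kernel_uuid(path=UUID_PATH):
    """Each read returns a fresh UUID, so open and read the file once per call."""
    with open(path) as f:
        return parse_kernel_uuid(f.read())
```

Because every read produces a new value, two consecutive calls to `read_kernel_uuid()` are expected to return different UUIDs.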
-.TP -.IR /proc/sys/kernel/randomize_va_space " (since Linux 2.6.12)" -.\" Some further details can be found in Documentation/sysctl/kernel.txt -Select the address space layout randomization (ASLR) policy for the system -(on architectures that support ASLR). -Three values are supported for this file: -.RS -.TP -.B 0 -Turn ASLR off. -This is the default for architectures that don't support ASLR, -and when the kernel is booted with the -.I norandmaps -parameter. -.TP -.B 1 -Make the addresses of -.BR mmap (2) -allocations, the stack, and the VDSO page randomized. -Among other things, this means that shared libraries will be -loaded at randomized addresses. -The text segment of PIE-linked binaries will also be loaded -at a randomized address. -This value is the default if the kernel was configured with -.BR CONFIG_COMPAT_BRK . -.TP -.B 2 -(Since Linux 2.6.25) -.\" commit c1d171a002942ea2d93b4fbd0c9583c56fce0772 -Also support heap randomization. -This value is the default if the kernel was not configured with -.BR CONFIG_COMPAT_BRK . -.RE -.TP -.I /proc/sys/kernel/real\-root\-dev -This file is documented in the Linux kernel source file -.I Documentation/admin\-guide/initrd.rst -.\" commit 9d85025b0418163fae079c9ba8f8445212de8568 -(or -.I Documentation/initrd.txt -before Linux 4.10). -.TP -.IR /proc/sys/kernel/reboot\-cmd " (Sparc only)" -This file seems to be a way to give an argument to the SPARC -ROM/Flash boot loader. -Maybe to tell it what to do after -rebooting? -.TP -.I /proc/sys/kernel/rtsig\-max -(Up to and including Linux 2.6.7; see -.BR setrlimit (2)) -This file can be used to tune the maximum number -of POSIX real-time (queued) signals that can be outstanding -in the system. -.TP -.I /proc/sys/kernel/rtsig\-nr -(Up to and including Linux 2.6.7.) -This file shows the number of POSIX real-time signals currently queued. -.TP -.IR /proc/ pid /sched_autogroup_enabled " (since Linux 2.6.38)" -.\" commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a -See -.BR sched (7). 
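The three randomize_va_space policies described above can be summarized as a mapping from value to the regions that are randomized. This is only a restatement of the text: `randomized_regions` and the region labels are hypothetical names chosen for illustration.

```python
def randomized_regions(value):
    """Return the set of regions randomized under the given ASLR policy."""
    if value == 0:
        return set()  # ASLR turned off
    if value not in (1, 2):
        raise ValueError("documented values are 0, 1, and 2")
    # Value 1: mmap(2) allocations, the stack, the VDSO page, and PIE text.
    regions = {"mmap", "stack", "vdso", "pie_text"}
    if value == 2:
        regions.add("heap")  # value 2 additionally randomizes the heap
    return regions
```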
-.TP -.IR /proc/sys/kernel/sched_child_runs_first " (since Linux 2.6.23)" -If this file contains the value zero, then, after a -.BR fork (2), -the parent is first scheduled on the CPU. -If the file contains a nonzero value, -then the child is scheduled first on the CPU. -(Of course, on a multiprocessor system, -the parent and the child might both immediately be scheduled on a CPU.) -.TP -.IR /proc/sys/kernel/sched_rr_timeslice_ms " (since Linux 3.9)" -See -.BR sched_rr_get_interval (2). -.TP -.IR /proc/sys/kernel/sched_rt_period_us " (since Linux 2.6.25)" -See -.BR sched (7). -.TP -.IR /proc/sys/kernel/sched_rt_runtime_us " (since Linux 2.6.25)" -See -.BR sched (7). -.TP -.IR /proc/sys/kernel/seccomp " (since Linux 4.14)" -.\" commit 8e5f1ad116df6b0de65eac458d5e7c318d1c05af -This directory provides additional seccomp information and -configuration. -See -.BR seccomp (2) -for further details. -.TP -.IR /proc/sys/kernel/sem " (since Linux 2.4)" -This file contains 4 numbers defining limits for System V IPC semaphores. -These fields are, in order: -.RS -.TP -SEMMSL -The maximum semaphores per semaphore set. -.TP -SEMMNS -A system-wide limit on the number of semaphores in all semaphore sets. -.TP -SEMOPM -The maximum number of operations that may be specified in a -.BR semop (2) -call. -.TP -SEMMNI -A system-wide limit on the maximum number of semaphore identifiers. -.RE -.TP -.I /proc/sys/kernel/sg\-big\-buff -This file -shows the size of the generic SCSI device (sg) buffer. -You can't tune it just yet, but you could change it at -compile time by editing -.I include/scsi/sg.h -and changing -the value of -.BR SG_BIG_BUFF . -However, there shouldn't be any reason to change this value. 
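The four fields of /proc/sys/kernel/sem, in the order listed above, can be parsed into a named tuple. A minimal sketch: `SemLimits` and `parse_sem` are hypothetical names, and the sample string in the usage note is made up rather than taken from a real system.

```python
from collections import namedtuple

# Field order follows the description: SEMMSL, SEMMNS, SEMOPM, SEMMNI.
SemLimits = namedtuple("SemLimits", ["semmsl", "semmns", "semopm", "semmni"])


def parse_sem(text):
    """Parse the whitespace-separated contents of /proc/sys/kernel/sem."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return SemLimits(*(int(f) for f in fields))
```

For example, `parse_sem("250\t32000\t32\t128\n").semopm` gives the limit on operations per semop(2) call for that sample input.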
-.TP -.IR /proc/sys/kernel/shm_rmid_forced " (since Linux 3.1)" -.\" commit b34a6b1da371ed8af1221459a18c67970f7e3d53 -.\" See also Documentation/sysctl/kernel.txt -If this file is set to 1, all System V shared memory segments will -be marked for destruction as soon as the number of attached processes -falls to zero; -in other words, it is no longer possible to create shared memory segments -that exist independently of any attached process. -.IP -The effect is as though a -.BR shmctl (2) -.B IPC_RMID -is performed on all existing segments as well as all segments -created in the future (until this file is reset to 0). -Note that existing segments that are attached to no process will be -immediately destroyed when this file is set to 1. -Setting this option will also destroy segments that were created, -but never attached, -upon termination of the process that created the segment with -.BR shmget (2). -.IP -Setting this file to 1 provides a way of ensuring that -all System V shared memory segments are counted against the -resource usage and resource limits (see the description of -.B RLIMIT_AS -in -.BR getrlimit (2)) -of at least one process. -.IP -Because setting this file to 1 produces behavior that is nonstandard -and could also break existing applications, -the default value in this file is 0. -Set this file to 1 only if you have a good understanding -of the semantics of the applications using -System V shared memory on your system. -.TP -.IR /proc/sys/kernel/shmall " (since Linux 2.2)" -This file -contains the system-wide limit on the total number of pages of -System V shared memory. -.TP -.IR /proc/sys/kernel/shmmax " (since Linux 2.2)" -This file -can be used to query and set the run-time limit -on the maximum (System V IPC) shared memory segment size that can be -created. -Shared memory segments up to 1 GB are now supported in the -kernel. -This value defaults to -.BR SHMMAX . 
-.TP -.IR /proc/sys/kernel/shmmni " (since Linux 2.4)" -This file -specifies the system-wide maximum number of System V shared memory -segments that can be created. -.TP -.IR /proc/sys/kernel/sysctl_writes_strict " (since Linux 3.16)" -.\" commit f88083005ab319abba5d0b2e4e997558245493c8 -.\" commit 2ca9bb456ada8bcbdc8f77f8fc78207653bbaa92 -.\" commit f4aacea2f5d1a5f7e3154e967d70cf3f711bcd61 -.\" commit 24fe831c17ab8149413874f2fd4e5c8a41fcd294 -The value in this file determines how the file offset affects -the behavior of updating entries in files under -.IR /proc/sys . -The file has three possible values: -.RS -.TP 4 -\-1 -This provides legacy handling, with no printk warnings. -Each -.BR write (2) -must fully contain the value to be written, -and multiple writes on the same file descriptor -will overwrite the entire value, regardless of the file position. -.TP -0 -(default) This provides the same behavior as for \-1, -but printk warnings are written for processes that -perform writes when the file offset is not 0. -.TP -1 -Respect the file offset when writing strings into -.I /proc/sys -files. -Multiple writes will -.I append -to the value buffer. -Anything written beyond the maximum length -of the value buffer will be ignored. -Writes to numeric -.I /proc/sys -entries must always be at file offset 0 and the value must be -fully contained in the buffer provided to -.BR write (2). -.\" FIXME . -.\" With /proc/sys/kernel/sysctl_writes_strict==1, writes at an -.\" offset other than 0 do not generate an error. Instead, the -.\" write() succeeds, but the file is left unmodified. -.\" This is surprising. The behavior may change in the future. -.\" See thread.gmane.org/gmane.linux.man/9197 -.\" From: Michael Kerrisk (man-pages <mtk.manpages@...> -.\" Subject: sysctl_writes_strict documentation + an oddity? 
-.\" Newsgroups: gmane.linux.man, gmane.linux.kernel -.\" Date: 2015-05-09 08:54:11 GMT -.RE -.TP -.I /proc/sys/kernel/sysrq -This file controls the functions allowed to be invoked by the SysRq key. -By default, -the file contains 1 meaning that every possible SysRq request is allowed -(in older kernel versions, SysRq was disabled by default, -and you were required to specifically enable it at run-time, -but this is not the case any more). -Possible values in this file are: -.RS -.TP 5 -0 -Disable sysrq completely -.TP -1 -Enable all functions of sysrq -.TP -> 1 -Bit mask of allowed sysrq functions, as follows: -.PD 0 -.RS -.TP 5 -\ \ 2 -Enable control of console logging level -.TP -\ \ 4 -Enable control of keyboard (SAK, unraw) -.TP -\ \ 8 -Enable debugging dumps of processes etc. -.TP -\ 16 -Enable sync command -.TP -\ 32 -Enable remount read-only -.TP -\ 64 -Enable signaling of processes (term, kill, oom-kill) -.TP -128 -Allow reboot/poweroff -.TP -256 -Allow nicing of all real-time tasks -.RE -.PD -.RE -.IP -This file is present only if the -.B CONFIG_MAGIC_SYSRQ -kernel configuration option is enabled. -For further details see the Linux kernel source file -.I Documentation/admin\-guide/sysrq.rst -.\" commit 9d85025b0418163fae079c9ba8f8445212de8568 -(or -.I Documentation/sysrq.txt -before Linux 4.10). -.TP -.I /proc/sys/kernel/version -This file contains a string such as: -.IP -.in +4n -.EX -#5 Wed Feb 25 21:49:24 MET 1998 -.EE -.in -.IP -The "#5" means that -this is the fifth kernel built from this source base and the -date following it indicates the time the kernel was built. -.TP -.IR /proc/sys/kernel/threads\-max " (since Linux 2.3.11)" -.\" The following is based on Documentation/sysctl/kernel.txt -This file specifies the system-wide limit on the number of -threads (tasks) that can be created on the system. -.IP -Since Linux 4.1, -.\" commit 230633d109e35b0a24277498e773edeb79b4a331 -the value that can be written to -.I threads\-max -is bounded. 
-The minimum value that can be written is 20. -The maximum value that can be written is given by the -constant -.B FUTEX_TID_MASK -(0x3fffffff). -If a value outside of this range is written to -.IR threads\-max , -the error -.B EINVAL -occurs. -.IP -The value written is checked against the available RAM pages. -If the thread structures would occupy too much (more than 1/8th) -of the available RAM pages, -.I threads\-max -is reduced accordingly. -.TP -.IR /proc/sys/kernel/yama/ptrace_scope " (since Linux 3.5)" -See -.BR ptrace (2). -.TP -.IR /proc/sys/kernel/zero\-paged " (PowerPC only)" -This file -contains a flag. -When enabled (nonzero), Linux-PPC will pre-zero pages in -the idle loop, possibly speeding up get_free_pages. -.TP -.I /proc/sys/net -This directory contains networking stuff. -Explanations for some of the files under this directory can be found in -.BR tcp (7) -and -.BR ip (7). -.TP -.I /proc/sys/net/core/bpf_jit_enable -See -.BR bpf (2). -.TP -.I /proc/sys/net/core/somaxconn -This file defines a ceiling value for the -.I backlog -argument of -.BR listen (2); -see the -.BR listen (2) -manual page for details. -.TP -.I /proc/sys/proc -This directory may be empty. -.TP -.I /proc/sys/sunrpc -This directory supports Sun remote procedure call for network filesystem -(NFS). -On some systems, it is not present. -.TP -.IR /proc/sys/user " (since Linux 4.9)" -See -.BR namespaces (7). -.TP -.I /proc/sys/vm -This directory contains files for memory management tuning, buffer, and -cache management. -.TP -.IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)" -.\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009 -This file defines the amount of free memory (in KiB) on the system that -should be reserved for users with the capability -.BR CAP_SYS_ADMIN . -.IP -The default value in this file is the minimum of [3% of free pages, 8MiB] -expressed as KiB. 
-The default is intended to provide enough for the superuser -to log in and kill a process, if necessary, -under the default overcommit 'guess' mode (i.e., 0 in -.IR /proc/sys/vm/overcommit_memory ). -.IP -Systems running in "overcommit never" mode (i.e., 2 in -.IR /proc/sys/vm/overcommit_memory ) -should increase the value in this file to account -for the full virtual memory size of the programs used to recover (e.g., -.BR login (1), -.BR ssh (1), -and -.BR top (1)). -Otherwise, the superuser may not be able to log in to recover the system. -For example, on x86-64 a suitable value is 131072 (128MiB reserved). -.IP -Changing the value in this file takes effect whenever -an application requests memory. -.TP -.IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)" -When 1 is written to this file, all zones are compacted such that free -memory is available in contiguous blocks where possible. -The effect of this action can be seen by examining -.IR /proc/buddyinfo . -.IP -Present only if the kernel was configured with -.BR CONFIG_COMPACTION . -.TP -.IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)" -Writing to this file causes the kernel to drop clean caches, dentries, and -inodes from memory, causing that memory to become free. -This can be useful for memory management testing and -performing reproducible filesystem benchmarks. -Because writing to this file causes the benefits of caching to be lost, -it can degrade overall system performance. -.IP -To free pagecache, use: -.IP -.in +4n -.EX -echo 1 > /proc/sys/vm/drop_caches -.EE -.in -.IP -To free dentries and inodes, use: -.IP -.in +4n -.EX -echo 2 > /proc/sys/vm/drop_caches -.EE -.in -.IP -To free pagecache, dentries, and inodes, use: -.IP -.in +4n -.EX -echo 3 > /proc/sys/vm/drop_caches -.EE -.in -.IP -Because writing to this file is a nondestructive operation and dirty objects -are not freeable, the -user should run -.BR sync (1) -first.
-.TP -.IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)" -This writable file contains a group ID that is allowed -to allocate memory using huge pages. -If a process has a filesystem group ID or any supplementary group ID that -matches this group ID, -then it can make huge-page allocations without holding the -.B CAP_IPC_LOCK -capability; see -.BR memfd_create (2), -.BR mmap (2), -and -.BR shmget (2). -.TP -.IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)" -.\" The following is from Documentation/filesystems/proc.txt -If nonzero, this disables the new 32-bit memory-mapping layout; -the kernel will use the legacy (2.4) layout for all processes. -.TP -.IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)" -.\" The following is based on the text in Documentation/sysctl/vm.txt -Control how to kill processes when an uncorrected memory error -(typically a 2-bit error in a memory module) -that cannot be handled by the kernel -is detected in the background by hardware. -In some cases (like the page still having a valid copy on disk), -the kernel will handle the failure -transparently without affecting any applications. -But if there is no other up-to-date copy of the data, -it will kill processes to prevent any data corruptions from propagating. -.IP -The file has one of the following values: -.RS -.TP -.B 1 -Kill all processes that have the corrupted-and-not-reloadable page mapped -as soon as the corruption is detected. -Note that this is not supported for a few types of pages, -such as kernel internally -allocated data or the swap cache, but works for the majority of user pages. -.TP -.B 0 -Unmap the corrupted page from all processes and kill a process -only if it tries to access the page. -.RE -.IP -The kill is performed using a -.B SIGBUS -signal with -.I si_code -set to -.BR BUS_MCEERR_AO . -Processes can handle this if they want to; see -.BR sigaction (2) -for more details. 
-.IP -This feature is active only on architectures/platforms with advanced machine -check handling and depends on the hardware capabilities. -.IP -Applications can override the -.I memory_failure_early_kill -setting individually with the -.BR prctl (2) -.B PR_MCE_KILL -operation. -.IP -Present only if the kernel was configured with -.BR CONFIG_MEMORY_FAILURE . -.TP -.IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)" -.\" The following is based on the text in Documentation/sysctl/vm.txt -Enable memory failure recovery (when supported by the platform). -.RS -.TP -.B 1 -Attempt recovery. -.TP -.B 0 -Always panic on a memory failure. -.RE -.IP -Present only if the kernel was configured with -.BR CONFIG_MEMORY_FAILURE . -.TP -.IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)" -.\" The following is from Documentation/sysctl/vm.txt -Enables a system-wide task dump (excluding kernel threads) to be -produced when the kernel performs an OOM-killing. -The dump includes the following information -for each task (thread, process): -thread ID, real user ID, thread group ID (process ID), -virtual memory size, resident set size, -the CPU that the task is scheduled on, -oom_adj score (see the description of -.IR /proc/ pid /oom_adj ), -and command name. -This is helpful to determine why the OOM-killer was invoked -and to identify the rogue task that caused it. -.IP -If this contains the value zero, this information is suppressed. -On very large systems with thousands of tasks, -it may not be feasible to dump the memory state information for each one. -Such systems should not be forced to incur a performance penalty in -OOM situations when the information may not be desired. -.IP -If this is set to nonzero, this information is shown whenever the -OOM-killer actually kills a memory-hogging task. -.IP -The default value is 0. 
-.TP -.IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)" -.\" The following is from Documentation/sysctl/vm.txt -This enables or disables killing the OOM-triggering task in -out-of-memory situations. -.IP -If this is set to zero, the OOM-killer will scan through the entire -tasklist and select a task based on heuristics to kill. -This normally selects a rogue memory-hogging task that -frees up a large amount of memory when killed. -.IP -If this is set to nonzero, the OOM-killer simply kills the task that -triggered the out-of-memory condition. -This avoids a possibly expensive tasklist scan. -.IP -If -.I /proc/sys/vm/panic_on_oom -is nonzero, it takes precedence over whatever value is used in -.IR /proc/sys/vm/oom_kill_allocating_task . -.IP -The default value is 0. -.TP -.IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)" -.\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7 -This writable file provides an alternative to -.I /proc/sys/vm/overcommit_ratio -for controlling the -.I CommitLimit -when -.I /proc/sys/vm/overcommit_memory -has the value 2. -It allows the amount of memory overcommitting to be specified as -an absolute value (in kB), -rather than as a percentage, as is done with -.IR overcommit_ratio . -This allows for finer-grained control of -.I CommitLimit -on systems with extremely large memory sizes. -.IP -Only one of -.I overcommit_kbytes -or -.I overcommit_ratio -can have an effect: -if -.I overcommit_kbytes -has a nonzero value, then it is used to calculate -.IR CommitLimit , -otherwise -.I overcommit_ratio -is used. -Writing a value to either of these files causes the -value in the other file to be set to zero. -.TP -.I /proc/sys/vm/overcommit_memory -This file contains the kernel virtual memory accounting mode. 
-Values are: -.RS -.IP -0: heuristic overcommit (this is the default) -.br -1: always overcommit, never check -.br -2: always check, never overcommit -.RE -.IP -In mode 0, calls of -.BR mmap (2) -with -.B MAP_NORESERVE -are not checked, and the default check is very weak, -leading to the risk of getting a process "OOM-killed". -.IP -In mode 1, the kernel pretends there is always enough memory, -until memory actually runs out. -One use case for this mode is scientific computing applications -that employ large sparse arrays. -Before Linux 2.6.0, any nonzero value implies mode 1. -.IP -In mode 2 (available since Linux 2.6), the total virtual address space -that can be allocated -.RI ( CommitLimit -in -.IR /proc/meminfo ) -is calculated as -.IP -.in +4n -.EX -CommitLimit = (total_RAM \- total_huge_TLB) * - overcommit_ratio / 100 + total_swap -.EE -.in -.IP -where: -.RS -.IP \[bu] 3 -.I total_RAM -is the total amount of RAM on the system; -.IP \[bu] -.I total_huge_TLB -is the amount of memory set aside for huge pages; -.IP \[bu] -.I overcommit_ratio -is the value in -.IR /proc/sys/vm/overcommit_ratio ; -and -.IP \[bu] -.I total_swap -is the amount of swap space. -.RE -.IP -For example, on a system with 16 GB of physical RAM, 16 GB -of swap, no space dedicated to huge pages, and an -.I overcommit_ratio -of 50, this formula yields a -.I CommitLimit -of 24 GB. -.IP -Since Linux 3.14, if the value in -.I /proc/sys/vm/overcommit_kbytes -is nonzero, then -.I CommitLimit -is instead calculated as: -.IP -.in +4n -.EX -CommitLimit = overcommit_kbytes + total_swap -.EE -.in -.IP -See also the description of -.I /proc/sys/vm/admin_reserve_kbytes -and -.IR /proc/sys/vm/user_reserve_kbytes . -.TP -.IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)" -This writable file defines a percentage by which memory -can be overcommitted. -The default value in the file is 50. -See the description of -.IR /proc/sys/vm/overcommit_memory . 
-.TP -.IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)" -.\" The following is adapted from Documentation/sysctl/vm.txt -This enables or disables a kernel panic in -an out-of-memory situation. -.IP -If this file is set to the value 0, -the kernel's OOM-killer will kill some rogue process. -Usually, the OOM-killer is able to kill a rogue process and the -system will survive. -.IP -If this file is set to the value 1, -then the kernel normally panics when out-of-memory happens. -However, if a process limits allocations to certain nodes -using memory policies -.RB ( mbind (2) -.BR MPOL_BIND ) -or cpusets -.RB ( cpuset (7)) -and those nodes reach memory exhaustion status, -one process may be killed by the OOM-killer. -No panic occurs in this case: -because other nodes' memory may be free, -this means the system as a whole may not have reached -an out-of-memory situation yet. -.IP -If this file is set to the value 2, -the kernel always panics when an out-of-memory condition occurs. -.IP -The default value is 0. -1 and 2 are for failover of clustering. -Select either according to your policy of failover. -.TP -.I /proc/sys/vm/swappiness -.\" The following is from Documentation/sysctl/vm.txt -The value in this file controls how aggressively the kernel will swap -memory pages. -Higher values increase aggressiveness, lower values -decrease aggressiveness. -The default value is 60. -.TP -.IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)" -.\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd -Specifies an amount of memory (in KiB) to reserve for user processes. -This is intended to prevent a user from starting a single memory hogging -process, such that they cannot recover (kill the hog). -The value in this file has an effect only when -.I /proc/sys/vm/overcommit_memory -is set to 2 ("overcommit never" mode). -In this case, the system reserves an amount of memory that is the minimum -of [3% of current process size, -.IR user_reserve_kbytes ]. 
-.IP -The default value in this file is the minimum of [3% of free pages, 128MiB] -expressed as KiB. -.IP -If the value in this file is set to zero, -then a user will be allowed to allocate all free memory with a single process -(minus the amount reserved by -.IR /proc/sys/vm/admin_reserve_kbytes ). -Any subsequent attempts to execute a command will result in -"fork: Cannot allocate memory". -.IP -Changing the value in this file takes effect whenever -an application requests memory. -.TP -.IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)" -.\" cefdca0a86be517bc390fc4541e3674b8e7803b0 -This (writable) file exposes a flag that controls whether -unprivileged processes are allowed to employ -.BR userfaultfd (2). -If this file has the value 1, then unprivileged processes may use -.BR userfaultfd (2). -If this file has the value 0, then only processes that have the -.B CAP_SYS_PTRACE -capability may employ -.BR userfaultfd (2). -The default value in this file is 1. -.TP .IR /proc/sysrq\-trigger " (since Linux 2.4.21)" Writing a character to this file triggers the same SysRq function as typing ALT-SysRq-<character> (see the description of diff --git a/man5/proc_sys.5 b/man5/proc_sys.5 new file mode 100644 index 000000000..78f0c192c --- /dev/null +++ b/man5/proc_sys.5 @@ -0,0 +1,1623 @@ +'\" t +.\" Copyright (C) 1994, 1995, Daniel Quinlan <quinlan@yggdrasil.com> +.\" Copyright (C) 2002-2008, 2017, Michael Kerrisk <mtk.manpages@gmail.com> +.\" Copyright (C) , Andries Brouwer <aeb@cwi.nl> +.\" Copyright (C) 2023, Alejandro Colomar <alx@kernel.org> +.\" +.\" SPDX-License-Identifier: GPL-3.0-or-later +.\" +.TH proc_sys 5 (date) "Linux man-pages (unreleased)" +.SH NAME +/proc/sys/ \- system information, and sysctl pseudo-filesystem +.SH DESCRIPTION +.TP +.I /proc/sys/ +This directory (present since Linux 1.3.57) contains a number of files +and subdirectories corresponding to kernel variables. 
+These variables can be read and in some cases modified using +the \fI/proc\fP filesystem, and the (deprecated) +.BR sysctl (2) +system call. +.IP +String values may be terminated by either \[aq]\e0\[aq] or \[aq]\en\[aq]. +.IP +Integer and long values may be written either in decimal or in +hexadecimal notation (e.g., 0x3FFF). +When writing multiple integer or long values, these may be separated +by any of the following whitespace characters: +\[aq]\ \[aq], \[aq]\et\[aq], or \[aq]\en\[aq]. +Using other separators leads to the error +.BR EINVAL . +.TP +.IR /proc/sys/abi/ " (since Linux 2.4.10)" +This directory may contain files with application binary information. +.\" On some systems, it is not present. +See the Linux kernel source file +.I Documentation/sysctl/abi.rst +(or +.I Documentation/sysctl/abi.txt +before Linux 5.3) +for more information. +.TP +.I /proc/sys/debug/ +This directory may be empty. +.TP +.I /proc/sys/dev/ +This directory contains device-specific information (e.g., +.IR dev/cdrom/info ). +On +some systems, it may be empty. +.TP +.I /proc/sys/fs/ +This directory contains the files and subdirectories for kernel variables +related to filesystems. +.TP +.IR /proc/sys/fs/aio\-max\-nr " and " /proc/sys/fs/aio\-nr " (since Linux 2.6.4)" +.I aio\-nr +is the running total of the number of events specified by +.BR io_setup (2) +calls for all currently active AIO contexts. +If +.I aio\-nr +reaches +.IR aio\-max\-nr , +then +.BR io_setup (2) +will fail with the error +.BR EAGAIN . +Raising +.I aio\-max\-nr +does not result in the preallocation or resizing +of any kernel data structures. +.TP +.I /proc/sys/fs/binfmt_misc +Documentation for files in this directory can be found +in the Linux kernel source in the file +.I Documentation/admin\-guide/binfmt\-misc.rst +(or in +.I Documentation/binfmt_misc.txt +on older kernels). 
+.TP +.IR /proc/sys/fs/dentry\-state " (since Linux 2.2)" +This file contains information about the status of the +directory cache (dcache). +The file contains six numbers, +.IR nr_dentry , +.IR nr_unused , +.I age_limit +(age in seconds), +.I want_pages +(pages requested by system) and two dummy values. +.RS +.IP \[bu] 3 +.I nr_dentry +is the number of allocated dentries (dcache entries). +This field is unused in Linux 2.2. +.IP \[bu] +.I nr_unused +is the number of unused dentries. +.IP \[bu] +.I age_limit +.\" looks like this is unused in Linux 2.2 to Linux 2.6 +is the age in seconds after which dcache entries +can be reclaimed when memory is short. +.IP \[bu] +.I want_pages +.\" looks like this is unused in Linux 2.2 to Linux 2.6 +is nonzero when the kernel has called shrink_dcache_pages() and the +dcache isn't pruned yet. +.RE +.TP +.I /proc/sys/fs/dir\-notify\-enable +This file can be used to disable or enable the +.I dnotify +interface described in +.BR fcntl (2) +on a system-wide basis. +A value of 0 in this file disables the interface, +and a value of 1 enables it. +.TP +.I /proc/sys/fs/dquot\-max +This file shows the maximum number of cached disk quota entries. +On some (2.4) systems, it is not present. +If the number of free cached disk quota entries is very low and +you have some awesome number of simultaneous system users, +you might want to raise the limit. +.TP +.I /proc/sys/fs/dquot\-nr +This file shows the number of allocated disk quota +entries and the number of free disk quota entries. +.TP +.IR /proc/sys/fs/epoll/ " (since Linux 2.6.28)" +This directory contains the file +.IR max_user_watches , +which can be used to limit the amount of kernel memory consumed by the +.I epoll +interface. +For further details, see +.BR epoll (7). +.TP +.I /proc/sys/fs/file\-max +This file defines +a system-wide limit on the number of open files for all processes. +System calls that fail when encountering this limit fail with the error +.BR ENFILE . 
+(See also +.BR setrlimit (2), +which can be used by a process to set the per-process limit, +.BR RLIMIT_NOFILE , +on the number of files it may open.) +If you get lots +of error messages in the kernel log about running out of file handles +(open file descriptions) +(look for "VFS: file\-max limit <number> reached"), +try increasing this value: +.IP +.in +4n +.EX +echo 100000 > /proc/sys/fs/file\-max +.EE +.in +.IP +Privileged processes +.RB ( CAP_SYS_ADMIN ) +can override the +.I file\-max +limit. +.TP +.I /proc/sys/fs/file\-nr +This (read-only) file contains three numbers: +the number of allocated file handles +(i.e., the number of open file descriptions; see +.BR open (2)); +the number of free file handles; +and the maximum number of file handles (i.e., the same value as +.IR /proc/sys/fs/file\-max ). +If the number of allocated file handles is close to the +maximum, you should consider increasing the maximum. +Before Linux 2.6, +the kernel allocated file handles dynamically, +but it didn't free them again. +Instead the free file handles were kept in a list for reallocation; +the "free file handles" value indicates the size of that list. +A large number of free file handles indicates that there was +a past peak in the usage of open file handles. +Since Linux 2.6, the kernel does deallocate freed file handles, +and the "free file handles" value is always zero. +.TP +.IR /proc/sys/fs/inode\-max " (only present until Linux 2.2)" +This file contains the maximum number of in-memory inodes. +This value should be 3\[en]4 times larger +than the value in +.IR file\-max , +since \fIstdin\fP, \fIstdout\fP +and network sockets also need an inode to handle them. +When you regularly run out of inodes, you need to increase this value. +.IP +Starting with Linux 2.4, +there is no longer a static limit on the number of inodes, +and this file is removed. +.TP +.I /proc/sys/fs/inode\-nr +This file contains the first two values from +.IR inode\-state . 
+.TP +.I /proc/sys/fs/inode\-state +This file +contains seven numbers: +.IR nr_inodes , +.IR nr_free_inodes , +.IR preshrink , +and four dummy values (always zero). +.IP +.I nr_inodes +is the number of inodes the system has allocated. +.\" This can be slightly more than +.\" .I inode\-max +.\" because Linux allocates them one page full at a time. +.I nr_free_inodes +represents the number of free inodes. +.IP +.I preshrink +is nonzero when the +.I nr_inodes +> +.I inode\-max +and the system needs to prune the inode list instead of allocating more; +since Linux 2.4, this field is a dummy value (always zero). +.TP +.IR /proc/sys/fs/inotify/ " (since Linux 2.6.13)" +This directory contains files +.IR max_queued_events ", " max_user_instances ", and " max_user_watches , +that can be used to limit the amount of kernel memory consumed by the +.I inotify +interface. +For further details, see +.BR inotify (7). +.TP +.I /proc/sys/fs/lease\-break\-time +This file specifies the grace period that the kernel grants to a process +holding a file lease +.RB ( fcntl (2)) +after it has sent a signal to that process notifying it +that another process is waiting to open the file. +If the lease holder does not remove or downgrade the lease within +this grace period, the kernel forcibly breaks the lease. +.TP +.I /proc/sys/fs/leases\-enable +This file can be used to enable or disable file leases +.RB ( fcntl (2)) +on a system-wide basis. +If this file contains the value 0, leases are disabled. +A nonzero value enables leases. +.TP +.IR /proc/sys/fs/mount\-max " (since Linux 4.9)" +.\" commit d29216842a85c7970c536108e093963f02714498 +The value in this file specifies the maximum number of mounts that may exist +in a mount namespace. +The default value in this file is 100,000. +.TP +.IR /proc/sys/fs/mqueue/ " (since Linux 2.6.6)" +This directory contains files +.IR msg_max ", " msgsize_max ", and " queues_max , +controlling the resources used by POSIX message queues. 
+See +.BR mq_overview (7) +for details. +.TP +.IR /proc/sys/fs/nr_open " (since Linux 2.6.25)" +.\" commit 9cfe015aa424b3c003baba3841a60dd9b5ad319b +This file imposes a ceiling on the value to which the +.B RLIMIT_NOFILE +resource limit can be raised (see +.BR getrlimit (2)). +This ceiling is enforced for both unprivileged and privileged processes. +The default value in this file is 1048576. +(Before Linux 2.6.25, the ceiling for +.B RLIMIT_NOFILE +was hard-coded to the same value.) +.TP +.IR /proc/sys/fs/overflowgid " and " /proc/sys/fs/overflowuid +These files +allow you to change the value of the fixed UID and GID. +The default is 65534. +Some filesystems support only 16-bit UIDs and GIDs, although in Linux +UIDs and GIDs are 32 bits. +When one of these filesystems is mounted +with writes enabled, any UID or GID that would exceed 65535 is translated +to the overflow value before being written to disk. +.TP +.IR /proc/sys/fs/pipe\-max\-size " (since Linux 2.6.35)" +See +.BR pipe (7). +.TP +.IR /proc/sys/fs/pipe\-user\-pages\-hard " (since Linux 4.5)" +See +.BR pipe (7). +.TP +.IR /proc/sys/fs/pipe\-user\-pages\-soft " (since Linux 4.5)" +See +.BR pipe (7). +.TP +.IR /proc/sys/fs/protected_fifos " (since Linux 4.19)" +The value in this file is/can be set to one of the following: +.RS +.TP 4 +0 +Writing to FIFOs is unrestricted. +.TP +1 +Don't allow +.B O_CREAT +.BR open (2) +on FIFOs that the caller doesn't own in world-writable sticky directories, +unless the FIFO is owned by the owner of the directory. +.TP +2 +As for the value 1, +but the restriction also applies to group-writable sticky directories. +.RE +.IP +The intent of the above protections is to avoid unintentional writes to an +attacker-controlled FIFO when a program expected to create a regular file.
+.TP +.IR /proc/sys/fs/protected_hardlinks " (since Linux 3.6)" +.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7 +When the value in this file is 0, +no restrictions are placed on the creation of hard links +(i.e., this is the historical behavior before Linux 3.6). +When the value in this file is 1, +a hard link can be created to a target file +only if one of the following conditions is true: +.RS +.IP \[bu] 3 +The calling process has the +.B CAP_FOWNER +capability in its user namespace +and the file UID has a mapping in the namespace. +.IP \[bu] +The filesystem UID of the process creating the link matches +the owner (UID) of the target file +(as described in +.BR credentials (7), +a process's filesystem UID is normally the same as its effective UID). +.IP \[bu] +All of the following conditions are true: +.RS 4 +.IP \[bu] 3 +the target is a regular file; +.IP \[bu] +the target file does not have its set-user-ID mode bit enabled; +.IP \[bu] +the target file does not have both its set-group-ID and +group-executable mode bits enabled; and +.IP \[bu] +the caller has permission to read and write the target file +(either via the file's permissions mask or because it has +suitable capabilities). +.RE +.RE +.IP +The default value in this file is 0. +Setting the value to 1 +prevents a longstanding class of security issues caused by +hard-link-based time-of-check, time-of-use races, +most commonly seen in world-writable directories such as +.IR /tmp . +The common method of exploiting this flaw +is to cross privilege boundaries when following a given hard link +(i.e., a root process follows a hard link created by another user). +Additionally, on systems without separated partitions, +this stops unauthorized users from "pinning" vulnerable set-user-ID and +set-group-ID files against being upgraded by +the administrator, or linking to special files. 
+.TP +.IR /proc/sys/fs/protected_regular " (since Linux 4.19)" +The value in this file is/can be set to one of the following: +.RS +.TP 4 +0 +Writing to regular files is unrestricted. +.TP +1 +Don't allow +.B O_CREAT +.BR open (2) +on regular files that the caller doesn't own in +world-writable sticky directories, +unless the regular file is owned by the owner of the directory. +.TP +2 +As for the value 1, +but the restriction also applies to group-writable sticky directories. +.RE +.IP +The intent of the above protections is similar to +.IR protected_fifos , +but allows an application to +avoid writes to an attacker-controlled regular file, +where the application expected to create one. +.TP +.IR /proc/sys/fs/protected_symlinks " (since Linux 3.6)" +.\" commit 800179c9b8a1e796e441674776d11cd4c05d61d7 +When the value in this file is 0, +no restrictions are placed on following symbolic links +(i.e., this is the historical behavior before Linux 3.6). +When the value in this file is 1, symbolic links are followed only +in the following circumstances: +.RS +.IP \[bu] 3 +the filesystem UID of the process following the link matches +the owner (UID) of the symbolic link +(as described in +.BR credentials (7), +a process's filesystem UID is normally the same as its effective UID); +.IP \[bu] +the link is not in a sticky world-writable directory; or +.IP \[bu] +the symbolic link and its parent directory have the same owner (UID) +.RE +.IP +A system call that fails to follow a symbolic link +because of the above restrictions returns the error +.B EACCES +in +.IR errno . +.IP +The default value in this file is 0. +Setting the value to 1 avoids a longstanding class of security issues +based on time-of-check, time-of-use races when accessing symbolic links. 
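The three protected_symlinks circumstances above are alternatives: any one of them is enough for the link to be followed. A minimal Python sketch of that decision follows. The function name and parameters are hypothetical, and the sketch models only the rules as worded above, not the kernel code.

```python
def may_follow_symlink(protected_symlinks, fsuid, link_uid, dir_uid,
                       dir_sticky_world_writable):
    """Return True if the symbolic link would be followed.

    fsuid: filesystem UID of the process following the link
    link_uid: owner (UID) of the symbolic link
    dir_uid: owner (UID) of the link's parent directory
    dir_sticky_world_writable: parent is sticky and world-writable
    """
    if protected_symlinks == 0:
        return True  # historical behavior: no restrictions
    return (fsuid == link_uid
            or not dir_sticky_world_writable
            or link_uid == dir_uid)
```

When the function returns False, the corresponding system call would fail with EACCES, as described above.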
+.TP +.IR /proc/sys/fs/suid_dumpable " (since Linux 2.6.13)" +.\" The following is based on text from Documentation/sysctl/kernel.txt +The value in this file is assigned to a process's "dumpable" flag +in the circumstances described in +.BR prctl (2). +In effect, +the value in this file determines whether core dump files are +produced for set-user-ID or otherwise protected/tainted binaries. +The "dumpable" setting also affects the ownership of files in a process's +.IR /proc/ pid +directory, as described above. +.IP +Three different integer values can be specified: +.RS +.TP +\fI0\ (default)\fP +.\" In kernel source: SUID_DUMP_DISABLE +This provides the traditional (pre-Linux 2.6.13) behavior. +A core dump will not be produced for a process which has +changed credentials (by calling +.BR seteuid (2), +.BR setgid (2), +or similar, or by executing a set-user-ID or set-group-ID program) +or whose binary does not have read permission enabled. +.TP +\fI1\ ("debug")\fP +.\" In kernel source: SUID_DUMP_USER +All processes dump core when possible. +(Reasons why a process might nevertheless not dump core are described in +.BR core (5).) +The core dump is owned by the filesystem user ID of the dumping process +and no security is applied. +This is intended for system debugging situations only: +this mode is insecure because it allows unprivileged users to +examine the memory contents of privileged processes. +.TP +\fI2\ ("suidsafe")\fP +.\" In kernel source: SUID_DUMP_ROOT +Any binary which normally would not be dumped (see "0" above) +is dumped readable by root only. +This allows the user to remove the core dump file but not to read it. +For security reasons core dumps in this mode will not overwrite one +another or other files. +This mode is appropriate when administrators are +attempting to debug problems in a normal environment. 
+.IP +Additionally, since Linux 3.6, +.\" 9520628e8ceb69fa9a4aee6b57f22675d9e1b709 +.I /proc/sys/kernel/core_pattern +must either be an absolute pathname +or a pipe command, as detailed in +.BR core (5). +Warnings will be written to the kernel log if +.I core_pattern +does not follow these rules, and no core dump will be produced. +.\" 54b501992dd2a839e94e76aa392c392b55080ce8 +.RE +.IP +For details of the effect of a process's "dumpable" setting +on ptrace access mode checking, see +.BR ptrace (2). +.TP +.I /proc/sys/fs/super\-max +This file +controls the maximum number of superblocks, and +thus the maximum number of mounted filesystems the kernel +can have. +You need to increase +.I super\-max +only if you need to mount more filesystems than the current value in +.I super\-max +allows. +.TP +.I /proc/sys/fs/super\-nr +This file +contains the number of filesystems currently mounted. +.TP +.I /proc/sys/kernel/ +This directory contains files controlling a range of kernel parameters, +as described below. +.TP +.I /proc/sys/kernel/acct +This file +contains three numbers: +.IR highwater , +.IR lowwater , +and +.IR frequency . +If BSD-style process accounting is enabled, these values control +its behavior. +If free space on the filesystem where the log lives goes below +.I lowwater +percent, accounting suspends. +If free space gets above +.I highwater +percent, accounting resumes. +.I frequency +determines +how often the kernel checks the amount of free space (value is in +seconds). +Default values are 4, 2, and 30. +That is, suspend accounting if 2% or less space is free; resume it +if 4% or more space is free; consider information about amount of free space +valid for 30 seconds. 
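The highwater and lowwater values in /proc/sys/kernel/acct describe a hysteresis: accounting stays in its current state while free space is between the two thresholds. The Python sketch below illustrates this with the default values 4 and 2, using the "2% or less" and "4% or more" wording above. The function name is hypothetical, and the frequency field, which governs how often the check runs, is not modeled.

```python
def accounting_active(active, free_percent, highwater=4, lowwater=2):
    """Return the new accounting state after one free-space check.

    active: whether accounting is currently running
    free_percent: free space on the filesystem holding the log, in percent
    """
    if active and free_percent <= lowwater:
        return False   # suspend: free space fell to lowwater or below
    if not active and free_percent >= highwater:
        return True    # resume: free space rose to highwater or above
    return active      # between the thresholds: keep the current state
```

For example, at 3% free space a running accounting process keeps running and a suspended one stays suspended.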
+.TP +.IR /proc/sys/kernel/auto_msgmni " (Linux 2.6.27 to Linux 3.18)" +.\" commit 9eefe520c814f6f62c5d36a2ddcd3fb99dfdb30e (introduces feature) +.\" commit 0050ee059f7fc86b1df2527aaa14ed5dc72f9973 (rendered redundant) +From Linux 2.6.27 to Linux 3.18, +this file was used to control recomputing of the value in +.I /proc/sys/kernel/msgmni +upon the addition or removal of memory or upon IPC namespace creation/removal. +Echoing "1" into this file enabled +.I msgmni +automatic recomputing (and triggered a recomputation of +.I msgmni +based on the current amount of available memory and number of IPC namespaces). +Echoing "0" disabled automatic recomputing. +(Automatic recomputing was also disabled if a value was explicitly assigned to +.IR /proc/sys/kernel/msgmni .) +The default value in +.I auto_msgmni +was 1. +.IP +Since Linux 3.19, the content of this file has no effect (because +.I msgmni +.\" FIXME Must document the 3.19 'msgmni' changes. +defaults to near the maximum value possible), +and reads from this file always return the value "0". +.TP +.IR /proc/sys/kernel/cap_last_cap " (since Linux 3.2)" +See +.BR capabilities (7). +.TP +.IR /proc/sys/kernel/cap\-bound " (from Linux 2.2 to Linux 2.6.24)" +This file holds the value of the kernel +.I "capability bounding set" +(expressed as a signed decimal number). +This set is ANDed against the capabilities permitted to a process +during +.BR execve (2). +Starting with Linux 2.6.25, +the system-wide capability bounding set disappeared, +and was replaced by a per-thread bounding set; see +.BR capabilities (7). +.TP +.I /proc/sys/kernel/core_pattern +See +.BR core (5). +.TP +.I /proc/sys/kernel/core_pipe_limit +See +.BR core (5). +.TP +.I /proc/sys/kernel/core_uses_pid +See +.BR core (5). +.TP +.I /proc/sys/kernel/ctrl\-alt\-del +This file +controls the handling of Ctrl-Alt-Del from the keyboard. +When the value in this file is 0, Ctrl-Alt-Del is trapped and +sent to the +.BR init (1) +program to handle a graceful restart. 
+When the value is greater than zero, Linux's reaction to a Vulcan +Nerve Pinch (tm) will be an immediate reboot, without even +syncing its dirty buffers. +Note: when a program (like dosemu) has the keyboard in "raw" +mode, the Ctrl-Alt-Del is intercepted by the program before it +ever reaches the kernel tty layer, and it's up to the program +to decide what to do with it. +.TP +.IR /proc/sys/kernel/dmesg_restrict " (since Linux 2.6.37)" +The value in this file determines who can see kernel syslog contents. +A value of 0 in this file imposes no restrictions. +If the value is 1, only privileged users can read the kernel syslog. +(See +.BR syslog (2) +for more details.) +Since Linux 3.4, +.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8 +only users with the +.B CAP_SYS_ADMIN +capability may change the value in this file. +.TP +.IR /proc/sys/kernel/domainname " and " /proc/sys/kernel/hostname +can be used to set the NIS/YP domainname and the +hostname of your box in exactly the same way as the commands +.BR domainname (1) +and +.BR hostname (1), +that is: +.IP +.in +4n +.EX +.RB "#" " echo \[aq]darkstar\[aq] > /proc/sys/kernel/hostname" +.RB "#" " echo \[aq]mydomain\[aq] > /proc/sys/kernel/domainname" +.EE +.in +.IP +has the same effect as +.IP +.in +4n +.EX +.RB "#" " hostname \[aq]darkstar\[aq]" +.RB "#" " domainname \[aq]mydomain\[aq]" +.EE +.in +.IP +Note, however, that the classic darkstar.frop.org has the +hostname "darkstar" and DNS (Internet Domain Name Server) +domainname "frop.org", not to be confused with the NIS (Network +Information Service) or YP (Yellow Pages) domainname. +These two +domain names are in general different. +For a detailed discussion +see the +.BR hostname (1) +man page. +.TP +.I /proc/sys/kernel/hotplug +This file +contains the pathname for the hotplug policy agent. +The default value in this file is +.IR /sbin/hotplug . 
+.TP +.\" Removed in commit 87f504e5c78b910b0c1d6ffb89bc95e492322c84 (tglx/history.git) +.IR /proc/sys/kernel/htab\-reclaim " (before Linux 2.4.9.2)" +(PowerPC only) If this file is set to a nonzero value, +the PowerPC htab +.\" removed in commit 1b483a6a7b2998e9c98ad985d7494b9b725bd228, before Linux 2.6.28 +(see kernel file +.IR Documentation/powerpc/ppc_htab.txt ) +is pruned +each time the system hits the idle loop. +.TP +.I /proc/sys/kernel/keys/ +This directory contains various files that define parameters and limits +for the key-management facility. +These files are described in +.BR keyrings (7). +.TP +.IR /proc/sys/kernel/kptr_restrict " (since Linux 2.6.38)" +.\" 455cd5ab305c90ffc422dd2e0fb634730942b257 +The value in this file determines whether kernel addresses are exposed via +.I /proc +files and other interfaces. +A value of 0 in this file imposes no restrictions. +If the value is 1, kernel pointers printed using the +.I %pK +format specifier will be replaced with zeros unless the user has the +.B CAP_SYSLOG +capability. +If the value is 2, kernel pointers printed using the +.I %pK +format specifier will be replaced with zeros regardless +of the user's capabilities. +The initial default value for this file was 1, +but the default was changed +.\" commit 411f05f123cbd7f8aa1edcae86970755a6e2a9d9 +to 0 in Linux 2.6.39. +Since Linux 3.4, +.\" commit 620f6e8e855d6d447688a5f67a4e176944a084e8 +only users with the +.B CAP_SYS_ADMIN +capability can change the value in this file. +.TP +.I /proc/sys/kernel/l2cr +(PowerPC only) This file +contains a flag that controls the L2 cache of G3 processor +boards. +If 0, the cache is disabled. +Enabled if nonzero. +.TP +.I /proc/sys/kernel/modprobe +This file contains the pathname for the kernel module loader. +The default value is +.IR /sbin/modprobe . +The file is present only if the kernel is built with the +.B CONFIG_MODULES +.RB ( CONFIG_KMOD +in Linux 2.6.26 and earlier) +option enabled. 
+It is described by the Linux kernel source file +.I Documentation/kmod.txt +(present only in Linux 2.4 and earlier). +.TP +.IR /proc/sys/kernel/modules_disabled " (since Linux 2.6.31)" +.\" 3d43321b7015387cfebbe26436d0e9d299162ea1 +.\" From Documentation/sysctl/kernel.txt +A toggle value indicating if modules are allowed to be loaded +in an otherwise modular kernel. +This toggle defaults to off (0), but can be set true (1). +Once true, modules can be neither loaded nor unloaded, +and the toggle cannot be set back to false. +The file is present only if the kernel is built with the +.B CONFIG_MODULES +option enabled. +.TP +.IR /proc/sys/kernel/msgmax " (since Linux 2.2)" +This file defines +a system-wide limit specifying the maximum number of bytes in +a single message written on a System V message queue. +.TP +.IR /proc/sys/kernel/msgmni " (since Linux 2.4)" +This file defines the system-wide limit on the number of +message queue identifiers. +See also +.IR /proc/sys/kernel/auto_msgmni . +.TP +.IR /proc/sys/kernel/msgmnb " (since Linux 2.2)" +This file defines a system-wide parameter used to initialize the +.I msg_qbytes +setting for subsequently created message queues. +The +.I msg_qbytes +setting specifies the maximum number of bytes that may be written to the +message queue. +.TP +.IR /proc/sys/kernel/ngroups_max " (since Linux 2.6.4)" +This is a read-only file that displays the upper limit on the +number of a process's group memberships. +.TP +.IR /proc/sys/kernel/ns_last_pid " (since Linux 3.3)" +See +.BR pid_namespaces (7). +.TP +.IR /proc/sys/kernel/ostype " and " /proc/sys/kernel/osrelease +These files +give substrings of +.IR /proc/version . +.TP +.IR /proc/sys/kernel/overflowgid " and " /proc/sys/kernel/overflowuid +These files duplicate the files +.I /proc/sys/fs/overflowgid +and +.IR /proc/sys/fs/overflowuid . +.TP +.I /proc/sys/kernel/panic +This file gives read/write access to the kernel variable +.IR panic_timeout . 
+If this is zero, the kernel will loop on a panic; if nonzero, +it indicates that the kernel should autoreboot after this number +of seconds. +When you use the +software watchdog device driver, the recommended setting is 60. +.TP +.IR /proc/sys/kernel/panic_on_oops " (since Linux 2.5.68)" +This file controls the kernel's behavior when an oops +or BUG is encountered. +If this file contains 0, then the system +tries to continue operation. +If it contains 1, then the system +delays a few seconds (to give klogd time to record the oops output) +and then panics. +If the +.I /proc/sys/kernel/panic +file is also nonzero, then the machine will be rebooted. +.TP +.IR /proc/sys/kernel/pid_max " (since Linux 2.5.34)" +This file specifies the value at which PIDs wrap around +(i.e., the value in this file is one greater than the maximum PID). +PIDs greater than this value are not allocated; +thus, the value in this file also acts as a system-wide limit +on the total number of processes and threads. +The default value for this file, 32768, +results in the same range of PIDs as on earlier kernels. +On 32-bit platforms, 32768 is the maximum value for +.IR pid_max . +On 64-bit systems, +.I pid_max +can be set to any value up to 2\[ha]22 +.RB ( PID_MAX_LIMIT , +approximately 4 million). +.\" Prior to Linux 2.6.10, pid_max could also be raised above 32768 on 32-bit +.\" platforms, but this broke /proc/[pid] +.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=109513010926152&w=2 +.TP +.IR /proc/sys/kernel/powersave\-nap " (PowerPC only)" +This file contains a flag. +If set, Linux-PPC will use the "nap" mode of +powersaving, +otherwise the "doze" mode will be used. +.TP +.I /proc/sys/kernel/printk +See +.BR syslog (2). +.TP +.IR /proc/sys/kernel/pty " (since Linux 2.6.4)" +This directory contains two files relating to the number of UNIX 98 +pseudoterminals (see +.BR pts (4)) +on the system. +.TP +.I /proc/sys/kernel/pty/max +This file defines the maximum number of pseudoterminals. 
+.\" FIXME Document /proc/sys/kernel/pty/reserve +.\" New in Linux 3.3 +.\" commit e9aba5158a80098447ff207a452a3418ae7ee386 +.TP +.I /proc/sys/kernel/pty/nr +This read-only file +indicates how many pseudoterminals are currently in use. +.TP +.I /proc/sys/kernel/random/ +This directory +contains various parameters controlling the operation of the file +.IR /dev/random . +See +.BR random (4) +for further information. +.TP +.IR /proc/sys/kernel/random/uuid " (since Linux 2.4)" +Each read from this read-only file returns a randomly generated 128-bit UUID, +as a string in the standard UUID format. +.TP +.IR /proc/sys/kernel/randomize_va_space " (since Linux 2.6.12)" +.\" Some further details can be found in Documentation/sysctl/kernel.txt +Select the address space layout randomization (ASLR) policy for the system +(on architectures that support ASLR). +Three values are supported for this file: +.RS +.TP +.B 0 +Turn ASLR off. +This is the default for architectures that don't support ASLR, +and when the kernel is booted with the +.I norandmaps +parameter. +.TP +.B 1 +Make the addresses of +.BR mmap (2) +allocations, the stack, and the VDSO page randomized. +Among other things, this means that shared libraries will be +loaded at randomized addresses. +The text segment of PIE-linked binaries will also be loaded +at a randomized address. +This value is the default if the kernel was configured with +.BR CONFIG_COMPAT_BRK . +.TP +.B 2 +(Since Linux 2.6.25) +.\" commit c1d171a002942ea2d93b4fbd0c9583c56fce0772 +Also support heap randomization. +This value is the default if the kernel was not configured with +.BR CONFIG_COMPAT_BRK . +.RE +.TP +.I /proc/sys/kernel/real\-root\-dev +This file is documented in the Linux kernel source file +.I Documentation/admin\-guide/initrd.rst +.\" commit 9d85025b0418163fae079c9ba8f8445212de8568 +(or +.I Documentation/initrd.txt +before Linux 4.10). 
+.TP +.IR /proc/sys/kernel/reboot\-cmd " (Sparc only)" +This file seems to be a way to give an argument to the SPARC +ROM/Flash boot loader. +Maybe to tell it what to do after +rebooting? +.TP +.I /proc/sys/kernel/rtsig\-max +(Up to and including Linux 2.6.7; see +.BR setrlimit (2)) +This file can be used to tune the maximum number +of POSIX real-time (queued) signals that can be outstanding +in the system. +.TP +.I /proc/sys/kernel/rtsig\-nr +(Up to and including Linux 2.6.7.) +This file shows the number of POSIX real-time signals currently queued. +.TP +.IR /proc/sys/kernel/sched_autogroup_enabled " (since Linux 2.6.38)" +.\" commit 5091faa449ee0b7d73bc296a93bca9540fc51d0a +See +.BR sched (7). +.TP +.IR /proc/sys/kernel/sched_child_runs_first " (since Linux 2.6.23)" +If this file contains the value zero, then, after a +.BR fork (2), +the parent is first scheduled on the CPU. +If the file contains a nonzero value, +then the child is scheduled first on the CPU. +(Of course, on a multiprocessor system, +the parent and the child might both immediately be scheduled on a CPU.) +.TP +.IR /proc/sys/kernel/sched_rr_timeslice_ms " (since Linux 3.9)" +See +.BR sched_rr_get_interval (2). +.TP +.IR /proc/sys/kernel/sched_rt_period_us " (since Linux 2.6.25)" +See +.BR sched (7). +.TP +.IR /proc/sys/kernel/sched_rt_runtime_us " (since Linux 2.6.25)" +See +.BR sched (7). +.TP +.IR /proc/sys/kernel/seccomp/ " (since Linux 4.14)" +.\" commit 8e5f1ad116df6b0de65eac458d5e7c318d1c05af +This directory provides additional seccomp information and +configuration. +See +.BR seccomp (2) +for further details. +.TP +.IR /proc/sys/kernel/sem " (since Linux 2.4)" +This file contains 4 numbers defining limits for System V IPC semaphores. +These fields are, in order: +.RS +.TP +SEMMSL +The maximum number of semaphores per semaphore set. +.TP +SEMMNS +A system-wide limit on the number of semaphores in all semaphore sets. 
+.TP +SEMOPM +The maximum number of operations that may be specified in a +.BR semop (2) +call. +.TP +SEMMNI +A system-wide limit on the maximum number of semaphore identifiers. +.RE +.TP +.I /proc/sys/kernel/sg\-big\-buff +This file +shows the size of the generic SCSI device (sg) buffer. +You can't tune it just yet, but you could change it at +compile time by editing +.I include/scsi/sg.h +and changing +the value of +.BR SG_BIG_BUFF . +However, there shouldn't be any reason to change this value. +.TP +.IR /proc/sys/kernel/shm_rmid_forced " (since Linux 3.1)" +.\" commit b34a6b1da371ed8af1221459a18c67970f7e3d53 +.\" See also Documentation/sysctl/kernel.txt +If this file is set to 1, all System V shared memory segments will +be marked for destruction as soon as the number of attached processes +falls to zero; +in other words, it is no longer possible to create shared memory segments +that exist independently of any attached process. +.IP +The effect is as though a +.BR shmctl (2) +.B IPC_RMID +is performed on all existing segments as well as all segments +created in the future (until this file is reset to 0). +Note that existing segments that are attached to no process will be +immediately destroyed when this file is set to 1. +Setting this option will also destroy segments that were created, +but never attached, +upon termination of the process that created the segment with +.BR shmget (2). +.IP +Setting this file to 1 provides a way of ensuring that +all System V shared memory segments are counted against the +resource usage and resource limits (see the description of +.B RLIMIT_AS +in +.BR getrlimit (2)) +of at least one process. +.IP +Because setting this file to 1 produces behavior that is nonstandard +and could also break existing applications, +the default value in this file is 0. +Set this file to 1 only if you have a good understanding +of the semantics of the applications using +System V shared memory on your system. 
+.TP +.IR /proc/sys/kernel/shmall " (since Linux 2.2)" +This file +contains the system-wide limit on the total number of pages of +System V shared memory. +.TP +.IR /proc/sys/kernel/shmmax " (since Linux 2.2)" +This file +can be used to query and set the run-time limit +on the maximum (System V IPC) shared memory segment size that can be +created. +Shared memory segments up to 1 GB are now supported in the +kernel. +This value defaults to +.BR SHMMAX . +.TP +.IR /proc/sys/kernel/shmmni " (since Linux 2.4)" +This file +specifies the system-wide maximum number of System V shared memory +segments that can be created. +.TP +.IR /proc/sys/kernel/sysctl_writes_strict " (since Linux 3.16)" +.\" commit f88083005ab319abba5d0b2e4e997558245493c8 +.\" commit 2ca9bb456ada8bcbdc8f77f8fc78207653bbaa92 +.\" commit f4aacea2f5d1a5f7e3154e967d70cf3f711bcd61 +.\" commit 24fe831c17ab8149413874f2fd4e5c8a41fcd294 +The value in this file determines how the file offset affects +the behavior of updating entries in files under +.IR /proc/sys . +The file has three possible values: +.RS +.TP 4 +\-1 +This provides legacy handling, with no printk warnings. +Each +.BR write (2) +must fully contain the value to be written, +and multiple writes on the same file descriptor +will overwrite the entire value, regardless of the file position. +.TP +0 +(default) This provides the same behavior as for \-1, +but printk warnings are written for processes that +perform writes when the file offset is not 0. +.TP +1 +Respect the file offset when writing strings into +.I /proc/sys +files. +Multiple writes will +.I append +to the value buffer. +Anything written beyond the maximum length +of the value buffer will be ignored. +Writes to numeric +.I /proc/sys +entries must always be at file offset 0 and the value must be +fully contained in the buffer provided to +.BR write (2). +.\" FIXME . +.\" With /proc/sys/kernel/sysctl_writes_strict==1, writes at an +.\" offset other than 0 do not generate an error. 
Instead, the +.\" write() succeeds, but the file is left unmodified. +.\" This is surprising. The behavior may change in the future. +.\" See thread.gmane.org/gmane.linux.man/9197 +.\" From: Michael Kerrisk (man-pages <mtk.manpages@...> +.\" Subject: sysctl_writes_strict documentation + an oddity? +.\" Newsgroups: gmane.linux.man, gmane.linux.kernel +.\" Date: 2015-05-09 08:54:11 GMT +.RE +.TP +.I /proc/sys/kernel/sysrq +This file controls the functions allowed to be invoked by the SysRq key. +By default, +the file contains 1 meaning that every possible SysRq request is allowed +(in older kernel versions, SysRq was disabled by default, +and you were required to specifically enable it at run-time, +but this is not the case any more). +Possible values in this file are: +.RS +.TP 5 +0 +Disable sysrq completely +.TP +1 +Enable all functions of sysrq +.TP +> 1 +Bit mask of allowed sysrq functions, as follows: +.PD 0 +.RS +.TP 5 +\ \ 2 +Enable control of console logging level +.TP +\ \ 4 +Enable control of keyboard (SAK, unraw) +.TP +\ \ 8 +Enable debugging dumps of processes etc. +.TP +\ 16 +Enable sync command +.TP +\ 32 +Enable remount read-only +.TP +\ 64 +Enable signaling of processes (term, kill, oom-kill) +.TP +128 +Allow reboot/poweroff +.TP +256 +Allow nicing of all real-time tasks +.RE +.PD +.RE +.IP +This file is present only if the +.B CONFIG_MAGIC_SYSRQ +kernel configuration option is enabled. +For further details see the Linux kernel source file +.I Documentation/admin\-guide/sysrq.rst +.\" commit 9d85025b0418163fae079c9ba8f8445212de8568 +(or +.I Documentation/sysrq.txt +before Linux 4.10). +.TP +.I /proc/sys/kernel/version +This file contains a string such as: +.IP +.in +4n +.EX +#5 Wed Feb 25 21:49:24 MET 1998 +.EE +.in +.IP +The "#5" means that +this is the fifth kernel built from this source base and the +date following it indicates the time the kernel was built. 
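The bit-mask values listed above for /proc/sys/kernel/sysrq can be decoded mechanically. The Python sketch below is illustrative only: the table copies the values listed in the text, and the function name describe_sysrq is hypothetical.

```python
# Bit values and meanings as listed for /proc/sys/kernel/sysrq above.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signaling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all real-time tasks",
}


def describe_sysrq(value):
    """Return the list of SysRq functions enabled by the given value."""
    if value == 0:
        return ["sysrq disabled completely"]
    if value == 1:
        return ["all sysrq functions enabled"]
    # Values greater than 1 are treated as a bit mask of allowed functions.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]
```

For instance, the value 176 (16 + 32 + 128) would allow sync, remount read-only, and reboot/poweroff.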
+.TP +.IR /proc/sys/kernel/threads\-max " (since Linux 2.3.11)" +.\" The following is based on Documentation/sysctl/kernel.txt +This file specifies the system-wide limit on the number of +threads (tasks) that can be created on the system. +.IP +Since Linux 4.1, +.\" commit 230633d109e35b0a24277498e773edeb79b4a331 +the value that can be written to +.I threads\-max +is bounded. +The minimum value that can be written is 20. +The maximum value that can be written is given by the +constant +.B FUTEX_TID_MASK +(0x3fffffff). +If a value outside of this range is written to +.IR threads\-max , +the error +.B EINVAL +occurs. +.IP +The value written is checked against the available RAM pages. +If the thread structures would occupy too much (more than 1/8th) +of the available RAM pages, +.I threads\-max +is reduced accordingly. +.TP +.IR /proc/sys/kernel/yama/ptrace_scope " (since Linux 3.5)" +See +.BR ptrace (2). +.TP +.IR /proc/sys/kernel/zero\-paged " (PowerPC only)" +This file +contains a flag. +When enabled (nonzero), Linux-PPC will pre-zero pages in +the idle loop, possibly speeding up get_free_pages. +.TP +.I /proc/sys/net +This directory contains networking stuff. +Explanations for some of the files under this directory can be found in +.BR tcp (7) +and +.BR ip (7). +.TP +.I /proc/sys/net/core/bpf_jit_enable +See +.BR bpf (2). +.TP +.I /proc/sys/net/core/somaxconn +This file defines a ceiling value for the +.I backlog +argument of +.BR listen (2); +see the +.BR listen (2) +manual page for details. +.TP +.I /proc/sys/proc +This directory may be empty. +.TP +.I /proc/sys/sunrpc +This directory supports Sun remote procedure call for network filesystem +(NFS). +On some systems, it is not present. +.TP +.IR /proc/sys/user " (since Linux 4.9)" +See +.BR namespaces (7). +.TP +.I /proc/sys/vm/ +This directory contains files for memory management tuning, buffer, and +cache management. 
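The range check described above for writes to threads-max (since Linux 4.1) can be sketched as follows. This Python fragment illustrates the documented bounds only. The names THREADS_MIN and check_threads_max are hypothetical, and the later reduction based on available RAM pages is not modeled.

```python
import errno

THREADS_MIN = 20                # minimum value that can be written
FUTEX_TID_MASK = 0x3fffffff     # maximum value that can be written


def check_threads_max(value):
    """Return 0 if the value is within the documented range, else EINVAL."""
    if THREADS_MIN <= value <= FUTEX_TID_MASK:
        return 0
    return errno.EINVAL
```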
+.TP +.IR /proc/sys/vm/admin_reserve_kbytes " (since Linux 3.10)" +.\" commit 4eeab4f5580d11bffedc697684b91b0bca0d5009 +This file defines the amount of free memory (in KiB) on the system that +should be reserved for users with the capability +.BR CAP_SYS_ADMIN . +.IP +The default value in this file is the minimum of [3% of free pages, 8MiB] +expressed as KiB. +The default is intended to provide enough for the superuser +to log in and kill a process, if necessary, +under the default overcommit 'guess' mode (i.e., 0 in +.IR /proc/sys/vm/overcommit_memory ). +.IP +Systems running in "overcommit never" mode (i.e., 2 in +.IR /proc/sys/vm/overcommit_memory ) +should increase the value in this file to account +for the full virtual memory size of the programs used to recover (e.g., +.BR login (1), +.BR ssh (1), +and +.BR top (1)). +Otherwise, the superuser may not be able to log in to recover the system. +For example, on x86-64 a suitable value is 131072 (128MiB reserved). +.IP +Changing the value in this file takes effect whenever +an application requests memory. +.TP +.IR /proc/sys/vm/compact_memory " (since Linux 2.6.35)" +When 1 is written to this file, all zones are compacted such that free +memory is available in contiguous blocks where possible. +The effect of this action can be seen by examining +.IR /proc/buddyinfo . +.IP +Present only if the kernel was configured with +.BR CONFIG_COMPACTION . +.TP +.IR /proc/sys/vm/drop_caches " (since Linux 2.6.16)" +Writing to this file causes the kernel to drop clean caches, dentries, and +inodes from memory, causing that memory to become free. +This can be useful for memory management testing and +performing reproducible filesystem benchmarks. +Because writing to this file causes the benefits of caching to be lost, +it can degrade overall system performance. 
+.IP +To free pagecache, use: +.IP +.in +4n +.EX +echo 1 > /proc/sys/vm/drop_caches +.EE +.in +.IP +To free dentries and inodes, use: +.IP +.in +4n +.EX +echo 2 > /proc/sys/vm/drop_caches +.EE +.in +.IP +To free pagecache, dentries, and inodes, use: +.IP +.in +4n +.EX +echo 3 > /proc/sys/vm/drop_caches +.EE +.in +.IP +Because writing to this file is a nondestructive operation and dirty objects +are not freeable, the +user should run +.BR sync (1) +first. +.TP +.IR /proc/sys/vm/sysctl_hugetlb_shm_group " (since Linux 2.6.7)" +This writable file contains a group ID that is allowed +to allocate memory using huge pages. +If a process has a filesystem group ID or any supplementary group ID that +matches this group ID, +then it can make huge-page allocations without holding the +.B CAP_IPC_LOCK +capability; see +.BR memfd_create (2), +.BR mmap (2), +and +.BR shmget (2). +.TP +.IR /proc/sys/vm/legacy_va_layout " (since Linux 2.6.9)" +.\" The following is from Documentation/filesystems/proc.txt +If nonzero, this disables the new 32-bit memory-mapping layout; +the kernel will use the legacy (2.4) layout for all processes. +.TP +.IR /proc/sys/vm/memory_failure_early_kill " (since Linux 2.6.32)" +.\" The following is based on the text in Documentation/sysctl/vm.txt +Control how to kill processes when an uncorrected memory error +(typically a 2-bit error in a memory module) +that cannot be handled by the kernel +is detected in the background by hardware. +In some cases (like the page still having a valid copy on disk), +the kernel will handle the failure +transparently without affecting any applications. +But if there is no other up-to-date copy of the data, +it will kill processes to prevent any data corruptions from propagating. +.IP +The file has one of the following values: +.RS +.TP +.B 1 +Kill all processes that have the corrupted-and-not-reloadable page mapped +as soon as the corruption is detected. 
+Note that this is not supported for a few types of pages, +such as kernel internally +allocated data or the swap cache, but works for the majority of user pages. +.TP +.B 0 +Unmap the corrupted page from all processes and kill a process +only if it tries to access the page. +.RE +.IP +The kill is performed using a +.B SIGBUS +signal with +.I si_code +set to +.BR BUS_MCEERR_AO . +Processes can handle this if they want to; see +.BR sigaction (2) +for more details. +.IP +This feature is active only on architectures/platforms with advanced machine +check handling and depends on the hardware capabilities. +.IP +Applications can override the +.I memory_failure_early_kill +setting individually with the +.BR prctl (2) +.B PR_MCE_KILL +operation. +.IP +Present only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.IR /proc/sys/vm/memory_failure_recovery " (since Linux 2.6.32)" +.\" The following is based on the text in Documentation/sysctl/vm.txt +Enable memory failure recovery (when supported by the platform). +.RS +.TP +.B 1 +Attempt recovery. +.TP +.B 0 +Always panic on a memory failure. +.RE +.IP +Present only if the kernel was configured with +.BR CONFIG_MEMORY_FAILURE . +.TP +.IR /proc/sys/vm/oom_dump_tasks " (since Linux 2.6.25)" +.\" The following is from Documentation/sysctl/vm.txt +Enables a system-wide task dump (excluding kernel threads) to be +produced when the kernel performs an OOM-killing. +The dump includes the following information +for each task (thread, process): +thread ID, real user ID, thread group ID (process ID), +virtual memory size, resident set size, +the CPU that the task is scheduled on, +oom_adj score (see the description of +.IR /proc/ pid /oom_adj ), +and command name. +This is helpful to determine why the OOM-killer was invoked +and to identify the rogue task that caused it. +.IP +If this contains the value zero, this information is suppressed. 
+On very large systems with thousands of tasks,
+it may not be feasible to dump the memory state information for each one.
+Such systems should not be forced to incur a performance penalty in
+OOM situations when the information may not be desired.
+.IP
+If this is set to nonzero, this information is shown whenever the
+OOM-killer actually kills a memory-hogging task.
+.IP
+The default value is 0.
+.TP
+.IR /proc/sys/vm/oom_kill_allocating_task " (since Linux 2.6.24)"
+.\" The following is from Documentation/sysctl/vm.txt
+This enables or disables killing the OOM-triggering task in
+out-of-memory situations.
+.IP
+If this is set to zero, the OOM-killer will scan through the entire
+tasklist and select a task based on heuristics to kill.
+This normally selects a rogue memory-hogging task that
+frees up a large amount of memory when killed.
+.IP
+If this is set to nonzero, the OOM-killer simply kills the task that
+triggered the out-of-memory condition.
+This avoids a possibly expensive tasklist scan.
+.IP
+If
+.I /proc/sys/vm/panic_on_oom
+is nonzero, it takes precedence over whatever value is used in
+.IR /proc/sys/vm/oom_kill_allocating_task .
+.IP
+The default value is 0.
+.TP
+.IR /proc/sys/vm/overcommit_kbytes " (since Linux 3.14)"
+.\" commit 49f0ce5f92321cdcf741e35f385669a421013cb7
+This writable file provides an alternative to
+.I /proc/sys/vm/overcommit_ratio
+for controlling the
+.I CommitLimit
+when
+.I /proc/sys/vm/overcommit_memory
+has the value 2.
+It allows the amount of memory overcommitting to be specified as
+an absolute value (in kB),
+rather than as a percentage, as is done with
+.IR overcommit_ratio .
+This allows for finer-grained control of
+.I CommitLimit
+on systems with extremely large memory sizes.
+.IP
+Only one of
+.I overcommit_kbytes
+or
+.I overcommit_ratio
+can have an effect:
+if
+.I overcommit_kbytes
+has a nonzero value, then it is used to calculate
+.IR CommitLimit ,
+otherwise
+.I overcommit_ratio
+is used.
+Writing a value to either of these files causes the
+value in the other file to be set to zero.
+.TP
+.I /proc/sys/vm/overcommit_memory
+This file contains the kernel virtual memory accounting mode.
+Values are:
+.RS
+.IP
+0: heuristic overcommit (this is the default)
+.br
+1: always overcommit, never check
+.br
+2: always check, never overcommit
+.RE
+.IP
+In mode 0, calls of
+.BR mmap (2)
+with
+.B MAP_NORESERVE
+are not checked, and the default check is very weak,
+leading to the risk of getting a process "OOM-killed".
+.IP
+In mode 1, the kernel pretends there is always enough memory,
+until memory actually runs out.
+One use case for this mode is scientific computing applications
+that employ large sparse arrays.
+Before Linux 2.6.0, any nonzero value implies mode 1.
+.IP
+In mode 2 (available since Linux 2.6), the total virtual address space
+that can be allocated
+.RI ( CommitLimit
+in
+.IR /proc/meminfo )
+is calculated as
+.IP
+.in +4n
+.EX
+CommitLimit = (total_RAM \- total_huge_TLB) *
+              overcommit_ratio / 100 + total_swap
+.EE
+.in
+.IP
+where:
+.RS
+.IP \[bu] 3
+.I total_RAM
+is the total amount of RAM on the system;
+.IP \[bu]
+.I total_huge_TLB
+is the amount of memory set aside for huge pages;
+.IP \[bu]
+.I overcommit_ratio
+is the value in
+.IR /proc/sys/vm/overcommit_ratio ;
+and
+.IP \[bu]
+.I total_swap
+is the amount of swap space.
+.RE
+.IP
+For example, on a system with 16 GB of physical RAM, 16 GB
+of swap, no space dedicated to huge pages, and an
+.I overcommit_ratio
+of 50, this formula yields a
+.I CommitLimit
+of 24 GB.
+.IP
+Since Linux 3.14, if the value in
+.I /proc/sys/vm/overcommit_kbytes
+is nonzero, then
+.I CommitLimit
+is instead calculated as:
+.IP
+.in +4n
+.EX
+CommitLimit = overcommit_kbytes + total_swap
+.EE
+.in
+.IP
+See also the description of
+.I /proc/sys/vm/admin_reserve_kbytes
+and
+.IR /proc/sys/vm/user_reserve_kbytes .
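Illustration only, not part of the patch: the two CommitLimit formulas in the overcommit_memory entry above can be sketched in Python as follows. The function name `commit_limit_kb` and its parameter names are hypothetical, all quantities are assumed to be in kB, and the integer division is an assumption about rounding; the kernel's own arithmetic may differ slightly.

```python
def commit_limit_kb(total_ram_kb, total_huge_tlb_kb, total_swap_kb,
                    overcommit_ratio=50, overcommit_kbytes=0):
    """Sketch of CommitLimit for overcommit_memory mode 2 (hypothetical helper).

    A nonzero overcommit_kbytes takes priority (Linux 3.14 and later);
    otherwise overcommit_ratio is applied as a percentage.
    """
    if overcommit_kbytes != 0:
        # CommitLimit = overcommit_kbytes + total_swap
        return overcommit_kbytes + total_swap_kb
    # CommitLimit = (total_RAM - total_huge_TLB) * overcommit_ratio / 100
    #               + total_swap
    # Integer division is an assumption made for this sketch.
    usable_kb = total_ram_kb - total_huge_tlb_kb
    return usable_kb * overcommit_ratio // 100 + total_swap_kb


GB = 1024 * 1024  # kB per GB, treating 1 GB as 2**20 kB for this example

# The worked example from the text: 16 GB RAM, 16 GB swap, no huge pages,
# overcommit_ratio of 50.
limit = commit_limit_kb(16 * GB, 0, 16 * GB, overcommit_ratio=50)
print(limit // GB)  # 24
```

Passing a nonzero `overcommit_kbytes` switches the sketch to the second formula, so the ratio is ignored in that case, mirroring the "only one of the two can have an effect" rule.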
+.TP
+.IR /proc/sys/vm/overcommit_ratio " (since Linux 2.6.0)"
+This writable file defines a percentage by which memory
+can be overcommitted.
+The default value in the file is 50.
+See the description of
+.IR /proc/sys/vm/overcommit_memory .
+.TP
+.IR /proc/sys/vm/panic_on_oom " (since Linux 2.6.18)"
+.\" The following is adapted from Documentation/sysctl/vm.txt
+This enables or disables a kernel panic in
+an out-of-memory situation.
+.IP
+If this file is set to the value 0,
+the kernel's OOM-killer will kill some rogue process.
+Usually, the OOM-killer is able to kill a rogue process and the
+system will survive.
+.IP
+If this file is set to the value 1,
+then the kernel normally panics when out-of-memory happens.
+However, if a process limits allocations to certain nodes
+using memory policies
+.RB ( mbind (2)
+.BR MPOL_BIND )
+or cpusets
+.RB ( cpuset (7))
+and those nodes reach memory exhaustion status,
+one process may be killed by the OOM-killer.
+No panic occurs in this case:
+because other nodes' memory may be free,
+this means the system as a whole may not have reached
+an out-of-memory situation yet.
+.IP
+If this file is set to the value 2,
+the kernel always panics when an out-of-memory condition occurs.
+.IP
+The default value is 0.
+1 and 2 are for failover of clustering.
+Select either according to your policy of failover.
+.TP
+.I /proc/sys/vm/swappiness
+.\" The following is from Documentation/sysctl/vm.txt
+The value in this file controls how aggressively the kernel will swap
+memory pages.
+Higher values increase aggressiveness, lower values
+decrease aggressiveness.
+The default value is 60.
+.TP
+.IR /proc/sys/vm/user_reserve_kbytes " (since Linux 3.10)"
+.\" commit c9b1d0981fcce3d9976d7b7a56e4e0503bc610dd
+Specifies an amount of memory (in KiB) to reserve for user processes.
+This is intended to prevent a user from starting a single memory hogging
+process, such that they cannot recover (kill the hog).
+The value in this file has an effect only when
+.I /proc/sys/vm/overcommit_memory
+is set to 2 ("overcommit never" mode).
+In this case, the system reserves an amount of memory that is the minimum
+of [3% of current process size,
+.IR user_reserve_kbytes ].
+.IP
+The default value in this file is the minimum of [3% of free pages, 128MiB]
+expressed as KiB.
+.IP
+If the value in this file is set to zero,
+then a user will be allowed to allocate all free memory with a single process
+(minus the amount reserved by
+.IR /proc/sys/vm/admin_reserve_kbytes ).
+Any subsequent attempts to execute a command will result in
+"fork: Cannot allocate memory".
+.IP
+Changing the value in this file takes effect whenever
+an application requests memory.
+.TP
+.IR /proc/sys/vm/unprivileged_userfaultfd " (since Linux 5.2)"
+.\" cefdca0a86be517bc390fc4541e3674b8e7803b0
+This (writable) file exposes a flag that controls whether
+unprivileged processes are allowed to employ
+.BR userfaultfd (2).
+If this file has the value 1, then unprivileged processes may use
+.BR userfaultfd (2).
+If this file has the value 0, then only processes that have the
+.B CAP_SYS_PTRACE
+capability may employ
+.BR userfaultfd (2).
+The default value in this file is 1.
+.SH SEE ALSO
+.BR proc (5)
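Illustration only, not part of the patch: the two "minimum of" rules in the user_reserve_kbytes entry can be sketched in Python as follows. Both function names are hypothetical, all sizes are assumed to be in KiB, and computing 3% with integer division is an assumption; the kernel may round differently.

```python
DEFAULT_CAP_KIB = 128 * 1024  # 128 MiB expressed as KiB


def effective_user_reserve_kib(process_size_kib, user_reserve_kbytes):
    """Reserve applied in overcommit_memory mode 2 (hypothetical helper).

    min(3% of current process size, user_reserve_kbytes), per the text.
    """
    return min(process_size_kib * 3 // 100, user_reserve_kbytes)


def default_user_reserve_kib(free_kib):
    """Default file value: min(3% of free memory, 128 MiB), in KiB."""
    return min(free_kib * 3 // 100, DEFAULT_CAP_KIB)


# With 8 GiB free, 3% exceeds 128 MiB, so the cap applies.
print(default_user_reserve_kib(8 * 1024 * 1024))  # 131072

# With the file set to zero, nothing is reserved for the user.
print(effective_user_reserve_kib(500_000, 0))  # 0
```

The second call shows why a zero setting lets a single process take all free memory (apart from the admin reserve): the minimum collapses to zero regardless of the process size.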