1 files changed, 507 insertions, 0 deletions
diff --git a/man/man2/mlock.2 b/man/man2/mlock.2
new file mode 100644
index 000000000..30f6ac130
--- /dev/null
+++ b/man/man2/mlock.2
@@ -0,0 +1,507 @@
+.\" Copyright (C) Michael Kerrisk, 2004
+.\"	using some material drawn from earlier man pages
+.\"	written by Thomas Kuhn, Copyright 1996
+.\"
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\"
+.TH mlock 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+mlock, mlock2, munlock, mlockall, munlockall \- lock and unlock memory
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
+.SH SYNOPSIS
+.nf
+.B #include <sys/mman.h>
+.P
+.BI "int mlock(const void " addr [. len "], size_t " len );
+.BI "int mlock2(const void " addr [. len "], size_t " len ", \
+unsigned int " flags );
+.BI "int munlock(const void " addr [. len "], size_t " len );
+.P
+.BI "int mlockall(int " flags );
+.B int munlockall(void);
+.fi
+.SH DESCRIPTION
+.BR mlock (),
+.BR mlock2 (),
+and
+.BR mlockall ()
+lock part or all of the calling process's virtual address
+space into RAM, preventing that memory from being paged to the
+swap area.
+.P
+.BR munlock ()
+and
+.BR munlockall ()
+perform the converse operation,
+unlocking part or all of the calling process's virtual address space,
+so that pages in the specified virtual address range
+can be swapped out again if required by the kernel memory manager.
+.P
+Memory locking and unlocking are performed in units of whole pages.
+.SS mlock(), mlock2(), and munlock()
+.BR mlock ()
+locks pages in the address range starting at
+.I addr
+and continuing for
+.I len
+bytes.
+All pages that contain a part of the specified address range are
+guaranteed to be resident in RAM when the call returns successfully;
+the pages are guaranteed to stay in RAM until later unlocked.
+.P
+.BR mlock2 ()
+.\" commit a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e
+.\" commit de60f5f10c58d4f34b68622442c0e04180367f3f
+.\" commit b0f205c2a3082dd9081f9a94e50658c5fa906ff1
+also locks pages in the specified range starting at
+.I addr
+and continuing for
+.I len
+bytes.
+However, the state of the pages contained in that range after the call
+returns successfully will depend on the value in the
+.I flags
+argument.
+.P
+The
+.I flags
+argument can be either 0 or the following constant:
+.TP
+.B MLOCK_ONFAULT
+Lock pages that are currently resident and mark the entire range so
+that the remaining nonresident pages are locked when they are populated
+by a page fault.
+.P
+If
+.I flags
+is 0,
+.BR mlock2 ()
+behaves exactly the same as
+.BR mlock ().
+.P
+.BR munlock ()
+unlocks pages in the address range starting at
+.I addr
+and continuing for
+.I len
+bytes.
+After this call, all pages that contain a part of the specified
+memory range can be moved to external swap space again by the kernel.
+.SS mlockall() and munlockall()
+.BR mlockall ()
+locks all pages mapped into the address space of the
+calling process.
+This includes the pages of the code, data, and stack
+segment, as well as shared libraries, user space kernel data, shared
+memory, and memory-mapped files.
+All mapped pages are guaranteed
+to be resident in RAM when the call returns successfully;
+the pages are guaranteed to stay in RAM until later unlocked.
+.P
+The
+.I flags
+argument is constructed as the bitwise OR of one or more of the
+following constants:
+.TP
+.B MCL_CURRENT
+Lock all pages which are currently mapped into the address space of
+the process.
+.TP
+.B MCL_FUTURE
+Lock all pages which will become mapped into the address space of the
+process in the future.
+These could be, for instance, new pages required
+by a growing heap and stack as well as new memory-mapped files or
+shared memory regions.
+.TP
+.BR MCL_ONFAULT " (since Linux 4.4)"
+Used together with
+.BR MCL_CURRENT ,
+.BR MCL_FUTURE ,
+or both.
+Mark all current (with
+.BR MCL_CURRENT )
+or future (with
+.BR MCL_FUTURE )
+mappings to lock pages when they are faulted in.
+When used with
+.BR MCL_CURRENT ,
+all present pages are locked, but
+.BR mlockall ()
+will not fault in non-present pages.
+When used with
+.BR MCL_FUTURE ,
+all future mappings will be marked to lock pages when they are faulted
+in, but they will not be populated by the lock when the mapping is
+created.
+.B MCL_ONFAULT
+must be used with either
+.B MCL_CURRENT
+or
+.B MCL_FUTURE
+or both.
+.P
+If
+.B MCL_FUTURE
+has been specified, then a later system call (e.g.,
+.BR mmap (2),
+.BR sbrk (2),
+.BR malloc (3)),
+may fail if it would cause the number of locked bytes to exceed
+the permitted maximum (see below).
+In the same circumstances, stack growth may likewise fail:
+the kernel will deny stack expansion and deliver a
+.B SIGSEGV
+signal to the process.
+.P
+.BR munlockall ()
+unlocks all pages mapped into the address space of the
+calling process.
+.SH RETURN VALUE
+On success, these system calls return 0.
+On error, \-1 is returned,
+.I errno
+is set to indicate the error,
+and no changes are made to any locks in the
+address space of the process.
+.SH ERRORS
+.\"SVr4 documents an additional EAGAIN error code.
+.TP
+.B EAGAIN
+.RB ( mlock (),
+.BR mlock2 (),
+and
+.BR munlock ())
+Some or all of the specified address range could not be locked.
+.TP
+.B EINVAL
+.RB ( mlock (),
+.BR mlock2 (),
+and
+.BR munlock ())
+The result of the addition
+.IR addr + len
+was less than
+.I addr
+(e.g., the addition may have resulted in an overflow).
+.TP
+.B EINVAL
+.RB ( mlock2 ())
+Unknown \fIflags\fP were specified.
+.TP
+.B EINVAL
+.RB ( mlockall ())
+Unknown \fIflags\fP were specified or
+.B MCL_ONFAULT
+was specified without either
+.B MCL_FUTURE
+or
+.BR MCL_CURRENT .
+.TP
+.B EINVAL
+(Not on Linux)
+.I addr
+was not a multiple of the page size.
+.TP
+.B ENOMEM
+.RB ( mlock (),
+.BR mlock2 (),
+and
+.BR munlock ())
+Some of the specified address range does not correspond to mapped
+pages in the address space of the process.
+.TP
+.B ENOMEM
+.RB ( mlock (),
+.BR mlock2 (),
+and
+.BR munlock ())
+Locking or unlocking a region would result in the total number of
+mappings with distinct attributes (e.g., locked versus unlocked)
+exceeding the allowed maximum.
+.\" I.e., the number of VMAs would exceed the 64kB maximum
+(For example, unlocking a range in the middle of a currently locked
+mapping would result in three mappings:
+two locked mappings at each end and an unlocked mapping in the middle.)
+.TP
+.B ENOMEM
+(Linux 2.6.9 and later) the caller had a nonzero
+.B RLIMIT_MEMLOCK
+soft resource limit, but tried to lock more memory than the limit
+permitted.
+This limit is not enforced if the process is privileged
+.RB ( CAP_IPC_LOCK ).
+.TP
+.B ENOMEM
+(Linux 2.4 and earlier) the calling process tried to lock more than
+half of RAM.
+.\" In the case of mlock(), this check is somewhat buggy: it doesn't
+.\" take into account whether the to-be-locked range overlaps with
+.\" already locked pages.  Thus, suppose we allocate
+.\" (num_physpages / 4 + 1) of memory, and lock those pages once using
+.\" mlock(), and then lock the *same* page range a second time.
+.\" In the case, the second mlock() call will fail, since the check
+.\" calculates that the process is trying to lock (num_physpages / 2 + 2)
+.\" pages, which of course is not true.  (MTK, Nov 04, kernel 2.4.28)
+.TP
+.B EPERM
+The caller is not privileged, but needs privilege
+.RB ( CAP_IPC_LOCK )
+to perform the requested operation.
+.TP
+.B EPERM
+.RB ( munlockall ())
+(Linux 2.6.8 and earlier) The caller was not privileged
+.RB ( CAP_IPC_LOCK ).
+.SH VERSIONS
+.SS Linux
+Under Linux,
+.BR mlock (),
+.BR mlock2 (),
+and
+.BR munlock ()
+automatically round
+.I addr
+down to the nearest page boundary.
+However, the POSIX.1 specification of
+.BR mlock ()
+and
+.BR munlock ()
+allows an implementation to require that
+.I addr
+is page aligned, so portable applications should ensure this.
+.P
+The
+.I VmLck
+field of the Linux-specific
+.IR /proc/ pid /status
+file shows how many kilobytes of memory the process with ID
+.I PID
+has locked using
+.BR mlock (),
+.BR mlock2 (),
+.BR mlockall (),
+and
+.BR mmap (2)
+.BR MAP_LOCKED .
+.SH STANDARDS
+.TP
+.BR mlock ()
+.TQ
+.BR munlock ()
+.TQ
+.BR mlockall ()
+.TQ
+.BR munlockall ()
+POSIX.1-2008.
+.TP
+.BR mlock2 ()
+Linux.
+.P
+On POSIX systems on which
+.BR mlock ()
+and
+.BR munlock ()
+are available,
+.B _POSIX_MEMLOCK_RANGE
+is defined in \fI<unistd.h>\fP and the number of bytes in a page
+can be determined from the constant
+.B PAGESIZE
+(if defined) in \fI<limits.h>\fP or by calling
+.IR sysconf(_SC_PAGESIZE) .
+.P
+On POSIX systems on which
+.BR mlockall ()
+and
+.BR munlockall ()
+are available,
+.B _POSIX_MEMLOCK
+is defined in \fI<unistd.h>\fP to a value greater than 0.
+(See also
+.BR sysconf (3).)
+.\" POSIX.1-2001: It shall be defined to -1 or 0 or 200112L.
+.\" -1: unavailable, 0: ask using sysconf().
+.\" glibc defines it to 1.
+.SH HISTORY
+.TP
+.BR mlock ()
+.TQ
+.BR munlock ()
+.TQ
+.BR mlockall ()
+.TQ
+.BR munlockall ()
+POSIX.1-2001, POSIX.1-2008, SVr4.
+.TP
+.BR mlock2 ()
+Linux 4.4,
+glibc 2.27.
+.SH NOTES
+Memory locking has two main applications: real-time algorithms and
+high-security data processing.
+Real-time applications require
+deterministic timing, and, like scheduling, paging is one major cause
+of unexpected program execution delays.
+Real-time applications will
+usually also switch to a real-time scheduler with
+.BR sched_setscheduler (2).
+Cryptographic security software often handles critical bytes like
+passwords or secret keys as data structures.
+As a result of paging,
+these secrets could be transferred onto a persistent swap store medium,
+where they might be accessible to the enemy long after the security
+software has erased the secrets in RAM and terminated.
+(But be aware that the suspend mode on laptops and some desktop
+computers will save a copy of the system's RAM to disk, regardless
+of memory locks.)
+.P
+Real-time processes that are using
+.BR mlockall ()
+to prevent delays on page faults should reserve enough
+locked stack pages before entering the time-critical section,
+so that no page fault can be caused by function calls.
+This can be achieved by calling a function that allocates a
+sufficiently large automatic variable (an array) and writes to the
+memory occupied by this array in order to touch these stack pages.
+This way, enough pages will be mapped for the stack and can be
+locked into RAM.
+The dummy writes ensure that not even copy-on-write
+page faults can occur in the critical section.
+.P
+Memory locks are not inherited by a child created via
+.BR fork (2)
+and are automatically removed (unlocked) during an
+.BR execve (2)
+or when the process terminates.
+The
+.BR mlockall ()
+.B MCL_FUTURE
+and
+.B MCL_FUTURE | MCL_ONFAULT
+settings are not inherited by a child created via
+.BR fork (2)
+and are cleared during an
+.BR execve (2).
+.P
+Note that
+.BR fork (2)
+will prepare the address space for a copy-on-write operation.
+The consequence is that any write access that follows will cause
+a page fault that in turn may cause high latencies for a real-time process.
+Therefore, it is crucial not to invoke
+.BR fork (2)
+after an
+.BR mlockall ()
+or
+.BR mlock ()
+operation\[em]not even from a thread which runs at a low priority within
+a process which also has a thread running at elevated priority.
+.P
+The memory lock on an address range is automatically removed
+if the address range is unmapped via
+.BR munmap (2).
+.P
+Memory locks do not stack, that is, pages which have been locked several times
+by calls to
+.BR mlock (),
+.BR mlock2 (),
+or
+.BR mlockall ()
+will be unlocked by a single call to
+.BR munlock ()
+for the corresponding range or by
+.BR munlockall ().
+Pages which are mapped to several locations or by several processes stay
+locked into RAM as long as they are locked at least at one location or by
+at least one process.
+.P
+If a call to
+.BR mlockall ()
+which uses the
+.B MCL_FUTURE
+flag is followed by another call that does not specify this flag, the
+changes made by the
+.B MCL_FUTURE
+call will be lost.
+.P
+The
+.BR mlock2 ()
+.B MLOCK_ONFAULT
+flag and the
+.BR mlockall ()
+.B MCL_ONFAULT
+flag allow efficient memory locking for applications that deal with
+large mappings where only a (small) portion of pages in the mapping are touched.
+In such cases, locking all of the pages in a mapping would incur
+a significant penalty for memory locking.
+.SS Limits and permissions
+In Linux 2.6.8 and earlier,
+a process must be privileged
+.RB ( CAP_IPC_LOCK )
+in order to lock memory and the
+.B RLIMIT_MEMLOCK
+soft resource limit defines a limit on how much memory the process may lock.
+.P
+Since Linux 2.6.9, no limits are placed on the amount of memory
+that a privileged process can lock and the
+.B RLIMIT_MEMLOCK
+soft resource limit instead defines a limit on how much memory an
+unprivileged process may lock.
+.SH BUGS
+In Linux 4.8 and earlier,
+a bug in the kernel's accounting of locked memory for unprivileged processes
+(i.e., without
+.BR CAP_IPC_LOCK )
+meant that if the region specified by
+.I addr
+and
+.I len
+overlapped an existing lock,
+then the already locked bytes in the overlapping region were counted twice
+when checking against the limit.
+Such double accounting could incorrectly calculate a "total locked memory"
+value for the process that exceeded the
+.B RLIMIT_MEMLOCK
+limit, with the result that
+.BR mlock ()
+and
+.BR mlock2 ()
+would fail on requests that should have succeeded.
+This bug was fixed
+.\" commit 0cf2f6f6dc605e587d2c1120f295934c77e810e8
+in Linux 4.9.
+.P
+In Linux 2.4 series of kernels up to and including Linux 2.4.17,
+a bug caused the
+.BR mlockall ()
+.B MCL_FUTURE
+flag to be inherited across a
+.BR fork (2).
+This was rectified in Linux 2.4.18.
+.P
+Since Linux 2.6.9, if a privileged process calls
+.I mlockall(MCL_FUTURE)
+and later drops privileges (loses the
+.B CAP_IPC_LOCK
+capability by, for example,
+setting its effective UID to a nonzero value),
+then subsequent memory allocations (e.g.,
+.BR mmap (2),
+.BR brk (2))
+will fail if the
+.B RLIMIT_MEMLOCK
+resource limit is encountered.
+.\" See the following LKML thread:
+.\" http://marc.theaimsgroup.com/?l=linux-kernel&m=113801392825023&w=2
+.\" "Rationale for RLIMIT_MEMLOCK"
+.\" 23 Jan 2006
+.SH SEE ALSO
+.BR mincore (2),
+.BR mmap (2),
+.BR setrlimit (2),
+.BR shmctl (2),
+.BR sysconf (3),
+.BR proc (5),
+.BR capabilities (7)