summaryrefslogtreecommitdiffstats
path: root/man/man2/memfd_secret.2
diff options
context:
space:
mode:
Diffstat (limited to 'man/man2/memfd_secret.2')
-rw-r--r--man/man2/memfd_secret.2204
1 files changed, 204 insertions, 0 deletions
diff --git a/man/man2/memfd_secret.2 b/man/man2/memfd_secret.2
new file mode 100644
index 000000000..5d2436cc5
--- /dev/null
+++ b/man/man2/memfd_secret.2
@@ -0,0 +1,204 @@
+.\" Copyright (c) 2021, IBM Corporation.
+.\" Written by Mike Rapoport <rppt@linux.ibm.com>
+.\"
+.\" Based on memfd_create(2) man page
+.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
+.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
+.\"
+.\" SPDX-License-Identifier: GPL-2.0-or-later
+.\"
+.TH memfd_secret 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+memfd_secret \- create an anonymous RAM-based file
+to access secret memory regions
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
+.SH SYNOPSIS
+.nf
+.P
+.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
+.B #include <unistd.h>
+.P
+.BI "int syscall(SYS_memfd_secret, unsigned int " flags );
+.fi
+.P
+.IR Note :
+glibc provides no wrapper for
+.BR memfd_secret (),
+necessitating the use of
+.BR syscall (2).
+.SH DESCRIPTION
+.BR memfd_secret ()
+creates an anonymous RAM-based file and returns a file descriptor
+that refers to it.
+The file provides a way to create and access memory regions
+with stronger protection than usual RAM-based files and
+anonymous memory mappings.
+Once all open references to the file are closed,
+it is automatically released.
+The initial size of the file is set to 0.
+Following the call, the file size should be set using
+.BR ftruncate (2).
+.P
+The memory areas backing the file created with
+.BR memfd_secret (2)
+are visible only to the processes that have access to the file descriptor.
+The memory region is removed from the kernel page tables
+and only the page tables of the processes holding the file descriptor
+map the corresponding physical memory.
+(Thus, the pages in the region can't be accessed by the kernel itself,
+so that, for example, pointers to the region can't be passed to
+system calls.)
+.P
+The following values may be bitwise ORed in
+.I flags
+to control the behavior of
+.BR memfd_secret ():
+.TP
+.B FD_CLOEXEC
+Set the close-on-exec flag on the new file descriptor,
+which causes the region to be removed from the process on
+.BR execve (2).
+See the description of the
+.B O_CLOEXEC
+flag in
+.BR open (2)
+.P
+As its return value,
+.BR memfd_secret ()
+returns a new file descriptor that refers to an anonymous file.
+This file descriptor is opened for both reading and writing
+.RB ( O_RDWR )
+and
+.B O_LARGEFILE
+is set for the file descriptor.
+.P
+With respect to
+.BR fork (2)
+and
+.BR execve (2),
+the usual semantics apply for the file descriptor created by
+.BR memfd_secret ().
+A copy of the file descriptor is inherited by the child produced by
+.BR fork (2)
+and refers to the same file.
+The file descriptor is preserved across
+.BR execve (2),
+unless the close-on-exec flag has been set.
+.P
+The memory region is locked into memory in the same way as with
+.BR mlock (2),
+so that it will never be written into swap,
+and hibernation is inhibited for as long as any
+.BR memfd_secret ()
+descriptions exist.
+However the implementation of
+.BR memfd_secret ()
+will not try to populate the whole range during the
+.BR mmap (2)
+call that attaches the region into the process's address space;
+instead, the pages are only actually allocated
+as they are faulted in.
+The amount of memory allowed for memory mappings
+of the file descriptor obeys the same rules as
+.BR mlock (2)
+and cannot exceed
+.BR RLIMIT_MEMLOCK .
+.SH RETURN VALUE
+On success,
+.BR memfd_secret ()
+returns a new file descriptor.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+included unknown bits.
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached.
+.TP
+.B EMFILE
+The system-wide limit on the total number of open files has been reached.
+.TP
+.B ENOMEM
+There was insufficient memory to create a new anonymous file.
+.TP
+.B ENOSYS
+.BR memfd_secret ()
+is not implemented on this architecture,
+or has not been enabled on the kernel command-line with
+.BR secretmem_enable =1.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.14.
+.SH NOTES
+The
+.BR memfd_secret ()
+system call is designed to allow a user-space process
+to create a range of memory that is inaccessible to anybody else -
+kernel included.
+There is no 100% guarantee that kernel won't be able to access
+memory ranges backed by
+.BR memfd_secret ()
+in any circumstances, but nevertheless,
+it is much harder to exfiltrate data from these regions.
+.P
+.BR memfd_secret ()
+provides the following protections:
+.IP \[bu] 3
+Enhanced protection
+(in conjunction with all the other in-kernel attack prevention systems)
+against ROP attacks.
+Absence of any in-kernel primitive for accessing memory backed by
+.BR memfd_secret ()
+means that one-gadget ROP attack
+can't work to perform data exfiltration.
+The attacker would need to find enough ROP gadgets
+to reconstruct the missing page table entries,
+which significantly increases difficulty of the attack,
+especially when other protections like the kernel stack size limit
+and address space layout randomization are in place.
+.IP \[bu]
+Prevent cross-process user-space memory exposures.
+Once a region for a
+.BR memfd_secret ()
+memory mapping is allocated,
+the user can't accidentally pass it into the kernel
+to be transmitted somewhere.
+The memory pages in this region cannot be accessed via the direct map
+and they are disallowed in get_user_pages.
+.IP \[bu]
+Harden against exploited kernel flaws.
+In order to access memory areas backed by
+.BR memfd_secret (),
+a kernel-side attack would need to
+either walk the page tables and create new ones,
+or spawn a new privileged user-space process to perform
+secrets exfiltration using
+.BR ptrace (2).
+.P
+The way
+.BR memfd_secret ()
+allocates and locks the memory may impact overall system performance,
+therefore the system call is disabled by default and only available
+if the system administrator turned it on using
+"secretmem.enable=y" kernel parameter.
+.P
+To prevent potential data leaks of memory regions backed by
+.BR memfd_secret ()
+from a hybernation image,
+hybernation is prevented when there are active
+.BR memfd_secret ()
+users.
+.SH SEE ALSO
+.BR fcntl (2),
+.BR ftruncate (2),
+.BR mlock (2),
+.BR memfd_create (2),
+.BR mmap (2),
+.BR setrlimit (2)