Diffstat (limited to 'man2/seccomp_unotify.2')
-rw-r--r-- man2/seccomp_unotify.2 178
1 files changed, 89 insertions, 89 deletions
diff --git a/man2/seccomp_unotify.2 b/man2/seccomp_unotify.2
index f11dabf3a..1de5601fe 100644
--- a/man2/seccomp_unotify.2
+++ b/man2/seccomp_unotify.2
@@ -37,9 +37,9 @@ flag, the
action value, and the
.B SECCOMP_GET_NOTIF_SIZES
operation described in
-.BR seccomp (2),
+.MR seccomp 2 ,
this mechanism involves the use of a number of related
-.BR ioctl (2)
+.MR ioctl 2
operations (described below).
.\"
.SS Overview
@@ -79,12 +79,12 @@ but with two differences:
.RS
.IP \[bu] 3
The
-.BR seccomp (2)
+.MR seccomp 2
.I flags
argument includes the flag
.BR SECCOMP_FILTER_FLAG_NEW_LISTENER .
Consequently, the return value of the (successful)
-.BR seccomp (2)
+.MR seccomp 2
call is a new "listening"
file descriptor that can be used to receive notifications.
Only one "listening" seccomp filter can be installed for a thread.
@@ -117,9 +117,9 @@ over a UNIX domain socket connection between the target and the supervisor
(using the
.B SCM_RIGHTS
ancillary message type described in
-.BR unix (7)).
+.MR unix 7 ).
Another way to do this is through the use of
-.BR pidfd_getfd (2).
+.MR pidfd_getfd 2 .
.\" Jann Horn:
.\" Instead of using unix domain sockets to send the fd to the
.\" parent, I think you could also use clone3() with
@@ -138,7 +138,7 @@ These events are returned as structures of type
Because this structure and its size may evolve over kernel versions,
the supervisor must first determine the size of this structure
using the
-.BR seccomp (2)
+.MR seccomp 2
.B SECCOMP_GET_NOTIF_SIZES
operation, which returns a structure of type
.IR seccomp_notif_sizes .
@@ -171,7 +171,7 @@ listening file descriptor for
events.
To do this, the supervisor uses the
.B SECCOMP_IOCTL_NOTIF_RECV
-.BR ioctl (2)
+.MR ioctl 2
operation to read information about a notification event;
this operation blocks until an event is available.
The operation returns a
@@ -180,10 +180,10 @@ structure containing information about the system call
that is being attempted by the target.
(As described in NOTES,
the file descriptor can also be monitored with
-.BR select (2),
-.BR poll (2),
+.MR select 2 ,
+.MR poll 2 ,
or
-.BR epoll (7).)
+.MR epoll 7 .)
.\" FIXME
.\" Christian Brauner:
.\"
@@ -220,7 +220,7 @@ values of pointer arguments for the target's system call.
One way in which the supervisor can do this is to open the corresponding
.IR /proc/ tid /mem
file (see
-.BR proc (5))
+.MR proc 5 )
and read bytes from the location that corresponds to one of
the pointer arguments whose value is supplied in the notification event.
.\" Tycho Andersen mentioned that there are alternatives to /proc/PID/mem,
@@ -229,7 +229,7 @@ the pointer arguments whose value is supplied in the notification event.
a race condition that can occur when doing this;
see the description of the
.B SECCOMP_IOCTL_NOTIF_ID_VALID
-.BR ioctl (2)
+.MR ioctl 2
operation below.)
In addition,
the supervisor can access other system information that is visible
@@ -260,7 +260,7 @@ variable of the target.
.IP
The response is sent using the
.B SECCOMP_IOCTL_NOTIF_SEND
-.BR ioctl (2)
+.MR ioctl 2
operation, which is used to transmit a
.I seccomp_notif_resp
structure to the kernel.
@@ -294,13 +294,13 @@ below.
.\"
.SH IOCTL OPERATIONS
The following
-.BR ioctl (2)
+.MR ioctl 2
operations are supported by the seccomp user-space
notification file descriptor.
For each of these operations, the first (file descriptor) argument of
-.BR ioctl (2)
+.MR ioctl 2
is the listening file descriptor returned by a call to
-.BR seccomp (2)
+.MR seccomp 2
with the
.B SECCOMP_FILTER_FLAG_NEW_LISTENER
flag.
@@ -313,7 +313,7 @@ notification event.
If no such event is currently pending,
the operation blocks until an event occurs.
The third
-.BR ioctl (2)
+.MR ioctl 2
argument is a pointer to a structure of the following form
which contains information about the event.
This structure must be zeroed out before the call.
@@ -339,7 +339,7 @@ seccomp filter.
.IP \[bu] 3
The cookie can be used with the
.B SECCOMP_IOCTL_NOTIF_ID_VALID
-.BR ioctl (2)
+.MR ioctl 2
operation described below.
.IP \[bu]
When returning a notification response to the kernel,
@@ -365,7 +365,7 @@ structure containing information about the system call that
triggered the notification.
This is the same structure that is passed to the seccomp filter.
See
-.BR seccomp (2)
+.MR seccomp 2
for details of this structure.
.P
On success, this operation returns 0; on failure, \-1 is returned, and
@@ -425,7 +425,7 @@ operation is still valid
is still blocked waiting for a response).
.P
The third
-.BR ioctl (2)
+.MR ioctl 2
argument is a pointer to the cookie
.RI ( id )
returned by the
@@ -452,7 +452,7 @@ Another thread or process is created on the system that by chance reuses the
TID that was freed when the target terminated.
.IP (4)
The supervisor
-.BR open (2)s
+.MR open 2 s
the
.IR /proc/ tid /mem
file for the TID obtained in step 1, with the intention of (say)
@@ -462,14 +462,14 @@ the system call that triggered the notification in step 1.
In the above scenario, the risk is that the supervisor may try
to access the memory of a process other than the target.
This race can be avoided by following the call to
-.BR open (2)
+.MR open 2
with a
.B SECCOMP_IOCTL_NOTIF_ID_VALID
operation to verify that the process that generated the notification
is still alive.
(Note that if the target terminates after the latter step,
a subsequent
-.BR read (2)
+.MR read 2
from the file descriptor may return 0, indicating end of file.)
.\" Jann Horn:
.\" the PID can be reused, but the /proc/$pid directory is
@@ -497,7 +497,7 @@ The
operation (available since Linux 5.0)
is used to send a notification response back to the kernel.
The third
-.BR ioctl (2)
+.MR ioctl 2
argument of this structure is a pointer to a structure of the following form:
.P
.in +4n
@@ -652,7 +652,7 @@ into the target's file descriptor table.
Much like the use of
.B SCM_RIGHTS
messages described in
-.BR unix (7),
+.MR unix 7 ,
this operation is semantically equivalent to duplicating
a file descriptor from the supervisor's file descriptor table
into the target's file descriptor table.
@@ -660,16 +660,16 @@ into the target's file descriptor table.
The
.B SECCOMP_IOCTL_NOTIF_ADDFD
operation permits the supervisor to emulate a target system call (such as
-.BR socket (2)
+.MR socket 2
or
-.BR openat (2))
+.MR openat 2 )
that generates a file descriptor.
The supervisor can perform the system call that generates
the file descriptor (and associated open file description)
and then use this operation to allocate
a file descriptor that refers to the same open file description in the target.
(For an explanation of open file descriptions, see
-.BR open (2).)
+.MR open 2 .)
.P
Once this operation has been performed,
the supervisor can close its copy of the file descriptor.
@@ -688,7 +688,7 @@ and
of the target.
.P
The third
-.BR ioctl (2)
+.MR ioctl 2
argument is a pointer to a structure of the following form:
.P
.in +4n
@@ -776,12 +776,12 @@ Set the close-on-exec flag on the received file descriptor.
.RE
.P
On success, this
-.BR ioctl (2)
+.MR ioctl 2
call returns the number of the file descriptor that was allocated
in the target.
Assuming that the emulated system call is one that returns
a file descriptor as its function result (e.g.,
-.BR socket (2)),
+.MR socket 2 ),
this value can be used as the return value
.RI ( resp.val )
that is supplied in the response that is subsequently sent with the
@@ -798,7 +798,7 @@ This operation can fail with the following errors:
Allocating the file descriptor in the target would cause the target's
.B RLIMIT_NOFILE
limit to be exceeded (see
-.BR getrlimit (2)).
+.MR getrlimit 2 ).
.TP
.B EBUSY
If the flag
@@ -842,7 +842,7 @@ or the target has terminated.
Here is some sample code (with error handling omitted) that uses the
.B SECCOMP_ADDFD_FLAG_SETFD
operation (here, to emulate a call to
-.BR openat (2)):
+.MR openat 2 ):
.P
.EX
.in +4n
@@ -879,10 +879,10 @@ the processes inside the container)
to mount block devices or create device nodes for the container.
The mount use case provides an example of where the
.B SECCOMP_USER_NOTIF_FLAG_CONTINUE
-.BR ioctl (2)
+.MR ioctl 2
operation is useful.
Upon receiving a notification for the
-.BR mount (2)
+.MR mount 2
system call, the container manager (the "supervisor") can distinguish
a request to mount a block filesystem
(which would not be possible for a "target" process inside the container)
@@ -890,28 +890,28 @@ and mount that file system.
If, on the other hand, the container manager detects that the operation
could be performed by the process inside the container
(e.g., a mount of a
-.BR tmpfs (5)
+.MR tmpfs 5
filesystem), it can notify the kernel that the target process's
-.BR mount (2)
+.MR mount 2
system call can continue.
.\"
.SS select()/poll()/epoll semantics
The file descriptor returned when
-.BR seccomp (2)
+.MR seccomp 2
is employed with the
.B SECCOMP_FILTER_FLAG_NEW_LISTENER
flag can be monitored using
-.BR poll (2),
-.BR epoll (7),
+.MR poll 2 ,
+.MR epoll 7 ,
and
-.BR select (2).
+.MR select 2 .
These interfaces indicate that the file descriptor is ready as follows:
.IP \[bu] 3
When a notification is pending,
these interfaces indicate that the file descriptor is readable.
Following such an indication, a subsequent
.B SECCOMP_IOCTL_NOTIF_RECV
-.BR ioctl (2)
+.MR ioctl 2
will not block, returning either information about a notification
or else failing with the error
.B EINTR
@@ -920,22 +920,22 @@ has been interrupted by a signal handler.
.IP \[bu]
After the notification has been received (i.e., by the
.B SECCOMP_IOCTL_NOTIF_RECV
-.BR ioctl (2)
+.MR ioctl 2
operation), these interfaces indicate that the file descriptor is writable,
meaning that a notification response can be sent using the
.B SECCOMP_IOCTL_NOTIF_SEND
-.BR ioctl (2)
+.MR ioctl 2
operation.
.IP \[bu]
After the last thread using the filter has terminated and been reaped using
-.BR waitpid (2)
+.MR waitpid 2
(or similar),
the file descriptor indicates an end-of-file condition (readable in
-.BR select (2);
+.MR select 2 ;
.BR POLLHUP / EPOLLHUP
in
-.BR poll (2)/
-.BR epoll_wait (2)).
+.MR poll 2 /
+.MR epoll_wait 2 ).
.SS Design goals; use of SECCOMP_USER_NOTIF_FLAG_CONTINUE
The intent of the user-space notification feature is
to allow system calls to be performed on behalf of the target.
@@ -959,13 +959,13 @@ rewriting the system call arguments.
.P
Note furthermore that a user-space notifier can be bypassed if
the existing filters allow the use of
-.BR seccomp (2)
+.MR seccomp 2
or
-.BR prctl (2)
+.MR prctl 2
to install a filter that returns an action value with a higher precedence than
.B SECCOMP_RET_USER_NOTIF
(see
-.BR seccomp (2)).
+.MR seccomp 2 ).
.P
It should thus be absolutely clear that the
seccomp user-space notification mechanism
@@ -983,7 +983,7 @@ the system call if its arguments are rewritten to something unsafe.
.SS Caveats regarding the use of \fI/proc/\fPtid\fI/mem\fP
The discussion above noted the need to use the
.B SECCOMP_IOCTL_NOTIF_ID_VALID
-.BR ioctl (2)
+.MR ioctl 2
when opening the
.IR /proc/ tid /mem
file of the target
@@ -991,19 +991,19 @@ to avoid the possibility of accessing the memory of the wrong process
in the event that the target terminates and its ID
is recycled by another (unrelated) thread.
However, the use of this
-.BR ioctl (2)
+.MR ioctl 2
operation is also necessary in other situations,
as explained in the following paragraphs.
.P
Consider the following scenario, where the supervisor
tries to read the pathname argument of a target's blocked
-.BR mount (2)
+.MR mount 2
system call:
.IP (1) 5
From one of its functions
.RI ( func() ),
the target calls
-.BR mount (2),
+.MR mount 2 ,
which triggers a user-space notification and causes the target to block.
.IP (2)
The supervisor receives the notification, opens
@@ -1013,7 +1013,7 @@ and (successfully) performs the
check.
.IP (3)
The target receives a signal, which causes the
-.BR mount (2)
+.MR mount 2
to abort.
.IP (4)
The signal handler executes in the target, and returns.
@@ -1029,7 +1029,7 @@ the supervisor reads from the target's memory location that used to
contain the pathname.
.IP (7)
The supervisor now calls
-.BR mount (2)
+.MR mount 2
with some arbitrary bytes obtained in the previous step.
.P
The conclusion from the above scenario is this:
@@ -1056,7 +1056,7 @@ be considered safe.
.\"
.SS Caveats regarding blocking system calls
Suppose that the target performs a blocking system call (e.g.,
-.BR accept (2))
+.MR accept 2 )
that the supervisor should handle.
The supervisor might then in turn execute the same blocking system call.
.P
@@ -1069,13 +1069,13 @@ If the supervisor does not take suitable steps to
actively discover that the target's system call has been canceled,
various difficulties can occur.
Taking the example of
-.BR accept (2),
+.MR accept 2 ,
the supervisor might remain blocked in its
-.BR accept (2)
+.MR accept 2
holding a port number that the target
(which, after the interruption by the signal handler,
perhaps closed its listening socket) might expect to be able to reuse in a
-.BR bind (2)
+.MR bind 2
call.
.P
Therefore, when the supervisor wishes to emulate a blocking system call,
@@ -1087,12 +1087,12 @@ that uses the
.B SECCOMP_IOCTL_NOTIF_ID_VALID
operation to check if the target is still blocked in its system call.
Alternatively, in the
-.BR accept (2)
+.MR accept 2
example, the supervisor might use
-.BR poll (2)
+.MR poll 2
to monitor both the notification file descriptor
(so as to discover when the target's
-.BR accept (2)
+.MR accept 2
call has been interrupted) and the listening file descriptor
(so as to know when a connection is available).
.P
@@ -1104,7 +1104,7 @@ that it acquired on behalf of the target.
Consider the following scenario:
.IP (1) 5
The target process has used
-.BR sigaction (2)
+.MR sigaction 2
to install a signal handler with the
.B SA_RESTART
flag.
@@ -1117,7 +1117,7 @@ A signal is delivered to the target and the signal handler is executed.
.IP (4)
When (if) the supervisor attempts to send a notification response, the
.B SECCOMP_IOCTL_NOTIF_SEND
-.BR ioctl (2))
+.MR ioctl 2 )
operation will fail with the
.B ENOENT
error.
@@ -1131,7 +1131,7 @@ the same instance of a system call in the target.
.P
One oddity is that system call restarting as described in this scenario
will occur even for the blocking system calls listed in
-.BR signal (7)
+.MR signal 7
that would
.B never
normally be restarted by the
@@ -1167,11 +1167,11 @@ making sure no file descriptors are inadvertently leaked into the target.
.SH BUGS
If a
.B SECCOMP_IOCTL_NOTIF_RECV
-.BR ioctl (2)
+.MR ioctl 2
operation
.\" or a poll/epoll/select
is performed after the target terminates, then the
-.BR ioctl (2)
+.MR ioctl 2
call simply blocks (rather than returning an error to indicate that the
target no longer exists).
.\" FIXME
@@ -1189,28 +1189,28 @@ The program creates a child process that serves as the "target" process.
The child process installs a seccomp filter that returns the
.B SECCOMP_RET_USER_NOTIF
action value if a call is made to
-.BR mkdir (2).
+.MR mkdir 2 .
The child process then calls
-.BR mkdir (2)
+.MR mkdir 2
once for each of the supplied command-line arguments,
and reports the result returned by the call.
After processing all arguments, the child process terminates.
.P
The parent process acts as the supervisor, listening for the notifications
that are generated when the target process calls
-.BR mkdir (2).
+.MR mkdir 2 .
When such a notification occurs,
the supervisor examines the memory of the target process (using
.IR /proc/ pid /mem )
to discover the pathname argument that was supplied to the
-.BR mkdir (2)
+.MR mkdir 2
call, and performs one of the following actions:
.IP \[bu] 3
If the pathname begins with the prefix "/tmp/",
then the supervisor attempts to create the specified directory,
and then spoofs a return for the target process based on the return
value of the supervisor's
-.BR mkdir (2)
+.MR mkdir 2
call.
In the event that that call succeeds,
the spoofed success return value is the length of the pathname.
@@ -1220,13 +1220,13 @@ the supervisor sends a
.B SECCOMP_USER_NOTIF_FLAG_CONTINUE
response to the kernel to say that the kernel should execute
the target process's
-.BR mkdir (2)
+.MR mkdir 2
call.
.IP \[bu]
If the pathname begins with some other prefix,
the supervisor spoofs an error return for the target process,
so that the target process's
-.BR mkdir (2)
+.MR mkdir 2
call appears to fail with the error
.B EOPNOTSUPP
("Operation not supported").
@@ -1245,7 +1245,7 @@ In the following example, the target attempts to create the directory
Upon receiving the notification, the supervisor creates the directory on the
target's behalf,
and spoofs a success return to be received by the target process's
-.BR mkdir (2)
+.MR mkdir 2
call.
.P
.in +4n
@@ -1269,7 +1269,7 @@ In the above output, note that the spoofed return value seen by the target
process is 6 (the length of the pathname
.IR /tmp/x ),
whereas a normal
-.BR mkdir (2)
+.MR mkdir 2
call returns 0 on success.
.P
In the next example, the target attempts to create a directory using the
@@ -1280,7 +1280,7 @@ the supervisor sends a
.B SECCOMP_USER_NOTIF_FLAG_CONTINUE
response to the kernel,
and the kernel then (successfully) executes the target process's
-.BR mkdir (2)
+.MR mkdir 2
call.
.P
.in +4n
@@ -1305,7 +1305,7 @@ a pathname that doesn't start with "." and doesn't begin with the prefix
.RB ( EOPNOTSUPP ,
"Operation not supported")
for the target's
-.BR mkdir (2)
+.MR mkdir 2
call (which is not executed):
.P
.in +4n
@@ -1329,13 +1329,13 @@ the target process attempts to create a directory with the pathname
.BR /tmp/nosuchdir/b .
Upon receiving the notification,
the supervisor attempts to create that directory, but the
-.BR mkdir (2)
+.MR mkdir 2
call fails because the directory
.B /tmp/nosuchdir
does not exist.
Consequently, the supervisor spoofs an error return that passes the error
that it received back to the target process's
-.BR mkdir (2)
+.MR mkdir 2
call.
.P
.in +4n
@@ -1357,12 +1357,12 @@ T: terminating
.P
If the supervisor receives a notification and sees that the
argument of the target's
-.BR mkdir (2)
+.MR mkdir 2
is the string "/bye", then (as well as spoofing an
.B EOPNOTSUPP
error), the supervisor terminates.
If the target process subsequently executes another
-.BR mkdir (2)
+.MR mkdir 2
that triggers its seccomp filter to return the
.B SECCOMP_RET_USER_NOTIF
action value, then the kernel causes the target process's system call to
@@ -2002,10 +2002,10 @@ main(int argc, char *argv[])
.EE
.\" SRC END
.SH SEE ALSO
-.BR ioctl (2),
-.BR pidfd_getfd (2),
-.BR pidfd_open (2),
-.BR seccomp (2)
+.MR ioctl 2 ,
+.MR pidfd_getfd 2 ,
+.MR pidfd_open 2 ,
+.MR seccomp 2
.P
A further example program can be found in the kernel source file
.IR samples/seccomp/user-trap.c .