Diffstat (limited to 'man2/seccomp_unotify.2')
-rw-r--r-- | man2/seccomp_unotify.2 | 178 |
1 files changed, 89 insertions, 89 deletions
diff --git a/man2/seccomp_unotify.2 b/man2/seccomp_unotify.2 index f11dabf3a..1de5601fe 100644 --- a/man2/seccomp_unotify.2 +++ b/man2/seccomp_unotify.2 @@ -37,9 +37,9 @@ flag, the action value, and the .B SECCOMP_GET_NOTIF_SIZES operation described in -.BR seccomp (2), +.MR seccomp 2 , this mechanism involves the use of a number of related -.BR ioctl (2) +.MR ioctl 2 operations (described below). .\" .SS Overview @@ -79,12 +79,12 @@ but with two differences: .RS .IP \[bu] 3 The -.BR seccomp (2) +.MR seccomp 2 .I flags argument includes the flag .BR SECCOMP_FILTER_FLAG_NEW_LISTENER . Consequently, the return value of the (successful) -.BR seccomp (2) +.MR seccomp 2 call is a new "listening" file descriptor that can be used to receive notifications. Only one "listening" seccomp filter can be installed for a thread. @@ -117,9 +117,9 @@ over a UNIX domain socket connection between the target and the supervisor (using the .B SCM_RIGHTS ancillary message type described in -.BR unix (7)). +.MR unix 7 ). Another way to do this is through the use of -.BR pidfd_getfd (2). +.MR pidfd_getfd 2 . .\" Jann Horn: .\" Instead of using unix domain sockets to send the fd to the .\" parent, I think you could also use clone3() with @@ -138,7 +138,7 @@ These events are returned as structures of type Because this structure and its size may evolve over kernel versions, the supervisor must first determine the size of this structure using the -.BR seccomp (2) +.MR seccomp 2 .B SECCOMP_GET_NOTIF_SIZES operation, which returns a structure of type .IR seccomp_notif_sizes . @@ -171,7 +171,7 @@ listening file descriptor for events. To do this, the supervisor uses the .B SECCOMP_IOCTL_NOTIF_RECV -.BR ioctl (2) +.MR ioctl 2 operation to read information about a notification event; this operation blocks until an event is available. The operation returns a @@ -180,10 +180,10 @@ structure containing information about the system call that is being attempted by the target. 
(As described in NOTES, the file descriptor can also be monitored with -.BR select (2), -.BR poll (2), +.MR select 2 , +.MR poll 2 , or -.BR epoll (7).) +.MR epoll 7 .) .\" FIXME .\" Christian Brauner: .\" @@ -220,7 +220,7 @@ values of pointer arguments for the target's system call. One way in which the supervisor can do this is to open the corresponding .IR /proc/ tid /mem file (see -.BR proc (5)) +.MR proc 5 ) and read bytes from the location that corresponds to one of the pointer arguments whose value is supplied in the notification event. .\" Tycho Andersen mentioned that there are alternatives to /proc/PID/mem, @@ -229,7 +229,7 @@ the pointer arguments whose value is supplied in the notification event. a race condition that can occur when doing this; see the description of the .B SECCOMP_IOCTL_NOTIF_ID_VALID -.BR ioctl (2) +.MR ioctl 2 operation below.) In addition, the supervisor can access other system information that is visible @@ -260,7 +260,7 @@ variable of the target. .IP The response is sent using the .B SECCOMP_IOCTL_NOTIF_SEND -.BR ioctl (2) +.MR ioctl 2 operation, which is used to transmit a .I seccomp_notif_resp structure to the kernel. @@ -294,13 +294,13 @@ below. .\" .SH IOCTL OPERATIONS The following -.BR ioctl (2) +.MR ioctl 2 operations are supported by the seccomp user-space notification file descriptor. For each of these operations, the first (file descriptor) argument of -.BR ioctl (2) +.MR ioctl 2 is the listening file descriptor returned by a call to -.BR seccomp (2) +.MR seccomp 2 with the .B SECCOMP_FILTER_FLAG_NEW_LISTENER flag. @@ -313,7 +313,7 @@ notification event. If no such event is currently pending, the operation blocks until an event occurs. The third -.BR ioctl (2) +.MR ioctl 2 argument is a pointer to a structure of the following form which contains information about the event. This structure must be zeroed out before the call. @@ -339,7 +339,7 @@ seccomp filter. 
.IP \[bu] 3 The cookie can be used with the .B SECCOMP_IOCTL_NOTIF_ID_VALID -.BR ioctl (2) +.MR ioctl 2 operation described below. .IP \[bu] When returning a notification response to the kernel, @@ -365,7 +365,7 @@ structure containing information about the system call that triggered the notification. This is the same structure that is passed to the seccomp filter. See -.BR seccomp (2) +.MR seccomp 2 for details of this structure. .P On success, this operation returns 0; on failure, \-1 is returned, and @@ -425,7 +425,7 @@ operation is still valid is still blocked waiting for a response). .P The third -.BR ioctl (2) +.MR ioctl 2 argument is a pointer to the cookie .RI ( id ) returned by the @@ -452,7 +452,7 @@ Another thread or process is created on the system that by chance reuses the TID that was freed when the target terminated. .IP (4) The supervisor -.BR open (2)s +.MR open 2 s the .IR /proc/ tid /mem file for the TID obtained in step 1, with the intention of (say) @@ -462,14 +462,14 @@ the system call that triggered the notification in step 1. In the above scenario, the risk is that the supervisor may try to access the memory of a process other than the target. This race can be avoided by following the call to -.BR open (2) +.MR open 2 with a .B SECCOMP_IOCTL_NOTIF_ID_VALID operation to verify that the process that generated the notification is still alive. (Note that if the target terminates after the latter step, a subsequent -.BR read (2) +.MR read 2 from the file descriptor may return 0, indicating end of file.) .\" Jann Horn: .\" the PID can be reused, but the /proc/$pid directory is @@ -497,7 +497,7 @@ The operation (available since Linux 5.0) is used to send a notification response back to the kernel. The third -.BR ioctl (2) +.MR ioctl 2 argument of this structure is a pointer to a structure of the following form: .P .in +4n @@ -652,7 +652,7 @@ into the target's file descriptor table. 
Much like the use of .B SCM_RIGHTS messages described in -.BR unix (7), +.MR unix 7 , this operation is semantically equivalent to duplicating a file descriptor from the supervisor's file descriptor table into the target's file descriptor table. @@ -660,16 +660,16 @@ into the target's file descriptor table. The .B SECCOMP_IOCTL_NOTIF_ADDFD operation permits the supervisor to emulate a target system call (such as -.BR socket (2) +.MR socket 2 or -.BR openat (2)) +.MR openat 2 ) that generates a file descriptor. The supervisor can perform the system call that generates the file descriptor (and associated open file description) and then use this operation to allocate a file descriptor that refers to the same open file description in the target. (For an explanation of open file descriptions, see -.BR open (2).) +.MR open 2 .) .P Once this operation has been performed, the supervisor can close its copy of the file descriptor. @@ -688,7 +688,7 @@ and of the target. .P The third -.BR ioctl (2) +.MR ioctl 2 argument is a pointer to a structure of the following form: .P .in +4n @@ -776,12 +776,12 @@ Set the close-on-exec flag on the received file descriptor. .RE .P On success, this -.BR ioctl (2) +.MR ioctl 2 call returns the number of the file descriptor that was allocated in the target. Assuming that the emulated system call is one that returns a file descriptor as its function result (e.g., -.BR socket (2)), +.MR socket 2 ), this value can be used as the return value .RI ( resp.val ) that is supplied in the response that is subsequently sent with the @@ -798,7 +798,7 @@ This operation can fail with the following errors: Allocating the file descriptor in the target would cause the target's .B RLIMIT_NOFILE limit to be exceeded (see -.BR getrlimit (2)). +.MR getrlimit 2 ). .TP .B EBUSY If the flag @@ -842,7 +842,7 @@ or the target has terminated. 
Here is some sample code (with error handling omitted) that uses the .B SECCOMP_ADDFD_FLAG_SETFD operation (here, to emulate a call to -.BR openat (2)): +.MR openat 2 ): .P .EX .in +4n @@ -879,10 +879,10 @@ the processes inside the container) to mount block devices or create device nodes for the container. The mount use case provides an example of where the .B SECCOMP_USER_NOTIF_FLAG_CONTINUE -.BR ioctl (2) +.MR ioctl 2 operation is useful. Upon receiving a notification for the -.BR mount (2) +.MR mount 2 system call, the container manager (the "supervisor") can distinguish a request to mount a block filesystem (which would not be possible for a "target" process inside the container) @@ -890,28 +890,28 @@ and mount that file system. If, on the other hand, the container manager detects that the operation could be performed by the process inside the container (e.g., a mount of a -.BR tmpfs (5) +.MR tmpfs 5 filesystem), it can notify the kernel that the target process's -.BR mount (2) +.MR mount 2 system call can continue. .\" .SS select()/poll()/epoll semantics The file descriptor returned when -.BR seccomp (2) +.MR seccomp 2 is employed with the .B SECCOMP_FILTER_FLAG_NEW_LISTENER flag can be monitored using -.BR poll (2), -.BR epoll (7), +.MR poll 2 , +.MR epoll 7 , and -.BR select (2). +.MR select 2 . These interfaces indicate that the file descriptor is ready as follows: .IP \[bu] 3 When a notification is pending, these interfaces indicate that the file descriptor is readable. Following such an indication, a subsequent .B SECCOMP_IOCTL_NOTIF_RECV -.BR ioctl (2) +.MR ioctl 2 will not block, returning either information about a notification or else failing with the error .B EINTR @@ -920,22 +920,22 @@ has been interrupted by a signal handler. 
.IP \[bu] After the notification has been received (i.e., by the .B SECCOMP_IOCTL_NOTIF_RECV -.BR ioctl (2) +.MR ioctl 2 operation), these interfaces indicate that the file descriptor is writable, meaning that a notification response can be sent using the .B SECCOMP_IOCTL_NOTIF_SEND -.BR ioctl (2) +.MR ioctl 2 operation. .IP \[bu] After the last thread using the filter has terminated and been reaped using -.BR waitpid (2) +.MR waitpid 2 (or similar), the file descriptor indicates an end-of-file condition (readable in -.BR select (2); +.MR select 2 ; .BR POLLHUP / EPOLLHUP in -.BR poll (2)/ -.BR epoll_wait (2)). +.MR poll 2 / +.MR epoll_wait 2 ). .SS Design goals; use of SECCOMP_USER_NOTIF_FLAG_CONTINUE The intent of the user-space notification feature is to allow system calls to be performed on behalf of the target. @@ -959,13 +959,13 @@ rewriting the system call arguments. .P Note furthermore that a user-space notifier can be bypassed if the existing filters allow the use of -.BR seccomp (2) +.MR seccomp 2 or -.BR prctl (2) +.MR prctl 2 to install a filter that returns an action value with a higher precedence than .B SECCOMP_RET_USER_NOTIF (see -.BR seccomp (2)). +.MR seccomp 2 ). .P It should thus be absolutely clear that the seccomp user-space notification mechanism @@ -983,7 +983,7 @@ the system call if its arguments are rewritten to something unsafe. .SS Caveats regarding the use of \fI/proc/\fPtid\fI/mem\fP The discussion above noted the need to use the .B SECCOMP_IOCTL_NOTIF_ID_VALID -.BR ioctl (2) +.MR ioctl 2 when opening the .IR /proc/ tid /mem file of the target @@ -991,19 +991,19 @@ to avoid the possibility of accessing the memory of the wrong process in the event that the target terminates and its ID is recycled by another (unrelated) thread. However, the use of this -.BR ioctl (2) +.MR ioctl 2 operation is also necessary in other situations, as explained in the following paragraphs. 
.P Consider the following scenario, where the supervisor tries to read the pathname argument of a target's blocked -.BR mount (2) +.MR mount 2 system call: .IP (1) 5 From one of its functions .RI ( func() ), the target calls -.BR mount (2), +.MR mount 2 , which triggers a user-space notification and causes the target to block. .IP (2) The supervisor receives the notification, opens @@ -1013,7 +1013,7 @@ and (successfully) performs the check. .IP (3) The target receives a signal, which causes the -.BR mount (2) +.MR mount 2 to abort. .IP (4) The signal handler executes in the target, and returns. @@ -1029,7 +1029,7 @@ the supervisor reads from the target's memory location that used to contain the pathname. .IP (7) The supervisor now calls -.BR mount (2) +.MR mount 2 with some arbitrary bytes obtained in the previous step. .P The conclusion from the above scenario is this: @@ -1056,7 +1056,7 @@ be considered safe. .\" .SS Caveats regarding blocking system calls Suppose that the target performs a blocking system call (e.g., -.BR accept (2)) +.MR accept 2 ) that the supervisor should handle. The supervisor might then in turn execute the same blocking system call. .P @@ -1069,13 +1069,13 @@ If the supervisor does not take suitable steps to actively discover that the target's system call has been canceled, various difficulties can occur. Taking the example of -.BR accept (2), +.MR accept 2 , the supervisor might remain blocked in its -.BR accept (2) +.MR accept 2 holding a port number that the target (which, after the interruption by the signal handler, perhaps closed its listening socket) might expect to be able to reuse in a -.BR bind (2) +.MR bind 2 call. .P Therefore, when the supervisor wishes to emulate a blocking system call, @@ -1087,12 +1087,12 @@ that uses the .B SECCOMP_IOCTL_NOTIF_ID_VALID operation to check if the target is still blocked in its system call. 
Alternatively, in the -.BR accept (2) +.MR accept 2 example, the supervisor might use -.BR poll (2) +.MR poll 2 to monitor both the notification file descriptor (so as to discover when the target's -.BR accept (2) +.MR accept 2 call has been interrupted) and the listening file descriptor (so as to know when a connection is available). .P @@ -1104,7 +1104,7 @@ that it acquired on behalf of the target. Consider the following scenario: .IP (1) 5 The target process has used -.BR sigaction (2) +.MR sigaction 2 to install a signal handler with the .B SA_RESTART flag. @@ -1117,7 +1117,7 @@ A signal is delivered to the target and the signal handler is executed. .IP (4) When (if) the supervisor attempts to send a notification response, the .B SECCOMP_IOCTL_NOTIF_SEND -.BR ioctl (2)) +.MR ioctl 2 ) operation will fail with the .B ENOENT error. @@ -1131,7 +1131,7 @@ the same instance of a system call in the target. .P One oddity is that system call restarting as described in this scenario will occur even for the blocking system calls listed in -.BR signal (7) +.MR signal 7 that would .B never normally be restarted by the @@ -1167,11 +1167,11 @@ making sure no file descriptors are inadvertently leaked into the target. .SH BUGS If a .B SECCOMP_IOCTL_NOTIF_RECV -.BR ioctl (2) +.MR ioctl 2 operation .\" or a poll/epoll/select is performed after the target terminates, then the -.BR ioctl (2) +.MR ioctl 2 call simply blocks (rather than returning an error to indicate that the target no longer exists). .\" FIXME @@ -1189,28 +1189,28 @@ The program creates a child process that serves as the "target" process. The child process installs a seccomp filter that returns the .B SECCOMP_RET_USER_NOTIF action value if a call is made to -.BR mkdir (2). +.MR mkdir 2 . The child process then calls -.BR mkdir (2) +.MR mkdir 2 once for each of the supplied command-line arguments, and reports the result returned by the call. After processing all arguments, the child process terminates. 
.P The parent process acts as the supervisor, listening for the notifications that are generated when the target process calls -.BR mkdir (2). +.MR mkdir 2 . When such a notification occurs, the supervisor examines the memory of the target process (using .IR /proc/ pid /mem ) to discover the pathname argument that was supplied to the -.BR mkdir (2) +.MR mkdir 2 call, and performs one of the following actions: .IP \[bu] 3 If the pathname begins with the prefix "/tmp/", then the supervisor attempts to create the specified directory, and then spoofs a return for the target process based on the return value of the supervisor's -.BR mkdir (2) +.MR mkdir 2 call. In the event that that call succeeds, the spoofed success return value is the length of the pathname. @@ -1220,13 +1220,13 @@ the supervisor sends a .B SECCOMP_USER_NOTIF_FLAG_CONTINUE response to the kernel to say that the kernel should execute the target process's -.BR mkdir (2) +.MR mkdir 2 call. .IP \[bu] If the pathname begins with some other prefix, the supervisor spoofs an error return for the target process, so that the target process's -.BR mkdir (2) +.MR mkdir 2 call appears to fail with the error .B EOPNOTSUPP ("Operation not supported"). @@ -1245,7 +1245,7 @@ In the following example, the target attempts to create the directory Upon receiving the notification, the supervisor creates the directory on the target's behalf, and spoofs a success return to be received by the target process's -.BR mkdir (2) +.MR mkdir 2 call. .P .in +4n @@ -1269,7 +1269,7 @@ In the above output, note that the spoofed return value seen by the target process is 6 (the length of the pathname .IR /tmp/x ), whereas a normal -.BR mkdir (2) +.MR mkdir 2 call returns 0 on success. 
.P In the next example, the target attempts to create a directory using the @@ -1280,7 +1280,7 @@ the supervisor sends a .B SECCOMP_USER_NOTIF_FLAG_CONTINUE response to the kernel, and the kernel then (successfully) executes the target process's -.BR mkdir (2) +.MR mkdir 2 call. .P .in +4n @@ -1305,7 +1305,7 @@ a pathname that doesn't start with "." and doesn't begin with the prefix .RB ( EOPNOTSUPP , "Operation not supported") for the target's -.BR mkdir (2) +.MR mkdir 2 call (which is not executed): .P .in +4n @@ -1329,13 +1329,13 @@ the target process attempts to create a directory with the pathname .BR /tmp/nosuchdir/b . Upon receiving the notification, the supervisor attempts to create that directory, but the -.BR mkdir (2) +.MR mkdir 2 call fails because the directory .B /tmp/nosuchdir does not exist. Consequently, the supervisor spoofs an error return that passes the error that it received back to the target process's -.BR mkdir (2) +.MR mkdir 2 call. .P .in +4n @@ -1357,12 +1357,12 @@ T: terminating .P If the supervisor receives a notification and sees that the argument of the target's -.BR mkdir (2) +.MR mkdir 2 is the string "/bye", then (as well as spoofing an .B EOPNOTSUPP error), the supervisor terminates. If the target process subsequently executes another -.BR mkdir (2) +.MR mkdir 2 that triggers its seccomp filter to return the .B SECCOMP_RET_USER_NOTIF action value, then the kernel causes the target process's system call to @@ -2002,10 +2002,10 @@ main(int argc, char *argv[]) .EE .\" SRC END .SH SEE ALSO -.BR ioctl (2), -.BR pidfd_getfd (2), -.BR pidfd_open (2), -.BR seccomp (2) +.MR ioctl 2 , +.MR pidfd_getfd 2 , +.MR pidfd_open 2 , +.MR seccomp 2 .P A further example program can be found in the kernel source file .IR samples/seccomp/user-trap.c .
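Every hunk in this diff applies the same mechanical rewrite: a manual-page cross reference written as `.BR name (N)` plus optional trailing punctuation becomes `.MR name N` with the punctuation moved to a separate argument. The following is a minimal sketch of that rule in Python, inferred only from the before/after pairs shown above. It is not the tool that produced this commit; the function name `br_to_mr` and the regular expression are assumptions made for illustration. Lines are expected without a trailing newline.

```python
import re

# Hypothetical pattern (an assumption, inferred from the hunks above):
# matches ".BR name (section)trailer", where the section starts with a digit.
_BR_PAGE_REF = re.compile(r"^\.BR (\S+) \((\d\w*)\)(\S*)$")


def br_to_mr(line: str) -> str:
    """Rewrite one man-page source line from .BR page-reference form to .MR form.

    Lines that are not page references (e.g. ".BR SOME_FLAG ." or
    ".BR POLLHUP / EPOLLHUP") do not match and are returned unchanged.
    """
    m = _BR_PAGE_REF.match(line)
    if m is None:
        return line
    name, section, trailer = m.groups()
    out = f".MR {name} {section}"
    if trailer:
        # Trailing punctuation becomes a separate, space-delimited argument.
        out += f" {trailer}"
    return out


if __name__ == "__main__":
    for sample in (
        ".BR seccomp (2),",
        ".BR ioctl (2)",
        ".BR epoll (7).)",
        ".BR open (2)s",
        ".BR SECCOMP_FILTER_FLAG_NEW_LISTENER .",
    ):
        print(f"{sample!r} -> {br_to_mr(sample)!r}")
```

Bold constants such as `.BR SECCOMP_FILTER_FLAG_NEW_LISTENER .` are left alone because the second argument does not start with a parenthesized section number, which agrees with the unchanged context lines in the diff.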