Commit | Line | Data |
---|---|---|
09b24357 AL |
1 | The execve system call can grant a newly-started program privileges that |
2 | its parent did not have. The most obvious examples are setuid/setgid | |
3 | programs and file capabilities. To prevent the parent program from | |
4 | gaining these privileges as well, the kernel and user code must be | |
5 | careful to prevent the parent from doing anything that could subvert the | |
6 | child. For example: | |
7 | ||
8 | - The dynamic loader handles LD_* environment variables differently if | |
9 | a program is setuid. | |
10 | ||
11 | - chroot is disallowed for unprivileged processes, since it would allow | |
12 | /etc/passwd to be replaced from the point of view of a process that | |
13 | inherited the chroot. | |
14 | ||
15 | - The exec code has special handling for ptrace. | |
16 | ||
17 | These are all ad-hoc fixes. The no_new_privs bit (since Linux 3.5) is a | |
18 | new, generic mechanism to make it safe for a process to modify its | |
19 | execution environment in a manner that persists across execve. Any task | |
20 | can set no_new_privs. Once the bit is set, it is inherited across fork, | |
21 | clone, and execve and cannot be unset. With no_new_privs set, execve | |
22 | promises not to grant the privilege to do anything that could not have | |
23 | been done without the execve call. For example, the setuid and setgid | |
24 | bits will no longer change the uid or gid; file capabilities will not | |
25 | add to the permitted set, and LSMs will not relax constraints after | |
26 | execve. | |
27 | ||
c540521b AL |
28 | To set no_new_privs, use prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0). |
29 | ||
30 | Be careful, though: LSMs might also not tighten constraints on exec | |
31 | in no_new_privs mode. (This means that setting up a general-purpose | |
32 | service launcher to set no_new_privs before execing daemons may | |
33 | interfere with LSM-based sandboxing.) | |
34 | ||
09b24357 AL |
35 | Note that no_new_privs does not prevent privilege changes that do not |
36 | involve execve. An appropriately privileged task can still call | |
37 | setuid(2) and receive SCM_RIGHTS datagrams. | |
38 | ||
39 | There are two main use cases for no_new_privs so far: | |
40 | ||
41 | - Filters installed for the seccomp mode 2 sandbox persist across | |
42 | execve and can change the behavior of newly-executed programs. | |
43 | Unprivileged users are therefore only allowed to install such filters | |
44 | if no_new_privs is set. | |
45 | ||
46 | - By itself, no_new_privs can be used to reduce the attack surface | |
47 | available to an unprivileged user. If everything running with a | |
48 | given uid has no_new_privs set, then that uid will be unable to | |
49 | escalate its privileges by directly attacking setuid, setgid, and | |
50 | fcap-using binaries; it will need to compromise something without the | |
51 | no_new_privs bit set first. | |
52 | ||
53 | In the future, other potentially dangerous kernel features could become | |
54 | available to unprivileged tasks if no_new_privs is set. In principle, | |
55 | several options to unshare(2) and clone(2) would be safe when | |
56 | no_new_privs is set, and no_new_privs + chroot is considerably less | |
57 | dangerous than chroot by itself. |
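
The prctl call described above can be wrapped as in the following minimal C sketch. The helper names set_no_new_privs and get_no_new_privs are hypothetical and exist only for illustration. PR_GET_NO_NEW_PRIVS is not mentioned in the text above; it is the companion prctl(2) option that reports the current value of the bit. The sketch assumes a Linux 3.5 or later kernel and headers.

```c
#include <sys/prctl.h>

/*
 * Set the no_new_privs bit for the calling thread.
 * Returns 0 on success and -1 on error (errno is set by prctl).
 * The bit cannot be cleared afterwards and is inherited across
 * fork, clone, and execve.
 *
 * prctl is variadic, so the arguments are written as unsigned long
 * literals to match the types the kernel expects.
 */
int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1UL, 0UL, 0UL, 0UL);
}

/*
 * Query the no_new_privs bit for the calling thread.
 * Returns 1 if the bit is set, 0 if it is clear, -1 on error.
 */
int get_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0UL, 0UL, 0UL, 0UL);
}
```

A launcher that wants the guarantees described above would call set_no_new_privs() and check for a zero return before calling execve. As noted earlier, doing this in a general-purpose service launcher may interfere with LSM-based sandboxing.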