Drop root UID while retaining CAP_SYS_NICE

I'm trying to write a daemon that will start as root using a setuid bit, but then quickly revert to the user running the process. The daemon, however, needs to retain the ability to set new threads to "realtime" priority. The code that I'm using to set the priority is as follows (it runs in a thread once the thread is created):

struct sched_param sched_param;
memset(&sched_param, 0, sizeof(sched_param));
sched_param.sched_priority = 90;

if(-1 == sched_setscheduler(0, SCHED_FIFO, &sched_param)) {
  // If we get here, we have an error, for example "Operation not permitted"
}

However the part I'm having problems with is setting the uid, while retaining the ability to make the above call to sched_setscheduler.

I have some code that runs close to startup in the main thread of my application:

if (getgid() != getegid() || getuid() != geteuid()) {
  cap_value_t cap_values[] = {CAP_SYS_NICE};
  cap_t caps;
  caps = cap_get_proc();
  cap_set_flag(caps, CAP_PERMITTED, 1, cap_values, CAP_SET);
  cap_set_proc(caps);
  prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0);
  cap_free(caps);
  setegid(getgid());
  seteuid(getuid());
}

The problem is that after running this code, I get "Operation not permitted" when calling sched_setscheduler as alluded to in the comment above. What am I doing wrong?

Gilgamesh answered 1/11, 2012 at 18:28 Comment(3)
Instead of seteuid(geteuid());, use an explicit seteuid(0); and use seteuid() throughout the code, except the very first call to setuid(0);.Culbertson
@H2CO3 there are a couple of things I don't understand about your comment, first I do not use the code "seteuid(geteuid())" (there's an extra 'e' in yours). Second, why would I call "setuid(0)", I'm trying to drop the "root" designation. Maybe you could post an answer to flesh out your proposal.Gilgamesh
yep that was a typo, sorry. So, I mean after dropping root, you can re-gain root by setuid(0); in order not to get errors related to an insufficient level of privileges... Isn't that what you're trying to fix? Or am I missing something?Culbertson

Edited to describe the reason for the original failure:

There are three sets of capabilities in Linux: inheritable, permitted, and effective. Inheritable defines which capabilities stay permitted across an exec(). Permitted defines which capabilities are permitted for a process. Effective defines which capabilities are currently in effect.

When a process changes its user ID from root to non-root, the effective capability set is always cleared.

By default, the permitted capability set is cleared as well, but calling prctl(PR_SET_KEEPCAPS, 1L) before the identity change tells the kernel to keep the permitted set intact.
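The keep-capabilities flag can be read back with PR_GET_KEEPCAPS, which is a cheap way to confirm it is really set before the identity change. A minimal sketch; the helper names set_keepcaps() and get_keepcaps() are illustrative assumptions, not part of any library:

```c
#include <sys/prctl.h>

/* Hypothetical helper: turn the calling thread's keep-capabilities
 * flag on (nonzero) or off (zero). Returns 0 on success, -1 on error. */
int set_keepcaps(int on)
{
    return prctl(PR_SET_KEEPCAPS, on ? 1L : 0L, 0L, 0L, 0L);
}

/* Hypothetical helper: returns the current flag value (0 or 1),
 * or -1 on error. */
int get_keepcaps(void)
{
    return prctl(PR_GET_KEEPCAPS, 0L, 0L, 0L, 0L);
}
```

The flag only protects the permitted set across the UID change, and the kernel resets it on execve(), so it has to be set in the same program image that performs the setresuid() call.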

After the process has changed its identity back to the unprivileged user, CAP_SYS_NICE must be added to the effective set. (It must also be in the permitted set, so if you clear your capability set, remember to set it there as well. If you only modify the current capability set, you know it is already there, because you inherited it.)
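To see which set actually holds CAP_SYS_NICE at a given point, one option is to read the CapPrm and CapEff masks the kernel exposes in /proc/self/status. This is only a diagnostic sketch that does not use libcap; the function name cap_in_proc_status() is an assumption made for illustration:

```c
#include <stdio.h>
#include <string.h>
#include <linux/capability.h>   /* CAP_SYS_NICE */

/* Hypothetical helper: returns 1 if capability `cap` is set in the
 * /proc/self/status line named by `field` ("CapPrm" or "CapEff"),
 * 0 if it is not set, and -1 if the field cannot be read. */
int cap_in_proc_status(const char *field, int cap)
{
    FILE  *f = fopen("/proc/self/status", "r");
    char   line[256];
    size_t len = strlen(field);
    int    result = -1;

    if (!f)
        return -1;

    while (fgets(line, sizeof line, f)) {
        unsigned long long mask;

        /* Lines look like "CapEff:<TAB>0000000000800000" (hex mask). */
        if (strncmp(line, field, len) == 0 && line[len] == ':' &&
            sscanf(line + len + 1, "%llx", &mask) == 1) {
            result = (int)((mask >> cap) & 1ULL);
            break;
        }
    }

    fclose(f);
    return result;
}
```

Called right after the identity change with keepcaps set, cap_in_proc_status("CapPrm", CAP_SYS_NICE) should report 1 while cap_in_proc_status("CapEff", CAP_SYS_NICE) reports 0, which is exactly the state that makes sched_setscheduler() fail. Note that /proc/self/status describes the main thread of the process, so in a multithreaded program this check is best done from the initial thread.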

Here is the procedure I recommend you should follow:

  1. Save real user ID, real group ID, and supplementary group IDs:

     #define  _GNU_SOURCE
     #define  _BSD_SOURCE
     #include <stdlib.h>     /* malloc() */
     #include <unistd.h>
     #include <sys/types.h>
     #include <sys/capability.h>
     #include <sys/prctl.h>
     #include <grp.h>
    
     uid_t   user = getuid();
     gid_t   group = getgid();
     gid_t  *gid;
     int     gids, n;
    
     gids = getgroups(0, NULL);
     if (gids < 0) /* error */
    
     gid = malloc((gids + 1) * sizeof *gid);
     if (!gid) /* error */
    
     gids = getgroups(gids, gid);
     if (gids < 0) /* error */
    
  2. Filter out unnecessary and privileged supplementary groups (be paranoid!)

     n = 0;
     while (n < gids)
         if (gid[n] == 0 || gid[n] == group)
             gid[n] = gid[--gids];
         else
             n++;
    

    Because you cannot "clear" the supplementary group IDs (that just requests the current number), make sure the list is never empty. You can always add the real group ID to the supplementary list to make it non-empty.

     if (gids < 1) {
         gid[0] = group;
         gids = 1;
     }
    
  3. Switch real, effective, and saved user IDs to root

     if (setresuid(0, 0, 0)) /* error */
    
  4. Set the CAP_SYS_NICE capability in the CAP_PERMITTED set. I prefer to clear the entire set, and only keep the four capabilities that are required for this approach to work (and later on, drop all but CAP_SYS_NICE):

     cap_value_t capability[4] = { CAP_SYS_NICE, CAP_SETUID, CAP_SETGID, CAP_SETPCAP };
     cap_t       capabilities;
    
     capabilities = cap_get_proc();
     if (cap_clear(capabilities)) /* error */
     if (cap_set_flag(capabilities, CAP_EFFECTIVE, 4, capability, CAP_SET)) /* error */
     if (cap_set_flag(capabilities, CAP_PERMITTED, 4, capability, CAP_SET)) /* error */
     if (cap_set_proc(capabilities)) /* error */
    
  5. Tell the kernel you wish to retain the capabilities over the change from root to the unprivileged user; by default, the capabilities are cleared to zero when changing from root to non-root identity

     if (prctl(PR_SET_KEEPCAPS, 1L)) /* error */
    
  6. Set real, effective, and saved group IDs to the initially saved group ID

     if (setresgid(group, group, group)) /* error */
    
  7. Set supplementary group IDs

     if (setgroups(gids, gid)) /* error */
    
  8. Set real, effective and saved user IDs to the initially saved user ID

     if (setresuid(user, user, user)) /* error */
    

    At this point you effectively drop root privileges (without the ability to gain them back anymore), except for the CAP_SYS_NICE capability. Because of the transition from root to a non-root user, the capability is not effective yet; the kernel always clears the effective capability set on such a transition.

  9. Set the CAP_SYS_NICE capability in the CAP_PERMITTED and CAP_EFFECTIVE set

     if (cap_clear(capabilities)) /* error */
     if (cap_set_flag(capabilities, CAP_PERMITTED, 1, capability, CAP_SET))  /* error */
     if (cap_set_flag(capabilities, CAP_EFFECTIVE, 1, capability, CAP_SET))  /* error */
     if (cap_set_flag(capabilities, CAP_PERMITTED, 3, capability + 1, CAP_CLEAR))  /* error */
     if (cap_set_flag(capabilities, CAP_EFFECTIVE, 3, capability + 1, CAP_CLEAR))  /* error */
    
     if (cap_set_proc(capabilities)) /* error */
    

    Note that the latter two cap_set_flag() operations clear the three capabilities no longer needed, so that only the first one, CAP_SYS_NICE, remains.

    At this point the capabilities' descriptor is no longer needed, so it's a good idea to free it.

     if (cap_free(capabilities)) /* error */
    
  10. Tell the kernel you don't wish to retain the capability over any further changes from root (again, just paranoia)

     if (prctl(PR_SET_KEEPCAPS, 0L)) /* error */
    

This works on x86-64 using GCC-4.6.3, libc6-2.15.0ubuntu10.3, and linux-3.5.0-18 kernel on Xubuntu 12.04.1 LTS, after installing the libcap-dev package.
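Step 2 of the procedure (filtering the supplementary groups and making sure the list never ends up empty) can be pulled out into a small standalone function. A sketch under the same assumptions as the steps above; the name filter_groups() is hypothetical:

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: removes group 0 and `primary` from
 * list[0..count-1] in place. The order of the remaining entries is
 * not preserved. If nothing is left, `primary` is stored as the only
 * entry, so `list` must have room for at least one element.
 * Returns the new number of entries (always >= 1). */
size_t filter_groups(gid_t *list, size_t count, gid_t primary)
{
    size_t n = 0;

    while (n < count) {
        if (list[n] == 0 || list[n] == primary)
            list[n] = list[--count];   /* overwrite with the last entry */
        else
            n++;
    }

    if (count == 0) {
        list[0] = primary;
        count = 1;
    }

    return count;
}
```

The result can be passed straight to setgroups(), for example setgroups(filter_groups(gid, gids, group), gid), provided gids was not negative.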

Edited to add:

You can simplify the process by relying only on the effective user ID being root, as the executable is setuid root. In that case, you don't need to worry about the supplementary groups either, as setuid root only affects the effective and saved user IDs, not the groups. To return to the original real user, you technically only need the one setresuid() call at the end of the procedure (and the setresgid() if the executable also happens to be marked setgid root), to set both saved and effective user (and group) IDs to the real user.

However, the case where you regain the original user's identity is rare, and the case where you gain the identity of a named user is common, and this procedure was originally designed for the latter. You would use initgroups() to gain the correct supplementary groups for the named user, and so on. In that case, taking care of the real, effective, and saved user and group IDs and supplementary group IDs this carefully is important, as otherwise the process would inherit supplementary groups from the user that executed the process.

The procedure here is paranoid, but paranoia is not a bad thing when you are dealing with security-sensitive issues. For the revert-back-to-real-user case, it can be simplified.
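In the same paranoid spirit, it can be worth verifying after the final setresuid() call that the real, effective, and saved user IDs all ended up where intended, since a saved user ID of 0 would let the process switch back to root. A minimal sketch; uids_all_equal() is a hypothetical name, and getresuid() needs _GNU_SOURCE defined before any header is included:

```c
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if the real, effective, and saved
 * user IDs all equal `user`, 0 if any of them differs, and -1 if
 * getresuid() itself fails. */
int uids_all_equal(uid_t user)
{
    uid_t ruid, euid, suid;

    if (getresuid(&ruid, &euid, &suid) == -1)
        return -1;

    return ruid == user && euid == user && suid == user;
}
```

A daemon would typically treat any result other than 1 as fatal and exit before doing further work. The same pattern applies to the group IDs with getresgid().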


Edited on 2013-03-17 to show a simple test program. This assumes it is installed setuid root, but it will drop all privileges and capabilities (except CAP_SYS_NICE, which is required for scheduler manipulation above the normal rules). I pared down the "excess" operations I prefer to do, in the hopes that others find this easier to read.

#define  _GNU_SOURCE
#define  _BSD_SOURCE
#include <unistd.h>
#include <sys/types.h>
#include <sys/capability.h>
#include <sys/prctl.h>
#include <grp.h>
#include <errno.h>

#include <string.h>
#include <sched.h>
#include <stdio.h>


void test_priority(const char *const name, const int policy)
{
    const pid_t         me = getpid();
    struct sched_param  param;

    param.sched_priority = sched_get_priority_max(policy);
    printf("sched_get_priority_max(%s) = %d\n", name, param.sched_priority);
    if (sched_setscheduler(me, policy, &param) == -1)
        printf("sched_setscheduler(getpid(), %s, { %d }): %s.\n", name, param.sched_priority, strerror(errno));
    else
        printf("sched_setscheduler(getpid(), %s, { %d }): Ok.\n", name, param.sched_priority);

    param.sched_priority = sched_get_priority_min(policy);
    printf("sched_get_priority_min(%s) = %d\n", name, param.sched_priority);
    if (sched_setscheduler(me, policy, &param) == -1)
        printf("sched_setscheduler(getpid(), %s, { %d }): %s.\n", name, param.sched_priority, strerror(errno));
    else
        printf("sched_setscheduler(getpid(), %s, { %d }): Ok.\n", name, param.sched_priority);

}


int main(void)
{
    uid_t       user;
    cap_value_t root_caps[2] = { CAP_SYS_NICE, CAP_SETUID };
    cap_value_t user_caps[1] = { CAP_SYS_NICE };
    cap_t       capabilities;

    /* Get real user ID. */
    user = getuid();

    /* Get full root privileges. Normally being effectively root
     * (see man 7 credentials, User and Group Identifiers, for explanation
     *  for effective versus real identity) is enough, but some security
     * modules restrict actions by processes that are only effectively root.
     * To make sure we don't hit those problems, we switch to root fully. */
    if (setresuid(0, 0, 0)) {
        fprintf(stderr, "Cannot switch to root: %s.\n", strerror(errno));
        return 1;
    }

    /* Create an empty set of capabilities. */
    capabilities = cap_init();

    /* Capabilities have three subsets:
     *      INHERITABLE:    Capabilities permitted after an execv()
     *      EFFECTIVE:      Currently effective capabilities
     *      PERMITTED:      Limiting set for the two above.
     * See man 7 capabilities for details, Thread Capability Sets.
     *
     * We need the following capabilities:
     *      CAP_SYS_NICE    For nice(2), setpriority(2),
     *                      sched_setscheduler(2), sched_setparam(2),
     *                      sched_setaffinity(2), etc.
     *      CAP_SETUID      For setuid(), setresuid()
     * in the last two subsets. We do not need to retain any capabilities
     * over an exec().
    */
    if (cap_set_flag(capabilities, CAP_PERMITTED, sizeof root_caps / sizeof root_caps[0], root_caps, CAP_SET) ||
        cap_set_flag(capabilities, CAP_EFFECTIVE, sizeof root_caps / sizeof root_caps[0], root_caps, CAP_SET)) {
        fprintf(stderr, "Cannot manipulate capability data structure as root: %s.\n", strerror(errno));
        return 1;
    }

    /* Above, we just manipulated the data structure describing the flags,
     * not the capabilities themselves. So, set those capabilities now. */
    if (cap_set_proc(capabilities)) {
        fprintf(stderr, "Cannot set capabilities as root: %s.\n", strerror(errno));
        return 1;
    }

    /* We wish to retain the capabilities across the identity change,
     * so we need to tell the kernel. */
    if (prctl(PR_SET_KEEPCAPS, 1L)) {
        fprintf(stderr, "Cannot keep capabilities after dropping privileges: %s.\n", strerror(errno));
        return 1;
    }

    /* Drop extra privileges (aside from capabilities) by switching
     * to the original real user. */
    if (setresuid(user, user, user)) {
        fprintf(stderr, "Cannot drop root privileges: %s.\n", strerror(errno));
        return 1;
    }

    /* We can still switch to a different user due to having the CAP_SETUID
     * capability. Let's clear the capability set, except for the CAP_SYS_NICE
     * in the permitted and effective sets. */
    if (cap_clear(capabilities)) {
        fprintf(stderr, "Cannot clear capability data structure: %s.\n", strerror(errno));
        return 1;
    }
    if (cap_set_flag(capabilities, CAP_PERMITTED, sizeof user_caps / sizeof user_caps[0], user_caps, CAP_SET) ||
        cap_set_flag(capabilities, CAP_EFFECTIVE, sizeof user_caps / sizeof user_caps[0], user_caps, CAP_SET)) {
        fprintf(stderr, "Cannot manipulate capability data structure as user: %s.\n", strerror(errno));
        return 1;
    }

    /* Apply modified capabilities. */
    if (cap_set_proc(capabilities)) {
        fprintf(stderr, "Cannot set capabilities as user: %s.\n", strerror(errno));
        return 1;
    }

    /*
     * Now we have just the normal user privileges,
     * plus user_caps.
    */

    test_priority("SCHED_OTHER", SCHED_OTHER);
    test_priority("SCHED_BATCH", SCHED_BATCH);
    test_priority("SCHED_IDLE", SCHED_IDLE);
    test_priority("SCHED_FIFO", SCHED_FIFO);
    test_priority("SCHED_RR", SCHED_RR);

    return 0;
}

Note that if you know the binary is only run on relatively recent Linux kernels, you can rely on file capabilities. Then, your main() needs none of the identity or capability manipulation (you can remove everything in main() except the test_priority() calls), and you just give your binary, say ./testprio, the CAP_SYS_NICE capability:

sudo setcap 'cap_sys_nice=pe' ./testprio

You can run getcap to see which capabilities are granted when a binary is executed:

getcap ./testprio

which should display

./testprio = cap_sys_nice+ep

File capabilities seem to be little used thus far. On my own system, gnome-keyring-daemon is the only one with file capabilities (CAP_IPC_LOCK, for locking memory).
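With file capabilities in place, the code from the question needs no setup around it; what remains is the scheduler call itself. A minimal sketch, assuming the binary has been given cap_sys_nice as shown above; try_fifo() is a hypothetical helper name:

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: request SCHED_FIFO at `priority` for the
 * calling thread. Returns 0 on success, or the errno value on
 * failure (typically EPERM when CAP_SYS_NICE is not effective). */
int try_fifo(int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof param);
    param.sched_priority = priority;

    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return errno;

    return 0;
}
```

Returning the errno value rather than -1 lets the caller log strerror() of the result directly, which makes the "Operation not permitted" case easy to tell apart from an invalid priority.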

Sirloin answered 1/11, 2012 at 21:51 Comment(16)
Thanks, I think I understand. Why is step 3 needed? Presumably the executable is as setuid and is owned by root? Also, I think you have "CAP_SET_NICE" a couple of places where you mean "CAP_SYS_NICE"?Gilgamesh
@brooks94, thanks; fixed. Effective user ID being root technically suffices, but switching completely to root user should make it less likely for any shenanigans (based on signals or say /proc/PID/ accesses) to succeed while the process has elevated privileges. So consider it a purely defensive, and more than slightly paranoid, detail.Sirloin
Is there also a missing call to cap_set_proc(capabilities) in step 4?Gilgamesh
Also, a couple other bugs I found while actually implementing it: order of args of getgroups() in step 1 is flipped; and I believe it needs a call to cap_set_flag(capabilities, CAP_PERMITTED, 1, capability, CAP_SET) in step 9, otherwise I'm getting an "operation not permitted" when calling cap_set_proc()Gilgamesh
@brooks94: Thanks for pointing those out. Copy-paste errors. I compared my test program to above, and now they seem to be the same. My test program tests all five policies, setting both minimum and maximum priorities for the current thread (only the initial thread in the process). On my system, only SCHED_FIFO and SCHED_RR have a priority range, both 1..99 inclusive.Sirloin
The cap_set_flag(..., CAP_CLEAR) calls seem redundant after cap_clear according to the manual page: initializes the capability state in working storage identified by cap_p so that all capability flags are cleared. The source code of libcap confirms this.Dissent
If I remove CAP_SETPCAP, things still work. In November 2009 (Linux 2.6.33), file capabilities are always enabled. Thus the following description from capabilities(7) applies: add any capability from the calling thread's bounding set to its inheritable set; drop capabilities from the bounding set (via prctl(2) PR_CAPBSET_DROP); make changes to the securebits flags. It seems unnecessary here as you're root already.Dissent
@Lekensteyn: The cap_set_flag(..., CAP_CLEAR) are not required after cap_clear(), but they cause no harm. Like I noted, I prefer to have them, as a defensive technique, since this is very security-sensitive code. As to CAP_SETPCAP, having it causes no harm, but may stop it from working with pre-2.6.33 kernels (and there are lots of those in production).Sirloin
@NominalAnimal From your description, it looks like those CAP_CLEAR thingeys are mandatory, but it is not and may confuse the reader (like me) rather than adding defence-in-depth. I also found in the source code that cap_get_proc is actually cap_init + capget. As the manual page (and the code) mention, cap_get_proc + cap_clear is the same as cap_init. As for CAP_SETPCAP, can you mention in your answer why this is (un)necessary to add?Dissent
@Lekensteyn: Apologies; I forgot about this. To be honest, after another bit of further research, I'm no longer sure when, if ever, CAP_SETPCAP is needed in this situation. It might be much cleaner to rely on file capabilities alone (granting the capabilities on binary execution), which I think is a much better approach. I appended a simple example scheduler test program into my answer; hopefully it clears up some of the confusion I've inadvertently caused.Sirloin
How did you figure out the valid arguments to setcap? Read the source code? The manpages don't actually explain this.Tripura
@reinierpost: The man 7 capabilities man page explains each capability you can use in detail. The man 8 setcap command says it uses cap_from_text() to parse the capabilities, and man 3 cap_from_text explains the capability format, with examples. The Linux man-pages project is a good reference.Sirloin
I had read all three (I had to read cap_from_text online, as it is not present on my Ubuntu 12.04 system and I don't know how to obtain it) and it wasn't clear to me that the 'clauses' in the textual representation are the valid arguments to setcap.Tripura
@reinierpost: You need the libcap-dev package for writing your own capability-aware code in C, and also to get those cap_ man pages. Yeah, the man pages could be clearer, but nobody has thought up better wording, yet. If you (or somebody else) have a better wording in mind, please do contribute.Sirloin
The testprio code is missing cap_free(capabilities) before the test_priority() calls.Drawn
@EarlCrapstone: Well, kinda. Adding it would be good form, yes -- but it would not change the operation of the program in any way at all, and it is not a bug to leave it out. It would release any dynamically allocated memory (related to capabilities), that's all. Since the program is exiting soon after, why bother? (Note: I'm not sure myself whether I should add it or not. A good argument will change my mind.)Sirloin

I have some code that runs close to startup in the main thread of my application:

You must acquire these capabilities in each thread in which you want to use them, or use the CAP_INHERITABLE set.

From capabilities(7):

Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.

Outoftheway answered 1/11, 2012 at 19:17 Comment(2)
So then I have no choice, my daemon needs to retain root privileges throughout its lifetime?Gilgamesh
I don't think so. Try changing CAP_PERMITTED to CAP_INHERITABLE.Outoftheway
