Per-process namespaces

Linux has had this neat feature for quite some time now: since 2.4.19, according to the docs. Yet it is neither well known nor widely used. I couldn't even find a program that creates a new namespace for its subprocesses, similar to what chroot does with the root of the file hierarchy.

This feature allows each process to have its own set of mount points. Most of the time you want processes to share their mount points, but in some cases it is useful to give a few processes a different set. Combined with bind mounts, this allows some useful setups.

In case you are not familiar with bind mounts, they allow you to "attach" a part of the file hierarchy somewhere else. For example:

$ ls /mnt
$ ls /usr
bin games include lib lib32 lib64 local sbin share src X11R6
$ mount --bind /usr /mnt
$ ls /mnt
bin games include lib lib32 lib64 local sbin share src X11R6
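
At the system-call level, mount --bind is a mount(2) call with the MS_BIND flag, and the --rbind variant used further down adds MS_REC. Here is a minimal sketch in C, assuming it runs as root; the helper names bind_flags and bind_mount are made up for illustration:

```c
#include <stdio.h>
#include <sys/mount.h>

/* Flags for mount(2): MS_BIND for --bind, plus MS_REC for --rbind. */
unsigned long bind_flags(int recursive) {
  return recursive ? (MS_BIND | MS_REC) : MS_BIND;
}

/* Hypothetical helper: attach src at dst, like `mount --bind src dst`.
 * Returns 0 on success, -1 on failure (errno is set by mount(2)). */
int bind_mount(const char *src, const char *dst, int recursive) {
  if (mount(src, dst, NULL, bind_flags(recursive), NULL) == -1) {
    perror("mount");
    return -1;
  }
  return 0;
}
```

Calling bind_mount("/usr", "/mnt", 0) would then do the same as the mount command above.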

Now, take pam-tmpdir, for example. It sets $TMPDIR and $TMP to point to a user-specific temporary directory. Sadly, it is pretty useless for applications that don't follow the convention of using these environment variables.

Without namespaces, if you create a temporary directory and bind-mount it to /tmp, this new /tmp is visible to every process. With namespaces, you can make the new /tmp available only to subprocesses. If pam-tmpdir did this, applications that write to /tmp without looking at $TMP or $TMPDIR would also use the user-specific temporary space, without affecting external processes, which would still see the original /tmp.

On x86-64, you can run both 64-bit and 32-bit applications. 64-bit applications take their libraries from /usr/lib, and 32-bit applications look for theirs in /usr/lib32. But badly written 32-bit applications may try to load libraries from /usr/lib, where only 64-bit versions are available.

With namespaces, the broken 32-bit application could have /usr/lib32 bind-mounted to /usr/lib without the 64-bit applications knowing.

You could certainly get a similar result with the following set of commands:

$ mount --rbind / /chroot
$ mount --rbind /usr/lib32 /chroot/usr/lib
$ chroot /chroot $application

(Unlike --bind, --rbind also attaches submounts.)

The downside, here, is that external processes will see all this setup under /chroot. The whole setup would be invisible to external processes if namespaces were used.
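
With a namespace, the same effect needs no /chroot tree at all. The following is a rough sketch rather than a tested tool. It assumes a glibc that provides the unshare() wrapper, it must run as root, and bind_in_new_namespace is a hypothetical name. The MS_PRIVATE step is an assumption about systems where / is mounted with shared propagation, in which case the bind mount would otherwise propagate back to the original namespace.

```c
#define _GNU_SOURCE /* for unshare() and CLONE_NEWNS */
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: move the calling process into a new mount
 * namespace and recursively bind src over dst there.
 * Returns 0 on success, -1 on failure. */
int bind_in_new_namespace(const char *src, const char *dst) {
  if (unshare(CLONE_NEWNS) == -1) {
    perror("unshare");
    return -1;
  }
  /* Assumption: / may be a shared mount, so mark the whole tree private
   * to keep later mounts from propagating to the parent namespace. */
  if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1) {
    perror("mount MS_PRIVATE");
    return -1;
  }
  /* Equivalent of `mount --rbind src dst`, visible only in here. */
  if (mount(src, dst, NULL, MS_BIND | MS_REC, NULL) == -1) {
    perror("mount MS_BIND");
    return -1;
  }
  return 0;
}
```

A wrapper for the broken 32-bit application would call bind_in_new_namespace("/usr/lib32", "/usr/lib") and then execvp() the application.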

Another nice use of namespaces would be to mount encrypted volumes under a different namespace, so that only a limited set of processes would be allowed to read the decrypted data. The sad thing is that you need the CAP_SYS_ADMIN capability to create a new namespace, so that would need to be done by a setuid root program.

There are, as far as a few hours of fiddling showed me, two system calls that will set up a new namespace: clone(2) and unshare(2). The second is easier to use, though only available since 2.6.16. But while etch ships 2.6.18, the glibc coming with it doesn't implement unshare(2), so we need to use syscall(2) instead. The following code will run /bin/sh, or any command given as argument, after creating a new namespace. The new process and its subprocesses will inherit the new namespace.

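#define _GNU_SOURCE /* for CLONE_NEWNS */
#include <sched.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
  /* Detach from the parent's mount namespace; needs CAP_SYS_ADMIN. */
  if (syscall(SYS_unshare, CLONE_NEWNS) == -1) {
    perror("unshare");
    return 1;
  }
  if (argc > 1)
    execvp(argv[1], &argv[1]);
  else
    execl("/bin/sh", "sh", (char *)NULL);
  perror("exec"); /* only reached if exec failed */
  return 1;
}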

This tool, once built, is called newns in the following example.

$ mkdir /tmp/abc
$ ./newns
$ mount -n --bind /tmp/abc /tmp
$ touch /tmp/a
$ ls /tmp
a
<in another terminal>
$ ls /tmp
abc
$ ls /tmp/abc
a
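
The session above relies on unshare(2). For comparison, the clone(2) route could look roughly like the sketch below, where the child starts directly in a new namespace while the parent stays in the original one. It assumes root privileges; run_in_new_namespace, child_main and the 1 MiB stack size are illustrative choices.

```c
#define _GNU_SOURCE /* for clone() and CLONE_NEWNS */
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

/* Child entry point: arg is a NULL-terminated argv to execute. */
static int child_main(void *arg) {
  char *const *argv = arg;
  execvp(argv[0], argv);
  perror("execvp"); /* only reached if exec failed */
  return 127;
}

/* Hypothetical helper: run argv in a child that has its own mount
 * namespace. Returns the child's exit status, or -1 on error. */
int run_in_new_namespace(char *const argv[]) {
  char *stack = malloc(STACK_SIZE);
  if (stack == NULL)
    return -1;
  /* clone() wants the top of the stack, since it grows downwards on
   * most architectures. SIGCHLD makes the child waitable as usual. */
  pid_t pid = clone(child_main, stack + STACK_SIZE, CLONE_NEWNS | SIGCHLD,
                    (void *)argv);
  if (pid == -1) {
    perror("clone");
    free(stack);
    return -1;
  }
  int status;
  pid_t waited = waitpid(pid, &status, 0);
  free(stack);
  if (waited == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

Unlike the unshare(2) version, the calling process itself keeps seeing the original mount points here.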

When using namespaces, it is better to keep mount from writing to /etc/mtab, using the -n argument: that file is shared by all processes, whatever their namespace. /proc/mounts contains the proper mount information for the namespace of the process reading it, and /proc/PID/mounts contains the mount information for the given process.
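
A program can check what its own namespace looks like by reading these tables with getmntent(3). A small sketch follows; is_mount_point is a hypothetical helper name, and the table path is passed in so that it works with /proc/mounts as well as /proc/PID/mounts:

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report whether `dir` is listed as a mount point
 * in the table at `table` (e.g. "/proc/self/mounts").
 * Returns 1 if found, 0 if not, -1 if the table cannot be opened. */
int is_mount_point(const char *table, const char *dir) {
  FILE *fp = setmntent(table, "r");
  if (fp == NULL)
    return -1;
  int found = 0;
  struct mntent *entry;
  while ((entry = getmntent(fp)) != NULL) {
    if (strcmp(entry->mnt_dir, dir) == 0) {
      found = 1;
      break;
    }
  }
  endmntent(fp);
  return found;
}
```

In the session above, is_mount_point("/proc/self/mounts", "/tmp") would be expected to return 1 inside the newns shell once the bind mount is done.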

As bind mounts also work on files, you can override some files. The following will run dash instead of bash (in subprocesses, too, obviously):

$ ./newns sh -c "mount -n --bind /bin/dash /bin/bash; /bin/bash"

Back to the idea of having encrypted volumes only available to some processes, the following should work (unverified):

$ ./newns sh -c "encfs /tmp/crypt-raw /tmp/crypt; /bin/bash"

Only the opened bash, and its subprocesses, would have access to /tmp/crypt.

To let normal users fiddle with namespaces, the newns tool used above could be turned into a setuid root program that drops its privileges right after unshare(2), going back to those of the calling user.

As you can see, per-process namespaces have a wide range of possible uses; it's astonishing that they are not used more, considering their age.

In addition to per-process namespaces, there are a bunch of other (more recent) features that allow implementing vserver-like functionality with a vanilla kernel, such as network namespaces (still a work in progress, though), PID namespaces and utsname (see uname(2)) namespaces. Actually, these features are designed to be used by vserver and openvz.
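
The privilege-dropping step could be sketched as follows. This is an untested outline of the idea, not a reviewed setuid program, and drop_privileges is a hypothetical name:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: give up the privileges gained through the
 * setuid bit and go back to the real IDs of the calling user.
 * Returns 0 on success, -1 on failure. */
int drop_privileges(void) {
  /* The group has to be dropped first: once the UID is no longer
   * root, setgid() may not be permitted any more. */
  if (setgid(getgid()) == -1 || setuid(getuid()) == -1) {
    perror("drop privileges");
    return -1;
  }
  /* Sanity check: an ordinary user must not be able to become root
   * again after the drop. */
  if (getuid() != 0 && setuid(0) == 0)
    return -1;
  return 0;
}
```

In newns, the call would sit between the unshare step and the exec, and the program would have to exit if it returns -1.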

I am looking forward to having unprivileged mounts implemented, so that users could fool around with bind mounts. Unprivileged namespaces would be a nice addition.

2008-12-12 23:06:33+0900


4 Responses to “Per-process namespaces”

  1. Per-process Namespaces - pam-namespace | etbe - Russell Coker Says:

    […] Per-process Namespaces – pam-namespace Mike writes about his work in using namespaces on Linux [1]. In 2006 I presented a paper titled “Polyinstantiation of directories in an SE Linux system” about this at the SAGE-AU conference [2]. […]

  2. Ean Schuessler Says:

    Hmmm. This reminds me of Plan 9.

  3. priyanka Says:

    Hi there
    I tried to run your example about per-process namespaces given at http://glandium.org/blog/?p=217

    But when I ran it and mounted directories like you did, my mounts and modifications were reflected in other processes (for example when I checked from another terminal).
    Can you tell me the exact way to do it?
    I didn't find a place to comment on that blog, that's why I am writing here :)
    Please reply to my email address.
    If you can help me out… it will be a great help for me :)

  4. glandium Says:

    Were you root when running the newns program?