Efficient way to get files off the page cache

There's this great feature in modern operating systems called the page cache. Simply put, it keeps in memory what is normally stored on disk, which helps both read and write performance. While that's all nice for day-to-day use, it often gets in the way when one wants to track down performance issues with a "cold cache" (when the files you need to access are not in the page cache yet).

A commonly used command to flush the Linux page cache is the following:

# echo 3 > /proc/sys/vm/drop_caches

Unfortunately, its effect is broad, and it flushes the whole page cache. When working on cold startup performance of an application like Firefox, what you really want is to have your page cache in a state close to what it was before you started the application.

One way to get in a better position than flushing the entire page cache is to reboot: the page cache will be filled with system and desktop environment libraries during the boot process, making the application startup closer to what you want. But it takes time. A whole lot of it.

In one of my "what if" moments, I wondered what happens to the page cache when using posix_fadvise with the POSIX_FADV_DONTNEED hint. Guess what? It actually reliably flushes the page cache for the given range in the given file. At least it does so with Debian Squeeze's Linux kernel. Provided you have a list of files your application loads that aren't already in the page cache, you can flush these files and only these from the page cache.

The following source code compiles to a tool to which you give a list of files as arguments, and that flushes these files:

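#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
int main(int argc, char *argv[]) {
  int i, fd, ret = 0;
  struct stat st;
  for (i = 1; i < argc; i++) {
    /* Read-only access is enough to advise on the file's cached pages. */
    if ((fd = open(argv[i], O_RDONLY)) == -1) {
      perror(argv[i]);
      ret = 1;
      continue;
    }
    /* Tell the kernel the whole file, [0, st_size), is no longer needed. */
    if (fstat(fd, &st) == -1 ||
        posix_fadvise(fd, 0, st.st_size, POSIX_FADV_DONTNEED) != 0) {
      fprintf(stderr, "%s: could not flush\n", argv[i]);
      ret = 1;
    }
    close(fd);
  }
  /* Non-zero exit status if any file could not be handled. */
  return ret;
}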
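
One caveat worth noting: the Linux posix_fadvise(2) man page says that pages which have not yet been written out are unaffected by POSIX_FADV_DONTNEED. For files that were just written (freshly copied or installed, say), a variant that syncs before advising may behave better. Here is a minimal sketch, assuming Linux; the helper name drop_file_cache is made up for illustration, and the sync is best effort:

```c
#define _POSIX_C_SOURCE 200809L /* for posix_fadvise() and fdatasync() */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write back, then drop, the cached pages of `path`.
 * Returns 0 on success, -1 if the file cannot be opened, or the error
 * number returned by posix_fadvise(). */
int drop_file_cache(const char *path) {
  int fd = open(path, O_RDONLY);
  int err;
  if (fd == -1)
    return -1;
  /* Dirty pages are not dropped by POSIX_FADV_DONTNEED, so try to write
   * them back first. Best effort: the result is deliberately ignored. */
  (void)fdatasync(fd);
  /* A length of 0 means "from the offset up to the end of the file". */
  err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
  close(fd);
  return err;
}
```

Calling drop_file_cache(argv[i]) in a loop over the arguments gives the same command-line behaviour as the tool above.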

Being able to flush files this way is actually a pretty scary feature, especially in multi-user environments, because any user can flush any file she can open, repeatedly, possibly hurting system performance. By the way, on systems using lxc (and maybe other containers, I don't know), running the echo 3 > /proc/sys/vm/drop_caches command from a root shell in a container does flush the host page cache, which could be dangerous for VPS hosting services using these solutions.

Update: I have to revise my judgment. It appears posix_fadvise(,,,POSIX_FADV_DONTNEED) doesn't flush (parts of) files that are still in use by other processes, which still makes it very useful for my use case, but also makes it less dangerous than I thought. The drop_caches problem is still real with lxc, though.
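
Since some pages may survive the flush, it can help to check how much of a file is still cached. One possible approach on Linux is to map the file and ask mincore(2) which pages are resident. The sketch below is only that, a sketch: the helper name resident_pages is hypothetical, and recent kernels reportedly only give meaningful answers for files the caller owns or could open for writing (otherwise every page may be reported as resident).

```c
#define _DEFAULT_SOURCE 1 /* for mincore() on glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns how many pages of `path` are resident in
 * the page cache, or -1 on error. The file's page count goes in *total. */
long resident_pages(const char *path, long *total) {
  long resident = -1;
  long page = sysconf(_SC_PAGESIZE);
  struct stat st;
  int fd = open(path, O_RDONLY);
  *total = 0;
  if (fd == -1)
    return -1;
  if (fstat(fd, &st) == 0) {
    if (st.st_size == 0) {
      resident = 0; /* empty file: nothing to map */
    } else {
      size_t len = (size_t)st.st_size;
      size_t npages = (len + (size_t)page - 1) / (size_t)page;
      /* Mapping alone does not read the file into the cache. */
      void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
      unsigned char *vec = malloc(npages);
      if (map != MAP_FAILED && vec != NULL && mincore(map, len, vec) == 0) {
        size_t i;
        resident = 0;
        /* The low bit of each byte is set when that page is resident. */
        for (i = 0; i < npages; i++)
          resident += vec[i] & 1;
        *total = (long)npages;
      }
      free(vec);
      if (map != MAP_FAILED)
        munmap(map, len);
    }
  }
  close(fd);
  return resident;
}
```

Comparing the result before and after running the flush tool on a file gives a rough idea of how many pages were actually dropped.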

2010-12-29 19:34:37+0900

miscellaneous, p.d.o, p.m.o


11 Responses to “Efficient way to get files off the page cache”

  1. Csillag Tamas Says:

    The cache drop thing does not work with OpenVZ.

  2. foo Says:

    That sounds like a security issue (DoS), please get a CVE.

  3. voracity Says:

    I’ve always wondered why you can’t just copy the application files and start from the freshly copied files to get something like a cold start. Is there a problem with that kind of approach?

    (NB: I have no idea how the file caches work — maybe it’s hash-based or something? Changing the hash wouldn’t be hard though. And I assume file dependencies outside the app would still be cached, but I don’t think that’s a big deal in Firefox’s case, is it?)

  4. Andrew Pollock Says:

    Interesting article. How do you tell what is or isn’t in the page cache?

  5. glandium Says:

    voracity: because when you copy the files, they get in the page cache.

    Andrew Pollock: that, I’d love to know. AFAIK, there is no easy way to do so.

  6. voracity Says:

    @glandium: I naively thought only reads caused blocks to get copied into the page cache (not writes), but I’m now the wiser. I’m surprised there are no low-level disk write commands that avoid going through the cache.

    Does the page cache survive unmounting/remounting? (Possibly with different mount params or mountpoints?)

  7. glandium Says:

    > I’m surprised there’s no low-level disk write commands that avoid going through the cache.

    There are. Using the O_DIRECT flag when opening a file, for example. Copying the files has a side effect, though: it changes the layout on disk, possibly making comparisons harder.

    > Does the page cache survive unmounting/remounting?

    Usually not, but how do you unmount/remount / or /usr without stopping your desktop environment?

  8. Neil Rashbrook Says:

    [Typing source of HTML source sucks.]

    Your article contains the following invalid HTML:

    </pre>

    <p></code></p></blockquote>

    The </code> never makes it to Planet making the rest of the page monospace :-(

  9. glandium Says:

    Neil: fixed

  10. voracity Says:

    Why would you have to unmount/remount / or /usr? I was imagining creating a special partition just for Firefox in a portable format, and unmounting/remounting that. It sounds like you’ve got a pretty good method for cold starts on Linux anyway. I guess I was thinking something like this might work for Windows, where, if I remember correctly, cold starts were more difficult.

  11. glandium Says:

    voracity: because there also are system files involved in Firefox startup.
