
With a little help from the kernel

[ Disclaimer: simplified, high-level view ahead. ]

When a program reads or writes data at a given virtual address, it uses instructions telling the CPU to do so. When the CPU doesn’t know the address, it faults. When it knows the address, but its access rights don’t allow the read or write operation the program wanted, it faults, too. Operating systems trap these faults, and the kernel handles them, allowing the program to continue.

As a virtual address can point to a range of different things, the kernel keeps track of which address ranges are backed by what. The most typical backing is physical memory: a given virtual address corresponds to a given physical RAM address.

Other typical backings include zero-memory (memory full of zeroes), copy-on-write, file-backed mappings (mmap with a file descriptor), etc. Or a combination of those.

When a file is mapped into memory by a program, the program may access data from that file through “standard” reads/writes to memory, and the kernel does its job of getting the data from disk, putting it in physical memory, and telling the CPU to look there.
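To make the file-backed case concrete, here is a minimal sketch in C, assuming a POSIX system. The helper name first_byte_via_mmap is made up for this illustration; only open, fstat, mmap and munmap are real calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns the first byte of `path`, read through a
 * file-backed mapping, or -1 on error (including an empty file). */
int first_byte_via_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    /* Setting up the mapping does not require reading the file yet;
     * the kernel records what backs this address range. */
    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    /* A plain memory read. If the page is not in physical memory yet,
     * the access faults and the kernel brings it in from the file. */
    int byte = *(volatile unsigned char *)p;

    munmap(p, (size_t)st.st_size);
    return byte;
}
```

Calling first_byte_via_mmap on a Zip archive would be expected to return 'P', the first byte of the usual "PK" signature, without the program ever calling read().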

When physical memory runs short, the kernel may choose to throw away anything that it can get back in physical memory later, like file-backed mappings, which can be read again from disk when needed. Another strategy is to move parts of physical memory to disk. This is “swapping” or “paging”.

Anyways. When faulty.lib loads a library from a Zip archive, it reserves (shared) memory for the uncompressed library, and marks it as non-readable and non-writable. When code or data from the library is accessed, the kernel handles the CPU fault, and ends up sending a segmentation fault signal (SIGSEGV) to the process. The process handles the signal, fills the memory buffer with the parts of the uncompressed library that are needed, and flags them with the appropriate access rights. On further accesses to the same location, the already uncompressed data is accessed directly.
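The mechanism can be sketched in a few lines of C. This is not faulty.lib's code, only an illustration under assumptions: it targets Linux, uses private anonymous memory instead of shared memory, and replaces decompression with a memset that fills each page with its index plus one. All the names (lazy_region_init, lazy_read, lazy_fill_count, on_segv) are invented for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *region;            /* start of the reserved buffer */
static size_t region_size;
static size_t page_size;
static volatile sig_atomic_t fill_count; /* number of pages filled so far */

/* SIGSEGV handler: fill the faulting page, then let the access retry.
 * mprotect() is not on the POSIX async-signal-safe list; this sketch
 * relies on it being a plain system call on Linux. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)region;

    if (addr < base || addr >= base + region_size) {
        /* Not our region: restore the default action and return, so the
         * retried access terminates the process as usual. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }

    size_t index = (addr - base) / page_size;
    unsigned char *page = region + index * page_size;

    mprotect(page, page_size, PROT_READ | PROT_WRITE);
    memset(page, (int)(index + 1), page_size); /* stand-in for decompression */
    mprotect(page, page_size, PROT_READ);      /* final access rights */
    fill_count++;
}

/* Reserve `pages` inaccessible pages and install the handler. */
int lazy_region_init(size_t pages)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    page_size = (size_t)ps;
    region_size = pages * page_size;

    region = mmap(NULL, region_size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Read one byte; the first access to each page goes through on_segv. */
unsigned char lazy_read(size_t offset)
{
    return *(volatile unsigned char *)(region + offset);
}

int lazy_fill_count(void)
{
    return (int)fill_count;
}
```

After lazy_region_init(4), a first lazy_read(0) faults, runs the handler once and returns the filled value. A second read from the same page does not fault at all, which is the "accessed directly" part.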

The downside of this approach is that besides paging/swapping, there is no way to get rid of the unused parts in case of memory pressure. And since Android devices don’t do paging/swapping, it’s effectively wasted memory.

The facility we’re using on Android for that shared memory, ashmem (currently in staging for mainline kernel), has a mechanism that could almost help us: a program can “unpin” ashmem ranges, indicating to the kernel memory regions it is allowed to throw away when it is under memory pressure. Further accesses to memory that the kernel threw away are like accesses to anonymous memory for the first time: zeroed-out.
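The zeroed-out behaviour can be observed without ashmem. The sketch below is only an analogy, assuming Linux: it uses madvise() with MADV_DONTNEED on private anonymous memory, which discards the pages immediately rather than waiting for memory pressure as ashmem unpinning does. The helper name byte_after_discard is made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: fill one page with `fill`, tell the kernel it may
 * discard it, and return the first byte read back afterwards (-1 on error). */
int byte_after_discard(unsigned char fill)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t len = (size_t)ps;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, fill, len);

    /* On Linux, the next access to private anonymous memory after
     * MADV_DONTNEED gets a fresh zero-filled page. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    int byte = *(volatile unsigned char *)p;
    munmap(p, len);
    return byte;
}
```

On Linux, byte_after_discard(0xAB) returns 0: the program gets no fault and no signal, just zeroes where its data used to be.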

If the program does NULL checks, it can figure out whether the kernel may have thrown data away. But in faulty.lib’s case, that’s not quite possible. Any part of the code in a library may directly jump into a region that the kernel freed, and the resulting zeroed-out memory will just be executed instead of being filled.

So, in faulty.lib’s case, it would be interesting if the kernel had a special backing for such userspace-filled memory regions, where it would consider throwing them away like it does for “unpinned” ashmem. Afterwards, accesses to these memory regions would trigger some signal for the program to fill the memory again.

The current proposal, now part of a plumber’s wishlist thanks to Lennart Poettering, involves a new flag for madvise() and would make the kernel send a SIGBUS signal to a process when memory is accessed after the kernel has thrown it away. This proposal has received some interest from Andi Kleen.

And it would be useful for more than just faulty.lib: application caches for images, network data, etc. (although ashmem fulfills that need to some extent), JIT code, live decompression of content other than libraries, you name it.

2012-05-14 18:21:42+0900
