# Archive for November, 2012

## Hooking the memory allocator in Firefox

Supplanting the system memory allocator usually involves some tricks. In cross-platform software like Firefox, this involves different tricks on different platforms. Firefox uses such tricks to implant jemalloc. Sadly, this makes replacing jemalloc itself even trickier.

For instance, trace-malloc, our leak detection tool, used on debug builds, requires that jemalloc is disabled.

Work is under way to make supplanting jemalloc much easier. It is not yet clear if this will be enabled by default on release builds, but it would make sense to enable the feature at least on nightlies.

What does the feature provide? A way to hook or replace jemalloc in Firefox at startup time (as opposed to build time, like trace-malloc). The idea is to build a specialized library (more on that further below) and make Firefox use it instead of, or on top of, jemalloc, with some weak linking tricks. To enable the feature, pass --enable-replace-malloc to configure or add ac_add_options --enable-replace-malloc to your mozconfig (provided you applied the patches or got a tree where the patches have landed).

With the feature built, you can start Firefox with a malloc replacement library easily:

• On GNU/Linux:
$ LD_PRELOAD=/path/to/library.so firefox
• On OSX:
$ DYLD_INSERT_LIBRARIES=/path/to/library.dylib firefox
• On Windows:
$ MOZ_REPLACE_MALLOC_LIB=drive:\path\to\library.dll firefox

As I happen to have built Firefox with the feature enabled for all platforms on try, to validate that it works, you can toy around with these builds.

A replacement library is expected to provide the following functions, or any subset:

• void replace_init(const malloc_table_t *table)
• void *replace_malloc(size_t size)
• int replace_posix_memalign(void **ptr, size_t alignment, size_t size)
• void *replace_aligned_alloc(size_t alignment, size_t size)
• void *replace_calloc(size_t num, size_t size)
• void *replace_realloc(void *ptr, size_t size)
• void replace_free(void *ptr)
• void *replace_memalign(size_t alignment, size_t size)
• void *replace_valloc(size_t size)
• size_t replace_malloc_usable_size(usable_ptr_t ptr)
• size_t replace_malloc_good_size(size_t size)
• void replace_jemalloc_stats(jemalloc_stats_t *stats)
• void replace_jemalloc_purge_freed_pages()
• void replace_jemalloc_free_dirty_pages()

The first function, replace_init, is the first function from the library that will be called (if it exists), before the first call to any other. It is passed a pointer to a function table containing pointers to the corresponding jemalloc functions from Firefox.

The last three functions are specific to jemalloc. jemalloc_stats is only important to replace if you want about:memory to still be accurate according to anything you’ve done in other functions, and jemalloc_purge_freed_pages and jemalloc_free_dirty_pages are used to force the allocator to return some unused memory to the system.

The other functions are the usual suspects, picked from C89, POSIX, C11, or OSX (malloc_good_size). They should however all be considered cross-platform (especially malloc_good_size).

All these functions, when they exist, are called instead of the corresponding jemalloc functions, which makes it the responsibility of the replacing functions to call back the corresponding jemalloc function if necessary.
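To make the "call back jemalloc" contract concrete, here is a minimal sketch of a replacement library that stores the table received in replace_init, forwards to it, and injects allocation failures along the way. It is not built against the real replace_malloc.h: my_malloc_table_t (with only malloc and free members) is a hypothetical stand-in for malloc_table_t, and fail_every is a made-up knob.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for malloc_table_t from replace_malloc.h, which
 * has one member per replaceable function. */
typedef struct {
  void *(*malloc)(size_t size);
  void (*free)(void *ptr);
} my_malloc_table_t;

/* jemalloc's functions, as handed over by replace_init. */
static const my_malloc_table_t *funcs = NULL;
static unsigned long calls = 0;
/* Hypothetical knob: fail every fail_every-th allocation (0 disables). */
static unsigned long fail_every = 0;

/* Called before any other replace_* function, so funcs is set by then. */
void replace_init(const my_malloc_table_t *table) {
  funcs = table;
}

void *replace_malloc(size_t size) {
  calls++; /* not thread-safe; fine for a sketch */
  if (fail_every && calls % fail_every == 0)
    return NULL; /* simulate an out-of-memory condition */
  return funcs->malloc(size); /* otherwise defer to jemalloc */
}

void replace_free(void *ptr) {
  funcs->free(ptr); /* nothing to add: just forward */
}
```

Only replace_malloc injects failures here to keep the sketch short; a fuzzing library would presumably treat calloc, realloc and the aligned variants the same way.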
This makes it possible, for example, to:

• Replace jemalloc entirely. The third patch in bug 804303 does that to allow replacing the (currently default) old fork of jemalloc with a fresh jemalloc. Something similar could be done to test other allocators, like tcmalloc.
• Make memory allocation functions randomly return NULL, as in Out of Memory conditions, aka fuzzing.
• Make all allocations bigger to add tracing data.
• Log allocations.
• etc.

### A small implementation example

Consider the following question: how many times does realloc end up copying data? Stated differently, how many times does realloc not return the pointer it was given?

Create the memory/replace/realloc/realloc.c file with the following content:

// This header will declare all the replacement functions, such that you don't need
// to worry about exporting them with the right idiom (dllexport, visibility...)
#include "replace_malloc.h"
#include <stdlib.h>
#include <stdio.h>

static const malloc_table_t *funcs = NULL;
static unsigned int total = 0, copies = 0;

static void print_stats(void) {
  printf("%u reallocs, %u copies\n", total, copies);
}

void replace_init(const malloc_table_t *table) {
  funcs = table;
  atexit(print_stats);
}

void *replace_realloc(void *ptr, size_t size) {
  void *newptr = funcs->realloc(ptr, size);
  // Not thread-safe, but it's only an example.
  total++;
  // We don't want to count deallocations as copies.
  if (newptr && newptr != ptr)
    copies++;
  return newptr;
}

Add a memory/replace/realloc/Makefile.in file:

DEPTH = @DEPTH@
topsrcdir = @top_srcdir@
srcdir = @srcdir@
VPATH = @srcdir@

include $(DEPTH)/config/autoconf.mk

LIBRARY_NAME = replace_realloc
FORCE_SHARED_LIB = 1
NO_DIST_INSTALL = 1

CSRCS = realloc.c

MOZ_GLUE_LDFLAGS = # Don't link against mozglue
WRAP_LDFLAGS = # Never wrap malloc function calls with -Wl,--wrap

include $(topsrcdir)/config/rules.mk

Add the following to memory/replace/Makefile.in:

DIRS += realloc

Finally, build objdir/memory/replace. You’ll get a library in objdir/memory/replace/realloc that you can use as described at the beginning of this post. On my system, after starting and quitting Firefox without doing much, it prints:

41078 reallocs, 37197 copies

It sure is a simple example, and one that can actually be fulfilled with other tools (like dtrace), but it’s now up to you, developers, to come up with more useful uses. The blocked bugs already show some.

Note this facility still has the advantage of being more cross-platform than tools like dtrace, and of working happily on top of jemalloc (valgrind, for instance, doesn’t support that gracefully), which can be important when looking at some particular aspects of memory allocation. The above example, while simple, is a typical case where the underlying memory allocation library has an impact on the result: other memory allocation libraries have different size classes, which changes how often realloc will need to actually reallocate, as opposed to growing the existing allocation in place.

2012-11-27 13:49:15+0900

## Debian EFI mode boot on a Macbook Pro, without rEFIt

Diego’s post got me to switch from grub-pc to grub-efi to boot Debian on my Macbook Pro. But I wanted to go further: getting rid of rEFIt.

rEFIt is a pretty useful piece of software, but it’s essentially dead. There is the rEFInd fork, which keeps it up-to-date, but it doesn’t really help with FileVault. Moreover, the boot sequence for a Linux distro with rEFIt/rEFInd looks like: Apple EFI firmware → rEFIt/rEFInd → GRUB → Linux kernel. Each intermediate step adds its own timeout, so rEFIt/rEFInd can be seen as a not-so-useful intermediate step.

Thankfully, Matthew Garrett did all the research needed to boot GRUB directly from the Apple EFI firmware. Unfortunately, his blog post didn’t have much actual detail on how to do it.
So here it is, for a Debian system:

• Install a few packages you’ll need in this process:
# apt-get install hfsprogs icnsutils
• Create a small HFS+ partition. I have a 9MB one, but it’s only filled by about 500K, so even smaller should work too. If, like me, you were previously using grub-pc, you probably have a GRUB partition, which you can repurpose. In gdisk, it looks like this:

Number  Start (sector)    End (sector)  Size       Code  Name
   5       235284480       235302943   9.0 MiB     AF00  Apple HFS/HFS+

Partition GUID code: 48465300-0000-11AA-AA11-00306543ECAC (Apple HFS/HFS+)
Partition unique GUID: AD1F5465-B777-4178-AC4D-1DE8B2EB1B4B
First sector: 235284480 (at 112.2 GiB)
Last sector: 235302943 (at 112.2 GiB)
Partition size: 18464 sectors (9.0 MiB)
Attribute flags: 0000000000000000
Partition name: 'Apple HFS/HFS+'

• Create an HFS+ filesystem on that partition (replace /dev/sda5 with whatever your partition is):
# mkfs.hfsplus /dev/sda5 -v Debian
• Add an fstab entry for that filesystem:
# echo $(blkid -o export -s UUID /dev/sda5) /boot/efi auto defaults 0 0 >> /etc/fstab
• Mount the filesystem:
# mkdir /boot/efi
# mount /boot/efi

• Edit /usr/sbin/grub-install, look for « xfat », and remove the block of code that looks like:
if test "x$efi_fs" = xfat; then :; else
    echo "${efidir} doesn't look like an EFI partition." 1>&2
    efidir=
fi

• Run grub-install. At this point, there should be a /boot/efi/EFI/debian/grubx64.efi file (if using grub-efi-amd64).
• Create a /boot/efi/System/Library/CoreServices directory, and hard-link the GRUB EFI image in it as boot.efi:
# mkdir -p /boot/efi/System/Library/CoreServices
# ln /boot/efi/EFI/debian/grubx64.efi /boot/efi/System/Library/CoreServices/boot.efi
• Create a dummy mach_kernel file:
# echo "This file is required for booting" > /boot/efi/mach_kernel
• Grab the mactel-boot source code, unpack and build it:
# wget http://www.codon.org.uk/~mjg59/mactel-boot/mactel-boot-0.9.tar.bz2
# tar -jxf mactel-boot-0.9.tar.bz2
# cd mactel-boot-0.9
# make PRODUCTVERSION=Debian

• Copy the SystemVersion.plist file:
# cp SystemVersion.plist /boot/efi/System/Library/CoreServices/
• Bless the boot file:
# ./hfs-bless /boot/efi/System/Library/CoreServices/boot.efi
• Give the volume a Debian icon (rsvg-convert is provided by the librsvg2-bin package):
# rsvg-convert -w 128 -h 128 -o /tmp/debian.png /usr/share/reportbug/debian-swirl.svg
# png2icns /boot/efi/.VolumeIcon.icns /tmp/debian.png
# rm /tmp/debian.png


Now, the Apple Boot Manager, shown when holding down the option key when booting the Macbook Pro, looks like this:

And the Startup disk preferences dialog under OSX, like this:

2012-11-18 11:18:14+0900

## Fun with weak dynamic linking

Dynamic linkers, at least in the UNIX world, usually allow loading libraries into a process address space at startup. On Linux systems, you load such a library with LD_PRELOAD. On OSX, with DYLD_INSERT_LIBRARIES.

On Linux systems, when using LD_PRELOAD, the dynamic linker will also use symbols from the (pre)loaded library instead of system libraries. For example, if a program calls the write function and a library exporting a write symbol is loaded with LD_PRELOAD, the write function from the loaded library will be used instead of that of the libc (even when the symbol version doesn’t match).

On OSX, symbol resolution is usually done with a “two-level namespace”: symbols are associated with library names, and when resolving symbols, both are used. More than that, the library name is used to find the library in the dyld search path. Libraries loaded with DYLD_INSERT_LIBRARIES thus won’t be used to resolve symbols bound with a two-level namespace, which makes DYLD_INSERT_LIBRARIES less useful than LD_PRELOAD. Fortunately, it is also possible to use a “flat” namespace, in which case only the symbol name is considered during symbol resolution, and is searched in all loaded libraries, in the order in which they were loaded. Flat namespace can be triggered by setting the DYLD_FORCE_FLAT_NAMESPACE environment variable, linking the main program with -force_flat_namespace, or linking programs and libraries with the -flat_namespace argument. Note the latter only affects the programs and libraries built with that argument, while the former two force a flat namespace for all libraries, including those which, like system ones, were built with a two-level namespace. There are also cases where a single symbol may be resolved with the flat namespace, while others in the program or library are using two-level namespace.

Weak dynamic linking is another feature that can be used to tell the dynamic linker to ignore missing symbols. Consider the following source code:

extern void foo() __attribute__((weak)); // weak_import is preferred on OSX.
int main() {
  if (foo)
    foo();
  return 0;
}


On Linux systems, this just works. Compile the program (you’ll need to build it with -fPIC, though), start it, and it will do nothing, since foo is defined nowhere.

Combined with shared library (pre)loading, this can be used to provide simple hooks in your application. For example, with the following source code built as a shared library:

#include <stdio.h>
void foo() {
  printf("foo\n");
}


When running the program again with LD_PRELOAD set to load that shared library (note LD_PRELOAD=foo.so won’t work, a path is needed, like LD_PRELOAD=./foo.so), it prints foo, because the dynamic linker will have resolved the foo symbol to that of the shared library.
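The same idea scales to several optional hooks, each with a fallback when no preloaded library defines it. This is a minimal sketch, assuming GCC or Clang on Linux; hook_filter, hook_notify and process are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical optional hooks: weak references whose address stays NULL
 * unless a library loaded with LD_PRELOAD defines them. */
extern int hook_filter(int value) __attribute__((weak));
extern void hook_notify(const char *event) __attribute__((weak));

int process(int value) {
  /* Check each hook before calling it: a missing one would mean jumping to NULL. */
  if (hook_notify)
    hook_notify("process");
  /* Without a filter hook, fall back to returning the value unchanged. */
  return hook_filter ? hook_filter(value) : value;
}
```

Built with -fPIC as noted above, process(42) returns 42 when run alone; with a preloaded library defining hook_filter, it would return whatever that hook computes.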

On OSX, unfortunately, things are not as easy. First, building the test program above fails:

Undefined symbols for architecture x86_64:
"_foo", referenced from:
_main in test-LIeVtB.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)


There are linker options to force undefined symbols to be resolved at runtime:

• -undefined dynamic_lookup, which will mark all undefined symbols as having to be looked up at runtime,
• -Wl,-U,symbol_name, which only does so for the given symbol (note: you have to prepend an underscore to the symbol name)

With one of these options, the program works as on Linux: it does nothing when run alone, and prints foo when loading the library with DYLD_INSERT_LIBRARIES… if you build with Xcode 4.5. Running the program built with Xcode 4.3 fails when not loading the shared library, with the following error:

dyld: Symbol not found: _foo
Referenced from: ./test
Expected in: flat namespace
in ./test
Trace/BPT trap: 5


And if, at build time, you target OSX 10.5 (with MACOSX_DEPLOYMENT_TARGET or -mmacosx-version-min), the error is slightly different:

dyld: Symbol not found: _foo
Referenced from: ./test
Expected in: dynamic lookup

Trace/BPT trap: 5


Each error is due to a different bug:

• Since OSX 10.6, the link edition rules in the __LINKEDIT segment are in a new, compressed format: DYLD_INFO. The linker in Xcode < 4.5 forgets to flag weak imports as weak in the DYLD_INFO data. Compare the output of dyldinfo for a binary built with Xcode 4.5 with the output for a binary built with Xcode 4.3:
$ dyldinfo -bind test-xcode4.5 | sed -n '2p;/foo/p'
segment section          address        type    addend dylib            symbol
__DATA  __got            0x100001038    pointer      0 flat-namespace   _foo (weak import)
$ dyldinfo -bind test-xcode4.3 | sed -n '2p;/foo/p'
segment section          address        type    addend dylib            symbol
__DATA  __got            0x100001038    pointer      0 flat-namespace   _foo

Notice the missing weak import. Manually setting the flag with a hexadecimal editor fixes it (in both bind and lazy_bind tables).
• In older OSX releases, the link edition rules use a “classic” format. In that format, the symbol table is used for flags such as N_WEAK_REF (weak reference). In the new DYLD_INFO format, the N_WEAK_REF flag isn’t used, which partly explains the previous bug. When using the old format and two-level namespaces, missing weak symbols are errors. With flat namespace, they aren’t. This means the error we get with a program built for a 10.5 target goes away if building with -flat_namespace. Note this doesn’t work at the symbol level: if the library is built for two-level namespace (and is marked as such in the Mach-O header), and the weak symbol is without a corresponding library name, making it resolved with a flat namespace, it doesn’t work.

At this point, one could think weak dynamic linking is pretty much useless on OSX, or at least that it was before Xcode 4.5 was released. As it turns out, there are other use cases where it actually works. Consider the following code:

#include <malloc/malloc.h>
// The following is declared in malloc/malloc.h:
// extern size_t malloc_zone_pressure_relief(malloc_zone_t *zone, size_t goal) __attribute__((weak_import));
int main() {
  if (malloc_zone_pressure_relief)
    malloc_zone_pressure_relief(NULL, 0);
  return 0;
}


This program in itself is useless, but the point is, the malloc_zone_pressure_relief function is only available since OSX 10.7. Without the weak_import, the program would fail to start with an undefined symbol on OSX 10.6 and below. Without the if, it would crash because the symbol would resolve to NULL, and the program would jump there. But the program itself has to be built on OSX 10.7 at least, for the malloc_zone_pressure_relief function to be there when building.
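For comparison, here is a rough Linux analog of this availability check, assuming glibc’s malloc_trim, which likewise asks the allocator to give free memory back to the system. Declaring it weak ourselves means the program still links and starts against a C library that may not provide it; release_memory is a hypothetical helper name.

```c
#include <stddef.h>

/* Same prototype as glibc's <malloc.h>, declared weak so that a missing
 * definition leaves the address NULL instead of failing at link/load time. */
extern int malloc_trim(size_t pad) __attribute__((weak));

/* Returns 1 if memory was released to the system, 0 otherwise
 * (including when malloc_trim is not available at all). */
int release_memory(void) {
  if (malloc_trim)
    return malloc_trim(0);
  return 0;
}
```

As with malloc_zone_pressure_relief, the if is what keeps the program from jumping to NULL when the function is absent.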

And in that use-case, we end up with the right flags in DYLD_INFO:

$ dyldinfo -bind test | sed -n '2p;/malloc/p'
segment section          address        type    addend dylib            symbol
__DATA  __got            0x100001038    pointer      0 libSystem        _malloc_zone_pressure_relief (weak import)

This actually gives us a hint for a way out of our misery for our foo function on Xcode < 4.5: linking against a dummy library implementing the weak symbol. When doing so, we get the proper flag in DYLD_INFO, like with malloc_zone_pressure_relief:

$ dyldinfo -bind test | sed -n '2p;/foo/p'
segment section          address        type    addend dylib            symbol
__DATA  __got            0x100001038    pointer      0 libfoo           _foo (weak import)


And the linker additionally does something nice: when all symbols needed from a library are weak references, it marks the library import itself as weak:

$ otool -l test | grep -B 2 libfoo
          cmd LC_LOAD_WEAK_DYLIB
      cmdsize 40
         name libfoo.dylib (offset 24)


What this means is that even if libfoo.dylib is missing at runtime, the program will still work. The downside is that we are now using two-level namespace, which, as mentioned above, makes DYLD_INSERT_LIBRARIES useless. The program thus needs to be built with a flat namespace.

Update: It turns out some versions of Xcode don’t conveniently mark libraries with only weak imports as weakly linked, so the -Wl,-weak_library,libraryname.dylib flag is required instead of -llibraryname.
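The dummy library itself can be as small as an empty definition. This is a sketch, assuming the hook is the foo function from the test program above; dummy_foo.c and foo_hook.dylib are hypothetical file names, and the build commands in the comment simply combine the flags discussed in this post.

```c
/* dummy_foo.c (hypothetical name): never meant to do anything. It only gives
 * the static linker a definition of foo, so that the weak reference in the
 * test program is recorded as a weak import from libfoo.dylib.
 *
 * Possible build steps on OSX:
 *   cc -dynamiclib -o libfoo.dylib dummy_foo.c
 *   cc -o test test.c -Wl,-flat_namespace -Wl,-weak_library,libfoo.dylib
 *   DYLD_INSERT_LIBRARIES=./foo_hook.dylib ./test
 * where foo_hook.dylib (hypothetical) is a build of the printing foo library. */
void foo(void) {
  /* Intentionally empty: the real implementation comes from the library
   * injected with DYLD_INSERT_LIBRARIES. */
}
```

Since libfoo.dylib ends up weakly linked, it does not even need to be shipped alongside the program.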

In summary, if you want to add optional hooks that a library loaded through LD_PRELOAD/DYLD_INSERT_LIBRARIES can implement:

• on OSX < 10.6, use weak symbols and build with -flat_namespace,
• on OSX >= 10.6, when building with Xcode < 4.5, use weak symbols, build with -flat_namespace and link against a dummy library implementing the hook functions with -Wl,-weak_library,libraryname.dylib,
• on Linux systems and on OSX >= 10.6, when building with Xcode 4.5, simply use weak symbols,
• on Windows, to the best of my knowledge, weak dynamic linking is not supported.
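These rules could be folded into a small portability header. The sketch below is only an assumption of how that might look: OPTIONAL_HOOK, HAVE_OPTIONAL_HOOKS, on_startup_hook and run_startup_hook are hypothetical names, the OSX branch still needs the link flags listed above, and on compilers without weak symbol support (e.g. MSVC on Windows) the hook is simply never called.

```c
#include <stddef.h>

/* Hypothetical portability macros for declaring an optional hook. */
#if defined(__APPLE__)
#define OPTIONAL_HOOK __attribute__((weak_import))
#define HAVE_OPTIONAL_HOOKS 1
#elif defined(__GNUC__)
#define OPTIONAL_HOOK __attribute__((weak))
#define HAVE_OPTIONAL_HOOKS 1
#else
/* No weak dynamic linking: compile the hook out entirely. */
#define HAVE_OPTIONAL_HOOKS 0
#endif

#if HAVE_OPTIONAL_HOOKS
extern void on_startup_hook(void) OPTIONAL_HOOK;
#endif

/* Returns 1 if a preloaded library provided the hook and it ran, 0 otherwise. */
int run_startup_hook(void) {
#if HAVE_OPTIONAL_HOOKS
  if (on_startup_hook) {
    on_startup_hook();
    return 1;
  }
#endif
  return 0;
}
```

__APPLE__ is tested first because Clang on OSX also defines __GNUC__.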

Stay tuned for the next post, which will describe what this will be used for in Firefox.

2012-11-05 21:49:26+0900