Archive for the 'faulty.lib' Category

With a little help from the kernel

[ Disclaimer: simplified, high-level view ahead. ]

When a program reads or writes data at a given virtual address, it uses instructions telling the CPU to do so. When the CPU doesn't know the address, it faults. When it knows the address, but its access rights don't allow the read or write operation the program wanted, it faults, too. Operating systems do trap these faults, and the system kernel handles them, allowing the program to continue.

As a virtual address can point to a various range of different things, the kernel keeps track of what address ranges are backed by what. The most typical backing is physical memory: a given virtual address corresponds to a given physical RAM address.

Other typical backings include zero-memory (memory full of zero), copy-on-write, file-backed mappings (mmap with a file descriptor), etc. Or a combination of those.

When a file is mapped into memory by a program, the program may access data from that file through "standard" reads/write to memory, and the kernel does its job of getting the data from disk, putting it in physical memory, and telling the CPU to look there.

When physical memory becomes short for the demand, the kernel may choose to throw away anything that it can get back in physical memory later, like file-backed mappings, which can be read again from disk when needed. Another strategy is to move parts of physical memory to disk. This is "swapping" or "paging".

Anyways. When faulty.lib loads a library from a Zip archive, it reserves (shared) memory for the uncompressed library, and marks it as non-readable and non-writable. When code or data from the library is accessed, the kernel handles the CPU fault, and ends up throwing a segmentation fault signal (SIGSEGV) to the process. The process handles the signal, and fills the memory buffer with parts of the uncompressed library that are necessary, and flags them with the appropriate access rights. On further accesses to the same location, the already uncompressed data will be accessed directly.

The downside of this approach is that besides paging/swapping, there is no way to get rid of the unused parts in case of memory pressure. And since Android devices don't do paging/swapping, it's effectively wasted memory.

The facility we're using on Android for that shared memory, ashmem (currently in staging for mainline kernel), has a mechanism that could almost help us: a program can "unpin" ashmem ranges, indicating to the kernel memory regions it is allowed to throw away when it is under memory pressure. Further accesses to memory that the kernel threw away are like accesses to anonymous memory for the first time: zeroed-out.

If the program does NULL checks, it can figure whether the kernel may have thrown data away. But in faulty.lib’s case, that’s not quite possible. Any part of the code in a library may directly jump into a region that the kernel freed, and the resulting zeroed-out memory will just be executed instead of being filled.

So, in faulty.lib's case, it would be interesting if the kernel had a special backing for such userspace-filled memory regions, where it would consider throwing them away like it does for "unpinned" ashmem. Afterwards, accesses to these memory regions would trigger some signal for the program to fill the memory again.

The current proposal, now part of a plumber's wishlist thanks to Lennart Poettering, involves a new flag for madvise() and would make the kernel send a SIGBUS signal to a process when memory is accessed after the kernel has thrown it away. This proposal has received some interest from Andi Kleen.

And it would be useful for more than just faulty.lib: application caches (images, network, etc.) (although ashmem fulfills that need to some extent), JIT code, live decompression of content other than libraries, you name it.

2012-05-14 18:21:42+0900

faulty.lib | No Comments »

libgcc.a symbol visibility considered harmful

I recently got to rebuild an Android NDK with a fresh toolchain again, and hit an interesting problem. I actually had hit it before, but only this time I fully analyzed what's going on. [As a side note, if you build such a NDK, don't use mpfr 3.1.0, as there is a bug in the libtool it ships]

Linking an application or a library pulls many things, that aren't part of the code being built. One of these many things is the libgcc static library. Part of libgcc consists in an implementation of the platform ABI. On Android systems, this means the ARM EABI. GCC, when compiling some instructions, will generate ABI calls. For example, integer divisions may call __aeabi_idiv.

Consider the following minimized real world scenario:

$ echo "int foo(int a) { return 42 % a; }" > foo.c
$ arm-linux-androideabi-gcc -o libfoo.so -shared foo.c -mandroid

GCC will emit a call to __aeabi_idivmod for the % operation. With GCC 4.6.3, this function is in _divsi3.o under libgcc.a. That function itself calls __aeabi_idiv0, which lives in _dvmd_lnx.o under libgcc.a.

When statically linking, ld will thus include foo.o, _divsi3.o and _dvmd_lnx.o, meaning it will include all functions from these object files. That is, foo, __divsi3, __aeabi_idiv, __aeabi_idivmod, __aeabi_idiv0 and __aeabi_ldiv0. And more than being included, these functions are exported, because symbol visibility in libgcc.a is default. So while we expect exporting foo from our library, we're actually exporting much more, including functions that just happened to be near the ones that our code (indirectly) uses.

Now, let's say we want to build another library, using that foo function from libfoo:

$ cat > bar.c <<EOF
extern int foo(int a);
long long bar(long long a) { return foo(a) % a; }
EOF
$ arm-linux-androideabi-gcc -o libbar.so -shared bar.c -mandroid

(The code above has absolutely no meaning, it just triggers the same function calls as what I was getting in the actual real world case)

When statically linking the above code, GCC will generate a call to __aeabi_ldivmod, which calls __aeabi_ldiv0, and many other things, directly or indirectly. When linking as above, nothing particularly nasty is going to happen. However, linking as above is actually wrong: the resulting library has an undefined reference to the foo symbol, and doesn't depend on libfoo. At runtime, if libfoo wasn't already loaded somehow, loading libbar would fail.

The proper way to link is the following:

$ arm-linux-androideabi-gcc -o libbar.so -shared bar.c -mandroid -L. -lfoo

A feature of ELF static linking is that when it resolves undefined symbols, the linker will choose to use the first occurrence of a symbol it finds in the various objects and libraries given on its command line. So with the command line above, for each __aeabi_* symbol, it will first look in libfoo if there isn't one. And while __aeabi_ldivmod is not in libfoo, __aeabi_ldiv0 is (see above).

So instead of including the code for __aeabi_ldiv0 from libgcc.a, it will call the copy from libfoo.

This wouldn't be so much of a problem if __aeabi_ldiv0 wasn't a weak symbol.

Enters faulty.lib. In the real world case, libfoo is loaded by the system dynamic linker, and libbar by faulty.lib. When resolving symbols for libbar, faulty.lib has to resolve libfoo symbols with the system linker, using dlsym(). On Android, dlsym() returns NULL for weak (defined) symbols, so faulty.lib can't resolve __aeabi_ldiv0.

The real world case wasn't a problem with GCC 4.4.3 from the vanilla Android NDK because in that GCC version, __aeabi_ldivmod doesn't call __aeabi_ldiv0.

This wouldn't happen if shared libraries wouldn't expose random platform ABI specific bits depending on what they use and depending on other symbols that happen to be in the same object files.

A similar issue happened a little while ago on Debian powerpc because a shared library was exporting ABI specific bits. Even worse, the toolchain was assuming the symbols would come from libgcc.a and generated wrong relocations for these symbols.

Update: Interestingly, the __aeabi_* symbols are hidden, in libgcc.a as provided on the Debian armel port.

2012-03-06 17:19:34+0900

faulty.lib, p.d.o, p.m.o | 5 Comments »

faulty.lib vs. Firefox for Android

As mentioned in the previous post, Firefox for Android now has a fresh new linker that allows on-demand decompression. All the necessary code is currently in mozilla-central, but on-demand decompression is disabled by default. For good reason, see below.

Building with on-demand decompression only requires adding the following to your .mozconfig:

export MOZ_ENABLE_SZIP=1

This will enable on-demand decompression for libxul.so only.

With this, you can build Firefox for Android normally and ... not feel much of an improvement. The main reason is that the compressed libxul.so library is bigger (because of the chunk compression, see previous post), and read almost entirely. That, in turn, is due to how the code and data is laid out in the library.

Since the way this on-demand decompression of libraries works relies on chunks, each time you need even a few bytes at some not-yet decompressed location, you end up decompressing a complete chunk. And since the static linker (the one used when building libraries) is not smart enough to lay out everything in an order optimal for startup, we end up reading a lot of things at random locations.

There are essentially two phases of Firefox startup related to libraries that are important to us: loading/linking the library, and using it. The sooner the former finishes, the sooner the latter can start.

With a freshly built mozilla-central, this is what the map of decompressed chunks looks like after loading and linking the library:

Each red square is a decompressed chunk. In the above map, 348 chunks are decompressed, out of 998. The main reason for this happening is static initializers.

Then by the time Gecko has been initialized and a web page loaded and started painting, here is what the map looks like:

That is a lot of red: 715 decompressed chunks, out of 998.

Educating the static linker

As explained above, the static linker is not laying out code and data in an on-demand decompression friendly way. We've had various experiences with binary reordering in the past. This unfortunately currently involves some manual work, but we hope our release engineering team will be able to automate the process, and I sure hope we'll be able to make it more reliable and faster. I will detail further below the current state of the art.

If you're not interested in technical details, you can skip to the results.

The mozilla build system parts are almost there. One part of binary reordering involves building with the right flags. It so happens that we're currently not building the javascript engine with the necessary -ffunction-sections -fdata-sections flags. I hope to be able to land that patch later this week.

Another part involves educating the linker of the order in which we want it to put symbols. Unfortunately, neither GNU ld nor gold allow to order by symbol names. Gold allows to order by section with the --section-ordering-file argument (thus the -ffunction-sections -fdata-sections flags), and GNU ld... not quite. However, GNU ld has a very powerful scripting facility. And with a little bit of effort, one can reorder sections with it. So the remaining grunt work is to map symbols to sections.

It would be nice if that was enough, but it turns out it is not. I haven't tried with gold, but it turns out GNU ld, or at least the version shipped in the Android NDK (which, by the way, doesn't come with gold, which is mainly why I made the effort to support GNU ld linker scripts) fails to link when non-virtual thunks are too far away from the corresponding function, even when it's not far enough to grant such a failure (I think that's a known bug of older GNU ld versions). So I went around grouping such related functions together, which is a good idea anyway.

For those interested, the trick for GNU ld to reorder functions is to add -Wl,-T,filename on the link command, with the file filename containing the following:

SECTIONS {
  .text : {
    *(.text.symbol1)
    *(.text.symbol2)
(...)
  }
}
INSERT BEFORE .fini

Not all symbols need to be present in that file. The same can be done with other sections such as .data, provided INSERT BEFORE is adjusted, and several sections can be put in the same file.

The patch integrating all this should hopefully land soon (please note that it requires a backout of changeset 283408b8d8a3).

Profiling environment

Reordering symbols when linking requires that we know in which order symbols are accessed during startup. As far as I know there are no existing tool to do that besides valgrind. Julian Seward has been working on getting valgrind to work on Android devices, and it currently works well on the pandaboard, Nexus S and Xoom.

While a full featured valgrind requires debug info for some system libraries and a lot of memory, for the purpose of binary reordering, we actually won't be using any valgrind tool ; only valgrind core. As such, there are much less constraints on the Android builds required. The only constraint is to have an Android build that has the zygote hooks to wrap applications when starting them. Recent AOSP and Linaro Android has them (those based on Ice Cream Sandwich). Memory might be a problem on the Nexus S, but not on the pandaboard or the xoom.

Pick your weapon of choice. Mine was a pandaboard running Linaro Android 12.02, which works so much better than the gingerbread based version I was using before. I had a few glitches, though. Since it didn't want to use my mouse, I couldn't click through to enable ethernet networking, so I had to enable it manually through adb shell with the following shell commands:

# netcfg etho dhcp
# setprop net.dns1 x.x.x.x

Where x.x.x.x, obviously, is the DNS server IP address.

Another glitch I ran into is that after some time not using it, it would go to sleep, and fail to resume from it. This is a known issue and has a workaround.

Building Valgrind

All necessary tweaks we've needed to get useful data from valgrind have landed on valgrind trunk. So all is needed to build valgrind is to check out svn://svn.valgrind.org/valgrind/trunk and follow the instructions from README.android. Just make sure you have at least revision 12408.

Setup to profile Firefox

After building Firefox with on-demand decompression enabled, we need to create an APK for use with valgrind. This requires that elfhack is disabled, because it confuses valgrind. This can be done at packaging time, just make sure you didn't create a package already, because this leaves libraries in elfhacked state.

$ make -f client.mk package USE_ELF_HACK=

Once you have an APK, you can install it on the Android device, but make sure you have no leftovers first:

$ adb uninstall org.mozilla.fennec_$USER
$ make -f client.mk install

At this point, we're only doing on-demand decompression for libxul.so. There aren't much benefits from on-demand decompression for the other smaller libraries. Valgrind is going to require debug information for that library, so we need to push an unstripped libxul.so onto the device:

$ adb shell mkdir -p /sdcard/symbols/data/data/org.mozilla.fennec_$USER/cache
$ adb push $objdir/dist/lib/libxul.so /sdcard/symbols/data/data/org.mozilla.fennec_$USER/cache/

Finally, we'll need to setup the hook:

$ cat << EOF > /tmp/start_valgrind_fennec
#!/system/bin/sh
export TMPDIR=/data/data/org.mozilla.fennec_$USER
exec /data/local/Inst/bin/valgrind --log-file=/sdcard/valgrind.log --tool=none --vex-guest-chase-thresh=0 --trace-flags=10000000 --demangle=no \$*
EOF
$ adb push /tmp/start_valgrind_fennec /data/local/
$ adb shell chmod 755 /data/local/start_valgrind_fennec
$ adb shell setprop wrap.org.mozilla.fennec_$USER /data/local/start_valgrind_fennec

Profiling Firefox

As mentioned earlier and in the previous post, any time we go through an unusual code path, we're at risk of jumping randomly in the code. The more we do so, the more chunks we're going to have to decompress. We want to avoid that the more we can during startup. I got the best results from running three different profiles and aggregating them:

  • Starting Firefox for the very first time (no existing profile on internal flash, etc.)
  • Starting Firefox again
  • And finally starting it yet again with an url to open

Each time, we need to wait for Firefox to settle, then close it, adb pull /sdcard/valgrind.log and keep the log separate.

But since we built with on-demand decompression enabled, valgrind is not going to be happy, so we also need to say to the dynamic linker to extract the libraries to internal flash instead.

Starting Firefox can thus be done with the following command:

$ adb shell am start -n org.mozilla.fennec_$USER/.App --es env0 MOZ_LINKER_EXTRACT=1

And starting it with an url:

$ adb shell am start -n org.mozilla.fennec_$USER/.App -d http://www.mozilla.org/ --es env0 MOZ_LINKER_EXTRACT=1

As far as I know, we don't have a nice command line allowing to close Firefox. What I was able to do on the pandaboard is to open the application menu with the Menu key from an usb keyboard (mouse didn't work, but keyboard did) and select Quit from there. Another approach I tested was to add a Quit intent to our java code base, but it didn't seem to work properly, even though it did bring the process down and left zygote happy.

Relinking

At this point, we have three valgrind logs. We need to transform them into an aggregated symbols list:

$ awk -F'[ +]' '$10 ~ /libxul.so/ { print $9 }' log1 log2 log3 > /tmp/list

The script dealing with symbols ordering in the mozilla build system will take care of deduplicating entries. Please note merging the symbols lists more cleverly may give better results.

Now we can use that list of symbols to relink libxul.so:

$ rm $objdir/toolkit/library/libxul.so
$ make -C $objdir/toolkit/library SYMBOL_ORDER=/tmp/list

And finalize by packaging it, with elfhack not disabled, then installing:

$ make -f client.mk package install

Results

After loading and linking the library: (146 decompressed chunks out of 998)

A great part of the remaining decompressed chunks above is due to relocation. It is in our roadmap to do them on-demand as well. When that comes, we'll probably need to reorder data as we are currently reordering functions.

Some of the randomly placed red squares are, I think, static initializers from statically linked libraries such as stlport, which aren't built with -ffunction-section.

After starting to display a web page: (356 decompressed chunks out of 998)

The result is pretty good: we've cut the number of decompressed chunks in half. There is still a factor of randomness, which has to be investigated, whether this comes from valgrind missing things, the linker not being able to move some symbols, the linker generating things at inconvenient locations, or something else.

Less decompression means less time spent decompressing, and thus means faster startup. Unfortunately, many other things happen on the java side during startup that make the win from doing less decompression smaller than it should be. We're going to work on reducing, or delaying these things, whatever they are. Especially when starting with an url.

2012-03-02 11:38:08+0900

faulty.lib, p.m.o | 1 Comment »

Introducing faulty.lib

TL;DR link: faulty.lib is on github.

Android applications are essentially zip archives, with the APK extension instead of ZIP. While most of Android is java, and java classes are usually loaded from a ZIP archive (usually with the JAR extension), Android applications using native code need to have native libraries on the file system. These native libraries are found under /data/data/$appid/lib, where $appid is the package name, as defined in the AndroidManifest.xml file.

So, when Android installs an application, it puts that APK file under /data/app. Then, if the APK contains native libraries under a lib/$ABI subdirectory (where $ABI is armeabi, armeabi-v7a or x86), it also decompresses the files and places them under /data/data/$appid/lib. This means native libraries are actually stored twice on internal flash: once compressed and once decompressed.

This is why Chrome for Android takes almost 50MB of internal flash space after installation.

Firefox for Android used to have that problem, and we decided we should stop doing that. Michael Wu thus implemented a custom dynamic linker, which would load most of Firefox libraries directly off the APK. This involves decompressing the zipped data somewhere in memory, and doing ld.so's job to make the library usable (please note that on Android, ld.so is actually named linker). There were initially circumstances under which we would decompress into a file and reuse it the next time Firefox starts, but we subsequently removed that possibility (except for debugging purpose) because it ended up being slower than decompressing each time (thanks to internal flash being so slow).

Anyways, in order to do ld.so's job, our custom linker was directly derived from Android's system linker, with many tweaks. This custom linker has done its job quite well for some time, now, but has been recently replaced, see further below.

Considering Firefox can't do anything useful involving Gecko until its libraries are loaded, in practice, this means Firefox can't display a web page faster than completely decompressing the libraries. Or can it?

Just don’t sit down 'cause I’ve moved your chair

We know that a lot of code and data is not used during Firefox startup. Based on that knowledge, we started working on only loading the necessary bits. The core of the idea is, when a library is requested to be loaded, to reserve anonymous memory for its decompressed size, and... that's all. That memory is protected such that any access to it triggers a segmentation fault. When a segmentation fault happens, the required bits are decompressed, and execution is resumed where it was before the segmentation fault.

The original prototype was decompressing from a normal zip deflated stream, which means it was impossible to seek in it. So, if an access was made at the end of the library, it was necessary to decompress the whole library. With some nasty binary reordering, and some difficulty, it was possible to avoid accessing the end of the library, but the approach is very much fragile. It only takes an unexpected code path to make things much slower than they should be.

Consequently, for the past months, I've been working on improving the original idea and, with some assistance from Julian Seward, implemented the scheme with seekable compressed streams. Instead of letting the zip archive creation tool deflate libraries, we store specially crafted files. Essentially, files are cut in small chunks, and each chunk is compressed individually. This means a less efficient compression, but it also means random access to chunks is possible.

However, instead of stacking on top of our existing custom linker, I started over, from the ground up. First, because it needed a serious clean up (a good part of linker.c is leftovers from the Android linker that we don't use, and APKOpen.cpp is a messy mix of JNI stubs, library decompression handling (which in itself was also a mess) and Gecko initialization code) and most importantly, because it relied on some Android system linker internals and thus required binary compatibility with the system linker. Which, according to Google engineers that contacted us a few months ago, was going to break in what we now know will be called Android Jelly Bean.

The benefit of the clean slate approach is that the new code is not tied to Gecko at all and was designed to work on Android as well as on desktop Linux systems (which made debugging much much easier). We're thus releasing the code as a separate project: faulty.lib. It is licensed under the Mozilla Public License version 2.0. Please feel free to try, contribute, and/or fork it.

This dynamic linker is not meant to completely follow standard ELF rules (most notably for symbol resolution), and as a result does some assumptions. It's also still work in progress, with some obvious optimizations pending (like, avoiding to resolve the same symbols again and again during relocations), and some features missing (for example, symbol versioning).

The next blog post will give some information about how to build Firefox for Android to benefit from on-demand decompression. I will also detail a few of the tricks involved in this dynamic linker in subsequent blog posts.

2012-03-01 17:10:11+0900

faulty.lib, p.d.o, p.m.o | 5 Comments »