faulty.lib vs. Firefox for Android

As mentioned in the previous post, Firefox for Android now has a fresh new linker that allows on-demand decompression. All the necessary code is currently in mozilla-central, but on-demand decompression is disabled by default. For good reason, see below.

Building with on-demand decompression only requires adding the following to your .mozconfig:

export MOZ_ENABLE_SZIP=1

This will enable on-demand decompression for libxul.so only.

With this, you can build Firefox for Android normally and … not feel much of an improvement. The main reason is that the compressed libxul.so library is bigger (because of the chunk compression, see previous post), and read almost entirely. That, in turn, is due to how the code and data is laid out in the library.

Since the way this on-demand decompression of libraries works relies on chunks, each time you need even a few bytes at some not-yet decompressed location, you end up decompressing a complete chunk. And since the static linker (the one used when building libraries) is not smart enough to lay out everything in an order optimal for startup, we end up reading a lot of things at random locations.

There are essentially two phases of Firefox startup related to libraries that are important to us: loading/linking the library, and using it. The sooner the former finishes, the sooner the latter can start.

With a freshly built mozilla-central, this is what the map of decompressed chunks looks like after loading and linking the library:

Each red square is a decompressed chunk. In the above map, 348 chunks are decompressed, out of 998. The main reason for this happening is static initializers.

Then by the time Gecko has been initialized and a web page loaded and started painting, here is what the map looks like:

That is a lot of red: 715 decompressed chunks, out of 998.

Educating the static linker

As explained above, the static linker is not laying out code and data in an on-demand decompression friendly way. We’ve had various experiences with binary reordering in the past. This unfortunately currently involves some manual work, but we hope our release engineering team will be able to automate the process, and I sure hope we’ll be able to make it more reliable and faster. I will detail further below the current state of the art.

If you’re not interested in technical details, you can skip to the results.

The mozilla build system parts are almost there. One part of binary reordering involves building with the right flags. It so happens that we’re currently not building the javascript engine with the necessary -ffunction-sections -fdata-sections flags. I hope to be able to land that patch later this week.

Another part involves educating the linker of the order in which we want it to put symbols. Unfortunately, neither GNU ld nor gold allow to order by symbol names. Gold allows to order by section with the --section-ordering-file argument (thus the -ffunction-sections -fdata-sections flags), and GNU ld… not quite. However, GNU ld has a very powerful scripting facility. And with a little bit of effort, one can reorder sections with it. So the remaining grunt work is to map symbols to sections.

It would be nice if that was enough, but it turns out it is not. I haven’t tried with gold, but it turns out GNU ld, or at least the version shipped in the Android NDK (which, by the way, doesn’t come with gold, which is mainly why I made the effort to support GNU ld linker scripts) fails to link when non-virtual thunks are too far away from the corresponding function, even when it’s not far enough to grant such a failure (I think that’s a known bug of older GNU ld versions). So I went around grouping such related functions together, which is a good idea anyway.

For those interested, the trick for GNU ld to reorder functions is to add -Wl,-T,filename on the link command, with the file filename containing the following:

SECTIONS {
  .text : {
    *(.text.symbol1)
    *(.text.symbol2)
(...)
  }
}
INSERT BEFORE .fini

Not all symbols need to be present in that file. The same can be done with other sections such as .data, provided INSERT BEFORE is adjusted, and several sections can be put in the same file.

The patch integrating all this should hopefully land soon (please note that it requires a backout of changeset 283408b8d8a3).

Profiling environment

Reordering symbols when linking requires that we know in which order symbols are accessed during startup. As far as I know there are no existing tool to do that besides valgrind. Julian Seward has been working on getting valgrind to work on Android devices, and it currently works well on the pandaboard, Nexus S and Xoom.

While a full featured valgrind requires debug info for some system libraries and a lot of memory, for the purpose of binary reordering, we actually won’t be using any valgrind tool ; only valgrind core. As such, there are much less constraints on the Android builds required. The only constraint is to have an Android build that has the zygote hooks to wrap applications when starting them. Recent AOSP and Linaro Android has them (those based on Ice Cream Sandwich). Memory might be a problem on the Nexus S, but not on the pandaboard or the xoom.

Pick your weapon of choice. Mine was a pandaboard running Linaro Android 12.02, which works so much better than the gingerbread based version I was using before. I had a few glitches, though. Since it didn’t want to use my mouse, I couldn’t click through to enable ethernet networking, so I had to enable it manually through adb shell with the following shell commands:

# netcfg etho dhcp
# setprop net.dns1 x.x.x.x

Where x.x.x.x, obviously, is the DNS server IP address.

Another glitch I ran into is that after some time not using it, it would go to sleep, and fail to resume from it. This is a known issue and has a workaround.

Building Valgrind

All necessary tweaks we’ve needed to get useful data from valgrind have landed on valgrind trunk. So all is needed to build valgrind is to check out svn://svn.valgrind.org/valgrind/trunk and follow the instructions from README.android. Just make sure you have at least revision 12408.

Setup to profile Firefox

After building Firefox with on-demand decompression enabled, we need to create an APK for use with valgrind. This requires that elfhack is disabled, because it confuses valgrind. This can be done at packaging time, just make sure you didn’t create a package already, because this leaves libraries in elfhacked state.

$ make -f client.mk package USE_ELF_HACK=

Once you have an APK, you can install it on the Android device, but make sure you have no leftovers first:

$ adb uninstall org.mozilla.fennec_$USER
$ make -f client.mk install

At this point, we’re only doing on-demand decompression for libxul.so. There aren’t much benefits from on-demand decompression for the other smaller libraries. Valgrind is going to require debug information for that library, so we need to push an unstripped libxul.so onto the device:

$ adb shell mkdir -p /sdcard/symbols/data/data/org.mozilla.fennec_$USER/cache
$ adb push $objdir/dist/lib/libxul.so /sdcard/symbols/data/data/org.mozilla.fennec_$USER/cache/

Finally, we’ll need to setup the hook:

$ cat << EOF > /tmp/start_valgrind_fennec
#!/system/bin/sh
export TMPDIR=/data/data/org.mozilla.fennec_$USER
exec /data/local/Inst/bin/valgrind --log-file=/sdcard/valgrind.log --tool=none --vex-guest-chase-thresh=0 --trace-flags=10000000 --demangle=no \$*
EOF
$ adb push /tmp/start_valgrind_fennec /data/local/
$ adb shell chmod 755 /data/local/start_valgrind_fennec
$ adb shell setprop wrap.org.mozilla.fennec_$USER /data/local/start_valgrind_fennec

Profiling Firefox

As mentioned earlier and in the previous post, any time we go through an unusual code path, we’re at risk of jumping randomly in the code. The more we do so, the more chunks we’re going to have to decompress. We want to avoid that the more we can during startup. I got the best results from running three different profiles and aggregating them:

  • Starting Firefox for the very first time (no existing profile on internal flash, etc.)
  • Starting Firefox again
  • And finally starting it yet again with an url to open

Each time, we need to wait for Firefox to settle, then close it, adb pull /sdcard/valgrind.log and keep the log separate.

But since we built with on-demand decompression enabled, valgrind is not going to be happy, so we also need to say to the dynamic linker to extract the libraries to internal flash instead.

Starting Firefox can thus be done with the following command:

$ adb shell am start -n org.mozilla.fennec_$USER/.App --es env0 MOZ_LINKER_EXTRACT=1

And starting it with an url:

$ adb shell am start -n org.mozilla.fennec_$USER/.App -d http://www.mozilla.org/ --es env0 MOZ_LINKER_EXTRACT=1

As far as I know, we don’t have a nice command line allowing to close Firefox. What I was able to do on the pandaboard is to open the application menu with the Menu key from an usb keyboard (mouse didn’t work, but keyboard did) and select Quit from there. Another approach I tested was to add a Quit intent to our java code base, but it didn’t seem to work properly, even though it did bring the process down and left zygote happy.

Relinking

At this point, we have three valgrind logs. We need to transform them into an aggregated symbols list:

$ awk -F'[ +]' '$10 ~ /libxul.so/ { print $9 }' log1 log2 log3 > /tmp/list

The script dealing with symbols ordering in the mozilla build system will take care of deduplicating entries. Please note merging the symbols lists more cleverly may give better results.

Now we can use that list of symbols to relink libxul.so:

$ rm $objdir/toolkit/library/libxul.so
$ make -C $objdir/toolkit/library SYMBOL_ORDER=/tmp/list

And finalize by packaging it, with elfhack not disabled, then installing:

$ make -f client.mk package install

Results

After loading and linking the library: (146 decompressed chunks out of 998)

A great part of the remaining decompressed chunks above is due to relocation. It is in our roadmap to do them on-demand as well. When that comes, we’ll probably need to reorder data as we are currently reordering functions.

Some of the randomly placed red squares are, I think, static initializers from statically linked libraries such as stlport, which aren’t built with -ffunction-section.

After starting to display a web page: (356 decompressed chunks out of 998)

The result is pretty good: we’ve cut the number of decompressed chunks in half. There is still a factor of randomness, which has to be investigated, whether this comes from valgrind missing things, the linker not being able to move some symbols, the linker generating things at inconvenient locations, or something else.

Less decompression means less time spent decompressing, and thus means faster startup. Unfortunately, many other things happen on the java side during startup that make the win from doing less decompression smaller than it should be. We’re going to work on reducing, or delaying these things, whatever they are. Especially when starting with an url.

2012-03-02 11:38:08+0200

faulty.lib, p.m.o

You can leave a response, or trackback from your own site.

4 Responses to “faulty.lib vs. Firefox for Android”

  1. Andrew Sutherland Says:

    Love the visualizations! I was going to ask for some on your last blog post, but it didn’t seem helpful…

    To help drive home the awesomeness of this work, you should have an image that’s just a single giant blob of red to demonstrate how much worse things were before faulty.lib and the successive wave of optimizations!

  2. beats by dr dre Says:

    Appreciating the persistence you put into your website and detailed information you offer. It’s awesome to come across a blog every once in a while that isn’t the same old rehashed material. Wonderful read! I’ve saved your site and I’m adding your RSS feeds to my Google account.

  3. ylxw3n8d Says:

    彼らは、ヨーロッパをツアーゴラン高原でのキブツに住んでいたし、最終的に3 months.Upon米国への帰国のためのジャーナリズムのプロジェクトで作業するために、イスラエルのテルアビブに落ち着い、トムはWLUK TVの週末のアンカーとして雇われた グリーンベイ。, Bally-バリー, メキシコ湾の近く; カルカシュー川に面したスペースの3500フィート; と財産の北端にカルカシューパス/キャメロンループ上の間口の4000フィート。, バッグ, http://www.transducersite.com/miumiuミュウミュウ-c-4.html, http://www.transducersite.com/hermesエルメス-c-7.html, ここで多くの作品は、有名なアーティストによるものの、他の場所で展示されていたことがないほとんど。, SEE BY CHLOE/シーバイクロエ JOY RIDER ジョイライダー ハンドバッグ 9S7528 N105 C208 See By Chloe-シーバイクロエ, Nicolettoは16歳の学生と合意の上で性的関係を持っていた。, LOEWE/ロエベ HERITAGE MINI TOTEハンドバッグ 37779E34 0009 7341 Loewe-ロエベ, 【緊急大幅値下げ!】COACH コーチ 43207 SV/KH アレックス コーテッド オプアート ジップアラウンドウォレット 長財布 シルバー/カーキ【coupon0214】【sa0829】【超歓迎された】, また、すべての時間の最も誤解曲の一つ(バンドがあっても、サタンの教会の歌を書いて告発された)、最もカバーさの一つだ。, コンデンサ, ミュウミュウ VITELLO LUX ハンドバッグ ライトパープル RT0365 MiuMiu-ミュウミュウ, 彼はまた、ラジオシティでは6月18日22から、5泊スタンドアップをやっている。, 高い素材エルメス(Hermes) ケリーラキ 32エヴェンヌ(こげ茶)×シルバー金具, 共和党は、現在の議会の力/コントロールでのNCの教師に対する政治的な報復報復に地獄を曲げた模様はほとんど完全にノースカロライナ州の教職を骨抜きに待つことができなかったと彼らは内臓摘出で素晴らしい仕事をして、はっきりと深遠な不足だけでなく、実証されている 尊敬するが、ノースカロライナ州の教師のための無礼でも露骨な侮辱。, http://www.mytourismbd.com/バイクウエア館-c-18_19.html, #12479;カツ] オフロードミラー2 M8 左 6406【80%OFF】, 財布/小物, http://www.wanderphilippines.com/財布小物-c-14_132.html, ミュウミュウ ショルダーバッグ MIUMIU ミュウミュウ 2WAYバッグ/ポーチ『2013年春夏新作』マドラスクリスタル レザー ビジュー パパヤ 5N1742 MADRAS CRISTAL PAPAYA/2A2S F0S73売り尽くし, まあ、私は何かにショーを開いています、我々は素晴らしいです新しいアルバムの健全な量をやっている、と私は歌がライブ判明する方法と、それらが受信されてきたように、非常に満足している、too.I ちょうどギターで立ち上がって一人で遊ぶである、より多くやるのが大好きなので、私は一種の方法で、自分のために開いて、約5曲をやっている。, これらは、元の1962ピックアップの再現であり、あなたはヴィンテージフェンダーサウンドをキャプチャすることができます。, 北京語MyFordTouchTMとフォードのレーンシステム、アダプティブクルーズコントロールを維持、後部インフレータブルシートベルトとSYNCを含む先端技術の包括的なリストでは、すべての新しいモンデオは、お客様が本当に必要な機能と価値を提供します, テレコム、メディアおよび金融·アソシエイツ·インク、カリフォルニア州メンロパークにあるコンサルティングおよび調査会社は、グローバルスタースペクトルは、少なくとも20億ドルであり得ることは昨年の投稿ブログで述べている。

  4. ball Says:

    Hello There. I found your blog using msn. This is a really well
    written article. I will make sure to bookmark it and return to read more of your useful info.
    Thanks for the post. I’ll definitely comeback.

Leave a Reply