Archive for the 'p.m.o' Category

Extreme tab browsing

I have a pathological use of browser tabs: I use a lot of them. A lot is probably an understatement; you could say I use them as bookmarks for things I need to track. A couple of weeks ago, I mentioned I had around two hundred tabs open. I now have many more.

It affected startup until I discovered that setting the browser.sessionstore.max_concurrent_tabs pref to 0 made things much better, by only loading tabs when they are selected. This preference has since been (or will be) renamed browser.sessionstore.restore_on_demand. However, since I only start my main browser once a day, while other applications are starting and I begin to read email, I hadn't noticed that it was still heavily affecting startup time: about:startup tells me reaching the sessionRestored state takes seven seconds, even on a warm startup.

It also affects memory usage, because even when tabs are only loaded on demand, there is quite a big overhead for each tab.

And more importantly, it gets worse with time. And I think the user interface is actively making it worse.

So, to get an idea of how bad things were in my session, I wrote a little restartless extension. After installing it, you can go to the about:tabs url to see the damage on your own session. Please note that the number of groups is currently wrong until you open the tab grouping interface.

This is what the extension has to say about my session 2 days ago, right after a startup:

  • 556 tabs across 4 groups in 1 window
  • 1 tab has been loaded
  • 444 unique addresses
  • 105 unique hosts
  • 9 empty tabs
  • 210 http:
  • 319 https:
  • 14 ftp:
  • 2 about:
  • 2 file:
  • 55 addresses in more than 1 tab
  • 39 hosts in more than 1 tab

The first thing to note is that when I filed the memory bug 4 days earlier, I had a bit more than 470 tabs in that session. As you can see, 4 days later, I now have 555 tabs (excluding the about:tabs tab itself).

The second thing to note is something I suspected because it's so easy to get there: a lot of the tabs are open on the same address. Since Firefox 4.0, if I'm not mistaken, there is a great feature in the awesomebar that lets you jump to an existing tab matching what you type in the urlbar. It is very useful, and I use it a lot. However, there are a lot of cases where it's not as useful as it could be.

One of the addresses I visit a lot is http://buildd.debian.org/iceweasel. It gives me the build status of the latest iceweasel package I uploaded to Debian unstable. That url is very prominent in my browsing history, and is the first hit when I type "buildd" in the urlbar (actually, even typing "b" brings it up first). Unfortunately, that url redirects to https://buildd.debian.org/status/package.php?p=iceweasel through an HTTP redirection. I say unfortunately because when I type "buildd" in the urlbar, I get 6 suggestions for urls of the form http://buildd.debian.org/package (I also watch other packages' build status), and the suggestion to switch to the existing tab for the page the first hit would actually lead to comes 7th. Guess what? The suggestion list only shows 6 items; you have to scroll to see the 7th.

The result is that I effectively have fifteen tabs open on that url.

I also keep a lot of bugzilla.mozilla.org bugs open in different tabs. The extension tells me there are 255 of them... for 166 unique bugs. The duplicate bug tabs are largely due to having a bug open in some tab, then reaching the same bug from somewhere else, usually a dependent bug or TBPL. I also have 5 tabs open on my request queue; I usually get there by going to the bugzilla home page and clicking the "My Requests" link. And I have several tabs open on the same bug lists, for the same reason.

When I started using tab groups, I split my tabs into very distinct groups: basically, one for Mozilla, one for Debian, one for stuff I want to follow (usually blog posts whose comments I want to follow), and one for the rest. While I kept up with grouping at the beginning, I don't anymore, and the result is that each group is now a real mess.

Firefox has hundreds of millions of users. It's impossible to create a user experience that works for everyone. One thing is sure: it doesn't work for me. My usage is probably very wrong at different levels, but I don't feel my browser is encouraging me to use it better, except by letting my number of open tabs explode to an unmanageable level (I already have 30 tabs more than when I started writing this post 2 days ago).

There are a few other things I would like to know about my usage that my extension hasn't told me yet, either because it doesn't track them, or because I haven't looked:

  • How many tabs end up loaded at the end of a typical day?
  • How many tabs do I close?
  • How many duplicate tabs do I open and close?
  • How long has it been since I looked at a given tab?
  • How do the number of tabs and duplicates evolve with time?

Reflecting on my usage patterns, I think a few improvements, either in the stock browser, or through extensions, could make my browsing easier:

  • Auto-grouping tabs: When I click on a link to an url under mozilla.org, I most likely want it in the Mozilla group. An url under debian.org would most likely go in the Debian group.
  • Switch to an existing tab when following a link to an already opened url: That might not be very useful as a general rule, but at least for some domains, it would seem useful to me for the browser to switch to an existing tab not only through the urlbar, but also when following links in a page. If I'm reading a bug, click on a bug it depends on, and that bug is already open in another tab, take me there. There would be a history problem to solve, though (e.g. where should back and forward lead?).

Maybe these exist as extensions; I don't know. It's hard to find very specific things like that through an add-on search (though I haven't searched very hard). [It looks like there is an experiment for the auto tab grouping part.]

I think it would also be interesting to have something like Test Pilot, but for users who want to know the answer to "How do I use my browser?". As I understand it, Test Pilot can show individual user data, but it can only do so if such data exists, and you can't get data for past studies you didn't take part in.

In my case, I'm not entirely sure that, apart from the pinned tabs, I use the tab bar a lot. And even for pinned tabs, most of the time I use keyboard shortcuts. I'm not using the menu button that much either. I already removed the url and search bar (most of the time) with LessChrome HD. Maybe I could go further and use the full window for web browsing.

2011-08-29 09:27:55+0900

firefox, p.m.o | 47 Comments »

No wonders with PGO on Android

I got Profile Guided Optimization (a.k.a. Feedback Directed Optimization) to work for Android builds, using GCC 4.6.1 and Gold 2.21.53.

Getting such a build is not difficult, just a bit time consuming.

  • Apply the patches from bug 632954
  • Get an instrumented build with the following command:

    $ make -f client.mk MOZ_PROFILE_GENERATE=1 MOZ_PROFILE_BASE=/sdcard/mozilla-pgo

  • Create a Fennec Android package:

    $ make -C $objdir package

    If you get an elfhack error during this phase, make sure to update your tree, the corresponding elfhack bug has been fixed.

  • Install the package on your device:

    $ adb install -r $objdir/dist/fennec-8.0a1.en-US.android-arm.apk

  • Open Fennec on your device, and do some things in your browser so that execution data is collected. For my last build, I installed the Zippity Test Harness add-on, and ran the V8, Sunspider and PageLoad tests.
  • Collect the execution data:

    $ adb pull /sdcard/mozilla-pgo /

  • Clean-up the build tree:

    $ make -f client.mk clean

  • Build using the execution data:

    $ make -f client.mk MOZ_PROFILE_USE=1

  • Create the final Fennec Android package, install and profit:

    $ make -C $objdir package
    $ adb install -r $objdir/dist/fennec-8.0a1.en-US.android-arm.apk

As the title indicates, though, this actually leads to some disappointment. On my Nexus S, the resulting build is actually slightly slower on Sunspider than the corresponding nightly. It is, however, much faster on V8 (down to around 1200 from around 1800), but... it is just as fast as a non-PGO/FDO build with GCC 4.6. Even sadder, the non-PGO/FDO build with GCC 4.6 is faster on Sunspider than the PGO/FDO build, and on par with the GCC 4.4-built nightly.

So, my experiments suggest that switching to GCC 4.6 would give us some nice speed-ups, but enabling PGO/FDO wouldn't add to that.

If you want to test and compare my builds on different devices, please go ahead, with the following builds:

The former will install as "Nightly", while the two others will install as "Fennec".

The sizes are also interesting: while the PGO build is bigger than the Nightly build, the plain GCC 4.6 build is smaller.

2011-08-04 14:50:50+0900

p.m.o | 10 Comments »

Building an Android NDK with recent GCC and binutils

As of writing, the latest Native-code Development Kit for Android (r6) comes with gcc 4.4.3 and binutils 2.19 for ARM. The combination makes for quite an old toolchain, which lacks various novelties, such as properly working Profile Guided Optimization (a.k.a. Feedback Directed Optimization), or Identical Code Folding.

The first thing needed to rebuild a custom NDK is the NDK itself.

$ wget http://dl.google.com/android/ndk/android-ndk-r6-linux-x86.tar.bz2
$ tar -xjf android-ndk-r6-linux-x86.tar.bz2
$ cd android-ndk-r6

Next, you need to get the NDK source (this can take a little while and requires git, but see further below if you want to skip this part):

$ ./build/tools/download-toolchain-sources.sh src

Rebuilding the NDK toolchain binaries is quite simple:

$ ./build/tools/build-gcc.sh $(pwd)/src $(pwd) arm-linux-androideabi-4.4.3

But this doesn't get you anything modern. It only rebuilds what you already have.

The GCC 4.4.3 that comes with the NDK is actually quite heavily patched. Fortunately, only a few patches are required for gcc 4.6.1 to work with the NDK (corresponding upstream bug).

In order to build a newer GCC and binutils, you first need to download the sources for GCC (I took 4.6.1) and binutils (I took the 2.21.53 snapshot, see further below), as well as for GMP, MPFR and MPC. The latter was not a requirement to build GCC 4.4. GMP and MPFR ship with the NDK toolchain sources, but the versions available there are too old for GCC 4.6.

All the sources must be placed under src/name, where name is gcc, binutils, mpc, mpfr, or gmp. The sources for MPC, MPFR and GMP need to remain as tarballs, but the sources for GCC and binutils need to be extracted (don't forget to apply the patch linked above to GCC). In the end you should have the following files/directories:

  • src/gcc/gcc-4.6.1/
  • src/binutils/binutils-2.21.53/
  • src/gmp/gmp-5.0.2.tar.bz2
  • src/mpc/mpc-0.9.tar.gz
  • src/mpfr/mpfr-3.0.1.tar.bz2

If you skipped the NDK toolchain source download above, you will also need the gdb sources. The NDK comes with gdb 6.6, so you should probably stay with that version. The source needs to be extracted like GCC and binutils, so you'll have a src/gdb/gdb-6.6/ directory. Another part you will need is the NDK build scripts, available from git://android.git.kernel.org/toolchain/build.git. They should be put in a src/build/ directory. For convenience, you may directly download a tarball.

You then need to edit the build/tools/build-gcc.sh script to add support for MPC:

Add the following lines somewhere around similar lines in the script:

MPC_VERSION=0.8.1
register_var_option "--mpc-version=<version>" MPC_VERSION "Specify mpc version"

And add the following to the configure command in the script:

--with-mpc-version=$MPC_VERSION

If you want to use gold by default instead of GNU ld, you can also add, at the same place:

--enable-gold=default

If you want a GNU libstdc++ compiled as Position Independent Code (note that by default, the NDK won't use GNU libstdc++, but its own), you can add, at the same place:

--with-pic

Once all this preparation is done, you can build your new NDK toolchain with the following command:

$ ./build/tools/build-gcc.sh --gmp-version=5.0.2 --mpfr-version=3.0.1 --mpc-version=0.9 --binutils-version=2.21.53 $(pwd)/src $(pwd) arm-linux-androideabi-4.6.1

If you're running a 64-bit system on x86-64, you can also add the --try-64 option to the above command, which will give you a 64-bit toolchain to cross-build ARM binaries, instead of the 32-bit toolchain you get by default.

When building Firefox with this new toolchain, you need to use the following in your .mozconfig:

ac_add_options --with-android-toolchain=/path/to/android-ndk-r6/toolchains/arm-linux-androideabi-4.6.1/prebuilt/linux-x86

Or the following for the 64-bit toolchain:

ac_add_options --with-android-toolchain=/path/to/android-ndk-r6/toolchains/arm-linux-androideabi-4.6.1/prebuilt/linux-x86_64

Note that currently, elfhack doesn't support the resulting binaries very well, so you will need to also add the following to your .mozconfig:

ac_add_options --disable-elf-hack

Or, if you don't want to build it yourself, you can get the corresponding pre-built NDK (32-bit) (thanks to Brad Lassey for the temporary hosting). Please note it requires libstdc++ from gcc 4.5 or higher.

Here is a list of things you may need to know if you want to try various combinations of versions, and that I had to learn the hard way:

  • GCC 4.6.1 doesn't build with binutils 2.19 (GNU assembler lacks support for a few opcodes it uses)
  • GNU ld >= 2.21.1 has a crazy bug that leads to a crash of Firefox during startup. There is also a workaround.
  • Gold fails to build with gcc 4.1.1 (I was trying to build in the environment we use on the buildbots) because of warnings (it uses -Werror) in some versions, and because of an Internal Compiler Error with other versions.
  • When building with a toolchain that is not in the standard directory and that is newer than the system toolchain (like, in my case, using gcc 4.5 in /tools/gcc-4.5 instead of the system gcc 4.1.1), gold may end up with a libstdc++ dependency that cannot be satisfied by the system libstdc++. In that case, the NDK toolchain build will fail with the error message "Link tests are not allowed after GCC_NO_EXECUTABLES.", which isn't exactly helpful for understanding what is wrong.
  • At some point, I was getting the same error as above when the build was occurring in parallel, and adding -j1 to the build-gcc.sh command line solved it. It hasn't happened to me in my recent attempts, though.
  • Gold 2.21.1 crashes when using Identical Code Folding. This is fixed on current binutils HEAD (which is why I took 2.21.53).

2011-08-01 17:48:16+0900

p.m.o | 19 Comments »

-feliminate-dwarf2-dups FAIL

DWARF-2 is a format to store debugging information. It is used on many ELF systems such as GNU/Linux. With the way things are compiled, there is a lot of redundant information in the DWARF-2 sections of an ELF binary.

Fortunately, there is an option to gcc that helps deal with the redundant information and downsizes the DWARF-2 sections of ELF binaries. This option is -feliminate-dwarf2-dups.

Unfortunately, it doesn't work with C++.

With -g alone, libxul.so is 468 MB. With -g -feliminate-dwarf2-dups, it is... 1.5 GB. FAIL.

The good news is that, as stated in the message linked above, -gdwarf-4 does indeed help reduce the debugging information size. libxul.so, built with -gdwarf-4, is 339 MB. This however requires gcc 4.6 and a pretty recent gdb.

2011-07-30 11:21:01+0900

p.d.o, p.m.o | 1 Comment »

Faster Firefox cold startup, now in nightlies

The 20-line patch to Firefox 4 that makes startup on Windows up to 2x as fast and the stupid one-liner that does the same on Linux have both grown into a full-fledged preloading solution working on all our supported platforms. This involved major changes to how we initialize Firefox, and a few glitches with our leak detector, but this time it should stay for good (it had already been backed out twice).

Users shouldn't notice any change until they reboot after upgrading to the latest nightly. You can watch how things evolve with the about:startup extension.

These cold startup improvements will be available in Firefox 7.

2011-06-20 02:49:46+0900

p.m.o | 16 Comments »

Aftermath of the Linux compiler and optimizations changes

It has been two weeks since we switched to faster Linux builds. After some "fun" last week, it is time to look back.

The news that Mozilla will be providing faster Linux builds apparently made it to quite a lot of news sites, most of the time with titles as misleading as "as fast as Windows builds". I love that kind of journalism, where "much closer to" is spelled "as fast as". Anyway, I've also seen a number of ensuing comments basically saying that we suck and that some people have been successfully building with GCC 4.5 for a while, and now with GCC 4.6, so why can't we do that as well?

Well, for starters, I doubt they've been building with GCC 4.6 for long, and definitely not Firefox 4.0, because we only recently fixed a bunch of C++ conformance problems that GCC 4.6 doesn't like. Update: now that I think of it, I might have mixed things up. These might only become a problem when compiling in C++0x mode (which is now enabled when supported on mozilla-central).

Then, there are fundamental differences between a build someone does for her own use, and Mozilla builds:

  • Mozilla builds need to work on as many machines as possible, on as many Linux distros as possible,
  • Mozilla builds are heavily tested (yet, not enough).

Builds that run (almost) everywhere

One part of the challenge of using a modern compiler is that newer versions of GCC like to change subtle things in their C++ standard library, making compiled binaries dependent on a newer version of libstdc++. Whether that happens pretty much depends on which C++ standard library features are used.

For quite a while, Mozilla builds have been compiled with GCC 4.3, but up to Firefox 3.6, only libstdc++ 4.1 was required. Some new code added to Firefox 4.0, however, changed that, and libstdc++ 4.3 is now required. This is why Firefox 4.0 doesn't work on RedHat/CentOS 5 while Firefox 3.6 did: these systems don't have libstdc++ version 4.3.

Switching to GCC 4.5 (or 4.6, for that matter), in Firefox case, means requiring libstdc++ version 4.5 (or 4.6). While this is not a problem for people building for their own system, or for distros, it is when you want the binary you distribute to work on most systems, because libstdc++ version 4.5 is less widespread.

So on one end, we had an outdated toolchain that couldn't handle Profile Guided Optimization properly, and on the other, a more modern toolchain that creates a dependency on a libstdc++ version that is not widespread enough.

At this point, I should point out that an easy way out exists: statically linking libstdc++. The downside is that it makes the binaries significantly bigger.

Fortunately, we found a hackish way to avoid these dependencies on newer libstdc++. It has been extended since, and now allows building Firefox with GCC up to version 4.7, with or without the experimental C++0x mode enabled. The resulting binaries only depend on libstdc++ 4.1, meaning they should work on RedHat/CentOS 5.

Passing the test suites

We have a big test suite, which is probably an understatement: we have many thousands of unit tests. And we try to avoid regressions in these unit tests. I don't think most people building Firefox run them. Actually, most of the hundreds of Linux distributions don't.

I know, since I also happen to be the Debian maintainer, that Debian does run the test suites on all its architectures, but it skips mochitests because they take too long. As Debian switched to GCC 4.5 a while ago, I knew there weren't regressions in the test suites it runs, at least at the optimization level used by default.

And after the switch to faster Linux builds, we haven't seen regressions either. Well, not exactly, but I'll come back to that further below.

GCC 4.5, optimization levels, and Murphy's Law

Sadly, after the switch, we weren't getting symbols in crash reports anymore. The problem was that the program used to dump debugging symbols from our binaries, in a form usable for crash report post-processing, didn't output function information. This, in turn, was due to a combination of a missing feature in the dump program, and a bug in GCC 4.5 (which seems to be fixed in GCC 4.6) that prevented the necessary information from being present in the DWARF sections when the -freorder-blocks-and-partition option is used. I'll come back to this issue in a subsequent blog post. The short term (and most probably long term) solution was to remove the offending option.

But while searching for that root cause, we completely disabled PGO, leaving the optimization level at -O3. I had tested gcc 4.5 and -O3 without PGO a few times on the Try server, with no problems other than a few unimportant rounding errors that we decided to ignore by modifying the relevant tests, so I wasn't expecting anything bad.

That was without counting on Murphy's Law, in the form of a permanent Linux x86 reftest regression. That error didn't appear in my previous tests, so it had to have been introduced by some change in the tree. After some quite painful bisecting (I couldn't reproduce the problem with local builds, so I had to resort to the Try server, each build+test run taking between 1 and 2 hours), I narrowed it down to the first part of bug 641426 triggering a change in how GCC optimizes some code and, as a side effect, changing some floating point operations on x86, using memory instead of registers or vice versa, introducing rounding discrepancies in different parts of the code.
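
To give an idea of the kind of discrepancy involved, here is a contrived sketch (not the actual affected code): whether the two values compare equal depends on whether the compiler keeps the intermediate result in an 80-bit x87 register or spills it to memory as a 64-bit double.

#include <cstdio>

int main() {
    double a = 0.1, b = 0.2;
    // volatile forces the sum through memory, rounding it to 64 bits.
    volatile double spilled = a + b;
    // This one may stay in an 80-bit x87 register, keeping extra precision.
    double kept = a + b;
    // Depending on optimization level and target (x87 vs. SSE2), this can
    // print either 1 (equal) or 0 (not equal).
    printf("%d\n", spilled == kept);
    return 0;
}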

But while searching for that root cause, we backed out the switch to aggressive optimization and went back to -Os instead of -O3. The only remaining change from the switch was thus the GCC version. And Murphy's Law kicked in yet again, in the form of a permanent Linux x86/x86-64 a11y mochitest regression. As it turned out, that regression had already been spotted on the tracemonkey tree, during the couple of days it had PGO enabled but wasn't yet using -O3, and it disappeared when the -O3 switch was merged from mozilla-central. At the time, though, we didn't track it down. We disabled the tests to open the tree for development, but the issue was still there, just hidden. Now that we're back to aggressive optimization and PGO, we have re-enabled the test and the issue has gone away, which is kind of scary. We definitely need to find the real issue, which might be related to some uninitialized memory.

We also had a couple of new intermittent failures that are thought to be related to the GCC 4.5 switch, but all of them go away if we simply re-run the test on the same build.

What does this all mean?

First, it means that in some cases a newer compiler unveils dormant bugs in our code. And that with the same compiler, different optimization options can lead to different results/breakages.

By extension, this means it is important that we carefully choose our default optimization options, especially when PGO is not used (which is most of the time for non-Mozilla builds). I'm even tempted to say it would be important for us to test these non-PGO defaults, but we can't try all possible compiler versions either.

This also means it is important that Linux distros run our test suites with their builds, especially when they use newer compilers.

A few related thoughts

While handling the transition to this new toolchain, it became clear that the lack of correlation between our code base and our mozconfig files is painful. The best demonstration is the Try server, which is now using GCC 4.5 for all builds by default. But if you push a commit there that doesn't have the necessary libstdc++ compatibility hack, the builds will fail. There are many other cases of changes in our mozconfigs requiring changes in e.g. configure.in, and these are even more reasons to get mozconfigs into our code base.

The various issues we hit in the process also made me reflect on our random oranges. I think we lack one important piece of information when we have a test failure: does it reliably happen with a given build? Chances are that most random oranges don't (like the two I mentioned above), but those that do may point to subtle problems of compiler optimizations breaking some of our assumptions (though so far, most of the time, they just turn into permanent oranges). The self-serve API does help in that regard, allowing us to re-trigger a given test suite on the same build, but I think we should enhance our test harnesses to automatically retry failing tests.

What about GCC 4.6?

I think it's too early to think about GCC 4.6. While it has some improvements over GCC 4.5, it may also bring its own set of surprises. GCC also has a pretty bad history of screwing things up in dot-zero releases, so it would be better to wait for 4.6.1, which I hear is planned for soon. And GCC 4.6 would make things even harder for the Try server and some other branches considering the C++ conformance problems I mentioned.

Also, most of the people mentioning GCC 4.6 also mention Link Time Optimization, which is the main nicety it brings. Unfortunately, linking Firefox with LTO requires gigabytes of memory, which means several things:

  • We need that much memory on our build bots, which I'm not sure they currently have
  • It actually exhausts the 32-bit address space, which means we'd need to cross-compile the 32-bit builds on 64-bit hosts with a 64-bit toolchain. Which, in turn, means changing build bots, and maybe some fun with our build system.

GCC people are working on decreasing the amount of memory required to link, but it's work in progress and won't be workable until GCC 4.7 (or, who knows, even later). We might have switched to clang before that ;-)

2011-05-12 10:18:07+0900

p.m.o | 26 Comments »

Faster Linux builds

After two failed attempts last year, and a few glitches yesterday, we finally managed to get our Linux (and, obviously, Linux64) builds to use GCC 4.5, with aggressive optimization (-O3) and profile guided optimization enabled. This means we are finally using a more modern toolchain, opening opportunities for things such as static analysis. This also means we are producing a faster Firefox, now much closer to the Windows builds on the same hardware in various performance tests.

A nice side effect of some of the work I have done to make the switch possible is that these builds will also work on older Linux platforms such as RedHat/CentOS 5, or possibly older (as long as they come with libstdc++ from GCC 4.1).

The first Firefox release to benefit from these new settings should be Firefox 6.

A few branches other than mozilla-central have also been switched, most notably Try, for which there is a known issue if you push something too old. Please make sure to read the corresponding information on wiki.m.o for a workaround. A Mercurial hook is going to be put in place to issue a warning if there are chances your build will fail (it will, however, not prevent the push).

Thanks to Chris Atlee, Rail Aliiev, Taras Glek, Justin Lebar and all those I forgot or am not aware of for their assistance and/or past involvement in the previous attempts.

2011-04-29 11:31:18+0900

p.m.o | 61 Comments »

How to get the hard page fault count on Windows

Dear Lazyweb,

One of the improvements I want to make in an upcoming version of the About Startup extension is to allow distinguishing between cold and hot startups. One way to do so is to check how many page faults actually led to reading from disk; these are called hard page faults.

On UNIX systems, their count for a given process can be obtained with the getrusage function, which works on both Linux and MacOSX systems.
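
For reference, this is roughly what reading those counters looks like on such systems (a minimal sketch using the standard getrusage interface):

#include <sys/resource.h>
#include <cstdio>

int main() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) == 0) {
        // ru_majflt: page faults that required reading from disk (hard faults)
        // ru_minflt: page faults serviced without any I/O (soft faults)
        printf("hard page faults: %ld\n", usage.ru_majflt);
        printf("soft page faults: %ld\n", usage.ru_minflt);
    }
    return 0;
}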

Under Windows, that is another story, and so far I haven't found anything satisfactory.

My first attempt was to see how cygwin, which brings UNIXish libraries to Windows, implements its own getrusage. And the answer is that it gives the wrong data. Sigh. It uses GetProcessMemoryInfo to fill the hard page fault field (ru_majflt), and nothing for the soft page fault field (ru_minflt). Except GetProcessMemoryInfo returns the number of soft page faults.

The best I could find on MSDN is the Win32_PerfFormattedData_PerfOS_Memory class from Windows Management Instrumentation, except it is system-wide instead of per-process information, and only gives rates (hard page faults per second, which it calls page reads per second). The corresponding raw data doesn't seem very satisfactory either.

So, dear lazyweb, do you have any idea?

Update: Taras suggested using GetProcessIOCounters, which, despite not giving a hard page fault count, looked promising as a way to distinguish between cold and warm startup. But it turns out it is as useless as some systemtap and dtrace scripts you can find on the net: from my experiments, it looks like it only tracks explicit read() and write() system calls, meaning it doesn't track mapped memory accesses, and more importantly, it only tracks the system calls themselves, not whether they actually do I/O and hit the disk.

2011-04-12 16:50:04+0900

p.m.o | 6 Comments »

Avoiding dependencies upon recent libstdc++

Mozilla has been distributing Firefox builds for GNU/Linux systems for a while, and 4.0 should even bring official builds for x86-64 (finally, some would say). The buildbot configuration for these builds uses gcc 4.3.3 to compile the Firefox source code. With the C++ part of gcc, this can have side effects on which libstdc++ version the resulting binaries require when the C++ STL is used.

Historically, the Mozilla code base hasn't made much use of the STL, most probably because 10+ years back, portability and/or compiler support wasn't very good. More recently, with the borrowing of code from the Chromium project, this changed. While the code borrowed for out-of-process plugins support didn't have an impact on libstdc++ usage, the recent addition of ANGLE did. This manifests itself in the symbol versions the builds require.

These are the symbol versions required from libstdc++.so.6 on 3.6 (as given by objdump -p):

  • CXXABI_1.3
  • GLIBCXX_3.4

And on 4.0:

  • CXXABI_1.3
  • GLIBCXX_3.4
  • GLIBCXX_3.4.9

This means Firefox 4.0 builds from Mozilla need the GLIBCXX_3.4.9 symbol version, which was introduced with gcc 4.2, and that these builds don't work on systems with a libstdc++ older than that, while 3.6 builds would. It so happens that the system libstdc++ on the buildbots themselves is that old, which is why we set LD_LIBRARY_PATH to the appropriate location during tests. This shouldn't, however, be a big problem for users.

Newer gcc, new problems

As part of making Firefox faster, we're planning to switch to gcc 4.5, to benefit from better (as in working) profile guided optimization, and other compiler improvements. We actually attempted to switch to gcc 4.5 twice during the 4.0 development cycle. But various problems made us go back to gcc 4.3.3, the main culprit being the use of even newer libstdc++ symbols:

  • CXXABI_1.3
  • GLIBCXX_3.4
  • GLIBCXX_3.4.5
  • GLIBCXX_3.4.9
  • GLIBCXX_3.4.14

GLIBCXX_3.4.14 was added in gcc 4.5, making the build require a very recent libstdc++ installed on users' systems. As this wouldn't work for Mozilla builds, we attempted to build with -static-libstdc++. This option makes the resulting binary effectively contain libstdc++ itself, which means not requiring a system one. This is the usual solution for builds such as Mozilla's, that need to work properly on very different systems.

The downside of -static-libstdc++ is that it makes the libxul.so binary larger (about 1MB larger). It looks like the linker doesn't try to eliminate the code from libstdc++ that isn't actually used. Taras has been fighting to get libstdc++ into a shape that would allow the linker to remove that code, which is effectively dead weight for Firefox.

Why do we need these symbols?

The number of symbols required with the GLIBCXX_3.4.14 version is actually very low:

  • std::_List_node_base::_M_hook(std::_List_node_base*)
  • std::_List_node_base::_M_unhook()

With the addition of the following on debug builds only:

  • std::string::_S_construct_aux_2(unsigned int, char, std::allocator<char> const&)
  • std::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> >::_S_construct_aux_2(unsigned int, wchar_t, std::allocator<wchar_t> const&)

The number of symbols required with the GLIBCXX_3.4.9 version is even lower:

  • std::istream& std::istream::_M_extract<double>(double&)

It however varies depending on the compiler version. I have seen other builds also require std::ostream& std::ostream::_M_insert(double).

All these are actually internal implementation details of libstdc++; we never call these functions directly. I'm going to show two small examples that trigger some of these requirements (and the workarounds generalize to all of them).

The case of templates

#include <iostream>
int main() {
    unsigned int i;
    std::cin >> i;
    return i;
}

This example, when built, requires std::istream& std::istream::_M_extract<unsigned int>(unsigned int&), another GLIBCXX_3.4.9 symbol, while we are effectively calling std::istream::operator>>(unsigned int&). The latter is defined in /usr/include/c++/4.5/istream as:

template<typename _CharT, typename _Traits>
class basic_istream : virtual public basic_ios<_CharT, _Traits> {
    basic_istream<_CharT, _Traits>& operator>>(unsigned int& __n) {
        return _M_extract(__n);
    }
}

And _M_extract is defined in /usr/include/c++/4.5/bits/istream.tcc as:

template<typename _CharT, typename _Traits> template<typename _ValueT>
        basic_istream<_CharT, _Traits>&
        basic_istream<_CharT, _Traits>::_M_extract(_ValueT& __v) {
            (...)
        }

And later on in the same file:

extern template istream& istream::_M_extract(unsigned int&);

What this all means is that libstdc++ actually provides an implementation of an instance of the template for the istream (a.k.a. basic_istream<char>) class, with an unsigned int & parameter (and some more implementations). So, when building the example program, gcc decides, instead of instantiating the template, to use the libstdc++ function.

This extern definition, however, is guarded by a #if _GLIBCXX_EXTERN_TEMPLATE, so if we build with -D_GLIBCXX_EXTERN_TEMPLATE=0, we actually get gcc to instantiate the template, thus getting rid of the GLIBCXX_3.4.9 dependency. The downside is that this doesn't work so well with bigger code, because other things are hidden behind #if _GLIBCXX_EXTERN_TEMPLATE.

There is however another (obvious) way to force the template instantiation: instantiating it explicitly. So adding template std::istream& std::istream::_M_extract(unsigned int&); to our code is just enough to get rid of the GLIBCXX_3.4.9 dependency. Other template cases can obviously be worked around the same way.
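
Put together, the workaround for the example above looks like this (a small complete example; the explicit instantiation line is what does the work):

#include <iostream>

// Explicitly instantiating the template makes gcc emit this instance locally,
// so the GLIBCXX_3.4.9 versioned symbol from libstdc++ is no longer needed.
template std::istream& std::istream::_M_extract(unsigned int&);

int main() {
    unsigned int i;
    std::cin >> i;
    return i;
}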

The case of renamed implementations

#include <list>
int main() {
    std::list<int> l;
    l.push_back(42);
    return 0;
}

Here, we get a dependency on std::_List_node_base::_M_hook(std::_List_node_base*) but we are effectively calling std::list<int>::push_back(int &). It is defined in /usr/include/c++/bits/stl_list.h as:

template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class list : protected _List_base<_Tp, _Alloc> {
    void push_back(const value_type& __x) {
        this->_M_insert(end(), __x);
    }
}

_M_insert is defined in the same file:

template<typename ... _Args>
void _M_insert(iterator __position, _Args&&... __args) {
    _List_node<_Tp>* __tmp = _M_create_node(std::forward<_Args>(__args)...);
    __tmp->_M_hook(__position._M_node);
}

Finally, _M_hook is defined as follows:

struct _List_node_base {
    void _M_hook(_List_node_base * const __position) throw ();
}

In gcc 4.4, however, push_back has the same definition, and while _M_insert is defined similarly, it calls __tmp->hook instead of __tmp->_M_hook. Interestingly, gcc 4.5's libstdc++ exports symbols for both std::_List_node_base::_M_hook and std::_List_node_base::hook, and the code for both methods is the same.

Considering the above, a work-around for this kind of dependency is to define the newer function in our code, and make it call the old function. In our case here, this would look like:

namespace std {
    struct _List_node_base {
        void hook(_List_node_base * const __position) throw ();
        void _M_hook(_List_node_base * const __position) throw ();
    };
    void _List_node_base::_M_hook(_List_node_base * const __position) throw () {
        hook(__position);
    }
}

... which you need to put in a separate source file, not including <list>.

All in all, with a small hack, we are able to build Firefox with gcc 4.5 without requiring libstdc++ 4.5. Now, another reason to switch to gcc 4.5 was to use better optimization flags, but it turns out it makes the binaries 6MB bigger. But that's another story.

2011-03-14 13:21:04+0900

p.d.o, p.m.o | 3 Comments »

How I broke (some) crash reports

We recently figured out that elfhack has been responsible for broken crash reports from Fennec 4.0b5 and Firefox 4.0b11 and 4.0b12 Linux builds. We disabled elfhack by default as soon as it was noticed, meaning both Fennec and Firefox Release Candidates will be giving good crash reports. Unfortunately, this also means we lose the startup time improvements elfhack was giving.

Stack walking and memory mapping

Both the Fennec and Firefox Linux crash report breakages are due to the same problem: Breakpad makes assumptions about memory mapping, and elfhack breaks them.

Usually, ELF binaries are loaded in memory in two segments, one for executable code, and one for data. Each segment has different access permissions, such that executable code can't be modified during execution, and such that data can't be executed as code.

When a crash occurs, the crash reporter stores a bunch of information in a minidump, most importantly a "Module" mapping (I'll come back to that later), as well as registers and stack contents for each thread. This minidump is then sent to us (if the user chooses to) for processing and the result is made available on crash-stats.

The most useful part of the processing is called stack walking. From the data stored in the minidump, correlated with the symbol files we keep for each build, we can get a meaningful stack trace, which tells us where in our codebase the crash occurred, and what call path was taken to get there. Roughly, this is how it works (over-simplified):

  1. Take the current instruction pointer address
  2. Find the corresponding symbol
  3. Find the corresponding stack walking information
  4. From the stack walking information, compute the address where the code we're currently in was called from
  5. Repeat from step 2
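
As a toy illustration of that loop, with made-up data structures and symbol names (this is not the actual Breakpad code):

#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// For each known address: the symbol it falls in, and the address the
// surrounding code was called from (0 meaning we reached the bottom).
struct FrameInfo { std::string symbol; uint64_t caller; };

int main() {
    std::map<uint64_t, FrameInfo> walk_info;
    walk_info[0x40003000] = { "Crashing::Function()", 0x40001000 };
    walk_info[0x40001000] = { "Calling::Function()", 0x40000100 };
    walk_info[0x40000100] = { "main", 0 };

    uint64_t address = 0x40003000;          // step 1: crash instruction pointer
    while (address) {
        auto it = walk_info.find(address);  // steps 2-3: look the address up
        if (it == walk_info.end()) {
            printf("0x%llx (unknown address, walk stops)\n", (unsigned long long) address);
            break;
        }
        printf("%s\n", it->second.symbol.c_str());
        address = it->second.caller;        // step 4: address of the caller
    }
    return 0;
}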

Steps 2 and 3 require that we can map a given memory address to a relative address in a given "Module". For the stack walking software, a "Module" corresponds to a given library or program loaded in memory. For the minidump creation software, a "Module" corresponds to a single memory segment. This is where problems arise.

As I wrote above, ELF binaries are usually loaded in two memory segments, so the minidump creation software is going to store each segment as a different "Module". Well, this is what it does on Android, because Fennec uses its own dynamic loader, and this custom dynamic loader, for various reasons, was made to explicitly instruct the minidump creation software about each segment.

In the desktop Linux case, the minidump creation software actually considers that segments which don't map the beginning of the underlying binary aren't to be stored at all. In practice, this means only the first segment is stored in the minidump for Firefox Linux builds, while all of them are stored for Android. In the common case where binaries are loaded in two segments, this isn't a problem at all: only the first segment contains code, so addresses we get during stack walking are always in that segment for each binary.

Enter elfhack

What elfhack does, on the other hand, is to split the code segment in two parts that end up being loaded in memory separately. Which means instead of two segments, we get three. Moreover, the first segment then only contains binary metadata (symbols, relocations, etc.), and the actual code is in the second segment.

elfhack          normal
Segment #1    \
Segment #2    /  Segment #1
Segment #3       Segment #2

In the Linux case, where the minidump creation software only keeps the first segment, the addresses it gets during stack walking won't map anywhere the minidump knows about. As such, it can't know what function we're in, and it can't get the information required to walk the stack further.

In the Android case, where all segments are considered separate "Modules", the addresses it gets during stack walking do map somewhere the minidump knows about. Except that when Breakpad resolves symbols, it uses addresses relative to the start of each segment/"Module", while the correct behaviour would be to use addresses relative to the start of the first segment of a given library. Where it gets interesting is that since libxul.so is so big, a relative address within the second segment is very likely to hit a portion of code when taken relative to the start of the first segment.
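
A contrived example with made-up addresses shows the difference between the two computations (this is only to illustrate the mismatch, not actual Breakpad code):

#include <cstdint>
#include <cstdio>

// A "Module" as recorded in the minidump: a name and a load address.
struct Module {
    const char* name;
    uint64_t base;
};

int main() {
    // Hypothetical layout of an elfhacked libxul.so.
    Module whole_library = { "libxul.so", 0x40000000 };              // first segment
    Module second_segment = { "libxul.so (code part)", 0x40020000 }; // where the code actually is

    uint64_t crash_address = 0x40123456;

    // Correct: offset relative to the start of the library's first segment,
    // which is what the symbol file expects.
    printf("correct offset: 0x%llx\n",
           (unsigned long long) (crash_address - whole_library.base));

    // What happens when the code segment is recorded as its own "Module":
    // the offset points at a virtually random place in the symbol file.
    printf("wrong offset:   0x%llx\n",
           (unsigned long long) (crash_address - second_segment.base));
    return 0;
}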

So Breakpad is actually likely to find a symbol corresponding to the crash address, as well as stack walking information, which it is happy to use to compute the call location. But that computation ends up being very wrong, since the actual register and stack contents don't fit what should be there if the code where Breakpad thinks we are had really been executed. With some luck, the computed call location also ends up somewhere in libxul.so, at a virtually random location. And so on and so forth.

This is why some of the Fennec crash reports had impossible stack traces, with functions that never call each other.

Fixing crash reports

While disabling elfhack made new builds send minidumps that the stack walking software can handle, it didn't solve the issue with existing crash reports.

Fortunately, among the information the minidump writer stores are the raw contents of the /proc/$pid/maps file. This file, specific to Linux systems, contains the memory mapping of a given process, showing which parts of which files are mapped where in the address space. It is not used by the processing software, but it allows figuring out what the "Module" mapping would have been had elfhack not been used on the binary.

There are two possible approaches to get meaningful crash reports out of these broken minidumps: either modify the processing software so that it can read the /proc/$pid/maps data, or fix the minidumps themselves. I went with the latter, as it required less work for coding, testing and deploying, while the former meant actually updating the stack walking software, with all the risks that entails. The latter only risked further corrupting crash reports that were already corrupted in the first place.

Making some assumptions about the way libraries are loaded in memory, I wrote two tools that reconstruct the "Module" mapping information from the /proc/$pid/maps data, one for Linux and one for Android (warning: the code is a quick and dirty hack).
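
The underlying idea is simple enough to sketch. This is not the actual tool, just the principle: merge all the mappings of a given file found in /proc/$pid/maps into a single address range, which is what the "Module" entry should have covered.

#include <cstdint>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main(int argc, char* argv[]) {
    const char* path = argc > 1 ? argv[1] : "/proc/self/maps";
    std::ifstream maps(path);
    // For each file, the lowest start and highest end address seen so far.
    std::map<std::string, std::pair<uint64_t, uint64_t> > modules;

    std::string line;
    while (std::getline(maps, line)) {
        // Lines look like: 40000000-40020000 r-xp 00000000 08:01 1234 /path/to/lib.so
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, file;
        fields >> range >> perms >> offset >> dev >> inode >> file;
        if (file.empty() || file[0] != '/')
            continue;  // skip anonymous mappings, [stack], [vdso], ...

        uint64_t start = 0, end = 0;
        char dash;
        std::istringstream addresses(range);
        addresses >> std::hex >> start >> dash >> end;

        auto it = modules.find(file);
        if (it == modules.end())
            modules[file] = std::make_pair(start, end);
        else {
            if (start < it->second.first) it->second.first = start;
            if (end > it->second.second) it->second.second = end;
        }
    }

    for (auto& m : modules)
        std::cout << std::hex << m.second.first << "-" << m.second.second
                  << " " << m.first << std::endl;
    return 0;
}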

People from the Socorro team then took over to test the tools on sample crash reports, and once we had verified we were getting good results, they went further and systematically applied the fix to broken crash reports (some details can be found in bug 637680). Fortunately, there were "only" around 8000 crash reports for Fennec and around 8000 more for Firefox Linux, so it didn't take long to update the databases.

As of now, the top crashers list, as well as individual crash reports are all fixed, and incoming ones are fixed up every hour.

Re-enabling elfhack

Obviously, we still want the gains from elfhack in the future, so we need to address the issues on the Breakpad end before re-enabling it. Here again, we have several possible implementations, but one stands out for being simpler (and for not requiring changes on the crash-stats server end).

Despite ELF binaries being loaded in several separate segments of memory, what the dynamic loader actually does is first reserve the whole memory area that's going to be used for the binary, including the areas between segments, and then map the binary segments at the appropriate places. The minidump writer can just record that whole memory area as a single "Module", making the stack walking software happy.

2011-03-09 20:18:36+0900

p.m.o | No Comments »