It has been two weeks since we switched to faster Linux builds. After some "fun" last week, it is time to look back.
The news that Mozilla will be providing faster Linux builds made it to quite a lot of news sites, apparently. Most of the time with titles as misleading as "as fast as Windows builds". I love that kind of journalism where "much closer to" is spelled "as fast as". Anyways, I've also seen a number of ensuing comments basically saying that we sucked and that some people had been successfully building with GCC 4.5 for a while, and now with GCC 4.6, so why can't we do that as well?
Well, for starters, I doubt they've been building with GCC 4.6 for long, and definitely not Firefox 4.0, because we only recently fixed a bunch of C++ conformance problems that GCC 4.6 doesn't like. Update: now that I think of it, I might have mixed things up. That bunch might only become a problem when compiling in C++0x mode (which is now enabled when supported on mozilla-central).
Then, there are fundamental differences between a build someone does for her own use, and Mozilla builds:
- Mozilla builds need to work on as many machines as possible, on as many Linux distros as possible,
- Mozilla builds are heavily tested (yet, not enough).
Builds that run (almost) everywhere
One part of the challenge of using a modern compiler is that newer versions of GCC like to change subtle things in their C++ standard library, making compiled binaries dependent on a newer version of libstdc++. Whether that happens pretty much depends on which C++ standard library features are used.
For quite a while, Mozilla builds have been compiled with GCC 4.3, but up to Firefox 3.6, only libstdc++ 4.1 was required. Some new code added in Firefox 4.0, however, changed that, and libstdc++ 4.3 is now required. This is why Firefox 4.0 doesn't work on RedHat/CentOS 5 while Firefox 3.6 did: these systems don't ship libstdc++ 4.3.
Switching to GCC 4.5 (or 4.6, for that matter) would, in Firefox's case, mean requiring libstdc++ 4.5 (or 4.6). While this is not a problem for people building for their own system, or for distros, it is a problem when you want the binary you distribute to work on most systems, because libstdc++ 4.5 is much less widespread.
So on one hand, we had an outdated toolchain that couldn't handle Profile Guided Optimization properly, and on the other, a more modern toolchain that creates a dependency on a libstdc++ version that is not widespread enough.
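As an aside, it is easy to check which libstdc++ a given binary ends up requiring, by looking at the versioned symbols it imports. A quick sketch (the libxul.so path is just an example, and the mapping from GLIBCXX_3.4.x symbol versions back to GCC releases is documented in the libstdc++ ABI documentation):

```sh
# List the GLIBCXX symbol versions a binary pulls from libstdc++.so.6.
# The highest version listed determines the oldest libstdc++ (and thus
# the oldest distros) the binary can run with.
objdump -T objdir/dist/bin/libxul.so | grep -o 'GLIBCXX_[0-9.]*' | sort -u
```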
At this point, I should point out that an easy way out exists: statically linking libstdc++. The downside is that it makes the binaries significantly bigger.
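For reference, recent GCC versions have a driver flag to do just that; in a mozconfig it would come down to something like this (a sketch, with the size cost mentioned above still applying):

```sh
# Link libstdc++ statically instead of depending on the system libstdc++.so.6.
# This removes the runtime version requirement, at the cost of noticeably
# bigger binaries.
export LDFLAGS="-static-libstdc++"
```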
Fortunately, we found a hackish way to avoid these dependencies on a newer libstdc++. It has been extended since, and now allows building Firefox with GCC up to version 4.7, with or without the experimental C++0x mode enabled. The resulting binaries only depend on libstdc++ 4.1, meaning they should work on RedHat/CentOS 5.
Passing the test suites
We have a big test suite, which is probably an understatement: we have many thousands of unit tests. And we try to keep these unit tests from regressing. I don't think most people building Firefox run them. Actually, most of the hundreds of Linux distributions don't.
I know, because I also happen to be the Debian maintainer, that Debian does run the test suites on all its architectures, though it skips mochitests because they take too long. As Debian switched to GCC 4.5 a while ago, I knew there were no regressions in the test suites it runs, at least at the optimization level used by default.
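For those who have never run them, the main suites can be launched from an objdir with a few make targets; roughly (the target names have changed over time, so treat this as an illustration):

```sh
# Run some of the main test suites from an existing build tree.
OBJDIR=obj-firefox                 # example objdir name

make -C "$OBJDIR" xpcshell-tests   # xpcshell unit tests
make -C "$OBJDIR" reftest          # rendering correctness tests
make -C "$OBJDIR" crashtest        # crash regression tests
make -C "$OBJDIR" mochitest-plain  # the (long) mochitests Debian skips
```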
And after the switch to faster Linux builds, we haven't seen regressions either. Well, not exactly, but I'll come back to that further below.
GCC 4.5, optimization levels, and Murphy's Law
Sadly, after the switch, we weren't getting symbols in crash reports anymore. The problem was that the program used to dump debugging symbols from our binaries, in a form usable for crash report post-processing, didn't output function information. This, in turn, was due to a combination of a lack of functionality in the dump program and a bug in GCC 4.5 (which seems to be fixed in GCC 4.6) that prevented the necessary information from being present in the DWARF sections when the -freorder-blocks-and-partition option is used. I'll come back to this issue in a subsequent blog post. The short term (and most probably long term) solution was to remove the offending option.
But while searching for that root cause, we completely disabled PGO, leaving the optimization level at -O3. I had tested GCC 4.5 and -O3 without PGO a few times on the Try server, with no other problems than a few unimportant rounding errors we decided to ignore by modifying the relevant tests, so I wasn't expecting anything bad.
That was without counting on Murphy's Law, in the form of a permanent Linux x86 reftest regression. That error didn't appear in my previous tests, so it had to have been introduced by some change in the tree. After some quite painful bisecting (I couldn't reproduce the problem with local builds, so I had to resort to the Try server, with each build+test run taking between 1 and 2 hours), I narrowed it down to the first part of bug 641426 triggering a change in how GCC optimizes some code and, as a side effect, changing some floating point operations on x86 to use memory instead of registers or vice versa, introducing rounding discrepancies between different parts of the code.
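To give an idea of the kind of discrepancy involved: the x87 unit computes with 80-bit precision, so whether an intermediate result stays in a register or gets spilled to memory (and thus rounded to 64 bits) can change the final value. A small standalone illustration, unrelated to the actual code involved in that bug (it assumes a GCC able to produce 32-bit x87 code):

```sh
cat > fp-demo.c << 'EOF'
#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from folding everything at compile time */
    volatile double big = 1e16, small = 2.9999;
    double tmp = big + small;  /* may stay in an 80-bit x87 register...  */
    double r   = tmp - big;    /* ...or be rounded to 64 bits in memory  */
    printf("%.4f\n", r);
    return 0;
}
EOF

gcc -m32 -O2 -o fp-reg fp-demo.c                 # intermediate may stay in a register
gcc -m32 -O2 -ffloat-store -o fp-mem fp-demo.c   # force intermediates through memory
./fp-reg   # typically prints 2.9999
./fp-mem   # typically prints 2.0000
```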
Also while searching for that root cause, we backed out the switch to aggressive optimization and went back to -Os instead of -O3. The only remaining change from the switch was thus the GCC version. And Murphy's Law kicked in yet again, in the form of a permanent Linux x86/x86-64 a11y mochitest regression. As it turned out, that regression had already been spotted on the tracemonkey tree during the couple of days it had PGO enabled but wasn't yet using -O3, and it disappeared when the -O3 switch was merged from mozilla-central. At the time, though, we didn't track it down. We disabled the tests to reopen the tree for development, but the issue was still there, just hidden. Now that we're back to aggressive optimization and PGO, we have re-enabled the test and the issue has gone away, which is kind of scary. We definitely need to find the real issue, which might be related to some uninitialized memory.
We also had a couple of new intermittent failures that are thought to be related to the GCC 4.5 switch, but all of them go away if we simply re-run the test on the same build.
What does this all mean?
First, it means that in some cases a newer compiler unveils dormant bugs in our code. And that with the same compiler, different optimization options can lead to different results or breakages.
By extension, this means it is important that we carefully choose our default optimization options, especially when PGO is not used (which is most of the time for non-Mozilla builds). I'm even tempted to say it would be important for us to test these non-PGO defaults, but we can't try all possible compiler versions either.
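To make that a bit more concrete, the knobs in question boil down to a handful of mozconfig lines; the compiler version and flags below are illustrative values, not our official defaults:

```sh
# Illustrative mozconfig for a plain optimized (non-PGO) build.
export CC="gcc-4.5"
export CXX="g++-4.5"

ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3"   # or -Os, the previous default
# PGO is handled separately by the build automation and is simply not
# enabled here.
```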
This also means it is important that Linux distros run our test suites with their builds, especially when they use newer compilers.
A few related thoughts
While handling the transition to this new toolchain, it became clear that the lack of correlation between our code base and our mozconfig files is painful. The best demonstration is the Try server, which is now using GCC 4.5 for all builds by default. But if you push a commit there that doesn't have the necessary libstdc++ compatibility hack, the builds will fail. There are many other cases of changes in our mozconfigs requiring changes in e.g. configure.in, and these are even more reasons to get mozconfigs into our code base.
The various issues we got in the process also made me reflect on our random oranges. I think we lack one important piece of information when we have a test failure: does it reliably happen with a given build? Chances are that most random oranges don't (like the two I mentioned further above), but those that do may point out subtle problems of compiler optimizations breaking some of our assumptions (though so far, most of the time, they just turn into permanent oranges). The self-serve API does help in that regard, allowing us to re-trigger a given test suite on the same build, but I think we should enhance our test harnesses to automatically retry failing tests.
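In the meantime, a crude way to get that information is to wrap a suite invocation so that a failure is immediately retried against the very same build; a sketch, reusing the illustrative make targets from above:

```sh
# Re-run a failing suite once on the same build, to distinguish
# "fails reliably with this binary" from "random orange".
run_twice_on_failure() {
    "$@" && return 0
    echo "First run failed; retrying on the same build..." >&2
    "$@"
}

run_twice_on_failure make -C obj-firefox mochitest-plain
```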
What about GCC 4.6?
I think it's too early to think about GCC 4.6. While it has some improvements over GCC 4.5, it may also bring its own set of surprises. GCC also has a pretty bad history of screwing things up in dot-zero releases, so it would be better to wait for 4.6.1, which I hear is planned for soon. And GCC 4.6 would make things even harder for the Try server and some other branches, considering the C++ conformance problems I mentioned above.
Also, most of the people mentioning GCC 4.6 also mention Link Time Optimization, which is the main nicety it brings. Unfortunately, linking with LTO requires gigabytes of memory, which means several things:
- We need that much memory on our build bots, which I'm not sure they currently have,
- It actually exhausts the 32-bit address space, which means we'd need to cross-compile the 32-bit builds on 64-bit hosts with a 64-bit toolchain. Which, in turn, means changing build bots, and maybe some fun with our build system.
GCC people are working on decreasing the amount of memory required to link, but it's work in progress and won't be workable until GCC 4.7 (or, who knows, even later). We might have switched to clang before that ;-)
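For the curious, experimenting with LTO in a local build would, in principle, come down to something like the mozconfig fragment below. This is only a sketch: the exact flags, and whether the build system propagates them to the link step, are assumptions, and as said above the link step would need a lot of memory (and possibly a 64-bit toolchain for 32-bit builds):

```sh
# Hypothetical mozconfig fragment for experimenting with LTO (GCC 4.6+).
export CC="gcc-4.6"
export CXX="g++-4.6"
ac_add_options --enable-optimize="-O3 -flto"
export LDFLAGS="-flto"   # -flto is needed at link time as well
```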