Firefox is now built with clang LTO on all* platforms
You might have read that Mozilla recently switched Windows builds to clang-cl. More recently, those Windows builds have seen both PGO (Profile-Guided Optimization) and LTO (Link-Time Optimization) enabled.
As of next nightly (as of writing, obviously), all tier-1 platforms are now built with clang with LTO enabled. Yes, this means Linux, Mac and Android arm, aarch64 and x86. Linux builds also have PGO enabled.
Mac and Android builds were already using clang, so the only difference is LTO being enabled, which brought some performance improvements.
The most impressive difference, though, was on Linux, where we're getting more than 5% performance improvements on most Talos tests (up to 18% (!) on some tests) compared to GCC 6.4 with PGO. I must say I wasn't expecting switching from GCC to clang would make such a difference. And that is with clang 6. A quick test with upcoming clang 7 suggests we'd additionally get between 2 and 5% performance improvement from an upgrade, but our static analysis plugin doesn't like it.
This doesn't mean GCC is no longer supported. As a matter of fact, we still have automated jobs using GCC for some static analysis, and we also have jobs ensuring everything still builds with a baseline of GCC 6.x.
You might wonder if we tried LTO with GCC, or tried upgrading to GCC 8.x. As a matter of fact, I did. Enabling LTO turned up linker errors, and upgrading to GCC 7.x broke binary compatibility with older systems and, if I remember correctly, caused some problems with our test suite. GCC 8.1 was barely out when I was looking into this, and we all know to stay away from any new major GCC version until it has had one or two minor updates. Considering the expected future advantages from using clang (cross-language inlining with Rust, consistency between platforms), it seemed a better deal to switch to clang than to try to address those issues.
Update: As there's been some interest on reddit and HN, and I failed to mention it originally, it's worth noting that comparing GCC+PGO vs. clang+LTO or GCC+PGO vs. clang+PGO was a win for clang overall in both cases, although GCC was winning on a few benchmarks. If I remember correctly, clang without PGO/LTO was also winning against GCC without PGO.
Anyways, what led me on this quest was a casual conversation at our last All Hands, where we were discussing possibly turning on LTO on Mac, and how that should roughly just be a matter of flipping a switch.
Famous last words.
At least, that's a somewhat reasonable assumption. But when you have a codebase the size of Firefox, you're in for "interesting" discoveries.
This involved compiler bugs, linker bugs (with a special mention for a bug in ld64 that Apple has apparently fixed in Xcode 9 but hasn't released the source of), build system problems, elfhack issues, crash report problems, clang plugin problems (would you have guessed that __attribute__((annotate("foo"))) can affect the generated machine code?), sccache issues, inline assembly bugs (getting inputs, outputs and clobbers correctly is hard), binutils bugs, and more.
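To illustrate what getting inputs, outputs and clobbers right means, here is a minimal sketch in C using GCC/clang extended inline assembly. The helper name add_u32 is hypothetical and is not taken from the Firefox codebase; the x86 and aarch64 branches are only examples, with a plain C fallback for other targets.

```c
#include <stdint.h>

/* Hypothetical helper (not from Firefox): returns a + b, wrapping on overflow. */
static uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__("addl %1, %0"
            : "+r"(a)   /* output: "+" says a is both read and written */
            : "r"(b)    /* input: b is only read */
            : "cc");    /* clobber: the add modifies the flags register */
    return a;
#elif defined(__GNUC__) && defined(__aarch64__)
    uint32_t result;
    __asm__("add %w0, %w1, %w2"
            : "=r"(result)      /* output: write-only */
            : "r"(a), "r"(b));  /* inputs: both only read */
    return result;
#else
    /* Fallback for compilers or targets without the asm above. */
    return a + b;
#endif
}
```

A typical mistake in the x86 branch would be to declare a as write-only ("=r") even though the instruction also reads it. That kind of error can appear to work until the optimizer makes different register or inlining choices, which is plausibly why more aggressive optimization such as LTO tends to expose it.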
I won't bother you with all the details, but here we are, 3 months later with it all, finally, mostly done. Counting only the bugs assigned to me, there are 77 bugs on bugzilla (so, leaving out anything in other bug trackers, like LLVM's). Some of them relied on work from other people (most notably, Nathan Froyd's work to switch to clang and then non-NDK clang on Android). This spread over about 150 commits on mozilla-central, 20 of which were backouts. Not everything went according to plan, obviously, although some of those backouts were on purpose as a taskcluster trick.
Hopefully, this sticks, and Firefox 64 will ship built with clang with LTO on all tier-1 platforms as well as PGO on some. Downstreams are encouraged to do the same if they can. The build system will soon choose clang by default on all builds, but won't enable PGO/LTO.
As a bonus, as of a few days ago, Linux builds are also finally using Position Independent Executables, which improves Address Space Layout Randomization for the few things that are in the executables instead of some library (most notably, mozglue and the allocator). This was actually necessary for LTO, because clang doesn't build position independent code in executables that are not PIE (but GCC does), and that causes other problems.
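As a rough way to see which mode a given source file was compiled in, GCC and clang predefine the __PIE__ and __PIC__ macros. The sketch below relies only on those macros; the function name pie_mode is made up for illustration. PIE is usually requested with -fPIE at compile time and -pie at link time, though the exact flags the Firefox build system uses are not stated here.

```c
/* Hypothetical helper: reports how this translation unit was compiled. */
static const char *pie_mode(void)
{
#if defined(__PIE__)
    /* Position independent code intended for an executable (-fPIE/-fpie). */
    return "pie";
#elif defined(__PIC__)
    /* Position independent code intended for a shared library (-fPIC/-fpic). */
    return "pic";
#else
    /* Code that assumes a fixed load address. */
    return "non-pic";
#endif
}
```

Printing pie_mode() from a small program built with and without -fPIE shows the difference. Note that many distributions now configure their compilers to build PIE by default, so "pie" may be reported even without the flag.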
Work is not entirely over, though: more inline assembly bugs might remain that are only by sheer luck not causing visible problems, so I'm now working on a systematic analysis of inline assembly blocks with our clang plugin.
2018-09-12 17:10:49+0900
2018-09-12 23:05:58+0900
I know most of the users don’t care but for me GCC is the preferred choice. Therefore sad to hear Firefox using Clang now by default.
2018-09-13 04:37:09+0900
why?
2018-09-13 06:05:50+0900
@jason_s,
OpenMP workloads.
Native compiles that run faster.
And many other small cases where clang is just not there yet.
2018-09-13 13:12:00+0900
@Mark This article claims notable performance improvements on all platforms from switching to Clang. So what do you mean about “native compiles that run faster”?
2018-09-13 13:57:42+0900
And what the hell is a PGO and LTO? You tech nerds do nothing to help anyone understand what is going on.
2018-09-13 15:08:05+0900
@osajdf: Updated with the acronym meanings.
2018-09-13 16:05:35+0900
I am building Firefox with LTO+PGO regularly using GCC 7 and 8 and it works. There are also Talos tests of it made half a year ago at https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=7e5bd52e36fcc1703ced01fe87e831a716677295&framework=1&selectedTimeRange=172800
How do those compare to clang?
2018-09-13 16:25:15+0900
Hmm, my reading is that the link I posted above actually compares the LTO+PGO build to last two-days of official builds. It seems to show no regressions. So does it mean that GCC is still consistently faster if LTO+PGO is enabled?
2018-09-13 18:42:38+0900
@Jan: Can’t compare results from 6 months ago to results from now, see https://elvis314.wordpress.com/2018/09/12/looking-at-firefox-performance-57-vs-63/. Also, not a lot of samples on mozilla-central are with clang PGO+LTO yet (it’s not even been 2 days).
2018-09-13 19:59:31+0900
OK. Would it be possible to re-run the GCC LTO tests and/or have a link to the page comparing clang builds with the previous GCC ones? I saved the comparison with trunk at the time the GCC LTO builds were fresh, so I could check if all the performance improvements seem to be due to LTO or if there are other reasons.
I would say most of the speedups are from LTO. Also note that GCC builds with LTO are smaller than clang's. I collected some data for GNU Cauldron.
https://gcc.gnu.org/wiki/cauldron2018?action=AttachFile&do=view&target=Hubicka_+LTO_IPA+bof.pdf
2018-09-17 06:29:28+0900
A question. If it only comes down to doing a benchmark, why wouldn’t I just use Windows and Edge or Chrome? I choose to use GCC and Firefox and Linux … Benchmarks have little to do with that.
That is why I said I am sad to see this being considered progress. In my opinion this isn’t progress. But OK, we are all different and have different opinions on such things.
2018-12-16 04:33:38+0900
I have finally found the correct way to build Firefox with LTO+PGO and did a comparison of GCC 8 builds with the official build and my own Clang 7 build. It seems GCC wins in the benchmarks I tried, plus the binary is significantly smaller.
http://hubicka.blogspot.com/2018/12/firefox-64-built-with-gcc-and-clang.html
What seems to have been the issue is Firefox's watchdog killing the training run before it streams PGO data to disk.