Author Archive

Faster compilations for everyone?

If you're following this blog, you may be aware of the recent work on a shared compilation cache. It has been deployed with great results on Mozilla's try server for all platforms (except a few build types, like ASAN or valgrind), and is being tested for Linux/Android builds on b2g-inbound (more on that in subsequent posts).

A side effect of the work to make it run on all platforms is that it now works to build Firefox on Windows, although it requires a specific setup. And recently, it also became possible to use it with local storage instead of S3. This means we now have a (basic) ccache for Windows that works to build Firefox.

If you wish to try it, here is what you need to do:

  • Clone the repository from github:

    $ git clone https://github.com/glandium/sccache

  • Add the following to your mozconfig:

    ac_add_options "--with-compiler-wrapper=python2.7 path/to/sccache/sccache.py"
    export _DEPEND_CFLAGS='-deps$(MDDEPDIR)/$(@F).pp'
    mk_add_options "export CC_WRAPPER="
    mk_add_options "export CXX_WRAPPER="
    mk_add_options "export COMPILE_PDB_FLAG="
    mk_add_options "export HOST_PDB_FLAG="
    mk_add_options "export MOZ_DEBUG_FLAGS=-Z7"

    Update: Currently, path/to/sccache/sccache.py needs to be a Windows-style path (as opposed to an msys/cygwin path), with forward slashes (e.g. c:/path/to/sccache/sccache.py rather than /c/path/to/sccache/sccache.py).

  • Then set the SCCACHE_DIR environment variable to some local directory.
  • And build happily.

A few things to note:

  • As of writing, sccache doesn't support cleaning up the storage directory, so it will grow indefinitely (until you clean it up yourself).
  • Because the MSVC preprocessor is not exactly fast, and because sccache doesn't have a direct mode like ccache, it doesn't make as much of a difference as ccache does.
  • It also works on non-Windows platforms, where none of the mozconfig changes above are needed except the --with-compiler-wrapper line.

Play with it, feel free to fork it on github, and improve it. Pull requests are encouraged.

2014-05-08 09:36:24+0900

p.m.o | No Comments »

怒り、失望、ストレス発散 (anger, disappointment, letting off steam)

I started learning Japanese calligraphy a few months ago, with no prior experience with a brush and ink. It is an interesting endeavour. For various reasons, I had to skip class for a few weeks, but after the past ten days, I needed some stress relief on paper.

怒り (anger)
失望 (disappointment)

スッキリしました。(I feel refreshed now.)

2014-04-05 11:21:58+0900

me, p.d.o, p.m.o | 1 Comment »

Linux and Android try builds, now up to twice as fast

(Taras told me to use sensationalist titles to draw more attention, so here we are)

Last week, I brought up the observable build time improvements on Linux try builds with the use of shared cache. I want to revisit those results now that there have been more builds, and to look at the first results of the switch for Android try builds, which are now also using the shared cache.

Here is a comparison between the distribution of build times from last time (about ten days of try pushes, starting from the moment shared cache was enabled) and build times for the past ten days (which start, almost, at the point the previous data set stopped):

As expected, the build times are still improving overall thanks to the cache being fuller. The slowest build times are now slightly lower than the slowest build times we were getting without the shared cache. There is a small "regression" in the number of builds taking between 15 and 20 minutes, but that's likely related to changes in the tree creating more cache misses. To summarize the before/after:

                  shared after 10 days   shared initially   ccache
    Unified
      Average     17:11                  17:11              29:19
      Median      13:03                  13:30              30:10
    Non-unified
      Average     31:00                  30:58              57:08
      Median      22:07                  22:27              60:57

[Note I'm not providing graphs for non-unified builds; they are boringly similar, just with different values, which the average and median figures should give a grasp of]

Android try builds also got faster with shared cache. The situation looks pretty similar to what we observed after the first ten days of Linux try shared cache builds:

[Note I removed two builds without shared cache from those stats, both of which were taking more than an hour for some reason I haven't investigated]

As with Linux builds, the fastest shared cache builds are slower than the fastest ccache builds, and so are the slowest ones, but as we can see above, those slowest builds get faster as the cache fills up. And as I wrote last week, work is under way to make the fastest builds faster.

This is what the average and median look like for Android try builds:

                  shared   ccache
    Unified
      Average     17:14    24:08
      Median      13:52    24:57
    Non-unified
      Average     27:49    43:00
      Median      20:35    47:17

2014-03-04 08:33:38+0900

p.m.o | 1 Comment »

Analyzing shared cache on try

As mentioned in the previous post, shared cache is now effective on try for linux and linux64, opt and debug builds, provided the push has changeset a62bde1d6efe in its history. The unknown in that equation was how long it takes for landings in mozilla-inbound or mozilla-central to propagate to try pushes.

So I took a period of about 8 days and observed, over a sliding 24-hour window, the percentage of pushes containing that changeset. To see whether the dev-tree-management post had an impact, I also looked at a random mozilla-central changeset, 339f0d450d46. This is what it looks like:

So it takes about two and a half days for a mozilla-central changeset to propagate to most try pushes, and it looks like my dev-tree-management post (which was cross-posted on dev-platform) didn't have an impact, although 339f0d450d46 is close enough to the announcement that it could still be benefiting from it. I'll revisit this with future unannounced changes.

The drop that can be seen on February 16 is due to there being fewer pushes overall over the week-end, which somehow made pushes without changeset a62bde1d6efe more prominent. Maybe contributors pushing on the week-end are more likely to push old trees.
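
(For reference, the sliding-window percentage above boils down to something like the following sketch, assuming the pushlog data has already been reduced to a list of (push time, contains-the-changeset) pairs; the extraction itself is not shown.)

    from datetime import timedelta

    def sliding_percentage(pushes, window=timedelta(hours=24)):
        """pushes: list of (datetime, bool) pairs sorted by push time; the
        bool tells whether the push contains the changeset of interest."""
        result = []
        for time, _ in pushes:
            in_window = [flag for t, flag in pushes if time - window < t <= time]
            result.append((time, 100.0 * sum(in_window) / len(in_window)))
        return result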

Now, let's see what effect shared cache had on try build times. I took about the last two weeks of successful try build logs for linux and linux64, opt and debug, and analyzed them to extract the following data:

  • Where they were built (in-house vs AWS),
  • Whether they were built with unified sources or not (this significantly changes build times),
  • Whether they used the shared cache or not,
  • Whether they are PGO builds or not,
  • How long the "compile" step took (which, really, is "make -f client.mk", so this includes more than compilation, like configure and copying many files).

There sadly weren't enough PGO builds to plot anything about them, so I just excluded them. Then, since the shared cache is only enabled on AWS builds, and since AWS and in-house build times are so different, I excluded in-house builds. Further looking at the build times for linux opt, linux64 opt, linux debug and linux64 debug, they all looked similar enough that they didn't need to be split into different buckets.

Update: I should mention that I also excluded my own try pushes, because I tended to do multiple rebuilds on them, with all of them getting near-100% cache hit rates and the best build times.

Sorting all that data by build time, the following are graphs showing how many builds took less than a given build time.
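
(A minimal sketch of how such a cumulative curve can be plotted, assuming the compile step durations have already been extracted from the logs and matplotlib is available; the values below are placeholders, not actual measurements.)

    import matplotlib.pyplot as plt

    # Placeholder durations in seconds, standing in for the extracted log data.
    ccache_times = [720, 810, 1500, 1740, 1830]
    shared_cache_times = [700, 780, 800, 850, 1450]

    def plot_cumulative(durations, label):
        """Plot how many builds took less than a given build time."""
        times = sorted(durations)             # fastest build first
        counts = range(1, len(times) + 1)     # nth fastest build
        plt.plot([t / 60.0 for t in times], counts, label=label)

    plot_cumulative(ccache_times, "ccache")
    plot_cumulative(shared_cache_times, "shared cache")
    plt.xlabel("build time (minutes)")
    plt.ylabel("builds taking less than this time")
    plt.legend(loc="lower right")
    plt.show()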

For unified sources builds (870 builds with ccache, 1111 builds with shared cache):

For non-unified sources builds (302 builds with ccache, 332 builds with shared cache):

The first thing to note is that this does include the very first try pushes with shared cache, which probably skews the slowest builds. It should also be noted that linux debug builds are (still) currently non-unified by default for some reason.

With that being said, about 3.25% of unified builds end up slower with shared cache than with ccache, and 5.2% of non-unified builds. Most of that is at the fastest end, where builds with shared cache can take twice as long as they would with ccache. I'm currently working on changes that should make that difference slimmer (more on that in a subsequent post). Anyways, that still leaves more than 90% of builds faster with shared cache, and makes for a big improvement in build times on average:

                  shared   ccache
    Unified
      Average     17:11    29:19
      Median      13:30    30:10
    Non-unified
      Average     30:58    57:08
      Median      22:27    60:57

Interestingly, a few of the fastest non-unified builds with shared cache were significantly faster than the others, and it looks like what they have in common is that they were built in the US-East-1 region instead of the US-West-2 region. I haven't looked into more detail as to why those particular builds were much faster.

2014-02-25 03:57:20+0900

p.m.o | 1 Comment »

Testing shared cache on try

After some success with the shared cache experiment (Read about it, and some more), the next step was to get it to work on the Mozilla continuous integration infrastructure, and it turned out to reveal a couple issues.

The first issue is that the DNS server for the AWS build slaves we use is not the AWS DNS, but our in-house DNS. Which has two consequences:

  • whatever geolocation S3 does at the DNS level may end up giving an S3 endpoint IP that is not optimal for the AWS region we're in, because it correlates with the location of our in-house DNS
  • the roundtrip to the in-house DNS server was around 80ms, and because every compilation is an independent process, each one does a DNS request, so each one takes that 80ms hit. Note that, while suboptimal, doing a DNS request for each compilation also means getting different S3 endpoints, because of both DNS round robin and the geolocation S3 uses, which gives very different IPs every so often.

The consequence of this is that build times were very unstable, ranging from 11 minutes, like during my experiments, up to 45 minutes for a 99% cache hit build! After importing a DNS resolver into the shared cache script and making it use the AWS DNS, build times became much more stable, between 11 and 12 minutes. (We actually do need the in-house DNS for normal operations on the build slaves, so it's not possible to just switch /etc/resolv.conf.)
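
(For the curious, the resolver part is conceptually as simple as the sketch below, assuming the third-party dnspython package; the resolver address is a placeholder, and the real script's details differ.)

    import dns.resolver  # third-party dnspython package, assumed available

    AWS_DNS = "169.254.169.253"  # placeholder for the VPC-provided resolver address

    def resolve_s3_endpoint(hostname):
        """Resolve the S3 endpoint against the AWS resolver, bypassing
        /etc/resolv.conf (which points at the in-house DNS)."""
        resolver = dns.resolver.Resolver(configure=False)
        resolver.nameservers = [AWS_DNS]
        answer = resolver.query(hostname, "A")
        return answer[0].address

    # The returned IP is then used for the actual S3 connection, with the
    # original hostname kept for the Host header.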

The second issue is that the US Standard region for S3 can have quite high latency depending on the region you're connecting to it from. Our build slaves are located in Oregon and Northern Virginia, and while the slaves in Northern Virginia could reach S3 US Standard within 3ms, those in Oregon could only reach it within 90ms. Those numbers were unfortunately obtained with the in-house DNS, so geolocation may have had an impact on them, but after switching DNS, the build times on Oregon slaves were still way higher than on Northern Virginia slaves (~11 minutes in Northern Virginia vs. ~21 minutes in Oregon). Which led us to use an S3 bucket per region.
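
(A sketch of what per-region bucket selection can look like, assuming one bucket per region, the standard EC2 instance metadata endpoint, and a hypothetical bucket naming scheme.)

    import urllib2  # Python 2, matching the sccache script of the time

    METADATA_AZ = "http://169.254.169.254/latest/meta-data/placement/availability-zone"

    def local_bucket(prefix="sccache"):  # hypothetical bucket naming scheme
        """Pick the bucket for the region this instance runs in."""
        az = urllib2.urlopen(METADATA_AZ).read()
        region = az.strip()[:-1]  # e.g. "us-west-2a" -> "us-west-2"
        return "%s-%s" % (prefix, region)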

With those issues dealt with, we're now ready for more widespread testing, and as such I've turned the shared cache on for Linux opt, Linux debug, Linux64 opt and Linux64 debug builds, on try only, and only if the push contains the relevant setup, which landed in changeset a62bde1d6efe.
See my post on dev-tree-management for a few more details, notably if you hit bugs.

Please note this is only the beginning. More platforms will use the cache soon, including some that aren't currently using ccache. And I got some timing numbers during the initial tests on try that hint at the most immediate performance issues with the script that need addressing. So you can expect builds to get faster and faster as the cache populates, and as the script is improved with feedback from past experiments and current deployment (I'll be collecting data from your try pushes). Also relatedly, I'm working on build system improvements that should make the 'libs' step much faster, cutting down the time spent on that step.

2014-02-13 10:18:45+0900

p.m.o | No Comments »

Efficiency of incremental builds on inbound

Contrary to try, most other branches, like inbound, don't start builds from an empty tree. They start from the result of the previous build for the same branch on the same slave. But sometimes that doesn't work well, so we need to clobber (which means we remove the old build tree and start from scratch again). When that happens, we usually trigger a clobber on all subsequent builds for the branch. Or sometimes we just declare a slave too old and do a periodic clobber. Or sometimes a slave just doesn't have a previous build tree.

As I mentioned in the previous post about ccache efficiency, the fact that so many builds run on different slaves may hinder those incremental builds. Let's get numbers.

Taking the same sample of builds as before (spanning 10 days after the holidays), I gathered some numbers for linux64 opt and macosx64 opt builds, based on the number of files ccache built: when starting from a previous build, ccache is not invoked as much (or so we'd like), and that shows up in its stats.
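
(The gathering itself boils down to something like the following sketch, which assumes the "ccache -s" output format of ccache 3.x and a known total number of source files; it also assumes that every file make decided to rebuild shows up as either a ccache hit or a ccache miss, which is the interpretation used here.)

    def parse_ccache_stats(output):
        """Parse "ccache -s" output (ccache 3.x format) into a dict."""
        stats = {}
        for line in output.splitlines():
            parts = line.rsplit(None, 1)
            if len(parts) == 2 and parts[1].isdigit():
                stats[parts[0].strip()] = int(parts[1])
        return stats

    def fraction_rebuilt(stats, total_sources):
        """Fraction of the tree that make decided to rebuild, counting every
        compiler invocation that went through ccache (hit or miss)."""
        built = (stats.get("cache hit (direct)", 0)
                 + stats.get("cache hit (preprocessed)", 0)
                 + stats.get("cache miss", 0))
        return float(built) / total_sources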

The sample is 408 pushes, including a total of 1454 changesets. Of those pushes:

  • 344 had a linux64 opt build, 2 of which were retriggered because of a failure, for a total of 346 builds
  • 377 had a macosx64 opt build, 12 of which were retriggered because of a failure, and 6 more were retriggered for some other reason, for a total of 397 builds. This doesn't line up because 2 pushes had their build retriggered twice.

It's interesting to see how many builds we actually skip, most probably because of coalescing. I'd argue this is too many, but I haven't looked at exactly how many of those are legitimate "no need to build this because it is android only" or similar patterns.

Armed with an AWS linux builder, I replayed those 408 pushes in an optimal setup: no clobbers besides those requested by the build system itself, all pushes built on the same machine, in the order they landed. I didn't, however, skip builds like the actual slaves do, but that doesn't really matter since they are not building consecutive pushes anyway. Note configure was rerun for every push because of how my builder handles pulling from mercurial. We don't do that on build slaves, but I'd argue we should: it would avoid plenty of build-system-level clobbers, and many "fun" build failures.

Of those 408 pushes, 6 requested a clobber at the build system level. But the numbers are very different on build slaves:

  • On linux, out of 346 builds:
    • 19 had a clobber by the build system
    • 8 had a forced clobber (when using the clobberer)
    • 1 had a periodic clobber
    • 162 (!) had no previous build tree at all for whatever reason (purged previously, or new slave)
    • for a total of 190 builds ending up starting with no previous build tree (54.9%)
  • On mac, out of 397 builds:
    • 23 had a clobber by the build system
    • 31 had a forced clobber
    • 34 had no previous build tree at all
    • for a total of 88 builds with no previous build tree (22.2%)

(Note the difference in numbers of build system clobbers and forced clobbers is due to them being masked by the lack of previous build tree on linux)

Like for ccache efficiency, the use of a bigger build slave pool for linux builds is hurting and making them start from scratch more often than not, which doesn't help with the build turnaround times.

But even on the remaining non-clobber builds, if the source tree is significantly different, we may end up rebuilding as much as if we had clobbered in the first place. Sometimes it only takes a change to one file to do that (for example, add an AC_DEFINE in configure.in, and it will rebuild almost everything), but sometimes it can be an accumulation of changes. This is where the ccache stats get useful again.

A few preliminary observations:

  • There are always at least around 1.5% of files rebuilt on ideal linux builds (which needs investigating), but a lot of the builds rebuilt around 5% because of bug 959519
  • The number of source files can vary across pushes, but I used a more or less appropriate constant value for all builds, so some near-100% values may actually be 100%
  • Mac builds surprisingly sometimes build the same files more than once. I filed bug 967976

The first thing to note on the above graph is that about 42% of mac builds and about 75% of linux builds are either clobbers or, as I like to call them, near-clobbers (incremental builds that just rebuild everything). Near-clobbers thus account for as many as 20% of overall builds on both platforms, or about 50% (!) of non-clobber builds on linux and about 25% of non-clobber builds on mac.

I can't stress enough how the build slave pool sizes are hurting our turnaround times.

It can be noted that there are a few plateaus around 82% and 69% of files built, which are likely due to central headers being changed and triggering that many files to be rebuilt. This is the kind of thing that efforts like using include-what-you-use help with, and we've made progress on that in the past months.

Overall, with our current setup, we are in a vicious circle. Adding more build types (like recently ASAN, Root analysis, Valgrind, etc.), or landing more stuff requires more slaves. More slaves makes builds slower for reasons given here and in previous posts. Slower builds require more slaves to keep up with landings. Rinse, repeat. We need to break the feedback loop.

(Fun fact: while I did nothing more than mercurial updates and building the tree to gather the ideal linux numbers (so no make package, no make check, etc.), it only took about a day. For 10 days' worth of inbound pushes. With one machine.)

2014-02-05 03:57:49+0900

p.m.o | 2 Comments »

Ccache efficiency on Mozilla builders

In the past two blog posts, I've detailed some results I got experimenting with a shared compilation cache. Today, I will be exploring in some more detail why ccache is not helping us as much as it should.

TL;DR conclusion: we need to be smarter about which build slaves build what.

Preliminary note: the stats below were gathered over a period of about 10 days after the holidays, on several hundred successful builds (failed builds were ignored; this skews things, but we don't have ccache stats for those).

Try builds

Try is a special repository. Developers push very different changes on it, based on more or less random points of mozilla-central history. But they'd also come back with different iterations of a patch set, and effectively rebuild mostly the same thing. One could expect cache hit rates to be rather low on those builds, and as we've seen in the past posts, they are.

But while the previous posts were focusing on ccache vs. shared cache, let's see how it goes for different platforms using ccache, namely linux64 and mac, for one type of build each:

Here comes the surprise. Mac builds are getting a decent cache hit rate on try. Which is kind of surprising considering the usage pattern, but that's not the most interesting part. Let's focus on why mac slaves have better hit rates than linux slaves.

And here's the main difference: there are way fewer mac slaves than linux slaves. The reason is that we do a lot of different build types on the linux slaves: linux 32 bits, 64 bits, android, ASAN, static rooting hazard analysis, valgrind, etc. We have 663 linux slaves and 23 mac slaves for try (arguably, a lot of the linux slaves are not running permanently, but you get the point), and they are all part of the same pool.

So let's look how those try builds I've been getting stats for were spread across slaves:

This is not the best graph in the world, but it shows how many slaves did x builds or more. So 218 linux slaves did one build or more, 109 did two builds or more, etc. And here comes the difference: half of the linux slaves did only one linux64 opt build, while all the mac slaves involved did at least 10 mac opt builds!

Overall, this is what it looks like:

  • 218 slaves for 587 builds on linux64 try (average: 2.7 builds per slave)
  • 23 slaves for 563 builds on mac try (average: 24.5 builds per slave)

Let's now compare linux build cache hit rates for slaves with 5 builds or more, and 10 builds or more:

While the hit rates are better when looking at the slaves with more linux64 opt builds, they don't come close to the mac hit rates. But that has to do with the fact that I merely removed results from slaves that only did a few builds. That didn't change how the builds were spread amongst slaves, nor, consequently, how related those builds were: with fewer slaves to build on, slaves are more likely to build sources that look alike.

Interestingly, we can get a sense of how much builds done by a given slave are related by looking at direct mode cache hits.

The direct mode is a feature introduced in ccache 3 that avoids preprocessor calls by looking directly at source files and their dependencies. When you have an empty cache, ccache will use the preprocessor as usual, but it will also store information about all the files that were used to preprocess the given source. That information, as well as the hash of the preprocessed source, is stored with a key corresponding, essentially, to a hash of the source file, unpreprocessed. So the next time the same source file is compiled, ccache will look at that dependency information (manifest), and check if all the dependent files are unchanged.

If they are, then it knows the hash of the preprocessed source without running the preprocessor, and can thus get the corresponding object file. If they aren't, then ccache runs the preprocessor, and does a lookup based on the preprocessed source. So the more direct mode cache hits there are compared to overall cache hits, the more slaves tended to build similar trees.
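
(To illustrate, here is a conceptual sketch of that lookup, not ccache's actual code: the manifest maps a hash of the unpreprocessed source to its recorded dependencies and the corresponding object key.)

    import hashlib

    def file_hash(path):
        with open(path, "rb") as f:
            return hashlib.sha1(f.read()).hexdigest()

    def direct_mode_lookup(source, manifests, cache):
        """Return a cached object file without running the preprocessor, or
        None to signal that the (slower) preprocessed-mode lookup is needed."""
        manifest = manifests.get(file_hash(source))
        if manifest is None:
            return None  # source never seen before
        for dep, recorded_hash in manifest["deps"].items():
            if file_hash(dep) != recorded_hash:
                return None  # a dependent file changed
        return cache.get(manifest["object_key"])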

And again, looking at linux slaves with 5 or more builds, and 10 or more builds, shows the general trend that the more related builds a slave does, the more efficient the cache is (News at 11).

The problem is that we don't let them be efficient with the current pooling of slaves. Shared caching would conveniently wallpaper around that scheduling inefficiency. But the latency due to network access for the shared cache makes it necessary, for further build times improvements, to still have a local cache, which means we should still address that inefficiency.

Inbound builds

Inbound is, nowadays, the branch where most things happen. It is the most active landing branch, which makes it the place where most of future Firefox code lands first. Continuous integration of that branch relies on a different pool of build slaves than those used for try, but it uses the same pool of slaves as other project branches such as mozilla-central, b2g-inbound, fx-team, aurora, etc. or disposable branches. There are 573 linux slaves (like for try, not necessarily all running) and 63 mac slaves for all those branches.

The first thing to realize here is that between 4 and 5% of those builds have absolutely no cache hits. I haven't researched why that is. Maybe we're starting with an empty cache on some slaves. Or maybe we recently landed something that invalidates the cache completely (build flag changes would tend to do that).

The second thing is that cache hit rate on inbound is lower than it is on try. Direct mode cache hit rates, below, show, however, a tendency for better similarity between builds than on try. Which is pretty much expected, considering inbound only takes incremental changes, compared to try, which takes random patch sets based on more or less randomly old mozilla-central changesets.

But here's the deal: builds are even more spread across slaves than on try.

There are also fewer builds than on try overall, but proportionally more slaves involved (repeating the try numbers for easier comparison):

  • 218 slaves for 587 builds on linux64 try (average: 2.7 builds per slave)
  • 164 slaves for 279 builds on linux64 inbound (average: 1.7 builds per slave)
  • 23 slaves for 563 builds on mac try (average: 24.5 builds per slave)
  • 50 slaves for 249 builds on mac inbound (average: 5 builds per slave)

Contrary to try, where all builds start from scratch (clobber builds), builds for inbound may start from a previous build state from an older changeset. We sometimes force clobber builds on inbound, but the expectation is that most builds should not be clobber builds. The fact that so few builds run on the same slave over a period of 10 days undermines that, and likely means we mostly do near-clobber builds all the time. But this will be the subject of the next post. Stay tuned.

Note: CCACHE_BASEDIR makes things a bit more complicated, since the same slaves are used for various branches and CCACHE_BASEDIR might help get better hit rates across branches, but since inbound is the place where most things land first, it shouldn't influence the above analysis too much.

There is a concern, though, that the number of different unrelated branches and build types hitting the same slave might help evict cache entries, because the cache has a finite size. There are around 200k files in ccache on slaves, and a clobber build fills about 8k, so it only takes about 25 completely unrelated builds (think different build flags, etc.) to throw an older build's cache away. I haven't analyzed this part of the problem, but it surely influences the cache hit rate in the wrong direction.

Anyways, for all these reasons, and again, while shared cache will wallpaper over it, we need to address the build scheduling inefficiencies somehow.

2014-01-31 10:56:39+0900

p.m.o | 1 Comment »

Shared compilation cache experiment, part 2

I spent some more time this week on the shared compilation cache experiment, in order to get it in a shape we can actually put in production.

As I wrote in the comments to the previous post, the original prototype worked similarly to ccache with CCACHE_NODIRECT and CCACHE_CPP2. Which means it didn't support ccache's direct mode, and didn't avoid a second preprocessor invocation on cache misses. While I left the former for (much) later improvements, I implemented the latter, thinking it would improve build times. And it did, but only marginally: 36 seconds on a ~31 minute build with 100% cache misses (and no caching at all, more on that below). I was kind of hoping for more (on the other hand, with unified sources, we now have less preprocessing and more compilation...).

Other than preprocessing, one of the operations every invocation of the cache script does for a compilation is to hash various data together (including the preprocessed source) to get a unique id for a given (preprocessed) source, compiler and command line combination. I originally used MD4, like ccache, as the hash algorithm. While collisions are unlikely either way, I figured the risk would be even lower with SHA1, so I tried that. And it didn't change the build times much: a 6 second build time regression on a ~31 minute build with 100% cache misses.
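
(In Python terms, the key computation amounts to something like the sketch below; the exact set of inputs hashed by the real script may differ, and MD4 support in hashlib depends on the underlying OpenSSL.)

    import hashlib

    def cache_key(preprocessed_source, compiler_id, command_line, algorithm="sha1"):
        """Unique id for a (preprocessed) source, compiler and command line
        combination. preprocessed_source and compiler_id are bytes; compiler_id
        stands for whatever identifies the compiler (version output, binary hash...)."""
        h = hashlib.new(algorithm)  # "md4" only works if OpenSSL provides it
        h.update(compiler_id)
        h.update("\0".join(command_line).encode("utf-8"))
        h.update(preprocessed_source)
        return h.hexdigest()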

As emptying the cache on S3 is slow, I tested the above changes with a modified script that still checks the cache for existing results, but doesn't upload anything new to the cache. The interesting thing to note is that this got me faster build times: down to 31:15 from 34:46. So there is some overhead in pushing data to S3, even though the script uploads in the background (that is, the script compiles, then forks another process to do the actual upload, while the main script returns so that make can spawn new builds). Fortunately, cache hit rates are normally high, so it shouldn't be a big concern.

Another thing that was missing was compression, which made S3 transfers and storage huge. While the necessary bandwidth went down once compression was implemented, build times didn't move. The time spent on compression probably compensates for the saved bandwidth.
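
(Putting the last two points together, the upload path is roughly the following sketch: UNIX-only because of the fork, with a hypothetical S3 wrapper standing in for the actual upload code.)

    import os
    import zlib

    def store_in_background(s3_bucket, key, object_path):
        """Compress and upload the object file without blocking the build:
        the parent returns immediately so make can keep spawning compiles."""
        if os.fork() != 0:
            return  # parent: back to the build
        try:
            with open(object_path, "rb") as f:
                data = zlib.compress(f.read())
            s3_bucket.upload(key, data)  # hypothetical S3 wrapper method
        finally:
            os._exit(0)  # child: exit without running the parent's cleanups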

To summarize, the following are the build times I got, on the same changeset, on the same host, with different setups, from fastest to slowest:

  • 99.9% cache hit, preprocessor run once, md4: 10:57
  • 99.9% cache hit, preprocessor run once, md4, no compression: 10:59
  • build without wrapping with cache script: 27:05
  • no actual caching, preprocessor run once, md4: 30:39 (average of 5 builds, low variance)
  • no actual caching, preprocessor run once, sha1: 30:45 (average of 5 builds, low variance)
  • no actual caching, preprocessor run twice, md4: 31:15 (average of 5 builds, low variance)
  • 100% cache miss with caching, preprocessor run twice, md4: 34:46
  • 100% cache miss with caching, preprocessor run twice, md4, no compression: 34:41

For reference, the following are build times on the same host with the same changeset, with ccache:

  • 99.9% cache hit: 5:59
  • 100% cache miss: 28:35

This means the shared cache script has more overhead than ccache does (also, that SSDs with ccache do wonders with high cache hit rates; but, disclaimer, both ccache builds were run one after the other, so there may not have been much I/O on the 99.9% cache hit build). On the other hand, a 99.9% hit rate is rarely attained with ccache, and a 100% cache miss is rarely seen with the shared cache. Overall, I'd expect average build times to be better with the shared cache, even with its current overhead, than they are with ccache.

Cache stats redux

The previous post had ccache stats which didn't look very good, and that could have been related to both the recent switch to AWS spot instances and the holiday break. So I re-ran builds with the shared cache on the same setup as before, replaying the past 10 days or so of try builds after the holiday break, and compared again with what happened on try.

The resulting stats account for 587 linux64 opt builds on try, 356 of which ran on AWS slaves, vs. 231 on non-AWS slaves (so, proportionally, many more builds ran on AWS compared to last time).

(Note this time I added a line combining both AWS and non-AWS ccache stats)

The first observation to make is that the line for shared cache looks identical. Which is not surprising, but comforting. The next observation is that ccache hit rates got worse on non-AWS slaves, and got slightly better on AWS slaves above 50% hit rate, but worse below. This still places ccache hit rates very far from what can be achieved with a shared cache.

The comparison between build times and hit rates, on the other hand, looks very similar to last time on both ends.

One interesting phenomenon is the three spikes of spread-out build times. Considering the previous graphs, one of the reasons for the spikes is that there are many builds with about the same hit rate (which in itself is interesting), but the strange thing is how different the build times can be at those rates. The origin of this might be the use of EBS, which may not have the same performance on all AWS instances. The builders for the shared cache, on the other hand, were using ephemeral SSD storage for the build.

While the graphs look similar, let's see how average build times evolved:

  • on custom builders with shared cache: 14:30 (slightly up from 14:20).
  • on try non-AWS build slaves: 16:49 (up from 15:27).
  • on try AWS build slaves: 32:21 (up from 31:35).

This matches the observation from the first graph: cache hits regressed on try build slaves, but stayed the same on custom builders with shared cache. And with the now different usage split between AWS and non-AWS, the overall build time average on try went up significantly: from 20:03 to 26:15. This might mean we should build more on non-AWS slaves, but we don't have the capacity (which is why we're using AWS in the first place). But it means AWS slave builds are currently slower than non-AWS ones, and that hurts. And that we need to address that.

(Note those figures only include build time, not any of the preparation steps (which can be long for different reasons), or any of the post-build steps (make package, make check, etc.))

One figure that wasn't present in the previous post, though, and which puts those averages in perspective, is the standard deviation. This is what it looks like:

  • on custom builders with shared cache: 5:12.
  • on try non-AWS build slaves: 4:41.
  • on try AWS build slaves: 8:26.

Again, the non-AWS build slaves do better here, but shared cache may help us with AWS build slaves. Testing is currently under way to see how shared cache performs on those AWS slaves. Stay tuned.

2014-01-17 13:24:15+0900

p.m.o | 3 Comments »

Shared compilation cache experiment

One known way to make compilation faster is to use ccache. Mozilla release engineering builds use it. In many cases, though, it's not very helpful on developer local builds. As usual, your mileage may vary.

Anyways, one of the sad realizations on release engineering builds is that the ccache hit rate is awfully low for most Linux builds. Much lower than for Mac builds. According to data I gathered a couple months ago on mozilla-inbound, only about a quarter of the Linux builds have a ccache hit rate greater than 50% while more than half the Mac builds have such a hit rate.

A plausible hypothesis for most of this problem is that, with the number of build slaves being greater on Linux, a build is less likely to occur on a slave that has a recent build in its cache. And while better, the Mac cache hit rates were not really great either. That's due to the fact that consecutive pushes, which share more than 99% of their code, are usually not built on the same slave.

With this in mind, at Taras's request, I started experimenting, before the holiday break, with sharing the ccache contents. Since a lot of our builds are running on Amazon Web Services (AWS), it made sense to run the experiment with S3.

After setting up some AWS instances as custom builders (more on this in a subsequent post) with specs similar to what we use for build slaves, I took past try pushes and replayed them on my builder instances, with a crude, proof-of-concept implementation of ccache-like compilation caching on S3. Both the build times and cache hit rate looked very promising. Unfortunately, I didn't get the corresponding try build stats at the time, and it turns out the logs are now gone from the FTP server, so I had to rerun the experiment yesterday, against what was available, which is the try logs from the past two weeks.

So, I ran 629 new linux64 opt builds using between 30 and 60 builders. Which ended up being too much, because the corresponding try pushes didn't all trigger linux64 opt builds. Only 311 of them did. I didn't start this run with a fresh compilation cache, but then, neither do try builders, so it's fair game. Of my 629 builds, 50 failed. Most of those failures were due to problems in the corresponding try pushes. But a few were problems with S3 that I didn't handle in the PoC (sometimes downloading from S3 fails for some reason, and that would break the build instead of falling back to compiling locally), or with something fishy happening with the way I set things up.
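
(The missing fallback is conceptually simple, as in the sketch below, which assumes cache objects are fetched with plain HTTP GETs; the actual script's S3 handling differs. The point is that any download error should be treated as a cache miss rather than a build failure.)

    import urllib2  # Python 2, matching the era of the script

    def get_cached_object(url):
        """Return the cached object data, or None so the caller falls back
        to compiling locally instead of breaking the build."""
        try:
            return urllib2.urlopen(url, timeout=10).read()
        except Exception:
            # Treat network and S3 errors as a plain cache miss.
            return None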

Of the 311 builds on try, 23 failed. Of those 288 successful builds, 8 lack ccache stats, because in some cases (like a failure during "make check") the ccache stats are not printed. Interestingly, only 81 of the successful builds ran on AWS, while 207 ran on Mozilla-owned machines. This unfortunately makes build time comparisons harder.

With that being said, here is how cache hit rates compare between non-AWS build slaves using ccache, AWS build slaves using ccache and my AWS builders using shared cache:

The first thing to note here is that this doesn't quite match my observations from a few months ago on mozilla-inbound. But that could very well be related to the fact that try and mozilla-inbound pushes have different patterns.

The second thing to note is how few builds have more than 50% hit rate on AWS build slaves. A possible explanation is that AWS instances are started with a prefilled but old ccache (because looking at the complete stats shows the ccache storage is almost full), and that a lot of those AWS slaves are new (we recently switched to using spot instances). It would be worth checking the stats again after a week of try builds.

While better, non-AWS slaves are still far from efficient. But the crude shared cache PoC shows very good hit rates. In fact, it turns out most if not all builds with less than 50% hit rate are PGO or non-unified builds. As most builds are neither, the cache hit rate for the first few of those is low.

This shows another advantage of the shared cache: a new slave doesn't have to do slow builds before doing faster builds. It gets the same cache hit rate as slaves that have been running for longer. Which, on AWS, means we could actually shut down slaves during low activity periods, without worrying about losing the cache data on ephemeral storage.

With such good hit rates, we can expect good build times. Sadly, the low number of high ccache hit rate builds on AWS slaves makes the comparison hard. Again, coming back with new stats in a week or two should make for better numbers to compare against.

(Note that I removed, from this graph, non-unified and PGO builds, which have very different build times)

At first glance, it would seem builds with the shared cache are slower, but there are a number of factors to take into account:

  • The non-AWS build slaves are generally faster than the AWS slaves, which is why the builds with higher hit rates are generally faster with ccache.
  • The AWS build slaves have pathetic build times.
  • As the previous graph showed, the hit rates are very good with the shared cache, which places most of those builds on the right end of this graph.

This is reflected in average build times: with shared cache, it is 14:20, while it is 15:27 with ccache on non-AWS slaves. And the average build time for AWS slaves with ccache is... 31:35. Overall, the average build time on try, with AWS and non-AWS build slaves, is 20:03. So on average, shared cache is a win over any setup we're currently using.

Now, I need to mention that when I say the shared cache implementation I used is crude, I do mean it. For instance, it doesn't re-emit warnings like ccache does. But more importantly, it's not compressing anything, which makes its bandwidth use very high, likely making things slower than they could be.

I'll follow up with hopefully better stats in the coming weeks. I may gather stats for inbound as well. I'll also likely test the same approach with Windows builds some time soon.

2014-01-08 00:36:22+0900

p.m.o | 5 Comments »

Don’t trust python’s os.execv

Python is nice and all, but its low-level functions have real disruptive discrepancies between platforms.

Case in point:

import os
os.execvp("sh", ["sh", "-c", "exit 1"])

As a UNIXy person, I'd expect running the above script to return an error code of 1. And I would be perfectly right... on UNIX systems.

On Windows, it returns 0.

You'd think such a difference in behavior would be documented? It's not.

Thank you python.
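
(For the record, a workaround sketch that behaves the same on both platforms: run the command as a child process and propagate its exit status explicitly.)

    import subprocess
    import sys

    # Unlike os.execvp on Windows, this exits with the child's status (1 here)
    # on both UNIX and Windows.
    sys.exit(subprocess.call(["sh", "-c", "exit 1"]))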

2013-11-23 01:24:26+0900

p.d.o, p.m.o | 8 Comments »