Efficiency of incremental builds on inbound
Unlike try, most other branches, like inbound, don’t start builds from an empty tree. They start from the result of the previous build for the same branch on the same slave. Sometimes that doesn’t work well, and we need to clobber, which means removing the old build tree and starting from scratch. When that happens, we usually trigger a clobber on all subsequent builds for the branch. Sometimes we also declare a slave too old and do a periodic clobber, and sometimes a slave simply doesn’t have a previous build tree at all.
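Concretely, a clobber boils down to removing the object directory before building. Here is a minimal sketch, assuming a hypothetical objdir name and client.mk as the build entry point (illustrative only, not the actual buildbot/clobberer code):

```python
import os
import shutil
import subprocess

OBJDIR = "obj-firefox"  # hypothetical object directory name

def build(clobber_requested):
    # A clobber just removes the previous build tree, so the next build
    # starts from scratch; otherwise the build reuses the object files
    # already present in OBJDIR.
    if clobber_requested and os.path.isdir(OBJDIR):
        shutil.rmtree(OBJDIR)
    subprocess.check_call(["make", "-f", "client.mk", "build"])
```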
As I mentioned in the previous post about ccache efficiency, the fact that so many builds run on different slaves may hinder those incremental builds. Let’s get numbers.
Taking the same sample of builds as before (spanning 10 days after the holidays), I gathered some numbers for linux64 opt and macosx64 opt builds, based on the number of files ccache compiled: when starting from a previous build, ccache is not invoked as much (or so we’d like), and that shows up in its stats.
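The per-build numbers come from ccache’s statistics counters, zeroed before each build and read back afterwards. A minimal sketch of that bookkeeping, assuming ccache is in PATH (the parsing is deliberately generic, since the exact counter labels vary across ccache versions):

```python
import subprocess

def ccache_stats():
    """Parse `ccache -s` output into a {label: value} dict, keeping numeric counters only."""
    stats = {}
    for line in subprocess.check_output(["ccache", "-s"]).decode().splitlines():
        label, _, value = line.rpartition(" ")
        if value.isdigit():
            stats[label.strip()] = int(value)
    return stats

# Zero the counters, run the build, then see how many files ccache actually
# compiled (cache misses) versus served from its cache (hits).
subprocess.check_call(["ccache", "-z"])
subprocess.check_call(["make", "-f", "client.mk", "build"])
print(ccache_stats())
```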
The sample is 408 pushes, including a total of 1454 changesets. Of those pushes:
- 344 had a linux64 opt build, 2 of which were retriggered because of a failure, for a total of 346 builds
- 377 had a macosx64 opt build, 12 of which were retriggered because of a failure, and 6 more were retriggered for some other reason, for a total of 397 builds. The totals don’t quite add up because 2 pushes had their build retriggered twice.
It’s interesting to see how many builds we actually skip, most probably because of coalescing. I’d argue it’s too many, but I haven’t looked at exactly how many of those are legitimate “no need to build this because it is android only” cases or similar patterns.
Armed with an AWS linux builder, I replayed those 408 pushes in an optimal setup: no clobbers besides those requested by the build system itself, all pushes built on the same machine, in the order they landed. I didn’t skip builds like the actual slaves do, but that doesn’t really matter since they aren’t building consecutive pushes anyway. Note that configure was rerun for every push because of how my builder handles pulling from mercurial. We don’t do that on build slaves, but I’d argue we should: it would avoid plenty of build-system-level clobbers, and many “fun” build failures.
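The replay loop itself is straightforward. A rough sketch, assuming a local mozilla-inbound clone in srcdir and ./mach build as the build entry point (the list of revisions to replay and the ccache bookkeeping around each build are left out):

```python
import subprocess

def replay(revisions, srcdir):
    """Build the given revisions in landing order, reusing the same work tree."""
    for rev in revisions:
        subprocess.check_call(["hg", "pull"], cwd=srcdir)
        subprocess.check_call(["hg", "update", "-r", rev], cwd=srcdir)
        # No forced or periodic clobbers here: the only clobbers are the ones
        # the build system itself requests (via the top-level CLOBBER file).
        subprocess.check_call(["./mach", "build"], cwd=srcdir)
```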
Of those 408 pushes, 6 requested a clobber at the build system level. But the numbers are very different on build slaves:
- On linux, out of 346 builds:
- 19 had a clobber by the build system
- 8 had a forced clobber (requested through the clobberer)
- 1 had a periodic clobber
- 162 (!) had no previous build tree at all for whatever reason (purged previously, or new slave)
- for a total of 190 builds (54.9%) starting with no previous build tree
- On mac, out of 397 builds:
- 23 had a clobber by the build system
- 31 had a forced clobber
- 34 had no previous build tree at all
- for a total of 88 builds (22.2%) starting with no previous build tree
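For reference, the percentages above are just the per-platform categories summed and divided by the number of builds:

```python
linux_no_tree = 19 + 8 + 1 + 162  # build system + forced + periodic + no previous tree
mac_no_tree = 23 + 31 + 34        # build system + forced + no previous tree

print(linux_no_tree, round(100.0 * linux_no_tree / 346, 1))  # 190 54.9
print(mac_no_tree, round(100.0 * mac_no_tree / 397, 1))      # 88 22.2
```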
2014-02-05 03:57:49+0900
2014-02-05 05:36:03+0900
So it sounds like for a given amount of capacity, a smaller pool of faster builders would perform better than a larger pool of slower builders. Is this something we have any control over with AWS?
2014-02-05 07:49:11+0900
Unfortunately, faster builders don’t necessarily yield much faster builds, because many parts of the build are not parallelized. So the point is more that several smaller pools of the same builders would perform better than one global pool of those builders.