Archive for December, 2014

A new git workflow for Gecko development

[Update: git-remote-hg being a moving target, these instructions are now outdated. Please now refer to this new article on the git-cinnabar wiki that will be updated with fresh instructions.]

If you've been following this blog, you know I've been working on a new tool that allows using git with mercurial repositories. See the previous blog posts for some details if you don't know what I'm talking about.

Today, a new milestone has been reached. After performing tens of thousands of pushes[1] and having had no server corruption as a result, I am now confident enough with the code to remove the limitation that prevented pushing to remote mercurial servers (by an interesting coincidence, it's exactly the hundredth commit). Those tens of thousands of pushes allowed me to find and fix a few corner cases, but they only affected the client side.

So here is my recommended setup for Gecko development with git:

  • Install mercurial (only needed for its libraries). You probably already have it installed. Eventually, this dependency will go away, because the use of mercurial libraries is pretty limited.
  • If you can, rebuild git with this patch applied: https://github.com/git/git/commit/61e704e38a4c3e181403a766c5cf28814e4102e4. It is not yet in a released version of git, but it will make small fetches and pushes much faster. Update: The patch is in git version 2.2.2 and newer.
  • Install this git-remote-hg. Just clone it somewhere, and put that directory in your PATH.
  • Create a git repository for Mozilla code:
    $ git init gecko
    $ cd gecko
  • Set fetch.prune for git-remote-hg to be happier:
    $ git config fetch.prune true
  • Set push.default to "upstream", which I think allows a better workflow for using topic branches and pushing them more easily:
    $ git config push.default upstream
  • Add remotes for the mercurial repositories you pull from:
    $ git remote add central hg::http://hg.mozilla.org/mozilla-central -t tip
    $ git remote add inbound hg::http://hg.mozilla.org/integration/mozilla-inbound -t tip
    $ git remote set-url --push inbound hg::ssh://hg.mozilla.org/integration/mozilla-inbound
    (...)
    

    -t tip is there to reduce the amount of churn from the old branches on these repositories. Please be very cautious if you use this on beta, release or esr repositories, because their tip can switch mercurial branches, which can be very confusing. I'm sure I'm not the only one to have pushed something to a release branch when actually intending to push to the default branch, and that was with mercurial... Mercurial branches and their multiple heads are confusing, and git-remote-hg, while it supports them, probably makes the confusion worse at the moment. I'm still investigating how to make things better. For Mozilla integration branches, though, it works fine because their tip doesn't switch heads.

  • Setup a remote for the try server:
    $ git remote add try hg::http://hg.mozilla.org/try
    $ git config remote.try.skipDefaultUpdate true
    $ git remote set-url --push try hg::ssh://hg.mozilla.org/try
    $ git config remote.try.push +HEAD:refs/heads/tip
    
  • Update all the remotes:
    $ git remote update

    This essentially does git fetch on all remotes, except try.

With this setup, you can e.g. create new topic branches based on the remote branches:

$ git checkout -b bugxxxxxxx inbound/tip

When you're ready to test your code on try, create a commit with the try syntax, then just do:

$ git push try

This will push whatever branch you have checked out to the try server (thanks to the refspec in remote.try.push).

When you're ready to push to an integration branch, remove any try commit. Assuming you were using the topic branch from above, and did set push.default to "upstream", push with:

$ git checkout topic-branch
$ git pull --rebase
$ git push

But, this is only one possibility. You're essentially free to pick your own preferred workflow. Just keep in mind that we generally prefer linear history on integration branches, so prefer rebase to merge (and, git-remote-hg doesn't support pushing merges yet). I'd recommend setting the pull.ff configuration to "only", by the way.
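For reference, that last bit is a plain git setting:

$ git config pull.ff only

With it, git pull refuses to silently create a merge commit when your branch and its upstream have diverged, nudging you towards rebasing instead.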

Please note that rebasing something you pushed to e.g. try will leave dangling mercurial metadata in your git clone. git hgdebug fsck will tell you about them, but won't do anything about them, at least currently. Eventually, there will be a git-gc-like command. [Update: actually, rebasing won't leave dangling mercurial metadata because pushing creates head references in the metadata. There will be a command to clean that up, though.]

Please report any issue you encounter in the comments, or, if they are git-remote-hg related, on github.

1. In case you wonder what kind of heavy testing I did, this is roughly how it went:

  • Add support to push a root changeset (one with no parent).
  • Clone the mercurial repository and the mozilla-central repository with git-remote-hg.
  • Since pushing merges is not supported yet, flatten the history with git filter-branch --parent-filter "awk '{print \$1,\$2}'" HEAD, effectively replacing merges with "simple" commits with the entire merged content as a single patch (by the way, I should have written a fast-import script instead; it took almost a day (24 hours) to apply it to the mozilla-central clone, because filter-branch is a not-so-smart shell script that doesn't scale well).
  • Reclone those filtered clones, such that no git-remote-hg metadata is left.
  • Tag all the commits with numbered tags, starting from 0 for the root commit, with the following command:

    git rev-list --reverse HEAD | awk '{print "reset refs/tags/HEAD-" NR - 1; print "from", $1}' | git fast-import

  • Create empty local mercurial repositories and push random tags with increasing number, with variations of the following command:

    python -c 'import random; print "\n".join(str(i) for i in sorted(random.sample(xrange(n), m)) + [n])' | while read i; do git push -f hg-remote HEAD-$i || break; done

    [ Note: push -f is only really necessary for the push of the root commit ]

  • Rinse and repeat, with different values of n, m and hg-remote.
  • Also arrange the above test to push to multiple mercurial repositories, such that pushes are performed with both commits that have already been pushed and commits that haven't. (i.e. push A to repo-X, push B to repo-Y (which doesn't have A), push C to repo-X (which doesn't have B), etc.)
  • Check that all attempts create the same mercurial changesets.
  • Check that recloning those mercurial repositories creates the same git commits.

2014-12-23 08:10:04+0900

p.m.o | 4 Comments »

Initial support for git pushes to mercurial, early testers needed

This push to try was not created by mercurial.

I just landed initial support for pushing to mercurial from git. Considering the scary fact that it's possible to screw up a repository with bundles with missing content (and, guess what, I found that out the hard way), I have restricted it to local mercurial repositories until I am more confident.

As such, I need volunteers to use and test it on local mercurial repositories. On top of being limited to local mercurial repositories, it doesn't support pushing merges that would have been created by git, nor does it support pushing a root commit (one with no parent).

Here's how you can use it:

$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg
$ git clone hg::/path/to/mercurial-repository
$ # work work, commit, commit
$ git push

[ Note: you can still pull from remote mercurial repositories ]

This will push to your local repository, and this is where it would be useful if you could check that the push didn't fuck things up.

$ cd /path/to/mercurial-repository
$ hg verify

That's the long, thorough version. You may just want to simply do this:

$ cd /path/to/mercurial-repository
$ hg log --stat

Hopefully, you won't see messages like:

abort: data/build/mozconfig.common.override.i@56d6fdb13666: no match found!

Update: You can also add the following to /path/to/mercurial-repository/.hg/hgrc, which should prevent corruptions from entering the mercurial repository at all:

[server]
validate = True

Update 2: The above setting is now unnecessary; git-remote-hg will set it itself for its push session.

Then you can push with mercurial.

$ hg push

Please note that this is integrated in git in such a way that it's possible to pass refspecs to git push and do other fancy stuff. Be aware that there are still rough edges on that part, but that your commits will be pushed, even if the resulting state under refs/remotes/ is not very consistent.
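As a sketch of what that can look like (the branch name here is made up; refs/heads/tip is the same destination refspec used for the try setup in the newer post above):

$ git push origin my-topic-branch:refs/heads/tip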

I'm planning a replay of several repositories to fully validate pushes don't send broken bundles, but it's going to take some time before I can set things up. I figured I'd rather crowdsource until then.

2014-12-18 11:56:15+0900

cinnabar, p.m.o | No Comments »

One step closer to git push to mercurial

In case you missed it, I'm working on a new tool to use mercurial remotes in git. Since my previous post, I landed several fixes making clone and pull more reliable:

  • Of 247316 unique changesets in the various mozilla-* repositories, now only two (both in fact coming from the same patch, one of the changesets being a backport of the other to aurora) are "corrupted", because their mercurial date has a timezone offset with a seconds component.
  • Of 23542 unique changesets in the canonical mercurial repository, only three are "corrupted", because their raw mercurial data contains, for an unknown reason, a whitespace character after the timezone.

By corrupted, here, I mean that the round-trip hg->git->hg doesn't reproduce their sha1. They will be fixed eventually, but I haven't decided how yet, because they're really edge cases. They're old enough that they don't really matter for pushing anyways.

Pushing to mercurial, however, is still not there, but it's getting closer. It involves several operations:

  • Negotiating with the mercurial server what it doesn't have that we do.
  • Creating mercurial changesets, manifests and files for local git commits that were not imported from mercurial.
  • Creating a bundle of the mercurial changesets, manifests and files that we have that the server doesn't.
  • Pushing that bundle to the server.

The first step is mostly covered by the pull code, which does a similar negotiation. I now have the third step covered (although I cheated around the "corruptions" mentioned above):

$ git clone hg::http://selenic.com/hg
Cloning into 'hg'...
(...)
Checking connectivity... done.
$ cd hg
$ git hgbundle > ../hg.hg
$ mkdir ../hg2
$ cd ../hg2
$ hg init
$ hg unbundle ../hg.hg
adding changesets
adding manifests
adding file changes
added 23542 changesets with 44305 changes to 2272 files
(run 'hg update' to get a working copy)
$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
2272 files, 23542 changesets, 44305 total revisions

Note: that hgbundle command won't actually exist. It's just an intermediate step allowing me to work incrementally.

In case you wonder what happens when the bundle contains bad data, mercurial fortunately rejects it:

$ cd ../hg
$ git hgbundle-corrupt > ../hg.hg
$ mkdir ../hg3
$ cd ../hg3
$ hg unbundle ../hg.hg
adding changesets
transaction abort!
rollback completed
abort: integrity check failed on 00changelog.i:3180!

2014-12-16 13:54:15+0900

cinnabar, p.m.o | No Comments »

Using git to interact with mercurial repositories

I was planning to publish this later, but after talking about this project to a few people yesterday and seeing the amount of excitement in response, I took some time this morning to tie a few loose ends and publish this now. Mozillians, here comes the git revolution.

Let me start with a bit of history. I am an early git user. I've been using git almost since its first release. I like it. A lot. I've contributed dozens of patches to git.

I started using mercurial when I got commit access to Mozilla repositories, much later. I don't enjoy using mercurial much.

There are many tools to make git talk to mercurial. Most are called git-remote-hg because they use the git remote helpers infrastructure. All of them rely on having a local mercurial clone. When dealing with repositories like mozilla-central, it means storing more than 1.5GB of data just to talk to mercurial, on top of the git database.

So a few years ago, I started to toy with the idea of making git talk to mercurial directly. I got as far as being able to do a full clone of mozilla-central back then, in a reasonable amount of time. But I left it at that, because I needed to figure out how to efficiently store all the metadata required to handle incremental updates/pulling, and didn't have enough incentive to go forward: working with mercurial was not painful enough.

Fast forward to the beginning of this year. The mozilla-central repository is now much bigger than it used to be, and mercurial handles it much less smoothly than it used to when Mozilla switched to using it. That was enough to get me started again, but not enough to dedicate enough time to it.

Fast forward to a few weeks ago. Gregory Szorc poked dev-platform to find out what kind of workflows people were using with git to work on Mozilla code. And I was really not satisfied with the answers. First, I was wondering why no-one was mentioning the existing tools. So I picked one, and tried it.

Cloning mozilla-central took 12 hours and left me with a ~10GB .git directory. Running git gc --aggressive for another 10 hours (my settings may have made gc take more time than it would have with the default configuration) brought it down to about 2.6GB, only 700MB of which is actual git data, the remainder being the associated mercurial repository. And as far as I understand it, the tool doesn't really support our use of mercurial repositories, especially try (but I could be wrong, I didn't really look too much).

That was the straw that broke the camel's back. So after a couple weeks of hacking, I now have something that can clone mozilla-central within 30 minutes on my machine (network transfer excluded). The resulting .git directory is around 1.5GB with the default git config, without running git gc. If you tweak the compression level in your git config, cloning takes a bit longer, and the repo takes about 1.1GB. You can subsequently pull from mozilla-central, as well as pull from other branches without having to clone them from scratch. Push support is not there yet because it's an early prototype, but I should be able to get that to work in the next couple weeks.
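If you want to try the compression tweak, core.compression is the relevant standard git setting (zlib levels go up to 9); for example, before cloning:

$ git config --global core.compression 9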

At this point, you may be wondering how you can use that thing. Here it comes:

$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg

Note that it requires the mercurial code to be available to python, because git-remote-hg uses the mercurial code to talk the mercurial wire protocol. Usually, having mercurial installed is enough.

You can now clone a mercurial repository:

$ git clone hg::http://hg.mozilla.org/mozilla-central

If, like me, you had a local mercurial clone, you can do the following instead:

$ git clone hg::/path/to/mozilla-central-clone
$ git remote set-url origin hg::http://hg.mozilla.org/mozilla-central

You can then use git fetch/pull like with git repositories:

$ git pull

Now, you can add other repositories:

$ git remote add inbound hg::http://hg.mozilla.org/integration/mozilla-inbound
$ git remote update inbound

There are a few caveats, like the fact that it currently creates new remote branches essentially any time you pull something. But it shouldn't disrupt anything.
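If the accumulation of those remote branches bothers you, setting fetch.prune (as recommended in the newer post at the top of this page) keeps them pruned on fetch:

$ git config fetch.prune true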

It should be noted that while the contents are identical to the gecko-dev git repositories (the git tree object sha1s are identical, I checked), the commit SHA1s are different, for two reasons: gecko-dev also contains the CVS history, and hg-git, which is used to fill it, adds some mercurial metadata to commit messages that git-remote-hg doesn't add.

It is, however, possible to graft the CVS history from gecko-dev to a clone created with git-remote-hg. Assuming you have a remote for gecko-dev and fetched from it, you can do the following:

$ echo eabda6aae98d14c71d7e7b95a66896868ff9500b 3ec464b55782fb94dbbb9b5784aac141f3e3ac01 >> .git/info/grafts

Last note: please read the README file when you update your git clone of the git-remote-hg repository. As the prototype evolves, there might be things that you need to do to your existing clones, and it will be written there.

2014-12-05 20:45:10+0900

cinnabar, p.m.o | 4 Comments »

Using C++ templates to prevent some classes of buffer overflows

I recently found a small buffer overflow in Firefox's SOCKS support, and came up with a nice-ish way, with some template magic, to make it a C++ compilation error when it may happen.

A simplified form of the problem looks like this:

class nsSOCKSSocketInfo {
public:
    nsSOCKSSocketInfo() : mData(new uint8_t[BUFFER_SIZE]) , mDataLength(0) {}
    ~nsSOCKSSocketInfo() { delete[] mData; }

    void WriteUint8(uint8_t aValue) { mData[mDataLength++] = aValue; }

    void WriteV5AuthRequest() {
        mDataLength = 0;
        WriteUint8(0x05);
        WriteUint8(0x01);
        WriteUint8(0x00);
    }
private:
    uint8_t* mData;
    uint32_t mDataLength;
    static const size_t BUFFER_SIZE = 2;
};

Here, the problem is more or less obvious: the third WriteUint8() call in WriteV5AuthRequest() will write beyond the allocated buffer size. (The real buffer size was much larger than that, and it was a different method overflowing, but you get the point.)

While growing the buffer size fixes the overflow, that doesn't do much to prevent a similar overflow from happening again if the code changes. That got me thinking that there has to be a way to do some compile-time checking of this. The resulting solution, at its core, looks like this:

template <size_t Size> class Buffer {
public:
    Buffer() : mBuf(nullptr) , mLength(0) {}
    Buffer(uint8_t* aBuf, size_t aLength=0) : mBuf(aBuf), mLength(aLength) {}

    Buffer<Size - 1> WriteUint8(uint8_t aValue) {
        static_assert(Size >= 1, "Cannot write that much");
        *mBuf = aValue;
        // Hand the remaining space off to a one-smaller Buffer, and neuter this one.
        Buffer<Size - 1> result(mBuf + 1, mLength + 1);
        mBuf = nullptr;
        mLength = 0;
        return result;
    }

    size_t Written() { return mLength; }
private:
    uint8_t* mBuf;
    size_t mLength;
};

Then replacing WriteV5AuthRequest() with the following:

void WriteV5AuthRequest() {
    mDataLength = Buffer<BUFFER_SIZE>(mData)
        .WriteUint8(0x05)
        .WriteUint8(0x01)
        .WriteUint8(0x00)
        .Written();
}

So, how does this work? The Buffer class is templated by size. The first thing we do is to create an instance for the complete size of the buffer:

Buffer<BUFFER_SIZE>(mData)

Then call the WriteUint8 method on that instance, to write the first byte:

.WriteUint8(0x05)

The result of that call is a Buffer<BUFFER_SIZE - 1> (in our case, Buffer<1>) instance pointing to &mData[1] and recording that 1 byte has been written.
Then we call the WriteUint8 method on that result, to write the second byte:

.WriteUint8(0x01)

The result of that call is a Buffer<BUFFER_SIZE - 2> (in our case, Buffer<0>) instance pointing to &mData[2] and recording that 2 bytes have been written so far.
Then we call the WriteUint8 method on that new result, to write the third byte:

.WriteUint8(0x00)

But this time, the Size template parameter being 0, it doesn't match the Size >= 1 static assertion, and the build fails.
If we modify BUFFER_SIZE to 3, then the instance we run that last WriteUint8 call on is a Buffer<1> and we don't hit the static assertion.
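For illustration, with BUFFER_SIZE left at 2, compiling this simplified example would fail along these lines (the file name is made up and the diagnostic is abbreviated; the message text comes from the static_assert):

$ g++ -std=c++11 -c socks-example.cpp
socks-example.cpp: (...) error: static assertion failed: Cannot write that much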

Interestingly, this also makes the compiler emit more efficient code than the original version.

Check the full patch for more about the complete solution.

2014-12-05 18:45:56+0900

p.m.o | 2 Comments »

Logging Firefox memory allocations

A couple years ago, when I was actively working on integrating jemalloc 3 in the Firefox build, and started investigating some memory usage regression compared to our old fork, I came up with a replace-malloc library for Firefox that logs all the allocations, and allows replaying them in a more consistent (and faster) way in a separate program, so that testing different configurations of jemalloc with the same workload can be streamlined.

A couple weeks ago, I refreshed that work, and made it work on all the tier-1 Firefox desktop platforms. That work is now in the tree instead of on my hard drive, and will allow us to test the effects of jemalloc changes in a better way.

The bulk of how to use this feature is the following:

  • Start Firefox with the following environment variables:
    • on Linux:

      LD_PRELOAD=/path/to/memory/replace/logalloc/liblogalloc.so

    • on Mac OSX:

      DYLD_INSERT_LIBRARIES=/path/to/memory/replace/logalloc/liblogalloc.dylib

    • on Windows:

      MOZ_REPLACE_MALLOC_LIB=/path/to/memory/replace/logalloc/logalloc.dll

    • on all the above:

      MALLOC_LOG=/path/to/log-file

  • Play your workload in Firefox, then close it.
  • Run the following command to prepare the log file for replay:

    python /source/path/to/memory/replace/logalloc/replay/logalloc_munge.py < /path/to/log-file > /path/to/replay.log

  • Replay the logged allocations with the following command:

    /path/to/memory/replace/logalloc/replay/logalloc-replay < /path/to/replay.log
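Putting the Linux case together, a full session might look like this (paths are illustrative; $objdir stands for your object directory and $srcdir for your source tree):

$ LD_PRELOAD=$objdir/memory/replace/logalloc/liblogalloc.so \
  MALLOC_LOG=/tmp/log-file $objdir/dist/bin/firefox
$ python $srcdir/memory/replace/logalloc/replay/logalloc_munge.py < /tmp/log-file > /tmp/replay.log
$ $objdir/memory/replace/logalloc/replay/logalloc-replay < /tmp/replay.log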

More information and implementation details can be found in the README accompanying the code for that functionality.

2014-12-03 03:22:57+0900

p.m.o | No Comments »

Mozilla Build System: past, present and future

The Mozilla Build System has, for most of its history, not changed much. But, for a couple years now, we've been, slowly and incrementally, modifying it in quite extensive ways. This post summarizes the progress so far, and my personal view on where we're headed.

Recursive make

The Mozilla Build System has, all along, been implemented as a set of recursively traversed Makefiles. The way it has been working for a very long time looks like the following:

  • For each tier (group of source directories) defined at the top-level:
    • For each subdirectory in current tier:
      • Build the export target recursively for each subdirectory defined in Makefile.
      • Build the libs target recursively for each subdirectory defined in Makefile.

The typical limitation due to the above is that some compiled tests from a given directory require a library that isn't linked until after that directory has been recursed into, so another target (tools) was later added on top of that.

There was not much room for parallelism, except in individual directories, where multiple sources could be built in parallel, but never would sources from multiple directories be built at the same time. So, for a bunch of directories where it was possible, special rules were added to allow that to happen, which led to interesting recursions:

  • For each of export, libs, and tools:
    • Build the target in the subdirectories that can be built in parallel.
    • Build the target in the current directory.
    • Build the target in the remaining subdirectories.

This ensured some extra fun with dependencies between (sub)directories.

Apart from the way things were recursed, all sorts of custom build rules had piled up, some of which relied on things in other directories having happened beforehand, and the build system implementation itself relied on some quite awful things (remember allmakefiles.sh?)

Gradual Overhaul

Around two years ago, we started a gradual overhaul of the build system.

One of the goals was to move away from Makefiles. For various reasons, we decided to go with our own kind-of-declarative (but really, sandboxed python) format (moz.build) instead of using e.g. gyp. The more progress we make on the build system, the more I think this was the right choice.

Anyways, while we've come a long way and converted a lot of Makefiles to moz.build, we're not quite there yet:

One interesting thing to note in the above graph is that we've also been reducing the overall number of moz.build files we use, by consolidating some declarations. For example, some moz.build files now declare source or test files from their subdirectories directly, instead of having one file per directory declare sources and test files local to their own directory.

Pseudo derecursifying recursive Make

Neologism aside, one of the ideas to help with the process of converting the build system to something that can be parallelized more massively was to reduce the depth of recursion we do with Make. So that instead of a sequence like this:

  • Entering directory A
    • Entering directory A/B
    • Leaving directory A/B
    • Entering directory A/C
      • Entering directory A/C/D
      • Leaving directory A/C/D
      • Entering directory A/C/E
      • Leaving directory A/C/E
      • Entering directory A/C/F
      • Leaving directory A/C/F
    • Leaving directory A/C
    • Entering directory A/G
    • Leaving directory A/G
  • Leaving directory A
  • Entering directory H
  • Leaving directory H

We would have a sequence like this:

  • Entering directory A
  • Leaving directory A
  • Entering directory A/B
  • Leaving directory A/B
  • Entering directory A/C
  • Leaving directory A/C
  • Entering directory A/C/D
  • Leaving directory A/C/D
  • Entering directory A/C/E
  • Leaving directory A/C/E
  • Entering directory A/C/F
  • Leaving directory A/C/F
  • Entering directory A/G
  • Leaving directory A/G
  • Entering directory H
  • Leaving directory H

For each directory there would be a directory-specific target per top-level target, such as A/B/export, A/B/libs, etc. Essentially those targets are defined as:

%/$(target):
        $(MAKE) -C $* $(target)

And each top-level target is expressed as a set of dependencies, such as, in the case above:

A/B/libs: A/libs
A/C/libs: A/B/libs
A/C/D/libs: A/C/libs
A/C/E/libs: A/C/D/libs
A/C/F/libs: A/C/E/libs
A/G/libs: A/C/F/libs
H/libs: A/G/libs
libs: H/libs

That dependency list, instead of being declared manually, is generated from the "traditional" recursion declaration we had in moz.build, through the various *_DIRS variables. I'll skip the gory details about how this replicated the weird subdirectory orders I mentioned above when you had both PARALLEL_DIRS and DIRS. They are irrelevant today anyways.

You'll also note that this removed the notion of tier: the entirety of the tree is dealt with for each target, instead of iterating over all targets for each group of directories.

[But note that, confusingly, for practical reasons related to the amount of code changes required, and to how things were incrementally set in place, those targets are now called tiers.]

From there, parallelizing parts of the build only involves reorganizing those Make dependencies such that Make can go in several subdirectories at once, the ultimate goal being:

libs: A/libs A/B/libs A/C/libs A/C/D/libs A/C/E/libs A/C/F/libs A/G/libs A/H/libs

But then, things in various directories may need other things built in different directories first, such that additional dependencies can be needed, for example:

A/B/libs A/C/libs: A/libs
A/H/libs: A/G/libs

The trick is that those dependencies can in many cases be deduced from the contents of moz.build. That's where we are today, for example, for everything related to compilation: we added a compile target that deals with everything related to C/C++ compilation, and we have dependencies between directories building objects and directories building libraries and executables that depend on those objects. You can have a taste yourself:

$ mach clobber
$ mach configure
$ mach build export
$ mach build -C / toolkit/library/target

[Note, the compile target is actually composed of two sub-targets, target and host.]

The third command runs the export target on the whole tree, because that's a prerequisite that is not expressed as a make dependency yet (it would be too broad of a dependency anyways).

The last command runs the toolkit/library/target target at the top level (as opposed to mach build toolkit/library/target, which runs the target target in the toolkit/library directory). That will build libxul, and everything needed to link it, but nothing else unrelated.

All the other top-level targets have also received this treatment to some extent:

  • The export target was made entirely parallel, although it keeps some cross-directory dependencies derived from the historical traversal data.
  • The libs target is still entirely sequential, because of all the horrible rules in it that may or may not depend on the historical traversal order.
  • A parallelized misc target was recently created to receive all the things that are currently done as part of the libs target that are actually safe to be parallelized (another reason for creating this new target is that libs is now a misnomer, since everything related to libraries is now part of compile). Please feel free to participate in the effort to move libs rules there.
  • The tools target is still sequential, but see below.

Some interdependencies that can't be derived yet are, however, currently hardcoded in config/recurse.mk.

And since the mere fact of entering a directory, figuring out if there's anything to do at all, and leaving a directory takes some time on builds when only a couple things changed, we also skip some directories entirely by making them not have a directory/target target at all:

  • export and libs only traverse directories where there is a Makefile.in file, or where the moz.build file sets variables that do require something to be done in those targets.
  • compile only traverses directories where there is something to compile or link.
  • misc only traverses directories with an explicit HAS_MISC_RULE = True when they have a Makefile.in, or with a moz.build setting variables affecting the misc target.
  • tools only traverses directories that contain a Makefile.in containing tools:: (there's actually a regexp, but in essence, that's it).

Those skipping rules mean that we only traverse, taking numbers from a local Linux build as of a couple weeks ago:

  • 186 directories during export,
  • 472 directories during compile,
  • 161 directories during misc,
  • 382 directories during libs,
  • 2 directories during tools.

instead of 850 for each.

In the near future, we may want to change export and libs to opt-ins, like misc, instead of opt-outs.

Alternative build backends

With more and more declarations in moz.build files, we've been able to build up some alternative build backends for Eclipse and Microsoft Visual Studio. While they are still considered experimental, they seem to work well enough for people to take advantage of them in some useful ways. They do, however, still rely on the "traditional" Make backend to build various things (to the best of my knowledge).

Ultimately, we would like to support entirely different build systems such as ninja or tup, but at the moment, many things still heavily rely on the "traditional" Make backend. We're getting close to having everything related to compilation available from the moz.build declarations (ignoring third-party code, but see further below), but there is still a long way to go for other things.

In the near future, we may want to implement hybrid build backends where compilation would be driven by ninja or tup, and the rest of the build would be handled by Make. However, my feeling is that the Make backend is fast enough for compilation, and that ninja doesn't bring enough besides performance to make a ninja backend worth investing in. Tup is different, because it does solve some of the problems with incremental builds.

While on the topic of having everything related to compilation available from moz.build declarations: beyond enabling such hybrid build systems, getting there would also allow better integration of tools such as static code analyzers.

Unified C/C++ sources

Compiling C code, and even more so compiling C++ code, involves reading the same headers a large number of times. In C++, that usually also means instantiating the same templates and compiling the same inline methods numerous times.

So we've worked around this by creating "unified sources" that just #include the actual source files, grouping them in batches of 16 (except when specified otherwise).
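Concretely, a unified source is nothing more than a generated file of this shape (the file names are made up):

$ cat UnifiedSources0.cpp
#include "FirstSource.cpp"
#include "SecondSource.cpp"
#include "ThirdSource.cpp"
(...)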

This reduced build times drastically, and, interestingly, reduced the size of DWARF debugging symbols as well. This does have a couple downsides, though. It allows #include impurity in the code, in such a way that e.g. changes in the file groups can lead to subtle build failures. It also makes incremental builds significantly slower in parts of the tree where compiling one file is already slow, so doing 16 at a time can be a drag for people working on that code. This is why we're considering decreasing the number for the javascript engine in the near future. We should probably investigate where else building one unified source is slow enough to be a concern. I don't think we can rely on people actively complaining about it; they've been too used to slow build times to care to file bugs about it.

Relatedly, our story with #include is suboptimal to say the least, and several have been untangled, but there's still a long tail. It's a hard problem to solve, even with tools like IWYU.

Fake libraries

For a very long time, the build system built intermediate static libraries, and then linked them together to form shared libraries. This is how the main components were built in the old days before libxul (remember libgklayout?), and how libxul was built when it was created. For various reasons, ranging from disk space waste to linker inefficiencies, this was replaced by building fake libraries, which only reference the objects that would normally be contained in the static library that used to be built. This was later changed to use a more complex system allowing more flexibility.

Fast-forward to today: with all the knowledge from moz.build, one of the use cases of that more complex system was made moot, and in the future, those fake libraries could be generated as build backend files instead of being created during the build.

Specialized incremental builds

When iterating on C/C++ code patches, one usually needs to (re)compile often. With the build system having the overhead it has, and rebuilding with no change taking many seconds (it was around a minute for a long time on my machine and is now around half of that; sadly, I had got it down to 20 seconds but that regressed recently, damn libs rules), we also added a special rule that only handles header changes and rebuilding objects, libraries and executables.

That special rule is invoked as:

$ mach build binaries

It takes about 3.5s on my machine when there are no changes. It used to be faster thanks to clever tricks, but that was regressed on purpose. That's a trade-off, but linking libxul, which most code changes require, takes much longer than that anyways. If deemed necessary, the same clever tricks could be restored.

While we work to improve the overall build experience, in the near future we should have one or more special rules for non-compilation use-cases. For Firefox frontend developers, the following command may do part of the job:

$ mach build -C / chrome

but we should have some better and more complete commands for them and e.g. Firefox Android developers.

We also currently have a build option that allows skipping everything compilation-related entirely (--disable-compile-environment), but it is currently broken and only really useful in a few use cases. In the near future, we need build modes that allow using e.g. nightly builds as if they were the result of compiling C++ source. This would allow some classes of developers to skip compilation altogether, which is an unnecessary overhead for them at the moment, since they need to compile at least once (and with all the auto-clobbers we have, it's much more than that).
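For reference, that option goes in a mozconfig, with the standard mozconfig syntax (but again, it is currently broken):

ac_add_options --disable-compile-environment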

Localization

Related to the above, the experience of building locale packs and repacks for Firefox is dreadful. Even worse than that, it also relies on a big pile of awful Make rules. It probably is, along with the code related to the creation of Firefox tarballs and installers, the most horrifying part of the build system. Its entanglement with release automation also makes improving the situation unnecessarily difficult.

While there are some sorts of tests running on every push, there are many occasions where those tests fail to catch regressions that lead to broken localized builds for nightlies, or worse, on beta or release (which, you'll have to admit, is a sadly late moment to find such regressions).

Something really needs to be done about localization, and hopefully the discussions we'll have this week in Portland will lead to improvements in the short to medium term.

Install manifests

The build system copies many files during the build. From the source directory to the "object" directory. Sometimes in $(DIST)/somedir, sometimes elsewhere. Sometimes in both or more. On non-Windows systems, copies are replaced by symbolic links. Sometimes not. There are also files that are preprocessed during the build.

All those used to be handled by Make rules invoking $(NSINSTALL) on every build. Even when the files hadn't changed. Most of these were replaced by some Makefile magic, but many are now covered with so-called "install manifests".

Others, defined in jar.mn files, used to be added to jar files during the build. While those jars are not created anymore because of omni.ja, the corresponding content is still copied/symlinked and defined in jar.mn.

In the near future, all those should be switched to install manifests somehow, and that is greatly tied to solving part of the localization problem: currently, localization relies on Make overrides that moz.build can't know about, preventing install manifests from being created and used for the corresponding content.

Faster configure

One of the very first things the build system does when a build starts from scratch is to run configure. That's a part of the build system that is based on the antiquated autoconf 2.13, with 15+ years of accumulated linear m4 and shell gunk. That's what detects what kind of compiler you use, how broken it is, how broken its headers are, what options you requested, what application you want to build, etc.

Topping that, it also invokes configure from third-party software that happens to live in the tree, like ICU or jemalloc 3. Those are also based on autoconf, but on more recent versions than 2.13. They are also third-party, so we essentially only import them, as opposed to our own configure code, which we keep actively making bigger.

While it doesn't necessarily look that bad when running on e.g. Linux, the time it takes to run all this pile of shell scripts is painfully horrible on Windows (like, taking more than 5 minutes on automation). While there's still a lot to do, various improvements were recently made:

  • Some classes of changes (such as modifying configure.in) make the build system re-run configure. It used to trigger every configure to run again, but now only re-runs a relevant subset.
  • They used to all run sequentially, but apart from the top-level one, which still needs to run before all the others, they now all run in parallel. This cut configure times almost in half on Windows clobber builds on automation.

In the future, we want to get rid of autoconf 2.13 and use smart lazy python code to only run the tests that are relevant to the configure options. How this would all exactly work has, as of writing, not been determined. It's been on my list of things to investigate for a while, but hasn't reached the top. In the near future, though, I would like to move all our autoconf code related to the build toolchain (compiler and linker) to some form of python code.

Zaphod beeblebuild

There are essentially two main projects in the mozilla-central repository: Firefox/Gecko and the Javascript engine. They use the same build system in many ways. But for a very long time, they actually relied on different copies of the same build system files, like config/rules.mk or build/autoconf/*.m4. And we had a check-sync-dirs script verifying that both projects were indeed using the same file contents. Countless times, we've had landings forgetting to synchronize the files, leading to a check-sync-dirs error during the build. I plead guilty to having landed such things multiple times, and so did many other people.

Those days are now long gone, but we currently rely on dirty tricks that still keep the Firefox/Gecko and Javascript engine build systems half separate. So we kind of replaced a conjoined-twins system with a biheaded system. In the future, and this is tied to the section above, both build systems would be completely merged.

Build system interface

Another goal of the build system changes was to make the build and test experience better. Especially, running tests was not exactly the most pleasant experience.

A single entry point to the build system was created in the form of the mach tool. It simplifies and self-documents many of the workflows that required arcane knowledge.

In the future, we will deprecate the historical build system entry points, or replace their implementation to call mach. This includes client.mk and testing/testsuite-targets.mk.

moz.build

Yet another goal of the build system changes was to improve the experience developers have when adding code to the tree. Again, while there is still a lot to be done on the subject, there have been a lot of changes in the past year that I hope have made developers' lives easier.

As an example, adding new code to libxul previously required:

  • Creating a Makefile.in file
    • Defining a LIBRARY_NAME.
    • Defining which sources to build with CPPSRCS, CSRCS, CMMSRCS, SSRCS or ASFILES, using the right variable name for each source type (C++, C, Obj-C, or assembly. By the way, did you know there was a difference between SSRCS and ASFILES?).
  • Adding something like SHARED_LIBRARY_LIBS += $(call EXPAND_LIBNAME_PATH,libname,$(DEPTH)/path) to toolkit/library/Makefile.in.

Now, adding new code to libxul requires:

  • Creating a moz.build file
    • Defining which sources to build with SOURCES, whether they are C, C++ or other.
    • Defining FINAL_LIBRARY to 'xul'.
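For example, a minimal moz.build for that could look like this (the source file name is made up):

SOURCES += [
    'MyNewCode.cpp',
]

FINAL_LIBRARY = 'xul'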

This is only a simple example, though. There are more things that should have gotten easier, especially since support for templates landed. Templates made it possible to hide some details, such as dependencies on the right combination of libxul, libnss, libmozalloc and others when building programs or XPCOM components. Combined with syntactic sugar and recent changes to how moz.build data is handled by build backends, we could, in the future, allow defining multiple targets in a single directory. Currently, if you want to build e.g. a library and a program, or multiple libraries, in the same directory, well, essentially, you can't.

Relatedly, moz.build currently suffers from how it grew out of simply moving definitions from Makefile.in, and from how those definitions in Makefile.in were partly tied to how Make works, and to how config.mk and rules.mk work. Consolidating CPPSRCS, CSRCS and other variables into a single SOURCES variable is something that should be done more broadly, and we should bring more consistency to how things are defined (for example NO_PGO vs. no_pgo depending on the context, etc.). Incidentally, I think those changes can be made in a way that simplifies the build backend python code.

Multipass

Some build types, while unusual for developers to do locally on their machine, happen regularly on automation, and are done in awful or inefficient ways.

First, Profile Guided Optimized (PGO) builds. The core idea for those builds is to build once with instrumentation, run the resulting instrumented binary against a profile, and rebuild with the data gathered from that. In our build system, this is what actually happens on Linux:

  • Build everything with instrumentation.
  • Run instrumented binary against profile.
  • Remove half the object directory, including many non-compiled code things that are generated during a normal build.
  • Rebuild with optimizations guided by the collected data.

Yes, the last step repeats things from the first one that do not need to be repeated.

Second, Mac universal builds, which happen in the following manner:

  • Build everything for i386.
  • Build everything for x86-64.
  • Merge the result of both builds.

Yes, "everything" in both "Build everything" includes the same non-compiled files. Then the third step checks that those non-compiled files actually match (and for buildconfig.html, has special treatment) and merges the i386 and x86-64 binaries in Mach-o fat binaries. Not only is this inefficient, but the code behind this is terrible, although it got better with the new packager code. And reproducing universal builds locally is not an easy task.

In the future, the build system would be able to compile binaries for different targets in a way that doesn't require jumping through hoops like the above. It could even allow building e.g. a javascript shell for the build machine during a cross-compilation for Android, without involving wrapper scripts to handle the situation.

Third party code

Building Firefox involves building several third party libraries. In some cases, they use gyp, and we convert their gyp files to moz.build at import time (angle, for instance). In other cases, they use gyp, and we just use those gyp files through moz.build rules, such that the gyp processing is done during configure instead of at import time (webrtc). In yet other cases, they use autoconf and automake, but we use a moz.build file to build them, while still running their configure script (freetype and jemalloc). In all those cases, the sources are handled as if they had been Mozilla code all along.

But in the case of NSPR, NSS and ICU, we don't necessarily build them in ways their respective build systems (were meant to) allow, and rely on hacks around their build systems to do our bidding. This is especially true for NSS (don't look at config/external/nss/Makefile.in if you care about your sanity). On top of using atrocious hacks, that makes the build dependent on Make for compilation, and in inefficient ways, at that.

In the future, we wouldn't rely on the NSPR, NSS and ICU build systems, and would build them as if they were Mozilla code, like the others. We need to find ways to allow that while limiting the cost of updates to new versions. This is especially true for ICU, which is entirely third party. For NSPR and NSS, we have some kind of foothold. Although it is highly unlikely we can make them switch to moz.build (and in the current state of moz.build, being very tied to Gecko, that's not possible without significant changes), we can probably come up with schemes that would allow us to e.g. easily generate moz.build files from their Makefiles and/or manifests. I expect this to be somewhat manageable for NSS and NSPR. ICU is an entirely different story, sadly.

And more

There are many other aspects of the build system that I'm not mentioning here, but you'll excuse me, as this post is already long enough (and took much longer to write than it really should have).

2014-12-03 02:01:43+0900

p.m.o | 4 Comments »