Archive for the 'p.m.o' Category

[Linux] Disabling CPU turbo, cores and threads without rebooting

[Disclaimer: this has been sitting as a draft for close to three months; I forgot to publish it, and that is now finally done.]

In my previous blog post, I built Firefox in a number of different configurations where I’d disable the CPU turbo, some of its cores or some of its threads. That is something that was traditionally done via the BIOS, but rebooting between each attempt is not really a great experience.

Fortunately, the Linux kernel provides a large number of knobs that allow this at runtime.

Turbo

This is the most straightforward:

$ echo 0 > /sys/devices/system/cpu/cpufreq/boost

Re-enable with

$ echo 1 > /sys/devices/system/cpu/cpufreq/boost
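Not every cpufreq driver provides the boost file. On systems using the intel_pstate driver, the equivalent knob is /sys/devices/system/cpu/intel_pstate/no_turbo, with the opposite meaning (1 disables turbo). The following is a minimal sketch that handles both; the set_turbo name and the CPU_SYSFS override are made up for illustration, and it needs to run as root on a real system:

```shell
# Illustrative helper: set_turbo and CPU_SYSFS are assumptions made for this
# sketch, not part of any standard tool. Run as root on a real system.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

set_turbo() {
    # $1 is "on" or "off"
    case "$1" in
        on)  boost=1; no_turbo=0 ;;
        off) boost=0; no_turbo=1 ;;
        *)   echo "usage: set_turbo on|off" >&2; return 2 ;;
    esac
    if [ -e "$CPU_SYSFS/cpufreq/boost" ]; then
        # Knob used in this post: 1 means turbo enabled.
        echo "$boost" > "$CPU_SYSFS/cpufreq/boost"
    elif [ -e "$CPU_SYSFS/intel_pstate/no_turbo" ]; then
        # intel_pstate knob: 1 means turbo disabled.
        echo "$no_turbo" > "$CPU_SYSFS/intel_pstate/no_turbo"
    else
        echo "no known turbo knob found" >&2
        return 1
    fi
}
```

Usage would be set_turbo off before a measurement and set_turbo on afterwards.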

CPU frequency throttling

Even though I haven’t mentioned it, I might as well add this briefly. There are many knobs to tweak frequency throttling, but assuming your goal is to disable throttling and set the CPU frequency to its fastest non-Turbo frequency, this is how you do it:

$ echo performance > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor

where $n is the id of the core you want to do that for, so if you want to do that for all the cores, you need to do that for cpu0, cpu1, etc.

Re-enable with:

$ echo ondemand > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor

(assuming this was the value before you changed it; ondemand is usually the default)
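Doing this for every CPU by hand gets tedious, so here is a minimal sketch that loops over all of them. The function names and the CPU_SYSFS override are made up for illustration. Recording the current values first avoids having to guess what the default was:

```shell
# Illustrative helpers: the names and CPU_SYSFS are assumptions made for
# this sketch. Run as root on a real system.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

show_governors() {
    # Print "cpuN: governor" for every CPU that exposes a governor.
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        cpu=${f#"$CPU_SYSFS"/}
        echo "${cpu%%/*}: $(cat "$f")"
    done
}

set_governor() {
    # $1 is the governor name, e.g. "performance"
    for f in "$CPU_SYSFS"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        echo "$1" > "$f"
    done
}
```

Run show_governors first, then set_governor performance, and later set_governor with whatever value was reported before the change.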

Cores and Threads

This one requires some attention, because you cannot assume anything about the CPU numbers. The first thing you want to do is to check those CPU numbers. You can do so by looking at the physical id and core id fields in /proc/cpuinfo, but the output from lscpu --extended is more convenient, and looks like the following:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    3700.0000 2200.0000
1   0    0      1    1:1:1:0       yes    3700.0000 2200.0000
2   0    0      2    2:2:2:0       yes    3700.0000 2200.0000
3   0    0      3    3:3:3:0       yes    3700.0000 2200.0000
4   0    0      4    4:4:4:1       yes    3700.0000 2200.0000
5   0    0      5    5:5:5:1       yes    3700.0000 2200.0000
6   0    0      6    6:6:6:1       yes    3700.0000 2200.0000
7   0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)
32  0    0      0    0:0:0:0       yes    3700.0000 2200.0000
33  0    0      1    1:1:1:0       yes    3700.0000 2200.0000
34  0    0      2    2:2:2:0       yes    3700.0000 2200.0000
35  0    0      3    3:3:3:0       yes    3700.0000 2200.0000
36  0    0      4    4:4:4:1       yes    3700.0000 2200.0000
37  0    0      5    5:5:5:1       yes    3700.0000 2200.0000
38  0    0      6    6:6:6:1       yes    3700.0000 2200.0000
39  0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)

Now, this output is actually the ideal case, where pairs of CPUs (virtual cores) on the same physical core are always n, n+32, but I’ve had them be pseudo-randomly spread in the past, so be careful.

To turn off a core, you want to turn off all the CPUs with the same CORE identifier. To turn off a thread (virtual core), you want to turn off one CPU. On machines with multiple sockets, you can also look at the SOCKET column.

Turning off one CPU is done with:

$ echo 0 > /sys/devices/system/cpu/cpu$n/online

Re-enable with:

$ echo 1 > /sys/devices/system/cpu/cpu$n/online
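Since the CPU numbering cannot be assumed, the kernel's own view of which CPUs share a physical core is safer than a hardcoded n, n+32 rule. Each online CPU exposes a topology/thread_siblings_list file listing the CPUs on the same core. The following is a minimal sketch that keeps the lowest-numbered CPU of each core and turns off the others; the function names and the CPU_SYSFS override are made up for illustration:

```shell
# Illustrative helpers: the names and CPU_SYSFS are assumptions made for
# this sketch. Run as root on a real system.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

offline_smt_siblings() {
    for dir in "$CPU_SYSFS"/cpu[0-9]*; do
        list="$dir/topology/thread_siblings_list"
        [ -r "$list" ] || continue      # offline CPU, or no topology info
        siblings=$(cat "$list")         # e.g. "0,32" or "0-1"
        first=${siblings%%[,-]*}        # lowest-numbered CPU on that core
        n=${dir##*/cpu}
        # Keep the first CPU of each core, turn off the others.
        if [ "$n" != "$first" ] && [ -e "$dir/online" ]; then
            echo 0 > "$dir/online"
        fi
    done
}

online_all() {
    # Re-enable every CPU that can be toggled.
    for f in "$CPU_SYSFS"/cpu[0-9]*/online; do
        [ -e "$f" ] || continue
        echo 1 > "$f"
    done
}
```

Recent kernels also expose a global switch at /sys/devices/system/cpu/smt/control (writing off or on), which may be simpler when the goal is to disable all secondary threads at once.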

Extra: CPU sets

CPU sets are a feature of Linux’s cgroups. They allow restricting groups of processes to a set of cores. The first step is to create a group like so:

$ mkdir /sys/fs/cgroup/cpuset/mygroup

Please note you may already have existing groups, and you may want to create subgroups. You can do so by creating subdirectories.

Then you can configure which CPUs/cores/threads you want processes in this group to run on:

$ echo 0-7,16-23 > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus

The value you write in this file is a comma-separated list of CPU/core/thread numbers or ranges. 0-3 is the range for CPU/core/thread 0 to 3 and is thus equivalent to 0,1,2,3. The numbers correspond to /proc/cpuinfo or the output from lscpu as mentioned above.
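To double check what a given list covers before writing it, the following sketch expands the comma-separated list of numbers and ranges into individual CPU numbers. The expand_cpu_list name is made up for illustration, and the sketch assumes well-formed input:

```shell
# Illustrative helper: expand_cpu_list is a name made up for this sketch.
expand_cpu_list() {
    # $1 is a list such as "0-7,16-23"
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A single number has no "last" part: treat it as a range of one.
        seq "$first" "${last:-$first}"
    done | xargs
}

expand_cpu_list 0-3      # prints "0 1 2 3"
expand_cpu_list 0-1,4    # prints "0 1 4"
```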

There are also memory aspects to CPU sets, that I won’t detail here (because I don’t have a machine with multiple memory nodes), but you can start with:

$ cat /sys/fs/cgroup/cpuset/cpuset.mems > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems

Now you’re ready to assign processes to this group:

$ echo $pid >> /sys/fs/cgroup/cpuset/mygroup/tasks

There are a number of tweaks you can do to this setup; I invite you to check out the cpuset(7) manual page.

Disabling a group is a little involved. First you need to move the processes to a different group:

$ while read pid; do echo $pid > /sys/fs/cgroup/cpuset/tasks; done < /sys/fs/cgroup/cpuset/mygroup/tasks

Then dissociate CPU and memory nodes:

$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems

And finally remove the group:

$ rmdir /sys/fs/cgroup/cpuset/mygroup

2020-08-31 07:00:38+0900

p.d.o, p.m.o

The influence of hardware on Firefox build times

I recently upgraded my aging “fast” build machine. Back when I assembled the machine, it could do a full clobber build of Firefox in about 10 minutes. That was slightly more than 10 years ago. This upgrade, and the build times I’m getting on the brand new machine (now 6 months old) and other machines led me to look at how some parameters influence build times.

Note: most of the data that follows was gathered a few weeks ago, building off Mercurial revision 70f8ce3e2d394a8c4d08725b108003844abbbff9 of mozilla-central.

Old vs. new

The old “fast” build machine had an i7-870, with 4 cores, 8 threads running at 2.93GHz, turbo at 3.6GHz, and 16GiB RAM. The new machine has a Threadripper 3970X, with 32 cores, 64 threads running at 3.7GHz, turbo at 4.5GHz (rarely reached to be honest), and 128GiB RAM (unbuffered ECC).

Let’s compare build times between them, at the same level of parallelism:

Build times at -j8

That is 63 minutes for the i7-870 vs. 26 for the Threadripper, with a twist: the Threadripper was explicitly configured to use 4 physical cores (with 2 threads each) so as to be fair to the poor i7-870.

Assuming the i7 maxed out at its base clock, and the Threadripper at turbo speed (which it actually doesn’t, but it’s closer to the truth than the base clock is with an eighth of the cores activated), the speed-up from the difference in frequency alone would make the build 1.5 times faster, but we got close to 2.5 times faster.

But that doesn’t account for other factors we’ll explore further below.
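For reference, the two ratios quoted above come from the clock speeds (4.5GHz turbo vs. 2.93GHz base) and the build times (63 vs. 26 minutes). A quick way to redo the arithmetic, with a ratio helper made up for illustration:

```shell
# Illustrative helper: ratio is a name made up for this sketch.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 4.5 2.93   # clock speed ratio, prints 1.54
ratio 63 26      # build time ratio, prints 2.42
```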

Before going there, let’s look at what unleashing the full power of the Threadripper brings to the table:

Build times at -j8 vs. -j64

Yes, that is 5 minutes for the Threadripper 3970X, when using all its cores and threads.

Spinning disk vs. NVMe

The old machine was using spinning disks in RAID 1. I can’t really test much faster SSDs on that machine because I don’t have any that would fit, and the machine is now dismantled, but I was able to try the spinning disks on the Threadripper.

Build times with HDD vs NVMe vs RAM

Using spinning disks instead of modern solid-state makes the build almost 3 times as slow! (or an addition of almost 10 minutes). And while I kind of went overboard with this new machine by setting up not one but two NVMe PCIe 4.0 SSDs in RAID1, I also slowed them down by using Full Disk Encryption, but it doesn’t matter, because I get the same build times (within noise) if I put everything in memory (because I can).

It would be interesting to get more data with different generations of SSDs, though (SATA ones may still have a visible overhead, for instance).

Going back to the original comparison between the i7-870 and the downsized Threadripper, assuming the overhead from disk alone corresponds to what we see here (which it probably doesn’t, actually, because less parallelism means fewer concurrent accesses, which means fewer seeks, which means more speed), the speed difference now looks closer to “only” 2x.

Desktop vs. Laptop

My most recent laptop is a 3.5-year-old Dell XPS13 9360, which came with an i7-7500U (2 cores, 4 threads, because that’s all you could get in the 13″ form factor back then; 2.7GHz, 3.5GHz turbo), and 16GiB RAM.

A more recent 13″ laptop would be the Apple Macbook Pro 13″ Mid 2019, sporting an i7-8569U (4 cores, 8 threads, 2.8GHz, 4.7GHz turbo), and 16GiB RAM. I don’t own one, but Mozilla’s procurement system contains build times for it (although I don’t know what changeset that corresponds to, or what compilers were used; also, the OS is different).

Build times on laptops vs desktop

The XPS13 being old, it is subject to thermal throttling, making it slower than it should be, but it wouldn’t beat the 10-year-old desktop anyway. Macbook Pros tend to get into these thermal issues after a while too.

I’ve relied on laptops for a long time. My previous laptop before this XPS was another XPS, which is now about 6 to 7 years old, and while the newer one had more RAM, it was barely getting better build times compared to the older one when I switched. The evolution of laptop performance has been underwhelming for a long time, but things finally changed last year. At long last.

I wish I had numbers with a more recent laptop under the same OS as the XPS for fairer comparison. Or with the more recent larger laptops that sport even more cores, especially the fancy ones with Ryzen processors.

Local vs. Cloud

As seen above, my laptop was not really a great experience. Neither was my faster machine. When I needed power, I actually turned to AWS EC2. That’s not exactly cheap for long term use, but I only used it for a couple of hours at a time, on relatively rare occasions.

Build times on EC2 vs. Threadripper

The c5d instances are AFAIK, and as of writing, the most powerful you can get on EC2 CPU-wise. They are based on 3.0GHz Xeon Platinum CPUs, either 8124M or 8275CL (which don’t exist on Intel’s website. Edit: apparently, at least the 8275CL can turbo up to 3.9GHz). The 8124M is Skylake-based and the 8275CL is Cascade Lake, which I guess is the best you could get from Intel at the moment. I’m not sure if it’s a lottery of some sort between 8124M and 8275CL, or if it’s based on instance type, but the 4xlarge, 9xlarge and 18xlarge were 8124M and 12xlarge, 24xlarge and metal were 8275CL. Skylake or Cascade Lake doesn’t seem to make much difference here, but that would need to be validated on a non-virtualized environment with full control over the number of cores and threads being used.

Speaking of virtualization, one might wonder what kind of overhead it has, and as far as building Firefox goes, the difference between c5d.metal (without) and c5d.24xlarge (with), is well within noise.

As seen earlier, storage can have an influence on build times, and EC2 instances can use either EBS storage or NVMe. On the c5d instances, it made virtually no difference. I even compared with everything in RAM (on instances with enough of it), and that didn’t make a difference either. Take this with a grain of salt, though, because EBS can be slower on other types of instances. Also, I didn’t test NVMe or RAM on c5d.4xlarge, and those have slower networking so there could be a difference there.

There are EPYC-based m5a instances, but they are not Zen 2 and have lower frequencies, so they’re not all that good. Who knows when Amazon will deliver their Zen 2 offerings.

Number of cores

The Firefox build system is not perfect, and the number of jobs it’s allowed to run at once can have an influence on the overall build times, even on the same machine. This can somewhat be observed in the results on AWS above, but testing different values of -jn on a machine with a large number of cores is another way to investigate how well the build system scales. Here it is, on the Threadripper 3970X:

Build times at different -jn

Let’s pause a moment to appreciate that this machine can build slightly faster, with only 2 cores, than the old machine could with 4 cores and 8 threads.

You may also notice that the build times at -j6 and -j8 are better than the -j8 build time given earlier. More on that further below.

The data, however, is somewhat skewed as far as getting a clear picture of how well the Firefox build system scales, because the fewer cores and threads are being used, the faster those cores can get, thanks to “Turbo”. As mentioned earlier, the base frequency for the processor is 3.7GHz, but it can go up to 4.5GHz. In practice, /proc/cpuinfo doesn’t show much more than 4.45GHz when few cores are used, but on the opposite end, would frequently show 3.8GHz when all of them are used. Which means all the values in that last graph are not entirely comparable to each other.

So let’s see how things go with Turbo disabled (note, though, that I didn’t disable dynamic CPU frequency scaling):

Build times at different -jn without turbo

The ideal scaling (red line) is the build time at -j1 divided by the n in -jn.

So, at the scale of the graph above, things look like there’s some kind of constant overhead (the parts of the build that aren’t parallelized all that well), and a part that is within what you’d expect for parallelization, except above -j32 (we’ll get to that last part further down).

Based on the above, let’s assume a modeled build time of the form O + P / n. We can find some O and P that make the values for e.g. -j1 and -j32 correct. In minutes, that works out to be 1.9534 + 146.294 / n (or, in time format: 1:57.2 + 2:26:17.64 / n).
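With T1 and T32 being the measured build times at -j1 and -j32, solving T1 = O + P and T32 = O + P / 32 gives P = (T1 - T32) × 32 / 31 and O = T1 - P. Here is a minimal sketch that evaluates the resulting model for a few values of n, using the constants given above (in minutes); the model_minutes name is made up for illustration:

```shell
# Illustrative helper: model_minutes is a name made up for this sketch.
# The constants are the O and P values from the text, in minutes.
model_minutes() {
    # $1 is the n in -jn
    awk -v n="$1" 'BEGIN { printf "%.2f\n", 1.9534 + 146.294 / n }'
}

for n in 1 8 16 32; do
    echo "-j$n: $(model_minutes "$n") minutes"
done
```

Keep in mind the model was fitted on the data without Turbo, and only makes sense up to -j32.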

Let’s see how it fares:

Model vs. ideal

The red line is how far from the ideal case (-j1/n) the actual data is, in percent from the ideal case. At -j32, the build time is slightly more than 1.4 times what it ideally would be. At -j24, about 1.3 times, etc.

The blue line is how far from the modeled case the actual data is, in percent from the modeled case. The model is within 3% of reality (without Turbo) until -j32.

And above -j32, things blow up. And the reason is obvious: the CPU only has 32 actual cores. Which brings us to…

Simultaneous Multithreading (aka Hyperthreading)

Ever since the Pentium 4, many (and now most) modern desktop and laptop processors have had some sort of Simultaneous Multithreading (SMT) technology. The underlying principle is that CPUs are waiting for something a lot of the time (usually data in RAM), and that they could actually be executing something else when that happens.

From the software perspective, though, it only appears to be more cores than there physically are. So on a processor with 2 threads per core like the Threadripper 3970X, that means the OS exposes 64 “virtual” cores instead of the 32 physical ones.

When the workload doesn’t require all the “virtual” cores, the OS will schedule code to run on virtual cores that don’t share the same physical core, so as to get the best of one’s machine.

This is why -jn on a machine that has n real cores or more will have better build times than -jn on a machine with fewer cores, but at least n virtual cores.

This was already exposed in some way above, where the first -j8 build time shown was worse than the subsequent ones. That one was taken while disabling all physical cores but 4, and still keeping 2 threads per core. Let’s compare that with similar number of real cores:

Build times with 4 to 8 threads

As we can see, 4 cores with 8 threads is better than 4 cores, but it doesn’t beat 6 physical cores. Which kind of matches the general wisdom that SMT brings an extra 30% performance.

By the way, this being all on a Zen 2 processor, I’d expect a Ryzen 3 3300X to provide similar build times to that 4 cores 8 threads case.

Let’s look at 16 cores and 32 threads:

Build times with 16 to 32 threads

Similar observation: obviously better than 16 cores alone, but worse than 24 physical cores. And again, this being all on a Zen 2 processor, I’d expect a Ryzen 9 3950X to provide similar build times to the 16 cores and 32 threads case.

Based on the above, I’d estimate a Threadripper 3960X to have build times close to those of 32 real cores on the 3970X (32 cores is 33% more cores than 24).

Operating System

All the build times mentioned above except for the Macbook Pro have been taken under Debian GNU/Linux 10, building Firefox for Linux64 from a specific revision of mozilla-central without any tweaks.

Things get different if you build for a different target (say, Firefox for Windows), or under a different operating system.

Our baseline on this machine is the Firefox for Linux64 build, which takes 5 minutes. Building Firefox for Windows (as a cross-compilation) takes 40 more seconds (5:40), because of building some extra Windows-specific stuff.

A similar Firefox for Windows build, natively, on a fresh Windows 10 install takes … more than 12 minutes! Until you realize you can disable Windows Defender for the source and build tree, at which point it only takes 7:27. That’s still noticeably slower than cross-compiling, but not as catastrophic as when the antivirus is enabled.

Recent versions of Windows come with a Windows Subsystem for Linux (WSL), which allows running native Linux programs unmodified. There are now actually two versions of WSL. One emulates Linux system calls (WSL1), and the other runs a real Linux kernel in a virtual machine (WSL2). With a Debian GNU/Linux 10 system under WSL1, building Firefox for Linux64 takes 6:25, while it takes 5:41 under WSL2, so WSL2 is definitely faster, but still not close to metal.

Edit: Ironically, it’s very much possible that a cross-compiled Firefox for Windows build in WSL2 would be faster than a native Firefox for Windows build on Windows (but I haven’t tried).

Finally, a Firefox for Windows build, on a fresh Windows 10 install in a KVM virtual machine under the original Debian GNU/Linux system takes 8:17, which is a lot of overhead. Maybe there are some tweaks to do to get better build times, but it’s probably better to go with cross-compilation.

Let’s recap as a graph:

Build times on different OS configurations

Automation

All the build times mentioned so far have been building off mozilla-central without any tweaks. That’s actually not how a real Firefox release is built, which enables more optimizations. Without even going to the full set of optimizations that go into building a real Firefox Nightly (including Link Time Optimizations and Profile Guided Optimizations), simply switching --enable-release on will up the optimization game.

Building on Mozilla automation also enables things that don’t happen during what we call “local developer builds”, such as dumping debug info for use to process crash reports, linking a second libxul library for some unit tests, the full packaging of Firefox, the preparation of archives that will be used to run tests, etc.

As of writing, most Firefox builds on Mozilla automation happen on c5d.4xlarge (or similarly sized) AWS EC2 instances. They benefit from the use of a shared compilation cache, so their build times are very noisy, as they depend on the cache hit rate.

So, I took some Linux64 optimized build that runs on automation (not one with LTO or PGO), and ran its full script, with the shared compilation cache disabled, on corresponding and larger AWS EC2 instances, as well as the Threadripper:

Build times for a linux64/opt build as per automation

The immediate observation is that these builds scale much less gracefully than a “local developer build” does, and there are multiple factors that come into play, but the common thread is that parts of the build that don’t run during those “local developer builds” are slow and not well parallelized. Work is in progress to make things better, though.

Another large contributor is that with the higher level of optimizations, the compilation of Rust code takes longer. And compiling Rust doesn’t parallelize very well at the moment (for a variety of reasons, but the main one is the long sequences of crate dependencies, i.e. when A depends on B, which depends on C, which depends on D, etc.). So what happens the more cores are available, is that compiling all the C/C++ code finishes well before compiling all the Rust code does, and with the long tail of crate dependencies, what’s remaining doesn’t parallelize well. And that also blocks other portions of the build that need the compiled Rust code, and aren’t very well parallelized themselves.

All in all, it’s not all that great to go with bigger instances, because they cost more but won’t compensate by doing as many more builds in the same amount of time… except if doing more builds in parallel, which is being experimented with.

On the Threadripper 3970X, I can do 2 “local developer builds” in parallel in less time than doing them one after the other (1:20 less, so 8:40 instead of 10 minutes). Even with -j64. Heck, I can do 4 builds in parallel at -j16 in 16 minutes and 15 seconds, which is somewhere between a single -j12 and a single -j8, which is where one -j16 with 8 cores and 16 threads should be. That doesn’t seem all that useful, until you think of it this way: I can locally build for multiple platforms at once in well under 20 minutes. All on one machine, from one source tree.

Closing words

We looked at various ways build times can vary depending on different hardware (and even software) configurations. Note this only covers hardware parameters that didn’t require reboots to test out (e.g. this excludes variations like RAM speed; and yes, this means there are ways to disable turbo, cores and threads without fiddling with the BIOS, I’ll probably write a separate blog post about this).

We saw there is room for improvements in Firefox build times, especially on automation. But even for local builds, on machines like mine, it should be possible to get under 4 minutes eventually. Without any form of caching. Which is kind of mind blowing. Even without these future improvements, these build times changed the way I approach things. I don’t mind clobber builds anymore, although ideally we’d never need them.

If you’re in the market for a new machine, I’d advise getting a powerful desktop machine (based on a 3950X, a 3960X or a 3970X) rather than refreshing your aging laptop. I don’t think any laptop currently available would get below 15 minutes build times. And those that can build in less than 20 minutes probably cost more than a desktop machine that would build in well under half that. Edit: Also, pick a fast NVMe SSD that can sustain tons of IOPS.

Of course, if you use some caching method, build times will be much better even on a slower machine, but even with a cache, it happens quite often that you get a lot of cache misses that cancel out the benefit of the cache. YMMV.

You may also have legitimate concerns that rr doesn’t work yet, but when I need it, I can pull my old Intel-based laptop. rr actually works well enough on Ryzen for single-threaded workloads, and I haven’t needed to debug Firefox itself with rr recently, so I haven’t actually pulled the old laptop a lot (although I have been using rr and Pernosco).

2020-05-28 17:02:38+0900

p.m.o

Announcing git-cinnabar 0.5.5

Please partake in the git-cinnabar survey.

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on GitHub.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.4?

  • Updated git to 2.26.2 for the helper.
  • Improved experimental support for pushing merges.
  • Fixed a few issues with experimental support for python 3.
  • Don’t complain the helper is outdated if it’s newer.
  • Auto-enable graft when cinnabarclone contains a graft that can be fulfilled.
  • Exclude git-cinnabar notes and git-filter-branch backups from graft candidates.
  • Graft failures are now more silent.
  • Fixed handling of a null manifest.

2020-04-23 16:21:32+0900

cinnabar, p.m.o

Standing up the Cross-Compilation of Firefox for Windows on Linux

I’ve spent the past few weeks, and will spend the next few weeks, setting up cross-compiled builds of Firefox for Windows on Linux workers on Mozilla’s CI. Following is a long wall of text, if that’s too much for you, you may want to check the TL;DR near the end. If you’re a Windows user wondering about the Windows Subsystem for Linux, please at least check the end of the post.

What is it?

Traditionally, compiling software happens mostly on the platform it is going to run on. Obviously, this becomes less true when you’re building software that runs on smartphones, because you’re usually not developing on said smartphone. This is where Cross-Compilation comes in.

Cross-Compilation is compiling for a platform that is not the one you’re compiling on.

Cross-Compilation is less frequent for desktop software, because most developers will be testing the software on the machine they are building it with, which means building software for macOS on a Windows PC is not all that interesting to begin with.

Continuous Integration, on the other hand, in the era of “build pipelines”, doesn’t necessarily care that the software is built in the same environment as the one it runs on, or is being tested on.

But… why?

Five years ago or so, we started building Firefox for macOS on Linux. The main drivers, as far as I can remember, were resources and performance, and they were both tied: the only (legal) way to run macOS in a datacenter is to rack… Macs. And it’s not like Apple had been producing rackable, server-grade, machines. Okay, they have, but that didn’t last. So we were using aging Mac minis. Switching to Linux machines led to faster compilation times, and allowed to recycle the Mac minis to grow the pool running tests.

But, you might say, Windows runs on standard, rackable, server-grade machines. Or on virtually all cloud providers. And that is true. But for the same hardware, it turns out Linux performs better (more on that below), and the cost per hour per machine is also increased by the Windows license.

But then… why only now?

Firefox has a legacy of more than 20 years of development. That shows in its build system. All the things that allow cross-compiling Firefox for Windows on Linux only lined up recently.

The first of them is the compiler. You might interject with “mingw something something”, but the reality is that binary compatibility for accessibility (screen readers, etc.) and plugins (Flash is almost dead, but not quite) required Microsoft Visual C++ until recently. What changed the deal is clang-cl: Mozilla stopped using MSVC for the builds of Firefox it ships as of Firefox 63, about 20 months ago.

Another is the process of creating the symbol files used to process crash reports, which was using one of the tools from breakpad to dump the debug info from PDB files in the right format. Unfortunately, that was using a Windows DLL to do so. What recently changed is that we now have a platform-independent tool to do this, which doesn’t require that DLL. And to give credit where credit is due, this was thanks to the people from Sentry providing Rust crates for most of the pieces necessary to do so.

Another is the build system itself, which assumed in many places that building for Windows meant you were on Windows, which doesn’t help cross-compiling for Windows. But worse than that, it also assumed that the compiler was similar. This worked fine when cross-compiling for Android or macOS on Linux because compiling tools for the build itself (most notably a clang plugin) and compiling Firefox use compatible compilers, which take the same kind of arguments. The story is different when one of the compilers is clang, which has command line arguments like GCC, and the other is clang-cl, which has command line arguments like MSVC. This changed recently with work to allow building Android Geckoview on Windows (I’m not entirely sure all the pieces for that are there just yet, but the ones in place surely helped me; I might have inadvertently broken some things, though).

So how does that work?

The above is unfortunately not the whole story, so when I started looking a few weeks ago, the idea was to figure out how far off we were, and what kind of shortcuts we could take to make it happen.

It turns out we weren’t that far off, and for a few things, we could work around by… just running the necessary Windows programs with Wine with some tweaks to the build system (Ironically, that means the tool to create symbol files didn’t matter). For others… more on that further below.

But let’s start by looking at how you could try this for yourself, now that the blockers have been fixed.

First, what do you need?

  • A copy of Microsoft Visual C++. Unfortunately, we still need some of the tools it contains, like the assembler, as well as the platform development files.
  • A copy of the Windows 10 SDK.
  • A copy of the Windows Debug Interface Access (DIA) SDK.
  • A good old VFAT filesystem, large enough to hold a copy of all the above.
  • A WOW64-supporting version of Wine (wine64).
  • A full install of clang, including clang-cl (it usually comes along).
  • A copy of the Windows version of clang-cl (yes, both a Linux clang-cl and a Windows clang-cl are required at the moment, more on this further below).

Next, you need to set up a .mozconfig that sets the right target:

ac_add_options --target=x86_64-pc-mingw32

(Note: the target will change in the future)

You also need to set a few environment variables:

  • WINDOWSSDKDIR, with the full path to the base of the Windows 10 SDK in your VFAT filesystem.
  • DIA_SDK_PATH, with the full path to the base of the Debug Interface Access SDK in your VFAT filesystem.

You also need to ensure all the following are reachable from your $PATH:

  • wine64
  • ml64.exe (somewhere in the copy of MSVC in your VFAT filesystem, under a Hostx64/x64 directory)
  • clang-cl.exe (you also need to ensure it has the executable bit set)

And I think that’s about it. If not, please leave a comment or ping me on Matrix (@glandium:mozilla.org), and I’ll update the instructions above.

With an up-to-date mozilla-central, you should now be able to use ./mach build, and get a fresh build of Firefox for 64-bits Windows as a result (Well, not right now as of writing, the final pieces only just landed on autoland, they will be on mozilla-central in a few hours).
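Before starting a build, it can help to check that the environment variables and tools listed above are actually in place. The following is a minimal sketch; the check_cross_env name is made up for illustration, and the paths in the commented-out lines are hypothetical placeholders for wherever your VFAT filesystem is mounted:

```shell
# Hypothetical locations, to be adjusted to your own setup:
# export WINDOWSSDKDIR=/mnt/vfat/windows-sdk
# export DIA_SDK_PATH=/mnt/vfat/dia-sdk

# Illustrative helper: check_cross_env is a name made up for this sketch.
check_cross_env() {
    missing=0
    # Both variables must point at existing directories.
    for var in WINDOWSSDKDIR DIA_SDK_PATH; do
        eval "value=\${$var:-}"
        if [ -z "$value" ] || [ ! -d "$value" ]; then
            echo "unset or not a directory: $var"
            missing=1
        fi
    done
    # The tools must be reachable from $PATH.
    for tool in wine64 ml64.exe clang-cl.exe; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not found in PATH: $tool"
            missing=1
        fi
    done
    return $missing
}
```

Running check_cross_env prints one line per problem found and returns a non-zero status if anything is missing.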

What’s up with that VFAT filesystem?

You probably noticed I was fairly insistent about some things being in a VFAT filesystem. The reason is filesystem case-(in)sensitivity. As you probably know, filesystems on Windows are case-insensitive. If you create a file Foo, you can access it as foo, FOO, fOO, etc.

On Linux, filesystems are most usually case-sensitive. So when some C++ file contains #include "windows.h" and your filesystem actually contains Windows.h, things don’t align right. Likewise when the linker wants kernel32.lib and you have kernel32.Lib.
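To see the mismatch for yourself on a case-sensitive filesystem, you can look up a name while ignoring case and compare the result with what the source asks for. This is only a diagnostic sketch, not something the build uses; the find_ignoring_case name is made up for illustration:

```shell
# Illustrative helper: find_ignoring_case is a name made up for this sketch.
find_ignoring_case() {
    # $1: directory to look in, $2: file name as written in the source
    find "$1" -maxdepth 1 -iname "$2"
}
```

For example, find_ignoring_case on an SDK include directory with windows.h as the name would print the path to Windows.h if that is how the file is actually spelled on disk.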

Ext4 recently gained some optional case-insensitivity, but it requires a very recent kernel, and doesn’t work on existing filesystems. VFAT, however, as supported by Linux, has always(?) been case-insensitive. It is the simpler choice.

There’s another option, though, in the form of FUSE filesystems that wrap an existing directory to expose it as case-insensitive. That’s what I tried first, actually. CIOPFS does just that, with the caveat that you need to start from an empty directory, or an all-lowercase directory, because files with any uppercase characters in their name in the original directory don’t appear in the mountpoint at all. Unfortunately, the last version, from almost 9 years ago, doesn’t withstand parallelism: when several processes access files under the mountpoint, one or several of them get failures they wouldn’t otherwise get if they were working alone. So during my first attempts at cross-building Firefox I was actually using -j1. Needless to say, the build took a while, but it also made it more obvious when I hit something broken that needed fixing.

Now, on Mozilla CI, we can’t really mount a VFAT filesystem or use FUSE filesystems that easily. Which brings us to the next option: LD_PRELOAD. LD_PRELOAD is an environment variable that can be set to tell the dynamic loader (ld.so) to load a specified library when loading programs. Which in itself doesn’t do much, but the symbols the library exposes will take precedence over similarly named symbols from other libraries. Such as libc.so symbols. That makes it possible to divert e.g. open, opendir, etc. See where this is going? The library can divert the functions programs use to access files and change the paths the programs are trying to use on the fly.
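
To give an idea of the kind of path rewriting involved, here is a sketch in shell of a case-insensitive lookup. It only illustrates the principle: the function name is hypothetical, it is not how any of the libraries mentioned here are implemented, and it ignores dotfiles as well as "." and ".." components.

```shell
# Resolve relative path $2 under directory $1, matching each component
# without regard to case. Prints the on-disk spelling, or returns 1 if a
# component has no match. (resolve_nocase is an illustrative name.)
resolve_nocase() {
    current=$1
    rest=$2
    while [ -n "$rest" ]; do
        # Split off the first path component.
        component=${rest%%/*}
        case $rest in
            */*) rest=${rest#*/} ;;
            *)   rest= ;;
        esac
        # Skip empty components produced by leading or doubled slashes.
        [ -n "$component" ] || continue
        wanted=$(printf '%s' "$component" | tr '[:upper:]' '[:lower:]')
        match=
        for entry in "$current"/*; do
            [ -e "$entry" ] || continue
            name=${entry##*/}
            lowered=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
            if [ "$name" = "$component" ]; then
                # An exact match always wins.
                match=$name
                break
            elif [ "$lowered" = "$wanted" ] && [ -z "$match" ]; then
                # Otherwise keep the first case-insensitive match.
                match=$name
            fi
        done
        [ -n "$match" ] || return 1
        current=$current/$match
    done
    printf '%s\n' "$current"
}
```

With a tree containing Include/Windows.h, `resolve_nocase "$sdk" include/windows.h` (where $sdk is wherever that tree lives) would print the path with its real spelling. A preloaded library would do the equivalent inside the diverted open, opendir, etc. before calling the real function.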

Such libraries do exist, but I had issues with the few I tried. The most promising one was libcasefold, but building its dependencies turned out to be more work than it should have been, and the hooking it does via libsyscall_intercept is more hardcore than what I’m talking about above, and I wasn’t sure we wanted to support something that hotpatches libc.so machine code at runtime rather than divert it.

The result is that we now use our own, written in Rust (because who wants to write bullet-proof path munging code in C?). It can be used instead of a VFAT filesystem in the setup described above, but, being a hack, is not guaranteed to work in all setups.

So what’s up with needing clang-cl.exe?

One of the tools Firefox needs to build is the MIDL compiler. To do its work, the MIDL compiler uses a C preprocessor, and the Firefox build system makes it use clang-cl. Something amazing that I discovered while working on this is that Wine actually supports executing Linux programs from Windows programs. So it looked like it was going to be possible to use the Linux clang-cl for that. Unfortunately, that doesn’t quite work the same way executing a Windows program does from the parent process’s perspective, and the MIDL compiler ended up being unable to read the output from the preprocessor.

Technically speaking, we could have made the MIDL compiler use MSVC’s cl.exe as a preprocessor, since it conveniently is in the same directory as ml64.exe, meaning it is already in $PATH. But that would have been a step backwards, since we specifically moved off cl.exe.

Alternatively, it is also theoretically possible to compile with --disable-accessibility to avoid requiring the MIDL compiler at all, but that currently doesn’t work in practice. And while that would help for local builds, we still want to ship Firefox with accessibility on.

What about those compilation times, then?

Past my first attempts at -j1, I was able to get a Windows build on my Linux machine in slightly less than twice the time for a Linux build, which doesn’t sound great. Several things factor into this:

  • the build system isn’t parallelizing many of the calls to the MIDL compiler, and in practice that means the build sits there doing only that and nothing else (there are some known inefficiencies in the phase where this runs).
  • the build system isn’t parallelizing the calls to the Effect compiler (FXC), and this has the same effect on build times as the MIDL compiler above.
  • the above two wouldn’t actually be that much of a problem if … Wine wasn’t slow. When running full fledged applications or games, it really isn’t, but there is a very noticeable overhead when running a lot of short-lived processes. That accumulates to several minutes over a full Firefox compilation.

That third point may or may not be related to the version of Wine available in Debian stable (what I was compiling on), or how it’s compiled, but some lucky accident made things much faster on my machine.

See, we actually already have some Windows cross-compilation of Firefox on Mozilla CI, using mingw. Those were put in place to avoid breaking Tor Browser, because that’s how they build for Windows, and because not breaking the Tor Browser is important to us. And those builds are already using Wine for the Effect compiler (FXC).

But the Wine they use doesn’t support WOW64. So one of the first things necessary to set up 64-bits Windows cross-builds with clang-cl on Mozilla CI was to get a WOW64-supporting Wine. Following the Wine build instructions was more or less straightforward, but I hit a snag: it wasn’t possible to install the freetype development files for both the 32-bits version and the 64-bits version because the docker images where we build Wine are still based on Debian 9 for reasons, and the freetype development package was not multi-arch ready on Debian 9, while it now is on Debian 10.

Upgrading to Debian 10 is most certainly possible, but that has far more implications than what I was trying to achieve should require. You might ask “why are you building Wine anyways, you could use the Debian package”, to which I’d answer “it’s a good question, and I actually don’t know. I presume the version in Debian 9 was too old (it is older than the one we build)”.

Anyways, in the moment, while I happened to be reading Wine’s configure script to get things working, I noticed the option --without-x and thought “well, we’re not running Wine for any GUI stuff, how about I try that, that certainly would make things easy”. YOLO, right?

Not only did it work, but testing the resulting Wine on my machine, compilation times were now down to only be 1 minute slower than a Linux build, rather than 4.5 minutes! That was surely good enough to go ahead and try to get something running on CI.

Tell us about those compilation times already!

I haven’t given absolute values so far, mainly because my machine is not representative (I’ll have a blog post about that soon enough; you may have heard about it on Twitter, IRC or Slack, but I won’t give more details here), and because the end goal here is Mozilla automation, for both the actual release of Firefox (still a long way to go there), and the Try server. Those are what matters more to my fellow developers. Also, I actually haven’t built under Windows on my machine for a fair comparison.

So here it comes:

Build times on CI

Let’s unwrap a little:

  • The yellowish and magenta points are native Windows “opt” builds, on two different kinds of AWS instances.
  • The other points are Cross-Compilations with the same “opt” configuration on three different kinds of AWS instances, one of which is the same as one used for Windows, and another one having better I/O than all the others (the cyan circles).
  • We use a tool to share a compilation cache between builds on automation (sccache), which explains the very noisy nature of the build times, because they depend on the amount of source code changes and of the cache misses they induce.
  • The Cross-Compiled builds were turned on around the 27th of February and started about as fast as the native Windows builds were at the beginning of the graph, but the native builds had just seen a regression.
  • The regression was due to a recent change that made the clang plugin change in every build, which led to large numbers of cache misses.
  • After fixing the regression, the build times came back to their previous level on the native jobs.
  • Sccache handled clang-cl arguments in a way that broke cross-compilation, so when we turned on the cross-compiled jobs on automation, they actually had the cache turned off!
  • Let me state this explicitly because that wasn’t expected at all: the cross-compiled jobs WITHOUT a cache were as fast as native jobs WITH a cache!
  • A day later, after fixing sccache, we turned it on for the cross-compiled jobs, and build times dropped.
  • The week-end passed, and with more realistic work loads where actual changes to compiled code happen and invalidate parts of the cache, build times get more noisy but stay well under what they are on native Windows.

But the above only captures build times. On automation, a job does actually more than build. It also needs to get the source code, and install the tools needed to build. The latter is unfortunately not tracked at the moment, but the former is:

Clone times on CI

Now, for some explanation of the above graph:

  • The colors don’t match the previous graph. Sorry about that.
  • The colors vary by AWS instance type, and there is no actual distinction between Windows and Linux, so the instance type that is shared between them has values for both, which explains why it now looks bimodal.
  • It can be seen that the ones with better I/O (in red) are largely faster to get the source code, but also that for the shared instance type, Linux is noticeably faster.

It would be fair to say that independently of Windows vs. Linux, way too much time is spent getting the source code, and there’s other ongoing work to make things better.

TL;DR

Overall, the fast end of native Windows builds on Mozilla CI, including Try server, is currently around 45 minutes. That is the time taken by the entire job, and the minimum time between a developer pushing and Windows tests starting to run.

With Cross-Compilation, the fast end is, as of writing, 13 minutes, and can improve further.

As of writing, no actual Windows build job has switched over to Cross-compilation yet. Only an experimental, tier 2, job has been added. But the main jobs developers rely on on the Try server are going to switch real soon now™ (opt and debug for 32-bits, 64-bits and aarch64). Running all the test suites on Try against them yields successful results (modulo the usual known intermittent failures).

Actually shipping off Cross-compiled builds will take longer. We first need to understand the extent of the differences with the native builds and be confident that no subtle breakage happens. Also, PGO and LTO haven’t been tested so far. Everything will come in time.

What about Windows Subsystem for Linux (WSL)?

The idea to allow developers on Windows to build Firefox from WSL has floated for a while. The work to stand up Cross-compiled builds on automation has brought us the closest ever to actually being able to do it! If you’re interested in making it pass the finish line, please come talk to me in #build:mozilla.org on Matrix, there shouldn’t be much work left and we can figure it out (essentially, all the places using Wine would need to do something else, and… that’s it(?)). That should yield faster build times than natively with MozillaBuild.

2020-03-05 15:31:45+0900

p.m.o

Announcing git-cinnabar 0.5.4

Please partake in the git-cinnabar survey.

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.3?

  • Windows helper is dynamically linked against libcurl again. Static linkage was causing more problems than it was fixing.
  • Fix clonebundles support to ignore stream=v2 bundles.
  • Ignore graft cinnabarclones when not grafting.
  • Fixed a corner case where git cinnabar fsck would not skip files it was meant to skip and failed as a result.

2020-02-06 09:16:38+0900

cinnabar, p.m.o

Announcing git-cinnabar 0.5.3

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.2?

  • Updated git to 2.25.0 for the helper.
  • Fixed small memory leaks.
  • Combinations of remote ref styles are now allowed.
  • Added a git cinnabar unbundle command that allows to import a mercurial bundle.
  • Experimental support for python >= 3.5.
  • Fixed erroneous behavior of git cinnabar {hg2git,git2hg} with some forms of abbreviated SHA1s.
  • Fixed handling of the GIT_SSH environment variable.
  • Don’t eliminate closed tips when they are the only head of a branch.
  • Better handle manifests with double slashes created by hg convert from Mercurial < 2.0.1, and the following updates to those paths with normal Mercurial operations.
  • Fix compatibility with Mercurial libraries >= 3.0, < 3.4.
  • Windows helper is now statically linked against libcurl.

2020-01-18 11:49:48+0900

cinnabar, p.m.o

Five years of git-cinnabar

On this very day five years ago, I committed the initial code of what later became git-cinnabar. It is kind of an artificial anniversary, because I didn’t actually publish anything until 3 weeks later, and I also had some prototypes months earlier.

The earlier prototypes of what I’ll call “pre-git-cinnabar” could handle doing git clone hg::https://hg.mozilla.org/mozilla-central (that is, creating a git clone of a Mercurial repository), but they couldn’t git pull later. That pre-git-cinnabar initial commit, however, was the first version that did.

The state of the art back then was similar git helpers, the most popular choice being Felipec’s git-remote-hg, or the opposite tool: hg-git, a mercurial plugin that allows to push to a git repository.

They both had the same caveats: they were slow to handle a repository the size of mozilla-central back then, and both required a local mercurial repository (hidden in the .git directory in the case of Felipec’s git-remote-hg).

This is what motivated me to work on pre-git-cinnabar, which was also named git-remote-hg back then because of how git requires a git-remote-hg executable to handle hg::-prefixed urls.

Fast forward five years, mozilla-central has grown tremendously, and another mozilla-unified repository was created that aggregates the various mozilla branches (esr*, release, beta, central, integration/*).

git-cinnabar went through multiple versions, multiple changes to the metadata it keeps, and while I actually haven’t cumulatively worked all that much on it considering the number of years, a lot of progress has been made.

But let’s go back to the 19th of November 2014. Thankfully, Mercurial allows to strip everything past a certain date, artificially allowing to restore the state of the repository at that date. Unfortunately, pre-git-cinnabar only supports the old Mercurial bundle format, which both the mozilla-central and mozilla-unified repositories now don’t allow. So pre-git-cinnabar can’t clone them out of the box anymore. It’s still possible to allow it in mirror repositories, but because they now use generaldelta, that incurs a server-side conversion that is painfully slow (the hg.mozilla.org server rejects clients that don’t support the new format for this reason).
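
As a rough sketch of that date-based stripping, assuming a scratch clone you can afford to destroy and the strip extension that ships with Mercurial: the function names and the cutoff date are illustrative, not the exact commands used for the measurements in this post. Changeset dates are commit dates, so this only approximates what the repository contained on a given day.

```shell
# Build the revset selecting every changeset dated on or after $1
# (">DATE" in Mercurial date syntax means "on or after").
revset_from_date() {
    printf "date('>%s')\n" "$1"
}

# Strip those changesets (and their descendants) from the scratch clone
# at path $1. Never point this at a repository you want to keep.
strip_from_date() {
    hg --config extensions.strip= -R "$1" \
        strip --no-backup -r "$(revset_from_date "$2")"
}

# Hypothetical usage, on a throw-away clone:
#   strip_from_date /path/to/scratch-clone 2014-11-19
```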

So for testing purposes, I set up an nginx reverse-proxy and cache, such that the conversion only happens once, and performed clones multiple times, removing any bundling and conversion cost out of the equation. And tested the current version of Felipec’s git-remote-hg, the current version of hg-git, pre-git-cinnabar, and the last git-cinnabar release (0.5.2 as of writing), on some AWS instances, with Xeon Platinum 8124M 3GHz CPUs. That’s a different CPU from what I had back in 2014, yielding some different results from what I wrote in that first announcement.

I’ve thus cloned both mozilla-central (denoted m-c) and mozilla-unified (denoted m-u), with simulated old states of the repositories. Mozilla-unified didn’t exist before 2016, but it’s still interesting to simulate its content as if it had existed because it allows to learn how the tools perform with the additional branches it contains, with the implication they have on how the data is stored in the repository.

Note: I didn’t test older versions of git-remote-hg or hg-git to see how they performed at the time, and how things evolved for them.

Clone times in 2014

There are multiple things of note in the results above:

  • I wrote back then that cloning took 12 hours with git-remote-hg and 30 minutes with pre-git-cinnabar on the machine I used. And while cloning with pre-git-cinnabar on more modern hardware was much faster (16 minutes), cloning with git-remote-hg wasn’t. The pre-git-cinnabar improvement could, though, be attributed in part to improvements in git-fast-import itself (I haven’t tested older versions). But it’s also not impossible that git-remote-hg regressed. Only further testing would tell.
  • mozilla-unified is bigger than mozilla-central, because it is a superset, and that reflects on the clone times, but hg-git and pre-git-cinnabar are both much slower to clone mozilla-unified than you’d expect from the difference in size, especially hg-git. git-cinnabar made a lot of progress in that regard.
  • I hadn’t tested hg-git back then, but while it’s not as slow as git-remote-hg, it’s still horribly slow for a repository this size.

Let’s now look at the .git sizes:

.git sizes in 2014

Those are the sizes for the .git directory fresh after cloning. In all cases, git gc --aggressive would make the clone smaller, at the cost of CPU time (although not significantly smaller in the git-cinnabar case). And after you spent 12 hours cloning, are you really going to spend another large number of hours on a git gc to save disk space?

It is worth noting that in the case of hg-git, this doesn’t include the size of the mercurial repository required to maintain the git repository, while it is included for git-remote-hg, where it is hidden in .git, as mentioned earlier. That puts them about on par w.r.t size.

It’s interesting how close hg-git and git-remote-hg are in disk usage, when the former uses dulwich, a pure Python implementation of Git, and the latter uses git-fast-import. pre-git-cinnabar used git-fast-import too, but optimized the data it sent to git-fast-import to allow for a more compact .git. Recent git-cinnabar made it even better, although it doesn’t use git-fast-import directly, instead using a helper derived from git-fast-import.

But that was 2014. Let’s look at how things evolved over time, by taking “snapshots” of the repositories at one year interval, starting in November 2007.

Clone times over time

Of note:

  • pre-git-cinnabar somehow invalidated the nginx cache for years >= 2016 for mozilla-unified, which didn’t allow to get reliable measures.
  • Things went well out of hand with git-remote-hg and hg-git, so much so that I wasn’t able to get results for git-remote-hg clones for 2019 in time for this post. They’re now getting clone times that count in days rather than hours.
  • Things are getting much worse with mozilla-unified, relative to mozilla-central, for hg-git than they do for git-remote-hg or git-cinnabar, while they were really bad with pre-git-cinnabar.
  • pre-git-cinnabar clone times for mozilla-central are indistinguishable from git-cinnabar’s at this scale (but see further below).
  • the progression is not linear, but the progression in repository size wasn’t linear either. In order to get a slightly better picture, it is better to look at the clone times vs. the size of the repositories. One measure of that size is number of objects (changesets, manifests and file revisions they contain).

Clone times over repo size

The progression here looks more linear, but still not quite linear. The difference between the mozilla-central and mozilla-unified clone times is the most damning, especially for hg-git and pre-git-cinnabar. At this scale things don’t look so bad for git-cinnabar, but looking closer, they aren’t actually much better:

Clone times over repo size, pre-git-cinnabar and git-cinnabar only

mozilla-central clone times have slightly improved since pre-git-cinnabar days, at least more than the comparison with hg-git and git-remote-hg suggested. mozilla-unified clone times, however, have dramatically improved (notwithstanding the fact that it’s currently not possible to clone with pre-git-cinnabar at all directly from hg.mozilla.org).

But clone times are starting to get a little out of hand, especially for mozilla-unified, which is why I’ve recently added support for “clone bundles”. But I also have work in progress that I expect will make non-bundle clones faster too, and hopefully more linear.

As for .git sizes:

.git sizes over repo size

  • hg-git and git-remote-hg are still hand in hand.
  • Here the progression is mostly linear, with almost no difference between mozilla-central and mozilla-unified, as one could expect.
  • I think the larger increase in size between what would be 2017 and 2018 is largely due to the testing/web-platform/meta/MANIFEST.json file.
  • People who try to clone the Mozilla repositories with hg-git or git-remote-hg at this point better have a lot of time and a lot of free disk space.

While git-cinnabar is demonstrably significantly faster than both git-remote-hg and hg-git by a large margin for the Mozilla repositories (more than an order of magnitude), looking at the data more closely revealed something interesting that can be pictured in the following graph, plotting how much slower than git-cinnabar the other tools are.

Clone time ratios against git-cinnabar

The ratio is not constant, and has surprisingly been decreasing steadily since 2016, correlating with the observation that clone times are getting slower more quickly than the repositories are growing. But they are doing more so with git-cinnabar than with the other tools. Clone times with git-cinnabar have multiplied by more than 5 in five years, for a repository that only has 2.5 times more objects. At this pace, in five more years, clones will take well above 10 hours, and that’s not accounting for the accelerated slowdown. Hopefully, the current work in progress will help.
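
To make the arithmetic explicit, here is a small sketch. The function names are made up for the occasion, and the 2-hour figure in the second call is a placeholder chosen so the numbers are round, not a measurement read from the graphs.

```shell
# Growth of the cost per object: clone-time factor over object-count factor.
per_object_growth() {
    awk -v time_factor="$1" -v object_factor="$2" \
        'BEGIN { printf "%.1f\n", time_factor / object_factor }'
}

# Clone time, in hours, after one more period growing by the same factor.
project_clone_hours() {
    awk -v hours="$1" -v time_factor="$2" \
        'BEGIN { printf "%.1f\n", hours * time_factor }'
}

per_object_growth 5 2.5     # prints 2.0: each object costs twice as much
project_clone_hours 2 5     # prints 10.0 (2 hours is a placeholder input)
```

In other words, a clone that takes more than 2 hours today would end up above 10 hours if clone times multiplied by 5 again.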

It’s also interesting to see how the ratios changed after 2011 between mozilla-central and mozilla-unified. 2011 is when Firefox 4 was released and the release process switched to multiple repositories, which mozilla-unified, well, unified in a single repository. So mozilla-unified and mozilla-central were largely identical when stripped of commits after 2011 and started diverging afterwards.

To conclude this rather long post, pre-git-cinnabar improved the state of the art to clone large Mercurial repositories, and git-cinnabar went further in the past five years. But without more work, things will get out of hand. And that only accounts for clone times. I haven’t spent much time working on other aspects, like negotiation times (the time it takes to agree with a Mercurial server what the git clone has in common with it), or bundling times (the time it takes to generate a bundle to send a Mercurial server). Both are the relevant parts of push times.

2019-11-19 18:04:26+0900

cinnabar, p.m.o

Reproducing the Linux builds of Firefox 68

Starting with Firefox 68, the Linux builds shipped by Mozilla should be reproducible (it is not currently automatically validated that they definitely are, but 68.0 is). These builds are optimized with Profile Guided Optimization, and the profile data was not kept and published until recently, which is why they weren’t reproducible until now.

The following instructions require running Docker on a Linux host (this may or may not work on a non-Linux host, I don’t know what e.g. Docker for Mac does, and if the docker support in the mach command works with it). I’ll try to make them generic enough that they may apply to any subsequent release of Firefox.

  • Clone either the mozilla-unified or mozilla-release repository. You can use Mercurial or Git (with git-cinnabar), it doesn’t matter.
  • Checkout the FIREFOX_68_0_RELEASE tag and find out what its Mercurial changeset id is (it is 353628fec415324ca6aa333ab6c47d447ecc128e).
  • Open the Taskcluster index tool in a browser tab.
  • In the input field type or copy/paste gecko.v2.mozilla-release.shippable.revision.353628fec415324ca6aa333ab6c47d447ecc128e.firefox.linux64-opt and press the Enter key. (replace 353628fec415324ca6aa333ab6c47d447ecc128e with the right revision if you’re trying for another release)
  • This will fill the “Indexed Task” pane, where you will find a TaskId. Follow the link there, it will bring you to the corresponding Task Run Logs
  • Switch to the Task Details
  • Scroll down to the “Dependencies” list, and check the task name that begins with “build-docker-image”. For the Firefox 68 build task, it is build-docker-image-debian7-amd64-build.
  • Take that name, remove the “build-docker-image-” prefix, and run the following command, from inside the repository, to download the corresponding docker image:
    $ ./mach taskcluster-load-image debian7-amd64-build

    Obviously, replace debian7-amd64-build with whatever you found in the task dependencies. The image can also be built from the source tree, but this is out of scope for this post.

  • The command output will give you a docker run -ti ... command to try. Run it. It will open a shell in the docker image.
  • From the docker shell, run the following commands:
    $ echo no-api-key > /builds/mozilla-desktop-geoloc-api.key
    $ echo no-api-key > /builds/sb-gapi.data
    $ echo no-api-key > /builds/gls-gapi.data
    

    Or replace no-api-key with the actual keys if you have them.

  • Back to the Task Details, check the env part of the “Payload”. You’ll need to export all these variables with the corresponding values. e.g.
    $ export EXTRA_MOZHARNESS_CONFIG='{"update_channel": "release", "mozconfig_variant": "release"}'
    $ export GECKO_BASE_REPOSITORY='https://hg.mozilla.org/mozilla-unified'
    $ export GECKO_HEAD_REPOSITORY='https://hg.mozilla.org/releases/mozilla-release'
    ...
  • Set the missing TASKCLUSTER_ROOT_URL environment variable:
    $ export TASKCLUSTER_ROOT_URL='https://taskcluster.net'
  • Change the value of MOZHARNESS_ACTIONS to:
    $ export MOZHARNESS_ACTIONS='build'

    The original value contains get-secrets, which will try to download from http://taskcluster/, which will fail with a DNS error, and check-test, which runs make check, which is not necessary to get a working Firefox.

  • Take the command part of the “Payload”, and run that in the docker shell:
    $ /builds/worker/bin/run-task --gecko-checkout /builds/worker/workspace/build/src -- /builds/worker/workspace/build/src/taskcluster/scripts/builder/build-linux.sh
  • Once the build is finished, in another terminal, check what the container id of your running docker container is, and extract the build artifact from there:
    $ docker ps
    CONTAINER ID        IMAGE                                                                                  COMMAND             CREATED             STATUS              PORTS               NAMES
    d234383ba9c7        debian7-amd64-build:be96d1b734e1a152a861ce786861fca6e70bcb996bf67347f5af4f146db157ec   "bash"              2 hours ago         Up 2 hours                              nifty_hermann
    $ docker cp d234383ba9c7:/builds/worker/artifacts/target.tar.bz2 .

    (replace d234383ba9c7 with your container id)

  • Now you can exit the docker shell. That will remove the container.
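
Two of the steps above are plain string manipulation, and can be sketched as shell helpers if you want to apply them to another release. The function names are only for illustration. The index path layout is the one used above for Firefox 68 and is assumed, not guaranteed, to hold for later releases.

```shell
# Index path to paste in the Taskcluster index tool for changeset id $1.
index_path() {
    printf 'gecko.v2.mozilla-release.shippable.revision.%s.firefox.linux64-opt\n' "$1"
}

# Image name to give to ./mach taskcluster-load-image, derived from the
# name of the build-docker-image-* task found in the dependencies.
image_name() {
    printf '%s\n' "${1#build-docker-image-}"
}

index_path 353628fec415324ca6aa333ab6c47d447ecc128e
image_name build-docker-image-debian7-amd64-build   # prints debian7-amd64-build
```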

After all the above, you can finally compare your target.tar.bz2 to the Linux64 Firefox 68 release. You will find a few inevitable differences:

  • The .chk files will be different, because they are self-signatures for FIPS mode that are generated with one-time throw-away keys.
  • The Firefox 68 release contains .sig files that your build won’t contain. They are signature files, which aren’t reproducible outside Mozilla automation for obvious reasons.
  • Consequently, the precomplete file contains instructions for the .sig files in the Firefox 68 release that won’t be in your build.
  • The omni.ja files are different. If you extract them (they are uncompressed zip files with a few tweaks to the format), you’ll see the only difference is in modules/AppConstants.jsm, for the three API keys you created a file for earlier.

Everything else is identical bit for bit.
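
One possible way to check this, once both archives are extracted in separate directories, is sketched below. The function name is hypothetical. The omni.ja files are skipped entirely here, so they still need to be extracted and compared separately as described above.

```shell
# Compare two extracted builds ($1 and $2), skipping the files that are
# expected to differ. Prints the files that differ or exist on one side
# only. Exit status 0 means everything else is identical.
compare_builds() {
    # List every file present on either side, once.
    { (cd "$1" && find . -type f); (cd "$2" && find . -type f); } | sort -u |
    (
        status=0
        while IFS= read -r rel; do
            case $rel in
                *.chk|*.sig|*/precomplete|*/omni.ja) continue ;;
            esac
            # cmp fails both on content differences and on missing files.
            if ! cmp -s "$1/$rel" "$2/$rel" 2>/dev/null; then
                printf 'differs: %s\n' "$rel"
                status=1
            fi
        done
        exit "$status"
    )
}
```

For example, after `tar -xjf target.tar.bz2 -C mine` and the same for the release in another directory (both directory names being placeholders), `compare_builds mine/firefox release/firefox` should print nothing.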

All the above is a rather long list of manual steps. Ideally, most of it would be automated. We’re not there yet. We only recently got to the point where the profile data is available to make it possible at all. In other words, this is a starting point. It’s valuable to know it does work but requires manual steps and what those are.

It is also worth noting that while the above downloads and uses pre-built compilers and other tools, it is also possible to rebuild those, although they likely won’t be bit-for-bit identical. But differences in those shouldn’t incur differences in Firefox. Replacing the pre-built ones with ones you’d build yourself unfortunately currently requires some more manual work.

As for Windows and Mac builds, long story short, they are not reproducible as of writing. Mac builds are not optimized with PGO, but Windows builds are, and their profile data won’t be available until Firefox 69. Both platforms require SDKs that Mozilla can’t redistribute per their license (but are otherwise available for download from Microsoft or Apple, respectively), which makes the setup more complex. And in all likelihood, for both platforms, the toolchains are not deterministic yet (that’s at least true for Mac). Also, binary signatures would need to be stripped off the executables and libraries before any comparison.

2019-07-11 11:31:22+0900

p.m.o

Git now faster than Mercurial to clone Mozilla Mercurial repos

How is that for clickbait?

With the now released git-cinnabar 0.5.2, the cinnabarclone feature is enabled by default, which means it doesn’t need to be enabled manually anymore.

Cinnabarclone is to git-cinnabar what clonebundles is to Mercurial (to some extent). Clonebundles allow Mercurial to download a pre-generated bundle of a repository, which reduces work on the server side. Similarly, Cinnabarclone allows git-cinnabar to download a pre-generated bundle of the git form of a Mercurial repository.

Thanks to Connor Sheehan, who deployed the necessary extension and configuration on the server side, cinnabarclone is now enabled for mozilla-central and mozilla-unified, making git-cinnabar clone faster than ever for these repositories. In fact, under some conditions (mostly depending on network bandwidth), cloning with git-cinnabar is now faster than cloning with Mercurial:

$ time git clone hg::https://hg.mozilla.org/mozilla-unified mozilla-unified_git
Cloning into 'mozilla-unified_git'...
Fetching cinnabar metadata from https://index.taskcluster.net/v1/task/github.glandium.git-cinnabar.bundle.mozilla-unified/artifacts/public/bundle.git
Receiving objects: 100% (12153616/12153616), 2.67 GiB | 41.41 MiB/s, done.
Resolving deltas: 100% (8393939/8393939), done.
Reading 172 changesets
Reading and importing 170 manifests
Reading and importing 758 revisions of 570 files
Importing 172 changesets
It is recommended that you set "remote.origin.prune" or "fetch.prune" to "true".
git config remote.origin.prune true
or
git config fetch.prune true

Run the following command to update tags:
git fetch --tags hg::tags: tag "*"
Checking out files: 100% (279688/279688), done.

real    4m57.837s
user    9m57.373s
sys     0m41.106s

$ time hg clone https://hg.mozilla.org/mozilla-unified
destination directory: mozilla-unified
applying clone bundle from https://hg.cdn.mozilla.net/mozilla-unified/5ebb4441aa24eb6cbe8dad58d232004a3ea11b28.zstd-max.hg
adding changesets
adding manifests
adding file changes
added 537259 changesets with 3275908 changes to 523698 files (+13 heads)
finished applying clone bundle
searching for changes
adding changesets
adding manifests
adding file changes
added 172 changesets with 758 changes to 570 files (-1 heads)
new changesets 8b3c35badb46:468e240bf668
537259 local changesets published
updating to branch default
(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")
279688 files updated, 0 files merged, 0 files removed, 0 files unresolved

real    21m9.662s
user    21m30.851s
sys     1m31.153s

To be fair, the Mozilla Mercurial repos also have a faster “streaming” clonebundle that they only prioritize automatically if the client is on AWS currently, because they are much larger, and could take longer to download. But you can opt-in with the --stream command line argument:

$ time hg clone --stream https://hg.mozilla.org/mozilla-unified mozilla-unified_hg
destination directory: mozilla-unified_hg
applying clone bundle from https://hg.cdn.mozilla.net/mozilla-unified/5ebb4441aa24eb6cbe8dad58d232004a3ea11b28.packed1.hg
525514 files to transfer, 2.95 GB of data
transferred 2.95 GB in 51.5 seconds (58.7 MB/sec)
finished applying clone bundle
searching for changes
adding changesets
adding manifests
adding file changes
added 172 changesets with 758 changes to 570 files (-1 heads)
new changesets 8b3c35badb46:468e240bf668
updating to branch default
(warning: large working directory being used without fsmonitor enabled; enable fsmonitor to improve performance; see "hg help -e fsmonitor")
279688 files updated, 0 files merged, 0 files removed, 0 files unresolved

real    1m49.388s
user    2m52.943s
sys     0m43.779s

If you’re using Mercurial and can download 3GB in less than 20 minutes (in other words, if you can download faster than 2.5MB/s), you’re probably better off with the streaming clone.
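
The 2.5MB/s figure is simply the bundle size divided by the time budget. A minimal sketch of that arithmetic follows; `break_even_rate` is a hypothetical helper name, and the inputs are the rounded figures from the paragraph above (3GB taken as 3000MB, 20 minutes), not measurements:

```shell
# Hypothetical helper: minimum download rate (in MB/s) needed to fetch
# size_mb megabytes within the given number of minutes.
break_even_rate() {
    awk -v size_mb="$1" -v minutes="$2" \
        'BEGIN { printf "%.1f\n", size_mb / (minutes * 60) }'
}

# 3GB (taken as 3000MB) downloaded in 20 minutes
break_even_rate 3000 20   # prints 2.5
```

Plugging in your own bundle size and the time a non-streaming clone takes on your machine gives a rough threshold for your setup.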

Bonus fact: the Git clone is smaller than the Mercurial clone

The Mercurial streaming clone bundle contains data in a form close to what Mercurial puts on disk in the .hg directory, meaning the size of .hg is close to that of the clone bundle. The Cinnabarclone bundle contains a git pack, meaning the size of .git is close to that of the bundle, plus some more for the pack index file that unbundling creates.

The amazing fact, to my own surprise, is that the git pack, which contains the repository contents along with everything git-cinnabar needs to recreate Mercurial changesets, manifests and files from those contents, takes less space than the Mercurial streaming clone bundle.

And that translates into local repository size:

$ du -h -s --apparent-size mozilla-unified_hg/.hg
3.3G    mozilla-unified_hg/.hg
$ du -h -s --apparent-size mozilla-unified_git/.git
3.1G    mozilla-unified_git/.git

And because Mercurial creates so many files (essentially, two per file that ever was in the repository), the difference is even larger when you look at the space actually allocated on disk:

$ du -h -s mozilla-unified_hg/.hg
4.7G    mozilla-unified_hg/.hg
$ du -h -s mozilla-unified_git/.git
3.1G    mozilla-unified_git/.git
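
The gap between the two `du` modes comes from each file occupying whole filesystem blocks regardless of its length. A small sketch of that effect on a scratch directory, assuming GNU `du` (for `--apparent-size`); the 1000 one-byte files are only a stand-in for Mercurial's many small files, and the exact numbers depend on the filesystem:

```shell
# Create many tiny files in a scratch directory and compare the two
# du measurements (both reported in 1K units).
demo_dir=$(mktemp -d)
for i in $(seq 1 1000); do
    printf 'x' > "$demo_dir/file$i"
done

# Sum of file lengths (plus directory entries), ignoring block rounding.
apparent_kb=$(du -s -k --apparent-size "$demo_dir" | cut -f1)
# Space actually allocated on disk, rounded up to whole blocks per file.
on_disk_kb=$(du -s -k "$demo_dir" | cut -f1)
echo "apparent: ${apparent_kb}K, on disk: ${on_disk_kb}K"

rm -rf "$demo_dir"
```

On a typical filesystem with 4K blocks, the on-disk figure comes out far larger than the apparent one, which is the same effect seen on .hg above, just exaggerated.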

It’s even more mind-blowing when you consider that Mercurial happily creates delta chains of several thousand revisions, whereas the git pack’s longest delta chain is 250 (set arbitrarily at pack creation, by which I mean I didn’t pick a larger value because it didn’t make a significant difference). For casual readers: Git and Mercurial try to store each object revision as a diff/delta from a previous object revision, because that takes less space. You get a delta chain when that previous object revision is itself stored as a diff/delta from another object revision, which is itself stored as a diff/delta, and so on.

My guess is that the difference is mainly caused by the use of line-based deltas in Mercurial, but a Mercurial developer should probably take a deeper look. The fact that Mercurial cannot delta across file renames is another candidate.
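
The delta chain lengths on the Git side can be checked with `git verify-pack -v`, which ends its output with a histogram of lines of the form `chain length = N: M objects`. A minimal sketch for pulling the longest chain out of that output; `longest_chain` is a hypothetical helper name, not part of Git or git-cinnabar:

```shell
# Hypothetical helper: read `git verify-pack -v` output on stdin and print
# the longest delta chain found in its "chain length = N: M objects" lines.
longest_chain() {
    awk '/^chain length = / {
             n = $4; sub(/:$/, "", n)      # "250:" -> "250"
             if (n + 0 > max) max = n + 0
         }
         END { print max + 0 }'
}

# Usage, run from inside the Git clone:
#   git verify-pack -v .git/objects/pack/pack-*.idx | longest_chain
```

Note that `git verify-pack -v` checks and lists every object in the pack, so expect it to take a while on a repository of this size.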

2019-07-02 10:06:50+0900

p.m.o | 6 Comments »

Announcing git-cinnabar 0.5.2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It lets you clone, pull and push from/to mercurial remote repositories, using git.

Get it on GitHub.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.1?

  • Updated git to 2.22.0 for the helper.
  • cinnabarclone support is now enabled by default. See details in README.md and mercurial/cinnabarclone.py.
  • cinnabarclone now supports grafted repositories.
  • git cinnabar fsck now does incremental checks against last known good state.
  • Avoid git cinnabar sometimes thinking the helper is not up-to-date when it is.
  • Removing bookmarks on a Mercurial server is now working properly.

2019-07-01 14:17:21+0900

cinnabar, p.m.o | No Comments »