Archive for the 'p.m.o' Category

Announcing git-cinnabar 0.5.10

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.9?

  • Fixed exceptions during config initialization.
  • Fixed swapped error messages.
  • Fixed correctness issues with bundle chunks with no delta node.
  • This is probably the last 0.5.x release before 0.6.0.

2022-07-31 06:35:25+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.9

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.8?

  • Updated git to 2.37.1 for the helper.
  • Various python 3 fixes.
  • Fixed stream bundle handling.
  • Added python and py.exe as executables tried on top of python3 and python2.
  • Improved handling of ill-formed local urls.
  • Fixed using old mercurial libraries that don’t support bundlev2 with a server that does.
  • When fsck reports the metadata as broken, prevent further updates to the repo.
  • When issue #207 is detected, mark the metadata as broken.
  • Added support for logging redirection to a file.
  • Now ignore refs/cinnabar/replace/ refs, and always use the corresponding metadata instead.
  • Various git cinnabar fsck fixes.

2022-07-16 07:11:55+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.8

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.7?

  • Updated git to 2.34.0 for the helper.
  • Python 3.5 and newer are now officially supported. Git-cinnabar will try to use the python3 program by default, but will fall back to python2.7 if that’s where the Mercurial libraries are available. It is possible to pick a specific python with the GIT_CINNABAR_PYTHON environment variable (see the example after this list).
  • Fixed compatibility with Mercurial 5.8 and newer.
  • The prebuilt binaries are now optimized on arm64 macOS and Windows.
  • git cinnabar download now properly returns an error code when failing to extract the prebuilt binaries.
  • Pushing to a non-empty Mercurial repository without having pulled at least once from it is now prevented.
  • Replaced the nagging about fsck with a smaller check always happening after pulling.
  • Fail earlier on git fetch hg::url <sha1> (it would properly fetch the Mercurial changeset and its ancestors, but git would fail at the end because the sha1 is not a git sha1; use git cinnabar fetch instead).
  • Minor fixes.
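
For illustration, here is what picking a specific Python interpreter could look like (just a sketch; the repository URL is a placeholder):

$ GIT_CINNABAR_PYTHON=python3 git clone hg::https://hg.example.org/some-repo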

2021-11-20 07:05:57+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.7

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.6?

  • Updated git to 2.31.1 for the helper.
  • When using git >= 2.31.0, git -c config=value ... works again.
  • Minor fixes.

2021-04-01 07:50:50+0900

cinnabar, p.m.o | No Comments »

6000

It seems to be a month of anniversaries for me.

Yesterday, I was randomly checking how many commits I had in mozilla-central, and the answer was 5999. Now that I’ve pushed something else, I’ve reached my 6000th commit.

This made me go down a small rabbit hole, and I realized more anniversaries are coming:

  • My bugzilla.mozilla.org account is going to turn 19 on March 21/22 depending on timezones. Bugzilla itself is saying the account was created in August 2002, but that’s a lie. I still have the email I received at account creation:
    Received: from mothra.mozilla.org ([207.200.81.216])
        by ...
        id 16oE0n-0003UA-00
        for <...>; Fri, 22 Mar 2002 02:38:34 +0100
    Received: (from nobody@localhost)
        by mothra.mozilla.org id g2M1eXf26844;
        Thu, 21 Mar 2002 17:40:33 -0800 (PST)
    Date: Thu, 21 Mar 2002 17:40:33 -0800 (PST)
    Message-Id: <200203220140.g2M1eXf26844@mothra.mozilla.org>
    From: bugzilla-daemon@mozilla.org
    Subject: mozilla.org Bugzilla Account Information
    
  • The first bug I ever filed will also turn 19 on the same day because I filed it right after opening the account (which is further evidence that the user profile information is bogus).
  • The oldest bug I filed that is still open will turn 15 on March 27.
  • My first review (or at least the earliest I could find in mozilla-central) will turn 11 on March 25.
  • My application for commit access level 3 will turn 11 on March 30 (but I only actually got access 21 days later).

Interestingly, all the above happened before I joined Mozilla as paid staff, although the latter two happened in the same year I did join (September 2010).

As for commits, the earliest commit that is attributed to me as author (and thus the first of the 6000) was in July 2008. It predates my commit access by almost two years. The first commit I pushed myself to mozilla-central was on the same day I received access. I actually pushed 15 patches at once that day. My last “checkin-needed” patch was a few days before that, and was landed by Mossop, who is still at Mozilla.

But digging deeper, I was reminded that attribution worked differently before the Mozilla repository moved to Mercurial: the commit message itself would contain “Patch by x” or “p=x”. Following this trail, my oldest landed patch seems to have happened in August 2005. This practice actually survived the death of CVS, so there are Mercurial changesets with this pattern that thus don’t count in the 6000. The last one of them seems to have been in May 2008. There are probably fewer than 40 such commits across CVS and Mercurial, though, which would still put the real 6000th somewhere this month.

Now, ignoring the above extra commits, if I look at individual commit one-line summaries, I can see that 5015 of them appear once (as you’d expect), 385 (!) twice, 61 (!!) three times, and EIGHT (!!!) four times. Some patches just have a hard time sticking. For the curious, those 8 that took 4 attempts are:

Now that I see this list, I can totally see how they could have bounced multiple times.

Those multiple attempts at landing some patches transform those 6000 commits into 5469 unique commits that either stuck or didn’t but never re-landed (I think there are a few of those). That puts the rejection rate slightly below 10% ((6000 – 5469) / 5469 = 9.7%). I would have thought I was on the high end, but a quick and dirty estimate for the entirety of mozilla-central seems to indicate the overall rejection rate is around 10% too. I guess I’m average.

The ten files I touched the most? (not eliminating multiple attempts at landing something)

  • configure.in, 342 times.
  • config/rules.mk, 256 times.
  • old-configure.in, 251 times.
  • build/moz.configure/toolchain.configure, 196 times.
  • memory/build/mozjemalloc.cpp, 189 times.
  • config/config.mk, 167 times.
  • build/moz.configure/old.configure, 159 times.
  • python/mozbuild/mozbuild/backend/recursivemake.py, 154 times.
  • build/moz.configure/old.configure, 150 times.
  • js/src/configure.in, 148 times.

So, mostly build system, but also memory allocator.

Total number of files touched? 10510. Only 3929 of them still exist today. I didn’t look at how many have simply moved around (and are counted multiple times in the total).

2021-03-17 20:02:01+0900

p.m.o | 1 Comment »

5 years ago, Firefox (re)entered Debian

5 years ago today, I was declaring Iceweasel dead, and Firefox was making a comeback in Debian. I hadn’t planned to make this post, and in fact, I thought it had been much longer. But coincidentally, I was binge-watching Mr. Robot recently, which prominently featured Iceweasel.

iceweasel command in a terminal

Mr. Robot is set in the year 2015, and I was surprised that Iceweasel was being used, which led me to search for that post where I announced Firefox was back… and to realize that we were close to the 5-year mark. Well, we are at the 5-year mark now.

iceweasel start page

I’d normally say time flies, but it turns out it hasn’t flown as much as I thought it did. I wonder if the interminable pandemic is to blame for that.

2021-03-10 15:16:02+0900

firefox, p.m.o | 4 Comments »

Announcing git-cinnabar 0.5.6

Please partake in the git-cinnabar survey.

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.5?

  • Updated git to 2.29.2 for the helper.
  • git cinnabar git2hg and git cinnabar hg2git now have a --batch flag.
  • Fixed a few issues with experimental support for python 3.
  • Fixed compatibility issues with mercurial >= 5.5.
  • Avoid downloading unsupported clonebundles.
  • Provide more resilience to network problems during bundle download.
  • Prebuilt helper for Apple Silicon macOS now available via git cinnabar download.

2020-11-12 11:40:13+0900

cinnabar, p.m.o | No Comments »

[Linux] Disabling CPU turbo, cores and threads without rebooting

[Disclaimer: this has been sitting as a draft for close to three months; I forgot to publish it, and it is now finally done.]

In my previous blog post, I built Firefox in a number of different configurations where I’d disable the CPU turbo, some of its cores, or some of its threads. That is something that was traditionally done via the BIOS, but rebooting between each attempt is not really a great experience.

Fortunately, the Linux kernel provides a large number of knobs that allow this at runtime.

Turbo

This is the most straightforward:

$ echo 0 > /sys/devices/system/cpu/cpufreq/boost

Re-enable with

$ echo 1 > /sys/devices/system/cpu/cpufreq/boost
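
Note that this cpufreq/boost knob is not available with every cpufreq driver. On Intel machines using the intel_pstate driver, the equivalent knob (with inverted logic) should be:

$ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo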

CPU frequency throttling

Even though I haven’t mentioned it, I might as well add this briefly. There are many knobs to tweak frequency throttling, but assuming your goal is to disable throttling and set the CPU frequency to its fastest non-Turbo frequency, this is how you do it:

$ echo performance > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor

where $n is the id of the core you want to do that for, so if you want to do that for all the cores, you need to do that for cpu0, cpu1, etc.
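For example, a small shell loop applies it to all of them at once:

$ for cpu in /sys/devices/system/cpu/cpu[0-9]*; do echo performance > $cpu/cpufreq/scaling_governor; done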

Re-enable with:

$ echo ondemand > /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor

(assuming this was the value before you changed it; ondemand is usually the default)
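
If you’re not sure what the previous value was, you can check the current and available governors before touching anything:

$ cat /sys/devices/system/cpu/cpu$n/cpufreq/scaling_governor
$ cat /sys/devices/system/cpu/cpu$n/cpufreq/scaling_available_governors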

Cores and Threads

This one requires some attention, because you cannot assume anything about the CPU numbers. The first thing you want to do is to check those CPU numbers. You can do so by looking at the physical id and core id fields in /proc/cpuinfo, but the output from lscpu --extended is more convenient, and looks like the following:

CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ    MINMHZ
0   0    0      0    0:0:0:0       yes    3700.0000 2200.0000
1   0    0      1    1:1:1:0       yes    3700.0000 2200.0000
2   0    0      2    2:2:2:0       yes    3700.0000 2200.0000
3   0    0      3    3:3:3:0       yes    3700.0000 2200.0000
4   0    0      4    4:4:4:1       yes    3700.0000 2200.0000
5   0    0      5    5:5:5:1       yes    3700.0000 2200.0000
6   0    0      6    6:6:6:1       yes    3700.0000 2200.0000
7   0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)
32  0    0      0    0:0:0:0       yes    3700.0000 2200.0000
33  0    0      1    1:1:1:0       yes    3700.0000 2200.0000
34  0    0      2    2:2:2:0       yes    3700.0000 2200.0000
35  0    0      3    3:3:3:0       yes    3700.0000 2200.0000
36  0    0      4    4:4:4:1       yes    3700.0000 2200.0000
37  0    0      5    5:5:5:1       yes    3700.0000 2200.0000
38  0    0      6    6:6:6:1       yes    3700.0000 2200.0000
39  0    0      7    7:7:7:1       yes    3700.0000 2200.0000
(...)

Now, this output is actually the ideal case, where pairs of CPUs (virtual cores) on the same physical core are always n, n+32, but I’ve had them be pseudo-randomly spread in the past, so be careful.

To turn off a core, you want to turn off all the CPUs with the same CORE identifier. To turn off a thread (virtual core), you want to turn off one CPU. On machines with multiple sockets, you can also look at the SOCKET column.

Turning off one CPU is done with:

$ echo 0 > /sys/devices/system/cpu/cpu$n/online

Re-enable with:

$ echo 1 > /sys/devices/system/cpu/cpu$n/online
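
As an illustration, assuming the sibling pairing from the lscpu output above (CPU n and CPU n+32 sharing a physical core), turning off the second thread of the first 8 cores would look like this; the topology files let you double-check which CPUs actually share a core:

$ cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,32
$ for n in $(seq 32 39); do echo 0 > /sys/devices/system/cpu/cpu$n/online; done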

Extra: CPU sets

CPU sets are a feature of Linux’s cgroups. They allow restricting groups of processes to a set of cores. The first step is to create a group like so:

$ mkdir /sys/fs/cgroup/cpuset/mygroup

Please note you may already have existing groups, and you may want to create subgroups. You can do so by creating subdirectories.

Then you can configure which CPUs/cores/threads you want processes in this group to run on:

$ echo 0-7,16-23 > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus

The value you write in this file is a comma-separated list of CPU/core/thread numbers or ranges. 0-3 is the range for CPU/core/thread 0 to 3 and is thus equivalent to 0,1,2,3. The numbers correspond to /proc/cpuinfo or the output from lscpu as mentioned above.

There are also memory aspects to CPU sets that I won’t detail here (because I don’t have a machine with multiple memory nodes), but you can start with:

$ cat /sys/fs/cgroup/cpuset/cpuset.mems > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems

Now you’re ready to assign processes to this group:

$ echo $pid >> /sys/fs/cgroup/cpuset/mygroup/tasks
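
A convenient pattern, since child processes inherit their parent’s cgroup, is to put your current shell in the group, so that anything you subsequently launch from it is automatically confined:

$ echo $$ >> /sys/fs/cgroup/cpuset/mygroup/tasks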

There are a number of tweaks you can do to this setup; I invite you to check out the cpuset(7) manual page.

Disabling a group is a little involved. First you need to move the processes to a different group:

$ while read pid; do echo $pid > /sys/fs/cgroup/cpuset/tasks; done < /sys/fs/cgroup/cpuset/mygroup/tasks

Then deassociate CPU and memory nodes:

$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.cpus
$ > /sys/fs/cgroup/cpuset/mygroup/cpuset.mems

And finally remove the group:

$ rmdir /sys/fs/cgroup/cpuset/mygroup

2020-08-31 07:00:38+0900

p.d.o, p.m.o | 1 Comment »

The influence of hardware on Firefox build times

I recently upgraded my aging “fast” build machine. Back when I assembled the machine, it could do a full clobber build of Firefox in about 10 minutes. That was slightly more than 10 years ago. This upgrade, and the build times I’m getting on the brand new machine (now 6 months old) and other machines, led me to look at how some parameters influence build times.

Note: most of the data that follows was gathered a few weeks ago, building off Mercurial revision 70f8ce3e2d394a8c4d08725b108003844abbbff9 of mozilla-central.

Old vs. new

The old “fast” build machine had an i7-870, with 4 cores, 8 threads running at 2.93GHz, turbo at 3.6GHz, and 16GiB RAM. The new machine has a Threadripper 3970X, with 32 cores, 64 threads running at 3.7GHz, turbo at 4.5GHz (rarely reached to be honest), and 128GiB RAM (unbuffered ECC).

Let’s compare build times between them, at the same level of parallelism:

Build times at -j8

That is 63 minutes for the i7-870 vs. 26 for the Threadripper, with a twist: the Threadripper was explicitly configured to use 4 physical cores (with 2 threads each) so as to be fair to the poor i7-870.

Assuming the i7 maxed out at its base clock, and the Threadripper at turbo speed (which it actually doesn’t, but it’s closer to the truth than the base clock is with an eighth of the cores activated), the speed-up from the difference in frequency alone would make the build 1.5 times faster, but we got close to 2.5 times faster.

But that doesn’t account for other factors we’ll explore further below.

Before going there, let’s look at what unleashing the full power of the Threadripper brings to the table:

Build times at -j8 vs. -j64

Yes, that is 5 minutes for the Threadripper 3970X, when using all its cores and threads.

Spinning disk vs. NVMe

The old machine was using spinning disks in RAID 1. I can’t really test much faster SSDs on that machine because I don’t have any that would fit, and the machine is now dismantled, but I was able to try the spinning disks on the Threadripper.

Build times with HDD vs NVMe vs RAM

Using spinning disks instead of modern solid-state makes the build almost 3 times as slow! (or an addition of almost 10 minutes). And while I kind of went overboard with this new machine by setting up not one but two NVMe PCIe 4.0 SSDs in RAID1, I also slowed them down by using Full Disk Encryption, but it doesn’t matter, because I get the same build times (within noise) if I put everything in memory (because I can).

It would be interesting to get more data with different generations of SSDs, though (SATA ones may still have a visible overhead, for instance).

Going back to the original comparison between the i7-870 and the downsized Threadripper, assuming the overhead from disk alone corresponds to what we see here (which it probably doesn’t, actually, because less parallelism means fewer concurrent accesses, which means fewer seeks, which means more speed), the speed difference now looks closer to “only” 2x.

Desktop vs. Laptop

My most recent laptop is a 3.5-year-old Dell XPS13 9360, which came with an i7-7500U (2 cores, 4 threads, because that’s all you could get in the 13″ form factor back then; 2.7GHz, 3.5GHz turbo), and 16GiB RAM.

A more recent 13″ laptop would be the Apple Macbook Pro 13″ Mid 2019, sporting an i7-8569U (4 cores, 8 threads, 2.8GHz, 4.7GHz turbo), and 16GiB RAM. I don’t own one, but Mozilla’s procurement system contains build times for it (although I don’t know what changeset that corresponds to, or what compilers were used; also, the OS is different).

Build times on laptops vs desktop

The XPS13 being old, it is subject to thermal throttling, making it slower than it should be, but it wouldn’t beat the 10-year-old desktop anyway. Macbook Pros tend to get into these thermal issues after a while too.

I’ve relied on laptops for a long time. My previous laptop before this XPS was another XPS, which is now about 6 to 7 years old, and while the newer one had more RAM, it was barely getting better build times compared to the older one when I switched. The evolution of laptop performance has been underwhelming for a long time, but things finally changed last year. At long last.

I wish I had numbers with a more recent laptop under the same OS as the XPS for fairer comparison. Or with the more recent larger laptops that sport even more cores, especially the fancy ones with Ryzen processors.

Local vs. Cloud

As seen above, my laptop was not really a great experience. Neither was my faster machine. When I needed power, I actually turned to AWS EC2. That’s not exactly cheap for long-term use, but I only used it for a couple hours at a time, on relatively rare occasions.

Build times on EC2 vs. Threadripper

The c5d instances are, AFAIK and as of writing, the most powerful you can get on EC2 CPU-wise. They are based on 3.0GHz Xeon Platinum CPUs, either 8124M or 8275CL (neither of which exists on Intel’s website. Edit: apparently, at least the 8275CL can turbo up to 3.9GHz). The 8124M is Skylake-based and the 8275CL is Cascade Lake, which I guess is the best you could get from Intel at the moment. I’m not sure if it’s a lottery of some sort between 8124M and 8275CL, or if it’s based on instance type, but the 4xlarge, 9xlarge and 18xlarge were 8124M, and the 12xlarge, 24xlarge and metal were 8275CL. Skylake or Cascade Lake doesn’t seem to make much difference here, but that would need to be validated on a non-virtualized environment with full control over the number of cores and threads being used.

Speaking of virtualization, one might wonder what kind of overhead it has, and as far as building Firefox goes, the difference between c5d.metal (without) and c5d.24xlarge (with) is well within noise.

As seen earlier, storage can have an influence on build times, and EC2 instances can use either EBS storage or NVMe. On the c5d instances, it made virtually no difference. I even compared with everything in RAM (on instances with enough of it), and that didn’t make a difference either. Take this with a grain of salt, though, because EBS can be slower on other types of instances. Also, I didn’t test NVMe or RAM on c5d.4xlarge, and those have slower networking so there could be a difference there.

There are EPYC-based m5a instances, but they are not Zen 2 and have lower frequencies, so they’re not all that good. Who knows when Amazon will deliver their Zen 2 offerings.

Number of cores

The Firefox build system is not perfect, and the number of jobs it’s allowed to run at once can have an influence on the overall build times, even on the same machine. This can somehow be observed on the results on AWS above, but testing different values of -jn on a machine with a large number of cores is another way to investigate how well the build system scales. Here it is, on the Threadripper 3970X:

Build times at different -jn

Let’s pause a moment to appreciate that this machine can build slightly faster, with only 2 cores, than the old machine could with 4 cores and 8 threads.

You may also notice that the build times at -j6 and -j8 are better than the -j8 build time given earlier. More on that further below.

The data, however, is somewhat skewed as far as getting a clear picture of how well the Firefox build system scales, because the fewer cores and threads are being used, the faster those cores can get, thanks to “Turbo”. As mentioned earlier, the base frequency for the processor is 3.7GHz, but it can go up to 4.5GHz. In practice, /proc/cpuinfo doesn’t show much more than 4.45GHz when few cores are used, but on the opposite end, it would frequently show 3.8GHz when all of them are used. Which means all the values in that last graph are not entirely comparable to each other.

So let’s see how things go with Turbo disabled (note, though, that I didn’t disable dynamic CPU frequency scaling):

Build times at different -jn without turbo

The ideal scaling (red line) is the build time at -j1 divided by the n in -jn.

So, at the scale of the graph above, things look like there’s some kind of constant overhead (the parts of the build that aren’t parallelized all that well), and a part that is within what you’d expect for parallelization, except above -j32 (we’ll get to that last part further down).

Based on the above, let’s assume a modeled build time of the form O + P / n. We can find some O and P that make the values for e.g. -j1 and -j32 correct. That works out to be 1.9534 + 146.294 / n (or, in time format: 1:57.2 + 2:26:17.64 / n).
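
As a quick sanity check of the arithmetic: at -j32, the model gives 1.9534 + 146.294 / 32 ≈ 6.53 minutes (about 6:31), and at -j8, 1.9534 + 146.294 / 8 ≈ 20.24 minutes (about 20:14).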

Let’s see how it fares:

Model vs. ideal

The red line is how far from the ideal case (-j1/n) the actual data is, in percent from the ideal case. At -j32, the build time is slightly more than 1.4 times what it ideally would be. At -j24, about 1.3 times, etc.

The blue line is how far from the modeled case the actual data is, in percent from the modeled case. The model is within 3% of reality (without Turbo) until -j32.

And above -j32, things blow up. And the reason is obvious: the CPU only has 32 actual cores. Which brings us to…

Simultaneous Multithreading (aka Hyperthreading)

Ever since the Pentium 4, many (and now most) modern desktop and laptop processors have had some sort of Simultaneous Multithreading (SMT) technology. The underlying principle is that CPUs are waiting for something a lot of the time (usually data in RAM), and that they could actually be executing something else when that happens.

From the software perspective, though, it only appears to be more cores than there physically are. So on a processor with 2 threads per core like the Threadripper 3970X, that means the OS exposes 64 “virtual” cores instead of the 32 physical ones.

When the workload doesn’t require all the “virtual” cores, the OS will schedule code to run on virtual cores that don’t share the same physical core, so as to get the best of one’s machine.

This is why -jn on a machine that has n real cores or more will have better build times than -jn on a machine with fewer cores, but at least n virtual cores.

This was already exposed in some way above, where the first -j8 build time shown was worse than the subsequent ones. That one was taken while disabling all physical cores but 4, and still keeping 2 threads per core. Let’s compare that with similar numbers of real cores:

Build times with 4 to 8 threads

As we can see, 4 cores with 8 threads is better than 4 cores, but it doesn’t beat 6 physical cores. Which kind of matches the general wisdom that SMT brings an extra 30% performance.

By the way, this being all on a Zen 2 processor, I’d expect a Ryzen 3 3300X to provide similar build times to that 4 cores 8 threads case.

Let’s look at 16 cores and 32 threads:

Build times with 16 to 32 threads

Similar observation: obviously better than 16 cores alone, but worse than 24 physical cores. And again, this being all on a Zen 2 processor, I’d expect a Ryzen 9 3950X to provide similar build times to the 16 cores and 32 threads case.

Based on the above, I’d estimate a Threadripper 3960X to have build times close to those of 32 real cores on the 3970X (32 cores is 33% more cores than 24).

Operating System

All the build times mentioned above except for the Macbook Pro have been taken under Debian GNU/Linux 10, building Firefox for Linux64 from a specific revision of mozilla-central without any tweaks.

Things get different if you build for a different target (say, Firefox for Windows), or under a different operating system.

Our baseline on this machine, the Firefox for Linux64 build, is 5 minutes. Building Firefox for Windows (as a cross-compilation) takes 40 more seconds (5:40), because of building some extra Windows-specific stuff.

A similar Firefox for Windows build, natively, on a fresh Windows 10 install takes … more than 12 minutes! Until you realize you can disable Windows Defender for the source and build tree, at which point it only takes 7:27. That’s still noticeably slower than cross-compiling, but not as catastrophic as when the antivirus is enabled.

Recent versions of Windows come with a Windows Subsystem for Linux (WSL), which allows running native Linux programs unmodified. There are now actually two versions of WSL. One emulates Linux system calls (WSL1), and the other runs a real Linux kernel in a virtual machine (WSL2). With a Debian GNU/Linux 10 system under WSL1, building Firefox for Linux64 takes 6:25, while it takes 5:41 under WSL2, so WSL2 is definitely faster, but still not as fast as bare metal.

Edit: Ironically, it’s very much possible that a cross-compiled Firefox for Windows build in WSL2 would be faster than a native Firefox for Windows build on Windows (but I haven’t tried).

Finally, a Firefox for Windows build, on a fresh Windows 10 install in a KVM virtual machine under the original Debian GNU/Linux system takes 8:17, which is a lot of overhead. Maybe there are some tweaks to do to get better build times, but it’s probably better to go with cross-compilation.

Let’s recap as a graph:

Build times on different OS configurations

Automation

All the build times mentioned so far have been building off mozilla-central without any tweaks. That’s actually not how a real Firefox release is built, which enables more optimizations. Without even going to the full set of optimizations that go into building a real Firefox Nightly (including Link Time Optimizations and Profile Guided Optimizations), simply switching --enable-release on will up the optimization game.

Building on Mozilla automation also enables things that don’t happen during what we call “local developer builds”, such as dumping debug info for use in processing crash reports, linking a second libxul library for some unit tests, the full packaging of Firefox, the preparation of archives that will be used to run tests, etc.

As of writing, most Firefox builds on Mozilla automation happen on c5d.4xlarge (or similarly sized) AWS EC2 instances. They benefit from the use of a shared compilation cache, so their build times are very noisy, as they depend on the cache hit rate.

So, I took some Linux64 optimized build that runs on automation (not one with LTO or PGO), and ran its full script, with the shared compilation cache disabled, on corresponding and larger AWS EC2 instances, as well as the Threadripper:

Build times for a linux64/opt build as per automation

The immediate observation is that these builds scale much less gracefully than a “local developer build” does, and there are multiple factors that come into play, but the common thread is that parts of the build that don’t run during those “local developer builds” are slow and not well parallelized. Work is in progress to make things better, though.

Another large contributor is that with the higher level of optimizations, the compilation of Rust code takes longer. And compiling Rust doesn’t parallelize very well at the moment (for a variety of reasons, but the main one is the long sequences of crate dependencies, i.e. when A depends on B, which depends on C, which depends on D, etc.). So what happens the more cores are available, is that compiling all the C/C++ code finishes well before compiling all the Rust code does, and with the long tail of crate dependencies, what’s remaining doesn’t parallelize well. And that also blocks other portions of the build that need the compiled Rust code, and aren’t very well parallelized themselves.

All in all, it’s not all that great to go with bigger instances, because they cost more but won’t compensate by doing as many more builds in the same amount of time… unless you do more builds in parallel, which is being experimented with.

On the Threadripper 3970X, I can do 2 “local developer builds” in parallel in less time than doing them one after the other (1:20 less, so 8:40 instead of 10 minutes). Even with -j64. Heck, I can do 4 builds in parallel at -j16 in 16 minutes and 15 seconds, which is somewhere between a single -j12 and a single -j8 build, about where a single -j16 build on 8 cores with 16 threads should land. That doesn’t seem all that useful, until you think of it this way: I can locally build for multiple platforms at once in well under 20 minutes. All on one machine, from one source tree.

Closing words

We looked at various ways build times can vary depending on different hardware (and even software) configurations. Note this only covers hardware parameters that didn’t require reboots to test out (e.g. this excludes variations like RAM speed; and yes, this means there are ways to disable turbo, cores and threads without fiddling with the BIOS; I’ll probably write a separate blog post about this).

We saw there is room for improvement in Firefox build times, especially on automation. But even for local builds, on machines like mine, it should be possible to get under 4 minutes eventually. Without any form of caching. Which is kind of mind blowing. Even without these future improvements, these build times have changed the way I approach things. I don’t mind clobber builds anymore, although ideally we’d never need them.

If you’re on the market for a new machine, I’d advise getting a powerful desktop machine (based on a 3950X, a 3960X or a 3970X) rather than refreshing your aging laptop. I don’t think any laptop currently available would get below 15 minutes build times. And those that can build in less than 20 minutes probably cost more than a desktop machine that would build in well under half that. Edit: Also, pick a fast NVMe SSD that can sustain tons of IOPS.

Of course, if you use some caching method, build times will be much better even on a slower machine, but even with a cache, it happens quite often that you get a lot of cache misses that cancel out the benefit of the cache. YMMV.

You may also have legitimate concerns that rr doesn’t fully work on Ryzen yet, but when I need it, I can pull out my old Intel-based laptop. rr actually works well enough on Ryzen for single-threaded workloads, and I haven’t needed to debug Firefox itself with rr recently, so I haven’t actually pulled out the old laptop much (although I have been using rr and Pernosco).

2020-05-28 17:02:38+0900

p.m.o | 5 Comments »

Announcing git-cinnabar 0.5.5

Please partake in the git-cinnabar survey.

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows cloning, pulling and pushing from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.4?

  • Updated git to 2.26.2 for the helper.
  • Improved experimental support for pushing merges.
  • Fixed a few issues with experimental support for python 3.
  • Don’t complain the helper is outdated if it’s newer.
  • Auto-enable graft when cinnabarclone contains a graft that can be fulfilled.
  • Exclude git-cinnabar notes and git-filter-branch backups from graft candidates.
  • Graft failures are now more silent.
  • Fixed handling of a null manifest.

2020-04-23 16:21:32+0900

cinnabar, p.m.o | No Comments »