Archive for the 'cinnabar' Category

Five years of git-cinnabar

On this very day five years ago, I committed the initial code of what later became git-cinnabar. It is kind of an artificial anniversary, because I didn’t actually publish anything until 3 weeks later, and I also had some prototypes months earlier.

The earlier prototypes of what I’ll call “pre-git-cinnabar” could handle doing git clone hg::https://hg.mozilla.org/mozilla-central (that is, creating a git clone of a Mercurial repository), but they couldn’t git pull later. That pre-git-cinnabar initial commit, however, was the first version that did.

The state of the art back then was similar git helpers, the most popular choice being Felipec’s git-remote-hg, or the opposite tool: hg-git, a mercurial plugin that allows to push to a git repository.

They both had the same caveats: they were slow to handle a repository the size of mozilla-central back then, and both required a local mercurial repository (hidden in the .git directory in the case of Felipec’s git-remote-hg).

This is what motivated me to work on pre-git-cinnabar, which was also named git-remote-hg back then because of how git requires a git-remote-hg executable to handle hg::-prefixed urls.

Fast forward five years, mozilla-central has grown tremendously, and another mozilla-unified repository was created that aggregates the various mozilla branches (esr*, release, beta, central, integration/*).

git-cinnabar went through multiple versions, multiple changes to the metadata it keeps, and while I actually haven’t cumulatively worked all that much on it considering the number of years, a lot of progress has been made.

But let’s go back to the 19th of November 2014. Thankfully, Mercurial allows to strip everything past a certain date, artificially allowing to restore the state of the repository at that date. Unfortunately, pre-git-cinnabar only supports the old Mercurial bundle format, which both the mozilla-central and mozilla-unified repositories now don’t allow. So pre-git-cinnabar can’t clone them out of the box anymore. It’s still possible to allow it in mirror repositories, but because they now use generaldelta, that incurs a server-side conversion that is painfully slow (the hg.mozilla.org server rejects clients that don’t support the new format for this reason).

So for testing purposes, I setup a nginx reverse-proxy and cache, such that the conversion only happens once, and performed clones multiple times, removing any bundling and conversion cost out of the equation. And tested the current version of Felipec’s git-remote-hg, the current version of hg-git, pre-git-cinnabar, and last git-cinnabar release (0.5.2 as of writing), on some AWS instances, with Xeon Platinum 8124M 3Ghz CPUs. That’s a different CPU from what I had back in 2014, yielding some different results from what I wrote in that first announcement.

I’ve thus cloned both mozilla-central (denoted m-c) and mozilla-unified (denoted m-u), with simulated old states of the repositories. Mozilla-unified didn’t exist before 2016, but it’s still interesting to simulate its content as if it had existed because it allows to learn how the tools perform with the additional branches it contains, with the implication they have on how the data is stored in the repository.

Note: I didn’t test older versions of git-remote-hg or hg-git to see how they performed at the time, and how things evolved for them.

Clone times in 2014

There are multiple things of note in the results above:

  • I wrote back then that cloning took 12 hours with git-remote-hg and 30 minutes with pre-git-cinnabar on the machine I used. And while cloning with pre-git-cinnabar on more modern hardware was much faster (16 minutes), cloning with git-remote-hg wasn’t. The pre-git-cinnabar improvement could, though, be attributed in part to improvements in git-fast-import itself (I haven’t tested older versions). But it’s also not impossible that git-remote-hg regressed. Only further testing would tell.
  • mozilla-unified is bigger than mozilla-central, because it is a superset, and that reflects on the clone times, but hg-git and pre-git-cinnabar are both much slower to clone mozilla-unified than you’d expect from the difference in size, especially hg-git. git-cinnabar made a lot of progress in that regard.
  • I hadn’t tested hg-git back then, but while it’s not as slow as git-remote-hg, it’s still horribly slow for a repository this size.

Let’s now look at the .git sizes:

.git sizes in 2014

Those are the sizes for the .git directory fresh after cloning. In all cases, git gc --aggressive would make the clone smaller, at the cost of CPU time (although not significantly smaller in the git-cinnabar case). And after you spent 12 hours cloning, are you really going to spend another large number of hours on a git gc to save disk space?

It is worth noting that in the case of hg-git, this doesn’t include the size of the mercurial repository required to maintain the git repository, while it is included for git-remote-hg, where it is hidden in .git, as mentioned earlier. That puts them about on par w.r.t size.

It’s interesting how close hg-git and git-remote-hg are in disk usage, when the former uses dulwich, a pure Python implementation of Git, and the latter uses git-fast-import. pre-git-cinnabar used git-fast-import too, but optimized the data it sent to git-fast-import to allow for a more compact .git. Recent git-cinnabar made it even better, although it doesn’t use git-fast-import directly, instead using a helper derived from git-fast-import.

But that was 2014. Let’s look at how things evolved over time, by taking “snapshots” of the repositories at one year interval, starting in November 2007.

Clone times over time

Of note:

  • pre-git-cinnabar somehow invalidated the nginx cache for years >= 2016 for mozilla-unified, which didn’t allow to get reliable measures.
  • Things went well out of hand with git-remote-hg and hg-git, so much so that I wasn’t able to get results for git-remote-hg clones for 2019 in time for this post. They’re now getting clone times that count in days rather than hours.
  • Things are getting much worse with mozilla-unified, relatively to mozilla-central, for hg-git than they do for git-remote-hg or git-cinnabar, while they were really bad with pre-git-cinnabar.
  • pre-git-cinnabar clone times for mozilla-central are indistinguishable from git-cinnabar’s at this scale (but see further below).
  • the progression is not linear, but the progression in repository size wasn’t linear either. In order to get a slightly better picture, it is better to look at the clone times vs. the size of the repositories. One measure of that size is number of objects (changeset, manifests and file revisions they contain).

Clone times over repo size

The progression here looks more linear, but still not quite linear. The difference between the mozilla-central and mozilla-unified clone times is the most damning, especially for hg-git and pre-git-cinnabar. At this scale things don’t look so bad for git-cinnabar, but looking closer, they aren’t actually much better:

Clone times over repo size, pre-git-cinnabar and git-cinnabar only

mozilla-central clone times have slightly improved since pre-git-cinnabar days, at least more than the comparison with hg-git and git-remote-hg suggested. mozilla-unified clone times, however, have dramatically improved (notwithstanding the fact that it’s currently not possible to clone with pre-git-cinnabar at all directly from hg.mozilla.org).

But clone times are starting to get a little out of hand, especially for mozilla-unified, which is why I’ve recently added support for “clone bundles”. But I also have work in progress that I expect will make non-bundle clones faster too, and hopefully more linear.

As for .git sizes:

.git sizes over repo size

  • hg-git and git-remote-hg are still hand in hand.
  • Here the progression is mostly linear, with almost no difference between mozilla-central and mozilla-unified, as one could expect.
  • I think the larger increase in size between what would be 2017 and 2018 is largely due to the testing/web-platform/meta/MANIFEST.json file.
  • People who try to clone the Mozilla repositories with hg-git or git-remote-hg at this point better have a lot of time and a lot of free disk space.

While git-cinnabar is demonstrably significantly faster than both git-remote-hg and hg-git by a large margin for the Mozilla repositories (more than an order of magnitude), looking at the data more closely revealed something interesting that can be pictured in the following graph, plotting how much slower than git-cinnabar the other tools are.

Clone time ratios against git-cinnabar

The ratio is not constant, and has surprisingly been decreasing steadily since 2016, correlating with the observation that clone times are getting slower more quickly than the repositories are growing. But they are doing more so with git-cinnabar than with the other tools. Clone times with git-cinnabar have multiplied by more than 5 in five years, for a repository that only has 2.5 times more objects. At this pace, in five more years, clones will take well above 10 hours, and that’s not counting for the accelerated slowdown. Hopefully, the current work in progress will help.

It’s also interesting to see how the ratios changed after 2011 between mozilla-central and mozilla-unified. 2011 is when Firefox 4 was released and the release process switched to multiple repositories, which mozilla-unified, well, unified in a single repository. So mozilla-unified and mozilla-central were largely identical when stripped of commits after 2011 and started diverging afterwards.

To conclude this rather long post, pre-git-cinnabar improved the state of the art to clone large Mercurial repositories, and git-cinnabar went further in the past five years. But without more work, things will get out of hand. And that only accounts for clone times. I haven’t spent much time working on other aspects, like negotiation times (the time it takes to agree with a Mercurial server what the git clone has in common with it), or bundling times (the time it takes to generate a bundle to send a Mercurial server). Both are the relevant parts of push times.

2019-11-19 18:04:26+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.1?

  • Updated git to 2.22.0 for the helper.
  • cinnabarclone support is now enabled by default. See details in README.md and mercurial/cinnabarclone.py.
  • cinnabarclone now supports grafted repositories.
  • git cinnabar fsck now does incremental checks against last known good state.
  • Avoid git cinnabar sometimes thinking the helper is not up-to-date when it is.
  • Removing bookmarks on a Mercurial server is now working properly.

2019-07-01 14:17:21+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.0?

  • Updated git to 2.21.0 for the helper.
  • Experimental native mercurial support (used when mercurial libraries are not available) now has feature parity.
  • Try to read the git system config from the same place as git does. This fixes native HTTPS support with Git on Windows.
  • Avoid pushing more commits than necessary in some corner cases (see e.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1529360).
  • Added an –abbrev argument for git cinnabar {git2hg,hg2git} to display shortened sha1s.
  • Can now pass multiple revisions to git cinnabar fetch.
  • Don’t require the requests python module for git cinnabar download.
  • Fixed git cinnabar fsck file checks to actually report errors.
  • Properly return an error code from git cinnabar rollback.
  • Track last fsck’ed metadata and allow git cinnabar rollback --fsck to go back to last known good metadata directly.
  • git cinnabar reclone can now be rolled back.
  • Added support for git bundles as cinnabarclone source.
  • Added alternate styles of remote refs.
  • More resilient to interruptions when HTTP Range requests are supported.
  • Fixed off-by-one when storing mercurial heads.
  • Better handling of mercurial branchmap tips.
  • Better support for end of parts in bundle v2.
  • Improvements handling urls to local mercurial repositories.
  • Fixed compatibility with (very) old mercurial servers when using mercurial 5.0 libraries.
  • Converted Continuous Integration scripts to Python 3.

2019-05-09 06:57:58+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.0

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.4.0?

  • git-cinnabar-helper is now mandatory. You can either download one with git cinnabar download on supported platforms or build one with make.
  • Performance and memory consumption improvements.
  • Metadata changes require to run git cinnabar upgrade.
  • Mercurial tags are consolidated in a separate (fake) repository. See the README file.
  • Updated git to 2.18.0 for the helper.
  • Improved memory consumption and performance.
  • Improved experimental support for pushing merges.
  • Support for clonebundles for faster clones when the server provides them.
  • Removed support for the .git/hgrc file for mercurial specific configuration.
  • Support any version of Git (was previously limited to 1.8.5 minimum)
  • Git packs created by git-cinnabar are now smaller.
  • Fixed incompatibilities with Mercurial 3.4 and >= 4.4.
  • Fixed tag cache, which could lead to missing tags.
  • The prebuilt helper for Linux now works across more distributions (as long as libcurl.so.4 is present, it should work)
  • Properly support the pack.packsizelimit setting.
  • Experimental support for initial clone from a git repository containing git-cinnabar metadata.
  • Now can successfully clone the pypy and GNU octave mercurial repositories.
  • More user-friendly errors.

Development process changes

It took about 6 months between version 0.3 and 0.4. It took more than 18 months to reach version 0.5 after that. That’s a long time to wait for a new version, considering all the improvements that have happened under the hood.

From now on, the release branch will point to the last tagged release, which is roughly the same as before, but won’t be the default branch when cloning anymore.

The default branch when cloning will now be master, which will receive changes that are acceptable for dot releases (0.5.x). These include:

  • Changes in behavior that are backwards compatible (e.g. adding new options which default to the current behavior).
  • Changes that improve error handling.
  • Changes to existing experimental features, and additions of new experimental features (that require knobs to be enabled).
  • Changes to Continuous Integration/Tests.
  • Git version upgrades for the helper.

The next branch will receive changes for the next “major” release, which as of writing is planned to be 0.6.0. These include:

  • Changes in behavior.
  • Changes in metadata.
  • Stabilizing experimental features.
  • Remove backwards compability with older metadata (< 0.5.0).

2018-08-12 10:57:10+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.0 beta 4

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.0 beta 3?

  • Fixed incompatibility with Mercurial 3.4.
  • Performance and memory consumption improvements.
  • Work around networking issues while downloading clone bundles from Mozilla CDN with range requests to continue past failure.
  • Miscellaneous metadata format changes.
  • The prebuilt helper for Linux now works across more distributions (as long as libcurl.so.4 is present, it should work)
  • Updated git to 2.18.0 for the helper.
  • Properly support the pack.packsizelimit setting.
  • Experimental support for initial clone from a git repository containing git-cinnabar metadata.
  • Changed the default make rule to only build the helper.
  • Now can successfully clone the pypy and GNU octave mercurial repositories.
  • More user-friendly errors.

2018-07-16 08:04:48+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.0 beta 3

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.0 beta 2?

  • Fixed incompatibilities with Mercurial >= 4.4.
  • Miscellaneous metadata format changes.
  • Move more operations to the helper, hopefully making things faster.
  • Updated git to 2.17.0 for the helper.
  • Properly handle clones with bundles when the repository doesn’t contain anything newer than the bundle.
  • Fixed tag cache, which could lead to missing tags.

2018-05-19 14:26:51+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.0 beta 2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.5.0 beta 1?

  • Enabled support for clonebundles for faster clones when the server provides them.
  • Git packs created by git-cinnabar are now smaller.
  • Added a new git cinnabar upgrade command to handle metadata upgrade separately from fsck.
  • Metadata upgrade is now significantly faster.
  • git cinnabar fsck also faster.
  • Both now also use significantly less memory.
  • Updated git to 2.13.1 for git-cinnabar-helper.

2017-06-16 08:12:13+0900

cinnabar, p.m.o | No Comments »

Announcing git-cinnabar 0.5.0 beta 1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.4.0?

  • git-cinnabar-helper is now mandatory. You can either download one with git cinnabar download on supported platforms or build one with make helper.
  • Metadata changes require to run git cinnabar fsck.
  • Mercurial tags are consolidated in a separate (fake) repository. See the README file.
  • Updated git to 2.13.0 for git-cinnabar-helper.
  • Improved memory consumption and performance.
  • Improved experimental support for pushing merges.
  • Experimental support for clonebundles.
  • Removed support for the .git/hgrc file for mercurial specific configuration.
  • Support any version of Git (was previously limited to 1.8.5 minimum)

2017-06-04 07:33:05+0900

cinnabar, p.m.o | No Comments »

git-cinnabar experimental features

Since version 0.4.0, git-cinnabar has a few hidden experimental features. Two of them are available in 0.4.0, and a third was recently added on the master branch.

The basic mechanism to enable experimental features is to set a preference in the git configuration with a comma-separated list of features to enable, or all, for all of them. That preference is cinnabar.experiments.

Any means to set a git configuration can be used. You can:

  • Add the following to .git/config:
    [cinnabar]
    experiments=feature
    
  • Or run the following command:
    $ git config cinnabar.experiments feature
    
  • Or only enable the feature temporarily for a given command:
    $ git -c cinnabar.experiments=feature command arguments
    

But what features are there?

wire

In order to talk to Mercurial repositories, git-cinnabar normally uses mercurial python modules. This experimental feature allows to access Mercurial repositories without using the mercurial python modules. It then relies on git-cinnabar-helper to connect to the repository through the mercurial wire protocol.

As of version 0.4.0, the feature is automatically enabled when Mercurial is not installed.

merge

Git-cinnabar currently doesn’t allow to push merge commits. The main reason for this is that generating the correct mercurial data for those merges is tricky, and needs to be gotten right.

In version 0.4.0, enabling this feature allows to push merge commits as long as the parent commits are available on the mercurial repository. If they aren’t, you need to first push them independently, and then push the merge.

On current master, that limitation doesn’t exist anymore ; you can just push everything in one go.

The main caveat with this experimental support for pushing merges is that it currently doesn’t handle the case where a file was moved on one of the branches the same way mercurial would (i.e. the information would be lost to mercurial users).

clonebundles

As of mercurial 3.6, Mercurial servers can opt-in to providing pre-generated bundles, which, when clients support it, takes CPU load off the server when a clone is performed. Good for servers, and usually good for clients too when they have a fast network connection, because downloading a pre-generated bundle is usually faster than waiting for the server to generate one.

As of a few days ago, the master branch of git-cinnabar supports cloning using those pre-generated bundles, provided the server advertizes them (mozilla-central does).

2017-04-02 07:54:58+0900

cinnabar, p.m.o | No Comments »

Progress on git-cinnabar memory usage

This all started when I figured out that git-cinnabar was using crazy amounts of memory when cloning mozilla-central. That pointed to memory allocation patterns that triggered a suboptimal behavior in the glibc memory allocator, and, while overall, git-cinnabar wasn’t really abusing memory all things considered, it happened to be realloc()ating way too much.

It also turned out that recent changes on the master branch had made most uses of fast-import synchronous, making the whole process significantly slower.

This is where we started from on 0.4.0:

And on the master branch as of be75326:

An interesting thing to note here is that the glibc allocator runaway memory use was, this time, more pronounced on 0.4.0 than on master. It was the opposite originally, but as I mentioned in the past ASLR is making it not happen exactly the same way each time.

While I’m here, one thing I failed to mention in the previous posts is that all these measurements were done by cloning a local mercurial clone of mozilla-central, served from localhost via HTTP to eliminate the download time from hg.mozilla.org. And while mozilla-central itself has received new changesets since the first post, the local clone has not been updated, such that all subsequent clone tests I did were cloning the exact same repository under the exact same circumstances.

After last blog post, I focused on the low hanging fruits identified so far:

  • Moving the mercurial to git SHA1 mapping to the helper process (Finding a git bug in the process).
  • Tracking mercurial manifest heads in the helper process.
  • Removing most of the synchronous calls to the helper happening during a clone.

And this is how things now look on the master branch as of 35c18e7:

So where does that put us?

  • The overall clone is now about 11 minutes faster than 0.4.0 (and about 50 minutes faster than master as of be75326!)
  • Non-shared memory use of the git-remote-hg process stays well under 2GB during the whole clone, with no spike at the end.
  • git-cinnabar-helper now uses more memory, but the sum of both processes is less than what it used to be, even when compensating for the glibc memory allocator issue. One thing to note is that while the git-cinnabar-helper memory use goes above 2GB at the end of the clone, a very large part is due to the pack window size being 1GB on 64-bits (vs. 32MB on 32-bits). Memory usage should stay well under the 2GB address space limit on a 32-bits system.
  • CPU usage is well above 100% for most of the clone.

On a more granular level:

  • The “Import manifests” phase is now 13 minutes faster than it was in 0.4.0.
  • The “Read and import files” phase is still almost 4 minutes slower than in 0.4.0.
  • The “Import changesets” phase is still almost 2 minutes slower than in 0.4.0.
  • But the “Finalization” phase is now 3 minutes faster than in 0.4.0.

What this means is that there’s still room for improvement. But at this point, I’d rather focus on other things.

Logging all the memory allocations with the python allocator disabled still resulted in a 6.5GB compressed log file, containing 2.6 billion calls to malloc, calloc, free and realloc (down from 2.7 billions in be75326). The number of allocator calls done by the git-remote-hg process is down to 2.25 billions (from 2.34 billion in be75326).

Surprisingly, while more things were moved to the helper, it still made less allocations than in be75326: 345 millions, down from 363 millions. Presumably, this is because the number of commands processed by the fast-import code was reduced.

Let’s now take a look at the various metrics we analyzed previously (the horizontal axis represents the number of allocator calls that happened before the measurement):

A few observations to make here:

  • The allocated memory (requested bytes) is well below what it was, and the spike at the end is entirely gone. It also more closely follows the amount of raw data we’re holding on to (which makes sense since most of the bookkeeping was moved to the helper)
  • The number of live allocations (allocated memory pointers that haven’t been free()d yet) has gone significantly down as well.
  • The cumulated[*] bytes are now in a much more reasonable range, with the lower bound close to the total amount of data we’re dealing with during the clone, and the upper bound slightly over twice that amount (the upper bound for the be75326 is not shown here, but it was around 45TB; less than 7TB is a big improvement).
  • There are less allocator calls during the first phases and the “Importing changesets” phase, but more during the “Reading and importing files” and “Importing manifests” phases.

[*] The upper bound is the sum of all sizes ever given to malloc, calloc, realloc etc. and the lower bound is the same, but removing the size of allocations passed as input to realloc (in practical words, this pretends reallocs never happened and that the final size for a given reallocated pointer is the one that counts)

So presumably, some of the changes led to more short-lived allocations. Considering python uses its own allocator for sizes smaller than 512 bytes, it’s probably not so much of a problem. But let’s look at the distribution of buffer sizes (including all sizes given to realloc).

(Bucket size is 16 bytes)

What is not completely obvious from the logarithmic scale is that, in fact, 98.4% of the allocations are less than 512 bytes with the current master (35c18e7), and they were 95.5% with be75326. Interestingly, though, in absolute numbers, there are less allocations smaller than 512 bytes in current master than in be75326 (1,194,268,071 vs 1,214,784,494). This suggests the extra allocations that happen during some phases are larger than that.

There are clearly less allocations across the board (apart from very few exceptions), and close to an order of magnitude less allocations larger than 1MiB. In fact, widening the bucket size to 32KiB shows an order of magnitude difference (or close) for most buckets:

An interesting thing to note is how some sizes are largely overrepresented in the data with buckets of 16 bytes, like 768, 1104, 2048, 4128, with other smaller bumps for e.g. 2144, 2464, 2832, 3232, 3696, 4208, 4786, 5424, 6144, 6992, 7920… While some of those are powers of 2, most aren’t, and some of them may actually represent objects sized with a power of 2, but that have an extra PyObject overhead.

While looking at allocation stats, I got to wonder what the lifetimes of those allocations looked like. So I scanned the allocator logs and measured the distance between when an allocation is made and when it is freed, ignoring reallocs.

To give a few examples of what I mean, the following allocation for p gets a lifetime of 0:

void *p = malloc(42);
free(p);

The following a lifetime of 1:

void *p = malloc(42);
void *other = malloc(42);
free(p);

And the following a lifetime of 1 as well:

void *p = malloc(42);
p = realloc(p, 84);
free(p);

(that is, it is not counted as two malloc/free pairs)

The further away the free is from the corresponding malloc, the larger the lifetime. And the largest the lifetime can ever be is the total number of allocator function calls minus two, in the hypothetical case the very first allocation is freed as the very last (minus two because we defined the lifetime as the distance).

What comes out of this data:

  • As expected, there are more short-lived allocations in 35c18e7.
  • Around 90% of allocations have a lifetime spanning 10% of the process life or less. This is a rather surprisingly large amount of allocations with a very large lifetime.
  • Around 80% of allocations have a lifetime spanning 0.01% of the process life or less.
  • The median lifetime is around 0.0000002% (2*10-7%) of the process life, which, in absolute terms is around 500 allocator function calls between a malloc and a free.
  • If we consider every imported changeset, manifest and file to require a similar number of allocations, and considering there are about 2.7M of them in total, each spans about 3.7*10-7%. About 53% of all allocations on be75326 and 57% on 35c18e7 have a lifetime below that. Whenever I get to look more closely to memory usage again, I’ll probably look at the data separately for each individual phase.
  • One surprising fact, that doesn’t appear on the graph because of the logarithmic scale not showing “0” on the horizontal axis, is that 9.3% on be75326 and 7.3% on 35c18e7 of all allocations have a lifetime of 0. That is, whatever the code using them is doing, it’s not allocating or freeing anything else, and not reallocating them either.

All in all, what the data shows is that we’re definitely in a better place now than we used to be a few days ago, and that there is still work to do on the memory front, but:

  • As mentioned in a previous post, there are bigger wins to be had from not keeping manifests data around in memory at all, and by importing it directly instead.
  • In time, a lot of the import code is meant to move to the helper, where the constraints are completely different, and it might not be worth spending time now on reducing the memory usage of python code that might go away soon(ish). The situation was bad and necessitated action rather quickly, but we’re now in a place where it’s not as bad anymore.

So at this point, I won’t look any deeper into the memory usage of the git-remote-hg python process, and will instead focus on the planned metadata storage changes. They will allow to share the metadata more easily (allowing faster and more straightforward gecko-dev graft), and will allow to import manifests earlier, which, as mentioned already, will help reduce memory use, but, more importantly, will allow to do more actual work while downloading the data. On slow networks, this is crucial to make clones and pulls faster.

2017-04-01 18:45:19+0900

cinnabar, p.m.o | No Comments »