Apart from the memory considerations, one thing in the data presented in the “When the memory allocator works against you” post that I haven’t touched on in the followup posts is the large difference in the time it takes to clone mozilla-central with git-cinnabar 0.4.0 vs. the master branch.
One thing that was mentioned in the first followup is that reducing the number of reallocs and substring copies made the cloning more than 15 minutes faster on master. But the same code exists in 0.4.0, so this isn’t part of the difference.
So what’s going on? Looking at the CPU usage during the clone is enlightening.
(Note: the data gathering is flawed in some ways, which explains why the git-remote-hg process goes above 100%, which is not possible for this python process. The data is however good enough for the high level analysis that follows, so I didn’t bother to get something more accurate.)
On 0.4.0, the git-cinnabar-helper process was saturating one CPU core during the File import phase, and the git-remote-hg process was saturating one CPU core during the Manifest import phase. Overall, the sum of both processes usually used more than one and a half cores.
On master, however, the total of both processes barely uses more than one CPU core.
Essentially, before the changes that landed on master after 0.4.0, git-remote-hg would send instructions to git-fast-import (technically git-cinnabar-helper, but in this case it’s only used as a wrapper for git-fast-import), and use marks to track the git objects that git-fast-import created. After those changes, git-remote-hg asks git-fast-import for the git object SHA1 of objects it just asked to be created. In other words, those changes replaced something asynchronous with something synchronous: while it used to be possible for git-remote-hg to work on the next file/manifest/changeset while git-fast-import was working on the previous one, it now waits.
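To make the marks-based approach a bit more concrete, here is a minimal sketch of how a producer can hand blobs to git-fast-import and keep track of them by mark number, without ever waiting for a SHA1 to come back. This is not git-cinnabar's actual code: the `MarkTracker` class, the `hg_to_mark` dictionary and the `hg-file-*` identifiers are made up for illustration, and an in-memory buffer stands in for git-fast-import's stdin. Only the `blob`, `mark` and `data` commands are real fast-import stream syntax.

```python
import io


class MarkTracker:
    """Hypothetical helper: assign fast-import marks instead of waiting for SHA1s."""

    def __init__(self, stream):
        # Writable binary stream; a stand-in for git-fast-import's stdin.
        self.stream = stream
        self.next_mark = 1
        # Illustrative mapping: mercurial-side identifier -> mark number.
        self.hg_to_mark = {}

    def store_blob(self, hg_id, data):
        mark = self.next_mark
        self.next_mark += 1
        # Emit a blob command tagged with a mark. Nothing is read back,
        # so the producer can move on to the next object immediately.
        self.stream.write(b"blob\nmark :%d\ndata %d\n" % (mark, len(data)))
        self.stream.write(data + b"\n")
        self.hg_to_mark[hg_id] = mark
        return mark


out = io.BytesIO()
tracker = MarkTracker(out)
tracker.store_blob("hg-file-a", b"hello\n")
tracker.store_blob("hg-file-b", b"world\n")
stream_bytes = out.getvalue()
```

The point of the sketch is that `store_blob` only writes. The synchronous variant would have to read a reply after each object, which is where the waiting comes from.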
The changes helped simplify the python code, but made the overall clone process much slower.
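A toy model gives a rough idea of why waiting costs so much. The sketch below assumes each object costs a fixed amount of work on the producer side (standing in for git-remote-hg) and a fixed amount on the consumer side (standing in for git-fast-import). The per-object costs and the object count are arbitrary numbers picked for illustration, not measurements from an actual clone, and the model ignores communication overhead entirely.

```python
def synchronous_total(n, produce, consume):
    """Producer waits for the consumer after every object."""
    return n * (produce + consume)


def pipelined_total(n, produce, consume):
    """Producer keeps going while the consumer handles earlier objects."""
    producer_done = 0
    consumer_done = 0
    for _ in range(n):
        # The producer finishes preparing this object...
        producer_done += produce
        # ...and the consumer starts on it once both the object is ready
        # and the previous object has been fully processed.
        consumer_done = max(producer_done, consumer_done) + consume
    return consumer_done


# Arbitrary illustrative costs: 2 time units to produce, 3 to consume.
sync_time = synchronous_total(1000, 2, 3)
async_time = pipelined_total(1000, 2, 3)
```

With these made-up numbers, the synchronous total is the sum of both sides' work, while the pipelined total is essentially driven by the slower side alone.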
If I’m not mistaken, the only real use for that information is for the mapping of mercurial to git SHA1s, which is actually rarely used during the clone, except at the end, when storing it. So what I’m planning to do is to move that mapping to the git-cinnabar-helper process, which, incidentally, will kill not 2, but 3 birds with 1 stone:
- It will restore the asynchronicity, obviously (at least, that’s the expected main outcome).
- Storing the mapping in the git-cinnabar-helper process is very likely to take less memory than what it currently takes in the git-remote-hg process. Even if it doesn’t (which I doubt), that should still help stay under the 2GB limit of 32-bit processes.
- The whole thing that spikes memory usage during the finalization phase, as seen in the previous post, will just go away, because the git-cinnabar-helper process will just have prepared the git notes-like tree on its own.
So expect git-cinnabar 0.5 to get moar faster, and to use moar less memory.