Explicit rename/copy tracking vs. detection after the fact
One of the main differences between how mercurial and git track files is that mercurial records renames and copies explicitly and git doesn't. With mercurial, users are expected to rename or copy files through the mercurial command line (hg move, hg copy) so that mercurial knows what happened. Git simply doesn't care, and will try to detect renames and copies after the fact when you ask it to.
The consequence is that my git-remote-hg, being currently a limited prototype, doesn't make the effort to inform mercurial of renames or copies.
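To illustrate what "detecting after the fact" means, here is a minimal Python sketch that compares two snapshots of a tree and guesses renames and copies purely from identical content. It is a simplification and not git's actual algorithm (git also scores partial similarity and is pickier about copy sources); the function name detect_exact_moves and the path-to-hash dictionaries are assumptions made up for the example.

```python
def detect_exact_moves(before, after):
    """Guess renames and copies between two snapshots.

    `before` and `after` map path -> content hash (hypothetical inputs).
    Only byte-identical content is matched, i.e. 100% similarity.
    Returns (renames, copies), each a list of (source, destination).
    """
    added = sorted(p for p in after if p not in before)
    removed = {p for p in before if p not in after}

    # Index the old tree by content hash so lookups are cheap.
    by_hash = {}
    for path in sorted(before):
        by_hash.setdefault(before[path], []).append(path)

    renames, copies = [], []
    used_sources = set()
    for dst in added:
        candidates = by_hash.get(after[dst], [])
        # A source that disappeared and is not yet paired means a rename.
        src = next(
            (c for c in candidates if c in removed and c not in used_sources),
            None,
        )
        if src is not None:
            used_sources.add(src)
            renames.append((src, dst))
        elif candidates:
            # Identical content exists elsewhere in the old tree: a copy.
            copies.append((candidates[0], dst))
    return renames, copies


before = {"a.c": "h1", "b.c": "h2"}
after = {"src/a.c": "h1", "b.c": "h2", "b_copy.c": "h2"}
renames, copies = detect_exact_moves(before, after)
```

Nothing in this sketch needs the user to have declared anything at commit time, which is the whole point: the information is derived from the two trees alone.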
This week, Ehsan, as a user of that tool, pushed some file moves, and subsequently opened an issue, because some people didn't like it.
It was a conscious choice on my part to make git-remote-hg public without rename/copy detection, because file renames and copies don't happen often, and mercurial users can just as easily fail to register them.
In fact, they haven't all been registered for as long as Mozilla has been using mercurial (see below; I didn't actually know I was so spot on when I wrote this sentence), and nobody has been singled out for using broken tools because of it (I'll skip the actual language that was used when talking about Ehsan's push).
And since I'd rather not make unsubstantiated claims, I dug through all of mozilla-central and related repositories (inbound, b2g-inbound, fx-team, aurora, beta, release, esr*), and here is what I found, only counting files that have been copied or renamed without being further modified (so, using git diff-tree -r -C100%, and eliminating empty files), and correlating with the mercurial rename/copy metadata:
- There have been 45069 file renames or copies in 1546 changesets.
- Mercurial doesn't know 5482 (12.1%) of them, from 419 (27.1%) changesets.
- 72 of those changesets were backouts.
- 19 of those backouts were of changesets that didn't have rename/copy information, so 53 of those backouts didn't actually undo what mercurial knew of those backed out changesets.
- Those 419 changesets were from 144 distinct authors (assuming I didn't miss some duplicates from people who changed email).
- Fun fact: the person with the colorful language, who doesn't like git-remote-hg, is one of them. I am too, and that was with mercurial.
- The most recent occurrence of renames/copies unknown to mercurial is already no longer Ehsan's.
- The oldest occurrence is in the 19th (!) mercurial changeset.
And that's not counting all the copies and renames with additional modifications.
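For the curious, the counting described above can be sketched roughly as follows in Python. This is not the script used to produce the numbers above, only an illustration under stated assumptions: it parses the raw output of git diff-tree -r -C100% (assuming the default, non -z format and paths without special characters), keeps exact renames and copies, drops empty files by their well-known blob id, and compares against a hypothetical hg_copies mapping of destination to source, standing in for whatever mercurial recorded.

```python
# Blob id git assigns to an empty file.
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"


def parse_exact_moves(raw_output):
    """Extract (kind, source, destination) for R100/C100 entries.

    `raw_output` is the text printed by `git diff-tree -r -C100% <commit>`.
    Entries for empty files are skipped.
    """
    moves = []
    for line in raw_output.splitlines():
        if not line.startswith(":"):
            continue  # e.g. the leading commit id line
        meta, _, paths = line.partition("\t")
        fields = meta.split()
        if len(fields) != 5:
            continue
        new_sha, status = fields[3], fields[4]
        if status not in ("R100", "C100"):
            continue
        if new_sha == EMPTY_BLOB:
            continue
        src, dst = paths.split("\t")
        moves.append((status[0], src, dst))
    return moves


def unknown_to_hg(moves, hg_copies):
    """Return the moves that mercurial has no matching record for.

    `hg_copies` is a hypothetical dict: destination path -> source path.
    """
    return [m for m in moves if hg_copies.get(m[2]) != m[1]]


SAMPLE = (
    "0123456789abcdef0123456789abcdef01234567\n"
    ":100644 100644 " + "a" * 40 + " " + "a" * 40 + " R100\told/a.txt\tnew/a.txt\n"
    ":100644 100644 " + "b" * 40 + " " + "b" * 40 + " C100\tsrc/b.txt\tsrc/b2.txt\n"
    ":100644 100644 " + EMPTY_BLOB + " " + EMPTY_BLOB + " R100\told/e\tnew/e\n"
    ":100644 100644 " + "c" * 40 + " " + "d" * 40 + " M\tother.txt\n"
)

detected = parse_exact_moves(SAMPLE)
missing = unknown_to_hg(detected, {"new/a.txt": "old/a.txt"})
```

In this made-up sample, mercurial knows about the rename but not the copy, so the copy is what ends up in missing. Dividing the size of missing by the size of detected gives the kind of percentage quoted in the list above.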
Fun fact: this is what I found in mercurial's own mercurial repository:
- There have been 255 file renames or copies in 41 changesets.
- Mercurial doesn't know about 38 (14.9%) of them, from 4 (9.7%) changesets.
- One of those changesets was from Matt Mackall himself (creator and lead developer of mercurial).
There are 1061 files in mercurial, versus 115845 in mozilla-central, so there are fewer occasions for renames and copies there. Still, even they forget to use "hg move" and break their history as a result.
I think this shows how requiring explicit user input simply doesn't pan out.
Meanwhile, I have prototype copy/rename detection for git-remote-hg working, but I need to tweak it a little bit more before publishing.
2015-01-24 12:48:58+0900