Explicit rename/copy tracking vs. detection after the fact

One of the main differences in how mercurial and git track files is that mercurial records renames and copies and git doesn't. With mercurial, users are expected to explicitly rename or copy files through the mercurial command line so that mercurial knows what happened. Git simply doesn't care, and will try to detect renames and copies after the fact when you ask it to.

The consequence is that my git-remote-hg, being currently a limited prototype, doesn't make the effort to inform mercurial of renames or copies.

This week, Ehsan, as a user of that tool, pushed some file moves, and subsequently opened an issue, because some people didn't like it.

It was a conscious choice on my part to make git-remote-hg public without rename/copy detection, because file renames and copies don't happen often, and mercurial users can just as easily fail to register them.

In fact, they haven't all been registered for as long as Mozilla has been using mercurial (see below; I didn't actually know I was so spot on when I wrote this sentence), and people haven't been called out for using broken tools (and I'll skip the actual language that was used when talking about Ehsan's push).

And since I'd rather not make unsubstantiated claims, I dug through all of mozilla-central and related repositories (inbound, b2g-inbound, fx-team, aurora, beta, release, esr*). Here is what I found, only counting files that have been copied or renamed without being further modified (so, using git diff-tree -r -C100%, and eliminating empty files), and correlating with the mercurial rename/copy metadata:

  • There have been 45069 file renames or copies in 1546 changesets.
  • Mercurial doesn't know about 5482 (12.1%) of them, from 419 (27.1%) changesets.
  • 72 of those changesets were backouts.
  • 19 of those backouts were of changesets that didn't have rename/copy information themselves. The other 53 were of changesets that did have it, so those backouts didn't actually undo what mercurial knew of the backed-out changesets.
  • Those 419 changesets were from 144 distinct authors (assuming I didn't miss some duplicates from people who changed email).
  • Fun fact, the person with the colorful language, who doesn't like git-remote-hg, is one of them. I am too, and that was with mercurial.
  • The most recent occurrence of renames/copies unknown to mercurial is already no longer Ehsan's.
  • The oldest occurrence is in the 19th (!) mercurial changeset.

And that's not counting all the copies and renames with additional modifications.
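
For readers who want to try something similar, the counting method described above can be sketched roughly as follows. This is not the script that produced the numbers in this post; it is a minimal illustration that assumes the raw output of git diff-tree -r -C100% for one commit is already available as text, and that the copies mercurial recorded for the same changeset have been collected separately into a destination-to-source mapping. The function names, the hg_copies mapping and the sample data are all hypothetical. Paths containing unusual characters, which git quotes in this output format, are not handled.

```python
# Hypothetical sketch: function names and sample data are made up for illustration.

# Git's well-known object id for an empty file (empty blob).
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"


def exact_copies(raw_diff):
    """Yield (kind, source, dest) for exact renames ("R") and copies ("C")
    found in the raw output of `git diff-tree -r -C100% <commit>`,
    skipping empty files."""
    for line in raw_diff.splitlines():
        if not line.startswith(":"):
            continue  # e.g. the commit id line printed before the entries
        meta, _, paths = line.partition("\t")
        fields = meta.split()
        # Expected fields: :srcmode dstmode srcsha dstsha status
        if len(fields) != 5:
            continue
        dst_sha, status = fields[3], fields[4]
        if status not in ("R100", "C100") or dst_sha == EMPTY_BLOB:
            continue
        src, _, dst = paths.partition("\t")
        yield status[0], src, dst


def unknown_to_hg(raw_diff, hg_copies):
    """Return the exact renames/copies git detects that are missing from
    hg_copies, an assumed mapping of destination path -> source path
    recorded by mercurial for the same changeset."""
    return [
        (kind, src, dst)
        for kind, src, dst in exact_copies(raw_diff)
        if hg_copies.get(dst) != src
    ]


# Made-up sample resembling diff-tree raw output.
SAMPLE = "\n".join([
    "f" * 40,
    ":100644 100644 " + "a" * 40 + " " + "a" * 40 + " R100\told/foo.c\tnew/foo.c",
    ":100644 100644 " + "b" * 40 + " " + "b" * 40 + " C100\tbar.h\tbaz.h",
    ":100644 100644 " + EMPTY_BLOB + " " + EMPTY_BLOB + " R100\ta/__init__.py\tb/__init__.py",
    ":100644 100644 " + "c" * 40 + " " + "d" * 40 + " M\tMakefile",
])

if __name__ == "__main__":
    # Pretend mercurial only recorded the rename of foo.c.
    print(unknown_to_hg(SAMPLE, {"new/foo.c": "old/foo.c"}))
```

Summing the lengths of those lists over every changeset, and counting the changesets where the list is non-empty, would give figures of the same kind as the ones in the list above.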

Fun fact, this is what I found in Mercurial's own mercurial repository:

  • There have been 255 file renames or copies in 41 changesets.
  • Mercurial doesn't know about 38 (14.9%) of them, from 4 (9.7%) changesets.
  • One of those changesets was from Matt Mackall himself (creator and lead developer of mercurial).

There are 1061 files in mercurial, versus 115845 in mozilla-central, so there are fewer occasions for renames/copies there. Still, even they forget to use "hg move" and break their history as a result.

I think this shows how requiring explicit user input simply doesn't pan out.

Meanwhile, I have a prototype of copy/rename detection for git-remote-hg working, but I need to tweak it a little bit more before publishing.

2015-01-24 12:48:58+0900

cinnabar, p.m.o


6 Responses to “Explicit rename/copy tracking vs. detection after the fact”

  1. Michael Kaply Says:

    Dumb question.

    If mercurial supports moving files, why are there so many cases in the Mozilla source trees of files and directories being moved and losing all their history?

    Does mercurial maintain history when files are moved?

    Or were they just moved improperly?

    The renaming of all the theme directories is a good example of this.

  2. glandium Says:

    @mkaply because as I wrote in the first paragraph, mercurial requires the user to use the mercurial command line to do the move. And expecting the users to do that just doesn’t work, because in practice, many of them forget. That was the entire point of this post.

  3. Trevor Saunders Says:

    I’m actually kind of amazed people get it right more than 70% of the time.

  4. Robert O'Callahan Says:

    It overstates the case to say “expecting users to do that just doesn’t work”. Based on your results, it works 87% of the time.

    git’s approach of mysterious parameters that you have to pass to every command (that make git really slow) also doesn’t always work in practice.

  5. glandium Says:

    @roc: you mean 73%. There are also more misses if you add non-exact renames and copies.
    There’s only one related option that “makes git really slow”, it’s --find-copies-harder (or giving multiple -C), and I doubt it would detect an interesting copy that mercurial knows about but that -C alone wouldn’t catch. I’ll get some stats about this, it should be interesting. [Update: preliminary data shows I’m probably wrong]
    Also, what is “mysterious” about -C for copy detection and -M for move detection?

  6. Steve Fink Says:

    mkaply: mercurial records the renames, but it doesn’t make use of them with the default log command. You have to pass in --follow to see the complete history, which is far slower. (So sadly, it mimics git here: you need to pass in a magic command line option that also makes it much slower.)

    I suspect it’s because mercurial’s data store records history on a per-file basis, which has always struck me as a bit unfortunate. But I don’t understand it well enough to be confident that there’s a better solution.
