Archive for the 'cinnabar' Category

Looking for a new project name for git-remote-hg

If you've been following this blog, you know I've been working on a (fast) git remote helper to access mercurial without a local mercurial clone, with the main goal of making it work for Gecko developers.

The way git remote helpers work forces how their executable is named: For a foo:: remote prefix, the executable must be named git-remote-foo. So for hg::, it's git-remote-hg.
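As a rough illustration of that naming rule, here is a minimal shell sketch. The function name helper_for_url is made up for this example, and it only covers the prefix:: form described above, not everything git does when picking a helper.

```shell
# Hypothetical helper: print the executable name git would look for,
# given a remote URL of the form <prefix>::<address>.
helper_for_url() {
    case "$1" in
        *::*) printf 'git-remote-%s\n' "${1%%::*}" ;;
        *)    return 1 ;;   # no "prefix::" part, nothing to derive
    esac
}

helper_for_url hg::http://hg.mozilla.org/mozilla-central   # prints git-remote-hg
```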

As you may know, there already exists a project with that name. And when I picked the name for this new helper, I didn't really care to find a separate name, especially considering its prototype nature.

Now that I'm satisfied enough with it that I'm close to releasing it with a version number (which will be 0.1.0), I'm thinking that the confusion with the other project of that name is not really helpful, and that the shared name is just an unfortunate implementation detail.

So I'm looking for a new project name... and have no good idea.

Dear lazy web, do you have good ideas?

2015-02-03 11:07:01+0900

cinnabar, p.m.o | 15 Comments »

Explicit rename/copy tracking vs. detection after the fact

One of the main differences in how mercurial and git track files is that mercurial does rename and copy tracking and git doesn't. So in the case of mercurial, users are expected to explicitly rename or copy the files through the mercurial command line so that mercurial knows what happened. Git simply doesn't care, and will try to detect after the fact when you ask it to.
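To make "detect after the fact" a bit more concrete, here is a deliberately simplified sketch of exact rename detection: a path that disappeared and a path that appeared with the same content id are reported as a rename. The function name and the "<id> <path>" listing format are assumptions made for this example. Git's real detection also handles copies and similarity below 100%, and this sketch doesn't cope with paths containing spaces.

```shell
# Hypothetical helper. Each argument is a file listing one "<content-id> <path>"
# per line: the first for the tree before a commit, the second for the tree after.
detect_exact_renames() {
    awk '
        NR == FNR { before[$2] = $1; next }   # first file: path -> content id
        { after[$2] = $1 }                    # second file: path -> content id
        END {
            # Remember the content id of every path that disappeared.
            for (p in before) if (!(p in after)) gone[before[p]] = p
            # A new path whose content id matches a disappeared one is a rename.
            for (p in after) if (!(p in before) && (after[p] in gone))
                print gone[after[p]] " -> " p
        }
    ' "$1" "$2"
}
```

Nothing here needs to be recorded at commit time, which is the point: the two listings are enough to work out what moved.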

The consequence is that my git-remote-hg, being currently a limited prototype, doesn't make the effort to inform mercurial of renames or copies.

This week, Ehsan, as a user of that tool, pushed some file moves, and subsequently opened an issue, because some people didn't like it.

It was a conscious choice on my part to make git-remote-hg public without rename/copy detection, because file renames and copies don't happen often, and mercurial users can just as easily fail to register them.

In fact, they haven't all been registered for as long as Mozilla has been using mercurial (see below; I didn't actually know I was so spot on when I wrote this sentence), and people haven't been called out for using broken tools (and I'll skip the actual language that was used when talking about Ehsan's push).

And since I'd rather not make unsubstantiated claims, I dug through all of mozilla-central and related repositories (inbound, b2g-inbound, fx-team, aurora, beta, release, esr*). Here is what I found, only counting files that have been copied or renamed without being further modified (so, using git diff-tree -r -C100%, and eliminating empty files), and correlating with the mercurial rename/copy metadata:

  • There have been 45069 file renames or copies in 1546 changesets.
  • Mercurial doesn't know 5482 (12.1%) of them, from 419 (27.1%) changesets.
  • 72 of those changesets were backouts.
  • 19 of those backouts were of changesets that didn't have rename/copy information, so 53 of those backouts didn't actually undo what mercurial knew of those backed out changesets.
  • Those 419 changesets were from 144 distinct authors (assuming I didn't miss some duplicates from people who changed email).
  • Fun fact: the person with the colorful language, who doesn't like git-remote-hg, is one of them. I am too, and that was with mercurial.
  • The most recent occurrence of renames/copies unknown to mercurial is already not Ehsan's anymore.
  • The oldest occurrence is in the 19th (!) mercurial changeset.

And that's not counting all the copies and renames with additional modifications.
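For reference, counting such exact renames and copies for one commit could look like the sketch below. It assumes the --name-status output format of git diff-tree, where renames and copies at 100% similarity show up with a status of R100 or C100. count_exact_moves is a name made up here, and the sketch neither filters out empty files nor correlates with mercurial metadata as the numbers above do.

```shell
# Hypothetical helper: count R100/C100 entries in "git diff-tree --name-status"
# output read from stdin. Other lines (commit id, plain modifications) are ignored.
count_exact_moves() {
    awk -F '\t' '$1 == "R100" || $1 == "C100" { n++ } END { print n + 0 }'
}

# Usage, inside a git repository (HEAD is just an example commit):
#   git diff-tree -r -C100% --name-status HEAD | count_exact_moves
```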

Fun fact, this is what I found in mercurial's own repository:

  • There have been 255 file renames or copies in 41 changesets.
  • Mercurial doesn't know about 38 (14.9%) of them, from 4 (9.7%) changesets.
  • One of those changesets was from Matt Mackall himself (creator and lead developer of mercurial).

There are 1061 files in mercurial, versus 115845 in mozilla-central, so there are fewer occasions for renames/copies there. Still, even they forget to use "hg move" and break their history as a result.

I think this shows how requiring explicit user input simply doesn't pan out.

Meanwhile, I have prototype copy/rename detection for git-remote-hg working, but I need to tweak it a little bit more before publishing.

2015-01-24 12:48:58+0900

cinnabar, p.m.o | 6 Comments »

Initial support for git pushes to mercurial, early testers needed

This push to try was not created by mercurial.

I just landed initial support for pushing to mercurial from git. Considering the scary fact that it's possible to screw up a repository with bundles with missing content (and, guess what, I found that out the hard way), I have restricted it to local mercurial repositories until I am more confident.

As such, I would need volunteers to use and test it on local mercurial repositories. On top of being limited to local mercurial repositories, it doesn't support pushing merges that would have been created by git, nor does it support pushing a root commit (one with no parent).

Here's how you can use it:

$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg
$ git clone hg::/path/to/mercurial-repository
$ # work work, commit, commit
$ git push

[ Note: you can still pull from remote mercurial repositories ]

This will push to your local repository, where it would be useful if you could check the push didn't fuck things up.

$ cd /path/to/mercurial-repository
$ hg verify

That's the long, thorough version. You may simply want to do this instead:

$ cd /path/to/mercurial-repository
$ hg log --stat

Hopefully, you won't see messages like:

abort: data/build/mozconfig.common.override.i@56d6fdb13666: no match found!

Update: You can also add the following to /path/to/mercurial-repository/.hg/hgrc, which should prevent corruptions from entering the mercurial repository at all:

[server]
validate = True

Update 2: The above setting is now unnecessary, git-remote-hg will set it itself for its push session.

Then you can push with mercurial.

$ hg push

Please note that this is integrated in git in such a way that it's possible to pass refspecs to git push and do other fancy stuff. Be aware that there are still rough edges on that part, but that your commits will be pushed, even if the resulting state under refs/remotes/ is not very consistent.

I'm planning a replay of several repositories to fully validate pushes don't send broken bundles, but it's going to take some time before I can set things up. I figured I'd rather crowdsource until then.

2014-12-18 11:56:15+0900

cinnabar, p.m.o | No Comments »

One step closer to git push to mercurial

In case you missed it, I'm working on a new tool to use mercurial remotes in git. Since my previous post, I landed several fixes making clone and pull more reliable:

  • Of 247316 unique changesets in the various mozilla-* repositories, now only two (both in fact coming from the same patch, one of the changesets being a backport to aurora of the other) are "corrupted", because their mercurial dates have a timezone offset that includes seconds.
  • Of 23542 unique changesets in the canonical mercurial repository, only three are "corrupted", because their raw mercurial data contains, for an unknown reason, whitespace after the timezone.

By corrupted, here, I mean that the round-trip hg->git->hg doesn't lead to matching their sha1. They will be fixed eventually, but I haven't decided how yet, because they're really edge cases. They're old enough that they don't really matter for push anyways.
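The timezone case can be sketched in a few lines. This assumes the usual representations: mercurial records the offset as a number of seconds west of UTC, while git writes it as +HHMM or -HHMM, which has no room for seconds. The function name is hypothetical, and this is not the tool's actual code.

```shell
# Hypothetical helper: convert a mercurial timezone offset (seconds west of UTC)
# to git's +HHMM/-HHMM form. Fails when the offset is not a whole number of
# minutes, since such a value cannot survive the hg->git->hg round trip.
hg_offset_to_git_tz() {
    offset=$1
    if [ "$offset" -gt 0 ]; then
        sign='-'                 # west of UTC in mercurial is negative in git
    else
        sign='+'
        offset=$(( -offset ))
    fi
    if [ $(( offset % 60 )) -ne 0 ]; then
        return 1                 # leftover seconds: not representable
    fi
    printf '%s%02d%02d\n' "$sign" $(( offset / 3600 )) $(( offset % 3600 / 60 ))
}

hg_offset_to_git_tz -32400   # prints +0900
```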

Pushing to mercurial, however, is still not there, but it's getting closer. It involves several operations:

  • Negotiating with the mercurial server what it doesn't have that we do.
  • Creating mercurial changesets, manifests and files for local git commits that were not imported from mercurial.
  • Creating a bundle of the mercurial changesets, manifests and files that we have that the server doesn't.
  • Pushing that bundle to the server.

The first step is mostly covered by the pull code, that does a similar negotiation. I now have the third step covered (although I cheated around the "corruptions" mentioned above):

$ git clone hg::http://selenic.com/hg
Cloning into 'hg'...
(...)
Checking connectivity... done.
$ cd hg
$ git hgbundle > ../hg.hg
$ mkdir ../hg2
$ cd ../hg2
$ hg init
$ hg unbundle ../hg.hg
adding changesets
adding manifests
adding file changes
added 23542 changesets with 44305 changes to 2272 files
(run 'hg update' to get a working copy)
$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
2272 files, 23542 changesets, 44305 total revisions

Note: that hgbundle command won't actually exist. It's just an intermediate step allowing me to work incrementally.

In case you wonder what happens when the bundle contains bad data, mercurial fortunately rejects it:

$ cd ../hg
$ git hgbundle-corrupt > ../hg.hg
$ mkdir ../hg3
$ cd ../hg3
$ hg init
$ hg unbundle ../hg.hg
adding changesets
transaction abort!
rollback completed
abort: integrity check failed on 00changelog.i:3180!

2014-12-16 13:54:15+0900

cinnabar, p.m.o | No Comments »

Using git to interact with mercurial repositories

I was planning to publish this later, but after talking about this project to a few people yesterday and seeing the amount of excitement in response, I took some time this morning to tie up a few loose ends and publish this now. Mozillians, here comes the git revolution.

Let me start with a bit of history. I am an early git user. I've been using git almost since its first release. I like it. A lot. I've contributed dozens of patches to git.

I started using mercurial when I got commit access to Mozilla repositories, much later. I don't enjoy using mercurial much.

There are many tools to make git talk to mercurial. Most are called git-remote-hg because they use the git remote helpers infrastructure. All of them rely on having a local mercurial clone. When dealing with repositories like mozilla-central, it means storing more than 1.5GB of data just to talk to mercurial, on top of the git database.

So a few years ago, I started to toy with the idea of making git talk to mercurial directly. I got as far as being able to do a full clone of mozilla-central back then, in a reasonable amount of time. But I left it at that because I needed to figure out how to efficiently store all the metadata required to handle incremental updates/pulling, and didn't have enough incentive to go forward: working with mercurial was not painful enough.

Fast forward to the beginning of this year. The mozilla-central repository is now much bigger than it used to be, and mercurial handles it much less smoothly than it used to when Mozilla switched to using it. That was enough to get me started again, but not enough to dedicate enough time to it.

Fast forward to a few weeks ago. Gregory Szorc asked dev-platform what kind of workflows people were using with git to work on Mozilla code. And I was really not satisfied with the answers. First, I was wondering why no one was mentioning the existing tools. So I picked one, and tried.

Cloning mozilla-central took 12 hours and left me with a ~10GB .git directory. Running git gc --aggressive for another 10 hours (my settings may have made gc take more time than it would have with the default configuration) brought it down to about 2.6GB, only 700MB of which is actual git data, the remainder being the associated mercurial repository. And as far as I understand it, the tool doesn't really support our use of mercurial repositories, especially try (but I could be wrong, I didn't really look too much).

That was the straw that broke the camel's back. So after a couple weeks hacking, I now have something that can clone mozilla-central within 30 minutes on my machine (network transfer excluded). The resulting .git directory is around 1.5GB with the default git config, without running git gc. If you tweak the compression level in your git config, cloning takes a bit longer, and the repo takes about 1.1GB. You can subsequently pull from mozilla-central, as well as pull from other branches without having to clone them from scratch. Push support is not there yet because it's an early prototype, but I should be able to get that to work in the next couple weeks.

At this point, you may be wondering how you can use that thing. Here it comes:

$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg

Note it requires having the mercurial code available to python, because git-remote-hg uses the mercurial code to talk the mercurial wire protocol. Usually, having mercurial installed is enough.

You can now clone a mercurial repository:

$ git clone hg::http://hg.mozilla.org/mozilla-central

If, like me, you had a local mercurial clone, you can do the following instead:

$ git clone hg::/path/to/mozilla-central-clone
$ git remote set-url origin hg::http://hg.mozilla.org/mozilla-central

You can then use git fetch/pull like with git repositories:

$ git pull

Now, you can add other repositories:

$ git remote add inbound hg::http://hg.mozilla.org/integration/mozilla-inbound
$ git remote update inbound

There are a few caveats, like the fact that it currently creates new remote branches essentially any time you pull something. But it shouldn't disrupt anything.

It should be noted that while the contents are identical to the gecko-dev git repositories (the git tree object sha1s are identical, I checked), the commit SHA1s are different, for two reasons: gecko-dev also contains the CVS history, and hg-git, which is used to fill it, adds some mercurial metadata to commit messages that git-remote-hg doesn't add.

It is, however, possible to graft the CVS history from gecko-dev to a clone created with git-remote-hg. Assuming you have a remote for gecko-dev and fetched from it, you can do the following:

$ echo eabda6aae98d14c71d7e7b95a66896868ff9500b 3ec464b55782fb94dbbb9b5784aac141f3e3ac01 >> .git/info/grafts
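An alternative sketch, assuming a git version that has git replace --graft: it records the same parent override as a replacement ref rather than as a line in .git/info/grafts, which more recent git versions have deprecated. graft_parent is a name made up for this example; the two commit ids in the usage comment are the ones from the command above.

```shell
# Hypothetical wrapper: make <child> look as if <parent> were its only parent,
# by creating a replacement ref for <child>.
graft_parent() {
    git replace --graft "$1" "$2"
}

# Usage, in a clone that has both histories fetched:
#   graft_parent eabda6aae98d14c71d7e7b95a66896868ff9500b 3ec464b55782fb94dbbb9b5784aac141f3e3ac01
```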

Last note: please read the README file when you update your git clone of the git-remote-hg repository. As the prototype evolves, there might be things that you need to do to your existing clones, and it will be written there.

2014-12-05 20:45:10+0900

cinnabar, p.m.o | 4 Comments »