I was planning to publish this later, but after talking about this project with a few people yesterday and seeing how much excitement it generated, I took some time this morning to tie up a few loose ends and publish it now. Mozillians, here comes the git revolution.
Let me start with a bit of history. I am an early git user. I’ve been using git almost since its first release. I like it. A lot. I’ve contributed dozens of patches to git.
I started using mercurial when I got commit access to Mozilla repositories, much later. I don’t enjoy using mercurial much.
There are many tools to make git talk to mercurial. Most are called git-remote-hg because they use the git remote helpers infrastructure. All of them rely on having a local mercurial clone. When dealing with repositories like mozilla-central, it means storing more than 1.5GB of data just to talk to mercurial, on top of the git database.
So a few years ago, I started to toy with the idea of making git talk to mercurial directly. I got as far as being able to do a full clone of mozilla-central back then, in a reasonable amount of time. But I left it at that because I needed to figure out how to efficiently store all the metadata required to handle incremental updates/pulling, and didn’t have enough incentive to go forward: working with mercurial was not painful enough.
Fast forward to the beginning of this year. The mozilla-central repository is now much bigger than it used to be, and mercurial handles it much less smoothly than it did when Mozilla switched to using it. That was enough to get me started again, but not enough for me to dedicate much time to it.
Fast forward to a few weeks ago. Gregory Szorc asked on dev-platform what kind of workflows people were using with git to work on Mozilla code. And I was really not satisfied with the answers. First, I was wondering why no one was mentioning the existing tools. So I picked one, and tried.
Cloning mozilla-central took 12 hours and left me with a ~10GB .git directory. Running git gc --aggressive for another 10 hours (my settings may have made gc take longer than it would have with the default configuration) brought it down to about 2.6GB, only 700MB of which is actual git data, the remainder being the associated mercurial repository. And as far as I understand it, the tool doesn’t really support our use of mercurial repositories, especially try (but I could be wrong, I didn’t look too closely).
That was the straw that broke the camel’s back. So after a couple of weeks of hacking, I now have something that can clone mozilla-central within 30 minutes on my machine (network transfer excluded). The resulting .git directory is around 1.5GB with the default git config, without running git gc. If you tweak the compression level in your git config, cloning takes a bit longer, and the repo takes about 1.1GB. You can subsequently pull from mozilla-central, as well as pull from other branches without having to clone them from scratch. Push support is not there yet because it’s an early prototype, but I should be able to get that to work in the next couple of weeks.
At this point, you may be wondering how you can use that thing. Here it comes:
$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg
Note that it requires the mercurial code to be available to python, because git-remote-hg uses the mercurial code to speak the mercurial wire protocol. Usually, having mercurial installed is enough.
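Before cloning, the two requirements above (the helper on PATH, and mercurial importable from python) can be checked with a small preflight. This is only a sketch: the HELPER and PYTHON variables are hypothetical overrides introduced here for illustration, and the interpreter is assumed to be called python.

```shell
# Preflight sketch for the setup described above.
# HELPER and PYTHON are hypothetical overrides, not part of git-remote-hg.

have_helper() {
    # git runs "git-remote-<name>" for URLs of the form "<name>::<address>",
    # so the helper has to be findable on PATH.
    command -v "${HELPER:-git-remote-hg}" >/dev/null 2>&1
}

have_mercurial() {
    # Succeeds only if the interpreter can import the mercurial package.
    "${PYTHON:-python}" -c 'import mercurial' >/dev/null 2>&1
}

have_helper || echo "git-remote-hg not found on PATH"
have_mercurial || echo "mercurial is not importable from python"
```

If both checks stay silent, the clone commands should at least be able to start.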
You can now clone a mercurial repository:
$ git clone hg::http://hg.mozilla.org/mozilla-central
If, like me, you had a local mercurial clone, you can do the following instead:
$ git clone hg::/path/to/mozilla-central-clone
$ git remote set-url origin hg::http://hg.mozilla.org/mozilla-central
You can then use git fetch/pull like with git repositories:
$ git pull
Now, you can add other repositories:
$ git remote add inbound hg::http://hg.mozilla.org/integration/mozilla-inbound
$ git remote update inbound
There are a few caveats, like the fact that it currently creates new remote branches essentially any time you pull something. But it shouldn’t disrupt anything.
It should be noted that while the contents are identical to those of the gecko-dev git repository (the git tree object SHA1s are identical, I checked), the commit SHA1s are different, for two reasons: gecko-dev also contains the CVS history, and hg-git, which is used to fill it, adds some mercurial metadata to commit messages that git-remote-hg doesn’t add.
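One way to see this for yourself is to compare the tree objects behind two commits rather than the commits themselves. The helper below is a minimal sketch; same_tree is a hypothetical name introduced here, and the refs in the usage comment are placeholders for whatever your remotes are actually called.

```shell
# same_tree COMMIT_A COMMIT_B
# Succeeds (exit 0) when both commits point at the same tree object, i.e. the
# same file contents, even if the commit SHA1s differ.
# Exits 1 when the trees differ, 2 when a commit cannot be resolved.
same_tree() {
    tree_a=$(git rev-parse --verify --quiet "$1^{tree}") || return 2
    tree_b=$(git rev-parse --verify --quiet "$2^{tree}") || return 2
    [ "$tree_a" = "$tree_b" ]
}

# Usage (placeholder refs):
#   same_tree <commit-from-git-remote-hg> <matching-commit-from-gecko-dev> \
#       && echo "same contents"
```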
It is, however, possible to graft the CVS history from gecko-dev to a clone created with git-remote-hg. Assuming you have a remote for gecko-dev and fetched from it, you can do the following:
$ echo eabda6aae98d14c71d7e7b95a66896868ff9500b 3ec464b55782fb94dbbb9b5784aac141f3e3ac01 >> .git/info/grafts
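Each line of the grafts file names a commit followed by the parent(s) git should pretend it has. Since the echo above appends every time it is run, a small wrapper can keep the file free of duplicate lines. This is a sketch only: add_graft is a hypothetical helper, not something git or git-remote-hg provides.

```shell
# add_graft GRAFTS_FILE COMMIT PARENT
# Appends "COMMIT PARENT" to GRAFTS_FILE unless that exact line is already
# there, so running it repeatedly leaves a single entry.
add_graft() {
    graft_file=$1
    graft_line="$2 $3"
    mkdir -p "$(dirname "$graft_file")"
    touch "$graft_file"
    # -x: match the whole line, -F: fixed string, -q: no output
    grep -qxF "$graft_line" "$graft_file" || echo "$graft_line" >> "$graft_file"
}

# Usage, from the top of the clone:
#   add_graft .git/info/grafts \
#       eabda6aae98d14c71d7e7b95a66896868ff9500b \
#       3ec464b55782fb94dbbb9b5784aac141f3e3ac01
```

Newer versions of git deprecate the grafts file in favor of git replace --graft, which takes the same commit and parent arguments, so that may be worth trying if the grafts file is ignored on your setup.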
Last note: please read the README file when you update your git clone of the git-remote-hg repository. As the prototype evolves, there might be things that you need to do to your existing clones, and it will be written there.