April 7th, 2015

Announcing git-cinnabar 0.2.0

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

What's new since 0.1.1?

  • git cinnabar git2hg and git cinnabar hg2git commands that allow to translate (possibly abbreviated) git sha1s to mercurial sha1s and vice-versa.
  • A "native" helper that makes some operations faster. It is not required for git-cinnabar to work, but it can improve performance significantly. Check the Setup instructions in the README file.
  • Do not store mercurial metadata when pushing to non-publishing repositories. For Mozilla developers, this means not storing that metadata when pushing to try, which is a good thing when you know each of them makes pulling slower. This behavior can be changed if necessary. Future releases will allow to remove metadata that was created by previous releases but that wouldn't be created with 0.2.0.
  • Made the discovery phase of pushes require less round trips (the phase that finds what is common between the local and remote repositories), hopefully making pushing faster.
  • Improved logging, which now doesn't require fiddling with the code to get extra logging.
  • Made fsck validate more things, and act on more errors.
  • Fixed a few edge cases.
  • Better handle files with weird names, and that git quotes in its output.
  • Extensively tested on the following repositories: mozilla-central, mozilla-beta, mercurial, hg-git, cpython.

What to expect next?

  • Allow to push merge commits.
  • Improve memory footprint for pushes (currently, it's fairly catastrophic on big repositories ; don't try to push multiple hundreds of commits of a Mozilla-sized repository if you don't have multiple gigabytes of memory available).
  • As mentioned above, allow to remove some metadata.
  • And more...

If you want to follow the improvements more closely, I encourage you to switch to the `next` branch. I won't push anything there that hasn't been extensively tested on the above mentioned repositories.

And as always, please report any issue you run into.

2015-04-07 04:18:14+0900

cinnabar, p.m.o | No Comments »

March 13th, 2015

Announcing git-cinnabar 0.1.1

0.1.1 is a one-bugfix release fixing an issue where git-cinnabar could be confused if commands were started from a subdirectory of the repository. It might be the cause for corruptions leading to issues such as the impossibility to push.

If you do encounter failures please report them. Please also paste the output of git cinnabar fsck.

2015-03-13 02:17:25+0900

cinnabar, p.m.o | No Comments »

February 11th, 2015

Announcing git-cinnabar 0.1.0

As you may or may not know, I have been working on this project for quite some time, but actually really started the most critical parts a couple months ago. After having looked for (and chosen) a new name for what was a prototype project, it's now time for a very first release.

So what is this all about?

Cinnabar is the common natural form in which mercury can be found on Earth. It contains mercury sulfide and its powder is used to make the vermillion pigment.

What does that have to do with git?

Hint: mercury.

Git-cinnabar is a git remote helper (you can think of that as a plugin) to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Numerous such tools already exist. Where git-cinnabar stands out is that it doesn't use a local mercurial clone under the hood (unlike all the existing other such tools), and is close to an order of magnitude faster to clone a repository like mozilla-central than the git-remote-hg that used to be shipped as a contrib to git.

I won't claim it is exempt of problems and limitations, which is why it's not a 1.0. I'm however confident enough with its state to make the first "official" release.

Get it on github.

If you've been using the prototype, you can actually continue to use that clone and update it. Github conveniently keeps things working after a rename. You can update the remote url if you feel like it, though.

If you are a Gecko developer, you can take a look at a possible workflow.

2015-02-11 08:51:46+0900

cinnabar, p.m.o | 3 Comments »

February 3rd, 2015

Looking for a new project name for git-remote-hg

If you've been following this blog, you know I've been working on a (fast) git remote helper to access mercurial without a local mercurial clone, with the main goal to make it work for Gecko developers.

The way git remote helpers work forces how their executable is named: For a foo:: remote prefix, the executable must be named git-remote-foo. So for hg::, it's git-remote-hg.

As you may know, there already exists a project with that name. And when I picked the name for this new helper, I didn't really care to find a separate name, especially considering its prototype nature.

Now that I'm satisfied enough with it that I'm close to release it with a version number (which will be 0.1.0), I'm thinking that the confusion with the other project with that name is not really helpful, and an unfortunate implementation detail.

So I'm looking for a new project name... and have no good idea.

Dear lazy web, do you have good ideas?

2015-02-03 11:07:01+0900

cinnabar, p.m.o | 15 Comments »

January 24th, 2015

Explicit rename/copy tracking vs. detection after the fact

One of the main differences in how mercurial and git track files is that mercurial does rename and copy tracking and git doesn't. So in the case of mercurial, users are expected to explicitly rename or copy the files through the mercurial command line so that mercurial knows what happened. Git simply doesn't care, and will try to detect after the fact when you ask it to.

The consequence is that my git-remote-hg, being currently a limited prototype, doesn't make the effort to inform mercurial of renames or copies.

This week, Ehsan, as a user of that tool, pushed some file moves, and subsequently opened an issue, because some people didn't like it.

It was a conscious choice on my part to make git-remote-hg public without rename/copies detection, because file renames/copies are not happening often, and can just as much not be registered by mercurial users.

In fact, they haven't all been registered for as long as Mozilla has been using mercurial (see below, I didn't actually know I was so spot on when I wrote this sentence), and people haven't been pointed at for using broken tools (and I'll skip the actual language that was used when talking about Ehsan's push).

And since I'd rather not make unsubstantiated claims, I dug in all of mozilla-central and related repositories (inbound, b2g-inbound, fx-team, aurora, beta, release, esr*) and here is what I found, only accounting files that have been copied or renamed without being further modified (so, using git diff-tree -r -C100%, and eliminating empty files), and correlating with the mercurial rename/copy metadata:

  • There have been 45069 file renames or copies in 1546 changesets.
  • Mercurial doesn't know 5482 (12.1%) of them, from 419 (27.1%) changesets.
  • 72 of those changesets were backouts.
  • 19 of those backouts were of changesets that didn't have rename/copy information, so 53 of those backouts didn't actually undo what mercurial knew of those backed out changesets.
  • Those 419 changesets were from 144 distinct authors (assuming I didn't miss some duplicates from people who changed email).
  • Fun fact, the person with colorful language, and that doesn't like git-remote-hg, is part of them. I am too, and that was with mercurial.
  • The most recent occurrence of renames/copies unknown to mercurial is already not Ehsan's anymore.
  • The oldest occurrence is in the 19th (!) mercurial changeset.

And that's not counting all the copies and renames with additional modifications.

Fun fact, this is what I found in the Mercurial mercurial repository:

  • There have been 255 file renames or copies in 41 changesets.
  • Mercurial doesn't know about 38 (14.9%) of them, from 4 (9.7%) changesets.
  • One of those changesets was from Matt Mackall himself (creator and lead developer of mercurial).

There are 1061 files in mercurial, versus 115845 in mozilla-central, so there is less occasion for renames/copies there, still, even they forget to use "hg move" and break their history as a result.

I think this shows how requiring explicit user input simply doesn't pan out.

Meanwhile, I have prototype copy/rename detection for git-remote-hg working, but I need to tweak it a little bit more before publishing.

2015-01-24 12:48:58+0900

cinnabar, p.m.o | 6 Comments »

January 22nd, 2015

Fx0を購入

Two weeks ago, I went to a local au shop to get a hand on the Fx0, KDDI's LG-manufactured Firefox OS phone that was released in Japan for Christmas in a few flagship shops and on the web, and on January 6 everywhere else in Japan.

They had it on display, like any other phone.

But they didn't have any stock, so I couldn't bring one home, but ordered one.

Fast forward two days ago, the shop called to say they received it, and I went yesterday to get it.

Unboxing

Since the phone is not sold without a carrier subscription, the shop staff does the unboxing for you, to place the SIM card in the phone. But let's pretend that didn't happen.

The Fx0 comes in a gold box with a gold Firefox logo, wrapped in a white box with the characters "Fx0" embossed.

Opening the gold box, unsurprisingly, reveals the gold transparent phone.

Reading articles about this phone, opinions are divided about its look. I'm on the side that thinks it looks awesome, especially in the back. I does look bulky, probably because of its rather sharp edges, but it's not much larger than a Nexus 4. Nor is it much thicker.

One side has "au" embossed, and the other has "Fx0".

One downside of the transparent theme is that it limited the types of materials that could be used, so it sadly feels plastic to the touch. At least that's why I think it is this way.

At the bottom of its front, a single "home" button, showing the Firefox logo.

Turning it on

Well, it was already on when I first got my hands on it, but in our pretense of unboxing, let's say it was not, and that I turned it on for the first time (which, in some sense, is true). This is what it looks like when it boots:

After unlocking, the home screen appears.

I'll be trying to use it as my main (smart)phone, and see how that goes. I'll also test some of its KDDI specific features. Blog posts will come along. Stay tuned.

2015-01-22 00:25:45+0900

p.m.o | 8 Comments »

December 23rd, 2014

A new git workflow for Gecko development

[Update: git-remote-hg being a moving target, these instructions are now outdated. Please now refer to this new article on the git-cinnabar wiki that will be updated with fresh instructions.]

If you've been following this blog, you know I've been working on a new tool to allow to use git with mercurial repositories. See the previous blog posts for some detail if you don't know what I'm talking about.

Today, a new milestone has been reached. After performing tens of thousands of pushes[1] and having had no server corruption as a result, I am now confident enough with the code to remove the limitation preventing to push to remote mercurial servers (by an interesting coincidence, it's exactly the hundredth commit). Those tens of thousands of pushes allowed to find and fix a few corner-cases, but they were only affecting the client side.

So here is my recommended setup for Gecko development with git:

  • Install mercurial (only needed for its libraries). You probably already have it installed. Eventually, this dependency will go away, because the use of mercurial libraries is pretty limited.
  • If you can, rebuild git with this patch applied: https://github.com/git/git/commit/61e704e38a4c3e181403a766c5cf28814e4102e4. It is not yet in a released version of git, but it will make small fetches and pushes much faster Update: The patch is in git version 2.2.2 and newer.
  • Install this git-remote-hg. Just clone it somewhere, and put that directory in your PATH.
  • Create a git repository for Mozilla code:
    $ git init gecko
    $ cd gecko
  • Set fetch.prune for git-remote-hg to be happier:
    $ git config fetch.prune true
  • Set push.default to "upstream", which I think allows a better workflow to use topic branches and push them more easily:
    $ git config push.default upstream
  • Add remotes for the mercurial repositories you pull from:
    $ git remote add central hg::http://hg.mozilla.org/mozilla-central -t tip
    $ git remote add inbound hg::http://hg.mozilla.org/integration/mozilla-inbound -t tip
    $ git remote set-url --push inbound hg::ssh://hg.mozilla.org/integration/mozilla-inbound
    (...)
    

    -t tip is there to reduce the amount of churn from the old branches on these repositories. Please be very cautious if you use this on beta, release or esr repositories, because their tip can switch mercurial branches, and can be very confusing. I'm sure I'm not alone having pushed something on a release branch, when actually intending to push on the default branch, and that was with mercurial... Mercurial branches and their multiple heads are confusing, and git-remote-hg, while it supports them, probably makes the confusion worse at the moment. I'm still investigating how to make things better. For Mozilla integration branches, though, it works fine because their tip doesn't switch heads.

  • Setup a remote for the try server:
    $ git remote add try hg::http://hg.mozilla.org/try
    $ git config remote.try.skipDefaultUpdate true
    $ git remote set-url --push try hg::ssh://hg.mozilla.org/try
    $ git config remote.try.push +HEAD:refs/heads/tip
    
  • Update all the remotes:
    $ git remote update

    This essentially does git fetch on all remotes, except try.

With this setup, you can e.g. create new topic branches based on the remote branches:

$ git checkout -b bugxxxxxxx inbound/tip

When you're ready to test your code on try, create a commit with the try syntax, then just do:

$ git push try

This will push whatever your checked-out branch is, to the try server (thanks to the refspec in remote.try.push).

When you're ready to push to an integration branch, remove any try commit. Assuming you were using the topic branch from above, and did set push.default to "upstream", push with:

$ git checkout topic-branch
$ git pull --rebase
$ git push

But, this is only one possibility. You're essentially free to pick your own preferred workflow. Just keep in mind that we generally prefer linear history on integration branches, so prefer rebase to merge (and, git-remote-hg doesn't support pushing merges yet). I'd recommend setting the pull.ff configuration to "only", by the way.

Please note that rebasing something you pushed to e.g. try will leave dangling mercurial metadata in your git clone. git hgdebug fsck will tell you about them, but won't do anything about them, at least currently. Eventually, there will be a git-gc-like command. [Update: actually, rebasing won't leave dangling mercurial metadata because pushing creates head references in the metadata. There will be a command to clean that up, though.]

Please report any issue you encounter in the comments, or, if they are git-remote-hg related, on github.

1. In case you wonder what kind of heavy testing I did, this is roughly how it went:

  • Add support to push a root changeset (one with no parent).
  • Clone the mercurial repository and the mozilla-central repository with git-remote-hg.
  • Since pushing merges is not supported yet, flatten the history with git filter-branch --parent-filter "awk '{print \$1,\$2}'" HEAD, effectively replacing merges with "simple" commits with the entire merged content as a single patch (by the way, I should have written a fast-import script instead, it took almost a day (24 hours) to apply it to the mozilla-central clone ; filter-branch is a not-so-smart shell script, it doesn't scale well).
  • Reclone those filtered clones, such that no git-remote-hg metadata is left.
  • Tag all the commits with numbered tags, starting from 0 for the root commit, with the following command:

    git rev-list --reverse HEAD | awk '{print "reset refs/tags/HEAD-" NR - 1; print "from", $1}' | git fast-import

  • Create empty local mercurial repositories and push random tags with increasing number, with variations of the following command:

    python -c 'import random; print "\n".join(str(i) for i in sorted(random.sample(xrange(n), m)) + [n])' | while read i; do git push -f hg-remote HEAD-$i || break; done

    [ Note: push -f is only really necessary for the push of the root commit ]

  • Repeat and rinse, with different values of n, m and hg-remote.
  • Also arrange the above test to push to multiple mercurial repositories, such that pushes are performed with both commits that have already been pushed and commits that haven't. (i.e. push A to repo-X, push B to repo-Y (which doesn't have A), push C to repo-X (which doesn't have B), etc.)
  • Check that all attempts create the same mercurial changesets.
  • Check that recloning those mercurial repositories creates the same git commits.

2014-12-23 08:10:04+0900

p.m.o | 4 Comments »

December 18th, 2014

Initial support for git pushes to mercurial, early testers needed

This push to try was not created by mercurial.

I just landed initial support for pushing to mercurial from git. Considering the scary fact that it's possible to screw up a repository with bundles with missing content (and, guess what, I figured out the hard way), I have restricted it to local mercurial repositories until I am more confident.

As such, I would need volunteers to use and test it on local mercurial repositories. On top of being limited to local mercurial repositories, it doesn't support pushing merges that would have been created by git, nor does it support pushing a root commit (one with no parent).

Here's how you can use it:

$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg
$ git clone hg::/path/to/mercurial-repository
$ # work work, commit, commit
$ git push

[ Note: you can still pull from remote mercurial repositories ]

This will push to your local repository, where it would be useful if you could check the push didn't fuck things up.

$ cd /path/to/mercurial-repository
$ hg verify

That's the long, thorough version. You may just want to simply do this:

$ cd /path/to/mercurial-repository
$ hg log --stat

Hopefully, you won't see messages like:

abort: data/build/mozconfig.common.override.i@56d6fdb13666: no match found!

Update: You can also add the following to /path/to/mercurial-repository/.hg/hgrc, which should prevent corruptions from entering the mercurial repository at all:

[server]
validate = True

Update 2: The above setting is now unnecessary, git-remote-hg will set it itself for its push session.

Then you can push with mercurial.

$ hg push

Please note that this is integrated in git in such a way that it's possible to pass refspecs to git push and do other fancy stuff. Be aware that there are still rough edges on that part, but that your commits will be pushed, even if the resulting state under refs/remotes/ is not very consistent.

I'm planning a replay of several repositories to fully validate pushes don't send broken bundles, but it's going to take some time before I can set things up. I figured I'd rather crowdsource until then.

2014-12-18 11:56:15+0900

cinnabar, p.m.o | No Comments »

December 16th, 2014

One step closer to git push to mercurial

In case you missed it, I'm working on a new tool to use mercurial remotes in git. Since my previous post, I landed several fixes making clone and pull more reliable:

  • Of 247316 unique changesets in the various mozilla-* repositories, now only two (but both in fact come from the same patch, one of the changesets being a backport to aurora of the other) are "corrupted" because their mercurial date have a timezone with a second.
  • Of 23542 unique changesets in the canonical mercurial repository, only three are "corrupted" because their raw mercurial data contains, for an unknown reason, a whitespace after the timezone.

By corrupted, here, I mean that the round-trip hg->git->hg doesn't lead to matching their sha1. They will be fixed eventually, but I haven't decided how yet, because they're really edge cases. They're old enough that they don't really matter for push anyways.

Pushing to mercurial, however, is still not there, but it's getting closer. It involves several operations:

  • Negotiating with the mercurial server what it doesn't have that we do.
  • Creating mercurial changesets, manifests and files for local git commits that were not imported from mercurial.
  • Creating a bundle of the mercurial changesets, manifests and files that we have that the server doesn't.
  • Pushing that bundle to the server.

The first step is mostly covered by the pull code, that does a similar negotiation. I now have the third step covered (although I cheated around the "corruptions" mentioned above):

$ git clone hg::http://selenic.com/hg
Cloning into 'hg'...
(...)
Checking connectivity... done.
$ cd hg
$ git hgbundle > ../hg.hg
$ mkdir ../hg2
$ cd ../hg2
$ hg init
$ hg unbundle ../hg.hg
adding changesets
adding manifests
adding file changes
added 23542 changesets with 44305 changes to 2272 files
(run 'hg update' to get a working copy)
$ hg verify
checking changesets
checking manifests
crosschecking files in changesets and manifests
checking files
2272 files, 23542 changesets, 44305 total revisions

Note: that hgbundle command won't actually exist. It's just an intermediate step allowing me to work incrementally.

In case you wonder what happens when the bundle contains bad data, mercurial fortunately rejects it:

$ cd ../hg
$ git hgbundle-corrupt > ../hg.hg
$ mkdir ../hg3
$ cd ../hg3
$ hg unbundle ../hg.hg
adding changesets
transaction abort!
rollback completed
abort: integrity check failed on 00changelog.i:3180!

2014-12-16 13:54:15+0900

cinnabar, p.m.o | No Comments »

December 5th, 2014

Using git to interact with mercurial repositories

I was planning to publish this later, but after talking about this project to a few people yesterday and seeing the amount of excitement in response, I took some time this morning to tie a few loose ends and publish this now. Mozillians, here comes the git revolution.

Let me start with a bit of history. I am an early git user. I've been using git almost since its first release. I like it. A lot. I've contributed dozens of patches to git.

I started using mercurial when I got commit access to Mozilla repositories, much later. I don't enjoy using mercurial much.

There are many tools to make git talk to mercurial. Most are called git-remote-hg because they use the git remote helpers infrastructure. All of them rely on having a local mercurial clone. When dealing with repositories like mozilla-central, it means storing more than 1.5GB of data just to talk to mercurial, on top of the git database.

So a few years ago, I started to toy with the idea to make git talk to mercurial directly. I got as far as being able to do a full clone of mozilla-central back then, in a reasonable amount of time. But I left it at that because I needed to figure out how to efficiently store all the metadata required to handle incremental updates/pulling, and didn't have enough incentive to go forward: working with mercurial was not painful enough.

Fast forward to the beginning of this year. The mozilla-central repository is now much bigger than it used to be, and mercurial handles it much less smoothly than it used to when Mozilla switched to using it. That was enough to get me started again, but not enough to dedicate enough time to it.

Fast forward to a few weeks ago. Gregory Szorc poked dev-platform to know what kind of workflows people were using with git to work on Mozilla code. And I was really not satisfied with the answers. First, I was wondering why no-one was mentioning the existing tools. So I picked one, and tried.

Cloning mozilla-central took 12 hours and left me with a ~10GB .git directory. Running git gc --agressive for another 10 hours (my settings may have made gc take more time than it would have with the default configuration) brought it down to about 2.6GB, only 700MB of which is actual git data, the remainder being the associated mercurial repository. And as far as I understand it, the tool doesn't really support our use of mercurial repositories, especially try (but I could be wrong, I didn't really look too much).

That was the straw that broke the camel's back. So after a couple weeks hacking, I now have something that can clone mozilla-central within 30 minutes on my machine (network transfer excluded). The resulting .git directory is around 1.5GB with the default git config, without running git gc. If you tweak the compression level in your git config, cloning takes a bit longer, and the repo takes about 1.1GB, And you can subsequently pull from mozilla-central. As well as pull from other branches without having to clone them from scratch. Push support is not there yet because it's an early prototype, but I should be able to get that to work in the next couple weeks.

At this point, you may be wondering how you can use that thing. Here it comes:

$ git clone https://github.com/glandium/git-remote-hg
$ export PATH=$PATH:$(pwd)/git-remote-hg

Note it requires having the mercurial code available to python, because git-remote-hg uses the mercurial code to talk the mercurial wire protocol. Usually, having mercurial installed is enough.

You can now clone a mercurial repository:

$ git clone hg::http://hg.mozilla.org/mozilla-central

If, like me, you had a local mercurial clone, you can do the following instead:

$ git clone hg::/path/to/mozilla-central-clone
$ git remote set-url origin hg::http://hg.mozilla.org/mozilla-central

You can then use git fetch/pull like with git repositories:

$ git pull

Now, you can add other repositories:

$ git remote add inbound hg::http://hg.mozilla.org/integration/mozilla-inbound
$ git remote update inbound

There are a few caveats, like the fact that it currently creates new remote branches essentially any time you pull something. But it shouldn't disrupt anything.

It should be noted that while the contents are identical to the gecko-dev git repositories (the git tree object sha1s are identical, I checked), the commit SHA1s are different. For two reasons: gecko-dev also contains the CVS history, and hg-git, which is used to fill it adds some mercurial metadata to commit messages that git-remote-hg doesn't add.

It is, however, possible to graft the CVS history from gecko-dev to a clone created with git-remote-hg. Assuming you have a remote for gecko-dev and fetched from it, you can do the following:

$ echo eabda6aae98d14c71d7e7b95a66896868ff9500b 3ec464b55782fb94dbbb9b5784aac141f3e3ac01 >> .git/info/grafts

Last note: please read the README file when you update your git clone of the git-remote-hg repository. As the prototype evolves, there might be things that you need to do to your existing clones, and it will be written there.

2014-12-05 20:45:10+0900

cinnabar, p.m.o | 4 Comments »