Archive for the 'p.m.o' Category

Announcing git-cinnabar 0.4.0 beta 2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.4.0b1?

  • Some more bug fixes.
  • Updated git to 2.9.2 for cinnabar-helper.
  • Now supports `git push –dry-run`.
  • Added a new `git cinnabar fetch` command to fetch a specific revision that is not necessarily a head.
  • Some improvements to the experimental native wire protocol support.

2016-07-25 17:38:17+0900

cinnabar, p.m.o | No Comments »

Are all integer overflows equal?

Background: I’ve been relearning Rust (more about that in a separate post, some time later), and in doing so, I chose to implement the low-level parts of git (I’ll touch the why in that separate post I just promised).

Disclaimer: It’s friday. This is not entirely(?) a serious post.

So, I was looking at Documentation/technical/index-format.txt, and saw:

32-bit number of index entries.

What? The index/staging area can’t handle more than ~4.3 billion files?

There I was, writing Rust code to write out the index.

try!(out.write_u32::<NetworkOrder>(self.entries.len()));

(For people familiar with the byteorder crate and wondering what NetworkOrder is, I have a use byteorder::BigEndian as NetworkOrder)

And the Rust compiler rightfully barfed:

error: mismatched types:
 expected `u32`,
    found `usize` [E0308]

And there I was, wondering: “mmmm should I just add as u32 and silently truncate or … hey what does git do?”

And it turns out, git uses an unsigned int to track the number of entries in the first place, so there is no truncation happening.

Then I thought “but what happens when cache_nr reaches the max?”

Well, it turns out there’s only one obvious place where the field is incremented.

What? Holy coffin nails, Batman! No overflow check?

Wait a second, look 3 lines above that:

ALLOC_GROW(istate->cache, istate->cache_nr + 1, istate->cache_alloc);

Yeah, obviously, if you’re incrementing cache_nr, you already have that many entries in memory. So, how big would that array be?

        struct cache_entry **cache;

So it’s an array of pointers, assuming 64-bits pointers, that’s … ~34.3 GB. But, all those cache_nr entries are in memory too. How big is a cache entry?

struct cache_entry {
        struct hashmap_entry ent;
        struct stat_data ce_stat_data;
        unsigned int ce_mode;
        unsigned int ce_flags;
        unsigned int ce_namelen;
        unsigned int index;     /* for link extension */
        unsigned char sha1[20];
        char name[FLEX_ARRAY]; /* more */
};

So, 4 ints, 20 bytes, and as many bytes as necessary to hold a path. And two inline structs. How big are they?


struct hashmap_entry {
        struct hashmap_entry *next;
        unsigned int hash;
};

struct stat_data {
        struct cache_time sd_ctime;
        struct cache_time sd_mtime;
        unsigned int sd_dev;
        unsigned int sd_ino;
        unsigned int sd_uid;
        unsigned int sd_gid;
        unsigned int sd_size;
};

Woohoo, nested structs.

struct cache_time {
        uint32_t sec;
        uint32_t nsec;
};

So all in all, we’re looking at 1 + 2 + 2 + 5 + 4 32-bit integers, 1 64-bits pointer, 2 32-bits padding, 20 bytes of sha1, for a total of 92 bytes, not counting the variable size for file paths.

The average path length in mozilla-central, which only has slightly over 140 thousands of them, is 59 (including the terminal NUL character).

Let’s conservatively assume our crazy repository would have the same average, making the average cache entry 151 bytes.

But memory allocators usually allocate more than requested. In this particular case, with the default allocator on GNU/Linux, it’s 156 (weirdly enough, it’s 152 on my machine).

156 times 4.3 billion… 670 GB. Plus the 34.3 from the array of pointers: 704.3 GB. Of RAM. Not counting the memory allocator overhead of handling that. Or all the other things git might have in memory as well (which apparently involves a hashmap, too, but I won’t look at that, I promise).

I think one would have run out of memory before hitting that integer overflow.

Interestingly, looking at Documentation/technical/index-format.txt again, the on-disk format appears smaller, with 62 bytes per file instead of 92, so the corresponding index file would be smaller. (And in version 4, paths are prefix-compressed, so paths would be smaller too).

But having an index that large supposes those files are checked out. So let’s say I have an empty ext4 file system as large as possible (which I’m told is 2^60 bytes (1.15 billion gigabytes)). Creating a small empty ext4 tells me at least 10 inodes are allocated by default. I seem to remember there’s at least one reserved for the journal, there’s the top-level directory, and there’s lost+found ; there apparently are more. Obviously, on that very large file system, We’d have a git repository. git init with an empty template creates 9 files and directories, so that’s 19 more inodes taken. But git init doesn’t create an index, and doesn’t have any objects. We’d thus have at least one file for our hundreds of gigabyte index, and at least 2 who-knows-how-big files for the objects (a pack and its index). How many inodes does that leave us with?

The Linux kernel source tells us the number of inodes in an ext4 file system is stored in a 32-bits integer.

So all in all, if we had an empty very large file system, we’d only be able to store, at best, 2^32 – 22 files… And we wouldn’t even be able to get cache_nr to overflow.

… while following the rules. Because the index can keep files that have been removed, it is actually possible to fill the index without filling the file system. After hours (days? months? years? decades?*) of running

seq 0 4294967296 | while read i; do touch $i; git update-index --add $i; rm $i; done

One should be able to reach the integer overflow. But that’d still require hundreds of gigabytes of disk space and even more RAM.

  • At the rate it was possible to add files to the index when I tried (yeah, I tried), for a few minutes, and assuming a constant rate, the estimate is close to 2 years. But the time spent reading and writing the index file increases linearly with its size, so the longer it’d run, the longer it’d take.

Ok, it’s actually much faster to do it hundreds of thousand files at a time, with something like:

seq 0 100000 4294967296 | while read i; do j=$(seq $i $(($i + 99999))); touch $j; git update-index --add $j; rm $j; done

At the rate the first million files were added, still assuming a constant rate, it would take about a month on my machine. Considering reading/writing a list of a million files is a thousand times faster than reading a list of a billion files, assuming linear increase, we’re still talking about decades, and plentiful RAM. Fun fact: after leaving it run for 5 times as much as it had run for the first million files, it hasn’t even done half more…

One could generate the necessary hundreds-of-gigabytes index manually, that wouldn’t be too hard, and assuming it could be done at about 1 GB/s on a good machine with a good SSD, we’d be able to craft a close-to-explosion index within a few minutes. But we’d still lack the RAM to load it.

So, here is the open question: should I report that integer overflow?

Wow, that was some serious procrastination.

Edit: Epilogue: Actually, oops, there is a separate integer overflow on the reading side that can trigger a buffer overflow, that doesn’t actually require a large index, just a crafted header, demonstrating that yes, not all integer overflows are equal.

2016-07-08 13:18:34+0900

p.d.o, p.m.o | 1 Comment »

Announcing git-cinnabar 0.4.0 beta 1

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

What’s new since 0.3.2?

  • Various bug fixes.
  • Updated git to 2.9 for cinnabar-helper.
  • Now supports bundle2 for both fetch/clone and push (https://www.mercurial-scm.org/wiki/BundleFormat2).
  • Now Supports git credential for HTTP authentication.
  • Removed upgrade path from repositories used with version < 0.3.0.
  • Experimental (and partial) support for using git-cinnabar without having mercurial installed.
  • Use a mercurial subprocess to access local mercurial repositories.
  • Cinnabar-helper now handles fast-import, with workarounds for performance issues on macOS.

2016-07-04 12:46:15+0900

cinnabar, p.m.o | No Comments »

Using git to access mercurial repositories, without mercurial

If you’ve been following this blog, you know I’ve been working on a git remote helper that gives access to mercurial repositories, named git-cinnabar. So far, it has been using libraries from mercurial itself in order to talk to local or remote repositories.

That is, until today. The current master branch now has experimental support for direct access to remote mercurial repositories, without mercurial.

2016-05-10 17:45:35+0900

cinnabar, p.m.o | 3 Comments »

Announcing git-cinnabar 0.3.2

Git-cinnabar is a git remote helper to interact with mercurial repositories. It allows to clone, pull and push from/to mercurial remote repositories, using git.

Get it on github.

These release notes are also available on the git-cinnabar wiki.

This is mostly a bug and regression-fixing release.

What’s new since 0.3.1?

  • Fixed a performance regression when cloning big repositories on OSX.
  • git configuration items with line breaks are now supported.
  • Fixed a number of issues with corner cases in mercurial data (such as, but not limited to nodes with no first parent, malformed .hgtags, etc.)
  • Fixed a stack overflow, a buffer overflow and a use-after free in cinnabar-helper.
  • Better work with git worktrees, or when called from subdirectories.
  • Updated git to 2.7.4 for cinnabar-helper.
  • Properly remove all refs meant to be removed when using git version lower than 2.1.

2016-04-28 00:42:56+0900

cinnabar, p.m.o | No Comments »

RIP Iceweasel, 13 Nov 2006 – 10 Mar 2016

This took longer than it should have, but a page is now officially turned. I uploaded Firefox and Firefox ESR to Debian unstable. They will have to go through the Debian NEW queue because they are new source packages, so won’t be immediately available, but they should arrive soon enough.

People using Iceweasel from Debian unstable will be upgraded to Firefox ESR.

Debian stable will receive Firefox ESR after Iceweasel/Firefox ESR38 is end-of-lifed, in about 3 months.

Thanks go to Sylvestre Ledru, Mike Connor (the same who filed bug 354622) and Stefano Zacchiroli.

2016-03-10 13:36:28+0900

p.d.o, p.m.o | 19 Comments »

SSH through jump hosts, revisited

Close to 7 years ago, I wrote about SSH through jump hosts. Twice. While the method used back then still works, Openssh has grown an new option in version 5.3 that allows it to be simplified a bit, by not using nc.

So here is an updated rule, version 2016:

Host *+*
ProxyCommand ssh -W $(echo %h | sed 's/^.*+//;s/^\([^:]*$\)/\1:22/') $(echo %h | sed 's/+[^+]*$//;s/\([^+%%]*\)%%\([^+]*\)$/\2 -l \1/;s/:\([^:+]*\)$/ -p \1/')

The syntax you can use to connect through jump hosts hasn’t changed compared to previous blog posts:

  • With one jump host:
    $ ssh login1%host1:port1+host2:port2 -l login2
  • With two jump hosts:
    $ ssh login1%host1:port1+login2%host2:port2+host3:port3 -l login3
  • With three jump hosts:
    $ ssh login1%host1:port1+login2%host2:port2+login3%host3:port3+host4:port4 -l login4
  • etc.

Logins and ports can be omitted.

Update: Add missing port to -W flag when one is not given.

2016-02-08 00:26:53+0900

p.d.o, p.m.o | 5 Comments »

Going beyond NS_ProcessNextEvent

If you’ve been debugging Gecko, you’ve probably hit the frustration of having the code you’re inspecting being called asynchronously, and your stack trace rooting through NS_ProcessNextEvent, which means you don’t know at first glance how your code ended up being called in the first place.

Events running from the Gecko event loop are all nsRunnable instances. So at some level close to NS_ProcessNextEvent, in your backtrace, you will see Class::Run. If you’re lucky, you can find where the nsRunnable was created. But that requires the stars to be perfectly aligned. In many cases, they’re not.

There comes your savior: rr. If you don’t know it, check it out. The downside is that you must first rr record a Firefox session doing what you’re debugging. Then, rr replay will give you a debugger with the capabilities of a time machine.

Note, I’m kind of jinxed, I don’t do much C++ debugging these days, so every time I use rr replay, I end up hitting a new error. Tip #1: try again with rr’s current master. Tip #2: roc is very helpful. But my takeaway is that it’s well worth the trouble. It is a game changer for debugging.

Anyways, once you’re in rr replay and have hit your crasher or whatever execution path you’re interested in, and you want to go beyond that NS_ProcessNextEvent, here is what you can do:

(rr) break nsEventQueue.cpp:60
(rr) reverse-continue

(Adjust the line number to match wherever the *aResult = mHead->mEvents[mOffsetHead++]; line is in your tree).

(rr) disable
(rr) watch -l mHead->mEvents[mOffsetHead]
(rr) reverse-continue
(rr) disable

And there you are, you just found where the exact event that triggered the executed code you were looking at was put on the event queue. (assuming there isn’t a nested event loop processed during the first reverse-continue)

Rinse and repeat.

2016-02-05 07:19:41+0900

p.m.o | 1 Comment »

Enabling TLS on this blog

Long overdue, I finally enabled TLS on this blog. It went almost like a breeze.

I used simp_le to get the certificate from Let’s Encrypt, along Mozilla’s Web Server Configuration generator. SSL Labs now reports a rating of A+.

I just had a few issues:

  • I had some hard-coded http:// links in my wordpress theme, that needed changes,
  • Since my wordpress instance is reverse-proxied and the real server not behind HTTPS, I had to adjust the wordpress configuration so that it doesn’t do an infinite redirect loop,
  • Nginx’s config for multiple virtualhosts needs SSL configuration to be repeated. Fortunately, one can use include statements,
  • Contrary to the suggested configuration, setting ssl_session_tickets off; makes browsers unhappy (at least, it made my Firefox unhappy, with a SSL_ERROR_RX_UNEXPECTED_NEW_SESSION_TICKET error message).

I’m glad that there are tools helping to get a proper configuration of SSL. It is sad, though, that the defaults are not better and that we still need to tweak at all. Setting where the certificate and the private key files are should, in 2016, be the only thing to do to have a secure web server.

2016-01-30 07:22:42+0900

p.d.o, p.m.o, website | 1 Comment »

Announcing git-cinnabar 0.3.1

This is a brown paper bag release. It turns out I managed to break the upgrade path only 10 commits before the release.

What’s new since 0.3.0?

  • git cinnabar fsck doesn’t fail to upgrade metadata.
  • The remote.$remote.cinnabar-draft config works again.
  • Don’t fail to clone an empty repository.
  • Allow to specify mercurial configuration items in a .git/hgrc file.

2016-01-16 12:26:45+0900

cinnabar, p.m.o | No Comments »