{"id":146,"date":"2007-07-07T15:32:07","date_gmt":"2007-07-07T13:32:07","guid":{"rendered":"http:\/\/web.glandium.org\/blog\/?p=146"},"modified":"2010-01-27T08:52:31","modified_gmt":"2010-01-27T07:52:31","slug":"finding-where-a-tarball-came-from-with-git","status":"publish","type":"post","link":"https:\/\/glandium.org\/blog\/?p=146","title":{"rendered":"Finding where a tarball came from with git"},"content":{"rendered":"<p>I got the idea reading <a href=\"http:\/\/lists.macosforge.org\/pipermail\/webkit-dev\/2007-July\/002098.html\">this question on the webkit-dev list<\/a>, and from my recollection of the <a href=\"http:\/\/www.kernel.org\/pub\/software\/scm\/git\/docs\/\">git documentation<\/a>. Well, for the webkit case, the basic script I put up would be of no help because the tarballs don't contain all the files (plus, they use subversion, not git, so it would also require a long importing process). Anyway...<\/p>\n<p>Considering how git computes the SHA-1 hashes of its objects, it is actually pretty easy to generate a SHA-1 hash from an arbitrary (non-git) tree, and from it, to find the corresponding commit. First, there is <code>git-hash-object<\/code>, which creates the hash for a particular object (though it's also trivial to do with <code>sha1sum<\/code>, once you prepend the object header: the type, a space, the size in decimal, and a NUL byte). For regular files, <code>git-hash-object -t blob $filename<\/code> is enough. For symbolic links, you have to read the link destination, and give it <b>without<\/b> any trailing character (be it a NUL character or a newline) to <code>git-hash-object -t blob --stdin<\/code>. For directories, you have to generate a tree \"structure\" yourself and pass it to <code>git-hash-object -t tree --stdin<\/code>. I haven't bothered looking at other file types.<\/p>\n<p>The tree structure can be worked out by looking either at <code>mktree.c<\/code> or at the output of <code>git-cat-file tree $sha1<\/code>, where <code>$sha1<\/code> is the SHA-1 hash for a tree object. 
It contains three pieces of information for each node in the tree: the file mode, in the same format as what <code>stat()<\/code> returns, except that, for some reason, permissions are 000 for directories and symbolic links; the file name; and the SHA-1 hash. This information is written in the following format: the file mode in octal ASCII with no zero padding (\"%o\") followed by a space character, then the file name followed by a NUL character, then the binary form of the SHA-1 hash.<\/p>\n<p>Nodes are sorted in a not-quite-lexical order (directories are compared as if their name ended with a slash; take a look at <code>base_name_compare<\/code> in <code>read-cache.c<\/code>) and are not separated by any special character: the mode of a file just follows the SHA-1 hash of its predecessor.<\/p>\n<p>With all this new knowledge, you should be able to write some code that returns the SHA-1 hash of an arbitrary directory. Okay, since you must be at least as lazy as I am, you can take the <a href=\"\/files\/git-hash-tree.pl\">script I wrote<\/a>.<\/p>\n<p>Now, let's take a look at a real-life case: which commit does the latest nightly snapshot of the linux kernel come from? First, download the latest snapshot patch and its baseline, extract the baseline, and apply the patch to it. Then, run my <code><a href=\"\/files\/git-hash-tree.pl\">git-hash-tree.pl<\/a><\/code> script with the directory containing the extracted kernel as an argument. After a while, it will return the SHA-1 hash for the whole tree. During this long process, you also have plenty of time to <code>git clone<\/code> Linus's tree.<\/p>\n<p>Once you're all done, you can search for the commit corresponding to the tree hash (let's call it <code>$hash<\/code>) with the following command:<\/p>\n<blockquote><p><code>git-rev-list --all | while read h; do git-cat-file commit $h | grep -q \"^tree $hash\" && echo $h && break; done<br \/>\n<\/code><\/p><\/blockquote>\n<p>If you just followed these steps, you should have just spent a great moment getting no result at all. 
There are actually two things that prevent this method from working properly with the linux kernel nightlies:<\/p>\n<ul>\n<li>The snapshot patches contain a change to the top-level Makefile that doesn't exist in the repository. You need to remove the <code>-git<em>n<\/em><\/code> from the <code>EXTRAVERSION<\/code> variable in the Makefile.<\/li>\n<li>For the removal of an empty file, <code>git diff<\/code> only emits diff headers, with no hunk, so if you apply the snapshot patch with the <code>patch<\/code> utility (and you can't apply it with <code>git-apply<\/code> since you don't have a <code>.git<\/code> directory), empty files that were marked as deleted will still be in your tree. It happens with the current snapshot patch (2.6.22-rc7-git6): it doesn't remove <code>include\/asm-blackfin\/macros.h<\/code>.<\/li>\n<\/ul>\n<p>Note that this is a naive method, because I haven't spent much time going through the git documentation and code to find better ways, if there are any. Also note that it's pretty much pointless to do this with the kernel nightly snapshots, since a file containing the SHA-1 hash of the corresponding commit can be found alongside the patch.<\/p>\n<p>I guess a similar method could be used with mercurial, though I could not find documentation detailing what the hashes are calculated from (I haven't searched much, I must say, but for git, it was right before my eyes).<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I got the idea reading this question on the webkit-dev list, and from my recollection of the git documentation. 
Well, for the webkit case, the basic script I put up would be of no help because the tarballs don&#8217;t contain all the files (plus, they use subversion, not git, so it would also require a [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1,5],"tags":[23],"class_list":["post-146","post","type-post","status-publish","format-standard","hentry","category-misc","category-pdo","tag-en"],"_links":{"self":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=146"}],"version-history":[{"count":1,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/146\/revisions"}],"predecessor-version":[{"id":717,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=\/wp\/v2\/posts\/146\/revisions\/717"}],"wp:attachment":[{"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/glandium.org\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}