I got the idea reading this question on the webkit-dev list, and from my recollection of the git documentation. Well, for the webkit case, the basic script I put up would be of no help because the tarballs don't contain all the files (plus, they use subversion, not git, so it would also require a long importing process). Anyways...
Considering how git computes the SHA-1 hashes of its objects, it is actually pretty easy to generate the SHA-1 hash of an arbitrary (non-git) tree and, from it, find the corresponding commit. First, you have git hash-object, which creates the hash for a particular object (though that is also trivial to do with sha1sum). For regular files, git hash-object -t blob $filename is enough. For symbolic links, you have to read the link destination and pass it, without any trailing character (be it a NUL character or a newline), to git hash-object -t blob --stdin. For directories, you have to generate a tree "structure" yourself and pass it to git hash-object -t tree --stdin. I haven't bothered looking at other file types.
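To make the sha1sum remark concrete, here is a minimal Python sketch of what git hash-object computes for blobs. It relies on the fact that a git object hash is the SHA-1 of a short header (the object type, a space, the content length in decimal and a NUL character) followed by the content. The function names are mine, chosen for illustration, and are not taken from the script mentioned in this post.

```python
import hashlib
import os


def git_object_hash(obj_type: str, data: bytes) -> str:
    """Return the hex SHA-1 git assigns to an object of the given type."""
    # The header is "<type> <size in decimal>" plus a NUL byte,
    # and it is hashed together with the raw content.
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def hash_file(path: str) -> str:
    """Blob hash of a regular file (the whole file is read into memory)."""
    with open(path, "rb") as f:
        return git_object_hash("blob", f.read())


def hash_symlink(path: str) -> str:
    """Blob hash of a symbolic link: the blob content is the link target."""
    # os.readlink returns the target with no trailing NUL or newline.
    target = os.fsencode(os.readlink(path))
    return git_object_hash("blob", target)
```

Calling hash_file on a file should give the same result as git hash-object -t blob on that file, which is an easy way to check the sketch against a real git installation.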
The tree structure can be worked out either by looking at mktree.c or at the output of git cat-file tree $sha1, where $sha1 is the SHA-1 hash of a tree object. It contains three pieces of information for each node in the tree: the file mode, in the same format as what stat() returns, except that, for some reason, permissions are 000 for directories and symbolic links (and git only ever records 644 or 755 for regular files, depending on the executable bit); the file name; and the SHA-1 hash. This information is written in the following format: the file mode in ASCII octal without zero padding ("%o"), followed by a space character, then the file name followed by a NUL character, then the 20-byte binary form of the SHA-1 hash.
Nodes are sorted in a not quite lexical order (take a look at base_name_compare in read-cache.c: a directory is compared as if its name ended with a "/") and are not separated by any special character: the mode of a file just follows the SHA-1 hash of its predecessor.
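The entry format and the sort order described above can be sketched in Python as follows. This is only an illustration under my own naming (Entry, sort_key, tree_content and tree_hash are hypothetical helpers, not git APIs). The sort key emulates base_name_compare by appending a "/" to directory names before comparing.

```python
import hashlib
from typing import List, Tuple

# One node of a tree: (mode, file name, hex SHA-1 of the object).
# Modes seen in practice: 0o100644 or 0o100755 for regular files,
# 0o120000 for symbolic links, 0o040000 for directories.
Entry = Tuple[int, bytes, str]

S_IFMT = 0o170000   # mask selecting the file type bits
S_IFDIR = 0o040000  # file type bits of a directory


def sort_key(entry: Entry) -> bytes:
    """Directories compare as if their name ended with a slash."""
    mode, name, _ = entry
    return name + b"/" if (mode & S_IFMT) == S_IFDIR else name


def tree_content(entries: List[Entry]) -> bytes:
    """Serialize entries: "<octal mode> <name>", a NUL, then 20 raw bytes."""
    out = b""
    for mode, name, sha1_hex in sorted(entries, key=sort_key):
        out += ("%o" % mode).encode("ascii") + b" " + name + b"\0"
        out += bytes.fromhex(sha1_hex)  # binary form, no separator after it
    return out


def tree_hash(entries: List[Entry]) -> str:
    """Hex SHA-1 of the tree object built from the given entries."""
    data = tree_content(entries)
    header = b"tree " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

The difference with a plain lexical sort shows up with names such as foo (a directory), foo.c and foo-bar: the directory is compared as "foo/", so it ends up after the two files instead of before them.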
With all this new knowledge, you should be able to write some code that returns the SHA-1 hash of an arbitrary directory. Okay, since you must be at least as lazy as I am, you can take the script I wrote.
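For those who would rather read the idea than the script, here is a rough Python sketch of such a directory hasher. It is not the script mentioned above, just an illustration built on the rules described in this post, with a few assumptions of mine: empty directories are skipped (git does not track them), regular file modes are reduced to 644 or 755, and file types other than regular files, symbolic links and directories are ignored.

```python
import hashlib
import os
import stat


def obj_hash(obj_type: bytes, data: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a git object: header, NUL, then content."""
    header = obj_type + b" " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).digest()


def tree_content(path: bytes) -> bytes:
    """Serialized tree object for the directory at `path`."""
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)  # lstat, so symbolic links are not followed
        if stat.S_ISDIR(st.st_mode):
            sub = tree_content(full)
            if not sub:
                continue  # assumption: empty directories are not tracked
            # Directories sort as if their name ended with "/".
            entries.append((name + b"/", b"40000", name, obj_hash(b"tree", sub)))
        elif stat.S_ISLNK(st.st_mode):
            target = os.readlink(full)  # no trailing NUL or newline
            entries.append((name, b"120000", name, obj_hash(b"blob", target)))
        elif stat.S_ISREG(st.st_mode):
            # Only the executable bit is kept: 644 or 755.
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            with open(full, "rb") as f:
                entries.append((name, mode, name, obj_hash(b"blob", f.read())))
        # Any other file type is ignored in this sketch.
    entries.sort(key=lambda e: e[0])
    return b"".join(mode + b" " + name + b"\0" + sha
                    for _, mode, name, sha in entries)


def hash_tree(path: str) -> str:
    """Hex SHA-1 of the tree object describing the directory at `path`."""
    return obj_hash(b"tree", tree_content(os.fsencode(path))).hex()
```

Calling hash_tree("linux-2.6.22-rc7") (a hypothetical directory name) would return the tree hash as a hexadecimal string. Files are read entirely into memory, which is fine for source trees but not something to do with huge files.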
Now, let's take a look at a real-life case: which commit is the latest nightly snapshot of the Linux kernel based on? First, download the latest snapshot patch and its baseline, extract the baseline and apply the patch to it. Then, run my git-hash-tree.pl script with the directory containing the extracted kernel as an argument. After a while, it will return the SHA-1 hash of the whole tree. During this long process, you also have plenty of time to git clone Linus's tree.
Once you're all done, you can search for the commit corresponding to the tree hash (let's call it $hash) with the following command:
git rev-list --all | while read h; do git cat-file commit "$h" | grep -q "^tree $hash" && echo "$h" && break; done
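The same search can be sketched in Python without spawning one git cat-file process per commit, by asking git log to print each commit hash next to its tree hash (the %H and %T format placeholders). The function names and the repository path argument are mine, for illustration only.

```python
import subprocess
from typing import Iterable, Optional


def find_commit(lines: Iterable[str], tree_hash: str) -> Optional[str]:
    """Return the first commit whose tree matches, from "<commit> <tree>" lines."""
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and parts[1] == tree_hash:
            return parts[0]
    return None


def find_commit_in_repo(repo: str, tree_hash: str) -> Optional[str]:
    """Search every commit reachable from any ref of the repository at `repo`."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%H %T"],
        check=True, capture_output=True, text=True,
    ).stdout
    return find_commit(out.splitlines(), tree_hash)
```

Like the shell loop, this stops at the first match, and it returns None when no commit has that tree.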
If you just followed these steps, you will have spent a great while getting no result at all. There are actually two things that prevent this method from working properly with the Linux kernel nightlies:
- The snapshot patches contain a change to the top-level Makefile that doesn't exist in the repository. You need to remove the -gitn from the EXTRAVERSION variable in the Makefile.
- git diff only includes diff headers for the removal of empty files, so if you apply the snapshot patch with the patch utility (and you can't apply it with git apply since you don't have a .git directory), empty files that were marked as deleted will still be in your tree. It happens with the current snapshot patch (2.6.22-rc7-git6): it doesn't remove include/asm-blackfin/macros.h.
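Both fixes can be automated. The following Python sketch is only an illustration and rests on two assumptions of mine: that the Makefile line looks like "EXTRAVERSION = -rc7-git6", and that the patch uses git-style headers ("diff --git a/path b/path" followed by "deleted file mode"). The helper names are hypothetical.

```python
import re
from typing import List


def strip_git_suffix(makefile_text: str) -> str:
    """Drop a trailing "-gitN" from the EXTRAVERSION line of a Makefile."""
    def fix(match):
        # Remove the suffix, then any whitespace left at the end of the line.
        line = match.group(0).rstrip()
        return re.sub(r"-git\d+$", "", line).rstrip()
    return re.sub(r"^EXTRAVERSION[ \t]*=.*$", fix, makefile_text,
                  flags=re.MULTILINE)


def deleted_paths(patch_text: str) -> List[str]:
    """Paths that a git-style patch marks as deleted."""
    deleted = []
    current = None
    for line in patch_text.splitlines():
        header = re.match(r"diff --git a/(.+) b/\1$", line)
        if header:
            current = header.group(1)
        elif line.startswith("deleted file mode ") and current is not None:
            deleted.append(current)
            current = None
    return deleted
```

After applying the patch with the patch utility, any path returned by deleted_paths that still exists in the tree is a leftover to remove by hand before hashing.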
Note that this is a naive method, because I haven't spent much time going through the git documentation and code to find better ways, if there are any. Also note that it's pretty much pointless to do this with the kernel nightly snapshots, since a file containing the SHA-1 hash of the corresponding commit can be found alongside the patch.
I guess a similar method could be used with Mercurial, though I could not find any documentation detailing what the hashes are calculated from (I haven't searched much, I must say, but for git, it was right before my eyes).