[Lxr-general] [PATCH] Preliminary GIT support
Brought to you by:
ajlittoz
From: Jan-Benedict G. <jb...@lu...> - 2006-03-26 12:19:18
|
Hi! This code is not yet ready for use. It contains known bugs and other hidden goodies. However we'd like to get feedback on it, since we're both no Perl experts. Of most interest: * Given that a called program outputs a single line containing only a single word ("tag" or "7638" or "commit", without the quotes, but containing the needed \n for proper display), what is the easiest way to get this into a variable and check if the program exited cleanly (exit(0)) or found a problem (exit !=3D 0)? * Given that a program outputs numerous lines of which the first lines have a strict syntax (and the following are free-form text), what's the easiest way to get one of the lines with known formating? Example: -------------------------------------------------------------------- $ git-cat-file commit 79f558a5fc1c471e5db926a1272fe930f24784bb tree 6dd43a8262e61bf2cea75529454f5bc86e57686d parent bdaa085f8c33e75cf477ff6b4292f35c9c5f4c22 parent cb9594e28c940d2bbf4d7fb69c337d27155da37a author Junio C Hamano <ju...@co...> 1143186547 -0800 committer Junio C Hamano <ju...@co...> 1143186547 -0800 Merge branch 'jc/cvsimport' * jc/cvsimport: cvsimport: fix reading from rev-parse cvsimport: honor -i and non -i upon subsequent imports -------------------------------------------------------------------- "tree" is the new tree which is the result of this commit. There's exactly one tree, except in cases where the following free-form text also contains 'tree'. "parents" is/are the previous commit(s). There's one parent for a simple commit and probably multiple in cases of merges. (For LXR, parents are basically uninteresting.) "author" is the author. "committer" is the person who introduced the patch/merge into the SCM. So I'd like to ask for a "translation" of this shell excerpt to Perl: if ! OUTPUT=3D"`git-cat-file commit 79f558a5fc1c471e5db926a1272fe930f24784= bb`"; then echo "something unexpected happend" >&2 exit 1 fi AUTHOR_LINE=3D"`echo "${OUTPUT}" | grep '^author ' | head -1`" AUTHOR_EMAIL=3D"`echo "${AUTHOR_LINE}" | cut -f 2 -d '<' | cut -f 1 -d '>'= `" It seems to be quite easy getting AUTHOR_EMAIL (from the example above), but getting the first '^author ' line seems to be a bit more interesting. As mentioned, we're no Perl experts... MfG, JBG 2006-03-26 Maximilian Wilhelm <ma...@rf...> Jan-Benedict Glaw <jb...@lu...> * lib/LXR/Files.pm (LXR::Files->new()): Regognize "git:/" repositories. * templates/lxr.conf (sourceroot): Document how to configure a GIR repository holding the revision history. (ignoredirs): Add ".git" to the list of ignored directories. Also update the documentation a bit. * lib/LXR/Files/GIT.pm: New file containing a _preliminary_ and _known_buggy_ GIT backend for LXR. --- a/lib/LXR/Files.pm 2 Nov 2005 23:39:55 -0000 1.9 +++ b/lib/LXR/Files.pm 26 Mar 2006 11:01:08 -0000 @@ -36,6 +36,11 @@ sub new { $srcroot =3D $1; $files =3D new LXR::Files::BK($srcroot, $params); } + elsif ( $srcroot =3D~ /^git:(.*)/i ) { + require LXR::Files::GIT; + $srcroot =3D $1; + $files =3D new LXR::Files::GIT($srcroot, $params); + } else { require LXR::Files::Plain; $files =3D new LXR::Files::Plain($srcroot); --- /dev/null 2006-03-14 17:36:51.000000000 +0100 +++ b/lib/LXR/Files/GIT.pm 2006-03-25 23:23:46.000000000 +0100 @@ -0,0 +1,421 @@ +# -*- tab-width: 4 -*- ############################################### +# +# $Id$ + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. + +package LXR::Files::GIT; + +$CVSID =3D '$Id$'; + +use strict; +use FileHandle; +use LXR::Common; +use LXR::Author; + +# +# We're adding ".git" to the path since we're only dealing with +# low-level stuff and _never_ ever deal with checked-out files. +# +sub new { + my ($self, $rootpath, $params) =3D @_; + + $self =3D bless({}, $self); + $self->{'rootpath'} =3D $rootpath; + + $ENV{'GIT_DIR'} =3D $self->{'rootpath'}; + return $self; +} + +sub filerev { + my ($self, $filename, $release) =3D @_; + + $filename =3D $self->sanitizePath ($filename); + $release =3D $self->get_treehash_for_branchhead_or_tag ($release); + + my $pid =3D open(my $F, '-|'); + die $! unless defined $pid; + if (!$pid) { + exec ("git-ls-tree", $release, $filename) + or die "filerev: Cannot exec git-ls-tree";=20 + } + + my $git_line=3D<$F>; + chomp $git_line; + close($F); + + if ($git_line =3D~ m/(\d+)\s(\w+)\s([[:xdigit:]]+)\t(.*)/ ) { + return $3; + =09 + } else { + die "filerev( $filename, $release ): No entry found.\n"; + } +} + +sub getfiletime { + my ($self, $filename, $release) =3D @_; + $filename =3D $self->sanitizePath ($filename); + + if ($filename =3D~ m/\/\.\.$/ ) + return undef; +# if ($filename =3D~ /\/\.\.\$/) +# return undef; + + my $pid1 =3D open(my $R, '-|' ); + die $! unless defined $pid1; + if(!$pid1) { + exec("git-rev-list", "--max-count=3D1", "$release", "--", $filename ) or= die "getfiletime ($filename, $release): Cannot exec git-rev-list\n"; + } + my $commit =3D <$R>; + chomp $commit; + close($R); +=09 + my $pid =3D open(my $F, '-|'); + die $! unless defined $pid; + if(!$pid) { + exec("git-cat-file", "commit", $commit) or die "getfiletime ($filename, = $release): Cannot exec git-cat-file\n"; + } + + while(<$F>) { + chomp; + if ( m/^author .*<.*> (\d+)\s.*$/ ) { + close($F); + return $1; + } + } +=09 + close($F); + + die "getfiletime ($filename, $release) : Did not find GIT entry.\n"; +} + +sub getfilesize { + my ($self, $filename, $release) =3D @_; + + $filename =3D $self->sanitizePath ($filename); + my $object_hash =3D $self->filerev ($filename, $release); + + print STDERR "getfilesize ($filename, $release)\n"; + + # return `git-cat-file -s $blobhash`; + my $pid =3D open (my $F, '-|'); + die $! unless defined $pid; + if(!$pid) {=09 + exec ("git-cat-file", "-s", $object_hash) or die "getfilesize ($filename= , $release): Cannot exec git-cat-file\n"; + } + + my $size =3D <$F>; + close ($F); + chomp $size; + if ( $size ) { + return $size; + } else { + return undef; + } + + close ($F); + return undef; +} + +sub getfile { + my ($self, $filename, $release) =3D @_; + my ($buffer); + +# my $blobhash =3D open( "git-ls-tree $release $filename | cut -f 3 -d ' '= | cut -f 1 -d \$'\t' |") or die "Cannot open git-ls-tree $release $filenam= e in getfile\n"; + my $blobhash =3D $self->filerev( $filename, $release ); +# local ($/) =3D undef; + + open(FILE, "git-cat-file blob $blobhash|") || return undef; + $buffer =3D <FILE>; + close(FILE); + return $buffer; +} + +sub getfilehandle { + my ($self, $filename, $release) =3D @_; + my ($fileh); + $filename =3D $self->sanitizePath ($filename); + + my $treeid =3D $self->get_treehash_for_branchhead_or_tag ($release); + + $filename =3D $self->sanitizePath ($filename); + my $objectid =3D $self->getBlobOrTreeOfPathAndTree ($filename, $treeid); + + $fileh =3D new IO::File; + $fileh->open ("git-cat-file blob $objectid |") or die "Cannot execute git= -cat-file blob $objectid"; + + return $fileh; +} + +sub tmpfile { + my ($self, $filename, $release) =3D @_; + my ($tmp, $fileh); + local ($/) =3D undef; + + $tmp =3D $config->tmpdir . '/lxrtmp.' . time . '.' . $$ . '.' . &LXR::Com= mon::tmpcounter; + open(TMP, "> $tmp") || return undef; + $fileh =3D $self->getfilehandle( $filename, $release ); + print(TMP <$fileh>); + close($fileh); + close(TMP); + + return $tmp; +} + +sub getannotations { + + return (); + my ($self, $pathname, $release) =3D @_; + my @authors =3D (); + + if ( $pathname =3D~ m#^/(.*)$# ) { + $pathname =3D $1; + } +=09 + open( BLAME, "git-blame -l $pathname $release |"); + while( <BLAME> ) { + if ( m/(^[[:xdigit:]]+)\s.*$/ ) { + my $linehash =3D $1; + my $authorline =3D `git-cat-file commit $linehash`; + if ($authorline =3D~ m/^author ([^<]+)<(([^@])\@[^>]+)>.*$/ ) { + my ($authorname, $authoruser, $authoremail) =3D ($1, $2, $3); + push(@authors, LXR::Author->new(chomp $authorname, + $authoruser, $authoremail)); + } else { + push(@authors, LXR::Author->new("", "", "")); + } + } else {=09 + print STDERR "getannotations: JB HAT DOOFE OHREN: $_\n"; + } + } + close(BLAME); + + print STDERR "authors: " . join(" ", @authors) . "\n"; +=09 + return @authors; +} + +sub getauthor { + + return (); + + my ($self, $filename, $release) =3D @_; + $filename =3D $self->sanitizePath ($filename); + print STDERR "getauthr( $filename, $release )\n"; + my $commit =3D `git-rev-list --max-count=3D1 $release -- $filename | tr -= d \$'\n'`; + my $authorline =3D `git-cat-file commit $commit | grep '^author' | head -= n1 | tr -d \$'\n'`; + + if ($authorline =3D~ m/^author ([^<]+)<(([^@])\@[^>]+)>.*$/ ) { + my ($authorname, $authoruser, $authoremail) =3D ($1, $2, $3); + return LXR::Author->new(chomp $authorname, $authoruser, $authoremail); + } else { + return LXR::Author->new("", "", ""); + } +} + +sub getdir { + my ($self, $pathname, $release) =3D @_; + my ($dir, $node, @dirs, @files); +=09 + my $treeid =3D $self->get_treehash_for_branchhead_or_tag ($release); +=09 + $pathname =3D $self->sanitizePath( $pathname ); + if ( $pathname !~ m#..*/# ) { + $pathname =3D $pathname . '/'; + } + + open(DIRLIST, "git-ls-tree $treeid $pathname |") or die "Cannot open git-= ls-tree $treeid $pathname"; + while( <DIRLIST> ) { + if ( m/(\d+)\s(\w+)\s([[:xdigit:]]+)\t(.*)/ ) { + my ($entrymode, $entrytype, $objectid, $entryname) =3D ($1,$2,$3,$4); + + # Weed out things to ignore + foreach my $ignoredir ($config->{ignoredirs}) { + next if $entryname eq $ignoredir; + } + + next if $entryname =3D~ /^\.$/; + next if $entryname =3D~ /^\.\.$/; + + if ($entrytype eq "blob") { + push(@files, $entryname); + =09 + } elsif ($entrytype eq "tree") { + push(@dirs, "$entryname/"); + #push(@dirs, "$entryname"); + } + } + } + close(DIRLIST); +=09 + return sort(@dirs), sort(@files); +} + +# This function should not be used outside this module +# except for printing error messages +# (I'm not sure even that is legitimate use, considering +# other possible File classes.) + +##sub toreal { +## my ($self, $pathname, $release) =3D @_; +## +## nearly all (if not all) method calls eventually call toreal(), so this = is a good place to block file access +## foreach my $ignoredir ($config->ignoredirs) { +## return undef if $pathname =3D~ m|/$ignoredir/|; +## } +## +## return ($self->{'rootpath'} . $release . $pathname); +##} + +sub isdir { + my ($self, $pathname, $release) =3D @_; + + $pathname =3D $self->sanitizePath ($pathname); + $release =3D $self->get_newest_commit_from_branchhead_or_tag ($release); + + print STDERR "isdir ($pathname, $release)\n"; + + my $treeid =3D $self->get_treehash_for_branchhead_or_tag ($release); + + return $self->getObjectType ($pathname, $treeid) eq "tree"; +} + +sub isfile { + my ($self, $pathname, $release) =3D @_; + + $pathname =3D $self->sanitizePath ($pathname); + $release =3D $self->get_newest_commit_from_branchhead_or_tag ($release); +=09 + print STDERR "isfile($pathname, $release)\n"; + + my $treeid =3D $self->get_treehash_for_branchhead_or_tag ($release); +=09 + return $self->getObjectType ($pathname, $treeid) eq "blob"; +} + +# +# For a given commit (that is, the latest commit on a named branch or +# a tag's name) return the tree object's hash corresponding to it. +# +sub get_treehash_for_branchhead_or_tag () { + my ($self, $release) =3D @_; + $release =3D $self->get_newest_commit_from_branchhead_or_tag ($release); + + return `git-cat-file commit $release | grep '^tree' | head -n1 | cut -f 2= -d ' ' | tr -d \$'\n'`; +} + +sub getObjectType() { + my ($self, $pathname, $treeid) =3D @_; + + open (DIRLIST, "git-ls-tree $treeid $pathname |") or die "Cannot open git= -ls-tree $treeid $pathname"; + while (<DIRLIST>) { + if (m/(\d+)\s(\w+)\s([[:xdigit:]]+)\t(.*)/) { + my ($entrymode, $entrytype, $objectid, $entryname) =3D ($1, $2, $3, $4); + + # Weed out things to ignore +# # This should only be needed in the getdir function. +# foreach my $ignoredir ($config->{ignoredirs}) { +# next if $entryname eq $ignoredir; +# } + + $entryname =3D $self->sanitizePath ($entryname); + +# print STDERR "getBlobOrTreeOfPathAndTree: pathname: \"$pathname\" :: e= ntryname: \"$entryname\"\n"; + next if ( ! $pathname eq $entryname ); + + close (DIRLIST); +# print STDERR "Juhu, wir haben $pathname gefunden :: $objectid\n"; + return $entrytype; + } + } + close (DIRLIST); + + return undef; +} + +sub getBlobOrTreeOfPathAndTree() { + my ($self, $pathname, $treeid ) =3D @_; + + open (DIRLIST, "git-ls-tree $treeid $pathname |") or die "Cannot open git= -ls-tree $treeid $pathname"; + while (<DIRLIST>) { + if (m/(\d+)\s(\w+)\s([[:xdigit:]]+)\t(.*)/) { + my ($entrymode, $entrytype, $objectid, $entryname) =3D ($1, $2, $3, $4); + + # Weed out things to ignore + foreach my $ignoredir ($config->{ignoredirs}) { + next if $entryname eq $ignoredir; + } + + $entryname =3D $self->sanitizePath( $entryname ); + next if (! $pathname eq $entryname ); + + close (DIRLIST); + return $objectid; + } + } + close (DIRLIST); + + return undef; +} + +# +# This function will take a branch name ("master") or a tag name +# (like "v2.6.15") and return either the branch commit object ID, +# or descend from the tag object into the referenced commit object +# and return its commit ID. XXX +# +sub get_newest_commit_from_branchhead_or_tag ($$) { + my ($self, $head_or_tag) =3D @_; + my $objtype =3D `git-cat-file -t $head_or_tag | tr -d \$'\n'`; + + if ($objtype eq "commit") { + return $head_or_tag; + } elsif ($objtype eq "tag") { + return `git-cat-file tag $head_or_tag | grep '^object' | head -n1 | cut = -f 2 -d ' ' | tr -d \$'\n'`; + } else { + die ("get_newest_commit_from_branchhead_or_tag: Unrecognized object type= $objtype for $head_or_tag\n"); + } +} + +sub sanitizePath() { + my ($self, $pathname) =3D @_; + + if ( $pathname eq "" ) { + # Empty? Just beam the client to the root. + $pathname =3D "."; + } elsif ( $pathname =3D~ m#^/# ) { + # Absolute? We want them to be relative! + $pathname =3D ".$pathname"; + } else { + # Filename incurrent directory? Add "./" to + # make them truly relative. + $pathname =3D "./$pathname"; + } + + # Don't let them exploit us easily. +# if ( $pathname =3D~ m#/../# ) { +# die("You are now dead because of $pathname\n"); +# } + + # Doubled slashes? We remove them. + $pathname =3D~ s#//#/#g; + + # Delete leading slashes. + $pathname =3D~ s#/*$##g; + + return $pathname; +} + +1; diff -u -p -r1.26 lxr.conf --- a/templates/lxr.conf 2 Nov 2005 23:39:55 -0000 1.26 +++ b/templates/lxr.conf 26 Mar 2006 11:01:29 -0000 @@ -113,7 +113,15 @@ # the 'sourceparams' value below. This should point to a directory wher= e the=20 # code can write and read files. # 'sourceroot' =3D> 'bk:/some/repository/here', - =09 + + # Finally, you can also use GIT, Linux Kernel's new + # distributes SCM to access the history: + # 'sourceroot' =3D> 'git:/some/repository/.git' + # 'sourceroot' =3D> 'git:/some/repostitory.git' + # The `sourceroot' config variable points to the directory + # in which you find the `objects', `refs', `index' etc. + # directories. + # The name to display for this source tree 'sourcerootname' =3D> 'Example', =09 @@ -231,7 +239,9 @@ #'cvswebprefix' =3D> 'http://cvs.myhost.com/cgi-bin/viewcvs.cgi/myroot', #'cvswebpostfix' =3D> '', =20 - # choose to ignore certain directories - 'ignoredirs' =3D> ['CVSROOT'], # 'CVS' dir is always ignored + # Directories to always ignore. These usually are the SCM's + # private directories which possibly may contain non-public + # project history. + 'ignoredirs' =3D> ['CVSROOT', '.git'], } ) --=20 Jan-Benedict Glaw jb...@lu... . +49-172-7608481 = _ O _ "Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg = _ _ O f=C3=BCr einen Freien Staat voll Freier B=C3=BCrger" | im Internet! | i= m Irak! O O O ret =3D do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA)= ); |