[Lxr-dev] Some patches for review (CVS tree indexing working)
From: <pe...@kl...> - 2001-07-24 11:47:52
Finally, I have a version of LXR working that indexes our CVS tree directly. Here are the significant patches against the current CVS tree on sourceforge. I'd be very happy if someone took a look and told me if there was something fundamentally wrong with this (I'd also appreciate being told what a genius I am, naturally ;) ). If I don't hear anything, I will assume the changes don't bother anyone, and check them in in a day or two.

The bulk of the changes are in CVS.pm, which no longer segfaults when faced with binary files. It now largely uses external commands for getting files out of CVS. Unfortunately, it does this in a way that is probably exploitable. While this is not a big issue on our intranet, you'd probably not want to run this on a public server yet. My next task will be to fix this. I just wanted to get these patches out the door, as they've taken far too long already.

See also the comments in between the parts of the patch.

===================================================================
These patches to the datamodel may be just me being dense, but because a CVS file contains a lot of versions, there is no single mapping file->version. genxref complained loudly without this change. Also, the usage table wasn't dropped along with the others in postgres.

Index: initdb
===================================================================
RCS file: /cvsroot/lxr/lxr/initdb,v
retrieving revision 1.10
diff -u -r1.10 initdb
--- initdb	1999/09/17 09:37:37	1.10
+++ initdb	2001/07/24 11:16:34
@@ -4,6 +4,7 @@
 drop table symbols;
 drop table indexes;
 drop table releases;
+drop table usage;
 drop table status;
 
 create sequence filenum cache 50;
@@ -37,7 +38,7 @@
 create table releases
     (fileid int references files,
     release varchar,
-    primary key (fileid)
+    primary key (fileid,release)
 );
 
 create table usage
Index: initdb-mysql
===================================================================
RCS file: /cvsroot/lxr/lxr/initdb-mysql,v
retrieving revision 1.4
diff -u -r1.4 initdb-mysql
--- initdb-mysql	2001/05/23 05:33:12	1.4
+++ initdb-mysql	2001/07/24 11:16:34
@@ -34,7 +34,7 @@
 create table releases
     (fileid int not null references files,
     release char(255) binary not null,
-    primary key (fileid)
+    primary key (fileid,release)
 );
 
 create table useage
Index: initdb-postgres
===================================================================
RCS file: /cvsroot/lxr/lxr/initdb-postgres,v
retrieving revision 1.1
diff -u -r1.1 initdb-postgres
--- initdb-postgres	1999/12/25 21:58:27	1.1
+++ initdb-postgres	2001/07/24 11:16:34
@@ -4,6 +4,7 @@
 drop table symbols;
 drop table indexes;
 drop table releases;
+drop table usage;
 drop table status;
 
 create sequence filenum cache 50;
@@ -37,7 +38,7 @@
 create table releases
     (fileid int references files,
     release varchar,
-    primary key (fileid)
+    primary key (fileid,release)
 );
 
 create table usage

===================================================================
The comment says it all, really. I just wanted to illustrate a new way of specifying the range.

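For readers who want to try the idea outside LXR, here is a minimal standalone sketch of a 'range' that is either a fixed list or a function called on demand. The %variables hash and its contents are made up for illustration; only the ref(...) eq 'CODE' test mirrors the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the 'variables' section of lxr.conf.
my %variables = (
    v => { range => [ '1.0.5', '1.0.6' ] },             # fixed list
    r => { range => sub { return ('1.1', '1.2') } },    # deferred call
);

# Return the value range of a variable. A code reference is called
# each time; an array reference is simply dereferenced.
sub varrange {
    my ($var) = @_;
    my $range = $variables{$var}{range};
    return $range->() if ref($range) eq 'CODE';
    return @{ $range || [] };
}

print join(',', varrange('v')), "\n";
print join(',', varrange('r')), "\n";
```

Because the function form is evaluated on every call, it can return a different list for each file, which is what a CVS tree needs.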
Index: lxr.conf.template
===================================================================
RCS file: /cvsroot/lxr/lxr/lxr.conf.template,v
retrieving revision 1.5
diff -u -r1.5 lxr.conf.template
--- lxr.conf.template	1999/06/16 09:17:48	1.5
+++ lxr.conf.template	2001/07/24 11:16:34
@@ -22,6 +22,15 @@
         # Define typed variable "v", read valueset from file.
         'v' => {'name' => 'Version',
             'range' => [ readfile('src/cvsversions') ],
+
+            # If files within a tree can have different versions,
+            # e.g in a CVS tree, 'range' can be specified as a
+            # function to call for each file:
+            #'range' => sub { return
+            #    ($files->allreleases($LXR::Common::pathname),
+            #     $files->allrevisions($LXR::Common::pathname))
+            #    }, # deferred function call.
+
             'default' => '1.0.6'},
 
         # Define typed variable "a". First value is default.

===================================================================
And this is what implements this new way of specifying the range.

Index: lib/LXR/Config.pm
===================================================================
RCS file: /cvsroot/lxr/lxr/lib/LXR/Config.pm,v
retrieving revision 1.23
diff -u -r1.23 Config.pm
--- lib/LXR/Config.pm	2000/09/04 19:26:28	1.23
+++ lib/LXR/Config.pm	2001/07/24 11:16:36
@@ -117,6 +117,10 @@
 sub varrange {
     my ($self, $var) = @_;
 
+    if (ref($self->{variables}{$var}{range}) eq "CODE") {
+        return &{$self->{variables}{$var}{range}};
+    }
+
     return @{$self->{variables}{$var}{range} || []};
 }
 

===================================================================
The shebang parsing didn't work previously. Now it does.

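The underlying Perl behaviour can be seen in isolation: a regex match assigned to a plain scalar yields only a true/false value, while a match in list context (or $1 afterwards) yields the captured text. This is a standalone sketch with a made-up input string, not LXR code.

```perl
use strict;
use warnings;

my $text = "#!/usr/bin/perl -w\nprint 1;\n";

# Scalar context: the match only reports success (1) or failure ('').
my $matched = $text =~ /^#!\s*(\S+)/s;

# List context: the match returns the captured groups.
my ($shebang) = $text =~ /^#!\s*(\S+)/s;

print "scalar: $matched\n";    # scalar: 1
print "list:   $shebang\n";    # list:   /usr/bin/perl
```

With the scalar form, a later test such as $shebang =~ /perl/ is run against "1" and can never match.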
Index: lib/LXR/Lang.pm
===================================================================
RCS file: /cvsroot/lxr/lxr/lib/LXR/Lang.pm,v
retrieving revision 1.19
diff -u -r1.19 Lang.pm
--- lib/LXR/Lang.pm	1999/12/25 21:58:27	1.19
+++ lib/LXR/Lang.pm	2001/07/24 11:16:38
@@ -30,9 +30,9 @@
         $lang = new LXR::Lang::Perl($pathname, $release);
     }
     else {
-        my ($shebang);
+        $files->getfile($pathname, $release) =~ /^#!\s*(\S+)/s;
 
-        $shebang = $files->getfile($pathname, $release) =~ /^#!\s*(\S+)/s;
+        my $shebang = $1;
 
         if ($shebang =~ /perl/) {
             require LXR::Lang::Perl;

===================================================================
Quite a lot of changes here. Some insignificant bugfixes for undefined values and stuff, and then the big change of having parsecvs only parse the header of the cvs file, because it will get confused by binary content in the controlled file. All places that depended on the content of the file being available in the %cvs hash have been changed to use the proper RCS/CVS commands instead (unfortunately not in a safe way - yet).

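One possible direction for the safety fix, given here only as a sketch and not as what this patch does: the list form of a pipe open passes each argument straight to the program without a shell, so metacharacters in a filename stay inert. It assumes a Unix-like system and a Perl that supports list-form pipe opens (5.8 or later). The read_command helper is hypothetical, and the running perl ($^X) stands in for co or rcsdiff so the example works without RCS installed.

```perl
use strict;
use warnings;

# Run a command without going through the shell and return its output lines.
# With more than one element in @cmd, Perl executes the program directly.
sub read_command {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my @lines = <$fh>;
    close($fh);
    return @lines;
}

# A hostile "filename" containing shell metacharacters.
my $filename = 'foo.c,v; echo pwned';

# The child simply echoes its first argument back.
my @out = read_command($^X, '-e', 'print "$ARGV[0]\n"', $filename);
print @out;
```

Applied to the code below, the call would look something like read_command('co', '-q', "-p$rev", $path), with the revision and path passed as separate arguments rather than concatenated into one command string.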
Index: lib/LXR/Files/CVS.pm
===================================================================
RCS file: /cvsroot/lxr/lxr/lib/LXR/Files/CVS.pm,v
retrieving revision 1.11
diff -u -r1.11 CVS.pm
--- lib/LXR/Files/CVS.pm	1999/11/24 14:48:53	1.11
+++ lib/LXR/Files/CVS.pm	2001/07/24 11:16:38
@@ -29,6 +29,9 @@
     if ($release =~ /rev_([\d\.]+)/) {
         return $1;
     }
+    elsif ($release =~ /([\d\.]+)/) {
+        return $1;
+    }
     else {
         $self->parsecvs($filename);
         return $cvs{'header'}{'symbols'}{$release};
@@ -47,8 +50,10 @@
     return undef unless defined($rev);
 
     my @t = reverse(split(/\./, $cvs{'branch'}{$rev}{'date'}));
-    $t[4]--;
+    return undef unless @t;
+
+    $t[4]--;
 
     return timegm(@t);
 }
 
@@ -60,38 +65,10 @@
 
 sub getfile {
     my ($self, $filename, $release) = @_;
-
-    $self->parsecvs($filename);
-
-    my $rev = $self->filerev($filename, $release);
-    return undef unless defined($rev);
-
-    my $hrev = $cvs{'header'}{'head'};
-    my @head = $cvs{'history'}{$hrev}{'text'} =~ /([^\n]*\n)/gs;
-
-    while ($hrev ne $rev && $cvs{'branch'}{$hrev}{'branches'} ne $rev) {
-        $hrev = $cvs{'branch'}{$hrev}{'next'};
-        my @diff = $cvs{'history'}{$hrev}{'text'} =~ /([^\n]*\n)/gs;
-        my $off = 0;
-
-        while (@diff) {
-            my $dir = shift(@diff);
-            if ($dir =~ /^a(\d+)\s+(\d+)/) {
-                splice(@head, $1-$off, 0, splice(@diff, 0, $2));
-                $off -= $2;
-            }
-            elsif ($dir =~ /^d(\d+)\s+(\d+)/) {
-                splice(@head, $1-$off-1, $2);
-                $off += $2;
-            }
-            else {
-                warning("Oops! Out of sync!");
-            }
-        }
-    }
-
-    return join('', @head);
+    my $fileh = $self->getfilehandle($filename, $release);
+    return undef unless $fileh;
+    return join('', $fileh->getlines);
 }
 
 sub getannotations {
@@ -105,22 +82,23 @@
     my $hrev = $cvs{'header'}{'head'};
     my $lrev;
     my @anno;
-    my @head = $cvs{'history'}{$hrev}{'text'} =~ /\n()/gs;
+    my $headfh = $self->getfilehandle($filename, $release);
+    my @head = $headfh->getlines;
 
     while (1) {
         if ($rev eq $hrev) {
             @head = 0..$#head;
         }
-
+
         $lrev = $hrev;
         $hrev = $cvs{'branch'}{$hrev}{'next'} || last;
-
-        my @diff = $cvs{'history'}{$hrev}{'text'} =~ /([^\n]*\n)/gs;
-        my $off = 0;
-
+
+        my @diff = $self->getdiff($filename, $lrev, $hrev);
+        my $off = 0;
+
         while (@diff) {
             my $dir = shift(@diff);
-
+
             if ($dir =~ /^a(\d+)\s+(\d+)/) {
                 splice(@diff, 0, $2);
                 splice(@head, $1-$off, 0, ('') x $2);
@@ -130,7 +108,7 @@
                 map { $anno[$_] = $lrev if $_ ne ''; }
                 splice(@head, $1-$off-1, $2);
-
+
                 $off += $2;
             }
             else {
@@ -159,22 +137,33 @@
     my ($self, $filename, $release) = @_;
     my ($fileh);
 
-#    $fileh = new FileHandle("co -q -pv$release ".
-#                $self->toreal($filename, $release).
-#                " |"); # FIXME: Exploitable?
-
-    my $buffer = $self->getfile($filename, $release);
-
-    &LXR::Common::fflush;
-    my ($readh, $writeh) = FileHandle::pipe;
-    unless (fork) {
-        $writeh->autoflush(1);
-        $writeh->print($buffer);
-        exec("/bin/true"); # Exit without cleanup.
-        exit;
-    }
+    $self->parsecvs($filename);
+
+    my $rev = $self->filerev($filename, $release);
+    return undef unless defined($rev);
+
+    $fileh = new FileHandle("co -q -p$rev ".
+                $self->toreal($filename, $release).
+                " |"); # FIXME: Exploitable?
+    return $fileh;
+}
+
+sub getdiff {
+    my ($self, $filename, $release1, $release2) = @_;
+    my ($fileh);
+
+    $self->parsecvs($filename);
+
+    my $rev1 = $self->filerev($filename, $release1);
+    return undef unless defined($rev1);
+
+    my $rev2 = $self->filerev($filename, $release2);
+    return undef unless defined($rev2);
 
-    return $readh;
+    $fileh = new FileHandle("rcsdiff -q -a -n -r$rev1 -r$rev2 ".
+                $self->toreal($filename, $release1).
+                " |"); # FIXME: Exploitable?
+    return $fileh->getlines;
 }
 
 sub tmpfile {
@@ -297,15 +286,35 @@
     return sort(keys(%{$cvs{'header'}{'symbols'}}));
 }
 
+sub allrevisions {
+    my ($self, $filename) = @_;
+
+    $self->parsecvs($filename);
+
+    return sort(keys(%{$cvs{'branch'}}));
+}
+
 sub parsecvs {
+    # Actually, these days it just parses the header.
+    # RCS tools are much better at parsing RCS files.
+    # -pok
     my ($self, $filename) = @_;
 
     return if $cache_filename eq $filename;
     $cache_filename = $filename;
+
+    my $file = '';
+    open (CVS, $self->toreal($filename, undef));
+    while (<CVS>) {
+        if (/^text\s*$/) {
+            # stop reading when we hit the text.
+            last;
+        }
+        $file .= $_;
+    }
+    close (CVS);
 
-    open(CVS, $self->toreal($filename, undef));
-    my @cvs = join('', <CVS>) =~ /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs;
-    close(CVS);
+    my @cvs = $file =~ /((?:(?:[^\n@]+|@[^@]*@)\n?)+)/gs;
 
     $cvs{'header'} = { map { s/@@/@/gs;
                  /^@/s && substr($_, 1, -1) || $_ }
@@ -313,7 +322,7 @@
 
     $cvs{'header'}{'symbols'}
     = { $cvs{'header'}{'symbols'} =~ /(\S+?):(\S+)/g };
-
+
     my ($orel, $nrel, $rev);
     while (($orel, $rev) = each %{$cvs{'header'}{'symbols'}}) {
         $nrel = $config->cvsversion($orel);
@@ -337,13 +346,6 @@
     $cvs{'desc'} = shift(@cvs) =~ /\s*desc\s+((?:[^\n@]+|@[^@]*@)*)\n/s;
     $cvs{'desc'} =~ s/^@|@($|@)/$1/gs;
 
-    while (@cvs) {
-        my ($r, $v) = shift(@cvs) =~ /\s*(\S+)\s*(.*)/s;
-        $cvs{'history'}{$r} = { map { s/@@/@/gs;
-                      /^@/s && substr($_, 1, -1) || $_ }
-                    $v =~ /(\w+)\s*((?:[^\n@]+|@[^@]*@)*)\n/gs };
-    }
 }
-
 1;

===================================================================
Also, some small changes to the Postgres interface. I dislike hardcoded numbers that occur in more than one place, so I put the number of inserts before a commit into a variable. I also fixed a bug resulting from a wrong assumption about the return value of an execute on a select statement.

Index: lib/LXR/Index/Postgres.pm
===================================================================
RCS file: /cvsroot/lxr/lxr/lib/LXR/Index/Postgres.pm,v
retrieving revision 1.4
diff -u -r1.4 Postgres.pm
--- lib/LXR/Index/Postgres.pm	2000/10/31 12:52:12	1.4
+++ lib/LXR/Index/Postgres.pm	2001/07/24 11:16:41
@@ -9,7 +9,7 @@
 use strict;
 use DBI;
 
-use vars qw($dbh $transactions %files %symcache
+use vars qw($dbh $transactions %files %symcache $commitlimit
         $files_select $filenum_nextval $files_insert
         $symbols_byname $symbols_byid $symnum_nextval
         $symbols_remove $symbols_insert $indexes_select $indexes_insert
@@ -26,7 +26,8 @@
     $$dbh{'AutoCommit'} = 0;
 
 #    $dbh->trace(1);
-
+
+    $commitlimit = 100;
     $transactions = 0;
     %files = ();
     %symcache = ();
@@ -96,7 +97,7 @@
                  $line,
                  $type,
                  $relsym ? $self->symid($relsym) : undef);
-    unless (++$transactions % 500) {
+    unless (++$transactions % $commitlimit) {
         $dbh->commit();
     }
 }
@@ -108,7 +109,7 @@
                  $line,
                  $self->symid($symname));
 
-    unless (++$transactions % 500) {
+    unless (++$transactions % $commitlimit) {
         $dbh->commit();
     }
 }
@@ -178,12 +179,16 @@
 # Indicate that this filerevision is part of this release
 sub release {
     my ($self, $fileid, $release) = @_;
+
+
+    $releases_select->execute($fileid+0, $release);
+    my $firstrow = $releases_select->fetchrow_array();
+
+
-    my $rows = $releases_select->execute($fileid+0, $release);
-    $releases_select->finish();
+#    $releases_select->finish();
 
-    unless ($rows > 0) {
-        $releases_insert->execute($fileid, $release);
+    unless ($firstrow) {
+        $releases_insert->execute($fileid+0, $release);
     }
 }
 
@@ -239,7 +244,7 @@
     my ($self, $fileid) = @_;
 
     $status_insert->execute($fileid+0, $fileid+0);
-    return $status_update->execute(1, $fileid, 0) > 0;
+    return $status_update->execute(1, $fileid+0, 0) > 0;
 }
 
 sub toreference {

...Peder...
-- 
Cogito ergo panta rei.