From: Peter T. <Peter@StructuredWikis.com> - 2007-01-21 23:45:37
Crawford,

Thanks for making search faster.

> + } elsif (exists $ENV{MOD_PERL}) {
> + # Use pure-perl grep if MOD_PERL, as the fork() used by TWiki::Sandbox
> + # is horribly inefficient with mod_perl

How does pure-Perl grep compare to external grep? Could you share some numbers?

Also, rather than hardcoding pure-Perl grep, it is more flexible to define the type of search in configure. I know that you do not like additional settings in configure; however, I believe that it is actually good to have many configuration options as long as the defaults are good.

Regards,
Peter

de...@de... wrote:
> Author: CrawfordCurrie
> Date: 2007-01-21 11:10:17 -0600 (Sun, 21 Jan 2007)
> New Revision: 12590
>
> Added:
>    twiki/branches/MAIN/test/bin/make_big.pl
>    twiki/branches/MAIN/tools/native_search/
>    twiki/branches/MAIN/tools/native_search/Makefile.PL
>    twiki/branches/MAIN/tools/native_search/NativeTWikiSearch.pm
>    twiki/branches/MAIN/tools/native_search/NativeTWikiSearch.xs
> Modified:
>    twiki/branches/MAIN/lib/TWiki/Store/RcsFile.pm
> Log:
> Item3443: native grep linked via XS. For some reason TWiki works fine
> with it (and is nearly twice as fast as forked grep for searching, even
> outside mod_perl) but the unit tests fail; I haven't fathomed why. To
> build the native search, cd tools/native_search;perl Makefile.PL;make
> install (all as root). Help in packaging this correctly so it can be
> installed by a non-root would be most welcome. There is also a script to
> generate large webs for testing.
>
> Modified: twiki/branches/MAIN/lib/TWiki/Store/RcsFile.pm
> ===================================================================
> --- twiki/branches/MAIN/lib/TWiki/Store/RcsFile.pm 2007-01-21 02:22:26 UTC (rev 12589)
> +++ twiki/branches/MAIN/lib/TWiki/Store/RcsFile.pm 2007-01-21 17:10:17 UTC (rev 12590)
> @@ -328,43 +328,80 @@
>      my( $this, $searchString, $topics, $options ) = @_;
>      ASSERT(defined $options) if DEBUG;
>      my $type = $options->{type} || '';
> -
> -    # I18N: 'grep' must use locales if needed,
> -    # for case-insensitive searching. See TWiki::setupLocale.
> -    my $program = '';
> -    # FIXME: For Cygwin grep, do something about -E and -F switches
> -    # - best to strip off any switches after first space in
> -    # EgrepCmd etc and apply those as argument 1.
> -    if( $type eq 'regex' ) {
> -        $program = $TWiki::cfg{RCS}{EgrepCmd};
> -    } else {
> -        $program = $TWiki::cfg{RCS}{FgrepCmd};
> -    }
> -
> -    $program =~ s/%CS{(.*?)\|(.*?)}%/$options->{casesensitive}?$1:$2/ge;
> -    $program =~ s/%DET{(.*?)\|(.*?)}%/$options->{files_without_match}?$2:$1/ge;
> -
>      my $sDir = $TWiki::cfg{DataDir}.'/'.$this->{web}.'/';
> -    my $seen = {};
> -    # process topics in sets, fix for Codev.ArgumentListIsTooLongForSearch
> -    my $maxTopicsInSet = 512; # max number of topics for a grep call
> -    my @take = @$topics;
> -    my @set = splice( @take, 0, $maxTopicsInSet );
> -    my $sandbox = $this->{session}->{sandbox};
> -    while( @set ) {
> -        @set = map { "$sDir/$_.txt" } @set;
> -        my ($matches, $exit ) = $sandbox->sysCommand(
> -            $program,
> -            TOKEN => $searchString,
> -            FILES => \@set);
> -        foreach my $match ( split( /\r?\n/, $matches )) {
> -            if( $match =~ m/([^\/]*)\.txt(:(.*))?$/ ) {
> -                push( @{$seen->{$1}}, $3 );
> +    my $matches = '';
> +    my %seen;
> +    # Use the WikiRing native search if it is available, it is faster
> +    # than forking grep.
> +    eval 'use NativeTWikiSearch qw(cgrep)';
> +    unless ($@) {
> +        my @fs;
> +        push(@fs, "-i") unless $options->{casesensitive};
> +        push(@fs, "-l") if $options->{files_without_match};
> +        push(@fs, $searchString);
> +        push(@fs, map { "$sDir/$_.txt" } @$topics);
> +        my $matches = NativeTWikiSearch::cgrep(\@fs);
> +        if (defined($matches)) {
> +            for (@$matches) {
> +                if (/([^\/]*)\.txt(:(.*))?$/) {
> +                    push( @{$seen{$1}}, $3 );
> +                }
>              }
>          }
> -        @set = splice( @take, 0, $maxTopicsInSet );
> +    } elsif (exists $ENV{MOD_PERL}) {
> +        # Use pure-perl grep if MOD_PERL, as the fork() used by TWiki::Sandbox
> +        # is horribly inefficient with mod_perl
> +        local $/ = "\n";
> +        if ($type eq 'regex') {
> +            $searchString =~ s!/!\\/!g;
> +        } else {
> +            $searchString =~ s/(\W)/\\$1/g;
> +        }
> +        my $match_code = "/$searchString/o";
> +        $match_code .= 'i' unless ($options->{casesensitive});
> +        my $doMatch = eval "sub { $match_code }";
> +        FILE:
> +        foreach my $file ( @$topics ) {
> +            next unless open(FILE, "$sDir/$file.txt");
> +            while (<FILE>) {
> +                if (&$doMatch()) {
> +                    push( @{$seen{$file}}, $_ );
> +                    next FILE if $options->{files_without_match};
> +                }
> +            }
> +        }
> +    } else {
> +        # I18N: 'grep' must use locales if needed,
> +        # for case-insensitive searching. See TWiki::setupLocale.
> +        my $program = '';
> +        # FIXME: For Cygwin grep, do something about -E and -F switches
> +        # - best to strip off any switches after first space in
> +        # EgrepCmd etc and apply those as argument 1.
> +        if( $type eq 'regex' ) {
> +            $program = $TWiki::cfg{RCS}{EgrepCmd};
> +        } else {
> +            $program = $TWiki::cfg{RCS}{FgrepCmd};
> +        }
> +
> +        $program =~ s/%CS{(.*?)\|(.*?)}%/$options->{casesensitive}?$1:$2/ge;
> +        $program =~ s/%DET{(.*?)\|(.*?)}%/$options->{files_without_match}?$2:$1/ge;
> +        # process topics in sets, fix for Codev.ArgumentListIsTooLongForSearch
> +        my $maxTopicsInSet = 512; # max number of topics for a grep call
> +        my @take = @$topics;
> +        my @set = splice( @take, 0, $maxTopicsInSet );
> +        my $sandbox = $this->{session}->{sandbox};
> +        while( @set ) {
> +            @set = map { "$sDir/$_.txt" } @set;
> +            my ($m, $exit ) = $sandbox->sysCommand(
> +                $program,
> +                TOKEN => $searchString,
> +                FILES => \@set);
> +            $matches .= $m;
> +            @set = splice( @take, 0, $maxTopicsInSet );
> +        }
> +        $matches =~ s/([^\/]*)\.txt(:(.*))?$/push( @{$seen{$1}}, $3 ); ''/gem;
>      }
> -    return $seen;
> +    return \%seen;
>  }
>
>  =pod

--
* Peter Thoeny Peter@StructuredWikis.com
* http://StructuredWikis.com - bringing wikis to the workplace
* http://TWiki.org - is your team already TWiki enabled?
* Knowledge cannot be managed, it can be discovered and shared
* This e-mail is: (_) private (x) ask first (_) public
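
For illustration, here is a minimal sketch of what a configure-driven choice of search backend might look like. It is not code from the commit. The `%cfg` hash stands in for `%TWiki::cfg`, the `{RCS}{SearchType}` key and the names `choose_search_backend` and `perl_grep` are hypothetical, and the pure-Perl matcher uses a precompiled `qr//` pattern with `quotemeta` instead of the string `eval` in the patch. The `'auto'` default follows the order used in the patch: native search if the module loads, pure Perl under mod_perl, forked grep otherwise.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %TWiki::cfg; {SearchType} is an assumed key,
# not an existing TWiki setting. Allowed values in this sketch:
# 'auto', 'native', 'perl', 'fork'.
my %cfg = ( RCS => { SearchType => 'auto' } );

# Pick a backend name. An explicit setting wins; 'auto' mirrors the
# order of the branches in the patch.
sub choose_search_backend {
    my ( $type, $env ) = @_;
    return $type if $type ne 'auto';
    return 'native' if eval { require NativeTWikiSearch; 1 };
    return 'perl'   if exists $env->{MOD_PERL};
    return 'fork';
}

# Pure-Perl grep over a list of file paths.
# Returns a hash ref: { path => [ matching lines ] }.
sub perl_grep {
    my ( $searchString, $files, $options ) = @_;

    # Literal searches are escaped; regex searches are used as given.
    my $isRegex = ( $options->{type} || '' ) eq 'regex';
    my $src = $isRegex ? $searchString : quotemeta($searchString);

    # Compile once, outside the file loop.
    my $re = $options->{casesensitive} ? qr/$src/ : qr/$src/i;

    my %seen;
    foreach my $file (@$files) {
        open( my $fh, '<', $file ) or next;
        while ( my $line = <$fh> ) {
            next unless $line =~ $re;
            push( @{ $seen{$file} }, $line );
            # One hit per file is enough when only file names are wanted.
            last if $options->{files_without_match};
        }
        close($fh);
    }
    return \%seen;
}
```

A caller could then dispatch on `choose_search_backend( $cfg{RCS}{SearchType}, \%ENV )` and keep the existing forked-grep code as the `'fork'` branch, so the default behaviour stays the same unless an administrator changes the setting.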