Thread: [Lxr-dev] Solution for "Don't require glimpse for file search"
From: Shree K. <sh...@pr...> - 2002-02-12 06:27:09
This is in response to the feature request "Don't require glimpse for file search". As the request points out, there is no need to use glimpse, since all the filenames are already present in the database. The idea is to get the list of all files in a particular release and filter that list using the regular expression specified by the user.

First, we need a "getallfiles" method in MySQL.pm.
-------------------------------------------------------------------------------
Add the following to the "new" method:

$self->{all_files_select} = $self->{dbh}->prepare
    ("select distinct f.filename ".
     "from symbols s, indexes i, files f, releases r ".
     "where s.symid = i.symid and i.fileid = f.fileid ".
     "and r.release = ?");

And add the new function:

sub getallfiles {
    my ($self,$release) = @_;
    my ($rows, @ret);

    $rows = $self->{all_files_select}->execute($release);
    while ($rows-- > 0) {
        push(@ret, $self->{all_files_select}->fetchrow_array);
    }
    $self->{all_files_select}->finish();
    return @ret;
}
-------------------------------------------------------------------------------
Now apply the following patch to "find":
-------------------------------------------------------------------------------
< unless (open(FILELLISTING,$config->glimpsedir."/.glimpse_filenames")) {
<     &warning("Could not open .glimpse_filenames.");
<     return;
< }
< print("<hr>\n");
< $sourceroot = $config->sourceroot;
< while($file = <FILELLISTING>) {
<     $file =~ s/^$sourceroot//;
<     if($file =~ /$searchtext/) {
<         print(&fileref("$file", "find-file", "/$file"),"<br>\n");
<     }
< }
---
> # Modification to do file search without glimpse
> my @files = $index->getallfiles($release);
> my $file;
> my $refs='';
> my $count=0;
> while ($file = shift(@files)) {
>     if($file =~ /$searchtext/i) {
>         $refs .= &fileref("$file", "find-file", "/$file")."<br>\n";
>         $count++;
>     }
> }
> print "<hr>\n";
> if ($count) {
>     print "Found $count files/directories matching <b>$searchtext</b><br>\n<hr>\n";
>     print $refs;
> } else {
>     print "No files/directories match <b>$searchtext</b><br>\n";
> }
--------------------------------------------------------------------------------
I use Perl rather than the database for regular-expression matching because not every database supports it, and even where it is supported, would the regular expression syntax be consistent across databases?
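A minimal sketch of the filtering step, assuming (as in the patch above) that the user's search text is used directly as a case-insensitive Perl regex. The helper name match_filenames is hypothetical. It compiles the pattern once with qr//i inside an eval, so a malformed pattern such as "foo(" yields undef instead of aborting the script:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the filenames matching a user-supplied
# pattern, case-insensitively, like the patched "find" script.
# Returns an array reference, or undef if the pattern does not compile.
sub match_filenames {
    my ($searchtext, @files) = @_;
    # Compile the pattern once; eval traps syntax errors such as "foo(".
    my $re = eval { qr/$searchtext/i };
    return unless defined $re;
    return [ grep { $_ =~ $re } @files ];
}

my $hits = match_filenames('util', 'src/Util.c', 'include/util.h', 'README');
print scalar(@$hits), " files match\n" if $hits;    # prints "2 files match"
```

The eval guard is the main point here: the pattern comes straight from the search form, so a user typo should produce a "no match" page rather than a script error.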
From: Malcolm B. <ma...@br...> - 2002-02-12 16:58:37
Hi Shree,

> The idea is to get the list of all files in a particular release and
> filter the list using the regular expression specified by the user.

This looks like a neat addition. I'm a little worried by memory usage - there are lxr installs with tens of thousands of files in them per release - have you tried to measure the impact of reading substantial numbers of filenames into memory?

From the pov of accepting this patch, it would be much easier for me if you sent the output of either diff -du or cvs diff -du. Then applying the patch is automatic rather than requiring manual editing.

Cheers,
Malcolm
From: Shree K. <sh...@pr...> - 2002-02-13 05:37:39
Hi Malcolm,

> This looks like a neat addition. I'm a little worried by memory usage
> - there are lxr installs with tens of thousands of files in them per
> release - have you tried to measure the impact of reading substantial
> numbers of filenames into memory?

No, I haven't measured the memory impact. The easiest solution for this would be to filter by the regexp while executing the query itself, and only add the files that match the regexp to the return list. This must drastically reduce the memory usage, as unneeded files will not be retained in memory.

> From the pov of accepting this patch, it would be much easier for me if
> you sent the output of either diff -du or cvs diff -du. Then applying
> the patch is automatic rather than requiring manual editing.

Yes, that's correct. I had actually hand-edited the diff output before sending it. If the above change is OK, I can post a patch immediately.

Cheers,
Shree Kumar
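A sketch of the "filter while fetching" idea, assuming a cursor that returns one filename per call and an empty list when the rows run out, which is how DBI's fetchrow_array behaves. The names collect_matching and $cursor are hypothetical, and an in-memory array stands in for the query result:

```perl
use strict;
use warnings;

# Hypothetical sketch: filter rows as they are fetched, so only matching
# filenames are kept in the Perl list. $next_row stands in for a database
# cursor: each call returns the next filename, or an empty list at the end.
sub collect_matching {
    my ($next_row, $re) = @_;
    my @ret;
    # The list assignment is true while a row comes back, even for "0".
    while (my ($filename) = $next_row->()) {
        push @ret, $filename if $filename =~ $re;
    }
    return @ret;
}

# Simulated result set standing in for the rows of one release.
my @rows = ('src/main.c', 'src/util.c', 'include/util.h', 'Makefile');
my $cursor = sub { return @rows ? (shift @rows) : () };

my @found = collect_matching($cursor, qr/util/i);
print "@found\n";    # prints "src/util.c include/util.h"
```

Only the matches end up in @ret. The database client library may still buffer the whole result set on its own side, so this bounds the Perl-side list rather than every copy of the rows. The list-assignment loop condition also keeps going for a filename such as "0", which `while ($file = shift(@files))` would treat as false.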
From: Malcolm B. <ma...@br...> - 2002-02-18 14:24:03
Hi Shree,

Thanks for the patch.

Shree Kumar wrote:
>No, I haven't measured the memory impact. The easiest solution for this
>would be to filter by the regexp while executing the query itself, and
>only add the files that match the regexp to the return list. This must
>drastically reduce the memory usage as unneeded files will not be
>retained in memory.
>

I see that's what your new patch does - it seems good to me. Unfortunately I've suffered a major computer failure, so I'm unable to commit patches at the moment. I hope to get things back together sometime this weekend and then be able to get your patch in.

Malcolm
From: Shree K. <sh...@pr...> - 2002-02-25 13:05:46
Hi Malcolm,

> I see that's what your new patch does - it seems good to me.
> Unfortunately I've suffered a major computer failure, so I'm unable to
> commit patches at the moment. I hope to get things back together
> sometime this weekend and then be able to get your patch in.

I have already run into a major problem with the patch I sent. The problem is the query that gets all files of a particular release:

$self->{all_files_select} = $self->{dbh}->prepare
    ("select distinct f.filename ".
     "from symbols s, indexes i, files f, releases r ".
     "where s.symid = i.symid and i.fileid = f.fileid ".
     "and r.release = ?");

I had only tested the patch on a small repository; it's with a big repository that the problem creeps in. In a big repository the join is very costly because of the huge number of entries in the "indexes" table, and MySQL makes my computer crawl (Win2k almost hangs, with mysql taking 99% of CPU time).

The query has to be changed to:

$self->{all_files_select} = $self->{dbh}->prepare
    ("select distinct f.filename ".
     "from files f, releases r ".
     "where f.fileid=r.fileid and r.release = ?");

(The old query also never joins releases to files, so it does not actually restrict the filenames to the requested release; the f.fileid=r.fileid condition fixes that as well.)

With this change file search works properly, but there is one remaining problem: if a file is not identified as belonging to a "language", it is not added to the filename list (refer to processfile() in Tagger.pm). The solution is to move the line "return unless $lang;" so that it comes after the line "$index->release($fileid,$release)" in processfile().

- Shree Kumar
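A simplified, hypothetical model of the reordering proposed above for processfile(). It does not reproduce Tagger.pm: %released stands in for the releases table, and language detection is a plain file-extension lookup. Because the file is recorded in the release before the language check, files with no recognised language still show up in the file list:

```perl
use strict;
use warnings;

# Hypothetical, simplified model of the ordering issue in processfile().
# %released stands in for the releases table; language detection is a
# plain extension lookup. Neither reflects Tagger.pm's real code.
my %lang_for_ext = (c => 'C', pm => 'Perl');
my %released;    # filename => release

sub processfile {
    my ($filename, $release) = @_;
    # Record the file in the release first, so every file gets listed...
    $released{$filename} = $release;
    # ...and only then skip files with no recognised language.
    my ($ext) = $filename =~ /\.(\w+)$/;
    my $lang = defined $ext ? $lang_for_ext{$ext} : undef;
    return unless $lang;
    # (symbol indexing for $lang would happen here)
    return $lang;
}

processfile($_, '0.9.1') for ('src/main.c', 'README', 'doc/notes.txt');
print join(', ', sort keys %released), "\n";
# prints "README, doc/notes.txt, src/main.c"
```

With the check placed before the release step instead, README and doc/notes.txt would never reach %released, which matches the symptom described above.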
From: Malcolm B. <ma...@br...> - 2002-02-26 03:26:00
Hi Shree,

Please can you post an updated patch, either to the lxr-developer list or on sourceforge?

Cheers,
Malcolm
From: Shree K. <sh...@pr...> - 2002-02-26 06:35:46
Attachments:
db_based_find_patches.tar.gz
Hi Malcolm,

> Please can you post an updated patch, either to the lxr-developer list
> or on sourceforge?

The patch for the three files (find, Mysql.pm and Tagger.pm, against lxr-0.9.1) is attached.

Cheers,
Shree