[Lxr-dev] [ lxr-Bugs-518365 ] Indexing of files once indexed is buggy!
From: <no...@so...> - 2002-06-07 13:10:35
Bugs item #518365, was opened at 2002-02-16 10:04
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=390117&aid=518365&group_id=27350

Category: genxref
Group: current cvs
Status: Open
Resolution: None
Priority: 7
Submitted By: Shree Kumar (shreekumar)
Assigned to: Malcolm Box (mbox)
Summary: Indexing of files once indexed is buggy!

Initial Comment:
I am using LXR-0.9.1.

Consider this scenario: there is a source tree "test" containing only one file, test.c:

test.c
-------
#define TEST 100

Now I run genxref. When I search for TEST in identifiers, I get that it is a macro defined in test.c at line 1.

Now I change test.c to:

test.c
-------
#define T 1
#define TEST 100

and run genxref again. What I get now is that TEST is defined as a macro in test.c at line 1 and at line 2!

The culprit is this piece of code in function processfile() [Tagger.pm]:

------
if ($index->toindex($fileid)) {
    $index->empty_cache();
    print(STDERR "--- $pathname $fileid\n");
    my $path = $files->tmpfile($pathname, $release);
    $lang->indexfile($pathname, $path, $fileid, $index, $config);
    unlink($path);
} else {
    print(STDERR "$pathname was already indexed\n");
}
------

The problem is that the file may already exist and have changed since then [based on the timestamp]. In that case, the identifiers that the previous run of genxref added to the database for this file are not removed, so the number of definitions keeps growing...

The same problem is also present in processrefs().

----------------------------------------------------------------------

Comment By: Gregor Hartmann (grex)
Date: 2002-06-07 13:10

Message:
Logged In: YES
user_id=559509

Another similar problem would be files or whole directories that are deleted from the source tree. They would also stay in the database forever.

Maybe it could be fixed in two steps. First, iterate through all files in the database and remove those that have changed or were removed from the source tree. Then proceed with indexing as before.
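As a rough illustration, here is a minimal Perl sketch of two ideas from the thread. The first is the fix the thread converges on: delete whatever a previous run stored for a (file, release) pair before indexing it again. The second is the suggestion above to drop files that no longer exist in the tree.

The sketch makes several assumptions. It uses an in-memory hash in place of the database. The names `index_file`, `purge_file`, `prune_deleted` and `lookup` are hypothetical, not LXR functions. The toy scanner only recognizes `#define` lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the index database (NOT LXR's
# real API). Definitions are grouped per (file, release) pair.
my %defs;    # "$file\0$release" => [ [ident, type, line], ... ]

sub key_for { my ($file, $release) = @_; return "$file\0$release"; }

# Drop everything a previous genxref run recorded for this file.
sub purge_file {
    my ($file, $release) = @_;
    delete $defs{ key_for($file, $release) };
}

# Toy "#define NAME" scanner standing in for $lang->indexfile().
sub index_file {
    my ($file, $release, $text) = @_;
    purge_file($file, $release);    # the step missing from processfile()
    my $line = 0;
    for my $src (split /\n/, $text) {
        $line++;
        next unless $src =~ /^\s*#\s*define\s+(\w+)/;
        # Rows are appended, like INSERTs, so without the purge above
        # the definitions from the previous run would pile up.
        push @{ $defs{ key_for($file, $release) } }, [ $1, 'macro', $line ];
    }
}

# Suggested above: forget files that have left the source tree.
sub prune_deleted {
    my ($release, @present) = @_;
    my %present;
    $present{$_} = 1 for @present;
    for my $k (keys %defs) {
        my ($file, $rel) = split /\0/, $k;
        delete $defs{$k} if $rel eq $release && !$present{$file};
    }
}

# Return "file:line" for every definition of $ident in $release.
sub lookup {
    my ($ident, $release) = @_;
    my @hits;
    for my $k (sort keys %defs) {
        my ($file, $rel) = split /\0/, $k;
        next unless $rel eq $release;
        push @hits, "${file}:$_->[2]" for grep { $_->[0] eq $ident } @{ $defs{$k} };
    }
    return @hits;
}

# The scenario from the bug report: index test.c, edit it, index again.
index_file('test.c', 'head', "#define TEST 100\n");
index_file('test.c', 'head', "#define T 1\n#define TEST 100\n");
print join(', ', lookup('TEST', 'head')), "\n";    # prints "test.c:2"
```

Remove the `purge_file` call and the same scenario reports TEST at both line 1 and line 2, exactly as in the bug report. In genxref itself, the purge would presumably become a delete of that file's rows in the index tables. It would run before `indexfile()` whenever a changed file is reindexed.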
----------------------------------------------------------------------

Comment By: Shree Kumar (shreekumar)
Date: 2002-02-19 07:21

Message:
Logged In: YES
user_id=142912

Here's my fix for this bug:

Add a field "timestamp" to the "status" table, and remove the "status" field.

Before finding identifiers in a file, check whether its modification time is greater than it was previously. If so, remove from the database all the identifier definitions due to this file [and release], and store the new timestamp in the database.

Before finding references in a file, remove from the database all identifier references due to this file [and release]. [There is no need to check the timestamp in this case, since the "definitions" are always found before the references.]

In a large CVS tree, a file may well change between the time it is "indexed" and the time it is "referenced". An easy way out of this seems to be to "index" a file and immediately "reference" it.

Related to this, there is a problem in "Plain.pm": the current "filerev" function returns a value based on the timestamp. This causes trouble if a file changes between runs of genxref. "filerev" then returns different values even though the same (file, revision) pair is being indexed [or referenced]. I have changed filerev() for this purpose as follows:

sub filerev {
    my ($self, $filename, $release) = @_;
    # TODO: length of filename+revision
    # might turn out to be > 255 chars
    # [length used in the db]
    return join("-", $filename, $release);
}

With this modification, filerev() returns the same value for a given (file, revision) pair every time, which solves the problem.

I have a patch ready for this.

----------------------------------------------------------------------

Comment By: Malcolm Box (mbox)
Date: 2002-02-18 14:20

Message:
Logged In: YES
user_id=215386

Yes, you're right, this is a bug.
The underlying assumption being broken is that the files in a version are static. That is true when indexing released software, but not when indexing a development tree.

The simplest workaround is to drop and recreate the database each time, which avoids the problem. For small to medium repositories with the index updated nightly this should work fine, but it doesn't work for large repositories.

The full solution would appear to be to check for an existing entry for the (filename, release) pair. If one is found, delete it and all associated information.

----------------------------------------------------------------------

Comment By: Shree Kumar (shreekumar)
Date: 2002-02-16 13:32

Message:
Logged In: YES
user_id=142912

There are two cases where the scenario I've described applies:

1. Files are not in CVS [i.e. "Files.pm" is used]. You run genxref, then change a file and run genxref again.
2. Files are in CVS, and you want to index the "head" tag. Files change regularly, and you want to keep the cross-reference in sync, probably by running genxref once an hour or so [as a cron job].

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2002-02-16 12:47

Message:
Logged In: NO

I was under the impression that a file would never change again, except if (and only if) it has changed and either got a new CVS revision (or tag), or there is a new directory for a new version of the whole project (if it is not managed by CVS).
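Shree Kumar's proposed fix (the 2002-02-19 comment) has several parts:

* a "timestamp" kept per file in the "status" table;
* definitions and references removed before a changed file is rescanned;
* a file indexed and referenced in one go;
* a filerev() that depends only on the (file, revision) pair.

The Perl sketch below combines these parts, under the same caveats as any illustration here. `%status`, `%defs` and `%refs` are hypothetical hashes standing in for database tables, not LXR's schema. `needs_reindex`, `process_file` and the `$scan` callback are hypothetical names. The standalone `filerev` drops the `$self` argument of the method quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for the database tables (NOT LXR's schema):
# %status holds the mtime recorded per (file, release), as proposed,
# and %defs / %refs hold whatever the scanner found for that file.
my (%status, %defs, %refs);

# Same idea as the proposed filerev(): the key depends only on the
# (file, release) pair, never on the file's timestamp.
sub filerev {
    my ($filename, $release) = @_;
    return join('-', $filename, $release);
}

# Reindex only if the file is new or its mtime has increased.
sub needs_reindex {
    my ($filename, $release, $mtime) = @_;
    my $id = filerev($filename, $release);
    return 0 if defined $status{$id} && $mtime <= $status{$id};
    $status{$id} = $mtime;
    return 1;
}

# $scan is a hypothetical callback returning (\@definitions, \@references).
sub process_file {
    my ($filename, $release, $mtime, $scan) = @_;
    return 'skipped' unless needs_reindex($filename, $release, $mtime);
    my $id = filerev($filename, $release);
    # Remove what the previous run recorded for this file ...
    delete $defs{$id};
    delete $refs{$id};
    # ... then collect definitions and references together, so the file
    # cannot change between the "index" and "reference" passes.
    my ($d, $r) = $scan->();
    $defs{$id} = $d;
    $refs{$id} = $r;
    return 'indexed';
}

# Two versions of test.c from the bug report, as canned scan results.
my $v1 = sub { ([ 'TEST:1' ], []) };
my $v2 = sub { ([ 'T:1', 'TEST:2' ], []) };
print process_file('test.c', 'head', 100, $v1), "\n";    # indexed
print process_file('test.c', 'head', 100, $v2), "\n";    # skipped (mtime unchanged)
print process_file('test.c', 'head', 200, $v2), "\n";    # indexed
print scalar @{ $defs{'test.c-head'} }, "\n";            # 2
```

One design point: joining with "-" as the proposed filerev() does can map two different pairs to the same key. For example, ("a-b", "c") and ("a", "b-c") both become "a-b-c". A separator that cannot appear in file or release names would avoid that.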