Re: [Lxr-dev] [ lxr-Bugs-518365 ] Indexing of files once indexed is buggy!
From: Arne G. G. <ar...@li...> - 2002-05-01 11:31:22
* Malcolm Box
> Again, moving discussion of this to the list.

Forgive me for butting in this late, I'm just trying to catch up. I'll
just add a few comments to this issue. Just whack me if I'm stating
stuff you've already covered.

> Plain.pm then emulates this by using timestamps as the revision
> history of the file. Therefore if you have two directories v1 and v2,
> and they both contain a file X with the same timestamp, LXR will treat
> them as the same file. The easiest way to see this is to symlink
> between the two version directories - indexing will occur only once.
>
> This mechanism only has one way of breaking - two files with the same
> name and timestamp that are actually not identical. However, it would
> be a strange revision control system that would give you such files -
> most systems either give you the time of checkin (in which case the
> timestamps would not be identical) or the time of checkout (similarly,
> the timestamps should not be equal since the two files probably
> weren't written at the same time).
>
> The result of this is that with Plain.pm LXR will sometimes index the
> file more often than it needs to, but it should not decide not to
> index it when it does need to. Using the Plain.pm backend essentially
> trades off diskspace for ease of use.

This is a bit non-obvious, and I'm happy to see that you're able to
summarise the issue so clearly when someone suggests mangling stuff
we've actually struggled a fair bit with in the past. :)

As a note, Plain.pm includes the file size in the revision string, so
LXR would fail to index a changed file only if the new version had the
same timestamp and the same size as the old one, but different
contents.

As far as solutions to this problem go: even with Plain.pm we have some
notion of the set of files belonging to a particular release. Thus,
when indexing a release and encountering a (filename, revision)-tuple
belonging to it, we could invalidate all non-matching (filename,
*)-tuples marked as belonging to the same release (and no other
releases). In doing this, we would also need to invalidate the
reference-information for this release. As long as we do that we'd be
home free as far as database integrity is concerned, as far as I can
see.

(A possible shortcut would be to index (filename, rev2) before
(possibly) invalidating the information for (filename, rev1), and only
invalidate the reference-information if we find that the two define
non-matching sets of symbols.)

Arne.
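To make the invalidation rule above a bit more concrete, here is a rough sketch in Python. It is not LXR's actual (Perl) code and it does not touch a real database: `revision_string`, `ToyIndex`, `index_file` and `stale_refs` are all hypothetical names made up for illustration, and the "timestamp-size" format of the revision string is only an assumption about how the two values might be combined.

```python
from collections import defaultdict


def revision_string(mtime: int, size: int) -> str:
    """Hypothetical revision id built from timestamp and file size."""
    return f"{mtime}-{size}"


class ToyIndex:
    """In-memory stand-in for the index database (illustration only)."""

    def __init__(self):
        # (filename, revision) -> set of releases containing that tuple
        self.releases = defaultdict(set)
        # releases whose reference information would need rebuilding
        self.stale_refs = set()

    def index_file(self, release, filename, revision):
        """Record the tuple for a release and return the tuples invalidated."""
        invalidated = []
        for (name, rev), rels in list(self.releases.items()):
            # Only drop non-matching revisions of the same file that
            # belong to this release and to no other release.
            if name == filename and rev != revision and rels == {release}:
                del self.releases[(name, rev)]
                invalidated.append((name, rev))
        if invalidated:
            # The references for this release can no longer be trusted.
            self.stale_refs.add(release)
        self.releases[(filename, revision)].add(release)
        return invalidated
```

Tuples that are shared with another release are deliberately left alone here, following the "(and no other releases)" condition. The shortcut of comparing symbol sets before invalidating the references is not modelled.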