Re: [Lxr-dev] [ lxr-Bugs-518365 ] Indexing of files once indexed is buggy!
From: Malcolm B. <ma...@br...> - 2002-05-01 12:01:02
Arne Georg Gleditsch wrote:

>This is a bit non-obvious, and I'm happy to see that you're able to
>summarise the issue so clearly when someone suggests to mangle stuff
>we've actually struggled a fair bit with in the past. :) As a note,
>Plain.pm includes the file size in the revision string, which means
>that files would have to have the same timestamp and size as well as
>different contents for LXR to fail to index changed files.
>

I hadn't realised that size was included - that makes it even more robust. Certainly the terminology of releases, revisions etc is not that clear - it took me a while to get my head round it. Moving to the idea of being able to index a "HEAD" revision (ie one that is evolving) clearly challenges some of the assumptions in the code, though not the overall semantic model.

>As far as solutions to this problem go; even with Plain.pm we have
>some notion of the set of files belonging to a particular release.
>Thus, when indexing a release and encountering a (filename,
>revision)-tuple belonging to it, we could invalidate all non-matching
>(filename, *)-tuples marked as belonging to the same release (and no
>other releases). In doing this, we would also need to invalidate the
>reference-information for this release. As long as we do that we'd be
>home free as far as database integrity is concerned, as far as I can
>see.
>

Indeed, this works very well. I have got it going for the Postgres backend, since the nice referential integrity triggers make this kind of cascading delete very easy. Unfortunately I haven't completed the port to the MySQL backend, since that takes much more manual grovelling to clean up.

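For illustration only, here is a rough sketch of the kind of cascading invalidation discussed above. It is not LXR's code or schema: it uses Python's sqlite3 module as a stand-in for Postgres, and the tables (files, releases, defs), their columns, and the helper invalidate_stale() are all hypothetical names chosen for the example. It covers only the definition rows; regenerating the reference information for the release is not shown.

```python
import sqlite3

# Hypothetical, simplified schema (NOT LXR's real tables).
# Child rows reference files(fileid) with ON DELETE CASCADE, so removing
# a stale (filename, revision) row also removes everything hanging off it.
SCHEMA = """
CREATE TABLE files (
    fileid   INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    revision TEXT NOT NULL,
    UNIQUE (filename, revision)
);
CREATE TABLE releases (
    fileid  INTEGER NOT NULL REFERENCES files(fileid) ON DELETE CASCADE,
    relname TEXT NOT NULL,
    PRIMARY KEY (fileid, relname)
);
CREATE TABLE defs (
    symname TEXT NOT NULL,
    fileid  INTEGER NOT NULL REFERENCES files(fileid) ON DELETE CASCADE,
    line    INTEGER NOT NULL
);
"""


def open_db():
    """Create an in-memory database with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def invalidate_stale(conn, relname, filename, revision):
    """Delete other revisions of `filename` that belong to `relname` only.

    Revisions that are also part of another release are left alone.
    Returns the list of deleted fileids.
    """
    stale = conn.execute(
        """
        SELECT f.fileid
        FROM files f
        JOIN releases r ON r.fileid = f.fileid
        WHERE f.filename = ? AND f.revision <> ? AND r.relname = ?
          AND NOT EXISTS (SELECT 1 FROM releases o
                          WHERE o.fileid = f.fileid AND o.relname <> ?)
        ORDER BY f.fileid
        """,
        (filename, revision, relname, relname),
    ).fetchall()
    # Deleting from files cascades to releases and defs.
    conn.executemany("DELETE FROM files WHERE fileid = ?", stale)
    return [row[0] for row in stale]
```

The point of the sketch is that with declared foreign keys the cleanup is a single DELETE on the parent table; without them, each dependent table has to be cleaned up by hand in the right order.
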
This also won't be hitting the CVS repository for a while, since the code is on a laptop which is being shipped from Japan to the UK and so is now bobbing around on the Pacific Ocean at a guess :-) Of course, I might get frustrated enough with the bug to just re-code the fix, but the "drop and rebuild the db every now and then" fix is working for me at the moment.

>(A possible shortcut would be to index (filename, rev2) before
>(possibly) invalidating the information for (filename, rev1) and only
>invalidate the reference-information if we find that the two define
>non-matching sets of symbols.)
>

It's probably more effort to track the new stuff and compare with the old than simply to delete and re-add. The big problem (that's just occurred to me) is with the useage table. If the new revision of the file defines new symbols, then for total accuracy all existing files need to be re-referenced to see if they use those symbols. Luckily it's extremely unlikely that someone would add a new symbol in a file that retrospectively re-defines symbols in other files, but I guess it is a theoretical possibility. This is also the reason why an "index file, reference file" loop doesn't work, and why the "index all files", then "reference all files" approach is taken at the moment.

Cheers,
Malcolm

P.S. Good to see you back on the list again :-)