Re: [Lxr-dev] Tag "head" in its timeline (was: /bin/true as a zombie)
From: Malcolm B. <ma...@br...> - 2001-08-02 05:39:36
Jason Dorje Short wrote:

>"Peder O. Klingenberg" wrote:
>
>>Shouldn't be. In the process of indexing (in Tagger.pm), the pathname
>>and symbol ('head') is looked up to find the actual revision of the
>>file (like '1.4'). The filename and revision is a key in the files
>>table in the database. As the revision associated with the 'head' is
>>changed, so will the file-id in the database. The file-id determines
>>if lxr thinks the file has been indexed before or not.
>>

Yep, this is the files table in the datamodel. This table maintains a
unique fileid for each (pathname, version) tuple - so the same pathname
can have multiple fileids as new versions of the file are indexed. The
releases table then says which fileids comprise a release.

>Does that mean that repeatedly re-indexing on "head" will leave the old
>indexes around, and thus the database will continually grow? That's
>less than ideal (although a small problem compared to others...).
>

Yes, this will indeed happen. Currently there is no way to remove the
data associated with a fileid that is no longer referenced by a release.
This is a problem for one of the sites I manage: the source tree evolves
quickly and a newly created index is over 1 GB, so wasted space matters.
However, as yet I have not worked out the relevant SQL magic to discover
which fileids are not associated with a release, and then which
identifiers etc. are found only in those files. It may well be a
non-trivial exercise.

Of course, dropping all the tables and re-indexing works, but since it
takes over a week for the index to be built from scratch, it's hardly
ideal :-)

Cheers,
Malcolm
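
The "SQL magic" mentioned above might look roughly like the sketch below. It is only an illustration under assumed names, not the actual LXR schema: the message confirms a files table keyed on (pathname, version) with a fileid, and a releases table listing the fileids in each release, but the column names (fileid, filename, revision, rel) and the symbol_refs table are hypothetical. SQLite is used in place of whatever database a real installation runs, purely so the example is self-contained.

```python
import sqlite3

# ASSUMED schema: table and column names are illustrative, not LXR's real ones.
SCHEMA = """
CREATE TABLE files (
    fileid   INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    revision TEXT NOT NULL,
    UNIQUE (filename, revision)      -- one fileid per (pathname, version)
);
CREATE TABLE releases (fileid INTEGER NOT NULL, rel TEXT NOT NULL);
-- Hypothetical table recording which identifier occurs in which file.
CREATE TABLE symbol_refs (symname TEXT NOT NULL, fileid INTEGER NOT NULL);
"""


def build_demo_db():
    """Create a tiny in-memory database with one stale revision of a.c."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?)",
        [(1, "a.c", "1.3"), (2, "a.c", "1.4"), (3, "b.c", "1.1")],
    )
    # 'head' now points at a.c 1.4, so fileid 1 (a.c 1.3) is unreferenced.
    conn.executemany(
        "INSERT INTO releases VALUES (?, ?)", [(2, "head"), (3, "head")]
    )
    conn.executemany(
        "INSERT INTO symbol_refs VALUES (?, ?)",
        [("foo", 1), ("foo", 2), ("old_helper", 1), ("bar", 3)],
    )
    return conn


def orphan_fileids(conn):
    """Return fileids that no release refers to."""
    rows = conn.execute(
        """
        SELECT f.fileid
        FROM files AS f
        LEFT JOIN releases AS r ON r.fileid = f.fileid
        WHERE r.fileid IS NULL
        ORDER BY f.fileid
        """
    )
    return [row[0] for row in rows]


def orphan_only_symbols(conn):
    """Return identifiers that occur only in files outside every release."""
    rows = conn.execute(
        """
        SELECT symname
        FROM symbol_refs
        GROUP BY symname
        HAVING SUM(CASE WHEN fileid IN (SELECT fileid FROM releases)
                        THEN 1 ELSE 0 END) = 0
        ORDER BY symname
        """
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print("unreferenced fileids:", orphan_fileids(db))
    print("identifiers only in those files:", orphan_only_symbols(db))
```

With results like these, a cleanup pass could delete the rows for the unreferenced fileids and then the identifiers that no remaining file uses. Whether that is fast enough on an index of the size described would need to be measured on the real database.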