Re: [Lxr-dev] Tag "head" in its timeline (was: /bin/true as a zombie)
From: Malcolm B. <ma...@br...> - 2001-08-02 05:39:36
Jason Dorje Short wrote:
>"Peder O. Klingenberg" wrote:
>
>>Shouldn't be. In the process of indexing (in Tagger.pm), the pathname
>>and symbol ('head') is looked up to find the actual revision of the
>>file (like '1.4'). The filename and revision is a key in the files
>>table in the database. As the revision associated with the 'head' is
>>changed, so will the file-id in the database. The file-id determines
>>if lxr thinks the file has been indexed before or not.
>>
Yep, this is the files table in the data model. This table maintains a
unique fileid for each (pathname, version) tuple, so the same pathname
can have multiple fileids as new versions of the file are indexed. The
releases table then says which fileids make up a release.
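
As a rough illustration of that data model, here is a minimal sketch in
Python with an in-memory SQLite database. The table layout, the column
names (pathname, revision, relname) and the helper functions are
assumptions made for the example, not LXR's actual schema or code. In
particular, it assumes that re-indexing moves the release's link from
the old fileid to the new one.

```python
import sqlite3

def make_db():
    """Create a hypothetical, simplified files/releases schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE files (
            fileid   INTEGER PRIMARY KEY,
            pathname TEXT NOT NULL,
            revision TEXT NOT NULL,
            UNIQUE (pathname, revision)
        );
        CREATE TABLE releases (
            fileid  INTEGER NOT NULL,
            relname TEXT NOT NULL,
            PRIMARY KEY (fileid, relname)
        );
    """)
    return db

def file_id(db, pathname, revision):
    """Return the fileid for (pathname, revision), creating it if new."""
    row = db.execute(
        "SELECT fileid FROM files WHERE pathname = ? AND revision = ?",
        (pathname, revision),
    ).fetchone()
    if row:
        return row[0]
    cur = db.execute(
        "INSERT INTO files (pathname, revision) VALUES (?, ?)",
        (pathname, revision),
    )
    return cur.lastrowid

def set_release(db, relname, pathname, revision):
    """Make release `relname` refer to this revision of `pathname`."""
    fid = file_id(db, pathname, revision)
    # Assumption: a release refers to one revision per pathname, so the
    # link to any other revision of the same pathname is dropped.
    db.execute(
        "DELETE FROM releases WHERE relname = ? AND fileid IN "
        "(SELECT fileid FROM files WHERE pathname = ? AND fileid != ?)",
        (relname, pathname, fid),
    )
    db.execute(
        "INSERT OR IGNORE INTO releases (fileid, relname) VALUES (?, ?)",
        (fid, relname),
    )
    return fid

if __name__ == "__main__":
    db = make_db()
    old = set_release(db, "head", "Tagger.pm", "1.4")
    new = set_release(db, "head", "Tagger.pm", "1.5")  # 'head' moved on
    # The row for revision 1.4 is still in files, but no release uses it.
    print(old, new, db.execute("SELECT COUNT(*) FROM files").fetchone()[0])
```

Under these assumptions, indexing the same revision twice reuses the
fileid, while a new revision behind 'head' gets a fresh one and leaves
the old row behind.
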
>Does that mean that repeatedly re-indexing on "head" will leave the old
>indexes around, and thus the database will continually grow? That's
>less than ideal (although a small problem compared to others...).
>
Yes, this will indeed happen. Currently there is no way to remove the
data associated with a fileid that is no longer referenced by a release.
This is a problem for one of the sites I manage: the source tree
evolves quickly and a newly created index is over 1 GB, so the wasted
space adds up. However, I have not yet worked out the relevant SQL magic
to discover which fileids are not associated with any release, and then
which identifiers etc. are found only in those files. It may well be a
non-trivial exercise.
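
For what it is worth, the two lookups might be sketched along these
lines, again in Python with SQLite. This is only a sketch under
assumptions: the symbols and indexes tables and all column names are
hypothetical stand-ins, not necessarily what LXR uses, and the queries
have not been tried against a real LXR database or at that scale.

```python
import sqlite3

# Hypothetical, simplified schema; real table/column names may differ.
SCHEMA = """
CREATE TABLE files    (fileid INTEGER PRIMARY KEY, pathname TEXT, revision TEXT);
CREATE TABLE releases (fileid INTEGER, relname TEXT);
CREATE TABLE symbols  (symid INTEGER PRIMARY KEY, symname TEXT);
CREATE TABLE indexes  (symid INTEGER, fileid INTEGER, line INTEGER);
"""

# Files that no release refers to.
ORPHAN_FILES = """
SELECT f.fileid
FROM files f
LEFT JOIN releases r ON r.fileid = f.fileid
WHERE r.fileid IS NULL
ORDER BY f.fileid
"""

# Symbols with no occurrence in any file that belongs to a release.
# Note: this also matches symbols that have no index entries at all.
ORPHAN_SYMBOLS = """
SELECT s.symid
FROM symbols s
WHERE NOT EXISTS (
    SELECT 1
    FROM indexes i
    JOIN releases r ON r.fileid = i.fileid
    WHERE i.symid = s.symid
)
ORDER BY s.symid
"""

def find_orphans(db):
    """Return (orphan fileids, orphan symids) as two sorted lists."""
    files = [row[0] for row in db.execute(ORPHAN_FILES)]
    symbols = [row[0] for row in db.execute(ORPHAN_SYMBOLS)]
    return files, symbols

def demo_db():
    """Build a tiny example: a.c 1.1 is stale, a.c 1.2 is 'head'."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO files VALUES (?, ?, ?)",
                   [(1, "a.c", "1.1"), (2, "a.c", "1.2")])
    db.execute("INSERT INTO releases VALUES (2, 'head')")
    db.executemany("INSERT INTO symbols VALUES (?, ?)",
                   [(10, "old_fn"), (11, "main")])
    db.executemany("INSERT INTO indexes VALUES (?, ?, ?)",
                   [(10, 1, 3), (11, 1, 7), (11, 2, 7)])
    return db

if __name__ == "__main__":
    print(find_orphans(demo_db()))
```

In the toy data, fileid 1 and the symbol old_fn are the only orphans;
main survives because it also occurs in the file behind 'head'. An
actual cleanup would presumably delete the index rows for the orphan
fileids first, then the orphan symbols, then the files rows.
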
Of course, dropping all the tables and re-indexing works, but since it
takes over a week for the index to be built from scratch, it's hardly
ideal :-)
Cheers,
Malcolm