Thread: Re: [Lxr-dev] Re: /bin/true as a zombie (Page 2)
|
From: Jan-Benedict G. <jb...@lu...> - 2001-08-02 08:07:37
|
On Thu, Aug 02, 2001 at 09:45:26AM +0200, Peder O. Klingenberg wrote:
> Jason Dorje Short <js...@de...> writes:
>
> > No doubt I'm missing something, but I don't understand why my change was
> > conceptually wrong. I took each version to be indexed, and traversed
> > the tree for that version.
>
> But you can't find the versions to be indexed by asking for the
> versions of the root of the tree. In a CVS tree, each file can have a
> differing number of versions, have completely different tags, etc.
I don't like to search a tree for all tags and take those as
--allversions. I think the "old" way of having a file (or this Perl'ish
"qw" list) in which all versions are listed is better. There, you've got
a list _the user supplied_. She may want to only show off a number of
versions, but not all of those. It is not that handy to call genxref
with all wanted versions if you could simply take them from a file:-)
> > The current/former behavior is to traverse the tree, and for each
> > file index it for all possible versions.
>
> Yes, that was the point. Unless you specify a version, you get all
> the versions a file has to offer.
Which is bad. I'd like to only have those versions which have a
user-given tag, and only those the user really *wants* to see. So keep
the traditional version list and loop over it:-)
> by the name of 'range'. Would you be happy if --allversions just used
> the range in lxr.conf?
Yes! I think this actually is what I would prefer:-)
MfG, JBG
|
|
From: Jason D. S. <js...@de...> - 2001-08-02 18:07:22
|
"Peder O. Klingenberg" wrote: > > Jason Dorje Short <js...@de...> writes: > > > No doubt I'm missing something, but I don't understand why my change was > > conceptually wrong. I took each version to be indexed, and traversed > > the tree for that version. > > But you can't find the versions to be indexed by asking for the > versions of the root of the tree. In a CVS tree, each file can have a > differing number of versions, have completely different tags, etc. Here is my problem...I didn't realize it was "asking". > It may of course > well be that my concepts are different from yours, in which case it's > a matter of opinion which version (pun intended) to hold as the > correct one. My opinion is that it should work for everyone :-) Obviously, right now things will work for _either_ CVS or static. > It seems to me that this is an area that needs more customization, > because we obviously disagree, and I suspect that we both have equally > valid views. And as it so happens, we have a configuration variable > by the name of 'range'. Would you be happy if --allversions just used > the range in lxr.conf? Actually, that's what I thought it *was* doing, which is why I thought my code was good. Yes, I would be much happier if that's what happened. What happens right now? Does it index every possible version? jason |
|
From: Malcolm B. <ma...@br...> - 2001-08-03 00:27:18
|
Peder O. Klingenberg wrote:
>Jason Dorje Short <js...@de...> writes:
>
>>No doubt I'm missing something, but I don't understand why my change was
>>conceptually wrong. I took each version to be indexed, and traversed
>>the tree for that version.
>>
>But you can't find the versions to be indexed by asking for the
>versions of the root of the tree. In a CVS tree, each file can have a
>differing number of versions, have completely different tags, etc.
>
Absolutely true. Simply because files are in the same CVS archive
doesn't mean that they are even logically in the same project, let alone
the same release! However, with the plain file storage, the versions are
physically separate, and in other SCM systems they could either be
physically separate (a different pathname) or simply different versions.
>>The current/former behavior is to traverse the tree, and for each
>>file index it for all possible versions.
>>
>
>Yes, that was the point. Unless you specify a version, you get all
>the versions a file has to offer.
>
>That was at least the concept I had of --allversions, and your patch
>didn't fit that concept. Hence conceptually wrong. It may of course
>well be that my concepts are different from yours, in which case it's
>a matter of opinion which version (pun intended) to hold as the
>correct one.
>
I think this is the key difference between people using CVS.pm and those
using Plain.pm. You can't traverse a filetree in Plain.pm and ask what
versions the file has, since the different versions have different
locations. On the other hand, with CVS each file can have multiple
versions.
>>BTW, the new genxref doesn't work for me again. It runs through all
>>versions just as if everything had already been indexed, even
>>immediately after I drop and re-create the database.
>>
Same for me here - file->allversions() doesn't seem to work for Plain.pm.
I think some test code for the lxr is badly needed :-)
>Maybe this is because Plain.pm and CVS.pm disagree on how to compute
>allreleases? I've never looked at Plain.pm, so I don't know.
>
That would appear to be the problem, looking at Plain.pm
>It seems to me that this is an area that needs more customization,
>because we obviously disagree, and I suspect that we both have equally
>valid views. And as it so happens, we have a configuration variable
>by the name of 'range'. Would you be happy if --allversions just used
>the range in lxr.conf?
>
That seems to me to be the best solution - then the user gets to pick
what versions are indexed, and it's easy to make range be equivalent to
the current "allversions" for the CVS backend.
>I've already patched Config.pm to accept closures for this variable in
>order to get the cgi scripts to display versions based on the versions
>of each file, so I think I could make this work for me.
>
>It would still need to use the original control flow in order to allow
>the flexibility I need, but specifying a constant range list (or file)
>in lxr.conf should give the same semantics as precomputing the
>versions to index, should it not?
>
It would, yes.
Malcolm
|
|
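The 'range' idea agreed on in this message can be sketched roughly as follows. This is only an illustration, written in Python rather than LXR's Perl, and it assumes a hypothetical setting that is either a fixed list of versions or a function from pathname to versions. The names `versions_for` and `plan_index` are invented here and are not genxref functions.

```python
def versions_for(range_setting, pathname):
    """Return the versions to index for one pathname.

    range_setting is an assumption of this sketch: either a fixed list
    of versions, or a callable that takes a pathname and returns them.
    """
    if callable(range_setting):
        return list(range_setting(pathname))
    return list(range_setting)


def plan_index(paths, range_setting):
    """Yield (pathname, version) pairs, walking the tree file by file."""
    for path in paths:
        for version in versions_for(range_setting, path):
            yield path, version


# Hypothetical data: per-file tags, as a CVS-style tree might have them.
tags = {"/a": ["v1"], "/b": ["v1", "v2"]}

# A constant list: every file is planned for the same versions.
constant_plan = list(plan_index(["/a", "/b"], ["v1", "v2"]))

# A closure: each file supplies its own versions.
per_file_plan = list(plan_index(["/a", "/b"], lambda path: tags[path]))
```

With a constant list, the file-by-file loop produces the same set of (file, version) pairs as looping over a precomputed version list would, only in a different order. With a closure, each file can offer a different set.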
From: Jan-Benedict G. <jb...@lu...> - 2001-08-01 09:03:57
|
On Wed, Aug 01, 2001 at 10:31:08AM +0200, Peder O. Klingenberg wrote:
> Jan-Benedict Glaw <jb...@lu...> writes:
> > > > ### /mm/Makefile
> > > > co: /home/data/CVSROOT/linux//mm/Makefile,v: branch number 0 too low
> > > > co: /home/data/CVSROOT/linux//mm/Makefile,v: branch number 0 too low
> > > > co: /home/data/CVSROOT/linux//mm/Makefile,v: branch number 0 too low
> > > > co: /home/data/CVSROOT/linux//mm/Makefile,v: branch number 0 too low
> > > > co: /home/data/CVSROOT/linux//mm/Makefile,v: branch number 0 too low
> > > > co: /home/data/CVSROOT/linux//mm/Makefile,v: branch number 0 too low
> > Hmmm. Yes, there are 6 {tag,branche}s for that file:
> >
> > www-data@mirror:/home/data/CVSROOT/linux/mm$ head -n 11 Makefile,v
> > head 1.4;
^^^^
This leads me to some question: how does genxref react on the fact
that "head" may be something different each time it is called? For
example I could now run a genxref on the current linux kernel, and
I could do this 2 weeks later again. The first "head" would
mean all those revisions as of the older kernel's version, the
second "head" would be a more recent kernel. Any conflicts there?
MfG, JBG
|
|
From: <pe...@kl...> - 2001-08-01 09:26:44
|
Jan-Benedict Glaw <jb...@lu...> writes:
> This leads me to some question: how does genxref react on the fact
> that "head" may be something different each time it is called? For
> example I could now run a genxref on the current linux kernel, and
> I could do this 2 weeks later again. The first "head" would
> mean all those revisions as of the older kernel's version, the
> second "head" would be a more recent kernel. Any conflicts there?
Shouldn't be. In the process of indexing (in Tagger.pm), the pathname
and symbol ('head') is looked up to find the actual revision of the
file (like '1.4'). The filename and revision is a key in the files
table in the database. As the revision associated with the 'head' is
changed, so will the file-id in the database. The file-id determines
if lxr thinks the file has been indexed before or not.
...Peder...
--
Cogito ergo panta rei.
|
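The lookup described in this message can be sketched as below. It is a toy model in Python, not the code in Tagger.pm: it assumes an in-memory table keyed on (filename, revision), and `FileTable`, `file_id` and `resolve` are hypothetical names used only for illustration.

```python
from itertools import count


class FileTable:
    """Toy stand-in for the files table: one id per (filename, revision)."""

    def __init__(self):
        self._ids = {}
        self._next = count(1)

    def file_id(self, filename, revision):
        """Return (fileid, already_indexed) for this filename/revision pair."""
        key = (filename, revision)
        if key in self._ids:
            return self._ids[key], True
        self._ids[key] = next(self._next)
        return self._ids[key], False

    def __len__(self):
        return len(self._ids)


def resolve(symbols, symbol):
    """Map a symbolic name such as 'head' to a concrete revision string."""
    return symbols[symbol]


table = FileTable()

# First run: 'head' of /mm/Makefile points at revision 1.4.
first = table.file_id("/mm/Makefile", resolve({"head": "1.4"}, "head"))

# Same state again: same revision, so the file counts as already indexed.
again = table.file_id("/mm/Makefile", resolve({"head": "1.4"}, "head"))

# Later 'head' has moved to a hypothetical 1.5: new key, new fileid.
later = table.file_id("/mm/Makefile", resolve({"head": "1.5"}, "head"))
```

Because the key is the resolved revision and not the symbol 'head', a second run on a changed tree does not conflict with the first. It does leave the old entry in place, so the table grows by one row.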
|
From: <pe...@if...> - 2001-08-01 15:02:34
|
* Peder O. Klingenberg
| Shouldn't be. In the process of indexing (in Tagger.pm), the pathname
| and symbol ('head') is looked up to find the actual revision of the
| file (like '1.4'). The filename and revision is a key in the files
| table in the database. As the revision associated with the 'head' is
| changed, so will the file-id in the database. The file-id determines
| if lxr thinks the file has been indexed before or not.
I have been thinking about this as well. The ideal situation would be
if it was a way to index every revision of every file. It would then
be possible to select which version of a file to view, independently
from the version tags.
Per Kristian
|
|
From: Malcolm B. <ma...@br...> - 2001-08-02 05:46:33
|
pe...@if... wrote:
>I have been thinking about this as well. The ideal situation would be
>if it was a way to index every revision of every file. It would then
>be possible to select which version of a file to view, independently
>from the version tags.
>
I'm not convinced this would work well on heavily used repositories -
many of the versions checked in will have little or no lasting value,
so indexing them would simply eat up space with no benefit. Since a
release is automatically something that people care about (else it
wouldn't have been a release), simply indexing the releases seems to
me to make more sense. The linux code testbed for the lxr is atypical
here, since the indexer is not running off the real Linux development
repository. So all versions of the files in the CVS repository are
created by importing releases, resulting in all versions needing to be
indexed. This is certainly not true of other users of the lxr who run
it against a live development repository.
Cheers,
Malcolm
|
|
From: Per K. G. <pe...@if...> - 2001-08-02 06:42:31
|
* Malcolm Box
| >I have been thinking about this as well. The ideal situation would be
| >if it was a way to index every revision of every file. It would then
| >be possible to select which version of a file to view, independently
| > from the version tags.
| I'm not convinced this would work well on heavily used repositories -
| many of the versions checked in will have little or no lasting value,
| so indexing them would simply eat up space with no benefit. Since a
| release is automatically something that people care about (else it
| wouldn't have been a release), simply indexing the releases seems to
| me to make more sense. The linux code testbed for the lxr is atypical
| here, since the indexer is not running off the real Linux development
| repository. So all versions of the files in the CVS repository are
| created by importing releases, resulting in all versions needing to be
| indexed. This is certainly not true of other users of the lxr who run
| it against a live development repository.
That is true. What we should probably do is to have some sort of
utility that can remove unwanted revisions from the database. I think
it should be up to the user to decide how and when this deletion
should be done. One option could be to keep all revisions newer than
the last version for instance. One of my motives for this is to enable
lxr to implement (or merge in) the features of both cvsweb and bonsai.
Per Kristian
|
|
From: Malcolm B. <ma...@br...> - 2001-08-02 11:14:55
|
Per Kristian Gjermshus wrote:
>That is true. What we should probably do is to have some sort of
>utility that can remove unwanted revisions from the database. I think
>it should be up to the user to decide how and when this deletion
>should be done. One option could be to keep all revisions newer than
>the last version for instance. One of my motives for this is to enable
>lxr to implement (or merge in) the features of both cvsweb and bonsai.
>
That's exactly what I want to see - some way of removing unused entries
from the database. Alas I have not come up with the magic SQL
incantations required yet - perhaps I'll try this weekend to see what I
can do.
Logically we need to first find all fileids that are not in any release
select f.fileid from files f where f.fileid not in (select
fileid from releases);
Then for these fileids, we need to zap any entries in the indexes table
where the symbol only appears in file(s) in the list. This is
complicated - I can't work out how the SQL would look, not being
terribly familiar with it. Then cleaning up the useage, status and
symbols table should be easy.
Cheers,
Malcolm
|
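One way the cleanup outlined in this message might look is sketched below, using Python's sqlite3 module and an in-memory database. The schema is a simplified assumption (only the columns needed here), not the real LXR tables, and the delete order is a suggestion of this sketch, not something taken from the thread: removing the index rows of orphaned files first means the "symbol only appears in those files" test reduces to "symbol has no index rows left".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, assumed schema for illustration only.
    CREATE TABLE files    (fileid INTEGER PRIMARY KEY, filename TEXT, revision TEXT);
    CREATE TABLE releases (fileid INTEGER, release TEXT);
    CREATE TABLE symbols  (symid INTEGER PRIMARY KEY, symname TEXT);
    CREATE TABLE indexes  (symid INTEGER, fileid INTEGER, line INTEGER);

    INSERT INTO files VALUES (1, '/mm/Makefile', '1.4'), (2, '/mm/Makefile', '1.5');
    INSERT INTO releases VALUES (2, 'head');  -- fileid 1 is in no release
    INSERT INTO symbols VALUES (10, 'SHARED_SYM'), (11, 'OLD_ONLY_SYM');
    INSERT INTO indexes VALUES (10, 1, 3), (10, 2, 3), (11, 1, 7);
""")

# Step 1: fileids that no release refers to.
# Note: NOT IN matches nothing if the subquery returns a NULL fileid.
ORPHANS = "SELECT fileid FROM files WHERE fileid NOT IN (SELECT fileid FROM releases)"
orphan_ids = [row[0] for row in conn.execute(ORPHANS)]

# Step 2: drop index rows that belong to orphaned files.
conn.execute(f"DELETE FROM indexes WHERE fileid IN ({ORPHANS})")

# Step 3: drop symbols that no remaining index row mentions.
conn.execute("DELETE FROM symbols WHERE symid NOT IN (SELECT symid FROM indexes)")

# Step 4: drop the orphaned file rows themselves.
conn.execute(f"DELETE FROM files WHERE fileid IN ({ORPHANS})")
conn.commit()

remaining_symbols = [r[0] for r in conn.execute("SELECT symname FROM symbols ORDER BY symid")]
remaining_files = [r[0] for r in conn.execute("SELECT fileid FROM files")]
```

The same pattern would presumably extend to the other per-file tables, deleting their rows for the orphaned fileids before step 4.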
|
From: Jason D. S. <js...@de...> - 2001-08-01 15:17:28
|
"Peder O. Klingenberg" wrote:
>
> Jan-Benedict Glaw <jb...@lu...> writes:
>
> > This leads me to some question: how does genxref react on the fact
> > that "head" may be something different each time it is called? For
> > example I could now run a genxref on the current linux kernel, and
> > I could do this 2 weeks later again. The first "head" would
> > mean all those revisions as of the older kernel's version, the
> > second "head" would be a more recent kernel. Any conflicts there?
>
> Shouldn't be. In the process of indexing (in Tagger.pm), the pathname
> and symbol ('head') is looked up to find the actual revision of the
> file (like '1.4'). The filename and revision is a key in the files
> table in the database. As the revision associated with the 'head' is
> changed, so will the file-id in the database. The file-id determines
> if lxr thinks the file has been indexed before or not.
Does that mean that repeatedly re-indexing on "head" will leave the old
indexes around, and thus the database will continually grow? That's
less than ideal (although a small problem compared to others...).
jason
|
|
From: Malcolm B. <ma...@br...> - 2001-08-02 05:39:36
|
Jason Dorje Short wrote:
>"Peder O. Klingenberg" wrote:
>
>>Shouldn't be. In the process of indexing (in Tagger.pm), the pathname
>>and symbol ('head') is looked up to find the actual revision of the
>>file (like '1.4'). The filename and revision is a key in the files
>>table in the database. As the revision associated with the 'head' is
>>changed, so will the file-id in the database. The file-id determines
>>if lxr thinks the file has been indexed before or not.
>>
Yep, this is the files table in the datamodel. This table maintains a
unique fileid for each (pathname, version) tuple - so the same pathname
can have multiple fileids as new versions of the file are indexed. The
releases table then says which fileids comprise a release.
>Does that mean that repeatedly re-indexing on "head" will leave the old
>indexes around, and thus the database will continually grow? That's
>less than ideal (although a small problem compared to others...).
>
Yes, this will indeed happen. Currently there is no way to remove the
data associated with a fileid that is no longer referenced by a release.
This is a problem for one of the sites I manage, since the source tree
evolves quickly and a newly created index is over 1Gb, so wasting space is
a problem. However, as yet I have not worked out the relevant SQL magic
to discover which fileids are not associated with a release, and then
which identifiers etc are found only in those files. It may well be a
non-trivial exercise.
Of course, dropping all the tables and re-indexing works, but since it
takes over a week for the index to be built from scratch, it's hardly
ideal :-)
Cheers,
Malcolm
|
|
From: Malcolm B. <ma...@br...> - 2001-08-16 16:18:19
|
"Peder O. Klingenberg" wrote: > Could people please test this before I commit? Looking at this patch, it seems to suffer from one problem. That is that if you are using the Plain.pm backend, you must know which release you are using before you attempt the getdir operation. The current version of genxref does this by reading the range variable at the head of the indexing loop - I don't see how this patch fits in with this. On a meta level, do we really need the ability to have the versions specified by a closure - if the range of versions changes based on pathnames then it's unlikely that the browsing will work smoothly, let alone the the indexing. I'd be very reluctant to see this patch applied before we thrash out precisely what it is that is wanted here - for both the plain and cvs backends. Perhaps you could kick off by outline what it is you want to be able to do with a CVS archive? Malcolm |