Re: [Lxr-dev] Status of this project?

On Thu, 2007-03-22 at 21:41 +0100, Jan-Benedict Glaw wrote:
> On Thu, 2007-03-22 16:12:23 -0400, Paul Smith <ps...@ne...> wrote:
> >
> > Genxref performance improvement:
> > Especially when adding a new release, genxref can be very slow.  I think
> > it's because of all the indexing that goes on, and the DB re-indexes
> > after every insert.  A common method of adding a lot of data to an SQL
> > DB is to put the commands into a file, then load the file all at once
> > with indexing disabled, then re-index everything at the end.
> 
> The basic concept could be simplified to drop PRIMARY KEY constraints
> and indexes and add them back afterwards. (Though you may fail in case
> the PRIMARY KEY constraints were ignored in parts of the data...)  Not
> a bright idea...

Yes, and there's an even more important problem I realized after I sent
this: many of LXR's tables are indexed through auto-increment values.
Of course we cannot know what values these will have until the table is
loaded, so we can't really write out a file to load all that data at
once.  The best we could do would be to update the tables one at a time,
being sure to create the basic tables first then reading the key values
out of them to create the next table, etc.
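
A minimal sketch of that table-at-a-time load, for illustration only.  It
uses Python with SQLite rather than LXR's own Perl and database code, and
the table and column names (files, symbols, fileid, symname) and the
staged_load function are hypothetical, not LXR's actual schema.  The base
table is loaded first, its auto-increment keys are read back, and only
then are the rows of the dependent table built.

```python
import sqlite3


def staged_load(conn, symbols_by_file):
    """Load a parent table, read back its generated keys, then load the child.

    symbols_by_file maps a file name to the list of symbol names found in it.
    Returns the number of symbol rows inserted.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE files ("
        "fileid INTEGER PRIMARY KEY AUTOINCREMENT, filename TEXT UNIQUE)"
    )
    cur.execute(
        "CREATE TABLE symbols ("
        "symid INTEGER PRIMARY KEY AUTOINCREMENT, "
        "fileid INTEGER NOT NULL, symname TEXT)"
    )

    # Stage 1: bulk-insert the base table; the database assigns the keys.
    cur.executemany(
        "INSERT INTO files (filename) VALUES (?)",
        [(name,) for name in symbols_by_file],
    )

    # Stage 2: read the generated keys back out of the base table.
    fileid = dict(cur.execute("SELECT filename, fileid FROM files"))

    # Stage 3: build the dependent rows from those keys and bulk-insert them.
    rows = [
        (fileid[name], sym)
        for name, syms in symbols_by_file.items()
        for sym in syms
    ]
    cur.executemany("INSERT INTO symbols (fileid, symname) VALUES (?, ?)", rows)

    # Secondary index is created once, after all rows are in place.
    cur.execute("CREATE INDEX symbols_by_name ON symbols (symname)")
    conn.commit()
    return len(rows)
```

Each further level of dependent tables would need another read-back stage
like stage 2, which is where the complexity comes from.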

Seems overly complex, so this project is probably not worth it as things
stand.

> > Web performance improvement:
> > I'm sure someone else has already thought of this, but right now we
> > generate our HTML dynamically every time.  It seems to me that this is a
> > prime candidate for caching!  Especially for backends that support
> > annotate/blame etc.  The annotations on a file won't change unless the
> > contents of the file change, for the most part (the other possibility
> > is that the symbols in the database change--as far as I can tell this
> > shouldn't happen normally but if people are worried we can have genxref
> > flush the cache).
> 
> Output shouldn't change at all :)  Well, unless the templates get
> modified.

Hopefully most/all of those types of changes can be handled through CSS;
that would be my preferred way to do it anyway (I think many already
are).

I was thinking about symbols: if we re-index and a symbol that we
previously didn't know about suddenly becomes available in the database,
then any cached files that reference that symbol will not be updated to
have a link.  For static source trees this can't happen, of course, but
LXR is also used to index trees that are still changing (a daily update
of the HEAD of some stream/branch for example).

Admittedly it's pretty hard to think of how this could happen without
the relevant source files changing as well, which would obviously flush
the cache.  The only way I can see it offhand is if a symbol that used
to live outside the tree (and so was not indexed) were suddenly moved
inside.  This is such a corner case that I'm not sure it's worth
catering to at the expense of much performance.

I was imagining how this might be implemented, and I think the first
step is to create an LXR::File class (one object per file).  That would
make the caching interface and code much simpler.
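
A rough sketch of what such a cache could look like, written in Python
for illustration rather than in LXR's Perl.  The PageCache class, its
page and flush methods, and the render callback are all hypothetical
names, not existing LXR code.  The assumption is that a rendered page
depends only on the file's path and contents, so a digest of the contents
decides whether the cached HTML is still valid; flush is the hook genxref
could call after a re-index.

```python
import hashlib


class PageCache:
    """Cache rendered HTML per file, keyed on a digest of the file contents."""

    def __init__(self, render):
        # render(path, content_bytes) -> html string; supplied by the caller.
        self._render = render
        self._entries = {}  # path -> (content digest, rendered html)

    def page(self, path, content):
        """Return the HTML for a file, rendering only if the contents changed."""
        digest = hashlib.sha256(content).hexdigest()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]
        html = self._render(path, content)
        self._entries[path] = (digest, html)
        return html

    def flush(self):
        """Drop everything, e.g. after a re-index changes the symbol database."""
        self._entries.clear()
```

A changed file replaces its own entry automatically; only the corner case
of symbols changing without the file changing needs the explicit flush.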

This is definitely a post-1.0 thing.

> > I'd be happy with that.  It's not actually such a big deal to fix this;
> > you just need a series of ALTER TABLE operations.  It's easy enough to
> > add to the readme, or even write an update script.
> 
> I'm not sure how happy MySQL will be with its foreign keys...

Hm.  It definitely needs to be tested.

-- 
-----------------------------------------------------------------------------
 Paul D. Smith <ps...@ne...>                       http://netezza.com
 "Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
      These are my opinions--Netezza takes no responsibility for them.