Re: [Lxr-dev] Status of this project?

On Thu, 2007-03-22 16:12:23 -0400, Paul Smith <ps...@ne...> wrote:
> I had two other performance-related ideas:
>
> Genxref performance improvement:
> Especially when adding a new release, genxref can be very slow.  I think
> it's because of all the indexing that goes on, and the DB re-indexes
> after every insert.  A common method of adding a lot of data to an SQL
> DB is to put the commands into a file, then load the file all at once
> with indexing disabled, then re-index everything at the end.  I know
> both MySQL and Postgres support this model (although most likely it's
> accomplished in different ways) although I've not investigated it
> thoroughly.

The basic concept boils down to dropping the PRIMARY KEY constraints
and indexes, then adding them back afterwards. (Though re-adding them
may fail if some of the data loaded in between violates the PRIMARY
KEY constraints...)  Not a bright idea...
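
For illustration only, here is a minimal sketch of the "drop the index,
load in bulk, re-index at the end" pattern being discussed. It uses
Python's built-in sqlite3 module rather than genxref's Perl/DBI code, and
the table name `symbols`, its columns, and the index name
`symbols_name_idx` are made up for the example; they are not LXR's schema.

```python
import sqlite3

def bulk_load(conn, rows):
    """Insert (symname, fileid, line) rows with the secondary index dropped,
    then rebuild the index once at the end."""
    cur = conn.cursor()
    # Hypothetical index name; dropped so it is not updated per insert.
    cur.execute("DROP INDEX IF EXISTS symbols_name_idx")
    with conn:  # commits the whole batch as one transaction
        cur.executemany(
            "INSERT INTO symbols (symname, fileid, line) VALUES (?, ?, ?)",
            rows,
        )
    # Rebuild the index in a single pass over the loaded data.
    cur.execute("CREATE INDEX symbols_name_idx ON symbols (symname)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symbols (symname TEXT, fileid INTEGER, line INTEGER)")
conn.execute("CREATE INDEX symbols_name_idx ON symbols (symname)")
bulk_load(conn, [("main", 1, 10), ("printf", 1, 12), ("main", 2, 3)])
count = conn.execute("SELECT COUNT(*) FROM symbols").fetchone()[0]
indexes = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"
).fetchall()
```

The sketch only drops a plain index. Dropping and re-adding PRIMARY KEY
constraints is where the failure mode mentioned above would show up.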

> So, my idea was changing genxref to do this: instead of adding things to
> the DB one at a time, it would write out the statements to a file and at
> the end, import that file with indexing disabled.

COPY could be an alternative, but that requires the file to be read
directly by the PostgreSQL server. (I don't know how, or whether at
all, the other DB backends implement this.)

psql's \copy directive is also worth mentioning: it reads the file on
the client side.
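
As a rough sketch of what "write it out to a file, then load it in one
go" could look like, the Python snippet below writes rows in PostgreSQL's
default COPY text format (tab-separated fields, \N for NULL,
backslash-escaped tabs and newlines). The helper names and the sample rows
are assumptions for the example, not part of genxref.

```python
import io

def copy_escape(value):
    """Encode one field for PostgreSQL's COPY text format."""
    if value is None:
        return r"\N"
    text = str(value)
    # Escape the backslash first so the escapes added below stay intact.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def write_copy_file(out, rows):
    """Write one tab-separated line per row to the file-like object `out`."""
    for row in rows:
        out.write("\t".join(copy_escape(v) for v in row) + "\n")

buf = io.StringIO()
write_copy_file(buf, [("main", 1, 10), ("a\tb", 2, None)])
dump = buf.getvalue()
```

A file written this way could then be loaded from psql with something
like `\copy symbols (symname, fileid, line) from 'symbols.tsv'`, where the
table and file names are again hypothetical.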

> Web performance improvement:
> I'm sure someone else has already thought of this, but right now we
> generate our HTML dynamically every time.  It seems to me that this is a
> prime candidate for caching!  Especially for backends that support
> annotate/blame etc.  The annotations on a file won't change unless the
> contents of the file changes, for the most part (the other possibility
> is that the symbols in the database changes--as far as I can tell this
> shouldn't happen normally but if people are worried we can have genxref
> flush the cache).

Output shouldn't change at all :)  Well, unless the templates get
modified.

> We have a unique file id already, so we can cache the content using the
> file id with a typical span out to avoid any single directory being too
> large.  We can compare creation time of the cached file vs. the source
> file to tell when it's out of date.  We can use the access time to clean
> out old, unused cache entries if we want.
>
> Also, we can just cache the actual file content, and leave off the
> header information; the header info can be added dynamically when the
> user browses it.  That way the same cached copy of a file can be used
> for different releases, if they all share that fileid.
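
A minimal sketch of the caching scheme described above, assuming numeric
file ids. The function names, the two-level directory fan-out, and the
`.html` suffix are illustrative choices, and the freshness check compares
modification times rather than creation times, since mtime is what is
portably available.

```python
import os
import tempfile

def cache_path(cache_root, fileid):
    """Map a file id to a fanned-out path so no single directory grows too
    large, e.g. id 12345 -> <root>/45/23/12345.html."""
    name = "%06d" % fileid
    return os.path.join(cache_root, name[-2:], name[-4:-2], str(fileid) + ".html")

def is_fresh(cached, source):
    """True if the cache entry exists and is at least as new as the source."""
    try:
        return os.path.getmtime(cached) >= os.path.getmtime(source)
    except OSError:
        return False  # missing cache entry (or source) counts as stale

with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "source.c")
    with open(source, "w") as fh:
        fh.write("int main(void) { return 0; }\n")
    cached = cache_path(os.path.join(root, "cache"), 12345)
    missing_is_fresh = is_fresh(cached, source)   # nothing cached yet
    os.makedirs(os.path.dirname(cached))
    with open(cached, "w") as fh:
        fh.write("<pre>...</pre>\n")
    os.utime(source, (1000, 1000))
    os.utime(cached, (2000, 2000))
    newer_is_fresh = is_fresh(cached, source)     # cache newer than source
    os.utime(source, (3000, 3000))
    stale_is_fresh = is_fresh(cached, source)     # source changed afterwards
```

Only the rendered file body would be stored under that path; the
release-specific header would still be generated per request, as proposed
above.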

Hopefully, everybody has a nice robots.txt to keep Google et al. from
indexing the whole thing, once...

> > Heck, this f*ing column name caused so much grief, let's just rename
> > it!  Yes, that's somewhat painful and we need a Big Fat Warning in the
> > v1.0 docs that the column needs to be renamed, but I'm all for doing
> > that.
>
> I'd be happy with that.  It's not actually such a big deal to fix this;
> you just need a series of ALTER TABLE operations.  It's easy enough to
> add to the readme, or even write an update script.

I'm not sure how happy MySQL will be with its foreign keys...
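
To make the "series of ALTER TABLE operations" a bit more concrete, here
is a small Python sketch that builds the rename statement for two
backends. The table name `lxr_releases`, the new column name `rel`, and
the column definition are placeholders, not taken from the LXR schema, and
the sketch does nothing about foreign keys that reference the column.

```python
def quote_ident(name, quote_char='"'):
    """Quote an identifier, doubling any embedded quote character."""
    return quote_char + name.replace(quote_char, quote_char * 2) + quote_char

def rename_column_sql(backend, table, old, new, coldef=None):
    """Return an ALTER TABLE statement that renames one column."""
    if backend == "postgresql":
        return "ALTER TABLE %s RENAME COLUMN %s TO %s" % (
            quote_ident(table), quote_ident(old), quote_ident(new))
    if backend == "mysql":
        # MySQL's CHANGE clause needs the full column definition restated.
        if coldef is None:
            raise ValueError("MySQL CHANGE requires a column definition")
        return "ALTER TABLE %s CHANGE %s %s %s" % (
            quote_ident(table, "`"), quote_ident(old, "`"),
            quote_ident(new, "`"), coldef)
    raise ValueError("unsupported backend: %r" % backend)

# Placeholder names throughout; only the old column name comes from the thread.
pg_sql = rename_column_sql("postgresql", "lxr_releases", "release", "rel")
my_sql = rename_column_sql("mysql", "lxr_releases", "release", "rel",
                           "VARCHAR(255) NOT NULL")
```

The old name still has to be quoted in the rename statement itself, which
is the last place the quoting problem would need to be dealt with.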

> > > Actually there is an "ANSI_QUOTES" mode that we could set, that lets you
> > > (among other things) use standard "-quoting in MySQL.  That might be a
> > > valid thing to do.
> >
> > Is this in the DBI backend or configury on the MySQL server side?
>
> It can be set globally or per-session.  We'd use per-session obviously.
> It's set from the client side.
>
> I also discovered that there's a DBI method that quotes identifiers like
> this for you:
>
> 	my $release = $dbh->quote_identifier('release');
>
> That would be the safest way to go, although it's annoyingly verbose.
> And, there's a DBI get_info() method that lets you ask about all kinds
> of features of the server, and one of those is the quoting character, so
> we could get that and use it instead of quotes.
>
> But, changing the name sounds good to me! :)

Let's just change the name.  I don't think such a kludge is worth
doing while an easy solution is available.

MfG, JBG

-- 
      Jan-Benedict Glaw      jb...@lu...              +49-172-7608481
Signature of:              Alles sollte so einfach wie möglich gemacht sein.
the second  :                          Aber nicht einfacher.  (Einstein)