Re: [Lxr-dev] Status of this project?
From: Paul S. <ps...@ne...> - 2007-03-22 20:13:09
On Thu, 2007-03-22 at 20:30 +0100, Jan-Benedict Glaw wrote:
> What may be different is the way to get the newly auto-generated
> number back again.
Yes, this is what I meant.
> But since the INSERT paths aren't time-critical, we could just
> SELECT the value instead of playing tricks to get it.
Hm. Interesting idea. I'm not so sure they aren't time-critical. They
don't impact the web user of course, but genxref does take a while to
index stuff already :).
I had two other performance-related ideas:
Genxref performance improvement:
Genxref can be very slow, especially when adding a new release. I think
that's because the DB updates its indexes after every insert. A common
way to add a lot of data to an SQL DB is to write the commands to a
file, load the file all at once with indexing disabled, and then
re-index everything at the end. I know both MySQL and Postgres support
this model (most likely in different ways), although I've not
investigated it thoroughly.
So, my idea was changing genxref to do this: instead of adding things to
the DB one at a time, it would write out the statements to a file and at
the end, import that file with indexing disabled.
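
A rough sketch of the load-then-index idea, in Python with SQLite as a
stand-in (genxref itself is Perl, and the table and column names here are
hypothetical, not the LXR schema). The rows go in within a single
transaction while no index exists, and the index is built once at the end.
The actual mechanism on MySQL or Postgres would be different.

```python
import sqlite3

def bulk_load(conn, rows):
    """Insert all rows in one transaction, then build the index once."""
    # Hypothetical table; no index exists while the rows are loaded.
    conn.execute("CREATE TABLE symbols (name TEXT, fileid INTEGER, line INTEGER)")
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", rows)
    # The index is created a single time, after the data is in place.
    conn.execute("CREATE INDEX symbols_name ON symbols (name)")
    conn.commit()

conn = sqlite3.connect(":memory:")
bulk_load(conn, [("main", 1, 10), ("printf", 1, 12), ("main", 2, 3)])
count = conn.execute(
    "SELECT COUNT(*) FROM symbols WHERE name = 'main'"
).fetchone()[0]
```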
Web performance improvement:
I'm sure someone else has already thought of this, but right now we
generate our HTML dynamically every time. It seems to me that this is a
prime candidate for caching! Especially for backends that support
annotate/blame etc. For the most part, the annotations on a file won't
change unless the contents of the file change. The other possibility is
that the symbols in the database change--as far as I can tell this
shouldn't happen normally, but if people are worried we can have genxref
flush the cache.
We have a unique file id already, so we can cache the content under the
file id, fanning the entries out across subdirectories so that no single
directory gets too large. We can compare the creation time of the cached
file with the modification time of the source file to tell when it's out
of date. We can use the access time to clean out old, unused cache
entries if we want.
Also, we can just cache the actual file content, and leave off the
header information; the header info can be added dynamically when the
user browses it. That way the same cached copy of a file can be used
for different releases, if they all share that fileid.
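
To make the cache layout concrete, here is a minimal sketch in Python (not
LXR code; the function names and the two-level layout are assumptions for
illustration). It derives a fanned-out path from the numeric file id and
treats a cached copy as stale when the source is newer. It compares
modification times, since creation time isn't portably available.

```python
import os

def cache_path(cache_root, fileid):
    """Spread cache entries over two directory levels keyed on the file id."""
    key = "%08x" % fileid
    # The last two hex-digit pairs pick the subdirectories.
    return os.path.join(cache_root, key[-2:], key[-4:-2], key)

def is_stale(cached, source):
    """True if there is no cached copy or the source was modified after it."""
    if not os.path.exists(cached):
        return True
    return os.path.getmtime(cached) < os.path.getmtime(source)
```

With this layout, file id 0x1234 would land in <root>/34/12/00001234.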
> Heck, this f*ing column name caused so much grief, let's just rename
> it! Yes, that's somewhat painful and we need a Big Fat Warning in the
> v1.0 docs that the column needs to be renamed, but I'm all for doing
> that.
I'd be happy with that. It's not actually such a big deal to fix this;
you just need a series of ALTER TABLE operations. It's easy enough to
add to the readme, or even write an update script.
> > Actually there is an "ANSI_QUOTES" mode that we could set, that lets you
> > (among other things) use standard "-quoting in MySQL. That might be a
> > valid thing to do.
>
> Is this in the DBI backend or configury on the MySQL server side?
It can be set globally or per-session. We'd use per-session obviously.
It's set from the client side.
I also discovered that there's a DBI method that quotes identifiers like
this for you:
    my $release = $dbh->quote_identifier('release');
That would be the safest way to go, although it's annoyingly verbose.
And, there's a DBI get_info() method that lets you ask about all kinds
of features of the server, and one of those is the identifier quoting
character, so we could get that and use it instead of hard-coding quotes.
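
For illustration only, this is roughly what quoting with a
server-supplied character amounts to, sketched in Python rather than
through DBI. The helper is hypothetical; it doubles any embedded quote
character, which is the usual SQL convention. In real code the quote
character would come from the driver, not be passed in by hand.

```python
def quote_identifier(name, quote_char='"'):
    """Wrap name in quote_char, doubling any embedded quote characters."""
    return quote_char + name.replace(quote_char, quote_char * 2) + quote_char

ansi = quote_identifier("release")         # standard double-quote style
mysql = quote_identifier("release", "`")   # MySQL's default backtick style
```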
But, changing the name sounds good to me! :)
--
-----------------------------------------------------------------------------
Paul D. Smith <ps...@ne...> http://netezza.com
"Please remain calm--I may be mad, but I am a professional."--Mad Scientist
-----------------------------------------------------------------------------
These are my opinions--Netezza takes no responsibility for them.