[PyIndexer] Searching and Indexing Redux II
Status: Pre-Alpha
Brought to you by:
cduncan
|
From: Chris W. <ch...@ni...> - 2001-12-13 17:11:02
|
I'm cc'ing the list on this, in case anyone finds this interesting or can help,
sorry about the lack of context for anyone reading it...
Marcus Collins wrote:
>
> On Tue, 4 Dec 2001 at 00:31:10 +0000, Chris Withers wrote:
>
> > Indeed it is and I've found it particularly helpful, especially when diagnosing
> > MySQL hangs...
>
> So why was MySQL hanging?
No idea yet, still trying to help the MySQL AB guys reproduce the problem...
> Also useful is that you can kill (mysqladmin
> kill) the offending threads.
Hmmm... it hangs in such a way that 'mysqladmin kill' doesnt' work, only the
windows task manager can kill it :-(
> You can still use ANALYZE TABLE with BDB, and apparently it does do some
> stuff. That's the equivalent of the {,my}isamchk -a yourtablename.
Ah, okay...
<snip insert speed>
Well, we can optimize indexign speed later :-)
> It would be useful to know how those eight seconds are being used --
> what proportion is used by the Python script; by the MySQL server; by
> the OS (on I/O).
Well, most of it is inside the loop that does the inserts, so I'm guessing split
between MySQLdb, MySQL and the OS.
> Your best bet in determining this would be to test on
> your Linux box. times (under bash(1)) will give you wall clock time, CPU
> time, and system time.
Hmmm... will have to give that a go when it comes to optimising...
> What's the SQL nut book like?
Looks pretty cool, haven't had to use it in anger yet...
[snip RDB normalisation]
> FWIW, the chapter on Performance and Design in the Oracle reference
> begins, "No major application will run in Third Normal Form."
Hehe, 'cept searchign and indexing ;-)
> > [slow InnoDB]
> > > Yep, it'll be interesting to see. Did you broach the subject at all to
> > > the list?
> >
> > Yup, and one of the Innobase guys has got back to me, so we shall see...
>
> What did he have to say, BTW?
Turns out the recent windows binaries were compiled with what was effectively a
"make_me_cripplingly_slow" flag, so I might try the next release with InnoDB and
see what happens...
> Are they using some sort of hash function? I'm guessing yes, since
> they're also compressing. Probably a page index system. That's
> impressive performace, but how long does it take them to index? Also, is
> their index updatable?
Dunno, guess 'll find out when I get there...
> > That said, I think the SQL engine could be pretty f'ing cool...
Hmmm... well, we have searching not-quite-quick-enough and indexing slow so we
need some kind of tweak still.
Anyone got any ideas?
Bit disappointed to see that MySQL's performance goes bad when you have OR's in
a WHERE, or something like that, which could make boolean stuff unpleasant :-S
> Quick reply now, and hopefully more in-depth after Friday... Do you want
> to post this on the SF list maybe, or personal?
SF List it is :-)
> > > BTW, I'm not sure if the key on prev_textindex_id is necessary... Not
> > > sure how MySQL handles that query... Wanna post EXPLAIN output?
<snip>
> Nope, it appears only in possible_keys and ref. The keys in key are used
> for the index lookup, as I understand it.
Indeed, so we could drop it?
> > searching for "db 2" brings back 138 results...
> >
> > I hope there's not a bug lurking here :-S
>
> Well, that's obviously the first thing to determine. Maybe post the
> actual SQL used, the DDL, and the EXPLAIN output?
Indeed. I think I'm gonna tackle this from the other end now.
Get the framework and unit tests up and running before returning to build the
proper SQL engine now that I know what it should smell like...
> > > - search terms that would return many, many results (impacts GROUP BY)
> >
> > well, when "windows 2000" brought back 1-2K results, it took 10 seconds
>
> Hmmm... Ten secs still too long.
Indeed :-( (seen higher figures than that since...)
> You can use COUNT(colname | * ), but that's basically going to save time
> only because there's less data for it to allocate memory for and
> transfer :-S.
How does that differ from just COUNT (*), which didn't work for me?
> I'm also leaning towards experimenting with sub-selects for this, but
> not sure if they're yet implemented in MySQL (don't have latest version
> of manual handy...)
Nope... I wonder if it'd be good to look at PostGreSQL here?
> Hmm, was reading through the MySQLdb code the other day, and saw that
> the cursor.execute method can take a list. Don't remember details, but
> may be worth looking at.
Will do...
> Indeed. It's the damned join. That's why I was thinking again of
> sub-selects, which may be more efficient for very large results sets
> (but less so for small ones, I imagine).
What's the difference between a sub-select and a join? (spot the SQL newbie ;-)
> I think I mentioned UNION and INTERSECT a while back, but looking at my
> (out-of-date) MySQL reference, it looks like it's not supported.
Nope, but it is in Postgres...
> And it playing right now, it doesn't look like UNION is in the latest
> version either :-S
It is, but the latest unstable release...
> ... which means ugly LEFT OUTER JOINs and other hacks. Or
> application-side processing.
or both :-S
Well, hope to hear from Marcus some time, but if anyone else can dive in in the
meantime, all the better...
cheers,
Chris
|