[PyIndexer] Searching and Indexing Redux II
Status: Pre-Alpha
Brought to you by:
cduncan
From: Chris W. <ch...@ni...> - 2001-12-13 17:11:02
|
I'm cc'ing the list on this, in case anyone finds this interesting or can help, sorry about the lack of context for anyone reading it... Marcus Collins wrote: > > On Tue, 4 Dec 2001 at 00:31:10 +0000, Chris Withers wrote: > > > Indeed it is and I've found it particularly helpful, especially when diagnosing > > MySQL hangs... > > So why was MySQL hanging? No idea yet, still trying to help the MySQL AB guys reproduce the problem... > Also useful is that you can kill (mysqladmin > kill) the offending threads. Hmmm... it hangs in such a way that 'mysqladmin kill' doesnt' work, only the windows task manager can kill it :-( > You can still use ANALYZE TABLE with BDB, and apparently it does do some > stuff. That's the equivalent of the {,my}isamchk -a yourtablename. Ah, okay... <snip insert speed> Well, we can optimize indexign speed later :-) > It would be useful to know how those eight seconds are being used -- > what proportion is used by the Python script; by the MySQL server; by > the OS (on I/O). Well, most of it is inside the loop that does the inserts, so I'm guessing split between MySQLdb, MySQL and the OS. > Your best bet in determining this would be to test on > your Linux box. times (under bash(1)) will give you wall clock time, CPU > time, and system time. Hmmm... will have to give that a go when it comes to optimising... > What's the SQL nut book like? Looks pretty cool, haven't had to use it in anger yet... [snip RDB normalisation] > FWIW, the chapter on Performance and Design in the Oracle reference > begins, "No major application will run in Third Normal Form." Hehe, 'cept searchign and indexing ;-) > > [slow InnoDB] > > > Yep, it'll be interesting to see. Did you broach the subject at all to > > > the list? > > > > Yup, and one of the Innobase guys has got back to me, so we shall see... > > What did he have to say, BTW? Turns out the recent windows binaries were compiled with what was effectively a "make_me_cripplingly_slow" flag, so I might try the next release with InnoDB and see what happens... > Are they using some sort of hash function? I'm guessing yes, since > they're also compressing. Probably a page index system. That's > impressive performace, but how long does it take them to index? Also, is > their index updatable? Dunno, guess 'll find out when I get there... > > That said, I think the SQL engine could be pretty f'ing cool... Hmmm... well, we have searching not-quite-quick-enough and indexing slow so we need some kind of tweak still. Anyone got any ideas? Bit disappointed to see that MySQL's performance goes bad when you have OR's in a WHERE, or something like that, which could make boolean stuff unpleasant :-S > Quick reply now, and hopefully more in-depth after Friday... Do you want > to post this on the SF list maybe, or personal? SF List it is :-) > > > BTW, I'm not sure if the key on prev_textindex_id is necessary... Not > > > sure how MySQL handles that query... Wanna post EXPLAIN output? <snip> > Nope, it appears only in possible_keys and ref. The keys in key are used > for the index lookup, as I understand it. Indeed, so we could drop it? > > searching for "db 2" brings back 138 results... > > > > I hope there's not a bug lurking here :-S > > Well, that's obviously the first thing to determine. Maybe post the > actual SQL used, the DDL, and the EXPLAIN output? Indeed. I think I'm gonna tackle this from the other end now. Get the framework and unit tests up and running before returning to build the proper SQL engine now that I know what it should smell like... > > > - search terms that would return many, many results (impacts GROUP BY) > > > > well, when "windows 2000" brought back 1-2K results, it took 10 seconds > > Hmmm... Ten secs still too long. Indeed :-( (seen higher figures than that since...) > You can use COUNT(colname | * ), but that's basically going to save time > only because there's less data for it to allocate memory for and > transfer :-S. How does that differ from just COUNT (*), which didn't work for me? > I'm also leaning towards experimenting with sub-selects for this, but > not sure if they're yet implemented in MySQL (don't have latest version > of manual handy...) Nope... I wonder if it'd be good to look at PostGreSQL here? > Hmm, was reading through the MySQLdb code the other day, and saw that > the cursor.execute method can take a list. Don't remember details, but > may be worth looking at. Will do... > Indeed. It's the damned join. That's why I was thinking again of > sub-selects, which may be more efficient for very large results sets > (but less so for small ones, I imagine). What's the difference between a sub-select and a join? (spot the SQL newbie ;-) > I think I mentioned UNION and INTERSECT a while back, but looking at my > (out-of-date) MySQL reference, it looks like it's not supported. Nope, but it is in Postgres... > And it playing right now, it doesn't look like UNION is in the latest > version either :-S It is, but the latest unstable release... > ... which means ugly LEFT OUTER JOINs and other hacks. Or > application-side processing. or both :-S Well, hope to hear from Marcus some time, but if anyone else can dive in in the meantime, all the better... cheers, Chris |