[PyIndexer] MySQL indexing performance
From: Marcus C. <ma...@wr...> - 2001-12-17 14:47:56
A few notes on index population in the MySQL implementation:

0. The time to index each document really does need to be improved,
especially since reads on the textindex table are largely blocked
during that time.

1. All the indexing is already within a transaction block per document.
LOAD DATA should still increase INSERT performance, as might using
MySQL's extended INSERT syntax.

2. Indexing on only the first 32 characters of dictionary.word, instead
of the full length, resulted in an overall 5% increase in performance,
with a < 1% loss of performance in the 'words to wordids' block. It is
not yet clear why the lookup got _slower_; that would take more tests,
including tests with other key lengths... (Only one test performed.)

3. Deferring creation of the textindex indexes over prev_textindex_id
and (word_id, prev_word_id) -- in other words, initially creating
indexes only for the identity columns and dictionary.word -- did not
produce a pronounced improvement (< 5%) in textindex generation with
0.7M rows. Overall performance actually suffered (by 5%). This could be
because MySQL creates a temporary working copy of the table so that it
can index it in a consistent state. In any case, this wouldn't be
feasible in real-world practice... (Only one test performed.)

4. I haven't yet played with recording the document count for
dictionary words, but that will increase the time a bit. How much
depends...

More positive test results on searching to follow...

Cheers
-- Marcus
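To make point 1 concrete, here is a rough sketch of the three INSERT
strategies. The column list is an assumption based on the column names
mentioned elsewhere in this note, and the file path is hypothetical:

```sql
-- Current approach: one INSERT per row, inside a per-document
-- transaction. Each statement is parsed and planned separately.
INSERT INTO textindex (word_id, prev_word_id) VALUES (17, 12);
INSERT INTO textindex (word_id, prev_word_id) VALUES (23, 17);

-- MySQL's extended INSERT syntax: many rows in one statement,
-- so parsing and index maintenance are amortized over the batch.
INSERT INTO textindex (word_id, prev_word_id)
VALUES (17, 12), (23, 17), (31, 23);

-- LOAD DATA INFILE: MySQL's fastest bulk-load path; rows come
-- from a tab-separated file written out by the indexer.
LOAD DATA INFILE '/tmp/textindex.tsv'
INTO TABLE textindex (word_id, prev_word_id);
```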
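The prefix index from point 2 would look something like the following
(the index names are assumptions; only one of the two indexes would
exist at a time):

```sql
-- Full-length index on the word column (the baseline):
CREATE INDEX idx_word_full ON dictionary (word);

-- Prefix index covering only the first 32 characters: shorter key
-- entries make the index cheaper to maintain during bulk inserts,
-- at the cost of extra row lookups when two words share the same
-- 32-character prefix -- a plausible explanation for the small
-- slowdown seen in the 'words to wordids' lookup.
CREATE INDEX idx_word_prefix ON dictionary (word(32));
```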
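Point 3's deferred-indexing experiment amounts to something like the
DDL below. This is only a sketch -- the actual PyIndexer schema and
index names may differ:

```sql
-- Create textindex with only the identity (primary key) index...
CREATE TABLE textindex (
    textindex_id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    word_id           INT NOT NULL,
    prev_word_id      INT NOT NULL,
    prev_textindex_id INT NOT NULL
);

-- ...bulk-load all the documents, then build the secondary
-- indexes in a single pass at the end:
CREATE INDEX idx_prev  ON textindex (prev_textindex_id);
CREATE INDEX idx_words ON textindex (word_id, prev_word_id);
```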