Re: [PyIndexer] Thoughts on MySQL Implementation
From: Marcus C. <ma...@wr...> - 2001-12-16 22:54:03
On Sun, 16 Dec 2001 at 09:00:53 -0800, Casey Duncan wrote:

[ Data population and LOAD DATA ]

This would definitely be a lot faster than individual INSERTs. The main
reason for the speed-up is that in this case MySQL treats the whole thing
as one transaction, so it defers flushing the key buffers until all the
data is loaded. The same can be accomplished (although not quite as
efficiently) by wrapping the whole lot of inserts inside a transaction
block (if using BDB or InnoDB tables) or LOCK/UNLOCK TABLES (if using
non-TSTs, i.e. non-transaction-safe tables). Use BEGIN/COMMIT in the
former case.

[ App-side processing ]

> At the search end, it seems to me that app side
> processing will be best for positional matches, such
> as for phrases. I'd imagine such processing should be
> eventually coded in C, but it should be acceptable in
> Python if done efficiently (like using Python arrays,
> IISets or some-such).

I'd tend to do more of the processing on the database side, as described,
in which case use of C or Python becomes less of an issue.

> http://www.python.org/doc/essays/list2str.html

Very interesting and inspiring :-)

> I agree that storing a document count for each word
> could help with optimizing since you could start with
> the smallest dataset first and prune it from there.
> Perhaps IISets could be used to get UNION/INTERSECT
> functionality efficiently if mySQL can't do it for
> you.

I reckon MySQL can do it pretty efficiently given a set of ids, but doing
such processing as coalescing duplicates, etc., on the app side would
likely be faster than letting the MySQL query parser and optimiser do it
for you. That said, it would only be a matter of milliseconds'
improvement; the main benefit of app-side processing is in eliminating
joins.

[ prefix indexes ]

> Sometimes less is more 8^).

:-)

> impossible to tell which would be faster on a given
> architecture without real-world testing.

Indeed. I don't see a knob to change the size of the key block, though,
so AFAIK it's set at 1024 bytes. Caching will likely have different
effects on different platforms, and the setting of key_buffer_size will
have an effect on MySQL's own caching.

[ MySQL's FULLTEXT index ]

> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#Fulltext_Search

AFAIK, it doesn't do phrase matching. Their relevance calculation looks
very interesting, though.

Cheers

--
Marcus
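
To illustrate the "start with the smallest dataset first" idea from the quoted text, here is a minimal Python sketch. It assumes the document ids for each word have already been fetched from the database into plain Python sets (standing in for IISets). The function name, the `postings` dict and the sample ids are made up for illustration and are not part of PyIndexer.

```python
def intersect_smallest_first(postings, words):
    """Return the set of document ids that contain every word in `words`.

    `postings` is a hypothetical mapping of word -> set of document ids.
    """
    # Look up each word's document-id set; an unknown word matches nothing.
    doc_sets = [postings.get(word, set()) for word in words]
    if not doc_sets:
        return set()
    # Order by document count so the smallest candidate set comes first.
    doc_sets.sort(key=len)
    # Copy the smallest set so the caller's data is not modified.
    result = set(doc_sets[0])
    for docs in doc_sets[1:]:
        if not result:
            break  # nothing left to prune, so stop early
        result &= docs
    return result


# Made-up sample data: word -> ids of documents containing it.
postings = {
    "python": {1, 2, 3, 5, 8, 13},
    "mysql": {2, 3, 8},
    "index": {3, 8, 21},
}
print(sorted(intersect_smallest_first(postings, ["python", "mysql", "index"])))
# prints [3, 8]
```

Starting from the rarest word keeps the working set small, so each later intersection has fewer ids to check. Whether this beats letting MySQL do the work would need the real-world testing mentioned above.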