Re: [PyIndexer] Thoughts on MySQL Implementation
From: Chris W. <ch...@ni...> - 2001-12-17 12:59:30
Casey Duncan wrote:
>
> This would involve writing the rows to be inserted
> into textindex to a temporary file and then using LOAD
> DATA to batch load it to mySQL. See the following for
> docs on LOAD DATA:
>
> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#LOAD_DATA

Well, my reluctance on this is that it needs a temporary file. Where do we
put this temporary file? How do we know we're going to be allowed to write
to the filesystem?

That said, the only other option is to build a big

  INSERT INTO tbl_name VALUES (expression,...),(...),...

...and then we have to worry about max. SQL length, I guess?
Anyone know what the maximum length of SQL you can shove down a single
c.execute() is?

> As the search end, it seems to me that app side
> processing will be best for positional matches, such
> as for phrases.

Can you elaborate?

> I agree that storing a document count for each word
> could help with optimizing since you could start with
> the smallest dataset first and prune it from there.

Does this still hold true when you're OR'ing terms together?

> Perhaps IISets could be used to get UNION/INTERSECT
> functionality efficiently if mySQL can't do it for
> you.

I'd prefer not to require the BTrees module, as it's not part of the
standard Python library, but if needs must ;-)

> As for partial indexes, they may help, especially
> reducing memory paging and improving cache usage. Here
> are the docs for that:
>
> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#CREATE_INDEX

Thanks :-)

Once I get the initial implementation and scalability testing package
finalized, maybe you guys could try some tweakage?

> BTW: Have you seen mySQL's built-in full-text indexing
> support?
>
> http://www.mysql.org/documentation/mysql/bychapter/manual_Reference.html#Fulltext_Search

Yeah, sadly it doesn't do phrase matching :-S

We could use this and do the usual cheap hack that ZCatalog does, where
matching the phrase "x y z" simply gets turned into "x" AND "y" AND "z",
but that won't be good enough for the specific application I need the
indexer for :-S

Thanks for the comments :-)

Chris
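
For the multi-row INSERT option, here is a minimal sketch of one way to keep each statement under a size ceiling without touching the filesystem. As far as I know, the practical limit on a single statement is the server's max_allowed_packet setting, so max_bytes below should be set comfortably under that. The column names (word, doc_id, position) and the helper names are hypothetical, and the size estimate is deliberately rough (it ignores escaping).

```python
def batched_inserts(rows, max_bytes=512 * 1024):
    """Yield (sql, params) pairs, each estimated to stay under max_bytes.

    rows is an iterable of (word, doc_id, position) tuples; the table and
    column names are assumptions made for illustration only.
    """
    prefix = "INSERT INTO textindex (word, doc_id, position) VALUES "
    placeholder = "(%s,%s,%s)"
    batch, size = [], len(prefix)
    for row in rows:
        # Rough rendered size of one row: each value plus quotes and a
        # comma, plus the parentheses and separator around the row.
        row_size = sum(len(str(v).encode("utf-8")) + 3 for v in row) + 3
        if batch and size + row_size > max_bytes:
            yield _render(prefix, placeholder, batch)
            batch, size = [], len(prefix)
        # A single oversized row still goes out alone rather than being lost.
        batch.append(row)
        size += row_size
    if batch:
        yield _render(prefix, placeholder, batch)


def _render(prefix, placeholder, batch):
    """Build one multi-row statement and its flattened parameter list."""
    sql = prefix + ",".join([placeholder] * len(batch))
    params = [value for row in batch for value in row]
    return sql, params
```

With a DB-API cursor c from a driver that uses the %s parameter style (as MySQLdb does), usage would be along the lines of: for sql, params in batched_inserts(rows): c.execute(sql, params). Letting the driver do the quoting avoids building escaped SQL strings by hand.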
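
On application-side phrase matching, a rough sketch of what it could look like using only built-in sets (so no BTrees dependency). It assumes the index stores a position for every word occurrence and that the rows for the query words have already been fetched into a dict shaped like word -> {doc_id: [positions]}; that shape and the function name are hypothetical. It starts from the rarest word, as suggested above, then checks that the words sit at consecutive positions, which is the step the "x" AND "y" AND "z" hack skips.

```python
def phrase_match(postings, phrase):
    """Return the set of doc ids containing the phrase's words consecutively.

    postings maps word -> {doc_id: [positions]}; this in-memory shape is an
    assumption about how rows fetched from the index might be arranged.
    """
    words = phrase.split()
    if not words or any(w not in postings for w in words):
        return set()

    # Intersect candidate documents, starting with the rarest word so the
    # working set is as small as possible from the outset.
    by_rarity = sorted(words, key=lambda w: len(postings[w]))
    candidates = set(postings[by_rarity[0]])
    for w in by_rarity[1:]:
        candidates &= set(postings[w])

    hits = set()
    for doc in candidates:
        starts = postings[words[0]][doc]
        rest = [set(postings[w][doc]) for w in words[1:]]
        # The i-th following word must appear at position start + i + 1.
        if any(all(start + i + 1 in pos for i, pos in enumerate(rest))
               for start in starts):
            hits.add(doc)
    return hits
```

For example, if "x" is at position 0, "y" at 1 and "z" at 2 in a document, that document matches "x y z"; a document that merely contains all three words somewhere does not.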