From: Martin G. <gim...@gi...> - 2003-02-22 10:12:15
|
Jeff Dairiki <da...@da...> writes: >> Could you store the records as binary data. > > Yes, but then you can't use the database text-search functionality > (which would be a big performance penalty.) (E.g. you can't do > case-insensitive searches on binary data.) This is {only,primarily} > an issue with the SQL databases, of course, since file and dbm > database don't have text search ability (so PhpWiki has to iterate > over each page itself to do the search.) Isn't this almost what the database has to do today? A FullTextSearch for 'Hello World' uses this SQL as the $search_clause: (LOWER(pagename) LIKE '%hello%' OR content LIKE '%hello%') AND (LOWER(pagename) LIKE '%world%' OR content LIKE '%world%') I don't think any database will be able to optimize on such a query very much because it contains the computed value LOWER(pagename) which means that an index for pagename probably cannot be used. Also, the LIKE search is expensive if it cannot use an index. The manual for MySQL actually says that it can't: The following SELECT statements will not use indexes: mysql> SELECT * FROM tbl_name WHERE key_col LIKE "%Patrick%"; mysql> SELECT * FROM tbl_name WHERE key_col LIKE other_col; In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant. That was from http://www.mysql.com/doc/en/MySQL_indexes.html#IDX879. When the database is done searching through all the text in the Wiki, then it's processed further with regular expressions to do the highlighting... So perhaps it would be almost as fast if we retrieved all the text from the database, and then searched and highlighted in one step? Then we could store the data in UTF-8 in the database (which would be extremely cool!), because it no longer has to deal with the data, just store it for us. I agree that this sounds a little ugly --- databases are meant to be used to speed up such searches through large masses of data, but since the current code already forces the database to do a slow search, and we then also search (highlight) ourselves afterwards, perhaps it isn't that much uglier than the current scheme... -- Martin Geisler My GnuPG Key: 0xF7F6B57B See http://gimpster.com/ and http://phpweather.net/ for: PHP Weather => Shows the current weather on your webpage and PHP Shell => A telnet-connection (almost :-) in a PHP page. |