Main Page
From mysqlftppc
MySQL full-text parser plugin collection
In MySQL 5.1 and later, the default built-in full-text parser can be replaced with a user-provided one through the plugin interface. The MySQL full-text parser plugin collection project (mysqlftppc) provides a set of such parser plugins, including the bigram, space, mecab and snowball plugins mentioned on this page.
Installation & Settings
Requirements
- MySQL 5.1.31 or later (earlier versions must be patched; patches are available at http://bugs.mysql.com/39746 and http://bugs.mysql.com/39640)
- ICU library 2.6 or later (optional for Unicode normalization)
- utf8 charset (optional for Unicode normalization)
- mecab library (optional for mecab plugin)
- You must run mysql_upgrade when you upgrade from MySQL 5.0.
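The version requirement above can be checked from a script before building. The following is a minimal sketch and not part of mysqlftppc: version_at_least is a hypothetical helper name, and it compares only the first three numeric components of each version string.

```shell
#!/bin/sh
# version_at_least HAVE WANT
# Succeeds (exit status 0) when the dotted version HAVE is >= WANT.
# awk converts each component using its leading digits only, so a
# suffix such as "-log" in "5.1.47-log" is ignored.
version_at_least() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, a, ".")
        split($2, b, ".")
        for (i = 1; i <= 3; i++) {
            x = a[i] + 0
            y = b[i] + 0
            if (x > y) exit 0
            if (x < y) exit 1
        }
        exit 0
    }'
}

# Example use (assumes mysql_config is on PATH):
#   version_at_least "$(mysql_config --version)" 5.1.31 \
#       || echo "older than 5.1.31: apply the patches first"
```

The helper only answers whether the patches are needed; it does not apply them.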
Compiling
If you downloaded the tarball, extract it and run the configure script. If the script cannot find mysql_config, pass at least the --with-mysql-config=/path/to/mysql_config argument. If your mysqld was built as a 64-bit binary or with debug=full, take care to supply the matching extra CFLAGS.
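One way to keep the plugin's compiler flags in line with the server build is to ask mysql_config for them. The sketch below makes several assumptions: the path is a placeholder, server_cflags and build_plugin are hypothetical helper names, EXTRA_CFLAGS is an illustrative variable for anything you need to add by hand, and the --with-mysql-config spelling is the one used in the Subversion instructions on this page. The flags reported by mysql_config --cflags usually match the installed build, but a debug=full server may still need additional flags.

```shell
#!/bin/sh
# Placeholder path: point this at the mysql_config of the target server.
MYSQL_CONFIG=${MYSQL_CONFIG:-/path/to/mysql_config}

# Print the compiler flags reported by the given mysql_config program.
server_cflags() {
    "$1" --cflags
}

# Configure and build in the current (extracted) source directory,
# passing the reported flags plus any manual additions to configure.
build_plugin() {
    CFLAGS="$(server_cflags "$MYSQL_CONFIG") ${EXTRA_CFLAGS:-}" \
        ./configure --with-mysql-config="$MYSQL_CONFIG" && make
}

# Example use from the source directory:
#   MYSQL_CONFIG=/usr/local/mysql/bin/mysql_config build_plugin
```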
Installation
After running 'make install', the plugin must be loaded into the mysqld process. Connect to MySQL as an administrative user and run an INSTALL PLUGIN statement:
    INSTALL PLUGIN bigram SONAME 'libftbigram.so';
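Once the plugin is installed, a FULLTEXT index can name it with the WITH PARSER clause. The following sketch prints a short SQL script that can be piped to the mysql client. The table, column and index names are hypothetical, as is the function name bigram_demo_sql; no output is shown because it depends on your server.

```shell
#!/bin/sh
# Print a small SQL script that exercises the bigram parser plugin.
bigram_demo_sql() {
    cat <<'SQL'
-- Confirm that the plugin is registered.
SHOW PLUGINS;

-- FULLTEXT indexes in MySQL 5.1 require a MyISAM table.
CREATE TABLE articles (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    body TEXT,
    FULLTEXT INDEX ft_body (body) WITH PARSER bigram
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT INTO articles (body) VALUES ('full-text search with a parser plugin');

-- Boolean mode is used because natural-language mode ignores terms
-- that occur in half or more of the rows, which hides matches in a
-- table this small.
SELECT id, body FROM articles
 WHERE MATCH (body) AGAINST ('parser' IN BOOLEAN MODE);
SQL
}

# Example use (user and database names are assumptions):
#   bigram_demo_sql | mysql -u root -p test
```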
my.cnf
mysqlftppc plugins (except the snowball plugin) are not affected by the MySQL system variables ft_min_word_len and ft_max_word_len. The upper limit on word length is hardcoded at 254 bytes (the same as HA_FT_MAXBYTELEN in MyISAM).
If you use the skip-grant-tables option, the server does not load plugins that were registered with INSTALL PLUGIN, so you may want to load the plugin at server startup instead. Use plugin-load in my.cnf as follows:
    [mysqld]
    plugin-load=space=libftspace.so
    ;; If you have multiple plugins:
    ;; plugin-load=space=libftspace.so:mecab=libftmecab.so
Using subversion
If you want the latest source code, check it out from the Subversion repository. You can generate the configure script like this:
    $ svn co http://mysqlftppc.svn.sourceforge.net/svnroot/mysqlftppc/mecab/trunk/ mecab
    $ cd mecab
    $ aclocal
    $ libtoolize --automake
    $ automake --add-missing
    $ automake
    $ autoconf
    $ ./configure --with-mysql-config=/path/to/mysql_config --with-mecab-config=/path/to/mecab/bin/mecab-config
Unicode normalization
MySQL's current implementation of the Unicode collation algorithm is not complete, but it is useful enough for real applications. If your application is fine with the default collation implementation, you don't need to compile the plugins with the ICU library. You can use a MySQL collation, which can be specified in the CREATE TABLE statement.
Examples:
    mysql> SELECT 'ガギグゲゴ'='カキクケコ' COLLATE utf8_unicode_ci;
    +-------------------------------------------------------------+
    | 'ガギグゲゴ'='カキクケコ' COLLATE utf8_unicode_ci |
    +-------------------------------------------------------------+
    |                                                           1 |
    +-------------------------------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT '㍉'='ミリ' COLLATE utf8_unicode_ci;
    +----------------------------------------+
    | '㍉'='ミリ' COLLATE utf8_unicode_ci |
    +----------------------------------------+
    |                                      1 |
    +----------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT 'ガ'='ガ' COLLATE utf8_unicode_ci;
    +----------------------------------------+
    | 'ガ'='ガ' COLLATE utf8_unicode_ci |
    +----------------------------------------+
    |                                      1 |
    +----------------------------------------+
    1 row in set (0.00 sec)

    mysql> SELECT '①'='1' COLLATE utf8_unicode_ci;
    +-----------------------------------+
    | '①'='1' COLLATE utf8_unicode_ci |
    +-----------------------------------+
    |                                 1 |
    +-----------------------------------+
    1 row in set (0.01 sec)
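The same collation can be set once in the table definition instead of on each comparison. The sketch below prints a SQL script for that; the table and column names and the function name collation_demo_sql are hypothetical. Going by the first example above, the final SELECT would be expected to find the stored row, but this has not been verified here.

```shell
#!/bin/sh
# Print a SQL script that sets utf8_unicode_ci as the table default,
# so comparisons on its columns need no explicit COLLATE clause.
collation_demo_sql() {
    cat <<'SQL'
CREATE TABLE notes (
    title VARCHAR(255)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

INSERT INTO notes (title) VALUES ('カキクケコ');

-- Compared with the column's own collation, utf8_unicode_ci.
SELECT title FROM notes WHERE title = 'ガギグゲゴ';
SQL
}

# Example use (user and database names are assumptions; the client
# character set must be utf8 for the literals to arrive intact):
#   collation_demo_sql | mysql --default-character-set=utf8 -u root -p test
```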
You need to compile the plugins with the ICU library only when you want full control over Unicode normalization, typically when you want to decompose strings or normalize them into a compatibility form (NFKC, NFKD). When you enable ICU and use Unicode normalization, the plugin will use more memory and CPU.
Reporting bugs
Please use the Tracker when you have found a bug, have a question, or have something to report.
