Lucene 3.1 has a boolean field "autoGeneratePhraseQueries" that controls whether a phrase query is generated automatically when the analyzer splits a query term into several tokens.
This field is especially important for CJK text. A query with more than 2 CJK characters should not be converted to a phrase query. For example, in clucene-core-18.104.22.168 the query "XYZ" (where X, Y and Z are Chinese characters) is converted to the phrase query "XY YZ". The query may not return any result because only bigrams are indexed.
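To make the difference concrete, here is a minimal sketch of the two behaviours. It is not CLucene or Lucene code: bigrams() and buildQuery() are hypothetical helpers, ASCII letters stand in for CJK characters, and the query is rendered as a plain string instead of a real Query object.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split text into overlapping bigrams, roughly the way
// a CJK bigram tokenizer would. ASCII letters stand in for CJK characters.
std::vector<std::string> bigrams(const std::string& text) {
    std::vector<std::string> out;
    if (text.size() < 2) {
        // A single character (or nothing) cannot form a bigram.
        if (!text.empty()) out.push_back(text);
        return out;
    }
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        out.push_back(text.substr(i, 2));
    }
    return out;
}

// Hypothetical helper: render the parsed query as a string.
// With the flag on, several tokens become one quoted phrase query;
// with the flag off, they stay separate terms combined by the default operator.
std::string buildQuery(const std::vector<std::string>& terms,
                       bool autoGeneratePhraseQueries) {
    std::string joined;
    for (std::size_t i = 0; i < terms.size(); ++i) {
        if (i > 0) joined += ' ';
        joined += terms[i];
    }
    if (autoGeneratePhraseQueries && terms.size() > 1) {
        return "\"" + joined + "\"";
    }
    return joined;
}
```

Calling buildQuery(bigrams("XYZ"), true) yields the quoted phrase form, while passing false leaves XY and YZ as independent terms, which is the behaviour requested here for CJK queries.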
I suggest adding an autoGeneratePhraseQueries field, following the Lucene 3.1 implementation.
Also, in the implementation of Token* CJKTokenizer::next(Token* token), a buffer that starts with '\0' should make the function return NULL.
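The guard I have in mind could look roughly like the sketch below. This is not the actual CLucene source: Token here is a simplified stand-in struct and nextToken() is a hypothetical free function, used only to show the early return for an empty buffer.

```cpp
#include <string>

// Simplified stand-in for illustration; not CLucene's Token class.
struct Token {
    std::string text;
};

// Hypothetical stand-in for a tokenizer's next(): fills `token` from `buffer`
// and returns it, or returns nullptr when there is nothing to tokenize.
Token* nextToken(const char* buffer, Token* token) {
    // A missing buffer, or one that starts with '\0', carries no text,
    // so signal "no more tokens" instead of emitting an empty token.
    if (buffer == nullptr || buffer[0] == '\0') {
        return nullptr;
    }
    token->text.assign(buffer);
    return token;
}
```

With this check, a caller looping until next() returns NULL stops cleanly on an empty buffer instead of receiving an empty token.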