Lucene 3.1 has a boolean field, "autoGeneratePhraseQueries", that controls whether a phrase query is generated automatically when the analyzer splits a piece of query text into several tokens.
This field is especially important for CJK. A query with more than 2 CJK characters should not be converted to a phrase query. For example, the query "XYZ" (where X, Y and Z are Chinese characters) is converted to the phrase query "XY YZ" by clucene-core-2.3.3.4. The query may not return any result because only bigrams are indexed.
I suggest adding an autoGeneratePhraseQueries field that follows the Lucene 3.1 implementation.
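To make the request concrete, here is a minimal standalone sketch of the behaviour I have in mind. It does not use any CLucene classes: `bigrams`, `ParsedQuery` and `buildQuery` are hypothetical names made up for illustration, and the Latin letters X, Y, Z stand in for CJK characters. It only models the decision the flag would control, not the real query parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CJK bigram tokenization: every pair of
// adjacent characters becomes one term ("XYZ" -> "XY", "YZ").
std::vector<std::u32string> bigrams(const std::u32string& text) {
    std::vector<std::u32string> terms;
    if (text.size() < 2) {
        // A single character (or nothing) cannot form a bigram.
        if (!text.empty()) terms.push_back(text);
        return terms;
    }
    for (std::size_t i = 0; i + 1 < text.size(); ++i)
        terms.push_back(text.substr(i, 2));
    return terms;
}

// Hypothetical description of the parsed query: the terms plus whether
// they must match as a phrase or as independent clauses.
struct ParsedQuery {
    bool isPhrase;
    std::vector<std::u32string> terms;
};

// Assumed behaviour: a phrase query is built only when the flag is set
// and the analyzer produced more than one term.
ParsedQuery buildQuery(const std::u32string& text,
                       bool autoGeneratePhraseQueries) {
    ParsedQuery q;
    q.terms = bigrams(text);
    q.isPhrase = autoGeneratePhraseQueries && q.terms.size() > 1;
    return q;
}
```

With the flag off, `buildQuery(U"XYZ", false)` keeps the two bigram terms as independent clauses. With the flag on, the same input is treated as a phrase, which is the current behaviour described above.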
P.S.
In the implementation of Token* CJKTokenizer::next(Token* token), the method should return NULL when the buffer starts with '\0':
    if (buffer[0] == '\0')
        return NULL;
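For illustration, the guard can be tried in isolation with the sketch below. The `Token` struct and the `emitToken` helper are hypothetical simplifications, not CLucene's real types; they only model the final step of a next()-style method where the collected buffer is turned into a token.

```cpp
#include <cstddef>

// Hypothetical minimal token: just points at the term text.
struct Token {
    const char* text;
};

// Hypothetical model of the end of a next()-style method: returns NULL
// (no more tokens) when the buffer is empty, otherwise fills in and
// returns the caller's token.
Token* emitToken(const char* buffer, Token* token) {
    if (buffer[0] == '\0')
        return NULL;          // empty buffer: do not emit an empty token
    token->text = buffer;
    return token;
}
```

Without the check, an empty buffer would be handed back as a zero-length token instead of signalling the end of the stream.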
Our current target version is 2.3.2, and we can't mix behaviors from different versions. Hopefully we will get to a 3.1 version soon enough, and the field will be available there.