From: Chris T. <chr...@gm...> - 2010-08-26 14:53:59
Hi Joe,

On Aug 26, 2010, at 8:13 PM, Joe Wicentowski wrote:

> I experimented last week with dropping a Lucene Chinese analyzer into
> eXist (the lucene-smartcn-2.9.2.jar documented at
> http://lucene.apache.org/java/2_9_2/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html),
> configuring my collection.xconf to use this analyzer instead of the 2
> default analyzers, and was pretty impressed with the results. I
> recall Mike Ferrando posting some helpful steps and files about his
> work porting the Snowball analyzers to eXist, too. It'd be great if
> we could continue to assemble such information and even make it easier
> to plug in and configure these analyzers.

I'm interested to know whether match highlighting worked for you with a simple <phrase> query and with a search like <bool><term occurs="must">xxxx</term><term occurs="must">yyyy</term></bool>.

The reason I ask is that I have made a modified WhitespaceAnalyzer and Tokenizer that work with Tibetan, packaged the two classes in a jar, and dropped it in with the Lucene jars. After changing the index configuration, indexing and searching worked without problems, and we get back exactly the expected results from our corpus.

However, match highlighting isn't working. We were getting highlighting with the StandardAnalyzer, but the search results were poor because that analyzer is not Tibetan aware. With the custom analyzer, a phrase query gets no highlighting at all, and a bool query gets only one of the two terms highlighted, and it is not always the first term.

Is there something I need to do that I'm not aware of, or is it possible that this setup is exposing a problem in the Lucene matching code?

Thank you,
Chris
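
To make the question more concrete, here is a minimal, self-contained sketch of the kind of tokenization involved. It is not the actual code from the jar and it does not use the Lucene API; the class and method names are hypothetical, and splitting on the tsheg (U+0F0B) and shad (U+0F0D) in addition to whitespace is an assumption about what "Tibetan aware" means here. It records the start and end character offset of each token, since highlighters generally depend on such offsets to locate matches in the original text. Whether wrong offsets are the cause of the missing highlights is only a guess, but it seems worth checking in a custom Tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: whitespace-style tokenization with character offsets.
public class TibetanTokenSketch {

    /** A token plus the character range it occupies in the source text. */
    static final class Token {
        final String text;
        final int start; // inclusive offset into the input
        final int end;   // exclusive offset into the input

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Assumed delimiters: whitespace, tsheg (U+0F0B), shad (U+0F0D). */
    static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '\u0F0B' || c == '\u0F0D';
    }

    /** Splits the input on delimiters and keeps each token's offsets. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a token"
        for (int i = 0; i <= input.length(); i++) {
            boolean boundary = i == input.length() || isDelimiter(input.charAt(i));
            if (!boundary) {
                if (start < 0) {
                    start = i; // first character of a new token
                }
            } else if (start >= 0) {
                tokens.add(new Token(input.substring(start, i), start, i));
                start = -1;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two syllables separated by tsheg, written with escapes to avoid encoding issues.
        String text = "\u0F56\u0F40\u0FB2\u0F0B\u0F64\u0F72\u0F66\u0F0B";
        for (Token t : tokenize(text)) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

A quick sanity check for a real Tokenizer along these lines would be to confirm that, for every token, the substring of the source text between the reported start and end offsets equals the token text.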