From: Dannes W. <di...@ex...> - 2024-12-11 10:26:05
Hi,

I think you need to register the jar in EXIST_HOME/etc/launcher.xml

Cheers

Dannes

> On 4 Dec 2024, at 12:55, Irene Vagionakis <ire...@pa...bh> wrote:
>
> Hi there!
>
> I am trying to add a custom Lucene analyzer that behaves like the WhitespaceAnalyzer concerning tokenization and (lack of) stemming, but that is also case-insensitive (basically the same as https://sourceforge.net/p/exist/mailman/message/35188378/).
>
> I followed what was suggested in the thread above, that is, writing the custom analyzer, compiling its class as a JAR and saving it in $EXIST_HOME/lib/user, but it is not working. I also tried putting it in the same folder as the other Lucene JAR files, with the same result.
>
> Since both my Java/Lucene and eXist-db knowledge are quite poor, I am struggling to figure out whether the problem concerns my code or has to do with eXist-db itself.
>
> This is my custom analyzer code:
>
> package org.custom;
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.core.LowerCaseFilter;
> import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> public class CaseInsensitiveWhitespaceAnalyzer extends Analyzer {
>     @Override
>     protected TokenStreamComponents createComponents(String fieldName) {
>         final WhitespaceTokenizer source = new WhitespaceTokenizer();
>         final TokenStream filter = new LowerCaseFilter(source);
>         return new TokenStreamComponents(source, filter);
>     }
> }
>
> And this is how I reference it in collection.xconf:
> <analyzer id="custom" class="org.custom.CaseInsensitiveWhitespaceAnalyzer"/>
>
> I also tested the analyzer outside eXist-db with the following, and it returned the expected tokens:
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> import org.custom.CaseInsensitiveWhitespaceAnalyzer;
> import java.io.IOException;
> import java.io.StringReader;
> public class TestAnalyzer {
>     public static void main(String[] args) throws IOException {
>         String text = "Lucene is a Simple1 123 5% _test - Yet Powerful - Java Based Search Library. I love IT!";
>         Analyzer analyzer = new CaseInsensitiveWhitespaceAnalyzer();
>         try (TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
>             CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
>             tokenStream.reset();
>             while (tokenStream.incrementToken()) {
>                 System.out.println(charTermAttribute.toString());
>             }
>             tokenStream.end();
>         }
>     }
> }
>
> What am I doing wrong? Any suggestions/hints will be highly appreciated :)
>
> Thanks,
> Irene
> _______________________________________________
> Exist-open mailing list
> Exi...@li...
> https://lists.sourceforge.net/lists/listinfo/exist-open
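
One way to separate "the JAR is not on the classpath" from "the analyzer code is wrong" might be to ask the JVM whether it can load the class by name. The following is only a minimal sketch and not something from this thread: the class ClasspathCheck and its helper isLoadable are hypothetical names, and the check only reflects the classpath of the JVM that runs it, so it is informative about eXist-db only if it is started with the same set of JARs.

```java
// Hypothetical helper, not part of the original thread: reports whether a
// class can be found on the classpath of the JVM that runs this check.
class ClasspathCheck {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class,
            // do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class itself is not on the classpath.
            // LinkageError: the class was found, but something it depends on
            // (for example its superclass) is missing or incompatible.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the analyzer class name used in the quoted message.
        String name = args.length > 0
                ? args[0]
                : "org.custom.CaseInsensitiveWhitespaceAnalyzer";
        System.out.println(name + (isLoadable(name) ? " is loadable" : " is NOT loadable"));
    }
}
```

If the class is reported as not loadable with the JARs eXist-db actually uses, the problem is more likely registration of the JAR than the analyzer logic itself.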