From: Irene V. <ire...@pa...> - 2024-12-12 11:24:22
Thank you very much, it was this! In the meantime I was also pointed to https://github.com/eXist-db/documentation/issues/385, so I copy it here in case it may be helpful to others too.

Thanks again,
Irene

> On 11.12.2024 11:17 CET Dannes Wessels <di...@ex...> wrote:
>
> Hi,
>
> I think you need to register the jar in EXIST_HOME/etc/launcher.xml
>
> Cheers
>
> Dannes
>
> > On 4 Dec 2024, at 12:55, Irene Vagionakis <ire...@pa...bh> wrote:
> >
> > Hi there!
> >
> > I am trying to add a custom Lucene analyzer that behaves like the WhitespaceAnalyzer with respect to tokenization and (lack of) stemming, but that is also case-insensitive (essentially the same as https://sourceforge.net/p/exist/mailman/message/35188378/).
> >
> > I followed the suggestion in the thread above, i.e. writing the custom analyzer, compiling its class into a JAR and saving it in $EXIST_HOME/lib/user, but it is not working. I also tried putting it in the same folder as the other Lucene JAR files, with the same result.
> >
> > Since both my Java/Lucene and eXist-db knowledge are quite poor, I am struggling to figure out whether the problem lies in my code or in eXist-db itself.
> > This is my custom analyzer code:
> >
> > package org.custom;
> >
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.core.LowerCaseFilter;
> > import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> >
> > public class CaseInsensitiveWhitespaceAnalyzer extends Analyzer {
> >     @Override
> >     protected TokenStreamComponents createComponents(String fieldName) {
> >         final WhitespaceTokenizer source = new WhitespaceTokenizer();
> >         final TokenStream filter = new LowerCaseFilter(source);
> >         return new TokenStreamComponents(source, filter);
> >     }
> > }
> >
> > And this is how I reference it in collection.xconf:
> >
> > <analyzer id="custom" class="org.custom.CaseInsensitiveWhitespaceAnalyzer"/>
> >
> > I also tested the analyzer outside eXist-db with the following, and it returned the expected tokens:
> >
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> > import org.custom.CaseInsensitiveWhitespaceAnalyzer;
> >
> > import java.io.IOException;
> > import java.io.StringReader;
> >
> > public class TestAnalyzer {
> >     public static void main(String[] args) throws IOException {
> >         String text = "Lucene is a Simple1 123 5% _test - Yet Powerful - Java Based Search Library. I love IT!";
> >         Analyzer analyzer = new CaseInsensitiveWhitespaceAnalyzer();
> >         try (TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
> >             CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
> >             tokenStream.reset();
> >             while (tokenStream.incrementToken()) {
> >                 System.out.println(charTermAttribute.toString());
> >             }
> >             tokenStream.end();
> >         }
> >     }
> > }
> >
> > What am I doing wrong?
> > Any suggestions/hints will be highly appreciated :)
> >
> > Thanks,
> > Irene
> >
> > _______________________________________________
> > Exist-open mailing list
> > Exi...@li...
> > https://lists.sourceforge.net/lists/listinfo/exist-open
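
[Editor's note] The `<analyzer>` declaration quoted above only registers the class under an id; it still has to be attached to an index definition before eXist-db will use it. A minimal collection.xconf sketch for how the two pieces fit together (the `tei:p` qname is an illustrative placeholder, not from the thread; substitute the element you actually index):

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        <lucene>
            <!-- declare the custom analyzer once, then refer to it by id -->
            <analyzer id="custom" class="org.custom.CaseInsensitiveWhitespaceAnalyzer"/>
            <!-- hypothetical example element; use your own qname here -->
            <text qname="tei:p" analyzer="custom"/>
        </lucene>
    </index>
</collection>

Remember that after changing collection.xconf the affected collection must be reindexed for the new analyzer to take effect.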