From: Irene V. <ire...@pa...> - 2024-12-12 11:24:22
Thank you very much, it was this! In the meantime I was also pointed to https://github.com/eXist-db/documentation/issues/385, so I copy it here in case it may be helpful to others too.

Thanks again,
Irene

> On 11.12.2024 11:17 CET Dannes Wessels <di...@ex...> wrote:
>
> Hi,
>
> I think you need to register the jar in EXIST_HOME/etc/launcher.xml
>
> Cheers
>
> Dannes
>
> > On 4 Dec 2024, at 12:55, Irene Vagionakis <ire...@pa...bh> wrote:
> >
> > Hi there!
> >
> > I am trying to add a custom Lucene analyzer that behaves like the WhitespaceAnalyzer with respect to tokenization and (lack of) stemming, but that is also case-insensitive (essentially the same as https://sourceforge.net/p/exist/mailman/message/35188378/).
> >
> > I followed the suggestion in the thread above, i.e. writing the custom analyzer, compiling its class into a JAR and saving it in $EXIST_HOME/lib/user, but it is not working. I also tried putting it in the same folder as the other Lucene JAR files, with the same result.
> >
> > Since both my Java/Lucene and eXist-db knowledge are quite poor, I am struggling to figure out whether the problem lies in my code or in eXist-db itself.
> > This is my custom analyzer code:
> >
> > package org.custom;
> >
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.core.LowerCaseFilter;
> > import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> >
> > public class CaseInsensitiveWhitespaceAnalyzer extends Analyzer {
> >     @Override
> >     protected TokenStreamComponents createComponents(String fieldName) {
> >         final WhitespaceTokenizer source = new WhitespaceTokenizer();
> >         final TokenStream filter = new LowerCaseFilter(source);
> >         return new TokenStreamComponents(source, filter);
> >     }
> > }
> >
> > And this is how I reference it in collection.xconf:
> >
> > <analyzer id="custom" class="org.custom.CaseInsensitiveWhitespaceAnalyzer"/>
> >
> > I also tested the analyzer outside eXist-db with the following, and it returned the expected tokens:
> >
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
> > import org.custom.CaseInsensitiveWhitespaceAnalyzer;
> >
> > import java.io.IOException;
> > import java.io.StringReader;
> >
> > public class TestAnalyzer {
> >     public static void main(String[] args) throws IOException {
> >         String text = "Lucene is a Simple1 123 5% _test - Yet Powerful - Java Based Search Library. I love IT!";
> >         Analyzer analyzer = new CaseInsensitiveWhitespaceAnalyzer();
> >         try (TokenStream tokenStream = analyzer.tokenStream("field", new StringReader(text))) {
> >             CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
> >             tokenStream.reset();
> >             while (tokenStream.incrementToken()) {
> >                 System.out.println(charTermAttribute.toString());
> >             }
> >             tokenStream.end();
> >         }
> >     }
> > }
> >
> > What am I doing wrong?
> > Any suggestions/hints will be highly appreciated :)
> >
> > Thanks,
> > Irene
> >
> > _______________________________________________
> > Exist-open mailing list
> > Exi...@li...
> > https://lists.sourceforge.net/lists/listinfo/exist-open
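
[Editor's note] The `<analyzer>` declaration quoted above only registers the class under an id; it still has to be attached to an index definition before eXist-db will use it. A minimal collection.xconf sketch for how the two pieces fit together (the `tei:p` qname is an illustrative placeholder, not from the thread; substitute the element you actually index):

<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        <lucene>
            <!-- declare the custom analyzer once, then refer to it by id -->
            <analyzer id="custom" class="org.custom.CaseInsensitiveWhitespaceAnalyzer"/>
            <!-- hypothetical example element; use your own qname here -->
            <text qname="tei:p" analyzer="custom"/>
        </lucene>
    </index>
</collection>

Remember that after changing collection.xconf the affected collection must be reindexed for the new analyzer to take effect.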