Re: [Hebmorph-thinktank] Help need with HdbMorph Setup on Solr
From: Shai <sh...@dr...> - 2012-01-10 09:56:34
I think that with the current configuration it doesn't load the hspell dictionary for each query, because the query takes 2ms (for a very small number of documents).

On Tue, Jan 10, 2012 at 10:52 AM, Itamar Syn-Hershko <it...@co...> wrote:

> Basically, MorphAnalyzer uses a custom tokenizer and some of its own
> filters, so I'm not sure if it's a good idea to define other ones like you
> did. The snowball one is definitely not helpful here.
>
> Also, you want to make sure MorphAnalyzer doesn't get recreated on each
> query. I'm not sure if and how this could be done with SOLR, but it's
> crucial as loading the hspell dictionary takes about 2 seconds...
>
> On Mon, Jan 9, 2012 at 12:06 PM, Shai <sh...@dr...> wrote:
>
>> Hi
>> I am re-testing HebMorph now for use with Hebrew searches in apache-solr.
>> I use apache-solr-1.4.1
>> and it seems to work with the latest HebMorph commit id
>> eb403a6ad63bfc0dc18cf100dc3f256a4a6eb8af
>> (even when compiled with lucene 3.0.2)
>>
>> it seems to work but I didn't test it fully yet
>>
>> I end up with something like this config for fieldType text in
>> schema.xml -
>> I will be happy to know the configurations others use and if it's fully
>> configured to work properly
>> (if I need to use additional filters/tokenizers/analyzers and so on...)
>>
>> <fieldType name="text" class="solr.TextField">
>>   <analyzer type="index"
>>             class="org.apache.lucene.analysis.hebrew.MorphAnalyzer">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>     />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>             catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory"
>>             language="English" protected="protwords.txt"/>
>>   </analyzer>
>>   <analyzer type="query"
>>             class="org.apache.lucene.analysis.hebrew.MorphAnalyzer">
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>             ignoreCase="true" expand="true"/>
>>     <filter class="solr.StopFilterFactory"
>>             ignoreCase="true"
>>             words="stopwords.txt"
>>             enablePositionIncrements="true"
>>     />
>>     <filter class="solr.WordDelimiterFilterFactory"
>>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.SnowballPorterFilterFactory"
>>             language="English" protected="protwords.txt"/>
>>   </analyzer>
>> </fieldType>
>>
>> On Thu, Nov 24, 2011 at 11:29 PM, Itamar Syn-Hershko <it...@co...> wrote:
>>
>>> I'm not really sure what to tell you. I never used HebMorph with Solr,
>>> but I know some people did (
>>> http://lucene.472066.n3.nabble.com/using-HebMorph-td1826534.html),
>>> possibly with earlier versions.
>>>
>>> Java's ClassCastException sometimes happens when compilation to jar
>>> isn't done correctly.
>>>
>>> Sorry I can't be of more help atm.
>>>
>>> On Thu, Nov 24, 2011 at 6:59 PM, Manoj Damodaran <mda...@at...> wrote:
>>>
>>>> Itamar,
>>>>
>>>> I gave up making it work with lucene 2.9.3 (solr 1.4.1) and tried to
>>>> compile HebMorph for other solr versions, but none of them work.
>>>>
>>>> Solr    Lucene
>>>> 1.4.1   2.9.3
>>>> 3.1.0   3.1.0
>>>> 3.2.0   3.2.0
>>>> 3.3.0   3.3.0
>>>> 3.4.0   3.4.0
>>>>
>>>> Lucene 3.0.2 is not bundled with any solr. I am getting the below
>>>> runtime exception
>>>>
>>>> 24-Nov-2011 16:58:39 org.apache.solr.schema.IndexSchema readAnalyzer
>>>> SEVERE: Cannot load analyzer: org.apache.lucene.analysis.hebrew.MorphAnalyzer
>>>> java.lang.ClassCastException: class org.apache.lucene.analysis.hebrew.MorphAnalyzer
>>>>     at java.lang.Class.asSubclass(Unknown Source)
>>>>     at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:828)
>>>>     at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:62)
>>>>     at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
>>>>     at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
>>>>     at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
>>>>
>>>> Has anyone had success running HebMorph on Solr, What version did they
>>>> use.
>>>>
>>>> Thanks,
>>>> Manoj
>>>>
>>>> From: ita...@gm... [mailto:ita...@gm...]
>>>> On Behalf Of Itamar Syn-Hershko
>>>> Sent: 23 November 2011 07:39 PM
>>>> To: Manoj Damodaran
>>>> Cc: heb...@li...
>>>> Subject: Re: [Hebmorph-thinktank] Help need with HdbMorph Setup on Solr
>>>>
>>>> MorphAnalyzer is compiled against 3.0.2, and the API might have changed.
>>>> Can you try looking at the project history? I think it was 2.9.3 not
>>>> long ago; that should get you going.
>>>>
>>>> On Wed, Nov 23, 2011 at 2:44 PM, Manoj Damodaran <mda...@at...> wrote:
>>>>
>>>> Itamar,
>>>>
>>>> Thanks for the quick response.
>>>>
>>>> I would like to make it work with Lucene 2.9.3 (Solr 1.4.1) if possible,
>>>> as upgrading Solr will bring other complications for me. I changed the
>>>> ant build script to use <property name="lucene-version" value="2.9.3" />.
>>>> Now Solr loads the Lucene 2.9.3 libs, but I still get the same runtime
>>>> error when loading MorphAnalyzer.
>>>>
>>>> Thanks,
>>>> Manoj
>>>>
>>>> From: ita...@gm... [mailto:ita...@gm...]
>>>> On Behalf Of Itamar Syn-Hershko
>>>> Sent: 22 November 2011 18:45
>>>> To: Manoj Damodaran
>>>> Cc: heb...@li...
>>>> Subject: Re: [Hebmorph-thinktank] Help need with HdbMorph Setup on Solr
>>>>
>>>> That is probably because HebMorph is compiled against Lucene 3.0.2 in
>>>> the Java version. Try changing that, or using a compatible version of
>>>> Solr, and let me know how it goes.
>>>>
>>>> On Tue, Nov 22, 2011 at 7:57 PM, Manoj Damodaran <mda...@at...> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to use HebMorph to do Hebrew search with Solr in our
>>>> application. HebMorph looks quite promising, but I am having difficulty
>>>> making it work.
>>>>
>>>> I am not able to make Solr use HebMorph. I am able to build the jar
>>>> files and have put them in the lib folder. When I make a schema change to
>>>> add a field type that uses lucene.analysis.hebrew.MorphAnalyzer, I get
>>>> the run-time exception shown below. Any idea what is going wrong?
>>>> I am running Solr 1.4.1 (Lucene 2.9.3)
>>>>
>>>> Nov 22, 2011 5:38:51 PM org.apache.solr.common.SolrException log
>>>> SEVERE: java.lang.ClassCastException:
>>>> org.apache.lucene.analysis.hebrew.MorphAnalyzer cannot be cast to
>>>> org.apache.lucene.analysis.Analyzer
>>>>     at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:759)
>>>>     at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
>>>>     at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:429)
>>>>
>>>> Thanks,
>>>> Manoj
>>>>
>>>> ------------------------------------------------------------------------------
>>>> All the data continuously generated in your IT infrastructure
>>>> contains a definitive record of customers, application performance,
>>>> security threats, fraudulent activity, and more. Splunk takes this
>>>> data and makes sense of it. IT sense. And common sense.
>>>> http://p.sf.net/sfu/splunk-novd2d
>>>> _______________________________________________
>>>> Hebmorph-thinktank mailing list
>>>> Heb...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/hebmorph-thinktank
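
A note on the ClassCastException reported in this thread: on the JVM a class is identified by its name together with the class loader that defined it, so one possible cause (an assumption, the thread never confirms it) is that MorphAnalyzer ends up extending a different copy of Lucene's Analyzer class than the one Solr loaded. The sketch below is a hypothetical diagnostic that relies only on JDK reflection. `ClassOriginReport` and its methods are illustrative names and are not part of HebMorph, Lucene, or Solr.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of HebMorph, Lucene, or Solr.
final class ClassOriginReport {

    // Returns the jar or directory a class was loaded from, if the JVM exposes one.
    static String origin(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(no code source, e.g. a JDK class)"
                : src.getLocation().toString();
    }

    // True when `child`, as resolved through `loader`, can be cast to `parent`
    // as resolved through the same loader. Neither class is initialized.
    static boolean isAssignable(String child, String parent, ClassLoader loader)
            throws ClassNotFoundException {
        Class<?> c = Class.forName(child, false, loader);
        Class<?> p = Class.forName(parent, false, loader);
        return p.isAssignableFrom(c);
    }

    // Builds a short report: the cast check, then the superclass chain of
    // `child` with the location each class in the chain was loaded from.
    static String describe(String child, String parent, ClassLoader loader)
            throws ClassNotFoundException {
        StringBuilder out = new StringBuilder();
        out.append(child).append(" assignable to ").append(parent).append(": ")
           .append(isAssignable(child, parent, loader)).append('\n');
        for (Class<?> c = Class.forName(child, false, loader); c != null; c = c.getSuperclass()) {
            out.append("  ").append(c.getName()).append(" <- ").append(origin(c)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Defaults use JDK classes so the sketch runs without any extra jars.
        String child = args.length > 0 ? args[0] : "java.util.ArrayList";
        String parent = args.length > 1 ? args[1] : "java.util.List";
        System.out.print(describe(child, parent, ClassOriginReport.class.getClassLoader()));
    }
}
```

To apply this to the case above, one would pass org.apache.lucene.analysis.hebrew.MorphAnalyzer and org.apache.lucene.analysis.Analyzer as the two arguments, with the HebMorph and Lucene jars on the classpath. A result of `false`, or an Analyzer entry in the chain that points at an unexpected jar, would suggest a version or duplicate-jar mismatch. Running it outside Solr only approximates Solr's own class-loading setup, so treat the output as a hint, not a verdict.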