Re: [Hebmorph-thinktank] Help needed with HebMorph Setup on Solr

Basically, MorphAnalyzer uses a custom tokenizer and some of its own
filters, so I'm not sure it's a good idea to define other ones like you
did. The Snowball one is definitely not helpful here.

Also, you want to make sure MorphAnalyzer doesn't get recreated on each
query. I'm not sure if and how this can be done with Solr, but it's
crucial, as loading the hspell dictionary takes about 2 seconds.
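Following the suggestion above, a trimmed-down version of the fieldType quoted further down in this thread might look like the sketch below. It keeps MorphAnalyzer as the only analysis component and drops the whitespace tokenizer and the stop, word-delimiter, lower-case and Snowball filters. This is an untested sketch, not a confirmed HebMorph configuration: it assumes Solr can instantiate org.apache.lucene.analysis.hebrew.MorphAnalyzer from the class attribute alone (as it apparently does in the quoted config), and that the same analysis is acceptable at both index and query time.

```xml
<!-- Sketch, untested: MorphAnalyzer is the whole analysis chain.
     A single <analyzer> without a type attribute is used by Solr for
     both indexing and querying, so no separate index/query blocks.
     Assumes MorphAnalyzer can be constructed from its class name alone. -->
<fieldType name="text" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.hebrew.MorphAnalyzer"/>
</fieldType>
```

If index-time and query-time behaviour ever need to differ, the single element can be split back into type="index" and type="query" analyzers, each naming MorphAnalyzer.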

On Mon, Jan 9, 2012 at 12:06 PM, Shai <sh...@dr...> wrote:

> Hi,
> I am re-testing HebMorph for Hebrew searches in apache-solr.
> I use apache-solr-1.4.1, and it seems to work with the latest HebMorph
> commit (id eb403a6ad63bfc0dc18cf100dc3f256a4a6eb8af), even when compiled
> against Lucene 3.0.2.
>
> It seems to work, but I haven't tested it fully yet.
>
> I ended up with something like the following config for the fieldType
> "text" in schema.xml.
> I would be happy to know the configurations others use, and whether this
> one is fully configured to work properly (e.g. whether I need to use
> additional filters/tokenizers/analyzers and so on).
>
>
>     <fieldType name="text" class="solr.TextField">
>       <analyzer type="index"
>                 class="org.apache.lucene.analysis.hebrew.MorphAnalyzer">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="1" catenateNumbers="1" catenateAll="0"
>                 splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory"
>                 language="English" protected="protwords.txt"/>
>       </analyzer>
>       <analyzer type="query"
>                 class="org.apache.lucene.analysis.hebrew.MorphAnalyzer">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                 ignoreCase="true" expand="true"/>
>         <filter class="solr.StopFilterFactory"
>                 ignoreCase="true"
>                 words="stopwords.txt"
>                 enablePositionIncrements="true"
>                 />
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="0" catenateNumbers="0" catenateAll="0"
>                 splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory"
>                 language="English" protected="protwords.txt"/>
>       </analyzer>
>     </fieldType>
>
>
>
> On Thu, Nov 24, 2011 at 11:29 PM, Itamar Syn-Hershko <it...@co...> wrote:
>
>> I'm not really sure what to tell you. I never used HebMorph with Solr,
>> but I know some people did (
>> http://lucene.472066.n3.nabble.com/using-HebMorph-td1826534.html),
>> possibly with earlier versions.
>>
>> A Java ClassCastException sometimes happens when the compilation to a
>> jar isn't done correctly.
>>
>> Sorry I can't be of more help at the moment.
>>
>> On Thu, Nov 24, 2011 at 6:59 PM, Manoj Damodaran <mda...@at...> wrote:
>>
>>> Itamar,
>>>
>>> I gave up making it work with Lucene 2.9.3 (Solr 1.4.1) and tried to
>>> compile HebMorph for other Solr versions, but none of them work.
>>>
>>> Solr     Lucene
>>> 1.4.1    2.9.3
>>> 3.1.0    3.1.0
>>> 3.2.0    3.2.0
>>> 3.3.0    3.3.0
>>> 3.4.0    3.4.0
>>>
>>> Lucene 3.0.2 is not bundled with any Solr release. I am getting the
>>> runtime exception below:
>>>
>>> 24-Nov-2011 16:58:39 org.apache.solr.schema.IndexSchema readAnalyzer
>>> SEVERE: Cannot load analyzer: org.apache.lucene.analysis.hebrew.MorphAnalyzer
>>> java.lang.ClassCastException: class org.apache.lucene.analysis.hebrew.MorphAnalyzer
>>>         at java.lang.Class.asSubclass(Unknown Source)
>>>         at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:828)
>>>         at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:62)
>>>         at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
>>>         at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
>>>         at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
>>>
>>> Has anyone had success running HebMorph on Solr? What version did they
>>> use?
>>>
>>> Thanks,
>>> Manoj
>>>
>>> *From:* ita...@gm... [mailto:ita...@gm...] *On Behalf Of* Itamar Syn-Hershko
>>> *Sent:* 23 November 2011 07:39 PM
>>> *To:* Manoj Damodaran
>>> *Cc:* heb...@li...
>>> *Subject:* Re: [Hebmorph-thinktank] Help needed with HebMorph Setup on Solr
>>>
>>> MorphAnalyzer is compiled against Lucene 3.0.2, and the API might have
>>> changed. Can you try looking at the project history? I think it was
>>> built against 2.9.3 not long ago, and that should get you going.
>>>
>>> On Wed, Nov 23, 2011 at 2:44 PM, Manoj Damodaran <mda...@at...> wrote:
>>>
>>> Itamar,
>>>
>>> Thanks for the quick response.
>>>
>>> I would like to make it work with Lucene 2.9.3 (Solr 1.4.1) if possible,
>>> as upgrading Solr would bring other complications for me. I changed the
>>> Ant build script to use <property name="lucene-version" value="2.9.3" />,
>>> and now Solr loads the Lucene 2.9.3 libs, but I still get the same
>>> runtime error when loading MorphAnalyzer.
>>>
>>> Thanks,
>>> Manoj
>>>
>>> *From:* ita...@gm... [mailto:ita...@gm...] *On Behalf Of* Itamar Syn-Hershko
>>> *Sent:* 22 November 2011 18:45
>>> *To:* Manoj Damodaran
>>> *Cc:* heb...@li...
>>> *Subject:* Re: [Hebmorph-thinktank] Help needed with HebMorph Setup on Solr
>>>
>>> That is probably because the Java version of HebMorph is compiled
>>> against Lucene 3.0.2. Try changing that, or use a compatible version of
>>> Solr, and let me know how it goes.
>>>
>>> On Tue, Nov 22, 2011 at 7:57 PM, Manoj Damodaran <mda...@at...> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to use HebMorph to do Hebrew search with Solr in our
>>> application. HebMorph looks quite promising, but I am having difficulty
>>> making it work.
>>>
>>> I am not able to make Solr use HebMorph. I am able to build the jar
>>> files and have put them in the lib folder. When I change the schema to
>>> add a field type that uses lucene.analysis.hebrew.MorphAnalyzer, I get
>>> the runtime exception shown below. Any idea what is going wrong? I am
>>> running Solr 1.4.1 (Lucene 2.9.3).
>>>
>>> Nov 22, 2011 5:38:51 PM org.apache.solr.common.SolrException log
>>> SEVERE: java.lang.ClassCastException:
>>> org.apache.lucene.analysis.hebrew.MorphAnalyzer cannot be cast to
>>> org.apache.lucene.analysis.Analyzer
>>>         at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:759)
>>>         at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
>>>         at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:429)
>>>
>>> Thanks,
>>> *Manoj*
>>>
>>> ------------------------------------------------------------------------------
>>> All the data continuously generated in your IT infrastructure
>>> contains a definitive record of customers, application performance,
>>> security threats, fraudulent activity, and more. Splunk takes this
>>> data and makes sense of it. IT sense. And common sense.
>>> http://p.sf.net/sfu/splunk-novd2d
>>> _______________________________________________
>>> Hebmorph-thinktank mailing list
>>> Heb...@li...
>>> https://lists.sourceforge.net/lists/listinfo/hebmorph-thinktank
>>
>