Re: [Hebmorph-thinktank] Help need with HdbMorph Setup on Solr

I think that with the current configuration it doesn't load the hspell
dictionary for each query,
because a query takes only 2 ms (on a very small number of documents).
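
For reference, here is a minimal sketch of what the simplified fieldType could look like if the extra tokenizer and filters are dropped, as suggested in the reply quoted below. It rests on two assumptions I have not verified against the Solr 1.4.1 source: that an <analyzer> element with a class attribute makes Solr instantiate that class directly and skip any nested <tokenizer>/<filter> elements, and that an <analyzer> without a type attribute is used for both indexing and querying.

```xml
<!-- Sketch only, not a tested configuration. -->
<fieldType name="text" class="solr.TextField">
  <!-- No type attribute: assumed to apply at both index and query time. -->
  <!-- No nested tokenizer/filters: MorphAnalyzer brings its own. -->
  <analyzer class="org.apache.lucene.analysis.hebrew.MorphAnalyzer"/>
</fieldType>
```

As far as I understand, Solr builds the analyzers when it reads schema.xml rather than on every query, which would be consistent with the 2 ms query time above, but I have not confirmed this.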


On Tue, Jan 10, 2012 at 10:52 AM, Itamar Syn-Hershko <it...@co...>wrote:

> Basically, MorphAnalyzer uses a custom tokenizer and some of its own
> filters, so I'm not sure it's a good idea to define other ones like you
> did. The Snowball one is definitely not helpful here.
>
> Also, you want to make sure MorphAnalyzer doesn't get recreated on each
> query. I'm not sure if and how this could be done with SOLR, but it's
> crucial as loading the hspell dictionary takes about 2 seconds...
>
>
> On Mon, Jan 9, 2012 at 12:06 PM, Shai <sh...@dr...> wrote:
>
>> Hi,
>> I am re-testing HebMorph for Hebrew search in Apache Solr.
>> I use apache-solr-1.4.1,
>> and it seems to work with the latest HebMorph commit,
>> eb403a6ad63bfc0dc18cf100dc3f256a4a6eb8af
>> (even when compiled against Lucene 3.0.2).
>>
>> It seems to work, but I haven't tested it fully yet.
>>
>> I ended up with something like this config for fieldType "text" in
>> schema.xml.
>> I would be happy to know which configurations others use and whether
>> mine is configured to work properly
>> (whether I need additional filters/tokenizers/analyzers and so on).
>>
>>
>>     <fieldType name="text" class="solr.TextField">
>>       <analyzer type="index"
>>                 class="org.apache.lucene.analysis.hebrew.MorphAnalyzer">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.SnowballPorterFilterFactory"
>>                 language="English" protected="protwords.txt"/>
>>       </analyzer>
>>       <analyzer type="query"
>>                 class="org.apache.lucene.analysis.hebrew.MorphAnalyzer">
>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>                 ignoreCase="true" expand="true"/>
>>         <filter class="solr.StopFilterFactory"
>>                 ignoreCase="true"
>>                 words="stopwords.txt"
>>                 enablePositionIncrements="true"
>>                 />
>>         <filter class="solr.WordDelimiterFilterFactory"
>>                 generateWordParts="1" generateNumberParts="1" catenateWords="0"
>>                 catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>         <filter class="solr.LowerCaseFilterFactory"/>
>>         <filter class="solr.SnowballPorterFilterFactory"
>>                 language="English" protected="protwords.txt"/>
>>       </analyzer>
>>     </fieldType>
>>
>>
>>
>> On Thu, Nov 24, 2011 at 11:29 PM, Itamar Syn-Hershko <it...@co...>wrote:
>>
>>> I'm not really sure what to tell you. I never used HebMorph with Solr,
>>> but I know some people did (
>>> http://lucene.472066.n3.nabble.com/using-HebMorph-td1826534.html),
>>> possibly with earlier versions.
>>>
>>> In Java, a ClassCastException like this sometimes occurs when the jar
>>> isn't built correctly.
>>>
>>> Sorry I can't be of more help atm.
>>>
>>> On Thu, Nov 24, 2011 at 6:59 PM, Manoj Damodaran <mda...@at...>wrote:
>>>
>>>> Itamar,
>>>>
>>>> I gave up making it work with Lucene 2.9.3 (Solr 1.4.1) and tried to
>>>> compile HebMorph for other Solr versions, but none of them work.
>>>>
>>>> Solr       Lucene
>>>> 1.4.1      2.9.3
>>>> 3.1.0      3.1.0
>>>> 3.2.0      3.2.0
>>>> 3.3.0      3.3.0
>>>> 3.4.0      3.4.0
>>>>
>>>> Lucene 3.0.2 is not bundled with any Solr version. I am getting the
>>>> runtime exception below:
>>>>
>>>> 24-Nov-2011 16:58:39 org.apache.solr.schema.IndexSchema readAnalyzer
>>>> SEVERE: Cannot load analyzer: org.apache.lucene.analysis.hebrew.MorphAnalyzer
>>>> java.lang.ClassCastException: class org.apache.lucene.analysis.hebrew.MorphAnalyzer
>>>>         at java.lang.Class.asSubclass(Unknown Source)
>>>>         at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:828)
>>>>         at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:62)
>>>>         at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:450)
>>>>         at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:435)
>>>>         at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)
>>>>
>>>> Has anyone had success running HebMorph on Solr? If so, which version
>>>> did you use?
>>>>
>>>> Thanks,
>>>> Manoj
>>>>
>>>> *From:* ita...@gm... [mailto:ita...@gm...]
>>>> *On Behalf Of *Itamar Syn-Hershko
>>>> *Sent:* 23 November 2011 07:39 PM
>>>>
>>>> *To:* Manoj Damodaran
>>>> *Cc:* heb...@li...
>>>> *Subject:* Re: [Hebmorph-thinktank] Help need with HdbMorph Setup on
>>>> Solr
>>>>
>>>> MorphAnalyzer is compiled against Lucene 3.0.2, and the API might have
>>>> changed. Can you try looking at the project history? I think it was
>>>> 2.9.3 not long ago; that should get you going.
>>>>
>>>> On Wed, Nov 23, 2011 at 2:44 PM, Manoj Damodaran <mda...@at...>
>>>> wrote:
>>>>
>>>> Itamar,
>>>>
>>>> Thanks for the quick response.
>>>>
>>>> I would like to make it work with Lucene 2.9.3 (Solr 1.4.1) if possible,
>>>> as upgrading Solr would bring other complications for me. I changed the
>>>> Ant build script to use <property name="lucene-version" value="2.9.3" />.
>>>> Now Solr loads the Lucene 2.9.3 libs, but I still get the same runtime
>>>> error when loading MorphAnalyzer.
>>>>
>>>> Thanks,
>>>> Manoj
>>>>
>>>> *From:* ita...@gm... [mailto:ita...@gm...]
>>>> *On Behalf Of *Itamar Syn-Hershko
>>>> *Sent:* 22 November 2011 18:45
>>>> *To:* Manoj Damodaran
>>>> *Cc:* heb...@li...
>>>> *Subject:* Re: [Hebmorph-thinktank] Help need with HdbMorph Setup on
>>>> Solr
>>>>
>>>> That is probably because HebMorph is compiled against Lucene 3.0.2 in
>>>> the Java version. Try changing that, or using a compatible version of Solr,
>>>> and let me know how it goes.
>>>>
>>>> On Tue, Nov 22, 2011 at 7:57 PM, Manoj Damodaran <mda...@at...>
>>>> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to use HebMorph to do Hebrew search with Solr in our
>>>> application. HebMorph looks quite promising, but I am having difficulty
>>>> making it work.
>>>>
>>>> I am not able to make Solr use HebMorph. I am able to build the jar
>>>> files and have put them in the lib folder. When I change the schema to
>>>> add a field type that uses lucene.analysis.hebrew.MorphAnalyzer, I get
>>>> the run-time exception shown below. Any idea what is going wrong? I am
>>>> running Solr 1.4.1 (Lucene 2.9.3).
>>>>
>>>> Nov 22, 2011 5:38:51 PM org.apache.solr.common.SolrException log
>>>> SEVERE: java.lang.ClassCastException: org.apache.lucene.analysis.hebrew.MorphAnalyzer cannot be cast to org.apache.lucene.analysis.Analyzer
>>>>         at org.apache.solr.schema.IndexSchema.readAnalyzer(IndexSchema.java:759)
>>>>         at org.apache.solr.schema.IndexSchema.access$100(IndexSchema.java:58)
>>>>         at org.apache.solr.schema.IndexSchema$1.create(IndexSchema.java:429)
>>>>
>>>> Thanks,
>>>> Manoj
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> All the data continuously generated in your IT infrastructure
>>>> contains a definitive record of customers, application performance,
>>>> security threats, fraudulent activity, and more. Splunk takes this
>>>> data and makes sense of it. IT sense. And common sense.
>>>> http://p.sf.net/sfu/splunk-novd2d
>>>> _______________________________________________
>>>> Hebmorph-thinktank mailing list
>>>> Heb...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/hebmorph-thinktank
>>>>
>>>
>>
>