From: Demian K. <dem...@vi...> - 2010-03-15 18:57:53
Thanks for the suggestion, but it looks like my problem has a different cause. I added preserveOriginal="1" to the solr.WordDelimiterFilterFactory lines in both the index and query analyzer chains of the "text" field type in the schema, restarted Solr, and reindexed the record in question. When I tested the analyzer chains in the admin tool, their output changed to include the original hyphenated term, but this had no apparent effect on the results of my test queries.

thanks,
Demian

From: Tuan Nguyen [mailto:tu...@yo...]
Sent: Monday, March 15, 2010 2:20 PM
To: Demian Katz
Cc: vuf...@li...
Subject: Re: [VuFind-Tech] Problems with solr.WordDelimiterFilterFactory in text fields

Demian,

I remember running into a similarly strange problem with mixed case; I think the fix was to set preserveOriginal="1".

On Mar 15, 2010, at 2:10 PM, Demian Katz wrote:

Hello,

I've just run across another weird Solr problem: searches for hyphenated terms are failing in strange ways. I'm sure it has something to do with the solr.WordDelimiterFilterFactory filter, but I'm not sure exactly what.

First of all, I notice that the "text" field type in VuFind's schema has "catenateWords" and "catenateNumbers" turned on in both the index and query analyzer chains. My understanding is that these options should be enabled only in the index chain and disabled in the query chain, which is how they are configured for the textProper field type. I haven't changed this yet, though, because it doesn't appear to make a difference for my immediate problem.

The problem is that I have a record with the title "Love customs in eighteenth-century Spain." Depending on how I search for it, I get successes or failures in a seemingly unpredictable pattern, though I suspect it has something to do with how phrase positions are being calculated. The demonstration queries below were run directly in the Solr administration tool, to eliminate any VuFind-related factors while debugging.
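To make the phrase-position hypothesis concrete, here is a minimal Python model of exact (slop 0) phrase matching over token streams described as (token, position increment) pairs. It is a sketch under stated assumptions, not Lucene's code: the helper names (positions, phrase_matches, title_stream) are hypothetical, stemming and stopwords are ignored, and the position given to the catenated token "eighteenthcentury" is assumed rather than taken from Solr. The real positions should be read from the admin analysis page's verbose output.

```python
from collections import defaultdict


def positions(stream):
    """Map a list of (token, position_increment) pairs to {position: {tokens}}.

    As in Lucene token streams, an increment of 0 stacks a token on the
    previous position and an increment of 1 moves to the next position.
    """
    by_pos = defaultdict(set)
    pos = -1
    for token, increment in stream:
        pos += increment
        by_pos[pos].add(token)
    return dict(by_pos)


def phrase_matches(doc_stream, query_stream):
    """Exact (slop 0) phrase match, in the spirit of a MultiPhraseQuery:
    every query position must share at least one token with the document
    position at the same relative offset."""
    doc = positions(doc_stream)
    query = positions(query_stream)
    first = min(query)
    return any(
        all(doc.get(start + p - first, set()) & terms for p, terms in query.items())
        for start in doc
    )


def title_stream(catenate_increment=None):
    """Hypothetical analyzed title (lowercased, no stemming, no stopwords).

    catenate_increment=None: no catenated token (e.g. no hyphen in the input)
    catenate_increment=0:    "eighteenthcentury" stacked on "century"
    catenate_increment=1:    "eighteenthcentury" gets a position of its own
    """
    stream = [("love", 1), ("customs", 1), ("in", 1), ("eighteenth", 1), ("century", 1)]
    if catenate_increment is not None:
        stream.append(("eighteenthcentury", catenate_increment))
    stream.append(("spain", 1))
    return stream


if __name__ == "__main__":
    indexed = title_stream(1)  # assume the index chain gives the catenated token its own slot
    print(phrase_matches(indexed, title_stream(1)))     # True: query analyzed the same way
    print(phrase_matches(indexed, title_stream(None)))  # False: "spain" lands one position early
    print(phrase_matches(title_stream(0), title_stream(None)))  # True: stacking leaves no gap
```

The point of the model is only that an exact phrase query succeeds when the index and query streams agree on positions. If a catenated token occupies its own position in one stream but not in the other, every term after it is shifted and the phrase fails. That would be consistent with the full-title phrase matching with the hyphen but not without it.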
Queries that work:

title:(Love customs in eighteenth century Spain)  // no hyphen, no phrases
title:("Love customs in eighteenth-century Spain")  // phrase search on whole title, with hyphen

Queries that fail:

title:(Love customs in eighteenth-century Spain)  // hyphen, no phrases
title:("Love customs in eighteenth century Spain")  // phrase search on whole title, without hyphen
title:(Love customs in "eighteenth-century" Spain)  // hyphenated word as phrase
title:(Love customs in "eighteenth century" Spain)  // hyphenated word as phrase, hyphen removed

Has anybody else run into this? Any ideas on a fix? I've noticed that the textProper field type doesn't have the same issue, so when I use unstemmed textProper fields in my query handler, the problem goes away. However, that only masks the issue rather than solving it, and I would like to come up with a better solution to commit to the trunk!

thanks,
Demian

_______________________________________________
Vufind-tech mailing list
Vuf...@li...
https://lists.sourceforge.net/lists/listinfo/vufind-tech
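One way to compare the working and failing cases side by side is to run each test query through Solr's select handler with debugQuery=on. The debug section of the response includes the parsed query, which should show whether a plain phrase query or a multi-term phrase query was built from the hyphenated input. The sketch below only builds the URLs and does not contact a server. The base URL is an assumption (adjust it to your own Solr core), and the labels are taken from the message above.

```python
from urllib.parse import urlencode

# Assumption: adjust to your own Solr core's select handler.
SOLR_SELECT = "http://localhost:8080/solr/biblio/select"

# The six test queries from the message, labeled by observed outcome.
TEST_QUERIES = {
    "works: no hyphen, no phrases":
        'title:(Love customs in eighteenth century Spain)',
    "works: whole-title phrase, with hyphen":
        'title:("Love customs in eighteenth-century Spain")',
    "fails: hyphen, no phrases":
        'title:(Love customs in eighteenth-century Spain)',
    "fails: whole-title phrase, without hyphen":
        'title:("Love customs in eighteenth century Spain")',
    "fails: hyphenated word as phrase":
        'title:(Love customs in "eighteenth-century" Spain)',
    "fails: hyphenated word as phrase, hyphen removed":
        'title:(Love customs in "eighteenth century" Spain)',
}


def debug_url(query, base=SOLR_SELECT):
    """Build a select URL with debugQuery=on so the response includes
    Solr's parsed form of the query."""
    return base + "?" + urlencode({"q": query, "debugQuery": "on"})


if __name__ == "__main__":
    for label, query in TEST_QUERIES.items():
        print(label)
        print("  " + debug_url(query))
```

Opening each URL and diffing the parsed queries of a working/failing pair (for example, the two whole-title phrase searches) should narrow down whether the difference comes from term positions or from the query terms themselves.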