From: Demian K. <dem...@vi...> - 2010-04-16 19:15:49
Thanks -- that's a great suggestion. I've committed a variation as of r2445:

<filter class="solr.PatternReplaceFilterFactory" pattern="(?<!\b[A-Z])[.\s]*$" replacement="" replace="first"/>

This handles multiple punctuation marks mixed with whitespace, as before, but adds the single uppercase Latin letter exception you suggested. It seems to work very well with my test data!

- Demian

From: Brad Dewar [mailto:bd...@st...]
Sent: Friday, April 16, 2010 2:54 PM
To: Demian Katz; vuf...@li...
Subject: RE: VUFIND-184 (punctuation in facet fields)

This regex removes a single trailing period unless it is immediately preceded by a single uppercase Latin letter (i.e. not if it looks like initials). It drops the 'trim trailing spaces' functionality that your regex had, but you can use Solr's TrimFilterFactory earlier in the analyzer chain to handle that.

pattern="(?<!\b[A-Z])\.$" replacement=""

(regex tested in perl, not in java - but that shouldn't be an issue in this case)

Brad

From: Demian Katz [mailto:dem...@vi...]
Sent: April-13-10 4:59 PM
To: vuf...@li...
Subject: [VuFind-Tech] VUFIND-184 (punctuation in facet fields)

Hello,

As of r2426, I have updated the default VuFind Solr schema to strip trailing periods and whitespace from key facet fields. This solves VUFIND-184 without the need for any SolrMarc configuration changes.

The biggest downside I see right now is that authors with a desired trailing period (e.g. "Katz, Demian D.") get that stripped off and look a little strange... but it's probably a small price to pay for more functional facets, and it's easy to change the authorStr field type if you don't want this behavior.

If anybody has a better regular expression to use here, please share... but for the most basic case of unwanted periods, I believe this works fine as-is.

- Demian
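
For anyone who wants to try the r2445 pattern against their own facet values before reindexing, here is a minimal sketch in Python. It rests on two assumptions: that the character class in the committed filter is `[.\s]` (period or whitespace), and that Python's `re` module treats this particular pattern the same way Java's regex engine does inside Solr. The function name `strip_trailing` and the sample values are made up for illustration and are not part of VuFind.

```python
import re

# Pattern from the r2445 filter, assuming the character class is [.\s].
# (?<!\b[A-Z]) is a negative lookbehind: do not start the match right
# after a single uppercase letter that stands alone as an initial.
TRAILING_PUNCT = re.compile(r"(?<!\b[A-Z])[.\s]*$")


def strip_trailing(value: str) -> str:
    """Hypothetical helper imitating replace="first": only the first
    match of the pattern is replaced with the empty string."""
    return TRAILING_PUNCT.sub("", value, count=1)


if __name__ == "__main__":
    # Illustrative sample values only, not real test data.
    samples = [
        "History.",
        "Fiction . .",
        "Katz, Demian D.",
        "Katz, Demian D. ",
        "USA.",
    ]
    for sample in samples:
        print(repr(sample), "->", repr(strip_trailing(sample)))
```

With these samples, "History." and "Fiction . ." lose all of their trailing periods and spaces, while "Katz, Demian D." keeps its period because the "D" is a standalone uppercase letter. "USA." is still stripped, since the "A" is not at a word boundary. One caveat when copying the filter into an actual schema.xml: XML does not allow a literal `<` inside an attribute value, so the lookbehind would need to be written as `(?&lt;!\b[A-Z])` there.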