From: Demian K. <dem...@vi...> - 2009-09-22 19:16:41
I just ran into two weird problems related to a single issue. It's not terribly important (I worked around the problems and found a different solution), but the problems are strange enough that I thought I'd share them and see if anybody has any insights.

I wanted to create a normalized OCLC number index in VuFind that stores just the numeric value. It seemed reasonable to reintroduce the "long" data type using the LongField Solr class, and then to use SolrMarc's pattern-matching capabilities to extract OCLC numbers and feed them into the index. Here's what I added to my marc.properties file:

    oclc_num = 035a, (pattern_map.oclc_num)
    pattern_map.oclc_num.pattern_0 = \\(OCoLC\\)[^0-9]*[0]*([0-9]+)=>$1
    pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)=>$1

Problem #1: I expected that if Solr was storing values as long integers, it would automatically strip leading zeroes. This was not the case, hence the [0]* section of my regular expressions above.

Problem #2 (the big one): One MARC record contained this malformed field:

    035 |a ocm05831717 810702

I would have expected pattern_1 in my marc.properties file to match "5831717" and ignore the "810702". Instead, the full string "5831717 810702" somehow ended up in the index! I don't understand why the regular expression matched this, and I REALLY don't understand how a value with a space in it can be stored as a long integer.

This turned out to have really bad side effects. Once this bad value was in the index, the JSON returned by Solr for any search involving this item included the space inside the long integer. That made the JSON invalid and impossible to parse, so searches stopped working entirely.

Am I fundamentally misunderstanding something about SolrMarc regular expressions and/or Solr's data types? Or did I just have the bad luck to run into two significant bugs at the same time?
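Since SolrMarc is written in Java, here is a small standalone Java sketch of one possible explanation for Problem #2. It assumes, without checking the SolrMarc source, that a pattern_map entry is applied the way java.util.regex.Matcher.replaceFirst() works: only the matched part of the field is replaced, and any unmatched text after it is kept. The class and the helpers applyPattern and toLongOrNull are hypothetical, for illustration only, and are not SolrMarc APIs. Note that the doubled backslashes in marc.properties become single backslashes once the properties file is loaded, which is why the Java string literal below also uses "\\(".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OclcPatternDemo {

    // Hypothetical stand-in for one pattern_map entry (ASSUMPTION: SolrMarc
    // behaves like replaceFirst). Returns null when the regex does not match.
    static String applyPattern(String regex, String replacement, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        // replaceFirst swaps only the matched span; text outside it is kept.
        return m.replaceFirst(replacement);
    }

    // What a genuinely numeric parse would do: leading zeros vanish,
    // and anything containing a space is rejected (null here).
    static Long toLongOrNull(String s) {
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String bad = "ocm05831717 810702";

        // Unanchored pattern from the post: " 810702" is never matched,
        // so it survives the replacement.
        String loose = applyPattern("ocm[0]*([0-9]+)", "$1", bad);

        // Anchored variant: ".*$" consumes the trailing text, so only
        // the captured number remains.
        String strict = applyPattern("^ocm0*([0-9]+).*$", "$1", bad);

        // pattern_0 from the post, with the properties-file escaping removed.
        String oclc = applyPattern("\\(OCoLC\\)[^0-9]*[0]*([0-9]+)", "$1",
                "(OCoLC)ocm00012345");

        System.out.println("[" + loose + "]");        // [5831717 810702]
        System.out.println("[" + strict + "]");       // [5831717]
        System.out.println("[" + oclc + "]");         // [12345]
        System.out.println(toLongOrNull("05831717")); // 5831717
        System.out.println(toLongOrNull(loose));      // null
    }
}
```

If that assumption about replaceFirst-style behavior holds, anchoring pattern_1 with ^...$ (or adding a trailing .*) would keep the stray " 810702" out of the index. Whether Solr's LongField validates or normalizes its input is a separate question that this sketch does not answer.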
For now, I've changed my data type from "long" to "string" in order to avoid the fatal JSON problems, but I would like to understand why this didn't work in the first place. Any insight is welcome!

thanks,
Demian