From: Alan R. <ala...@mn...> - 2008-07-22 16:26:47
Hello,

I noticed there was a comment that the dateSpan field was not populated.

I've written some code in MarcImporter.java to populate this field. I'm using the values in the 008, specifically the type (pos 6), date 1 (7-10), and date 2 (11-14).

What I've done is create decade strings of the form ccd0-ccd9 for each decade the material is defined for. Some material will be in multiple decades because it is a continuing resource or a reprint of an original. This means the schema will need to be changed to make dateSpan multiValued.

This requires the following:

    import java.util.ArrayList;
    import java.util.Calendar;

The following is the call in the addToIndex function (I actually embedded it in the code that populates the langCode and language fields, as the 008 was already retrieved):

    char dateType;
    String date1;
    String date2;
    List<String> dateSpans;
    Iterator<String> dsiter;
    ControlField fixedLength = (ControlField) record.getVariableField("008");
    if (fixedLength != null) {
        // Determine dateSpan
        // (fixedLengthStr holds the 008 data already retrieved by the surrounding code)
        dateType = fixedLengthStr.charAt(6);
        date1 = fixedLengthStr.substring(7, 11);
        date2 = fixedLengthStr.substring(11, 15);
        dateSpans = getDateSpan(dateType, date1, date2);
        dsiter = dateSpans.iterator();
        while (dsiter.hasNext()) {
            builder.addField("dateSpan", dsiter.next());
        }
    }

This is the getDateSpan function:

    /**
     * Return the list of decade spans for the record
     * @param dateType 008 position 6
     * @param date1 008 position 7-10
     * @param date2 008 position 11-14
     * @return list of date spans, each of the form nnnn-nnnn
     */
    private ArrayList<String> getDateSpan(char dateType, String date1, String date2) {
        ArrayList<String> dateList = new ArrayList<String>();
        Calendar today = Calendar.getInstance();
        String dateSpan;
        int date_one = 0;
        int date_two = 0;
        int pos;
        // first year of the current decade
        int year = (today.get(Calendar.YEAR) / 10) * 10;

        // The following cleans up some of the data in our catalog
        date1 = date1.replace('|', ' ');
        date2 = date2.replace('|', ' ');
        date1 = date1.replace('-', ' ');
        date2 = date2.replace('-', ' ');
        date1 = date1.replace('?', ' ');
        date2 = date2.replace('?', ' ');
        date1 = date1.trim();
        date2 = date2.trim();

        // Reduce date1 to the first year of its decade (or coarser if digits are unknown)
        if (date1.length() == 0) {
            date_one = 0;
        } else if ((pos = date1.indexOf('u')) == -1) {
            // no unknown digits: keep at most the first three digits
            pos = (date1.length() < 3) ? date1.length() : 3;
            date1 = date1.substring(0, pos);
            try {
                date_one = Integer.parseInt(date1);
                // pad back out to four digits
                for (int i = 4; i > pos; i--) {
                    date_one = date_one * 10;
                }
            } catch (Exception e) {
                date_one = 0;
            }
        } else if (pos > 0) {
            // unknown digits ('u'): keep only the digits before the first 'u'
            date1 = date1.substring(0, pos);
            try {
                date_one = Integer.parseInt(date1);
                for (int i = 4; i > pos; i--) {
                    date_one = date_one * 10;
                }
            } catch (Exception e) {
                date_one = 0;
            }
        } else {
            date_one = 0;
        }

        // Same treatment for date2; 9999 means "still continuing"
        if (date2.equals("9999")) {
            date_two = year;
        } else if (date2.length() == 0) {
            date_two = 0;
        } else {
            if ((pos = date2.indexOf('u')) == -1) {
                pos = (date2.length() < 3) ? date2.length() : 3;
                date2 = date2.substring(0, pos);
                try {
                    date_two = Integer.parseInt(date2);
                    for (int i = 4; i > pos; i--) {
                        date_two = date_two * 10;
                    }
                } catch (Exception e) {
                    date_two = 0;
                }
            } else if (pos > 0) {
                date2 = date2.substring(0, pos);
                try {
                    date_two = Integer.parseInt(date2);
                    for (int i = 4; i > pos; i--) {
                        date_two = date_two * 10;
                    }
                } catch (Exception e) {
                    date_two = 0;
                }
            } else {
                date_two = 0;
            }
        }

        if (date_one > 0) {
            switch (dateType) {
                // continuing resource - multiple dateSpans up to current year
                case 'c':
                case 'u':
                    for (int i = date_one; i < (year + 10); i += 10) {
                        dateSpan = Integer.toString(i) + '-' + Integer.toString(i + 9);
                        dateList.add(dateSpan);
                    }
                    break;
                // single dates (ignore copyright date)
                case 'e':
                case 's':
                case 't':
                    dateSpan = Integer.toString(date_one) + '-' + Integer.toString(date_one + 9);
                    dateList.add(dateSpan);
                    break;
                // ranges of dates, possibly multiple dateSpans
                case 'i':
                case 'k':
                case 'm':
                case 'q':
                    if ((date_one > 0) && (date_two > 0)) {
                        for (int i = date_one; i <= date_two; i += 10) {
                            dateSpan = Integer.toString(i) + '-' + Integer.toString(i + 9);
                            dateList.add(dateSpan);
                        }
                    }
                    break;
                // 2 distinct dates, include both dateSpans if different
                case 'p':
                case 'r':
                    if (date_one > 0) {
                        dateSpan = Integer.toString(date_one) + '-' + Integer.toString(date_one + 9);
                        dateList.add(dateSpan);
                    }
                    if ((date_two > 0) && (date_one != date_two)) {
                        dateSpan = Integer.toString(date_two) + '-' + Integer.toString(date_two + 9);
                        dateList.add(dateSpan);
                    }
                    break;
            } // end of switch
        }
        return dateList;
    }

al
--
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities
(507)389-1975
ala...@mn...
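For readers who want to see the decade bucketing in isolation, the following is a minimal standalone sketch, not the code from the message above. The class and method names (`DecadeSpanSketch`, `toDecade`, `spans`) are made up for illustration. It covers only two steps: padding a partial 008 date out to the first year of its decade, and expanding a range into decade strings. The handling of the 008 date type codes is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class DecadeSpanSketch {

    // Reduce a 4-character 008 date to the first year of its decade.
    // A 'u' marks an unknown digit; everything from the first 'u' on is
    // treated as zero. Returns 0 when no usable digits remain.
    static int toDecade(String date) {
        String d = date.trim();
        int cut = Math.min(d.length(), 3);      // never keep the year digit
        int u = d.indexOf('u');
        if (u >= 0 && u < cut) {
            cut = u;                            // stop before the first unknown digit
        }
        String prefix = d.substring(0, cut);
        if (prefix.isEmpty() || !prefix.chars().allMatch(Character::isDigit)) {
            return 0;
        }
        int value = Integer.parseInt(prefix);
        for (int i = cut; i < 4; i++) {
            value *= 10;                        // pad back out to four digits
        }
        return value;
    }

    // Expand a pair of decade start years into "nnnn-nnnn" strings.
    static List<String> spans(int from, int to) {
        List<String> out = new ArrayList<>();
        for (int d = from; d <= to; d += 10) {
            out.add(d + "-" + (d + 9));
        }
        return out;
    }

    public static void main(String[] args) {
        // A range from 1987 to 2003 falls in three decades.
        System.out.println(spans(toDecade("1987"), toDecade("2003")));
        // prints [1980-1989, 1990-1999, 2000-2009]
    }
}
```

Because each record can yield several of these strings, the target index field has to accept multiple values, which is the schema change mentioned in the message.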
From: Andrew N. <and...@vi...> - 2008-07-22 17:43:20
Al -

This is great! I will see that it gets incorporated into the new importer for version 1.0.

One suggestion, however: please ask first before taking on a new task like this. It isn't quite the same thing, but the folks at UVa with Project Blacklight have done something similar, and it is already incorporated into the new importer :). They are parsing the 045 field to get time period information.

Also, the dateSpan field was initially used to index the 362$a field, but we haven't used it and I have thought about ditching it. We might want to reevaluate the time period fields for VuFind. Currently, we have the dateSpan field, which uses the 362$a, and the era field, which uses the 650$y and 651$y. Maybe your code could populate a field called "decade"?

Thanks!
Andrew

> -----Original Message-----
> From: vuf...@li... [mailto:vufind-gen...@li...] On Behalf Of Alan Rykhus
> Sent: Tuesday, July 22, 2008 12:27 PM
> To: vuf...@li...
> Subject: [VuFind-General] code to populate the dateSpan field
From: Naomi D. <nd...@st...> - 2008-07-31 17:37:10
Hi Daniel,

I posted our efforts along these lines yesterday. It was buried in one of my typically long posts, so I'll repeat that section here:

url (for fulltext)
urlSuppl_store
------------
We wanted to split the urls for full text from other urls for a resource (e.g. urls for table of contents). We achieved this with a combination of the 2nd indicator value in an 856 and some string matching. I am happy to share the code, the test code, and the test data I created. It was implemented for the solrmarc import.

We don't have our holdings info yet, but we'll want to get URLs from there as well, if we have them.

If you go further with this work, I'd be very interested in your code.

- Naomi

On Jul 31, 2008, at 10:27 AM, Lovins, Daniel wrote:

> Has anyone tried to exploit the MARC21 856 tag subfield $3 ("Materials specified") or $z ("Public note") in order to identify what type of URL is embedded there (cf. http://www.loc.gov/marc/856guide.html)? It seems these subfields are ignored in VuFind, and the link text is always "Get full text", even if it's only pointing to a table of contents, finding aid, publisher's web site, etc. For example, in the following list view excerpt, both records say "Get full text" when in fact they are referring to microfilm guides.
>
> <image001.png>
>
> By contrast, here's what the code generates in WebVoyage:
>
> <image003.png>
>
> Here is the underlying MARC of the second record:
>
> <image002.png>
>
> I'd be interested in collaborating on a solution (if there isn't one out there already).
>
> Also, has anyone tried to reveal within VuFind URLs that are derived from MARC *holdings* records? We at Yale sometimes use this technique for, say, a digitized rare book that serves as an "added copy", as it were, to the original artifact.
>
> Thanks for your help.
>
> Daniel

Naomi Dushay
nd...@st...
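The code Naomi mentions is not included in this thread. As a rough illustration of the general idea only (second indicator plus string matching), here is a minimal sketch. Everything in it is an assumption: the class name `UrlKindSketch`, the hint phrases, and the rule that anything not flagged as supplementary counts as full text. The indicator values themselves are the standard MARC 21 ones for the 856 field ('0' resource, '1' version of resource, '2' related resource).

```java
import java.util.Locale;

// Hypothetical classifier, for illustration only; not the solrmarc code.
class UrlKindSketch {

    enum Kind { FULLTEXT, SUPPLEMENTARY }

    // Made-up phrases to look for in 856 $3 / $z text. A real list would
    // have to come from local cataloging practice.
    private static final String[] SUPPL_HINTS = {
        "table of contents", "finding aid", "publisher", "guide"
    };

    // ind2: second indicator of the 856 field.
    // note: combined text of $3 and $z, or null if neither is present.
    static Kind classify(char ind2, String note) {
        // '2' means the URL points at a related resource, not the item itself.
        if (ind2 == '2') {
            return Kind.SUPPLEMENTARY;
        }
        String text = (note == null) ? "" : note.toLowerCase(Locale.ROOT);
        for (String hint : SUPPL_HINTS) {
            if (text.contains(hint)) {
                return Kind.SUPPLEMENTARY;
            }
        }
        // Assumption: everything else is treated as full text.
        return Kind.FULLTEXT;
    }

    public static void main(String[] args) {
        System.out.println(classify('0', null));                     // FULLTEXT
        System.out.println(classify('1', "Table of contents"));      // SUPPLEMENTARY
        System.out.println(classify(' ', "Guide to the microfilm")); // SUPPLEMENTARY
    }
}
```

In an indexer, the result could decide whether a URL goes into the `url` field or the `urlSuppl_store` field named in the message, and the same note text could be reused as link text in place of a fixed "Get full text".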
From: Lovins, D. <dan...@ya...> - 2008-07-31 17:45:03
Thanks, Naomi! Sorry I missed reading your post yesterday. I'd definitely be interested in seeing the test code and results you came up with. My experience is more in MARC than PHP (much less Solr), but I could try to contribute to (or at the very least admire) what you've already accomplished. :-)

-- Daniel

From: Naomi Dushay [mailto:nd...@st...]
Sent: Thursday, July 31, 2008 1:37 PM
To: vuf...@li...
Cc: Noh, Youn; Lovins, Daniel; Swanekamp, Joan; Arakawa, Steven; Beacom, Matthew
Subject: Re: [VuFind-General] Identifying *type* of URL in MARC 856