From: Tod O. <to...@uc...> - 2012-08-13 20:36:12
We're partway to having more than one field per record show up in the title browse, but we're getting stuck. We've updated the Solr schema and the SolrMarc import to bring in the multiple browsable titles in a record. But if a record *has* multiple title browse fields, the PrintBrowseHeadings code does not print out any of the titles for that record. So multiple title_browse fields go into the Solr doc and indexes, but they do not make it out of the PrintBrowseHeadings code and into the SQLite db.

Here are the details:

We are currently experimenting with this under VuFind 1.3, but I'm not certain of the exact date of the browse index code.

We've added the following to schema.xml:

    <field name="title_browse" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="title_browse_sort" type="string" indexed="true" stored="true" multiValued="true"/>

And we've added the following to marc_local.properties:

    title_browse_sort = 210ab:211a:212a:214a:242abchnp:245abcdefghknps:246abfghnp:247abfghnp:490av:740ahnp:780bcst:785bcst:787bcst:840ahv:844a
    title_browse = 210ab:211a:212a:214a:242abchnp:245abcdefghknps:246abfghnp:247abfghnp:490av:740ahnp:780bcst:785bcst:787bcst:840ahv:844a

And now we really do get multiple values in the Solr document, for example:

    <arr name="title_browse">
      <str>Internationaal seinboek / uitgegeven op last van de Ministers van Verkeer en Waterstaat en van Defensie ; zoals vastgesteld door de Internationale Maritieme Consultatieve Organisatie.</str>
      <str>Internationaal seinboek 1969.</str>
    </arr>
    <arr name="title_browse_sort">
      <str>Internationaal seinboek / uitgegeven op last van de Ministers van Verkeer en Waterstaat en van Defensie ; zoals vastgesteld door de Internationale Maritieme Consultatieve Organisatie.</str>
      <str>Internationaal seinboek 1969.</str>
    </arr>

(Yes, we're not normalizing the contents of title_browse_sort yet; that will come later, once the first steps are working.)

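For anyone who wants to repeat the check on the Solr side, here is a minimal sketch, assuming the record has been saved as Solr XML in the shape quoted above. It lists the values of title_browse and title_browse_sort for one document and reports whether the two fields carry the same number of values. The function names, the sample id, and the abridged title are all hypothetical, and a count mismatch is only one thing that might be worth ruling out, not a diagnosis of the PrintBrowseHeadings behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the Solr XML quoted above.
# The id and the abridged first title are made up for illustration.
SAMPLE_DOC = """
<doc>
  <str name="id">example-1</str>
  <arr name="title_browse">
    <str>Internationaal seinboek / uitgegeven op last van de Ministers.</str>
    <str>Internationaal seinboek 1969.</str>
  </arr>
  <arr name="title_browse_sort">
    <str>Internationaal seinboek / uitgegeven op last van de Ministers.</str>
    <str>Internationaal seinboek 1969.</str>
  </arr>
</doc>
"""


def field_values(doc_xml, field):
    """Return every value of `field` in one Solr <doc>, given as XML text."""
    doc = ET.fromstring(doc_xml.strip())
    values = []
    for node in doc:
        if node.get("name") != field:
            continue
        if node.tag == "arr":
            # Multi-valued field: one child element per value.
            values.extend((child.text or "").strip() for child in node)
        else:
            # Single-valued field: the element text is the value.
            values.append((node.text or "").strip())
    return values


def check_pairing(doc_xml, value_field="title_browse",
                  sort_field="title_browse_sort"):
    """Return (value count, sort count, whether the counts match)."""
    values = field_values(doc_xml, value_field)
    sorts = field_values(doc_xml, sort_field)
    return len(values), len(sorts), len(values) == len(sorts)


if __name__ == "__main__":
    for title in field_values(SAMPLE_DOC, "title_browse"):
        print(title)
    print(check_pairing(SAMPLE_DOC))
```

Running this over a handful of records that do and do not appear in the browse output would at least confirm whether the missing ones share anything on the Solr side.
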
So far, so good: the different forms of browse titles are in the Solr document. But when we run PrintBrowseHeadings, any record with a multi-valued title_browse is omitted from the output.

We've tweaked the index-alphabetic-browse.sh script to use our locally-defined field. Here is how the build_browse shell function and PrintBrowseHeadings are invoked, according to the bash output:

    build_browse title title_browse 1 '-Dbibleech=StoredFieldLeech -Dsortfield=title_browse_sort'

    java -Dbibleech=StoredFieldLeech -Dsortfield=title_browse_sort -Dvaluefield=title_browse -Dfile.encoding=UTF-8 -Dfield.preferred=heading -Dfield.insteadof=use_for -cp browse-indexing.jar PrintBrowseHeadings ../solr/biblio/index title_browse title.tmp

We've picked through both the resulting SQLite database and the .tmp file that it is generated from to confirm that none of the records with multiple title_browse values are output.

What we have not figured out is *why*. The topic browse seems to have no problem when there are multiple occurrences of topic, so it's a bit puzzling that the title browse should have a problem when there are multiple occurrences of title_browse.

We have been reading the most recent version of the nla-browse-handler code, but I'm not yet following the Leech and Lucene code well enough to be certain why multiple values of title_browse are a problem. It seems like it should be a simple loop over all instances of the title_browse field, but I'm obviously overlooking something.

If anyone has any ideas about where we should look, how to get some debugging output, or how else to proceed, I'd be grateful.

Thanks,

-Tod

Tod Olson <to...@uc...>
Systems Librarian
University of Chicago Library

On Jun 29, 2012, at 12:01 PM, Demian Katz wrote:

> If you look at the index-alphabetic-browse.sh script, you'll see that the title browse is generated using the title_fullStr field, with title_sort used for determining the sort order.
> Since these are single-valued fields and should not be changed, I would recommend taking this approach:
>
> 1.) Add two new fields to the Solr schema (or use the dynamic field suffixes) -- both should be multi-valued strings. One should be for full titles, one should be for sort values.
>
> 2.) Update your import rules to populate these fields as you desire. You may need to write a custom "getSortableTitle" routine to account for the 7xx fields you are interested in -- I think the built-in one may be hard-coded to use 245, though my memory could be wrong.
>
> 3.) Update the alphabrowse generation script to use your new fields instead of the default ones for building the title browse.
>
> 4.) Reindex everything, then regenerate your browse indexes.
>
> Not trivial, but probably within the realm of possibility. Let me know if you have questions.
>
> - Demian
>
> From: Sean F [str...@ya...]
> Sent: Friday, June 29, 2012 12:26 PM
> To: vuf...@li...
> Subject: [VuFind-Tech] vufind title alpha browse
>
> For the title alpha browse, Vufind builds the browse index from the 245 only. We would like to include a few other title fields, such as the 740.
> Does anyone know how to add 740 field as well?
>
> Thank you