Hello all,

Over the past while, users and librarians here at NLI have reported a number of problems with author searching in our VuFind instance. Eoghan has asked me to summarise these problems and suggest some solutions, to kickstart a discussion about how author search in VuFind could be improved. Since this would involve fairly core changes to the way VuFind indexes and searches records, we'd prefer to get feedback from other developers before working on our own solution.

Summary of issues:

1) At present, only main authors with personal names (MARC 100, Main Entry - Personal Name) are indexed in the author field in Solr. Since a MARC record permits only one main entry, second authors are relegated to the 700 field and thus to the author2 field in Solr. The 700 field (Added Entry - Personal Name) is also used for other contributors such as illustrators, donors, etc. The nature of this relationship is typically recorded in subfield $e, but second authors generally have no $e at all. This means that second authors do not receive query boosting and are effectively ranked the same in results as donors, illustrators, etc. Similarly, where the main entry is a corporate name or meeting name (MARC 110, 111), it is indexed in author2 rather than author. The problem also carries over into faceting: since only main authors are used for faceting, it is not possible to facet by corporate name or by second author.

2) When searching for authors, users who enter only an initial for the forename (e.g. "Lee, J." for Joseph Lee) get no results. This is because Solr has no tokens for the initials: "Lee, Joseph" is indexed as "lee" and "joseph", so the query term "j" has nothing to match.
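
A simplified illustration of why the initial fails to match. It assumes a much cruder analyzer than Solr's (lowercase, split on non-word characters) and that every query term must match; the real analysis chain for VuFind's author field may differ, so treat this as a sketch of the idea only.

```python
import re


def tokens(text: str) -> set:
    # Crude stand-in for Solr's analysis chain (an assumption, not VuFind's
    # actual author field config): lowercase, split on non-word characters.
    return {t for t in re.split(r"\W+", text.lower()) if t}


def matches_all(query: str, indexed: str) -> bool:
    # Assumes every query token must appear in the indexed field (AND).
    return tokens(query) <= tokens(indexed)


indexed_name = "Lee, Joseph"
print(sorted(tokens(indexed_name)))              # ['joseph', 'lee']
print(sorted(tokens("Lee, J.")))                 # ['j', 'lee']
print(matches_all("Lee, Joseph", indexed_name))  # True
print(matches_all("Lee, J.", indexed_name))      # False: no "j" token indexed
```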

Suggested solutions:

  1. Add 110 and 111 to the author field in marc.properties. This would have the effect of weighting corporate authors and meetings at the same level as personal names.

  2. A BeanShell script could be written to distinguish between different types of 700 entries based on $e, e.g.:

    · When 700 $e is blank or contains a value denoting authorship, index the name in the Solr author field.
    · When 700 $e contains a value denoting some other contribution (e.g. illustrator), index it in author2.
    · When 700 $e contains a value unrelated to authorship (e.g. donor), don't index it as an author, but possibly index it elsewhere.

This would require making the author field multi-valued, which would presumably have knock-on effects on both the PHP logic and the Smarty templates, and would also require tweaking the search weightings. The script could use the LOC relator terms/codes [1] as a basis, but should also be able to look up a user-specified list of terms/codes.

  3. A .bsh script or regex-based processing in Solr could be used to derive additional forms of names (e.g. "Lee, Joseph" -> "Lee" + "J") and index the results in a new Solr field or in author_additional.
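
The 700 $e classification and the name processing suggested above would live in BeanShell (.bsh) or Solr; the logic is sketched here in Python only to keep it short and testable. Everything in it is hypothetical: the function names, the tiny relator subset (a real script would load the full LOC list [1] plus a user-specified list), and the choice to treat editors and translators as author2 are assumptions, not existing VuFind or SolrMarc code. Note also that MARC records relator codes separately in 700 $4, so a real script would probably check $4 as well as $e.

```python
import re
from typing import FrozenSet, Optional

# Tiny illustrative subset of LOC relator terms and codes; a real script
# would load the full list plus any locally configured terms/codes.
AUTHOR_TERMS = frozenset({"author", "aut"})
CONTRIBUTOR_TERMS = frozenset({"illustrator", "ill", "translator", "trl", "editor", "edt"})


def classify_700(relator: Optional[str],
                 author_terms: FrozenSet[str] = AUTHOR_TERMS,
                 contributor_terms: FrozenSet[str] = CONTRIBUTOR_TERMS) -> Optional[str]:
    """Map a 700 relator term/code to a (hypothetical) target Solr field."""
    # Normalise case and trailing punctuation, e.g. "Illustrator." -> "illustrator".
    term = (relator or "").strip().lower().rstrip(".,;:")
    if not term or term in author_terms:
        return "author"    # blank $e or an authorship term
    if term in contributor_terms:
        return "author2"   # some other contribution, e.g. illustrator
    return None            # e.g. donor: not indexed as an author


def initial_form(name: str) -> Optional[str]:
    """Turn an inverted personal name ($a only) into an initials form."""
    name = name.strip().rstrip(",.")   # headings often end in "," before $d
    if "," not in name:
        return None                    # not in "Surname, Forenames" form
    surname, forenames = (part.strip() for part in name.split(",", 1))
    parts = [p for p in re.split(r"\W+", forenames) if p]
    if not parts:
        return None
    return surname + ", " + " ".join(p[0] + "." for p in parts)


print(classify_700(""))                       # author
print(classify_700("Illustrator."))           # author2
print(classify_700("donor"))                  # None
print(initial_form("Lee, Joseph"))            # Lee, J.
print(initial_form("Lee, Joseph Michael,"))   # Lee, J. M.
```

Indexing the output of initial_form in author_additional (or a new field) would give the record a "j" token, so a query for "Lee, J." could then match, assuming that field is analysed much like author. Returning None from classify_700 leaves it to the calling script to decide whether non-author names such as donors are indexed somewhere else.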

Looking forward to hearing from you,

Ronan McHugh

National Library of Ireland

[1] http://www.loc.gov/marc/relators/relaterm.html