Yes, the main thing is to decide on the best and least disruptive way forward. We're happy to wait for 2.0 for this to be fixed, but it certainly seems necessary, especially as we move away from exclusively indexing MARC/Library data.

We made the 110/111 change locally for our last release without any side-effects. Here, for example, you can see corporate authors in the facets: http://catalogue.nli.ie/Search/Results?lookfor=ireland&type=AllFields&submit=FIND

A particularly nasty example of the 700 problem was actually reported to us directly by an author: excluding the first result where he is the "Main Author", when searching for "Paul Gorry" as an Author, the book "Tracing Irish Ancestors" is ranked as the 6th book even though "Paul Gorry" is an equal co-author of the book: http://catalogue.nli.ie/Search/Results?lookfor=paul+gorry&type=Author. This also leads to labelling problems, where co-authors are unfairly relegated to "Other Authors" (in trunk) or "Contributors" (in our catalogue).

We'll open a Jira ticket and perhaps look at this when moving to 2.0. Do you have any objections to making "author" multi-valued in 2.0?


On 29 March 2012 15:14, Demian Katz <demian.katz@villanova.edu> wrote:

I think that these are legitimate problems that are worth addressing.  The limitation of “single main author,” in particular, has come to my attention as I try to make VuFind’s code more uniform – limitations of the MARC-inspired Solr schema are hard to reconcile with other systems like Summon which make different assumptions.


That being said, I’m not eager to dramatically change the Solr schema at this moment in time – since we’re on the cusp of VuFind 2.0, I’m trying to avoid disruptive changes to the code base until after the existing logic has been ported to the new architecture.  I’d rather redesign this once rather than twice.  We can certainly start discussing this, and it’s probably worth opening a JIRA ticket to collect feedback, but I would prefer to hold off a little longer on implementation.  Of course, the fact that I’m reluctant to do the work right now doesn’t mean that I’m going to stop anyone else who feels like contributing a patch!


The initial option is a significantly less disruptive change and might be a good place to start work.  I wonder if we could achieve this through a custom analysis chain and copyField in Solr in order to avoid having to change anything in the import process….


If you’re going to be on next week’s developers call, feel free to bring this up if you want to discuss some of these issues in real-time.


- Demian


From: Ronan McHugh [mailto:rmchugh@nli.ie]
Sent: Thursday, March 29, 2012 9:43 AM
To: vufind-tech@lists.sourceforge.net
Subject: [VuFind-Tech] Some discussion re issues with author searches


Hello all,


Over the past while, users and librarians here at NLI have given us some feedback about problems with author searches in our Vufind instance. Eoghan has asked me to summarise these problems and suggest some solutions in order to kickstart a discussion about how to improve author search in Vufind. Since this would involve relatively core changes to the way that Vufind does search, we'd prefer to have some feedback from other developers before working on our own solution.


Summary of issues:


1)  At present only Main Authors - Personal Name (MARC 100) are indexed in the author field in Solr. Since MARC records only permit one main author, this has the disadvantage of relegating second authors to the 700 field and thus the author2 field in Solr. The 700 field (Added Entry - Personal Name) is the same field used for other contributors such as illustrators, donors etc. This additional relationship information is typically defined in the $e subfield, although second authors will not receive an entry in $e. This means that second authors will not receive query boosting and will effectively be ranked the same in results as donors, illustrators etc. Similarly, where main authors are Corporate Names or Meeting Names (MARC 110, 111), they will be indexed as author2 in Solr instead of author. This problem also carries over into faceting. Since only main authors are used in faceting, it is not possible to facet by Corporate Name or second author.


2)  When searching for authors, users who enter only the initial for the first name, e.g. "Lee, J." for Joseph Lee will not receive any results. This is because Solr doesn't have any tokens for the initials.


Suggested Solutions:


  1. Add 110, 111 to author in marc.properties. This will have the effect of weighting corporate authors / meetings on the same level as personal names.


  2. A beanshell script could be written to distinguish between different types of 700 field entries, e.g.:

·      When 700$e is blank or contains a value denoting authorship, index in the Solr author field

·      When 700$e contains a value denoting contribution (e.g. illustrator), index as author2

·      When 700$e contains other values not related to authorship (e.g. donor), don't index as an author but possibly index elsewhere


This would require making author multi-valued which presumably would have a knock-on effect for both PHP logic and Smarty templates, and would require tweaking the search weightings. The script could use the LOC relator terms/codes [1] as a basis, but should be able to lookup a user-specified list of terms/codes too.
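
As a rough illustration of the 700$e rules above, here is a minimal Python sketch. It is not VuFind or SolrMarc code: the function name classify_added_entry and the two term sets are hypothetical, and the terms listed are only a small sample of the LOC relator terms that a real list would be built from. The term lists are passed as parameters so that a site can supply its own.

```python
# Illustrative sample term lists (assumed, not a complete or official mapping).
AUTHOR_TERMS = {"author"}
CONTRIBUTOR_TERMS = {"illustrator", "editor", "translator"}


def classify_added_entry(relator, author_terms=AUTHOR_TERMS,
                         contributor_terms=CONTRIBUTOR_TERMS):
    """Pick a Solr field for a 700 entry from its $e value.

    Returns "author", "author2", or None (do not index as an author).
    """
    # Normalise: missing $e becomes "", and MARC-style trailing
    # punctuation such as "illustrator." is stripped.
    term = (relator or "").strip().strip(".,").lower()
    if not term or term in author_terms:
        return "author"      # blank $e or explicit authorship
    if term in contributor_terms:
        return "author2"     # contribution, e.g. illustrator
    return None              # unrelated to authorship, e.g. donor
```

A locally maintained list could then be passed in, e.g. classify_added_entry("photographer", contributor_terms={"photographer"}).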


  3. A .bsh script or Solr regex script could be written to do some additional processing of names (e.g. Lee, Joseph -> Lee + J) and index the results in a new Solr field or in author_additional.
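
To make the name processing concrete, here is a minimal Python sketch of the kind of transformation meant. It assumes names arrive in "Surname, Forenames" form and produces an extra variant with the forenames reduced to initials, which could then be indexed alongside the full name. The function name initial_variants is hypothetical, and a real script would also need to handle dates, titles and other MARC subfields.

```python
def initial_variants(name):
    """Return the name plus a variant with forenames reduced to initials."""
    surname, comma, forenames = name.partition(",")
    surname = surname.strip()
    # Drop stray spaces and trailing commas, then split into forenames.
    parts = forenames.strip(" ,").split()
    if not comma or not parts:
        # No forenames to abbreviate (e.g. a single-word name).
        return [surname] if surname else []
    full = f"{surname}, {' '.join(parts)}"
    initials = " ".join(p[0].upper() + "." for p in parts)
    short = f"{surname}, {initials}"
    # Avoid a duplicate when the name already consists of initials.
    return [full] if short == full else [full, short]
```

With both variants indexed, a search for "Lee, J." would have a token to match against a record for "Lee, Joseph".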


Looking forward to hearing from you,


Ronan McHugh

National Library of Ireland


[1] http://www.loc.gov/marc/relators/relaterm.html




