Folks,

Using code from the solrmarc project, I've done some (test-driven!) local coding for standard numbers: ISBN, ISSN, OCLC and LCCN.  I thought I would share the information, FWIW; I am happy to share the code as well if folks are interested.  I also have the algorithms and a lot of relevant additional information in a Stanford-only wiki, but I can presumably get a PDF version or something that I could pass around (or possibly cut and paste the wiki text into another wiki somewhere).

I believe the indexing code, not the UI code, should take care of massaging the data as necessary.  So stripping trailing text, prefixes and the like is done in my indexing code.

For ISBN and ISSN, our cataloging expert pointed out that we want to be as *inclusive* as possible for our users: when they are looking in *our* index, we should let a search match in as many cases as possible (maximizing "recall"!).   On the other hand, when we are using these numbers for retrieving external resources (e.g. Google Book Search), we want the numbers that are most likely to get us a correct answer.  These are two different needs, and they require two different fields:

<!-- isbn is for code to do external lookups by ISBN (e.g. Google Book Search) -->
<!-- TODO:  change isbn to isbn_store -->
<field name="isbn" type="string" indexed="false" stored="true" multiValued="true"/>
<!-- isbnUser_search is for end users to search our index via an ISBN -->
<field name="isbnUser_search" type="string" indexed="true" stored="false" multiValued="true"/>
<!-- issn is for code to do external lookups by ISSN -->
<!-- TODO:  change issn to issn_store -->
<field name="issn" type="string" indexed="false" stored="true" multiValued="true"/>
<!-- issnUser_search is for end users to search our index via an ISSN -->
<field name="issnUser_search" type="string" indexed="true" stored="false" multiValued="true"/>

ISBN
------
a. multiple ISBN in a single marc bib record are allowed.
b. 10 or 13 digit number (last digit may also be "X").
c. Strip any following text.
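
For illustration, rules b and c could look something like the following Python sketch. This is not the actual indexing code; the name extract_isbn and the regular expression are my own assumptions. In particular, it assumes that the "X" is only allowed as the last character of a 10-character ISBN and that hyphenated ISBNs do not need to be handled.

```python
import re

# Assumption: 13 digits, or 9 digits plus a final digit or "X".
# The lookahead rejects longer digit runs such as 11-digit numbers.
ISBN_AT_START = re.compile(r"^\s*(\d{13}|\d{9}[\dXx])(?!\d)")


def extract_isbn(subfield_value):
    """Return the ISBN that starts the value, without following text, or None."""
    match = ISBN_AT_START.match(subfield_value)
    return match.group(1).upper() if match else None
```

A value such as "0521426316 (pbk.)" would come back as "0521426316", while a value that does not start with an ISBN string would be skipped (None).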

isbnUser_search field (for end users to search our index):
----
1.  all 020 subfields a starting with an ISBN string - strip following text
2.  AND  all 020 subfields z starting with an ISBN string - strip following text

isbn (for external lookups)
----
1.  all 020 subfields a starting with an ISBN string - strip following text
2.  if none,  all 020 subfields z starting with an ISBN string - strip following text
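
The difference between the two fields is just "a AND z" versus "a, falling back to z". A minimal Python sketch, assuming the ISBN strings have already been extracted and cleaned from each 020 subfield (the function name isbn_field_values is hypothetical):

```python
def isbn_field_values(isbns_from_a, isbns_from_z):
    """Build the values for both ISBN fields from cleaned 020 a and z ISBNs."""
    # isbnUser_search: everything from subfield a AND subfield z (recall).
    user_search = list(isbns_from_a) + list(isbns_from_z)
    # isbn: subfield a values; subfield z only if there are no a values.
    external = list(isbns_from_a) if isbns_from_a else list(isbns_from_z)
    return {"isbnUser_search": user_search, "isbn": external}
```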

ISSN
-----
a. multiple ISSN in a single marc bib record are allowed.
b. 4 digit number followed by hyphen followed by 4 digit number (last digit may also be "X").
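
As a rough illustration of rule b, here is a Python sketch of an ISSN pattern. It is not the pattern from our properties file; the name find_issn is hypothetical, and I am assuming the ISSN may appear anywhere in the subfield value rather than only at the start.

```python
import re

# Four digits, a hyphen, three digits, then a final digit or "X".
# The lookarounds keep the match from sitting inside a longer digit run.
ISSN_PATTERN = re.compile(r"(?<!\d)\d{4}-\d{3}[\dXx](?!\d)")


def find_issn(subfield_value):
    """Return the first ISSN found in the value, or None if there is none."""
    match = ISSN_PATTERN.search(subfield_value)
    return match.group(0).upper() if match else None
```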

issnUser_search field (for end users to search our index):
----
   I was able to implement this using a pattern map in our vufind.properties file.

1.  all 022 subfields a with ISSN
2.  AND  all 022 subfields "l" (letter "L") with ISSN
3.  AND  all 022 subfields m with ISSN
4.  AND  all 022 subfields y with ISSN
5.  AND  all 022 subfields z with ISSN

issn (for external lookups)
----
1.  all 022 subfields a with ISSN
2.  if none,  all 022 subfields z with ISSN
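
The ISSN selection can be sketched the same way, this time keyed by subfield code because more subfields are involved. This is only a sketch: issn_field_values is a hypothetical name, and the input is assumed to be a dict mapping a subfield code to the list of ISSNs already found in that subfield.

```python
SEARCH_SUBFIELDS = ["a", "l", "m", "y", "z"]


def issn_field_values(issns_by_subfield):
    """Build the values for both ISSN fields from ISSNs grouped by subfield."""
    # issnUser_search: ISSNs from all five subfields (recall).
    user_search = [
        issn
        for code in SEARCH_SUBFIELDS
        for issn in issns_by_subfield.get(code, [])
    ]
    # issn: subfield a values; subfield z only if there are no a values.
    external = issns_by_subfield.get("a") or issns_by_subfield.get("z", [])
    return {"issnUser_search": user_search, "issn": list(external)}
```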


OCLC and LCCN are not exposed to end users, so we want to use the data that is most likely to get us correct retrieval ("precision"!) in external resources, such as OCLC WorldCat or Google Book Search.   Moreover, since this data does not need to be searched in our catalog by our users, it is not imperative to index these fields, though we must store them.  Choosing to index these fields would enable staff searches on these numbers, if that is desired.

solr/conf/schema.xml:
<!-- lccn number for code to do external lookups -->
<field name="lccn_store" type="string" indexed="false" stored="true"/>
<!-- oclc number for google book search links and for oclc worldcat links -->
<field name="oclc_store" type="string" indexed="false" stored="true" multiValued="true"/>

OCLC:
------
a. multiple OCLC numbers in a single marc bib record are allowed.

1.  all 035 subfields a with *our local prefix* "(OCoLC-M)"
2.  if none, all 079 subfields a prefixed "ocm" or "ocn"
3.  if none of the above, all 035 subfields a prefixed "(OCoLC)"
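
The three-step fallback could be sketched in Python roughly as follows. The function name oclc_numbers is hypothetical, and stripping the prefix from the stored value is my assumption; the rules above only say which prefix selects a value.

```python
LOCAL_PREFIX = "(OCoLC-M)"
GENERIC_PREFIX = "(OCoLC)"


def oclc_numbers(values_035a, values_079a):
    """Return OCLC numbers from the first source in the list that has any."""
    # 1. 035 subfield a values with our local prefix.
    local = [v[len(LOCAL_PREFIX):].strip()
             for v in values_035a if v.startswith(LOCAL_PREFIX)]
    if local:
        return local
    # 2. 079 subfield a values prefixed "ocm" or "ocn".
    from_079 = [v[3:].strip()
                for v in values_079a if v.startswith(("ocm", "ocn"))]
    if from_079:
        return from_079
    # 3. 035 subfield a values with the generic "(OCoLC)" prefix.
    return [v[len(GENERIC_PREFIX):].strip()
            for v in values_035a if v.startswith(GENERIC_PREFIX)]
```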

LCCN:
-------
a. at most one per marc bib record.
b. Strip following text, but not prefixes.  (Not sure this is correct, but that's what I did.)
c. I was able to implement this using a pattern map in our vufind.properties file.

1. 010 subfield a.
2.  if none, 010 subfield z.
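
A Python sketch of the LCCN rules, for illustration only. It is not the pattern map from our properties file. The names and the regular expression are assumptions: the pattern assumes an optional alphabetic prefix of up to three letters followed by 8 to 10 digits, with anything after that treated as following text.

```python
import re

# Assumption: optional letters (the prefix, which is kept), optional
# spaces, then 8 to 10 digits. Whatever follows is dropped.
LCCN_AT_START = re.compile(r"^\s*([A-Za-z]{0,3}\s*\d{8,10})")


def extract_lccn(subfield_value):
    """Return the LCCN with its prefix but without following text, or None."""
    match = LCCN_AT_START.match(subfield_value)
    return match.group(1) if match else None


def choose_lccn(values_010a, values_010z):
    """Return at most one LCCN: from 010 subfield a, else from subfield z."""
    for value in list(values_010a) + list(values_010z):
        lccn = extract_lccn(value)
        if lccn is not None:
            return lccn
    return None
```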


Naomi Dushay
ndushay@stanford.edu