Glad to help, and as you say, these things are often helpful learning experiences!

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tshepard@ll.mit.edu]
Sent: Wednesday, July 02, 2014 3:23 PM
To: Demian Katz; Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Ah…

Problem solved!

We – okay, it was probably ME! – somehow deleted oclc_num from schema.xml.

I re-inserted it and my test record loaded just fine.

Thanks for all of your input and sorry for not catching this myself.

The upside of this experience is that I learned more about vufind’s inner workings.

Thom

From: Demian Katz [mailto:demian.katz@villanova.edu]
Sent: Wednesday, July 02, 2014 3:06 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

oclc_num is part of the standard VuFind import rules:

oclc_num = 035a, (pattern_map.oclc_num)

pattern_map.oclc_num.pattern_0 = \\([Oo][Cc][Oo][Ll][Cc]\\)[^0-9]*[0]*([0-9]+)=>\$1

pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)[ ]*[0-9]*=>\$1

pattern_map.oclc_num.pattern_2 = ocn[0]*([0-9]+).*=>\$1

pattern_map.oclc_num.pattern_3 = on[0]*([0-9]+).*=>\$1
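If you want to sanity-check those rules against raw 035$a values, here is a rough Python equivalent (a sketch only; SolrMarc's own matching semantics may differ in edge cases):

```python
import re

# The four pattern_map.oclc_num rules above, translated into Python regexes.
# Sketch for sanity-checking values only -- not SolrMarc itself.
PATTERNS = [
    r'\([Oo][Cc][Oo][Ll][Cc]\)[^0-9]*0*([0-9]+)',  # pattern_0
    r'ocm0*([0-9]+)[ ]*[0-9]*',                     # pattern_1
    r'ocn0*([0-9]+).*',                             # pattern_2
    r'on0*([0-9]+).*',                              # pattern_3
]

def extract_oclc_num(value):
    """Return the normalized OCLC number from a 035$a value, or None."""
    for pattern in PATTERNS:
        m = re.fullmatch(pattern, value)
        if m:
            return m.group(1)
    return None

print(extract_oclc_num('(OCoLC)874902700'))  # 874902700
print(extract_oclc_num('ocm58999172'))       # 58999172
print(extract_oclc_num('(Sirsi) a416471'))   # None -- not an OCLC pattern
```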

It’s also defined in the standard schema:

<field name="oclc_num" type="string" indexed="true" stored="true" multiValued="true" />

Did this somehow get lost in your copy?

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tshepard@ll.mit.edu]
Sent: Wednesday, July 02, 2014 2:59 PM
To: Demian Katz; Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Chris Miller, who knows much more about Solr than I do, reported this:

Thom, the Solr log has a bunch of these (from ~10:30 and ~13:40):

`org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'`

I don’t know where oclc_num is coming from, do you? – Thom

Assuming 416471 is the catkey for the record, here is the record itself:

*** DOCUMENT BOUNDARY ***

FORM=MARC

.000. |aam  0c

.001. |aocn874902700

.003. |aOCoLC

.005. |a20140530201152.0

.008. |a140327s2014    maua     b    001 0 eng d

.020.   |a9781608076994

.020.   |a1608076997

.035.   |a(Sirsi) a416471

.035.   |a(Sirsi) 31287005131616

.035.   |a(OCoLC)874902700

.035.   |a(Sirsi)31287005131616

.040.   |aVYR|beng|erda|cVYR|dOCLCO|dYDXCP|dBTCTA|dBDX|dUKMGB|dCDX|dLTSCA|dIXA

.050.  0|aTK5103.48325|b.K674 2014

.100. 1 |aKorhonen, Juha,|eauthor.

.245. 10|aIntroduction to 4G mobile communications /|cJuha Korhonen.

.264.  1|aBoston :|bArtech House,|c[2014]

.300.   |axi, 289 pages :|billustrations ;|c26 cm.

.336.   |atext|btxt|2rdacontent

.337.   |aunmediated|bn|2rdamedia

.338.   |avolume|bnc|2rdacarrier

.440.  0|aArtech House mobile communications series

.504.   |aIncludes bibliographical references and index.

.520.   |a"Juha Korhonen is a project manager within the Mobile Competence Center at ETSI, Sophia Antipolis, France. He earned his Ph.D. in telecommunications engineering from University of Cambridge in Cambridge, UK. Long Term Evolution (LTE) was originally an internal 3GPP name for a program to enhance the capabilities of 3G radio access networks. The nickname has now evolved to become synonymous with 4G. This book concentrates on 4G systems, also known as LTE-Advanced. Telecommunications engineers and students are provided with a history of these systems, along with an overview of a mobile telecommunications system. The overview addresses the components in the system as well as their function. This resource guides telecommunications engineers though many important aspects of 4G including the air interface physical layer, Radio Access Networks, and 3GPP standardization, to name a few" --|cProvided by publisher.

.650.  0|aLong-Term Evolution (Telecommunications)

.910.   |ajk

.994.   |aC0|bLIN

.590.   |anbl140606

.949.   |i31287005046830|hLIN

.596.   |a1

From: Demian Katz [mailto:demian.katz@villanova.edu]
Sent: Wednesday, July 02, 2014 2:24 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Yes, I would expect that to work as you say. So I guess I must return to my previous advice and suggest that you try direct writing and/or Jetty logging to see if you can find out the exact Solr error to help pinpoint what’s going wrong – either to confirm that 035 is the culprit (and we’re both missing something in the current config) or to discover that there’s some other unrelated problem at work.

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tshepard@ll.mit.edu]
Sent: Wednesday, July 02, 2014 2:19 PM
To: Demian Katz; Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

This is how we set up our id:

id = 035a, (pattern_map.id), first

then:

pattern_map.id.pattern_0 = \\(Sirsi\\)\\ a(.*)=>\$1

So using the example below, I wonder, if we have the following 035 fields:

.035.   |a(Sirsi) a360356

.035.   |a(Sirsi) 31287004800054

.035.   |a(OCoLC)58999172

.035.   |a31287004800054

shouldn’t the importer use the value “(Sirsi) a360356” and strip the “(Sirsi) a” prefix, leaving us with “360356”?
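To make my expectation concrete, here is a rough Python sketch of how I read that rule (just the regex logic, not SolrMarc itself):

```python
import re

# Rough Python approximation of the rule above (sketch only):
#   id = 035a, (pattern_map.id), first
#   pattern_map.id.pattern_0 = \(Sirsi\)\ a(.*)=>$1
ID_PATTERN = re.compile(r'\(Sirsi\) a(.*)')

fields_035a = [
    '(Sirsi) a360356',
    '(Sirsi) 31287004800054',
    '(OCoLC)58999172',
    '31287004800054',
]

# "first" should mean: stop at the first 035$a the pattern matches.
record_id = None
for value in fields_035a:
    m = ID_PATTERN.fullmatch(value)
    if m:
        record_id = m.group(1)
        break

print(record_id)  # 360356
```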

Thanks,

Thom

From: Demian Katz [mailto:demian.katz@villanova.edu]
Sent: Wednesday, July 02, 2014 2:10 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Trying to load multiple values into a single-valued field will definitely cause your import to fail – very likely the culprit. What does your line for dealing with 035s look like right now? Can you just add “first” on the end of it, or is it something more complicated that doesn’t support the “first” feature?

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tshepard@ll.mit.edu]
Sent: Wednesday, July 02, 2014 2:07 PM
To: Joe Atzberger
Cc: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net
Subject: Re: [VuFind-General] [VuFind-Tech] RDA 264

I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We’ve always had multiple 035s, but until now the vufind importer knew to use only the first one, or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple 035 fields preceded by (Sirsi), so we need to figure out how to tell the importer to choose just one. (I think – still testing!)

My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I’ve grappled with these in my XML harvests, but many of the successfully imported records contained these.

Thanks again. I’ll let you know when we know for sure and can fix this.

Thom

From: Joe Atzberger [mailto:joe@booksite.com]
Sent: Wednesday, July 02, 2014 1:48 PM
To: Shepard, Thomas - 1150 - MITLL
Cc: Tod Olson; vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net
Subject: Re: [VuFind-Tech] [VuFind-General] RDA 264

Well, for one thing, there is no such thing as a MARC 000 tag, right?

Certainly the most common import-buster for me is encoding.  Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations.  XML cannot directly contain certain control characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing.

For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here.  Bad, missing or duplicate IDs are another common tripping point when I'm working with low-quality data.

You might be able to get more info from Solr's logs about why that particular transaction was rejected.

--Joe

On Wed, Jul 2, 2014 at 12:29 PM, Shepard, Thomas - 1150 - MITLL <tshepard@ll.mit.edu> wrote:

Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import.

While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import).

I’ve looked at dozens of these failed records for some common denominator, but haven’t found one.

Here is a sample book record that failed to import into vufind.

*** DOCUMENT BOUNDARY ***

FORM=MARC

.000. |aam  0c

.001. |aocm58999172

.003. |aOCoLC

.005. |a20140530203825.0

.008. |a050304s2005    cc ab         001 0 eng

.010.   |a  2005284588

.020.   |a0596008651 (pbk.)

.035.   |a(Sirsi) a360356

.035.   |a(Sirsi) 31287004800054

.035.   |a(OCoLC)58999172

.035.   |a31287004800054

.040.   |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP

.050. 00|aGA139|b.M58 2005

.100. 1 |aMitchell, Tyler.

.245. 10|aWeb mapping illustrated /|cTyler Mitchell.

.264.  1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005]

.264.  4|c©2005

.300.   |axvi, 349 pages :|billustrations, maps ;|c24 cm

.336.   |atext|btxt|2rdacontent

.337.   |acomputer|bc|2rdamedia

.338.   |aonline resource|bcr|2rdacarrier

.500.   |aIncludes index.

.521.   |aEbook.

.650.  0|aWeb site development.

.910.   |aems

.994.   |aC0|bLIN

.590.   |anbl070323

.856. 41|uhttp://proquest.safaribooksonline.com/0596008651

.949.   |i31287004951048|hLIN

.596.   |a1

Does anything significant stick out?

(Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.)

Here is the error when I tried to import only the above record:

Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ...

Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio  -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc

INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.

INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties

INFO [main] (MarcImporter.java:784) -  Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update

INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc

ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request

at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434)

at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248)

at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)

at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313)

at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607)

at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at com.simontuffs.onejar.Boot.run(Boot.java:334)

at com.simontuffs.onejar.Boot.main(Boot.java:170)

ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********

INFO [main] (MarcImporter.java:617) -  Adding 0 of 1 documents to index

INFO [main] (MarcImporter.java:618) -  Deleting 0 documents from index

INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)

INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr

INFO [main] (MarcImporter.java:506) - Setting Solr closed flag

INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00

INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec

INFO [main] (MarcImporter.java:637) - Deleted 0 records

INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook

INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

Thom Shepard

From: Tod Olson [mailto:tod@uchicago.edu]
Sent: Tuesday, July 01, 2014 4:23 PM

To: Shepard, Thomas - 1150 - MITLL

Cc: Tod Olson; Demian Katz; vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net
Subject: Re: [VuFind-General] RDA 264

Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen.

1) import-marc.sh. For this just redirect the output from the import script to a file:

./import-marc.sh option option file file > import.log 2>&1

That will send both stdout and stderr to the file import.log, and you can see all of the messages there.

2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script:

JETTY_CONSOLE=jettyconsole.log ./vufind.sh start

There are ways to do the same stuff under Windows, but someone else would have to provide the syntax.

Best,

-Tod

On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tshepard@ll.mit.edu> wrote:

I believe it is 2.1.

And yes I see now that getpublishers.bsh DOES handle the 264 field.

Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow.

I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing.

Thanks,

Thom

From: Demian Katz [mailto:demian.katz@villanova.edu]
Sent: Tuesday, July 01, 2014 4:00 PM
To: Shepard, Thomas - 1150 - MITLL; vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net
Subject: RE: RDA 264

Which version of VuFind are you using? We’ve included 264 support since release 2.0; if you’re using a 2.x version, the problem isn’t simply missing support – it’s probably some more specific problem with the data. Feel free to send over a sample record if you’d like help troubleshooting. You can also do some experimentation on your own using the getpublishers.bsh BeanShell script if you wish. (I can provide more information on using BeanShell with the import tool if you haven’t done this before).

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tshepard@ll.mit.edu]
Sent: Tuesday, July 01, 2014 3:55 PM
To: vufind-tech@lists.sourceforge.net; vufind-general@lists.sourceforge.net
Cc: Demian Katz
Subject: RDA 264

We recently updated our book collection to accommodate RDA changes. After Backstage processed our book records, we re-imported them into our Symphony catalog, but then discovered that only about half of them could be imported into vufind.

I suspect that the cause is the RDA 264 field. It is used to store publisher data previously located in the 260 field. In addition, there is often a second 264 row that contains a copyright year (with the copyright sign).

The “Bad Request” errors I see during import are the same kind I’ve found when I’ve tried to import fields that did not exist.

Are there plans to update the vufind importer to include the 264 field and possibly others related to RDA?

I have successfully edited schema.xml to add non-marc fields for facets, but I’m not sure of the steps for adding marc fields.
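My current guess at the steps is two parallel edits – an import rule plus a matching schema declaration – roughly like the following (the field name example_note is made up for illustration); is that right?

```
# marc_local.properties: map MARC 500$a into a new Solr field
example_note = 500a

<!-- schema.xml: declare the matching field -->
<field name="example_note" type="string" indexed="true" stored="true" multiValued="true"/>
```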

Any help would be appreciated.

Thanks,

Thom Shepard