From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 18:19:57
This is how we set up our id:

id = 035a, (pattern_map.id), first

then:

pattern_map.id.pattern_0 = \\(Sirsi\\)\\ a(.*)=>$1

So using the example below, I wonder, if we have the following 035 fields:

.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054

shouldn’t the importer use the value “(Sirsi) a360356” and remove the “(Sirsi)” part, leaving us with “a360356”?

Thanks,
Thom

From: Demian Katz [mailto:dem...@vi...]
Sent: Wednesday, July 02, 2014 2:10 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vuf...@li...; vuf...@li...
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Trying to load multiple values into a single-valued field will definitely cause your import to fail, so that is very likely the culprit. What does your line for dealing with 035s look like right now? Can you just add “first” on the end of it, or is it something more complicated that doesn’t support the “first” feature?

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...]
Sent: Wednesday, July 02, 2014 2:07 PM
To: Joe Atzberger
Cc: vuf...@li...; vuf...@li...
Subject: Re: [VuFind-General] [VuFind-Tech] RDA 264

Thanks, Joe. Great advice!

I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We’ve always had multiple 035s, but until now the vufind importer knew to use only the first one or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple 035 fields preceded by (Sirsi), so we need to figure out how to tell the importer to choose just one. (I think – still testing!)

My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I’ve grappled with these in my XML harvests, but many of the successfully imported records contained these.

Thanks again. I’ll let you know when we know for sure and can fix this.

Thom

From: Joe Atzberger [mailto:jo...@bo...]
Sent: Wednesday, July 02, 2014 1:48 PM
To: Shepard, Thomas - 1150 - MITLL
Cc: Tod Olson; vuf...@li...; vuf...@li...
Subject: Re: [VuFind-Tech] [VuFind-General] RDA 264

Well, for one thing, there is no such thing as a MARC 000 tag, right?

Certainly the most common import-buster for me is encoding. Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations. XML cannot directly contain non-ascii characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing.

For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here. Bad, missing or duplicate IDs are another common tripping point when I'm working with low quality data.

You might be able to get more info from Solr's logs about why that particular transaction was rejected.

--Joe

On Wed, Jul 2, 2014 at 12:29 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote:

Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import). I’ve looked at dozens of these failed records for some common denominator, but haven’t found one.

Here is a sample book record that failed to import into vufind.

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocm58999172
.003. |aOCoLC
.005. |a20140530203825.0
.008. |a050304s2005 cc ab 001 0 eng
.010. |a 2005284588
.020. |a0596008651 (pbk.)
.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054
.040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP
.050. 00|aGA139|b.M58 2005
.100. 1 |aMitchell, Tyler.
.245. 10|aWeb mapping illustrated /|cTyler Mitchell.
.264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005]
.264. 4|c©2005
.300. |axvi, 349 pages :|billustrations, maps ;|c24 cm
.336. |atext|btxt|2rdacontent
.337. |acomputer|bc|2rdamedia
.338. |aonline resource|bcr|2rdacarrier
.500. |aIncludes index.
.521. |aEbook.
.650. 0|aDigital mapping.
.650. 0|aWeb site development.
.910. |aems
.994. |aC0|bLIN
.590. |anbl070323
.856. 41|uhttp://proquest.safaribooksonline.com/0596008651
.949. |i31287004951048|hLIN
.596. |a1

Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.)

Here is the error when I tried to import only the above record:

Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ...
Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc
INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.
INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties
INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update
INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc
ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
    at org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400)
    at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313)
    at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607)
    at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.simontuffs.onejar.Boot.run(Boot.java:334)
    at com.simontuffs.onejar.Boot.main(Boot.java:170)
ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********
INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index
INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index
INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)
INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr
INFO [main] (MarcImporter.java:506) - Setting Solr closed flag
INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00
INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec
INFO [main] (MarcImporter.java:637) - Deleted 0 records
INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook
INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

Thanks in advance,
Thom Shepard

From: Tod Olson [mailto:to...@uc...]
Sent: Tuesday, July 01, 2014 4:23 PM
To: Shepard, Thomas - 1150 - MITLL
Cc: Tod Olson; Demian Katz; vuf...@li...; vuf...@li...
Subject: Re: [VuFind-General] RDA 264

Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen.

1) import-marc.sh. For this just redirect the output from the import script to a file:

    ./import-marc.sh option option file file > import.log 2>&1

That will send both stdout and stderr to the file import.log, and you can see all of the messages there.

2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script:

    JETTY_CONSOLE=jettyconsole.log ./vufind.sh start

There are ways to do the same stuff under Windows, but someone else would have to provide the syntax.

Best,
-Tod

On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote:

I believe it is 2.1.
And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing.

Thanks,
Thom

From: Demian Katz [mailto:dem...@vi...]
Sent: Tuesday, July 01, 2014 4:00 PM
To: Shepard, Thomas - 1150 - MITLL; vuf...@li...; vuf...@li...
Subject: RE: RDA 264

Which version of VuFind are you using? We’ve included 264 support since release 2.0; if you’re using a 2.x version, the problem isn’t simply missing support – it’s probably some more specific problem with the data. Feel free to send over a sample record if you’d like help troubleshooting. You can also do some experimentation on your own using the getpublishers.bsh BeanShell script if you wish. (I can provide more information on using BeanShell with the import tool if you haven’t done this before).

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...]
Sent: Tuesday, July 01, 2014 3:55 PM
To: vuf...@li...; vuf...@li...
Cc: Demian Katz
Subject: RDA 264

We recently updated our book collection to accommodate RDA changes. After Backstage processed our book records, we re-imported them into our Symphony catalog, but then discovered that only about half of them can be imported into vufind. I suspect that the cause is the RDA 264 field. It is used to store publisher data previously located in the 260 field. In addition, there is often a second 264 row that contains a copyright year (with the copyright sign). The “Bad Request” errors I see during import are the same kind I’ve found when I’ve tried to import fields that did not exist.

Are there plans to update the vufind importer to include the 264 field and possibly others related to RDA?
I have successfully edited schema.xml to add non-MARC fields for facets, but I am not sure of the steps for adding MARC fields. Any help would be appreciated.

Thanks,
Thom Shepard
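
For anyone following the 035 question at the top of this thread, here is a rough Python sketch of how the pattern and the "first" option might interact on the sample record's four 035 values. It is not SolrMarc code: the function name map_035_values is made up for illustration, and it assumes that values which do not match the pattern are dropped and that "first" keeps the first surviving value. The regex is the one from pattern_map.id.pattern_0 with the .properties escaping removed; SolrMarc itself uses Java regexes, which may behave differently in edge cases.

```python
import re

# Regex from pattern_map.id.pattern_0 once the .properties escaping is removed:
# "\\(Sirsi\\)\\ a(.*)" becomes \(Sirsi\)\ a(.*)
SIRSI_PATTERN = re.compile(r"\(Sirsi\)\ a(.*)")


def map_035_values(values, first_only=True):
    """Hypothetical stand-in for the pattern map plus "first" (not SolrMarc code).

    Assumption: values that do not match the pattern are dropped.
    """
    mapped = []
    for value in values:
        match = SIRSI_PATTERN.search(value)
        if match:
            # "=>$1" in the pattern map: keep only the first capture group.
            mapped.append(match.group(1))
    return mapped[:1] if first_only else mapped


# The four 035 subfield a values from the sample record in this thread.
fields_035a = [
    "(Sirsi) a360356",
    "(Sirsi) 31287004800054",
    "(OCoLC)58999172",
    "31287004800054",
]

print(map_035_values(fields_035a))                    # with "first"
print(map_035_values(fields_035a, first_only=False))  # without "first"
```

Under these assumptions only "(Sirsi) a360356" matches, because the pattern requires a literal "a" after the space. Note that the capture group starts after that "a", so this reading yields "360356" rather than "a360356"; writing the group as (a.*) would keep the leading letter.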