From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-01 19:55:05
Attachments:
smime.p7s
|
We recently updated our book collection to accommodate RDA changes. After Backstage processed our book records, we re-imported them into our Symphony catalog, but then discovered that only about half of them can be imported into VuFind. I suspect that the cause is the RDA 264 field. It is used to store publisher data previously located in the 260 field. In addition, there is often a second 264 field that contains a copyright year (with the copyright sign). The "Bad Request" errors I see during import are the same kind I've seen when I've tried to import fields that did not exist.

Are there plans to update the VuFind importer to include the 264 field and possibly others related to RDA? I have successfully edited schema.xml to add non-MARC fields for facets, but I am not sure of the steps for adding MARC fields. Any help would be appreciated.

Thanks,
Thom Shepard
MIT Lincoln Lab 244 Wood St. Lexington, MA 01523 tsh...@ll... 781 981 0370 |
From: Demian K. <dem...@vi...> - 2014-07-01 19:59:52
|
Which version of VuFind are you using? We've included 264 support since release 2.0; if you're using a 2.x version, the problem isn't simply missing support - it's probably some more specific problem with the data. Feel free to send over a sample record if you'd like help troubleshooting. You can also do some experimentation on your own using the getpublishers.bsh BeanShell script if you wish. (I can provide more information on using BeanShell with the import tool if you haven't done this before). - Demian |
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-01 20:12:55
Attachments:
smime.p7s
|
I believe it is 2.1. And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing. Thanks, Thom |
From: Tod O. <to...@uc...> - 2014-07-01 20:23:05
|
Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen.

1) import-marc.sh. For this just redirect the output from the import script to a file:

    ./import-marc.sh option option file file > import.log 2>&1

That will send both stdout and stderr to the file import.log, and you can see all of the messages there.

2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script:

    JETTY_CONSOLE=jettyconsole.log ./vufind.sh start

There are ways to do the same stuff under Windows, but someone else would have to provide the syntax.

Best,
-Tod |
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-01 20:26:00
Attachments:
smime.p7s
|
Thanks, Tod. Yes, it is LINUX. I will try this tomorrow. Thom |
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 16:29:15
Attachments:
smime.p7s
|
Redirecting the output of my MARC imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. While all of our documents and archives records imported successfully from Symphony into VuFind, only half of our book catalog got in (over 31,000 book records failed to import). I've looked at dozens of these failed records for some common denominator, but haven't found one.

Here is a sample book record that failed to import into VuFind.

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocm58999172
.003. |aOCoLC
.005. |a20140530203825.0
.008. |a050304s2005 cc ab 001 0 eng
.010. |a 2005284588
.020. |a0596008651 (pbk.)
.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054
.040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP
.050. 00|aGA139|b.M58 2005
.100. 1 |aMitchell, Tyler.
.245. 10|aWeb mapping illustrated /|cTyler Mitchell.
.264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005]
.264. 4|c©2005
.300. |axvi, 349 pages :|billustrations, maps ;|c24 cm
.336. |atext|btxt|2rdacontent
.337. |acomputer|bc|2rdamedia
.338. |aonline resource|bcr|2rdacarrier
.500. |aIncludes index.
.521. |aEbook.
.650. 0|aDigital mapping.
.650. 0|aWeb site development.
.910. |aems
.994. |aC0|bLIN
.590. |anbl070323
.856. 41|uhttp://proquest.safaribooksonline.com/0596008651
.949. |i31287004951048|hLIN
.596. |a1

Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.)

Here is the error when I tried to import only the above record:

Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ...
Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc
INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.
INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties
INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update
INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc
ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
    at org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400)
    at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313)
    at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607)
    at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.simontuffs.onejar.Boot.run(Boot.java:334)
    at com.simontuffs.onejar.Boot.main(Boot.java:170)
ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********
INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index
INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index
INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)
INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr
INFO [main] (MarcImporter.java:506) - Setting Solr closed flag
INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00
INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec
INFO [main] (MarcImporter.java:637) - Deleted 0 records
INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook
INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

Thanks in advance,
Thom Shepard |
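The "Unable to index record ocm58999172 (record count 1) -- Bad Request" line in the log above is enough to pull the failing record IDs out of a redirected import.log. What follows is only a minimal sketch, not part of VuFind or SolrMarc: the function name failed_record_ids is hypothetical, and the pattern assumes the message keeps exactly the wording shown in this thread.

```python
import re

# Pattern taken from the log line quoted in this thread, e.g.
# "ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request"
# Assumption: other SolrMarc versions may word this message differently.
FAILED_RECORD = re.compile(r"Unable to index record (\S+) \(record count \d+\)")


def failed_record_ids(log_text):
    """Return the IDs of records reported as unindexable, de-duplicated, in first-seen order."""
    matches = (match.group(1) for match in FAILED_RECORD.finditer(log_text))
    # dict.fromkeys keeps insertion order while dropping repeated IDs.
    return list(dict.fromkeys(matches))
```

To use it, read the file produced by the redirect (for example import.log), pass its text to failed_record_ids, and write the result out one ID per line so the list can be compared against the export from the ILS.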
From: Demian K. <dem...@vi...> - 2014-07-02 17:18:33
|
Recent discussion here reminded me of an issue with SolrMarc: right now, we index over HTTP by default due to limitations of the "direct write to index" technique... but doing a direct write tends to give better error messages. It might be worth trying to reindex using direct write mode to see if that gives better clues about what's going on. You can do this simply by editing your local/import/import.properties file and changing the REMOTE setting to the full path to your biblio index. (I recommend switching back to REMOTE after completing this experiment, since direct writing can be problematic). If that doesn't work or isn't practical, did you also try looking at JETTY_CONSOLE output as Tod suggested? It's possible that there are more verbose error messages coming out of the Solr side of things than what's propagating back to SolrMarc over HTTP. - Demian |
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 17:36:45
Attachments:
smime.p7s
|
I think Ive discovered why these records are failing to import. The Backstage RDA clean-up added additional 035 fields to the records in our book collection. If I delete all but one 035 field, the failed records that I tested imported just fine. (When we export from Symphony, we need to put our catalog_key in the 035 field so that it becomes the ID field in vufind.) I can try your technique of the direct write mode to help get confirmation, but Im pretty sure this is the problem. Now to look for a solution! Thom From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 1:18 PM To: Shepard, Thomas - 1150 - MITLL; Tod Olson Cc: vuf...@li...; vuf...@li... Subject: RE: [VuFind-General] RDA 264 Recent discussion here reminded me of an issue with SolrMarc: right now, we index over HTTP by default due to limitations of the direct write to index technique but doing a direct write tends to give better error messages. It might be worth trying to reindex using direct write mode to see if that gives better clues about whats going on. You can do this simply by editing your local/import/import.properties file and changing the REMOTE setting to the full path to your biblio index. (I recommend switching back to REMOTE after completing this experiment, since direct writing can be problematic). If that doesnt work or isnt practical, did you also try looking at JETTY_CONSOLE output as Tod suggested? Its possible that there are more verbose error messages coming out of the Solr side of things than whats propagating back to SolrMarc over HTTP. - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 12:29 PM To: Tod Olson Cc: Demian Katz; vuf...@li...; vuf...@li... Subject: RE: [VuFind-General] RDA 264 Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. 
While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import). I've looked at dozens of these failed records for some common denominator, but haven't found one. Here is a sample book record that failed to import into vufind.

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocm58999172
.003. |aOCoLC
.005. |a20140530203825.0
.008. |a050304s2005 cc ab 001 0 eng
.010. |a 2005284588
.020. |a0596008651 (pbk.)
.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054
.040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP
.050. 00|aGA139|b.M58 2005
.100. 1 |aMitchell, Tyler.
.245. 10|aWeb mapping illustrated /|cTyler Mitchell.
.264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005]
.264. 4|c©2005
.300. |axvi, 349 pages :|billustrations, maps ;|c24 cm
.336. |atext|btxt|2rdacontent
.337. |acomputer|bc|2rdamedia
.338. |aonline resource|bcr|2rdacarrier
.500. |aIncludes index.
.521. |aEbook.
.650. 0|aDigital mapping.
.650. 0|aWeb site development.
.910. |aems
.994. |aC0|bLIN
.590. |anbl070323
.856. 41|uhttp://proquest.safaribooksonline.com/0596008651
.949. |i31287004951048|hLIN
.596. |a1

Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.)

Here is the error when I tried to import only the above record:

Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ...
Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc
INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.
INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties
INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update
INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc
ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
at org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56)
at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474)
at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400)
at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313)
at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607)
at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.simontuffs.onejar.Boot.run(Boot.java:334)
at com.simontuffs.onejar.Boot.main(Boot.java:170)
ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********
INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index
INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index
INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)
INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr
INFO [main] (MarcImporter.java:506) - Setting Solr closed flag
INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00
INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec
INFO [main] (MarcImporter.java:637) - Deleted 0 records
INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook
INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

Thanks in advance, Thom Shepard

From: Tod Olson [mailto:to...@uc...] Sent: Tuesday, July 01, 2014 4:23 PM To: Shepard, Thomas - 1150 - MITLL Cc: Tod Olson; Demian Katz; vuf...@li...; vuf...@li... Subject: Re: [VuFind-General] RDA 264

Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen.

1) import-marc.sh. For this just redirect the output from the import script to a file:

./import-marc.sh option option file file > import.log 2>&1

That will send both stdout and stderr to the file import.log, and you can see all of the messages there.

2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script:

JETTY_CONSOLE=jettyconsole.log ./vufind.sh start

There are ways to do the same stuff under Windows, but someone else would have to provide the syntax. Best, -Tod

On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote: I believe it is 2.1.
And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing. Thanks, Thom |
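A minimal sketch of how the failed record IDs could be pulled out of a redirected import log such as import.log. It assumes the failures are reported on lines shaped like the "Unable to index record ... (record count N)" message quoted in this thread; the function name and the sample text are only illustrative, and other SolrMarc versions may word the message differently.

```python
import re

# Matches the SolrMarc error line quoted in this thread, for example:
# ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
FAILED_RECORD = re.compile(r"Unable to index record (\S+) \(record count \d+\)")


def failed_record_ids(log_text):
    """Return the record IDs from every 'Unable to index record' line, in log order."""
    return FAILED_RECORD.findall(log_text)


if __name__ == "__main__":
    # Hypothetical two-line excerpt standing in for the contents of import.log.
    sample = (
        "INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.\n"
        "ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 "
        "(record count 1) -- Bad Request\n"
    )
    print(failed_record_ids(sample))
```

In practice the text would come from reading the redirected log file rather than from an inline string.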
From: Joe A. <jo...@bo...> - 2014-07-02 17:48:22
|
Well, for one thing, there is no such thing as MARC 000 tag, right? Certainly the most common import-buster for me is encoding. Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations. XML cannot directly contain non-ascii characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing. For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here. Bad, missing or duplicate IDs is another common tripping point when I'm working with low quality data. You might be able to get more info from Solr's logs about why that particular transaction was rejected. --Joe |
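The two checks Joe describes can be sketched roughly as follows. This assumes the field values are already available as Python strings; the function names are hypothetical. Note that XML 1.0 itself allows most non-ASCII text, so the sketch only flags characters outside the XML 1.0 character range, which includes the MARC delimiters 0x1D, 0x1E and 0x1F.

```python
import re

# Anything outside the XML 1.0 "Char" production: most C0 control characters
# (including the MARC delimiters 0x1D, 0x1E, 0x1F), surrogates, U+FFFE and U+FFFF.
XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def illegal_xml_chars(value):
    """Return (position, hex code point) for each character XML 1.0 does not allow."""
    return [(m.start(), hex(ord(m.group()))) for m in XML_ILLEGAL.finditer(value)]


def id_has_whitespace(record_id):
    """True if the ID contains any whitespace, including non-ASCII spaces such as U+00A0."""
    return any(ch.isspace() for ch in record_id)


if __name__ == "__main__":
    print(illegal_xml_chars("Web mapping\x1fillustrated"))  # a stray MARC subfield delimiter
    print(id_has_whitespace("ocm58999172"))
```

Running something like this over the failing records would at least rule encoding problems in or out before looking elsewhere.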
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 18:07:13
Attachments:
smime.p7s
|
Thanks, Joe. Great advice! I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We've always had multiple 035s, but until now the vufind importer knew to use only the first one or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple 035 fields preceded by (Sirsi), so we need to figure out how to tell the importer to choose just one. (I think – still testing!) My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I've grappled with these in my XML harvests, but many of the successfully imported records contained these. Thanks again. I'll let you know when we know for sure and can fix this. Thom |
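The "choose just one 035" rule can be illustrated with a small sketch. It works on a plain list of 035 $a strings rather than on real MARC records, and both the function name and the decision to strip the "(Sirsi)" prefix are assumptions made for illustration, not a description of what the VuFind importer actually does.

```python
def pick_sirsi_id(values, prefix="(Sirsi)"):
    """Return the first value that starts with the prefix, prefix removed; None if there is none."""
    for value in values:
        if value.startswith(prefix):
            # Stripping the prefix is an assumption about the desired ID format.
            return value[len(prefix):].strip()
    return None


if __name__ == "__main__":
    # The four 035 $a values from the sample record posted in this thread.
    fields_035a = [
        "(Sirsi) a360356",
        "(Sirsi) 31287004800054",
        "(OCoLC)58999172",
        "31287004800054",
    ]
    print(pick_sirsi_id(fields_035a))  # a360356
```

Taking the first match means a record with two "(Sirsi)" values still yields a single ID, which is the property a single-valued ID field needs.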
From: Demian K. <dem...@vi...> - 2014-07-02 18:10:14
|
Trying to load multiple values into a single-valued field will definitely cause your import to fail – very likely the culprit. What does your line for dealing with 035's look like right now? Can you just add "first" on the end of it, or is it something more complicated that doesn't support the "first" feature? - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:07 PM To: Joe Atzberger Cc: vuf...@li...; vuf...@li... Subject: Re: [VuFind-General] [VuFind-Tech] RDA 264 Thanks, Joe. Great advice! I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We've always had multiple 035s, but until now the vufind importer knew to use only the first one or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple 035 fields preceded by (Sirsi), so we need to figure out how to tell the importer to choose just one. (I think – still testing!) My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I've grappled with these in my XML harvests, but many of the successfully imported records contained these. Thanks again. I'll let you know when we know for sure and can fix this. Thom From: Joe Atzberger [mailto:jo...@bo...] Sent: Wednesday, July 02, 2014 1:48 PM To: Shepard, Thomas - 1150 - MITLL Cc: Tod Olson; vuf...@li...; vuf...@li... Subject: Re: [VuFind-Tech] [VuFind-General] RDA 264 Well, for one thing, there is no such thing as MARC 000 tag, right? Certainly the most common import-buster for me is encoding. Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations. 
XML cannot directly contain non-ASCII characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing.

For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here. Bad, missing or duplicate IDs are another common tripping point when I'm working with low-quality data.

You might be able to get more info from Solr's logs about why that particular transaction was rejected.

--Joe

On Wed, Jul 2, 2014 at 12:29 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote:

Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import). I’ve looked at dozens of these failed records for some common denominator, but haven’t found one.

Here is a sample book record that failed to import into vufind.

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocm58999172
.003. |aOCoLC
.005. |a20140530203825.0
.008. |a050304s2005 cc ab 001 0 eng
.010. |a 2005284588
.020. |a0596008651 (pbk.)
.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054
.040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP
.050. 00|aGA139|b.M58 2005
.100. 1 |aMitchell, Tyler.
.245. 10|aWeb mapping illustrated /|cTyler Mitchell.
.264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005]
.264. 4|c©2005
.300. |axvi, 349 pages :|billustrations, maps ;|c24 cm
.336. |atext|btxt|2rdacontent
.337. |acomputer|bc|2rdamedia
.338. |aonline resource|bcr|2rdacarrier
.500. |aIncludes index.
.521. |aEbook.
.650. 0|aDigital mapping.
.650. 0|aWeb site development.
.910. |aems
.994. |aC0|bLIN
.590. |anbl070323
.856. 41|uhttp://proquest.safaribooksonline.com/0596008651
.949. |i31287004951048|hLIN
.596. |a1

Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.)

Here is the error when I tried to import only the above record:

Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ...
Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc
INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.
INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties
INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update
INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc
ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
    at org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400)
    at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313)
    at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607)
    at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.simontuffs.onejar.Boot.run(Boot.java:334)
    at com.simontuffs.onejar.Boot.main(Boot.java:170)
ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********
INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index
INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index
INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)
INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr
INFO [main] (MarcImporter.java:506) - Setting Solr closed flag
INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00
INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec
INFO [main] (MarcImporter.java:637) - Deleted 0 records
INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook
INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

Thanks in advance,
Thom Shepard

From: Tod Olson [mailto:to...@uc...]
Sent: Tuesday, July 01, 2014 4:23 PM
To: Shepard, Thomas - 1150 - MITLL
Cc: Tod Olson; Demian Katz; vuf...@li...; vuf...@li...
Subject: Re: [VuFind-General] RDA 264

Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen.

1) import-marc.sh.
For this just redirect the output from the import script to a file:

./import-marc.sh option option file file > import.log 2>&1

That will send both stdout and stderr to the file import.log, and you can see all of the messages there.

2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script:

JETTY_CONSOLE=jettyconsole.log ./vufind.sh start

There are ways to do the same stuff under Windows, but someone else would have to provide the syntax.

Best,
-Tod

On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote:

I believe it is 2.1. And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing.

Thanks,
Thom

From: Demian Katz [mailto:dem...@vi...]
Sent: Tuesday, July 01, 2014 4:00 PM
To: Shepard, Thomas - 1150 - MITLL; vuf...@li...; vuf...@li...
Subject: RE: RDA 264

Which version of VuFind are you using? We’ve included 264 support since release 2.0; if you’re using a 2.x version, the problem isn’t simply missing support; it’s probably some more specific problem with the data. Feel free to send over a sample record if you’d like help troubleshooting. You can also do some experimentation on your own using the getpublishers.bsh BeanShell script if you wish. (I can provide more information on using BeanShell with the import tool if you haven’t done this before).

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...]
Sent: Tuesday, July 01, 2014 3:55 PM
To: vuf...@li...; vuf...@li...
Cc: Demian Katz
Subject: RDA 264

We recently updated our book collection to accommodate RDA changes. After Backstage processed our book records, we re-imported them into our Symphony catalog, but then discovered that only about half of them can be imported into vufind.

I suspect that the cause is the RDA 264 field. It is used to store publisher data previously located in the 260 field. In addition, there is often a second 264 row that contains a copyright year (with the copyright sign). The “Bad Request” errors I see during import are the same kind I’ve found when I’ve tried to import fields that did not exist.

Are there plans to update the vufind importer to include the 264 field and possibly others related to RDA? I have successfully edited schema.xml to add non-marc fields for facets but am not sure of the steps in adding marc fields.

Any help would be appreciated.

Thanks,
Thom Shepard
|
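Thom mentions isolating the IDs of failed records from the redirected import output. A minimal sketch of that step follows, assuming the log lines have the shape shown in the quoted SolrMarc output ("Unable to index record ... (record count N)"). The function name and the sample list are illustrative only; they are not part of VuFind or SolrMarc.

```python
import re

# Matches the SolrMarc error line quoted in the thread, e.g.
# "ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request"
FAILED_RECORD = re.compile(r"Unable to index record (\S+) \(record count \d+\)")


def failed_record_ids(log_lines):
    """Return the record IDs named in 'Unable to index record' lines, in log order."""
    ids = []
    for line in log_lines:
        match = FAILED_RECORD.search(line)
        if match:
            ids.append(match.group(1))
    return ids


# Sample lines copied from the log in this thread.
sample_log = [
    "INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.",
    "ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request",
    "ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********",
]

print(failed_record_ids(sample_log))
```

For a real run, the same function could be fed an open file handle for import.log (the file name used in Tod's redirect example) instead of the sample list.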
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 18:19:57
Attachments:
smime.p7s
|
This is how we set up our id:

id = 035a, (pattern_map.id), first

then:

pattern_map.id.pattern_0 = \\(Sirsi\\)\\ a(.*)=>$1

So using the example below, I wonder, if we have the following 035 fields:

.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054

shouldn’t the importer use the value “(Sirsi) a360356” and remove the “(Sirsi)” part, leaving us with “a360356”?

Thanks,
Thom
|
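The id mapping Thom describes (a pattern map on 035$a followed by "first") can be approximated outside the importer to see what it would produce for the four 035 values in the sample record. This is only a sketch: it assumes that values matching no pattern are dropped and that "first" keeps the first surviving value, which may not be exactly how SolrMarc behaves. The function name is made up for illustration. Note that the literal "a" in the pattern sits outside the capture group, so the sketch yields 360356 rather than a360356; that would be consistent with the [doc=416471] identifier in the Solr error reported later in the thread for the record whose 035 is "(Sirsi) a416471".

```python
import re

# Python spelling of the properties-file pattern  \\(Sirsi\\)\\ a(.*)=>$1
# (the doubled backslashes are properties-file escaping, not part of the regex).
SIRSI_KEY = re.compile(r"\(Sirsi\) a(.*)")


def map_id(values_035a):
    """Apply the pattern to each 035$a value and return the first mapped result.

    Assumption: values that do not match the pattern are discarded.
    """
    mapped = []
    for value in values_035a:
        match = SIRSI_KEY.fullmatch(value)
        if match:
            mapped.append(match.group(1))  # "$1": the text after "(Sirsi) a"
    return mapped[0] if mapped else None


# The four 035$a values from the sample record in this thread.
values = [
    "(Sirsi) a360356",
    "(Sirsi) 31287004800054",
    "(OCoLC)58999172",
    "31287004800054",
]

print(map_id(values))
```

Under these assumptions only the first 035 matches, so the duplicate "(Sirsi)" barcodes would not by themselves produce a multi-valued id.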
From: Demian K. <dem...@vi...> - 2014-07-02 18:24:06
|
Yes, I would expect that to work as you say. So I guess I must return to my previous advice and suggest that you try direct writing and/or Jetty logging to see if you can find out the exact Solr error to help pinpoint what’s going wrong: either to confirm that 035 is the culprit (and we’re both missing something in the current config) or to discover that there’s some other unrelated problem at work.

- Demian
|
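Once the Jetty/Solr log is captured in a file, the exact Solr errors can be summarized rather than read one by one. The sketch below assumes error lines of the shape reported in the follow-up message in this thread ("ERROR: [doc=...] unknown field '...'"); other Solr error formats are not handled, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'
UNKNOWN_FIELD = re.compile(r"ERROR: \[doc=([^\]]+)\] unknown field '([^']+)'")


def unknown_fields(log_lines):
    """Group document IDs by the unknown field name Solr complained about."""
    docs_by_field = defaultdict(list)
    for line in log_lines:
        match = UNKNOWN_FIELD.search(line)
        if match:
            doc_id, field = match.groups()
            docs_by_field[field].append(doc_id)
    return dict(docs_by_field)


# The error line is copied from the thread; the INFO line is unrelated noise.
sample_log = [
    "INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.",
    "org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'",
]

print(unknown_fields(sample_log))
```

An "unknown field" error generally means the document sent to Solr names a field that the schema does not define, so a summary like this points at which index fields to compare against schema.xml.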
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 18:59:37
Attachments:
smime.p7s
|
Chris Miller, who knows much more about Solr than I do, reported this:

Thom, the Solr log has a bunch of these (from ~10:30 and ~13:40):

org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'

I don’t know where oclc_num is coming from, do you?

– Thom

Assuming 416471 is the catkey for the record, here is the record itself:

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocn874902700
.003. |aOCoLC
.005. |a20140530201152.0
.008. |a140327s2014 maua b 001 0 eng d
.020. |a9781608076994
.020. |a1608076997
.035. |a(Sirsi) a416471
.035. |a(Sirsi) 31287005131616
.035. |a(OCoLC)874902700
.035. |a(Sirsi)31287005131616
.040. |aVYR|beng|erda|cVYR|dOCLCO|dYDXCP|dBTCTA|dBDX|dUKMGB|dCDX|dLTSCA|dIXA
.050. 0|aTK5103.48325|b.K674 2014
.100. 1 |aKorhonen, Juha,|eauthor.
.245. 10|aIntroduction to 4G mobile communications /|cJuha Korhonen.
.264. 1|aBoston :|bArtech House,|c[2014]
.300. |axi, 289 pages :|billustrations ;|c26 cm.
.336. |atext|btxt|2rdacontent
.337. |aunmediated|bn|2rdamedia
.338. |avolume|bnc|2rdacarrier
.440. 0|aArtech House mobile communications series
.504. |aIncludes bibliographical references and index.
.520. |a"Juha Korhonen is a project manager within the Mobile Competence Center at ETSI, Sophia Antipolis, France. He earned his Ph.D. in telecommunications engineering from University of Cambridge in Cambridge, UK. Long Term Evolution (LTE) was originally an internal 3GPP name for a program to enhance the capabilities of 3G radio access networks. The nickname has now evolved to become synonymous with 4G. This book concentrates on 4G systems, also known as LTE-Advanced. Telecommunications engineers and students are provided with a history of these systems, along with an overview of a mobile telecommunications system. The overview addresses the components in the system as well as their function. This resource guides telecommunications engineers though many important aspects of 4G including the air interface physical layer, Radio Access Networks, and 3GPP standardization, to name a few" --|cProvided by publisher.
.650. 0|aLong-Term Evolution (Telecommunications)
.910. |ajk
.994. |aC0|bLIN
.590. |anbl140606
.949. |i31287005046830|hLIN
.596. |a1

From: Demian Katz [mailto:dem...@vi...]
Sent: Wednesday, July 02, 2014 2:24 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Yes, I would expect that to work as you say. So I guess I must return to my previous advice and suggest that you try direct writing and/or Jetty logging to see if you can find out the exact Solr error to help pinpoint what’s going wrong – either to confirm that 035 is the culprit (and we’re both missing something in the current config) or to discover that there’s some other unrelated problem at work.

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...]
Sent: Wednesday, July 02, 2014 2:19 PM
To: Demian Katz; Joe Atzberger
Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

This is how we set up our id:

id = 035a, (pattern_map.id), first

then:

pattern_map.id.pattern_0 = \\(Sirsi\\)\\ a(.*)=>$1

So using the example below, I wonder, if we have the following 035 fields:

.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054

shouldn’t the importer use the value “(Sirsi) a360356” and remove the “(Sirsi)” part, leaving us with “a360356”?

Thanks,
Thom

From: Demian Katz [mailto:dem...@vi...]
Sent: Wednesday, July 02, 2014 2:10 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vuf...@li...; vuf...@li...
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Trying to load multiple values into a single-valued field will definitely cause your import to fail – very likely the culprit. What does your line for dealing with 035’s look like right now? Can you just add “first” on the end of it, or is it something more complicated that doesn’t support the “first” feature? - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:07 PM To: Joe Atzberger Cc: vuf...@li...; vuf...@li... Subject: Re: [VuFind-General] [VuFind-Tech] RDA 264 Thanks, Joe. Great advice! I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We’ve always had multiple 035s, but until now the vufind importer knew to use only the first one or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple )35 fields preceded by Sirsi, so we need to figure out how to tell the importer to choose just one. (I think – still testing!) My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I’ve grappled with these in my XML harvests, but many of the successfully imported records contained these. Thanks again. I’ll let you know when we know for sure and can fix this. Thom From: Joe Atzberger [mailto:jo...@bo...] Sent: Wednesday, July 02, 2014 1:48 PM To: Shepard, Thomas - 1150 - MITLL Cc: Tod Olson; vuf...@li...; vuf...@li... Subject: Re: [VuFind-Tech] [VuFind-General] RDA 264 Well, for one thing, there is no such thing as MARC 000 tag, right? Certainly the most common import-buster for me is encoding. Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations. 
XML cannot directly contain non-ascii characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing. For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here. Bad, missing or duplicate IDs is another common tripping point when I'm working with low quality data. You might be able to get more info from Solr's logs about why that particular transaction was rejected. --Joe On Wed, Jul 2, 2014 at 12:29 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote: Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import). I’ve looked at dozens of these failed records for some common denominator, but haven’t found one. Here is a sample book record that failed to import into vufind. *** DOCUMENT BOUNDARY *** FORM=MARC .000. |aam 0c .001. |aocm58999172 .003. |aOCoLC .005. |a20140530203825.0 .008. |a050304s2005 cc ab 001 0 eng .010. |a 2005284588 .020. |a0596008651 (pbk.) .035. |a(Sirsi) a360356 .035. |a(Sirsi) 31287004800054 .035. |a(OCoLC)58999172 .035. |a31287004800054 .040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP .050. 00|aGA139|b.M58 2005 .100. 1 |aMitchell, Tyler. .245. 10|aWeb mapping illustrated /|cTyler Mitchell. .264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005] .264. 4|cÃ2005 .300. |axvi, 349 pages :|billustrations, maps ;|c24 cm .336. |atext|btxt|2rdacontent .337. |acomputer|bc|2rdamedia .338. |aonline resource|bcr|2rdacarrier .500. |aIncludes index. .521. |aEbook. .650. 0|aDigital mapping. .650. 0|aWeb site development. .910. |aems .994. |aC0|bLIN .590. |anbl070323 .856. 
41|uhttp://proquest.safaribooksonline.com/0596008651 .949. |i31287004951048|hLIN .596. |a1 Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.) Here is the error when I tried to import only the above record: Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ... Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing. INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request Bad Request request: http://localhost:8181/solr/biblio/update?wt=xml <http://localhost:8181/solr/biblio/update?wt=xml&version=2.2> &version=2.2 org.apache.solr.common.SolrException: Bad Request Bad Request request: http://localhost:8181/solr/biblio/update?wt=xml <http://localhost:8181/solr/biblio/update?wt=xml&version=2.2> &version=2.2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248) at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106) at 
org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56) at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474) at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400) at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313) at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607) at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.simontuffs.onejar.Boot.run(Boot.java:334) at com.simontuffs.onejar.Boot.main(Boot.java:170) ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ******** INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false) INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr INFO [main] (MarcImporter.java:506) - Setting Solr closed flag INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00 INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec INFO [main] (MarcImporter.java:637) - Deleted 0 records INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook Thanks in advance, Thom Shepard From: Tod Olson [mailto:to...@uc...] Sent: Tuesday, July 01, 2014 4:23 PM To: Shepard, Thomas - 1150 - MITLL Cc: Tod Olson; Demian Katz; vuf...@li...; vuf...@li... Subject: Re: [VuFind-General] RDA 264 Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen. 1) import-marc.sh. 
For this just redirect the output from the import script to a file: ./import-marc.sh option option file file > import.log 2>&1 That will send both stdout and stderr to the file import.log, and you can see all of the messages there. 2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script: JETTY_CONSOLE=jettyconsole.log ./vufind.sh start There are ways to do the same stuff under Windows, but someone else would have to provide the syntax. Best, -Tod On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote: I believe it is 2.1. And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing. Thanks, Thom From: Demian Katz [mailto:dem...@vi...] Sent: Tuesday, July 01, 2014 4:00 PM To: Shepard, Thomas - 1150 - MITLL; vuf...@li...; vuf...@li... Subject: RE: RDA 264 Which version of VuFind are you using? We’ve included 264 support since release 2.0; if you’re using a 2.x version, the problem isn’t simply missing support – it’s probably some more specific problem with the data. Feel free to send over a sample record if you’d like help troubleshooting. You can also do some experimentation on your own using the getpublishers.bsh BeanShell script if you wish. (I can provide more information on using BeanShell with the import tool if you haven’t done this before). - Demian From: Shepard, Thomas - 1150 - MITLL [ <mailto:tsh...@ll...> mailto:tsh...@ll...] 
Sent: Tuesday, July 01, 2014 3:55 PM To: <mailto:vuf...@li...> vuf...@li...; <mailto:vuf...@li...> vuf...@li... Cc: Demian Katz Subject: RDA 264 We recently updated our book collection to accommodate RDA changes. After Backstage processed our book records, we re-importing them into our Symphony catalog, but then discovered that only about half of them can be imported into vufind. I suspect that the cause is the RDA 264 field. It is used to store publisher data previously located in the 260 field. In addition, there is often a second 264 row that contains a copyright year (with the copyright sign). The “Bad Request” errors I see during import are the same kind I’ve found when I’ve tried to import fields that did not exist. Are there plans to update the vufind importer to include the 264 field and possibly others related to RDA? I have successfully edited schema.xml to add non-marc fields for facets but not sure of the steps in adding marc fields. Any help would be appreciated. Thanks, Thom Shepard |
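The Solr error quoted in the message above has a regular shape, so a log full of such lines can be summarized by field name before looking at individual records. What follows is a minimal Python sketch, not part of VuFind or Solr: the function name unknown_fields and the filler log line are made up for illustration, and the regular expression is modelled only on the single error line quoted in this thread.

```python
import re
from collections import defaultdict

# Modelled on the one error line quoted in the thread; other Solr versions
# may word the message differently (assumption).
UNKNOWN_FIELD = re.compile(r"ERROR: \[doc=([^\]]+)\] unknown field '([^']+)'")


def unknown_fields(log_lines):
    """Map each unknown field name to the document ids that triggered it."""
    hits = defaultdict(list)
    for line in log_lines:
        match = UNKNOWN_FIELD.search(line)
        if match:
            doc_id, field = match.groups()
            hits[field].append(doc_id)
    return dict(hits)


sample_log = [
    "org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'",
    "INFO: an unrelated line",  # made-up filler, ignored by the scan
]

for field, doc_ids in unknown_fields(sample_log).items():
    # Print the field, how many documents hit it, and a few example ids.
    print(field, len(doc_ids), doc_ids[:5])
```

If one field name accounts for nearly all of the failures, that points at the schema or the import rules rather than at the individual records.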
From: Demian K. <dem...@vi...> - 2014-07-02 19:06:23
|
oclc_num is part of the standard VuFind import rules:

oclc_num = 035a, (pattern_map.oclc_num)
pattern_map.oclc_num.pattern_0 = \\([Oo][Cc][Oo][Ll][Cc]\\)[^0-9]*[0]*([0-9]+)=>$1
pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)[ ]*[0-9]*=>$1
pattern_map.oclc_num.pattern_2 = ocn[0]*([0-9]+).*=>$1
pattern_map.oclc_num.pattern_3 = on[0]*([0-9]+).*=>$1

It’s also defined in the standard schema:

<field name="oclc_num" type="string" indexed="true" stored="true" multiValued="true" />

Did this somehow get lost in your copy?

- Demian
|
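To see why a missing oclc_num declaration would stop this particular record, it can help to run the quoted patterns against the 035 values of the record with catkey 416471 and then look for the field in a schema. The Python sketch below is an illustration only: apply_pattern_map and declared_fields are hypothetical helpers, the matching behaviour (the first pattern found anywhere in a value rewrites it, and values that match nothing are dropped) is an assumption about how the importer treats pattern maps, and the schema fragment is invented.

```python
import re
import xml.etree.ElementTree as ET

# Rules as quoted in the thread; the doubled backslashes of the .properties
# file are written here as single backslashes in Python raw strings.
OCLC_PATTERNS = [
    (r"\([Oo][Cc][Oo][Ll][Cc]\)[^0-9]*[0]*([0-9]+)", r"\1"),
    (r"ocm[0]*([0-9]+)[ ]*[0-9]*", r"\1"),
    (r"ocn[0]*([0-9]+).*", r"\1"),
    (r"on[0]*([0-9]+).*", r"\1"),
]


def apply_pattern_map(values, patterns):
    """Hypothetical stand-in for a pattern map (assumed semantics):
    the first pattern found in a value rewrites it; other values are dropped."""
    results = []
    for value in values:
        for regex, replacement in patterns:
            if re.search(regex, value):
                results.append(re.sub(regex, replacement, value))
                break
    return results


def declared_fields(schema_xml):
    """Return the names of all <field> elements in a schema document."""
    root = ET.fromstring(schema_xml)
    return {field.get("name") for field in root.iter("field")}


# 035 subfield a values from the record with catkey 416471.
values_035a = [
    "(Sirsi) a416471",
    "(Sirsi) 31287005131616",
    "(OCoLC)874902700",
    "(Sirsi)31287005131616",
]

# Invented schema fragment in which the oclc_num declaration is missing.
broken_schema = """
<schema name="example">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""

oclc_values = apply_pattern_map(values_035a, OCLC_PATTERNS)
print("oclc_num values:", oclc_values)
print("declared in schema:", "oclc_num" in declared_fields(broken_schema))
```

Under these assumptions the record yields one oclc_num value, so Solr is handed a field that the broken schema does not declare. Records with no OCLC-style 035 value would produce nothing for this field, which may be why some collections imported cleanly while the book records did not.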
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 19:30:09
Attachments:
smime.p7s
|
Ah… Problem solved! We – okay, it was probably ME! – somehow deleted oclc_num from schema.xml. I re-inserted it and my test record loaded just fine. Thanks for all of your input and sorry for not catching this myself. The upside of this experience is that I learned more about vufind’s inner workings. Thom From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 3:06 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 oclc_num is part of the standard VuFind import rules: oclc_num = 035a, (pattern_map.oclc_num) pattern_map.oclc_num.pattern_0 = \\([Oo][Cc][Oo][Ll][Cc]\\)[^0-9]*[0]*([0-9]+)=>$1 pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)[ ]*[0-9]*=>$1 pattern_map.oclc_num.pattern_2 = ocn[0]*([0-9]+).*=>$1 pattern_map.oclc_num.pattern_3 = on[0]*([0-9]+).*=>$1 It’s also defined in the standard schema: <field name="oclc_num" type="string" indexed="true" stored="true" multiValued="true" /> Did this somehow get lost in your copy? - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:59 PM To: Demian Katz; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Chris Miller, who knows much more about Solr than I do, reported this: Thom, the Solr log has a bunch of these (from ~10:30 and ~13:40): org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num' I don’t know where oclc_num is coming from, do you? – Thom Assuming 416471 is the catkey for the record, here is the record itself: *** DOCUMENT BOUNDARY *** FORM=MARC .000. |aam 0c .001. |aocn874902700 .003. |aOCoLC .005. |a20140530201152.0 .008. |a140327s2014 maua b 001 0 eng d .020. |a9781608076994 .020. 
|a1608076997 .035. |a(Sirsi) a416471 .035. |a(Sirsi) 31287005131616 .035. |a(OCoLC)874902700 .035. |a(Sirsi)31287005131616 .040. |aVYR|beng|erda|cVYR|dOCLCO|dYDXCP|dBTCTA|dBDX|dUKMGB|dCDX|dLTSCA|dIXA .050. 0|aTK5103.48325|b.K674 2014 .100. 1 |aKorhonen, Juha,|eauthor. .245. 10|aIntroduction to 4G mobile communications /|cJuha Korhonen. .264. 1|aBoston :|bArtech House,|c[2014] .300. |axi, 289 pages :|billustrations ;|c26 cm. .336. |atext|btxt|2rdacontent .337. |aunmediated|bn|2rdamedia .338. |avolume|bnc|2rdacarrier .440. 0|aArtech House mobile communications series .504. |aIncludes bibliographical references and index. .520. |a"Juha Korhonen is a project manager within the Mobile Competence Center at ETSI, Sophia Antipolis, France. He earned his Ph.D. in telecommunications engineering from University of Cambridge in Cambridge, UK. Long Term Evolution (LTE) was originally an internal 3GPP name for a program to enhance the capabilities of 3G radio access networks. The nickname has now evolved to become synonymous with 4G. This book concentrates on 4G systems, also known as LTE-Advanced. Telecommunications engineers and students are provided with a history of these systems, along with an overview of a mobile telecommunications system. The overview addresses the components in the system as well as their function. This resource guides telecommunications engineers though many important aspects of 4G including the air interface physical layer, Radio Access Networks, and 3GPP standardization, to name a few" --|cProvided by publisher. .650. 0|aLong-Term Evolution (Telecommunications) .910. |ajk .994. |aC0|bLIN .590. |anbl140606 .949. |i31287005046830|hLIN .596. |a1 From: Demian Katz [mailto:dem...@vi...] 
Sent: Wednesday, July 02, 2014 2:24 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Yes, I would expect that to work as you say. So I guess I must return to my previous advice and suggest that you try direct writing and/or Jetty logging to see if you can find out the exact Solr error to help pinpoint what’s going wrong – either to confirm that 035 is the culprit (and we’re both missing something in the current config) or to discover that there’s some other unrelated problem at work. - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:19 PM To: Demian Katz; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 This is how we set up our id: id = 035a, (pattern_map.id), first then: pattern_map.id.pattern_0 = \\(Sirsi\\)\\ a(.*)=>$1 So using the example below, I wonder, if we have the following 035 fields: .035. |a(Sirsi) a360356 .035. |a(Sirsi) 31287004800054 .035. |a(OCoLC)58999172 .035. |a31287004800054 shouldn’t the importer use the value “(Sirsi) a360356” and remove the “(Sirsi)” part, leaving us with “a360356”? Thanks, Thom From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 2:10 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li... Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Trying to load multiple values into a single-valued field will definitely cause your import to fail – very likely the culprit. What does your line for dealing with 035’s look like right now? Can you just add “first” on the end of it, or is it something more complicated that doesn’t support the “first” feature? - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] 
Sent: Wednesday, July 02, 2014 2:07 PM
To: Joe Atzberger
Cc: vuf...@li...; vuf...@li...
Subject: Re: [VuFind-General] [VuFind-Tech] RDA 264

Thanks, Joe. Great advice! I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We’ve always had multiple 035s, but until now the vufind importer knew to use only the first one or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple 035 fields preceded by Sirsi, so we need to figure out how to tell the importer to choose just one. (I think – still testing!)

My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I’ve grappled with these in my XML harvests, but many of the successfully imported records contained these.

Thanks again. I’ll let you know when we know for sure and can fix this.

Thom

From: Joe Atzberger [mailto:jo...@bo...]
Sent: Wednesday, July 02, 2014 1:48 PM
To: Shepard, Thomas - 1150 - MITLL
Cc: Tod Olson; vuf...@li...; vuf...@li...
Subject: Re: [VuFind-Tech] [VuFind-General] RDA 264

Well, for one thing, there is no such thing as MARC 000 tag, right?

Certainly the most common import-buster for me is encoding. Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations. XML cannot directly contain non-ascii characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing.

For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here. Bad, missing or duplicate IDs is another common tripping point when I'm working with low quality data.

You might be able to get more info from Solr's logs about why that particular transaction was rejected.
--Joe

On Wed, Jul 2, 2014 at 12:29 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote:

Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import). I’ve looked at dozens of these failed records for some common denominator, but haven’t found one.

Here is a sample book record that failed to import into vufind.

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocm58999172
.003. |aOCoLC
.005. |a20140530203825.0
.008. |a050304s2005 cc ab 001 0 eng
.010. |a 2005284588
.020. |a0596008651 (pbk.)
.035. |a(Sirsi) a360356
.035. |a(Sirsi) 31287004800054
.035. |a(OCoLC)58999172
.035. |a31287004800054
.040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP
.050. 00|aGA139|b.M58 2005
.100. 1 |aMitchell, Tyler.
.245. 10|aWeb mapping illustrated /|cTyler Mitchell.
.264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005]
.264. 4|c©2005
.300. |axvi, 349 pages :|billustrations, maps ;|c24 cm
.336. |atext|btxt|2rdacontent
.337. |acomputer|bc|2rdamedia
.338. |aonline resource|bcr|2rdacarrier
.500. |aIncludes index.
.521. |aEbook.
.650. 0|aDigital mapping.
.650. 0|aWeb site development.
.910. |aems
.994. |aC0|bLIN
.590. |anbl070323
.856. 41|uhttp://proquest.safaribooksonline.com/0596008651
.949. |i31287004951048|hLIN
.596. |a1

Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.)

Here is the error when I tried to import only the above record:

Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ...
Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc
INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing.
INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties
INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update
INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc
ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106)
    at org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474)
    at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400)
    at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313)
    at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607)
    at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.simontuffs.onejar.Boot.run(Boot.java:334)
    at com.simontuffs.onejar.Boot.main(Boot.java:170)
ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ********
INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index
INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index
INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false)
INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr
INFO [main] (MarcImporter.java:506) - Setting Solr closed flag
INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00
INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec
INFO [main] (MarcImporter.java:637) - Deleted 0 records
INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook
INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook

Thanks in advance,
Thom Shepard

From: Tod Olson [mailto:to...@uc...]
Sent: Tuesday, July 01, 2014 4:23 PM
To: Shepard, Thomas - 1150 - MITLL
Cc: Tod Olson; Demian Katz; vuf...@li...; vuf...@li...
Subject: Re: [VuFind-General] RDA 264

Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen.

1) import-marc.sh. For this just redirect the output from the import script to a file:

    ./import-marc.sh option option file file > import.log 2>&1

That will send both stdout and stderr to the file import.log, and you can see all of the messages there.

2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable.
You can even do this only for the vufind script:

    JETTY_CONSOLE=jettyconsole.log ./vufind.sh start

There are ways to do the same stuff under Windows, but someone else would have to provide the syntax.

Best,

-Tod

On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote:

I believe it is 2.1. And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing.

Thanks,
Thom
|
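As a rough illustration of the id rule quoted in this exchange (id = 035a, (pattern_map.id), first), the Python sketch below applies the same regular expression to the four 035 values from the sample record. It is a simplified model and not SolrMarc's own code: the function name first_sirsi_id is hypothetical, and it assumes that values matching no pattern are skipped and that "first" keeps only the first surviving value. Under those assumptions the capture group starts after the "a", so the result would be 360356 rather than a360356, which agrees with the [doc=416471] reported in the Solr log for the record whose 035 is "(Sirsi) a416471".

```python
import re

# Regex from the quoted pattern_map.id.pattern_0 rule, with the
# properties-file double backslashes collapsed to single ones.
SIRSI_ID = re.compile(r"\(Sirsi\) a(.*)")


def first_sirsi_id(values_035a):
    """Return the first 035$a value matching the rule, reduced to group 1.

    Assumption (not taken from SolrMarc source): values that match no
    pattern are skipped, and "first" keeps the first surviving value.
    """
    for value in values_035a:
        match = SIRSI_ID.search(value)
        if match:
            return match.group(1)
    return None


# The four 035$a values from the sample record in this thread.
record_035a = [
    "(Sirsi) a360356",
    "(Sirsi) 31287004800054",
    "(OCoLC)58999172",
    "31287004800054",
]

print(first_sirsi_id(record_035a))  # 360356
```

Only one of the four values contains "(Sirsi) a", so in this model the extra (Sirsi) 035 fields added by the RDA processing would not produce a second id value.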
From: Demian K. <dem...@vi...> - 2014-07-02 19:33:07
|
Glad to help, and as you say, these things are often helpful learning experiences!

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...]
Sent: Wednesday, July 02, 2014 3:23 PM
To: Demian Katz; Joe Atzberger
Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Ah… Problem solved! We – okay, it was probably ME! – somehow deleted oclc_num from schema.xml. I re-inserted it and my test record loaded just fine. Thanks for all of your input and sorry for not catching this myself. The upside of this experience is that I learned more about vufind’s inner workings.

Thom

From: Demian Katz [mailto:dem...@vi...]
Sent: Wednesday, July 02, 2014 3:06 PM
To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger
Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

oclc_num is part of the standard VuFind import rules:

oclc_num = 035a, (pattern_map.oclc_num)
pattern_map.oclc_num.pattern_0 = \\([Oo][Cc][Oo][Ll][Cc]\\)[^0-9]*[0]*([0-9]+)=>$1
pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)[ ]*[0-9]*=>$1
pattern_map.oclc_num.pattern_2 = ocn[0]*([0-9]+).*=>$1
pattern_map.oclc_num.pattern_3 = on[0]*([0-9]+).*=>$1

It’s also defined in the standard schema:

<field name="oclc_num" type="string" indexed="true" stored="true" multiValued="true" />

Did this somehow get lost in your copy?

- Demian

From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...]
Sent: Wednesday, July 02, 2014 2:59 PM
To: Demian Katz; Joe Atzberger
Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL
Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264

Chris Miller, who knows much more about Solr than I do, reported this:

Thom, the Solr log has a bunch of these (from ~10:30 and ~13:40):

org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'

I don’t know where oclc_num is coming from, do you? – Thom

Assuming 416471 is the catkey for the record, here is the record itself:

*** DOCUMENT BOUNDARY ***
FORM=MARC
.000. |aam 0c
.001. |aocn874902700
.003. |aOCoLC
.005. |a20140530201152.0
.008. |a140327s2014 maua b 001 0 eng d
.020. |a9781608076994
.020. |a1608076997
.035. |a(Sirsi) a416471
.035. |a(Sirsi) 31287005131616
.035. |a(OCoLC)874902700
.035. |a(Sirsi)31287005131616
.040. |aVYR|beng|erda|cVYR|dOCLCO|dYDXCP|dBTCTA|dBDX|dUKMGB|dCDX|dLTSCA|dIXA
.050. 0|aTK5103.48325|b.K674 2014
.100. 1 |aKorhonen, Juha,|eauthor.
.245. 10|aIntroduction to 4G mobile communications /|cJuha Korhonen.
.264. 1|aBoston :|bArtech House,|c[2014]
.300. |axi, 289 pages :|billustrations ;|c26 cm.
.336. |atext|btxt|2rdacontent
.337. |aunmediated|bn|2rdamedia
.338. |avolume|bnc|2rdacarrier
.440. 0|aArtech House mobile communications series
.504. |aIncludes bibliographical references and index.
.520. |a"Juha Korhonen is a project manager within the Mobile Competence Center at ETSI, Sophia Antipolis, France. He earned his Ph.D. in telecommunications engineering from University of Cambridge in Cambridge, UK. Long Term Evolution (LTE) was originally an internal 3GPP name for a program to enhance the capabilities of 3G radio access networks. The nickname has now evolved to become synonymous with 4G. This book concentrates on 4G systems, also known as LTE-Advanced.
Telecommunications engineers and students are provided with a history of these systems, along with an overview of a mobile telecommunications system. The overview addresses the components in the system as well as their function. This resource guides telecommunications engineers though many important aspects of 4G including the air interface physical layer, Radio Access Networks, and 3GPP standardization, to name a few" --|cProvided by publisher.
.650. 0|aLong-Term Evolution (Telecommunications)
.910. |ajk
.994. |aC0|bLIN
.590. |anbl140606
.949. |i31287005046830|hLIN
.596. |a1
|
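The root cause found in this exchange, a field named by the import rules but no longer declared in schema.xml, is the kind of mismatch a small script can catch before a full import. The Python sketch below is one hedged way to do that. The inline PROPERTIES_TEXT and SCHEMA_XML strings are hypothetical excerpts standing in for a site's real files, the function names are made up for illustration, and the parser is deliberately simplified: it assumes "key = value" lines and treats any key containing a dot (such as pattern_map.*) as map configuration rather than a Solr field.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical excerpts; a real check would read the site's own import
# properties file and Solr schema.xml instead of these inline strings.
PROPERTIES_TEXT = """\
id = 035a, (pattern_map.id), first
oclc_num = 035a, (pattern_map.oclc_num)
pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)[ ]*[0-9]*=>$1
"""

SCHEMA_XML = """\
<schema name="example" version="1.2">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <dynamicField name="*_str" type="string" indexed="true" stored="true"/>
  </fields>
</schema>
"""


def indexed_field_names(properties_text):
    """Collect the Solr field names assigned in an import properties file."""
    names = []
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        # Assumption: dotted keys configure maps, not index fields.
        if "." not in key:
            names.append(key)
    return names


def missing_from_schema(properties_text, schema_xml):
    """List indexed fields with no matching <field> or <dynamicField>."""
    root = ET.fromstring(schema_xml)
    declared = {f.get("name") for f in root.iter("field")}
    dynamic = [f.get("name") for f in root.iter("dynamicField")]
    return [
        name
        for name in indexed_field_names(properties_text)
        if name not in declared
        and not any(fnmatch.fnmatchcase(name, pattern) for pattern in dynamic)
    ]


print(missing_from_schema(PROPERTIES_TEXT, SCHEMA_XML))  # ['oclc_num']
```

In this example the schema excerpt lacks the oclc_num declaration, so the check reports it, mirroring the "unknown field 'oclc_num'" error from the Solr log.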
From: Shepard, T. - 1. - M. <tsh...@ll...> - 2014-07-02 19:37:02
Attachments:
smime.p7s
|
I just reloaded our 80,000+ book collection (the first complete import with RDA data)and it was 100% successful! Thanks again, Thom From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 3:33 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Glad to help, and as you say, these things are often helpful learning experiences! - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 3:23 PM To: Demian Katz; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Ah… Problem solved! We – okay, it was probably ME! – somehow deleted oclc_num from schema.xml. I re-inserted it and my test record loaded just fine. Thanks for all of your input and sorry for not catching this myself. The upside of this experience is that I learned more about vufind’s inner workings. Thom From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 3:06 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 oclc_num is part of the standard VuFind import rules: oclc_num = 035a, (pattern_map.oclc_num) pattern_map.oclc_num.pattern_0 = \\([Oo][Cc][Oo][Ll][Cc]\\)[^0-9]*[0]*([0-9]+)= <file:///\\([Oo][Cc][Oo][Ll][Cc]\)%5b%5e0-9%5d*%5b0%5d*(%5b0-9%5d+)=%3e$1> >$1 pattern_map.oclc_num.pattern_1 = ocm[0]*([0-9]+)[ ]*[0-9]*=>$1 pattern_map.oclc_num.pattern_2 = ocn[0]*([0-9]+).*=>$1 pattern_map.oclc_num.pattern_3 = on[0]*([0-9]+).*=>$1 It’s also defined in the standard schema: <field name="oclc_num" type="string" indexed="true" stored="true" multiValued="true" /> Did this somehow get lost in your copy? 
- Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:59 PM To: Demian Katz; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Chris Miller, who knows much more about Solr than I do, reported this: Thom, the Solr log has a bunch of these (from ~10:30 and ~13:40): org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num' I don’t know where oclc_num is coming from, do you? – Thom Assuming 416471 is the catkey for the record, here is the record itself: *** DOCUMENT BOUNDARY *** FORM=MARC .000. |aam 0c .001. |aocn874902700 .003. |aOCoLC .005. |a20140530201152.0 .008. |a140327s2014 maua b 001 0 eng d .020. |a9781608076994 .020. |a1608076997 .035. |a(Sirsi) a416471 .035. |a(Sirsi) 31287005131616 .035. |a(OCoLC)874902700 .035. |a(Sirsi)31287005131616 .040. |aVYR|beng|erda|cVYR|dOCLCO|dYDXCP|dBTCTA|dBDX|dUKMGB|dCDX|dLTSCA|dIXA .050. 0|aTK5103.48325|b.K674 2014 .100. 1 |aKorhonen, Juha,|eauthor. .245. 10|aIntroduction to 4G mobile communications /|cJuha Korhonen. .264. 1|aBoston :|bArtech House,|c[2014] .300. |axi, 289 pages :|billustrations ;|c26 cm. .336. |atext|btxt|2rdacontent .337. |aunmediated|bn|2rdamedia .338. |avolume|bnc|2rdacarrier .440. 0|aArtech House mobile communications series .504. |aIncludes bibliographical references and index. .520. |a"Juha Korhonen is a project manager within the Mobile Competence Center at ETSI, Sophia Antipolis, France. He earned his Ph.D. in telecommunications engineering from University of Cambridge in Cambridge, UK. Long Term Evolution (LTE) was originally an internal 3GPP name for a program to enhance the capabilities of 3G radio access networks. The nickname has now evolved to become synonymous with 4G. This book concentrates on 4G systems, also known as LTE-Advanced. 
Telecommunications engineers and students are provided with a history of these systems, along with an overview of a mobile telecommunications system. The overview addresses the components in the system as well as their function. This resource guides telecommunications engineers though many important aspects of 4G including the air interface physical layer, Radio Access Networks, and 3GPP standardization, to name a few" --|cProvided by publisher. .650. 0|aLong-Term Evolution (Telecommunications) .910. |ajk .994. |aC0|bLIN .590. |anbl140606 .949. |i31287005046830|hLIN .596. |a1 From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 2:24 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Yes, I would expect that to work as you say. So I guess I must return to my previous advice and suggest that you try direct writing and/or Jetty logging to see if you can find out the exact Solr error to help pinpoint what’s going wrong – either to confirm that 035 is the culprit (and we’re both missing something in the current config) or to discover that there’s some other unrelated problem at work. - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:19 PM To: Demian Katz; Joe Atzberger Cc: vuf...@li...; vuf...@li...; Menk, Robert - 1150 - MITLL; Miller, Christopher - 1150 - MITLL Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 This is how we set up our id: id = 035a, (pattern_map.id), first then: pattern_map.id.pattern_0 = \\(Sirsi\\)\\ <file:///\\(Sirsi\)\> a(.*)=>$1 So using the example below, I wonder, if we have the following 035 fields: .035. |a(Sirsi) a360356 .035. |a(Sirsi) 31287004800054 .035. |a(OCoLC)58999172 .035. |a31287004800054 shouldn’t the importer use the value “(Sirsi) a360356” and remove the “(Sirsi)” part, leaving us with “a360356”? 
Thanks, Thom From: Demian Katz [mailto:dem...@vi...] Sent: Wednesday, July 02, 2014 2:10 PM To: Shepard, Thomas - 1150 - MITLL; Joe Atzberger Cc: vuf...@li...; vuf...@li... Subject: RE: [VuFind-General] [VuFind-Tech] RDA 264 Trying to load multiple values into a single-valued field will definitely cause your import to fail – very likely the culprit. What does your line for dealing with 035’s look like right now? Can you just add “first” on the end of it, or is it something more complicated that doesn’t support the “first” feature? - Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Wednesday, July 02, 2014 2:07 PM To: Joe Atzberger Cc: vuf...@li...; vuf...@li... Subject: Re: [VuFind-General] [VuFind-Tech] RDA 264 Thanks, Joe. Great advice! I think in this case, though, the problem is the result of having multiple/duplicate 035 fields. We’ve always had multiple 035s, but until now the vufind importer knew to use only the first one or the one whose ID number is preceded by (Sirsi). Now, after the RDA changes, we have multiple 035 fields preceded by (Sirsi), so we need to figure out how to tell the importer to choose just one. (I think – still testing!) My immediate thought was that it was indeed foreign characters or those damned smart quotes, as I’ve grappled with these in my XML harvests, but many of the successfully imported records contained these. Thanks again. I’ll let you know when we know for sure and can fix this. Thom From: Joe Atzberger [mailto:jo...@bo...] Sent: Wednesday, July 02, 2014 1:48 PM To: Shepard, Thomas - 1150 - MITLL Cc: Tod Olson; vuf...@li...; vuf...@li... Subject: Re: [VuFind-Tech] [VuFind-General] RDA 264 Well, for one thing, there is no such thing as a MARC 000 tag, right? Certainly the most common import-buster for me is encoding. 
Sometimes it is from Japanese or Lithuanian characters, sometimes from damned smart-quotes, and sometimes it is non-ASCII whitespace or phantom combining characters that are invisible to most presentations. XML cannot directly contain non-ascii characters (including MARC control characters that faulty toolchain components have passed in as data), so it would be useful to see the XML output that this layer is producing. For completeness, also make sure your 001s do not include whitespace (leading, trailing or otherwise), though this one looks OK here. Bad, missing or duplicate IDs is another common tripping point when I'm working with low quality data. You might be able to get more info from Solr's logs about why that particular transaction was rejected. --Joe On Wed, Jul 2, 2014 at 12:29 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote: Redirecting the output of my marc imports (Thanks, Tod!), I was able to isolate the 001 values of all the records that failed to import. While all of our documents and archives records imported successfully from Symphony into vufind, only half of our book catalog got in (over 31,000 book records failed to import). I’ve looked at dozens of these failed records for some common denominator, but haven’t found one. Here is a sample book record that failed to import into vufind. *** DOCUMENT BOUNDARY *** FORM=MARC .000. |aam 0c .001. |aocm58999172 .003. |aOCoLC .005. |a20140530203825.0 .008. |a050304s2005 cc ab 001 0 eng .010. |a 2005284588 .020. |a0596008651 (pbk.) .035. |a(Sirsi) a360356 .035. |a(Sirsi) 31287004800054 .035. |a(OCoLC)58999172 .035. |a31287004800054 .040. |aUKM|cUKM|dCUS|dIXA|dBAKER|dOCLCQ|dDLC|dVRC|dBTCTA|dLVB|dYDXCP .050. 00|aGA139|b.M58 2005 .100. 1 |aMitchell, Tyler. .245. 10|aWeb mapping illustrated /|cTyler Mitchell. .264. 1|aBeijing ;|aFarnham :|bO'Reilly,|c[2005] .264. 4|c©2005 .300. |axvi, 349 pages :|billustrations, maps ;|c24 cm .336. |atext|btxt|2rdacontent .337. |acomputer|bc|2rdamedia .338. 
|aonline resource|bcr|2rdacarrier .500. |aIncludes index. .521. |aEbook. .650. 0|aDigital mapping. .650. 0|aWeb site development. .910. |aems .994. |aC0|bLIN .590. |anbl070323 .856. 41|uhttp://proquest.safaribooksonline.com/0596008651 .949. |i31287004951048|hLIN .596. |a1 Does anything significant stick out? (Regarding field 264, the copyright symbol does not seem to be the problem, as many records got imported fine with it.) Here is the error when I tried to import only the above record: Now Importing /usr/local/vufind2/local/import/librarycat/record360356.mrc ... Jul 02, 11:23:04 /usr/lib/jvm/java-openjdk/bin/java -Xms512m -Xmx512m -Duser.timezone=UTC -Dsolr.core.name=biblio -jar /usr/local/vufind2/import/SolrMarc.jar /usr/local/vufind2/local/import/import_loginrequired-true.properties /usr/local/vufind2/local/import/librarycat/record360356.mrc INFO [main] (MarcImporter.java:851) - Starting SolrMarc indexing. INFO [main] (Utils.java:339) - Opening file: /usr/local/vufind2/local/import/import_loginrequired-true.properties INFO [main] (MarcImporter.java:784) - Connecting to remote Solr server at URL http://localhost:8181/solr/biblio/update INFO [main] (MarcHandler.java:371) - Attempting to open data file: /usr/local/vufind2/local/import/librarycat/record360356.mrc ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request Bad Request request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2 org.apache.solr.common.SolrException: Bad Request Bad Request request: http://localhost:8181/solr/biblio/update?wt=xml&version=2.2 at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:434) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:248) at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:121) at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:106) at org.solrmarc.solr.SolrServerProxy.addDoc(SolrServerProxy.java:56) at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:474) at org.solrmarc.marc.MarcImporter.addToIndex(MarcImporter.java:400) at org.solrmarc.marc.MarcImporter.importRecords(MarcImporter.java:313) at org.solrmarc.marc.MarcImporter.handleAll(MarcImporter.java:607) at org.solrmarc.marc.MarcImporter.main(MarcImporter.java:867) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.simontuffs.onejar.Boot.run(Boot.java:334) at com.simontuffs.onejar.Boot.main(Boot.java:170) ERROR [main] (MarcImporter.java:383) - ******** Halting indexing! ******** INFO [main] (MarcImporter.java:617) - Adding 0 of 1 documents to index INFO [main] (MarcImporter.java:618) - Deleting 0 documents from index INFO [main] (MarcImporter.java:491) - Calling commit (with optimize set to false) INFO [main] (MarcImporter.java:503) - Done with the commit, closing Solr INFO [main] (MarcImporter.java:506) - Setting Solr closed flag INFO [main] (MarcImporter.java:627) - Finished indexing in 0:00.00 INFO [main] (MarcImporter.java:636) - Indexed 0 at a rate of about 0.0 per sec INFO [main] (MarcImporter.java:637) - Deleted 0 records INFO [Thread-1] (MarcImporter.java:566) - Starting Shutdown hook INFO [Thread-1] (MarcImporter.java:585) - Finished Shutdown hook Thanks in advance, Thom Shepard From: Tod Olson [mailto:to...@uc...] Sent: Tuesday, July 01, 2014 4:23 PM To: Shepard, Thomas - 1150 - MITLL Cc: Tod Olson; Demian Katz; vuf...@li...; vuf...@li... 
Subject: Re: [VuFind-General] RDA 264 Are you running this on a Unix-like box? If so, there are two ways you could be getting errors to the screen. 1) import-marc.sh. For this just redirect the output from the import script to a file: ./import-marc.sh option option file file > import.log 2>&1 That will send both stdout and stderr to the file import.log, and you can see all of the messages there. 2) Jetty console errors. If you do ./vufind.sh start in the shell and then do the imports in the same shell, any jetty errors, including Solr errors, will go to the console. You can send these to a file by setting the JETTY_CONSOLE environment variable. You can even do this only for the vufind script: JETTY_CONSOLE=jettyconsole.log ./vufind.sh start There are ways to do the same stuff under Windows, but someone else would have to provide the syntax. Best, -Tod On Jul 1, 2014, at 3:12 PM, Shepard, Thomas - 1150 - MITLL <tsh...@ll...> wrote: I believe it is 2.1. And yes I see now that getpublishers.bsh DOES handle the 264 field. Unfortunately, I am not allowed to send data beyond our firewall, but I will look at this more closely tomorrow. I am wondering, though, if the errors I see flying past the screen are captured or can be captured so I can determine which records are actually failing. Thanks, Thom From: Demian Katz [mailto:dem...@vi...] Sent: Tuesday, July 01, 2014 4:00 PM To: Shepard, Thomas - 1150 - MITLL; vuf...@li...; vuf...@li... Subject: RE: RDA 264 Which version of VuFind are you using? We’ve included 264 support since release 2.0; if you’re using a 2.x version, the problem isn’t simply missing support – it’s probably some more specific problem with the data. Feel free to send over a sample record if you’d like help troubleshooting. You can also do some experimentation on your own using the getpublishers.bsh BeanShell script if you wish. (I can provide more information on using BeanShell with the import tool if you haven’t done this before). 
- Demian From: Shepard, Thomas - 1150 - MITLL [mailto:tsh...@ll...] Sent: Tuesday, July 01, 2014 3:55 PM To: vuf...@li...; vuf...@li... Cc: Demian Katz Subject: RDA 264 We recently updated our book collection to accommodate RDA changes. After Backstage processed our book records, we re-imported them into our Symphony catalog, but then discovered that only about half of them can be imported into vufind. I suspect that the cause is the RDA 264 field. It is used to store publisher data previously located in the 260 field. In addition, there is often a second 264 row that contains a copyright year (with the copyright sign). The “Bad Request” errors I see during import are the same kind I’ve found when I’ve tried to import fields that did not exist. Are there plans to update the vufind importer to include the 264 field and possibly others related to RDA? I have successfully edited schema.xml to add non-marc fields for facets but am not sure of the steps in adding marc fields. Any help would be appreciated. Thanks, Thom Shepard |
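For anyone following the same troubleshooting path (redirect the import output to a file, then check the Solr log), the two error shapes quoted in this thread can be pulled out with a short script. This is only a rough Python sketch, not part of VuFind or SolrMarc: the function names are hypothetical, and the regular expressions assume the log lines look exactly like the ones pasted in the messages above.

```python
import re

# Assumed shape of the SolrMarc failure line, copied from the import output above:
#   ERROR [main] (MarcImporter.java:380) - Unable to index record ocm58999172 (record count 1) -- Bad Request
FAILED_RECORD = re.compile(r"Unable to index record (\S+) \(record count \d+\)")

# Assumed shape of the Solr log line reported later in the thread:
#   org.apache.solr.common.SolrException: ERROR: [doc=416471] unknown field 'oclc_num'
UNKNOWN_FIELD = re.compile(r"\[doc=([^\]]+)\] unknown field '([^']+)'")


def failed_ids(log_text):
    """Return the IDs of records that failed to index, in first-seen order."""
    ids = FAILED_RECORD.findall(log_text)
    # dict.fromkeys drops repeats while keeping the original order
    return list(dict.fromkeys(ids))


def unknown_fields(log_text):
    """Map each unknown Solr field name to the document IDs that triggered it."""
    by_field = {}
    for doc_id, field in UNKNOWN_FIELD.findall(log_text):
        docs = by_field.setdefault(field, [])
        if doc_id not in docs:
            docs.append(doc_id)
    return by_field


if __name__ == "__main__":
    # import.log is the file name used in the redirect example earlier in the thread
    with open("import.log", encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    print(failed_ids(text))
    print(unknown_fields(text))
```

Grouping by field name rather than by record is a deliberate choice here: if thousands of records fail for the same reason, a single entry such as oclc_num points at a schema or indexing-properties mismatch rather than at the data in any one record.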