From: Greg P. <pen...@us...> - 2009-05-21 23:18:05
I'm in the same boat. :) Last time I touched Java was in Uni.

The source can be grabbed here:
http://code.google.com/p/solrmarc/source/checkout

And you'll want to install ant to make compiling easier:
http://ant.apache.org/index.html

I installed ant from source on our server (Solaris was a bit more painful), but the binary install for our Windows test box was very easy.

If it helps at all, here's an excerpt from our server install notes... obviously the paths etc. are very specific to us.

=====
16 Dec 08
* Installing ant (source) under the solr user (/home/solr/ant/)
* Installing solrmarc (source) under the solr user (/home/solr/solrmarc/)
* Download the junit dependency for ant into (/home/solr/ant/lib/junit-X.jar): http://www.junit.org/ (Just get the latest .jar, 4.5 today)
* ^^^ Undocumented bug in ant 1.7.0 building from source. Found in FAQ: http://ant.apache.org/faq.html (http://ant.apache.org/faq.html#170-requires-junit)
* Modify /home/solr/ant/build.sh to include right at the top:
    JAVA_HOME="/usr/java"
    CLASSPATH=/home/solr/ant/lib/junit-4.5.jar
* Modify /home/solr/ant/bootstrap.sh to include right at the top:
    JAVA_HOME="/usr/java"
    CLASSPATH=/home/solr/ant/lib/junit-4.5.jar
* Run from command line:
    sh /home/solr/ant/build.sh -Ddist.dir="/home/solr/ant" dist
* Ant is now built
* Wrote a quick script for rebuilding solrmarc: /home/solr/solrmarc/rebuild_script.sh
    PATH=/home/solr/ant/bin:$PATH
    ANT_HOME=/home/solr/ant
    cd /home/solr/solrmarc
    ant
* solrmarc rebuilds successfully
=====

Whenever you rebuild solrmarc you need to copy the compiled jar file into the import directory. I ended up setting up a quick script to make my testing faster:

=====
/home/solr/solrmarc/rebuild_script.sh
cp /home/solr/solrmarc/dist/MarcImporter.jar /home/solr/import/dist/MarcImporter.jar
/home/solr/stop_jetty.sh
/home/solr/import.sh /home/solr/import/test.mrc
/home/solr/start_jetty.sh
=====

The stop/start Jetty lines are optional; sometimes I comment them out.
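As written, the test script above runs each command whether or not the previous one succeeded, so a failed build could go unnoticed and the old jar would simply be imported again. Here is a minimal sketch of the same sequence that stops at the first failure. It assumes the paths from the install notes; the function name run_test_cycle and the SOLR_HOME and RESTART_JETTY variables are made up for illustration and are not part of solrmarc or VuFind.

```shell
#!/bin/sh
# Sketch only. run_test_cycle, SOLR_HOME and RESTART_JETTY are hypothetical
# names; the script paths mirror the ones quoted in the install notes.
SOLR_HOME="${SOLR_HOME:-/home/solr}"

run_test_cycle() {
    # Rebuild solrmarc; give up immediately if the build fails.
    "$SOLR_HOME/solrmarc/rebuild_script.sh" || return 1

    # Copy the freshly built jar into the import directory.
    cp "$SOLR_HOME/solrmarc/dist/MarcImporter.jar" \
       "$SOLR_HOME/import/dist/MarcImporter.jar" || return 1

    # Stopping Jetty is optional; set RESTART_JETTY=no to skip it.
    if [ "${RESTART_JETTY:-yes}" = "yes" ]; then
        "$SOLR_HOME/stop_jetty.sh" || return 1
    fi

    # Import the small file of troublesome records.
    "$SOLR_HOME/import.sh" "$SOLR_HOME/import/test.mrc" || return 1

    # Bring Jetty back up only if it was stopped above.
    if [ "${RESTART_JETTY:-yes}" = "yes" ]; then
        "$SOLR_HOME/start_jetty.sh" || return 1
    fi
}
```

Calling run_test_cycle with RESTART_JETTY=no has the same effect as commenting out the two Jetty lines by hand.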
I understand the latest version of solrmarc talks to Jetty a lot better, but for now I just deal with it. The test.mrc file contained a small sample of our most troublesome marc records.

I hope that helps you get started. With regards to the code itself I just bumbled my way through. There are several far more knowledgeable people on this list about the codebase though.

PS. You'll note I moved this to the tech list.

Greg Pendlebury
Electronic Services Officer (Systems Team)
Division of Academic Information Services
University of Southern Queensland
Phone: +61 7 4631 1501
Fax: +61 7 4631 1841

________________________________
From: Philip Shafer [mailto:sh...@ro...]
Sent: Thursday, 21 May 2009 11:58 PM
To: Greg Pendlebury; 'vuf...@li...'
Subject: Re: Evaluating and reporting import errors

Thanks Greg, very informative. Can you provide instructions how to best get the source, modify it, and recompile it? I'm not all that well versed in java, it's been quite a few years since I had last used it, and it was mainly for a class I had for a semester. Haven't touched it since.

-Phil

------------------------------
Philip Shafer
Library System Services
Rowan University Library
201 Mullica Hill Rd
Glassboro, NJ 08028
856-256-4418
856-256-4924 Fax

________________________________
From: Greg Pendlebury <pen...@us...>
Date: Thu, 21 May 2009 10:43:41 +1000
To: 'Philip Shafer' <sh...@ro...>, "'vuf...@li...'" <vuf...@li...>
Subject: RE: Evaluating and reporting import errors

Hi Philip,

This was a little bit of an iterative process for us. We just rebuilt solrmarc with the logging tweaked to the way we'd like it. Some simple code changes make the logs more readable.

At the top level in MarcImporter we got rid of most log lines relating to successful imports and left only every 5,000th record with a timestamp. Even these aren't necessary, but it's nice to see progress in the terminal.

    addToIndex(record);
    // Log progress only once every 5,000 records
    if (recordCounter % 5000 == 1) {
        Date now = new Date();
        logger.info("Adding record " + recordCounter + ": " + record.getControlNumber() + " : " + now.toString());
    }

We also added a whole bunch of extra logging inside the actual indexer class to account for (horrible) weirdness in our marc files (records without call numbers... malformed 008 etc), but the biggest difference was found in getting rid of all the success lines... it makes the errors easier to find.

Here's the iterative bit. Once you get the overly verbose stack traces you can take them back to a line number in the indexer and add the error reporting and handling. After we'd found all those oddities our log files were very small, with simple errors like:

    ERROR main org.solrmarc.index.USQIndexer - Record with no call number! : vtls000596093
    INFO main org.solrmarc.marc.MarcImporter - Adding record 360001: vtls000596643 : Thu Apr 16 11:37:38 EST 2009
    INFO main org.solrmarc.marc.MarcImporter - Adding record 365001: vtls000601645 : Thu Apr 16 11:38:27 EST 2009
    INFO main org.solrmarc.marc.MarcImporter - Adding record 370001: vtls000606647 : Thu Apr 16 11:39:02 EST 2009
    INFO main org.solrmarc.marc.MarcImporter - Adding record 375001: vtls000611649 : Thu Apr 16 11:39:34 EST 2009
    ERROR main org.solrmarc.index.USQIndexer - Record with no call number! : vtls000611773

Trim out all the success lines when it's finished and pass it on to cataloguers to fix, that's what I do anyway... they love hearing from me :)

Bear in mind that this is all with the version of solrmarc that shipped with RC1. Haven't even looked at the newer version Andrew is adding in to RC2.

Hope that helps,

Greg Pendlebury
Electronic Services Officer (Systems Team)
Division of Academic Information Services
University of Southern Queensland
Phone: +61 7 4631 1501
Fax: +61 7 4631 1841

________________________________
From: Philip Shafer [mailto:sh...@ro...]
Sent: Wednesday, 20 May 2009 3:02 AM
To: vuf...@li...
Subject: [VuFind-General] Evaluating and reporting import errors

What is the best way that we can analyze our import-log to find which records are not being imported? Our voyager export is reporting 382,000+ records being exported, however only a little over 378,000 actually exist in the index. How can I find which records are not being imported, and what their errors actually are?

-Phil

------------------------------
Philip Shafer
Library System Services
Rowan University Library
201 Mullica Hill Rd
Glassboro, NJ 08028
856-256-4418
856-256-4924 Fax

________________________________
This email (including any attached files) is confidential and is for the intended recipient(s) only. If you received this email by mistake, please, as a courtesy, tell the sender, then delete this email. The views and opinions are the originator's and do not necessarily reflect those of the University of Southern Queensland. Although all reasonable precautions were taken to ensure that this email contained no viruses at the time it was sent we accept no liability for any losses arising from its receipt. The University of Southern Queensland is a registered provider of education with the Australian Government (CRICOS Institution Code No's. QLD 00244B / NSW 02225M)
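
For the original question, two things discussed in this thread can be scripted: trimming the success lines out of the import log, and comparing the exported record IDs against what ended up in the index. This is a rough sketch, assuming log lines shaped like the samples quoted in the thread (level first, record ID last) and two plain-text files with one record ID per line. The function names and file layout are hypothetical, and how the two ID lists are produced from Voyager and Solr is left open.

```shell
#!/bin/sh
# Sketch only. All function names here are hypothetical helpers, not part of
# solrmarc or VuFind. Log format is assumed to match the quoted samples.

# Print only the ERROR lines of an import log, dropping the INFO progress lines.
errors_only() {
    grep -E '^[[:space:]]*ERROR[[:space:]]' "$1"
}

# Print the unique record IDs named in ERROR lines, assuming the ID is the
# last whitespace-separated field of each line.
failed_ids() {
    errors_only "$1" | awk '{ print $NF }' | sort -u
}

# Print IDs that appear in the export list ($1) but not in the index list ($2).
# Both files are assumed to contain one record ID per line.
missing_ids() {
    grep -v -x -F -f "$2" "$1"
}
```

For example, `errors_only import.log > for_cataloguers.txt` gives a file to hand on, and `missing_ids exported_ids.txt indexed_ids.txt` lists records that never reached the index. The second check also catches records whose log lines do not end in an ID, which failed_ids would miss.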