From: James G. <jg...@si...> - 2006-11-02 17:46:04
Greets,

I have been attempting to follow the tutorial to get NutchWAX up and running in standalone mode, but I've reached an error that confounds me.

The printlns seem to indicate that NutchWAX does successfully import the ARC files. I see this line:

    opening /tmp/mirror/heretrix/IAH-20061026194403-00000.arc.gz

And after many individual pages are imported, I see this line:

    061102 115327 opening /tmp/mirror/heretrix/IAH-20061026194522-00001.arc.gz

This is followed by more individual pages. So that seems fine. But no index is generated, and the printlns end like this:

    ...
    061102 115345 adding http://www.cnn.com/CNN/Programs/student.news/ 24869 text/html
    061102 115345 adding http://www.cnn.com/CNN/Programs/people/ 367 text/html
    Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.archive.access.nutch.ImportArcs.importArcs(ImportArcs.java:519)
        at org.archive.access.nutch.IndexArcs.doImport(IndexArcs.java:154)
        at org.archive.access.nutch.IndexArcs.doAll(IndexArcs.java:139)
        at org.archive.access.nutch.IndexArcs.doJob(IndexArcs.java:246)
        at org.archive.access.nutch.IndexArcs.main(IndexArcs.java:439)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:130)

--------

Any suggestions for this error? I am using a Hadoop installation I acquired with the current version of Nutch, and am running the "all" command as per the tutorial:

    ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all /tmp/inputs /tmp/outputs test

Thanks,
James