From: Kaisa K. <kau...@cc...> - 2006-11-03 07:56:24
I had something similar and was given the advice to use the very latest
version of nutchwax with hadoop-0.5.0 (and not hadoop-0.7.2, for example).

On Thu, 2 Nov 2006, James Grahn wrote:

> Greets,
> I have been attempting to follow the tutorial to get NutchWAX up and
> running in standalone mode, but I've reached an error that confounds me.
>
> The printlns seem to indicate that NutchWAX does successfully import the
> ARC files.
>
> I see this line:
> opening /tmp/mirror/heretrix/IAH-20061026194403-00000.arc.gz
>
> And after many individual pages being imported, I see this line:
>
> 061102 115327 opening /tmp/mirror/heretrix/IAH-20061026194522-00001.arc.gz
>
> This followed by more individual pages. So that seems fine. But no
> index is generated and the printlns end like this:
>
> ...
> 061102 115345 adding http://www.cnn.com/CNN/Programs/student.news/ 24869 text/html
> 061102 115345 adding http://www.cnn.com/CNN/Programs/people/ 367 text/html
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>     at org.archive.access.nutch.ImportArcs.importArcs(ImportArcs.java:519)
>     at org.archive.access.nutch.IndexArcs.doImport(IndexArcs.java:154)
>     at org.archive.access.nutch.IndexArcs.doAll(IndexArcs.java:139)
>     at org.archive.access.nutch.IndexArcs.doJob(IndexArcs.java:246)
>     at org.archive.access.nutch.IndexArcs.main(IndexArcs.java:439)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:130)
>
> --------
>
> Any suggestions for this error? I am using a hadoop installation I
> acquired with the current version of nutch, and am running the "all"
> command as per the tutorial:
>
> ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax.jar all
> /tmp/inputs /tmp/outputs test
>
> Thanks,
> James
>
> _______________________________________________
> Archive-access-discuss mailing list
> Arc...@li...
> https://lists.sourceforge.net/lists/listinfo/archive-access-discuss
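
Since the advice above comes down to which Hadoop release HADOOP_HOME points at, a small pre-flight check may help before rerunning the import. This is only a sketch under stated assumptions: it assumes the Hadoop install directory still carries its release name (for example a path ending in hadoop-0.5.0), and the helper name is_hadoop_release is hypothetical, not part of Hadoop or NutchWAX. The import command itself is left commented out and is copied from the quoted message.

```shell
#!/bin/sh
# Sketch: only proceed if HADOOP_HOME looks like a hadoop-0.5.0 install.
# Assumption: the install directory keeps its release name, e.g. .../hadoop-0.5.0

# Hypothetical helper: succeeds when the last path component of $1
# is exactly "hadoop-$2" (a single trailing slash is ignored).
is_hadoop_release() {
    dir=${1%/}
    [ "${dir##*/}" = "hadoop-$2" ]
}

if is_hadoop_release "${HADOOP_HOME:-}" 0.5.0; then
    echo "HADOOP_HOME looks like hadoop-0.5.0"
    # Command from the quoted message, uncomment to run it:
    # "${HADOOP_HOME}/bin/hadoop" jar "${NUTCHWAX_HOME}/nutchwax.jar" all /tmp/inputs /tmp/outputs test
else
    echo "HADOOP_HOME (${HADOOP_HOME:-unset}) does not look like hadoop-0.5.0" >&2
fi
```

The check only looks at the directory name, so it will not catch a renamed install; it is meant as a quick guard against accidentally picking up a newer Hadoop such as 0.7.2.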