From: Natalia T. <nt...@ce...> - 2006-06-14 08:01:22
Hi,
this is my first experience with NutchWAX after crawling with Heritrix.
I've installed all the software (Hadoop, Nutch, ...) needed to run NutchWAX + WERA
with the crawled jobs.
When I run all of the indexing steps in one go by passing the 'all'
directive to NutchWAX using this command:
% ${HADOOP_HOME}/bin/hadoop jar ${NUTCHWAX_HOME}/nutchwax-0.6.1.jar \
    all /tmp/inputs /tmp/outputs test
I get this error:
java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
    at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:47)
    at org.apache.nutch.parse.ParseOutputFormat.getRecordWriter(ParseOutputFormat.java:47)
    at org.apache.nutch.fetcher.FetcherOutputFormat$1.<init>(FetcherOutputFormat.java:69)
    at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(FetcherOutputFormat.java:58)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:265)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
Can anyone tell me what the problem is?
Thanks
Natalia