From: <sta...@us...> - 2007-04-10 18:18:58
Revision: 1715
          http://archive-access.svn.sourceforge.net/archive-access/?rev=1715&view=rev
Author:   stack-sf
Date:     2007-04-10 11:18:57 -0700 (Tue, 10 Apr 2007)

Log Message:
-----------
M nutchwax/xdocs/faq.fml
    Fix broke xml (close code element).
M nutchwax/nutchwax-core/src/main/java/org/archive/access/nutch/Multiple.java
    Fix javadoc link.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/nutchwax-core/src/main/java/org/archive/access/nutch/Multiple.java
    trunk/archive-access/projects/nutchwax/xdocs/faq.fml

Modified: trunk/archive-access/projects/nutchwax/nutchwax-core/src/main/java/org/archive/access/nutch/Multiple.java
===================================================================
--- trunk/archive-access/projects/nutchwax/nutchwax-core/src/main/java/org/archive/access/nutch/Multiple.java	2007-04-10 18:12:24 UTC (rev 1714)
+++ trunk/archive-access/projects/nutchwax/nutchwax-core/src/main/java/org/archive/access/nutch/Multiple.java	2007-04-10 18:18:57 UTC (rev 1715)
@@ -32,7 +32,7 @@
 /**
  * Run multiple concurrent non-mapreduce {@link ToolBase} tasks such as
  * {@link org.apache.nutch.indexer.IndexMerger} or
- * {@link org.apache.indexer.IndexSorter}.
+ * {@link org.apache.nutch.indexer.IndexSorter}.
  *
  * Takes input that has per line the name of the class to run and the arguments
  * to pass. Here is an example line for IndexMerger:

Modified: trunk/archive-access/projects/nutchwax/xdocs/faq.fml
===================================================================
--- trunk/archive-access/projects/nutchwax/xdocs/faq.fml	2007-04-10 18:12:24 UTC (rev 1714)
+++ trunk/archive-access/projects/nutchwax/xdocs/faq.fml	2007-04-10 18:18:57 UTC (rev 1715)
@@ -85,11 +85,11 @@
 <question>How do I merge segments in NutchWAX</question>
 <answer><p>
 Run the following to see the usage:
-<pre>$ ${HADOOP_HOME}/bin/hadoop jar nutchwax-job-0.11.0-SNAPSHOT.jar class org.apache.nutch.segment.SegmentMerger</pre>
+<pre>% ${HADOOP_HOME}/bin/hadoop jar nutchwax-job-0.11.0-SNAPSHOT.jar class org.apache.nutch.segment.SegmentMerger</pre>
 </p>
 <p>
 Run the following to see the usage:
-<pre>$ ${HADOOP_HOME}/bin/hadoop jar nutchwax-job-0.11.0-SNAPSHOT.jar class org.apache.nutch.segment.SegmentMerger ~/tmp/crawl/segments_merged/ ~/tmp/crawl/segments/20070406155807-test/ ~/tmp/crawl/segments/20070406155856-test/</pre>
+<pre>% ${HADOOP_HOME}/bin/hadoop jar nutchwax-job-0.11.0-SNAPSHOT.jar class org.apache.nutch.segment.SegmentMerger ~/tmp/crawl/segments_merged/ ~/tmp/crawl/segments/20070406155807-test/ ~/tmp/crawl/segments/20070406155856-test/</pre>
 </p>
 <p>If creating multiple indices, you may want to make use of the NutchWAX facility
 that runs a mapreduce job to farm out the multiple index merges across the cluster
@@ -99,10 +99,10 @@
 It takes an inputs directory and an outputs (The latter is usually not used). The inputs
 lists per line a job to run on a remote machine. Here is an example line from an
 input that would run an index merge of the directory <code>indexes-monday</code> into
-<code>index-monday</index> using <code>/tmp</code> as working directory:
+<code>index-monday</code> using <code>/tmp</code> as working directory:
 <pre>
 org.apache.nutch.indexer.IndexMerger -workingdir /tmp index-monday indexes-monday
-</pre>.
+</pre>
 </p>
 <p>In a similar fashion its possible to run multiple concurrent index sorts. Here is
 an example line from the inputs:


This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.