Update of /cvsroot/archive-access/archive-access/projects/nutch/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9560/bin
Modified Files:
nutch
Log Message:
* bin/nutch
Call the nutchwax index merger.
* src/java/org/archive/access/nutch/NutchwaxIndexMerger.java
Adds the ability to pass a directory of segments (for Dan).
Index: nutch
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/bin/nutch,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** nutch 5 Sep 2005 20:04:31 -0000 1.1
--- nutch 15 Sep 2005 18:22:53 -0000 1.2
***************
*** 145,149 ****
CLASS=org.apache.nutch.indexer.IndexSegment
elif [ "$COMMAND" = "merge" ] ; then
! CLASS=org.apache.nutch.indexer.IndexMerger
elif [ "$COMMAND" = "dedup" ] ; then
CLASS=org.apache.nutch.indexer.DeleteDuplicates
--- 145,152 ----
CLASS=org.apache.nutch.indexer.IndexSegment
elif [ "$COMMAND" = "merge" ] ; then
! # Use the nutchwax merger. It can also take a directory of segments.
! # TODO: Make this a subclass rather than a copy. It looks like that is
! # possible, but I am in a bit of a hurry at the moment.
! CLASS=org.archive.access.nutch.NutchwaxIndexMerger
elif [ "$COMMAND" = "dedup" ] ; then
CLASS=org.apache.nutch.indexer.DeleteDuplicates
***************
*** 153,159 ****
CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
elif [ "$COMMAND" = "mergesegs" ] ; then
! # Copy over the nutchwax version of segment merge.
! # It will work w/ segments made by nutchwax. Also
! # does not do a merge.
CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
elif [ "$COMMAND" = "readdb" ] ; then
--- 156,161 ----
CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
elif [ "$COMMAND" = "mergesegs" ] ; then
! # Use the merge from nutchwax. It doesn't expect content to be in place,
! # and it disables deduping.
CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
elif [ "$COMMAND" = "readdb" ] ; then
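The script picks the Java class to launch through an `if`/`elif` chain on `$COMMAND`. Below is a minimal sketch of the same mapping for only the three branches visible in this diff (`merge`, `mergesegs`, `dedup`), written as a POSIX `case` statement. The function name `nutch_class_for` is hypothetical and is not part of `bin/nutch`. It only shows which class each subcommand resolves to after revision 1.2.

```shell
#!/bin/sh
# Hypothetical helper (not in bin/nutch): print the Java class that a nutch
# subcommand resolves to, covering only the branches shown in this diff.
nutch_class_for() {
  case "$1" in
    # Revision 1.2 swaps in the nutchwax index merger for "merge".
    merge)     echo org.archive.access.nutch.NutchwaxIndexMerger ;;
    # "mergesegs" already used the nutchwax segment merge tool.
    mergesegs) echo org.archive.access.nutch.NutchwaxSegmentMergeTool ;;
    # "dedup" still uses the stock Nutch class.
    dedup)     echo org.apache.nutch.indexer.DeleteDuplicates ;;
    # Any other command is outside this sketch.
    *)         return 1 ;;
  esac
}
```

Inside a launcher you would use it as `CLASS=$(nutch_class_for "$COMMAND") || exit 1`. A `case` keeps each command-to-class pair on one line, which makes swaps like this commit's change to `merge` easy to review.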