From: Michael S. <sta...@us...> - 2005-09-15 18:23:06
Update of /cvsroot/archive-access/archive-access/projects/nutch/bin
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv9560/bin

Modified Files:
	nutch
Log Message:
* bin/nutch
Call the nutchwax merge.
* src/java/org/archive/access/nutch/NutchwaxIndexMerger.java
Adds being able to pass dir of segments (For Dan).

Index: nutch
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/bin/nutch,v
retrieving revision 1.1
retrieving revision 1.2
diff -C2 -d -r1.1 -r1.2
*** nutch	5 Sep 2005 20:04:31 -0000	1.1
--- nutch	15 Sep 2005 18:22:53 -0000	1.2
***************
*** 145,149 ****
    CLASS=org.apache.nutch.indexer.IndexSegment
  elif [ "$COMMAND" = "merge" ] ; then
!   CLASS=org.apache.nutch.indexer.IndexMerger
  elif [ "$COMMAND" = "dedup" ] ; then
    CLASS=org.apache.nutch.indexer.DeleteDuplicates
--- 145,152 ----
    CLASS=org.apache.nutch.indexer.IndexSegment
  elif [ "$COMMAND" = "merge" ] ; then
!   # Use the nutchwax merger. It adds being able to take a dir of segments.
!   # TODO: Make this a subclass rather than a copy. Looks like I can. But
!   # am in a bit of hurry at moment.
!   CLASS=org.archive.access.nutch.NutchwaxIndexMerger
  elif [ "$COMMAND" = "dedup" ] ; then
    CLASS=org.apache.nutch.indexer.DeleteDuplicates
***************
*** 153,159 ****
    CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
  elif [ "$COMMAND" = "mergesegs" ] ; then
!   # Copy over the nutchwax version of segment merge.
!   # It will work w/ segments made by nutchwax. Also
!   # does not do a merge.
    CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
  elif [ "$COMMAND" = "readdb" ] ; then
--- 156,161 ----
    CLASS=org.apache.nutch.tools.UpdateSegmentsFromDb
  elif [ "$COMMAND" = "mergesegs" ] ; then
!   # Use the merge fom nutchwax. It doesn't expect content to be in place
!   # and it disables deduping.
    CLASS=org.archive.access.nutch.NutchwaxSegmentMergeTool
  elif [ "$COMMAND" = "readdb" ] ; then
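
For readers following the change, here is a minimal sketch of the command-to-class mapping as it stands after revision 1.2. It is not part of the commit: the function name resolve_class is hypothetical, it is written as a case statement rather than the script's elif chain, only the three commands whose names are visible in the diff are listed, and the fallback branch is an assumption about how unlisted commands might be handled.

```shell
#!/bin/sh
# Hypothetical helper (not in bin/nutch): print the Java class that
# revision 1.2 selects for a given command name.
resolve_class() {
  case "$1" in
    # Changed in 1.2: was org.apache.nutch.indexer.IndexMerger.
    merge)     echo "org.archive.access.nutch.NutchwaxIndexMerger" ;;
    # Unchanged by this commit.
    dedup)     echo "org.apache.nutch.indexer.DeleteDuplicates" ;;
    # Class unchanged; only the explanatory comment was reworded.
    mergesegs) echo "org.archive.access.nutch.NutchwaxSegmentMergeTool" ;;
    # Assumption: anything else is passed through as a class name.
    *)         echo "$1" ;;
  esac
}

resolve_class merge
```

The only behavioral change in the diff is the "merge" branch; the "mergesegs" hunk rewrites a comment and leaves the class as it was.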