From: Michael S. <sta...@us...> - 2005-11-29 21:43:54

Update of /cvsroot/archive-access/archive-access/projects/nutch/src/plugin/parse-ext
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30555/src/plugin/parse-ext

Modified Files:
    plugin.xml
Log Message:
Merge 'mapred' branch into HEAD.

* .classpath
* project.properties
Update to point at new 0.8 nutch.
* build.xml
Merge in 'mapred'. Add job target.
* conf/nutch-site.xml
Cleanup. Removed unused properties or properties that have same
values as nutch-default.xml (Except 'searcher.dir' -- keeping that
here because we'll usually want to change it). Reordered so archive
properties are towards the end. Brought forward descriptions from
nutch-default where missing.
* conf/nutch-site.xml.template
Copy of nutch-site.xml but with the nutchwax defaults turned on.
* src/plugin/build.xml
Commented out parse-default.
* src/plugin/parse-ext/plugin.xml
Changed path to parse-pdf.sh.
* src/web/search.jsp
'mapred' update.
* bin/indexArcs.sh
* conf/ia-parse-plugins.xml
* lib/commons-codec-1.3.jar
* src/java/org/archive/access/nutch/ImportArcs.java
* src/java/org/archive/access/nutch/IndexArcs.java
Added.
* bin/arc2seg.sh
* src/java/org/archive/access/nutch/Arc2Segment.java
Removed.

Index: plugin.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/plugin/parse-ext/plugin.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** plugin.xml 4 Jun 2005 01:01:32 -0000 1.3
--- plugin.xml 29 Nov 2005 21:43:43 -0000 1.4
***************
*** 24,28 ****
  contentType="application/pdf"
  pathSuffix="pdf"
! command="@PWD@/bin/parse-pdf.sh"
  timeout="30"/>
--- 24,28 ----
  contentType="application/pdf"
  pathSuffix="pdf"
! command="bin/parse-pdf.sh"
  timeout="30"/>