Update of /cvsroot/archive-access/archive-access/projects/nutch/src/plugin/parse-ext
In directory sc8-pr-cvs1.sourceforge.net:/tmp/cvs-serv30555/src/plugin/parse-ext
Modified Files:
plugin.xml
Log Message:
Merge 'mapred' branch into HEAD.
* .classpath
* project.properties
Update to point at the new Nutch 0.8.
* build.xml
Merge in 'mapred'. Add job target.
* conf/nutch-site.xml
Cleanup. Removed unused properties and properties that have the same
values as in nutch-default.xml (except 'searcher.dir' -- keeping that here
because we'll usually want to change it). Reordered so archive properties
are towards the end. Brought forward descriptions from nutch-default.xml
where missing.
* conf/nutch-site.xml.template
Copy of nutch-site.xml but with the nutchwax defaults turned on.
* src/plugin/build.xml
Commented out parse-default.
* src/plugin/parse-ext/plugin.xml
Changed path to parse-pdf.sh.
* src/web/search.jsp
'mapred' update.
* bin/indexArcs.sh
* conf/ia-parse-plugins.xml
* lib/commons-codec-1.3.jar
* src/java/org/archive/access/nutch/ImportArcs.java
* src/java/org/archive/access/nutch/IndexArcs.java
Added.
* bin/arc2seg.sh
* src/java/org/archive/access/nutch/Arc2Segment.java
Removed.
Index: plugin.xml
===================================================================
RCS file: /cvsroot/archive-access/archive-access/projects/nutch/src/plugin/parse-ext/plugin.xml,v
retrieving revision 1.3
retrieving revision 1.4
diff -C2 -d -r1.3 -r1.4
*** plugin.xml 4 Jun 2005 01:01:32 -0000 1.3
--- plugin.xml 29 Nov 2005 21:43:43 -0000 1.4
***************
*** 24,28 ****
contentType="application/pdf"
pathSuffix="pdf"
! command="@PWD@/bin/parse-pdf.sh"
timeout="30"/>
--- 24,28 ----
contentType="application/pdf"
pathSuffix="pdf"
! command="bin/parse-pdf.sh"
timeout="30"/>