From: <bi...@us...> - 2008-12-16 02:43:28
Revision: 2669
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2669&view=rev
Author:   binzino
Date:     2008-12-16 02:43:25 +0000 (Tue, 16 Dec 2008)

Log Message:
-----------
Removed Nutch OPIC scoring filter and replaced with NutchWAX PageRank scoring filter.
Also added a comment about the HTTP code filter.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml

Modified: trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-16 02:42:20 UTC (rev 2668)
+++ trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-16 02:43:25 UTC (rev 2669)
@@ -10,7 +10,7 @@
   <!-- Add 'index-nutchwax' and 'query-nutchwax' to plugin list. -->
   <!-- Also, add 'parse-pdf' -->
   <!-- Remove 'urlfilter-regex' and 'normalizer-(pass|regex|basic)' -->
-  <value>protocol-http|parse-(text|html|js|pdf)|index-(basic|nutchwax)|query-(basic|site|url|nutchwax)|summary-basic|scoring-opic|urlfilter-nutchwax</value>
+  <value>protocol-http|parse-(text|html|js|pdf)|index-(basic|nutchwax)|query-(basic|site|url|nutchwax)|summary-basic|scoring-nutchwax|urlfilter-nutchwax</value>
 </property>
 
 <!-- The indexing filter order *must* be specified in order for
@@ -115,6 +115,9 @@
   <description>Implementation of URL canonicalizer to use.</description>
 </property>
 
+<!-- Only pass URLs with an HTTP status in this range.  Used by the
+     NutchWAX importer.
+  -->
 <property>
   <name>nutchwax.filter.http.status</name>
   <value>