From: <bi...@us...> - 2008-07-28 19:29:07
Revision: 2505
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2505&view=rev
Author:   binzino
Date:     2008-07-28 19:29:16 +0000 (Mon, 28 Jul 2008)

Log Message:
-----------
Added length metadata field to list of indexed fields.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml

Modified: trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-07-26 15:47:56 UTC (rev 2504)
+++ trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-07-28 19:29:16 UTC (rev 2505)
@@ -52,6 +52,7 @@
     collection
     date
     type
+    length
   </value>
 </property>
 

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <bi...@us...> - 2008-12-11 22:21:49
Revision: 2659
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2659&view=rev
Author:   binzino
Date:     2008-12-11 22:21:44 +0000 (Thu, 11 Dec 2008)

Log Message:
-----------
Added property for per-collection segments.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml

Modified: trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-10 05:02:19 UTC (rev 2658)
+++ trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-11 22:21:44 UTC (rev 2659)
@@ -134,4 +134,14 @@
   <value>1048576</value>
 </property>
 
+<!-- Enable per-collection segment sub-dirs, e.g.
+       segments/<collectionId>/segment1
+                              /segment2
+                              ...
+  -->
+<property>
+  <name>nutchwax.FetchedSegments.perCollection</name>
+  <value>true</value>
+</property>
+
 </configuration>
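The directory layout described in the comment added by this revision can be illustrated with a short sketch. This shows the on-disk structure only and is not NutchWAX's implementation: the helper `list_segments` and the collection names `colA` and `colB` are hypothetical.

```python
from pathlib import Path
import tempfile


def list_segments(segments_root, per_collection):
    """Return the segment directories found under segments_root.

    Hypothetical helper. With per_collection=True, segments are expected
    one level down, at segments/<collectionId>/<segment>; otherwise they
    are expected directly under segments/.
    """
    root = Path(segments_root)
    if not per_collection:
        return sorted(p for p in root.iterdir() if p.is_dir())
    found = []
    for collection in sorted(p for p in root.iterdir() if p.is_dir()):
        found.extend(sorted(p for p in collection.iterdir() if p.is_dir()))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "segments"
        # Build a sample per-collection tree (names are made up).
        for collection, segment in [("colA", "segment1"),
                                    ("colA", "segment2"),
                                    ("colB", "segment1")]:
            (root / collection / segment).mkdir(parents=True)
        for path in list_segments(root, per_collection=True):
            print(path.relative_to(root))
```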
From: <bi...@us...> - 2008-12-15 02:19:55
Revision: 2665
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2665&view=rev
Author:   binzino
Date:     2008-12-15 02:19:53 +0000 (Mon, 15 Dec 2008)

Log Message:
-----------
Added some property values which we commonly use in deployments.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml

Modified: trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-15 01:47:48 UTC (rev 2664)
+++ trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-15 02:19:53 UTC (rev 2665)
@@ -144,4 +144,26 @@
   <value>true</value>
 </property>
 
-</configuration>
+<!-- The following are over-rides of property values in
+     nutch-default which the Internet Archive uses in
+     most NutchWAX projects. -->
+
+<property>
+  <name>io.map.index.skip</name>
+  <value>32</value>
+</property>
+
+<property>
+  <name>searcher.max.hits</name>
+  <value>1000</value>
+</property>
+
+<property>
+  <name>searcher.summary.context</name>
+  <value>8</value>
+</property>
+
+<property>
+  <name>searcher.summary.length</name>
+  <value>80</value>
+</property>
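To check which overrides a site file of this shape actually carries, the name/value pairs can be read with a generic XML parser. The sketch below is an illustration only: it is not how Nutch itself loads its configuration, `read_properties` is a hypothetical helper, and the sample text is a well-formed fragment holding just the four properties added in this revision.

```python
import xml.etree.ElementTree as ET

# Well-formed sample containing only the four overrides from this revision.
SITE_XML = """<configuration>
<property>
  <name>io.map.index.skip</name>
  <value>32</value>
</property>
<property>
  <name>searcher.max.hits</name>
  <value>1000</value>
</property>
<property>
  <name>searcher.summary.context</name>
  <value>8</value>
</property>
<property>
  <name>searcher.summary.length</name>
  <value>80</value>
</property>
</configuration>"""


def read_properties(xml_text):
    """Map each <property>'s <name> to its <value> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


if __name__ == "__main__":
    for name, value in read_properties(SITE_XML).items():
        print(f"{name} = {value}")
```

Because a generic parser rejects malformed input, running the same helper on a file that is missing its closing `</configuration>` tag raises `xml.etree.ElementTree.ParseError`, which makes it a cheap sanity check before deployment.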
From: <bi...@us...> - 2008-12-15 17:47:04
Revision: 2666
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2666&view=rev
Author:   binzino
Date:     2008-12-15 17:47:01 +0000 (Mon, 15 Dec 2008)

Log Message:
-----------
Oops, fix bug where I accidentally removed closing tag in previous edit.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml

Modified: trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-15 02:19:53 UTC (rev 2665)
+++ trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-15 17:47:01 UTC (rev 2666)
@@ -147,7 +147,6 @@
 <!-- The following are over-rides of property values in
      nutch-default which the Internet Archive uses in
      most NutchWAX projects. -->
-
 <property>
   <name>io.map.index.skip</name>
   <value>32</value>
@@ -167,3 +166,5 @@
   <name>searcher.summary.length</name>
   <value>80</value>
 </property>
+
+</configuration>
From: <bi...@us...> - 2008-12-16 02:43:28
Revision: 2669
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2669&view=rev
Author:   binzino
Date:     2008-12-16 02:43:25 +0000 (Tue, 16 Dec 2008)

Log Message:
-----------
Removed Nutch OPIC scoring filter and replaced with NutchWAX PageRank scoring filter.
Also added a comment about the HTTP code filter.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml

Modified: trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-16 02:42:20 UTC (rev 2668)
+++ trunk/archive-access/projects/nutchwax/archive/conf/nutch-site.xml	2008-12-16 02:43:25 UTC (rev 2669)
@@ -10,7 +10,7 @@
   <!-- Add 'index-nutchwax' and 'query-nutchwax' to plugin list. -->
   <!-- Also, add 'parse-pdf' -->
   <!-- Remove 'urlfilter-regex' and 'normalizer-(pass|regex|basic)' -->
-  <value>protocol-http|parse-(text|html|js|pdf)|index-(basic|nutchwax)|query-(basic|site|url|nutchwax)|summary-basic|scoring-opic|urlfilter-nutchwax</value>
+  <value>protocol-http|parse-(text|html|js|pdf)|index-(basic|nutchwax)|query-(basic|site|url|nutchwax)|summary-basic|scoring-nutchwax|urlfilter-nutchwax</value>
 </property>
 
 <!-- The indexing filter order *must* be specified in order for
@@ -115,6 +115,9 @@
   <description>Implementation of URL canonicalizer to use.</description>
 </property>
 
+<!-- Only pass URLs with an HTTP status in this range.  Used by the
+     NutchWAX importer.
+  -->
 <property>
   <name>nutchwax.filter.http.status</name>
   <value>
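The effect of swapping `scoring-opic` for `scoring-nutchwax` in the plugin list can be checked with a small sketch. It assumes the value is a regular expression that must match a plugin's whole id, which is how the Nutch `plugin.includes` property is documented; the diff itself does not show the property name, so treat that as an assumption. The helper `is_included` is hypothetical, and Python's `re` stands in for Java's regex engine, which behaves the same for a pattern this simple.

```python
import re

# Plugin list values copied from the diff above (before and after rev 2669).
OLD_INCLUDES = ("protocol-http|parse-(text|html|js|pdf)|index-(basic|nutchwax)|"
                "query-(basic|site|url|nutchwax)|summary-basic|scoring-opic|"
                "urlfilter-nutchwax")
NEW_INCLUDES = OLD_INCLUDES.replace("scoring-opic", "scoring-nutchwax")


def is_included(plugin_id, includes):
    """True if the whole plugin id matches the includes pattern.

    Hypothetical helper; assumes full-match semantics.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    for plugin in ("scoring-opic", "scoring-nutchwax", "parse-pdf", "urlfilter-regex"):
        print(plugin,
              "old:", is_included(plugin, OLD_INCLUDES),
              "new:", is_included(plugin, NEW_INCLUDES))
```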