From: <bi...@us...> - 2010-08-09 23:53:28
Revision: 3214
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3214&view=rev
Author:   binzino
Date:     2010-08-09 23:53:22 +0000 (Mon, 09 Aug 2010)

Log Message:
-----------
Remove date from ConfigurableIndexingFilter as it is now handled by the DateIndexer.

Modified Paths:
--------------
    tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml

Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml
===================================================================
--- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-08-09 23:52:32 UTC (rev 3213)
+++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-08-09 23:53:22 UTC (rev 3214)
@@ -42,30 +42,14 @@
     dest-key = src-key
   -->
   <name>nutchwax.filter.index</name>
-<!--
   <value>
     title:false:true:tokenized
     content:false:compress:tokenized
     site:false:false:untokenized
-    url:false:true:tokenized
-    digest:false:true:no
-
-    collection:true:true:no_norms
-    date:true:true:no_norms
     type:true:true:no_norms
     length:false:true:no
   </value>
--->
-  <value>
-    title:false:true:tokenized
-    content:false:compress:tokenized
-    site:false:false:untokenized
-    url:false:true:tokenized
-    type:true:true:no_norms
-    date:false:true:no
-    length:false:true:no
-  </value>
 </property>
 
 <property>

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
From: <bi...@us...> - 2010-10-27 07:07:26
Revision: 3309
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3309&view=rev
Author:   binzino
Date:     2010-10-27 07:07:20 +0000 (Wed, 27 Oct 2010)

Log Message:
-----------
Disable the import.content.limit.

Modified Paths:
--------------
    tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml

Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml
===================================================================
--- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-27 07:06:57 UTC (rev 3308)
+++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-27 07:07:20 UTC (rev 3309)
@@ -125,7 +125,7 @@
 -->
 <property>
   <name>nutchwax.import.content.limit</name>
-  <value>1048576</value>
+  <value>-1</value>
 </property>
 
 <!-- Whether or not we store the full content in the segment's
From: <bi...@us...> - 2010-10-27 16:13:32
Revision: 3312
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3312&view=rev
Author:   binzino
Date:     2010-10-27 16:13:26 +0000 (Wed, 27 Oct 2010)

Log Message:
-----------
Store digest in index.

Modified Paths:
--------------
    tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml

Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml
===================================================================
--- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-27 07:08:09 UTC (rev 3311)
+++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-27 16:13:26 UTC (rev 3312)
@@ -47,6 +47,7 @@
     content:false:compress:tokenized
     site:false:false:untokenized
     url:false:true:tokenized
+    digest:false:true:no
     type:true:true:no_norms
     length:false:true:no
   </value>
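
For readers unfamiliar with the nutchwax.filter.index value, the following is a minimal sketch of how its whitespace-separated, colon-delimited entries could be split into fields. It is not NutchWAX code. The column names `lowercase`, `store` and `index` are assumptions; the diff itself only shows four colon-separated parts per entry and does not label them. `FieldSpec` and `parse_field_specs` are hypothetical names, and the sample value contains only the entries visible in the r3312 hunk.

```python
from typing import List, NamedTuple


class FieldSpec(NamedTuple):
    """One entry of the filter value. Column names other than `name` are assumed."""
    name: str
    lowercase: str  # assumed meaning of the 2nd column
    store: str      # assumed meaning of the 3rd column (e.g. "true", "compress")
    index: str      # assumed meaning of the 4th column (e.g. "tokenized", "no")


def parse_field_specs(value: str) -> List[FieldSpec]:
    """Split a whitespace-separated list of name:a:b:c entries into FieldSpec tuples."""
    specs = []
    for token in value.split():
        parts = token.split(":")
        if len(parts) != 4:
            raise ValueError("expected 4 colon-separated parts, got: " + token)
        specs.append(FieldSpec(*parts))
    return specs


# Entries visible in the r3312 hunk, including the newly added digest line.
HUNK_VALUE = """
    content:false:compress:tokenized
    site:false:false:untokenized
    url:false:true:tokenized
    digest:false:true:no
    type:true:true:no_norms
    length:false:true:no
"""

if __name__ == "__main__":
    for spec in parse_field_specs(HUNK_VALUE):
        print(spec)
```

Under the assumed column meanings, the added digest entry would be stored but not made searchable, which is consistent with the log message "Store digest in index."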
From: <bi...@us...> - 2010-10-28 00:50:41
Revision: 3314
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3314&view=rev
Author:   binzino
Date:     2010-10-28 00:50:30 +0000 (Thu, 28 Oct 2010)

Log Message:
-----------
Enabled mime-type deduction via magic numbers.

Modified Paths:
--------------
    tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml

Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml
===================================================================
--- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-28 00:49:16 UTC (rev 3313)
+++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-28 00:50:30 UTC (rev 3314)
@@ -81,7 +81,7 @@
      the Content-Type that is already in the (W)ARC file. -->
 <property>
   <name>mime.type.magic</name>
-  <value>false</value>
+  <value>true</value>
   <description>Defines if the mime content type detector uses magic resolution.</description>
 </property>
 
From: <bi...@us...> - 2010-10-28 04:32:38
Revision: 3322
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3322&view=rev
Author:   binzino
Date:     2010-10-28 04:32:32 +0000 (Thu, 28 Oct 2010)

Log Message:
-----------
Replace per-format parsers with Tika.

Modified Paths:
--------------
    tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml

Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml
===================================================================
--- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-28 04:32:11 UTC (rev 3321)
+++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2010-10-28 04:32:32 UTC (rev 3322)
@@ -10,7 +10,7 @@
   <!-- Add 'index-nutchwax' and 'query-nutchwax' to plugin list. -->
   <!-- Also, add 'parse-pdf' -->
   <!-- Remove 'urlfilter-regex' and 'normalizer-(pass|regex|basic)' -->
-  <value>protocol-http|parse-(text|html|pdf|msword|mspowerpoint|oo)|index-nutchwax|query-(basic|nutchwax)|summary-basic|scoring-nutchwax|urlfilter-nutchwax</value>
+  <value>protocol-http|parse-tika|index-nutchwax|query-(basic|nutchwax)|summary-basic|scoring-nutchwax|urlfilter-nutchwax</value>
 </property>
 
 <!--
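
To illustrate the effect of this change, here is a small sketch that treats the old and new plugin list values as regular expressions and checks which plugin ids each one selects. It assumes the value must match a whole plugin id, which is how Nutch's plugin list is generally understood to work but is not stated in this diff. The helper `included` and the `SAMPLE_IDS` list are hypothetical and chosen only for illustration.

```python
import re
from typing import List

# Values copied from the r3321 and r3322 sides of the diff.
OLD_VALUE = (
    "protocol-http|parse-(text|html|pdf|msword|mspowerpoint|oo)|index-nutchwax|"
    "query-(basic|nutchwax)|summary-basic|scoring-nutchwax|urlfilter-nutchwax"
)
NEW_VALUE = (
    "protocol-http|parse-tika|index-nutchwax|"
    "query-(basic|nutchwax)|summary-basic|scoring-nutchwax|urlfilter-nutchwax"
)

# Hypothetical sample of plugin ids to test against both patterns.
SAMPLE_IDS = [
    "protocol-http",
    "parse-html",
    "parse-pdf",
    "parse-tika",
    "index-nutchwax",
    "query-basic",
    "urlfilter-regex",
]


def included(pattern: str, plugin_ids: List[str]) -> List[str]:
    """Return the ids that the pattern matches in full (assumed matching rule)."""
    rx = re.compile(pattern)
    return [pid for pid in plugin_ids if rx.fullmatch(pid)]


if __name__ == "__main__":
    print("old:", included(OLD_VALUE, SAMPLE_IDS))
    print("new:", included(NEW_VALUE, SAMPLE_IDS))
```

With the new value, the per-format ids such as parse-html and parse-pdf no longer match and parse-tika does, while the non-parser entries are selected exactly as before.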
From: <bi...@us...> - 2012-01-19 21:45:04
Revision: 3598
          http://archive-access.svn.sourceforge.net/archive-access/?rev=3598&view=rev
Author:   binzino
Date:     2012-01-19 21:44:58 +0000 (Thu, 19 Jan 2012)

Log Message:
-----------
Fix type-o.

Modified Paths:
--------------
    tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml

Modified: tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml
===================================================================
--- tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2012-01-19 21:44:43 UTC (rev 3597)
+++ tags/nutchwax-0_13-JIRA-WAX-75/archive/src/nutch/conf/nutch-site.xml 2012-01-19 21:44:58 UTC (rev 3598)
@@ -184,7 +184,7 @@
 <property>
   <name>encodingdetector.charset.min.confidence</name>
   <value>1</value>
-  <description>A integer between 0-100 indicating minimum confidence value
+  <description>An integer between 0-100 indicating minimum confidence value
   for charset auto-detection. Any negative value disables auto-detection.
   </description>
 </property>