From: <bi...@us...> - 2010-03-18 22:43:10
Revision: 2980
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2980&view=rev
Author:   binzino
Date:     2010-03-18 22:43:04 +0000 (Thu, 18 Mar 2010)

Log Message:
-----------
Updated for NW 0.13.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt

Modified: trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt	2010-03-18 22:40:39 UTC (rev 2979)
+++ trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt	2010-03-18 22:43:04 UTC (rev 2980)
@@ -1,57 +1,56 @@
 RELEASE-NOTES.TXT
-2009-05-05
+2010-02-13
 Aaron Binns

-Release notes for NutchWAX 0.12.4
+Release notes for NutchWAX 0.13

 For the most recent updates and information on NutchWAX,
 please visit the project wiki at:

-  http://webteam.archive.org/confluence/display/search/NutchWAX
+  http://webarchive.jira.com/wiki/display/search/NutchWAX

-
 ======================================================================
 Overview
 ======================================================================

-NutchWAX 0.12.4 contains numerous enhancements and fixes to 0.12.3
+NutchWAX 0.13 is an update of NutchWAX code the Nutch 1.0
+release.

-  o Option to omit storing of content during import.
-  o Support for per-collection segments in master/slave config.
-  o Additional diagnostic/log messages to help troubleshoot common
-    deployment mistakes.
-  o PageRankDb similar to LinkDb but only keeping inlink counts.
-  o Improved paging through results, handling "paging past the end".
+This release also allows for field values to be stored in the index in
+compressed form.  Simply change the field storage specification in the
+'nutchwax.filter.index' property from "true" to "compress".
+For example,
+<property>
+  <name>nutchwax.filter.index</name>
+  <value>
+    title:false:true:tokenized
+    content:false:compress:tokenized
+    ...
+  </value>
+</property>
+
+This stores the entire content field in the Lucene index, using
+compression.
+
 ======================================================================
 Issues
 ======================================================================

 For an up-to-date list of NutchWAX issues:

-  http://webteam.archive.org/jira/browse/WAX
+  http://webarchive.jira.com/browse/WAX

 Issues resolved in this release:

-WAX-27 Sensible output for requesting page of results past the end.
+WAX-74 Add support for storing fields in compressed form.

-WAX-34 Add option to omit storing of content in segment
+WAX-73 Change default value of searcher.fieldcache in nutch-site.xml to 'false'

-WAX-35 Add pagerankdb similar to linkdb but which only keeps counts
-       rather than actual inlinks.
+WAX-72 Simply build system to copy NW files into Nutch dirs and use Nutch build.xml

-WAX-36 Some additional diagnostics on connecting results to segments
-       and snippets would be very helpful.
+WAX-71 NutchWAX-required libraries not included in nutch-1.0.job

-WAX-37 Per-collection segments not supported in distributed
-       master-slave configuration.
-
-WAX-38 Build omits neessary libraries from .job file.
-
-WAX-39 Write more efficient, specialized segment parse_text merging.
-
-WAX-41 Option to enable/disable the FIELDCACHE in the Nutch IndexSearcher
-
-WAX-42 Add option to continue importing if an arcfile cannot be read.
+WAX-69 Class not found when importing within a Hadoop MR job.
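The release notes show the `nutchwax.filter.index` value only as an excerpt, with storage switched by changing one token from "true" to "compress". The Python sketch below is an illustration, not NutchWAX code. It shows one way to flip a field's storage token and check the result. It assumes each field spec is colon-separated, with the field name first and the storage token third, as the `title` and `content` lines suggest. `storage_modes` and `compress_field` are hypothetical helpers, treating "false" as a valid storage token is an assumption, and the remaining tokens are passed through without interpretation.

```python
from typing import Dict

# Hypothetical helpers (not part of NutchWAX). Assumption: each field spec in
# the 'nutchwax.filter.index' value is colon-separated, with the field name
# first and the storage token third, as in "content:false:compress:tokenized".
# The other tokens are passed through without interpretation.

# "true" and "compress" come from the release notes; "false" is assumed.
STORAGE_TOKENS = {"true", "false", "compress"}


def storage_modes(property_value: str) -> Dict[str, str]:
    """Map each field name to its storage token."""
    modes = {}
    for line in property_value.splitlines():
        spec = line.strip()
        if not spec or spec == "...":
            continue  # skip blank lines and the elided "..." placeholder
        parts = spec.split(":")
        if len(parts) < 3:
            raise ValueError(f"field spec too short: {spec!r}")
        name, storage = parts[0], parts[2]
        if storage not in STORAGE_TOKENS:
            raise ValueError(f"unknown storage token {storage!r} in {spec!r}")
        modes[name] = storage
    return modes


def compress_field(property_value: str, field: str) -> str:
    """Return the value with `field`'s storage token set to "compress"."""
    out = []
    for line in property_value.splitlines():
        parts = line.strip().split(":")
        if len(parts) >= 3 and parts[0] == field:
            parts[2] = "compress"
            indent = line[: len(line) - len(line.lstrip())]  # keep indentation
            line = indent + ":".join(parts)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    value = """
    title:false:true:tokenized
    content:false:true:tokenized
    """
    print(storage_modes(compress_field(value, "content")))
    # prints {'title': 'true', 'content': 'compress'}
```

The helper only rewrites the property text. It does not validate the field names or the other tokens. Whether a field is actually stored compressed is decided by NutchWAX when it reads the property.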