[Archive-access-cvs] SF.net SVN: archive-access:[2980] trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt

Revision: 2980
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2980&view=rev
Author:   binzino
Date:     2010-03-18 22:43:04 +0000 (Thu, 18 Mar 2010)

Log Message:
-----------
Updated for NW 0.13.

Modified Paths:
--------------
    trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt

Modified: trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt
===================================================================

--- trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt	2010-03-18 22:40:39 UTC (rev 2979)
+++ trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt	2010-03-18 22:43:04 UTC (rev 2980)
@@ -1,57 +1,56 @@
 
 RELEASE-NOTES.TXT
-2009-05-05
+2010-02-13
 Aaron Binns
 
-Release notes for NutchWAX 0.12.4
+Release notes for NutchWAX 0.13
 
 For the most recent updates and information on NutchWAX,
 please visit the project wiki at:
 
-  http://webteam.archive.org/confluence/display/search/NutchWAX
+  http://webarchive.jira.com/wiki/display/search/NutchWAX
 
-
 ======================================================================
 Overview
 ======================================================================
 
-NutchWAX 0.12.4 contains numerous enhancements and fixes to 0.12.3
+NutchWAX 0.13 is an update of NutchWAX code to the Nutch 1.0
+release.
 
-  o Option to omit storing of content during import.
-  o Support for per-collection segments in master/slave config.
-  o Additional diagnostic/log messages to help troubleshoot common
-    deployment mistakes.
-  o PageRankDb similar to LinkDb but only keeping inlink counts.
-  o Improved paging through results, handling "paging past the end".
+This release also allows for field values to be stored in the index in
+compressed form.  Simply change the field storage specification in the
+'nutchwax.filter.index' property from "true" to "compress".  
 
+For example,
 
+<property>
+  <name>nutchwax.filter.index</name>
+  <value>
+    title:false:true:tokenized
+    content:false:compress:tokenized
+    ...
+  </value>
+</property>
+
+This stores the entire content field in the Lucene index, using
+compression.
+
 ======================================================================
 Issues
 ======================================================================
 
 For an up-to-date list of NutchWAX issues:
 
-  http://webteam.archive.org/jira/browse/WAX
+  http://webarchive.jira.com/browse/WAX
 
 Issues resolved in this release:
 
-WAX-27 Sensible output for requesting page of results past the end.
+WAX-74  Add support for storing fields in compressed form.
 
-WAX-34 Add option to omit storing of content in segment
+WAX-73  Change default value of searcher.fieldcache in nutch-site.xml to 'false'
 
-WAX-35 Add pagerankdb similar to linkdb but which only keeps counts
-       rather than actual inlinks.
+WAX-72  Simplify build system to copy NW files into Nutch dirs and use Nutch build.xml
 
-WAX-36 Some additional diagnostics on connecting results to segments
-       and snippets would be very helpful.
+WAX-71  NutchWAX-required libraries not included in nutch-1.0.job
 
-WAX-37 Per-collection segments not supported in distributed
-       master-slave configuration.
-
-WAX-38 Build omits necessary libraries from .job file.
-
-WAX-39 Write more efficient, specialized segment parse_text merging.
-
-WAX-41 Option to enable/disable the FIELDCACHE in the Nutch IndexSearcher
-
-WAX-42 Add option to continue importing if an arcfile cannot be read.
+WAX-69  Class not found when importing within a Hadoop MR job.
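
The release notes in the diff above describe switching a field's storage
specification in 'nutchwax.filter.index' from "true" to "compress". As a
rough illustration only, the sketch below rewrites one line of such a value.
It assumes each line has four colon-separated parts with the storage setting
in the third position, which is inferred from the example in the notes and
not taken from NutchWAX source code. The function name set_compressed is
hypothetical.

```python
# Sketch only: the four-part layout (name:flag:store:index) is an assumption
# inferred from the example in the release notes, not from NutchWAX itself.

def set_compressed(spec_line, field):
    """Return spec_line with storage "true" changed to "compress" for `field`.

    Lines that do not match the assumed four-part layout, name another
    field, or already use a different storage setting are returned unchanged.
    """
    parts = spec_line.strip().split(":")
    if len(parts) != 4 or parts[0] != field or parts[2] != "true":
        return spec_line
    parts[2] = "compress"
    # Keep the original leading indentation so the XML value stays readable.
    indent = spec_line[: len(spec_line) - len(spec_line.lstrip())]
    return indent + ":".join(parts)


# Hypothetical property value before the change.
value = """
    title:false:true:tokenized
    content:false:true:tokenized
"""

updated = "\n".join(set_compressed(line, "content") for line in value.split("\n"))
print(updated)
```

Only the line for the named field is altered, so other fields such as title
keep their uncompressed storage setting.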

