Revision: 2403
http://archive-access.svn.sourceforge.net/archive-access/?rev=2403&view=rev
Author: binzino
Date: 2008-07-03 13:37:17 -0700 (Thu, 03 Jul 2008)
Log Message:
-----------
Initial revision.
Added Paths:
-----------
trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt
Added: trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt (rev 0)
+++ trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt 2008-07-03 20:37:17 UTC (rev 2403)
@@ -0,0 +1,62 @@
+
+RELEASE-NOTES.TXT
+2008-07-03
+Aaron Binns
+
+Release notes for NutchWAX 0.12
+
+For the most recent updates and information on NutchWAX,
+please visit the project wiki at:
+
+ http://webteam.archive.org/confluence/display/search/NutchWAX
+
+
+======================================================================
+Overview
+======================================================================
+
+NutchWAX 0.12-beta-1 was released on June 2, 2008. We anticipated
+releasing another beta mid-June with bug fixes and some minor
+enhancements based on feedback from the community.
+
+During internal testing by the Internet Archive Web Team, a few
+serious problems were found, the most critical being the failure to
+store different copies of the same URL when importing large batches of
+archive files.
+
+The NutchWAX team canceled the mid-month release in order to focus on
+fixing this problem.
+
+The good news is that not only has that problem been fixed, but the
+solution is part of a broader enhancement to manage the de-duplication
+of archive content during import and indexing.
+
+For more details on de-duplication in NutchWAX, please see:
+
+ HOWTO-dedup.txt
+ README-dedup.txt
+
+
+======================================================================
+Issues
+======================================================================
+
+For an up-to-date list of NutchWAX issues:
+
+ http://webteam.archive.org/jira/browse/WAX
+
+Issues resolved in this release:
+
+WAX-9 Entire file not imported
+WAX-8 Investigate why so many PDFs fail to parse
+
+ Fixing WAX-9 caused nearly all of the PDF parsing errors in WAX-8 to
+ disappear.
+
+WAX-7 Change config so that URL filters are not applied during link inversion
+
+ This is easily achieved by using command-line options when invoking
+ the Nutch "invertlinks" command.
+
+WAX-3 Observe content size limit on importing
+WAX-2 Date queries cause TooManyClauses exceptions