From: <bi...@us...> - 2008-07-03 20:37:13
Revision: 2403
          http://archive-access.svn.sourceforge.net/archive-access/?rev=2403&view=rev
Author:   binzino
Date:     2008-07-03 13:37:17 -0700 (Thu, 03 Jul 2008)

Log Message:
-----------
Initial revision.

Added Paths:
-----------
    trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt

Added: trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt
===================================================================
--- trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt    (rev 0)
+++ trunk/archive-access/projects/nutchwax/archive/RELEASE-NOTES.txt    2008-07-03 20:37:17 UTC (rev 2403)
@@ -0,0 +1,62 @@
+
+RELEASE-NOTES.TXT
+2007-07-03
+Aaron Binns
+
+Release notes for NutchWAX 0.12
+
+For the most recent updates and information on NutchWAX,
+please visit the project wiki at:
+
+  http://webteam.archive.org/confluence/display/search/NutchWAX
+
+
+======================================================================
+Overview
+======================================================================
+
+NutchWAX 0.12-beta-1 was released on June 2, 2008.  We anticipated
+releasing another beta mid-June with bug fixes and some minor
+enhancements based on feedback from the community.
+
+During internal testing by the Internet Archive Web Team, a few
+serious problems were found, the most critical being the failure to
+store different copies of the same URL when importing large batches of
+archive files.
+
+The NutchWAX team canceled the mid-month release in order to focus on
+fixing this problem.
+
+The good news is that not only has that problem been fixed, but the
+solution is part of a broader enhancement to manage the de-duplication
+of archive content during import and indexing.
+
+For more details on de-duplication in NutchWAX, please see
+
+  HOWTO-dedup.txt
+  README-dedup.txt
+
+
+======================================================================
+Issues
+======================================================================
+
+For an up-to-date list of NutchWAX issues:
+
+  http://webteam.archive.org/jira/browse/WAX
+
+Issues resolved in this release:
+
+WAX-9  Entire file not imported
+WAX-8  Investigate why so many PDFs fail to parse
+
+  Fixing the first one caused nearly all of the PDF parsing errors to
+  disappear.
+
+WAX-7  Change config so that URL filters are not applied during link inversion
+
+  This is easily achieved by using command-line options when invoking
+  the Nutch "invertlinks" command.
+
+WAX-3  Observe content size limit on importing
+WAX-2  Date queries cause TooManyClauses exceptions