Name                     Modified     Size      Downloads / Week
LanguageModels           2013-05-09
OldVersions              2013-02-21
README                   2013-05-09   3.9 kB
ziprec-1.00gamma.zip     2013-05-09   1.1 MB
ziprec-1.00beta.zip      2013-02-21   1.1 MB
ziprec-0.9k-patch1.zip   2013-02-21   1.2 MB
Totals: 6 Items                       3.4 MB    1

The ZIP archives in this directory contain source code only.  Due to
their size, the language models for reconstruction are located in the
subdirectory LanguageModels and the language identification models are
not included with this release (get them from the Language-Aware
Strings project, https://sourceforge.net/projects/la-strings/files/).

RECENT CHANGES
==============

v1.00gamma 2013-05-07:
   Hotspot optimization reduced reconstruction time by about 25%.
   Avoiding recomputation of n-gram scores during incremental updates
     when the original computation did not contribute to a wildcard's
     overall score increased the speed-up to 35% relative to
     v1.00beta.
   Changed scoring function to eliminate an exp() in the innermost
     loop, increasing the speed-up to more than 50% relative to
     v1.00beta with virtually identical reconstruction accuracy.
   Made "aggressive inference" (periodically assigning replacements
     for all wildcards with highly-skewed score distributions) the
     default, as it proved to improve both reconstruction accuracy and
     run time.  Reversed the sense of the -r^ flag to allow the user
     to disable it.
   Initial implementation of a word-length model for automatically
     detecting DEFLATE stream corruption; added -r:l flag to enable its
     use.  This approach proved unsuccessful in detecting corruption.
   Restored word-unigram model code from v0.9 and adapted it for use
     in detecting corruption; added -r:w to control its use. 
   Fixed segfault while verifying a candidate RAR header when the
     header-size field produces a header size which extends beyond the
     end of the input file.
   Fixed test-mode reference matching to correctly handle a
     within-packet corruption when re-alignment across corruption is
     disabled.
   Ensured proper display of multiple newlines in HTML mode.
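
The "aggressive inference" step mentioned in the v1.00gamma notes can
be pictured with the Python sketch below.  It is an illustration only,
not ziprec's code: the function name, the dictionary layout, and the
0.95 threshold are assumptions made for the example.

```python
def aggressive_inference(wildcards, threshold=0.95):
    """Commit a replacement for every wildcard whose scores are highly skewed.

    wildcards: dict mapping a wildcard id to a dict of
               {candidate byte: non-negative score} (hypothetical layout).
    Returns a dict {wildcard id: committed byte}; committed wildcards are
    removed from `wildcards` so later passes no longer consider them.
    """
    committed = {}
    for wid, scores in list(wildcards.items()):
        total = sum(scores.values())
        if total <= 0:
            continue  # no evidence yet, leave the wildcard open
        best, best_score = max(scores.items(), key=lambda kv: kv[1])
        # "Highly skewed" is taken here to mean that one candidate holds
        # at least `threshold` of the total score mass (an assumption).
        if best_score / total >= threshold:
            committed[wid] = best
            del wildcards[wid]
    return committed


pending = {
    1: {ord("e"): 97.0, ord("a"): 2.0, ord("o"): 1.0},    # skewed
    2: {ord("t"): 40.0, ord("s"): 35.0, ord("n"): 25.0},  # ambiguous
}
resolved = aggressive_inference(pending)
```

In this example wildcard 1 is committed to "e" because a single
candidate holds 97% of the score mass, while wildcard 2 stays open for
later passes.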

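The RAR header fix in the v1.00gamma notes amounts to a bounds check
before the candidate header is read.  A minimal Python sketch follows;
it is not the ziprec implementation.  It assumes the RAR 4.x block
layout (2-byte CRC, 1-byte type, 2-byte flags, 2-byte little-endian
header size), and the names header_fits, HEAD_SIZE_OFFSET, and
MIN_HEADER_SIZE are hypothetical.

```python
import struct

HEAD_SIZE_OFFSET = 5  # assumed offset of the 16-bit header-size field
MIN_HEADER_SIZE = 7   # assumed: CRC(2) + type(1) + flags(2) + size(2)


def header_fits(data: bytes, pos: int) -> bool:
    """Return True only if the candidate header at `pos` lies inside `data`."""
    # The fixed-size part must be present before the size field is read.
    if pos < 0 or pos + MIN_HEADER_SIZE > len(data):
        return False
    (size,) = struct.unpack_from("<H", data, pos + HEAD_SIZE_OFFSET)
    # A declared size smaller than the fixed part cannot be valid.
    if size < MIN_HEADER_SIZE:
        return False
    # Reject headers whose declared size runs past the end of the input.
    return pos + size <= len(data)


good = b"\x00\x00\x74\x00\x00" + struct.pack("<H", 7)
bad = b"\x00\x00\x74\x00\x00" + struct.pack("<H", 0x1000)
```

A candidate such as `bad`, whose size field claims 4096 bytes in a
7-byte input, is rejected instead of being read past the buffer.
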
v1.00beta 2013-02-13:
   Initial implementation of first phase of packet-end recovery.
     Search proved to be intractable in the general case, but usable
     when the Huffman trees are known (e.g. corruption in the middle
     of a packet).
   Implemented recovery of packets with corruption in the middle,
     including a search to re-align the decompressed data such that
     back-references across the corrupt region refer to the correct
     bytes.
   Added handling of zlib-style sync/flush markers as additional
     headers for finding DEFLATEd data.
   Refactored recovery code to use a list of DEFLATE packets,
     permitting multiple packets to contain corruption and enabling a
     user-specified corruption range in each packet.  Updated -t flag
     to permit an arbitrary range of up to 4096 bytes in the first
     packet to be designated as "corrupt" for testing purposes.
   Tweaked HTML-mode output formatting and added a key to the start of
     the file to remind users of the color coding.
   Switched storage of DecodedByte in files from three bytes to four
     bytes in preparation for extension of reconstruction code to
     other LZ77-based compression algorithms.   
   Extended search for reconstruction language models to look in the
     current directory, a "models" subdirectory, the directory
     containing the language identification database, and a
     system-wide directory, e.g. /usr/share/ziprec/.
   Updated valgrind header files to valgrind-3.7.0.
   Fixes for GCC 4.6.3 warnings.
   Added scripts for running evaluations on Europarl corpus.
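
For the zlib-style sync/flush markers mentioned in the v1.00beta
notes: a sync flush ends the current DEFLATE block and emits an empty
stored block, which leaves the byte sequence 00 00 FF FF in the stream
with the next block starting on a byte boundary.  The Python sketch
below scans for that sequence; it only illustrates the idea and is not
how ziprec is written (find_sync_markers is a hypothetical name).  The
same four bytes can also occur by chance in compressed data, so each
hit is only a candidate.

```python
import zlib

SYNC_MARKER = b"\x00\x00\xff\xff"  # tail of an empty stored block


def find_sync_markers(data: bytes):
    """Return every offset at which the sync-flush byte pattern occurs."""
    positions = []
    start = 0
    while True:
        idx = data.find(SYNC_MARKER, start)
        if idx < 0:
            return positions
        positions.append(idx)
        start = idx + 1  # allow overlapping matches


# Build a small stream containing one sync flush in the middle.
comp = zlib.compressobj()
stream = comp.compress(b"first chunk of text ") + comp.flush(zlib.Z_SYNC_FLUSH)
first_len = len(stream)  # the marker occupies the last 4 bytes so far
stream += comp.compress(b"second chunk of text") + comp.flush()
markers = find_sync_markers(stream)
```

Each offset in `markers` plus four is a candidate starting point for
decoding the DEFLATE data that follows the flush.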

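The extended model search in the v1.00beta notes can be sketched as a
simple ordered lookup.  The Python below is an illustration under
stated assumptions: that the directories are tried in the order the
notes list them, and that "models" is relative to the current
directory.  The names model_search_dirs and find_model are
hypothetical; only the /usr/share/ziprec/ example path comes from the
notes.

```python
import os


def model_search_dirs(langid_db_path, system_dir="/usr/share/ziprec"):
    """List the directories to try, in the assumed search order."""
    return [
        os.curdir,                                         # current directory
        "models",                                          # "models" subdirectory
        os.path.dirname(os.path.abspath(langid_db_path)),  # next to the langid db
        system_dir,                                        # system-wide directory
    ]


def find_model(filename, langid_db_path, system_dir="/usr/share/ziprec"):
    """Return the first existing path for `filename`, or None if not found."""
    for directory in model_search_dirs(langid_db_path, system_dir):
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A caller would pass the model file name and the path of the language
identification database, and fall back to an error message when the
result is None.
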
v1.00alpha 2012-04-03:
   Complete re-write of reconstruction code, now using longer n-grams
     and eliminating the word-based reconstruction.  This removes the
     need to have a word-splitter that works on any given character
     encoding and improves reconstruction of whitespace and
     punctuation.  The new reconstruction method is also three to five
     times faster with the same or better accuracy.
   Removed ziprec -r- option.

Source: README, updated 2013-05-09