From: James B. <jk...@sa...> - 2012-09-21 13:06:06
On Fri, Sep 21, 2012 at 03:50:14PM +0400, Artem Tarasov wrote:
> In fact, if you look at the documentation of the Zlib library
> ( http://www.zlib.net/manual.html#Utility ), it mentions a function
> compressBound(ulong) that returns the upper bound of the compressed
> block size. So anything less than or equal to
> max { v | compressBound(v) <= 65536 } would do.

Thanks, I hadn't noticed that, although it's at odds with the RFC, maybe
due to headers.

    uLong ZEXPORT compressBound (sourceLen)
        uLong sourceLen;
    {
        return sourceLen + (sourceLen >> 12) + (sourceLen >> 14) +
               (sourceLen >> 25) + 13;
    }

Whereas from RFC 1951:

    "A simple counting argument shows that no lossless compression
    algorithm can compress every possible input data set. For the format
    defined here, the worst case expansion is 5 bytes per 32K-byte block,
    i.e., a size increase of 0.015% for large data sets."

5 bytes every 32K is 1+4 every 32K, or (sourceLen >> 13) + (sourceLen >> 15).
compressBound seems to indicate it's ~10 bytes every 32K, plus some for the
header. Anyway, it's better to err on the side of caution and trust the more
conservative zlib version instead. That implies 65477 as the maximum, I
think...

James

--
James Bonfield (jk...@sa...)  | Hora aderat briligi. Nunc et Slythia Tova
                              | Plurima gyrabant gymbolitare vabo;
A Staden Package developer:   | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.