Menu

#49 Rare behavior in deflate.c

v1.0 (example)
open
nobody
None
5
2016-03-13
2016-03-11
No

It is likely a harmless bug, but probably an unexpected behavior.
The variable match_start, in very rare cases, is less than WSIZE in deflate.c's fill_window().
How to see it:

1) add the line marked with ">>>" (line numbers of Zip 3.1c)

623      memcpy((char*)window, (char*)window+WSIZE, (unsigned)WSIZE);
624 >>>  if (match_start < WSIZE) { fprintf(stderr, "\n match_start = %d, WSIZE = %d", match_start, WSIZE); }
625      match_start -= WSIZE;

2) build and run Zip with the enwik8 benchmark (to be found here: http://mattmahoney.net/dc/enwik8.zip )

zip -9 zzz enwik8

you get at some point
match_start = 32765, WSIZE = 32768

As a result, match_start will go at the other end of the window on line 625...
Could in some cases wrong matches be possible ? Cannot say...

This behavior happens with Zip 2.32 (win32), Zip 3.1c (win32).
I guess it does in all platforms and in zlib as well.

Cheers
Gautier

Discussion

  • Steven Schweda

    Steven Schweda - 2016-03-13

    Thanks for the report. I've never studied this (old) code closely,
    so I know nothing, but...

    Yes, I can reproduce this behavior (with "-7", "-8", or "-9"), but
    it's not immediately clear that it's a bug. My eye was drawn to this
    comment (near line 413):


    • Set match_start to the longest match starting at the given string and
    • return its length. Matches shorter or equal to prev_length are discarded,
    • in which case the result is equal to prev_length and match_start is
    • garbage.

    "match_start" is an unsigned 32-bit variable, so setting it to
    32765 - 32768 means that it's set to 4294967293. If that value were
    used for anything, then I'd expect some seriously bad results somewhere.
    But no bad results (SIGSEGV, bad compressed data, ...) are seen (at
    least by me). So, my first guess (and hope) is that this is one of
    those "match_start is garbage" cases, and this (very bad-looking) value
    is never used for anything.

    How did you discover this overflow? Did you look to see if this
    overflowed value is ever used?

    We did get a report a while ago of bad expanded data from UnZip, but
    I can't recall anyone reporting bad compressed data from Zip. That
    proves nothing, of course, but I'd be a little amazed if the Deflate
    implementation were actually defective.

     
  • Gautier de Montmollin

    It is very likely a "match_start is garbage" case, but since the calls to fill_window() happen for feeding new data, that is, probabilistically speaking, independently of the search for matches in longest_match() and deflate() where prev_length and match_start are changed, it is difficult for me to be so sure, unless I try to understand the algorithm in detail... If we can't prove match_start >= WSIZE when match_start is not garbage, upon a call to fill_window(), perhaps someone clever could build a specially crafted input data that produces wrong output.
    Note that the original code was developped for MS-DOS and perhaps the type unsigned is 16 bit there, then the supposedly garbage index would be a valid position within the window.

    I came across the overflow after translating a few days ago deflate.c (more precisely, the variant with all conditional compilation variables undefined) to a programming language where you can define subranges for discrete types - in this case, a signed integer for which you specify it cannot be negative, typically used for array indexes.

     

Log in to post a comment.