
#2502 Possible DictionarySize bug in C# and Java SDKs

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2024-08-12
Created: 2024-08-12
Private: No

The specification says the following: If the value of dictionary size in properties is smaller than (1 << 12), the LZMA decoder must set the dictionary size variable to (1 << 12).
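The rule above applies when the decoder reads the 5-byte properties header. That header is one lc/lp/pb byte followed by a 32-bit little-endian dictionary size. The following is a minimal Java sketch of reading the header and applying the clamp. The class and method names (LzmaProps, parse) are hypothetical and are not taken from the SDK.

```java
// Hypothetical sketch: parse the 5-byte LZMA properties header and apply
// the specification's minimum dictionary size of 1 << 12.
class LzmaProps {
    static final long DICT_MIN = 1L << 12; // minimum required by the specification

    final int lc, lp, pb;
    final long dictSize; // unsigned 32-bit value kept in a long

    LzmaProps(int lc, int lp, int pb, long dictSize) {
        this.lc = lc; this.lp = lp; this.pb = pb; this.dictSize = dictSize;
    }

    static LzmaProps parse(byte[] header) {
        if (header.length < 5) throw new IllegalArgumentException("need 5 bytes");
        int d = header[0] & 0xFF;
        if (d >= 9 * 5 * 5) throw new IllegalArgumentException("bad lc/lp/pb byte");
        int lc = d % 9;  // literal context bits
        d /= 9;
        int lp = d % 5;  // literal position bits
        int pb = d / 5;  // position bits
        long dict = 0;
        for (int i = 0; i < 4; i++) {
            dict |= (long) (header[1 + i] & 0xFF) << (8 * i); // little-endian
        }
        if (dict < DICT_MIN) dict = DICT_MIN; // the clamp the specification requires
        return new LzmaProps(lc, lp, pb, dict);
    }

    public static void main(String[] args) {
        // 0x5D encodes lc=3, lp=0, pb=2; the stored dictionary size is 256.
        LzmaProps p = parse(new byte[] {0x5D, 0x00, 0x01, 0x00, 0x00});
        System.out.println(p.lc + " " + p.lp + " " + p.pb + " " + p.dictSize); // prints "3 0 2 4096"
    }
}
```

Keeping the size in a long avoids sign problems for values of 2^31 and above, which a Java int cannot hold as an unsigned quantity.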

The C++ code follows this rule. The C# and Java versions, however, keep a separate check field (m_DictionarySizeCheck) that is not clamped to this minimum. This only affects the rep0 >= m_DictionarySizeCheck check.

The bug is in the decoder's SetDictionarySize method.
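The effect on the rep0 check can be shown with a simplified model. It assumes that rep0 is zero-based (the actual match distance is rep0 + 1), that nowPos counts every byte decoded so far, and that a distance is rejected when it reaches either nowPos or the check value. The names below are hypothetical and do not reproduce the SDK code.

```java
// Simplified, hypothetical model of the decoder's rep0 validity check.
class Rep0CheckDemo {
    static final long DICT_MIN = 1L << 12;

    // Check value as described in the report for the C#/Java SDKs: clamped only to 1.
    static long sdkCheck(long dictSize) {
        return Math.max(dictSize, 1);
    }

    // Check value implied by the specification: clamped to 1 << 12.
    static long specCheck(long dictSize) {
        return Math.max(dictSize, DICT_MIN);
    }

    // A distance is accepted only if it points into already-decoded data
    // and stays inside the dictionary window.
    static boolean accepts(long rep0, long nowPos, long check) {
        return rep0 < nowPos && rep0 < check;
    }

    public static void main(String[] args) {
        long dictSize = 1024, nowPos = 10_000, rep0 = 2_000;
        System.out.println(accepts(rep0, nowPos, sdkCheck(dictSize)));  // false
        System.out.println(accepts(rep0, nowPos, specCheck(dictSize))); // true
    }
}
```

With a stored dictionary size of 1024, the window is still allocated with 4096 bytes. A distance between 1024 and 4095 therefore points at valid window data, yet the unclamped check rejects it.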

For the C# version, I think the function should look like this:
- Remove the m_DictionarySizeCheck field (I don't see why it needs to be separate from m_DictionarySize) and use m_DictionarySize in the rep0 check instead.

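void SetDictionarySize(uint dictionarySize)
{
    // The LZMA specification requires a dictionary size of at least 1 << 12.
    // Clamp before comparing, so a small value does not recreate the window on every call.
    uint clamped = Math.Max(dictionarySize, 1u << 12);
    if (m_DictionarySize != clamped)
    {
        m_DictionarySize = clamped;
        m_OutWindow.Create(m_DictionarySize);
    }
}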

Alternatively, if the field is kept, I suspect the line m_DictionarySizeCheck = Math.Max(m_DictionarySize, 1); was meant to be m_DictionarySizeCheck = Math.Max(m_DictionarySize, 1 << 12);
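The one-line alternative would look roughly like this in a Java-style decoder. This is a sketch only. DecoderState and OutWindowStub are hypothetical stand-ins, not the Java SDK's classes, and the window stub only records the size it was asked to allocate.

```java
// Hypothetical stand-in for the output window: records the requested size only.
class OutWindowStub {
    int windowSize = -1;
    void create(int size) { windowSize = size; }
}

// Hypothetical decoder state showing the proposed one-line fix.
class DecoderState {
    static final int DICT_MIN = 1 << 12;

    int dictionarySize = -1; // -1 means "not set yet"
    int dictionarySizeCheck;
    final OutWindowStub outWindow = new OutWindowStub();

    boolean setDictionarySize(int dictionarySize) {
        if (dictionarySize < 0) return false; // sizes of 2^31 and above do not fit in a Java int
        if (this.dictionarySize != dictionarySize) {
            this.dictionarySize = dictionarySize;
            // Proposed fix: clamp the check value to 1 << 12 instead of 1,
            // so it matches the window size that is actually allocated.
            dictionarySizeCheck = Math.max(dictionarySize, DICT_MIN);
            outWindow.create(dictionarySizeCheck);
        }
        return true;
    }
}
```

Because the check value and the window size now come from the same clamped number, the rep0 check can no longer disagree with the window.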

Discussion

    Igor Pavlov - 2024-08-12

    The C# and Java LZMA code is old (2005-2008) and has not been updated since then.
    The (1 << 12) limit was chosen later for the C code (as a speed optimization) and was then added to the LZMA specification.
    I'm not sure I want to change the old C# and Java code now.

     
    Pavel Djundik - 2024-08-12

    The window is created correctly with the limit in mind; it's just that, judging by the code, the rep0 check can fail.

    Is it possible for it to fail in practice?

     
    Igor Pavlov - 2024-08-12

    If the encoder writes a correct value to the properties, then all decoders (C, C#, Java) should behave the same way on non-corrupted streams (I believe).

     
    Pavel Djundik - 2024-08-12

    If the dictionary size in the compressed stream is always at least 1 << 12, then yes.

     
    Igor Pavlov - 2024-08-12

    A smaller dictionary is not a problem either.
    All decoders will unpack correct streams.
    The difference shows up only for corrupted streams.
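Igor's point can be checked by brute force under one assumption: an encoder that writes a correct dictionary size never emits a distance larger than that size, so rep0 < dictSize. The hypothetical sketch below compares the unclamped and clamped check values over small inputs. It counts how often they disagree and how many of those disagreements involve a distance such an encoder could produce. It reuses the same simplified acceptance rule as a rep0 check model (zero-based rep0, reject when rep0 reaches nowPos or the check value).

```java
// Hypothetical brute-force comparison of the unclamped and clamped check values.
class CheckAgreement {
    static final long DICT_MIN = 1L << 12;

    static boolean accepts(long rep0, long nowPos, long check) {
        return rep0 < nowPos && rep0 < check;
    }

    // Returns {total disagreements, disagreements with rep0 < dictSize}.
    static long[] compare(long maxDict, long nowPos) {
        long disagreements = 0, conformingDisagreements = 0;
        for (long dict = 1; dict <= maxDict; dict++) {
            for (long rep0 = 0; rep0 < nowPos; rep0++) {
                boolean unclamped = accepts(rep0, nowPos, Math.max(dict, 1));
                boolean clamped = accepts(rep0, nowPos, Math.max(dict, DICT_MIN));
                if (unclamped != clamped) {
                    disagreements++;
                    if (rep0 < dict) conformingDisagreements++; // distance within the stated size
                }
            }
        }
        return new long[] {disagreements, conformingDisagreements};
    }

    public static void main(String[] args) {
        long[] r = compare(64, 8192);
        System.out.println(r[0] + " " + r[1]); // prints "260064 0"
    }
}
```

The two checks disagree only for dictSize <= rep0 < 4096. A stream that respects its stated dictionary size never contains such a distance, so in this model only corrupted (or non-conforming) streams are treated differently.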

     
