
LZMA1+BCJ made by 7-Zip

ggcue
2011-12-05
2013-05-30
  • ggcue
    2011-12-05

    Hi,

    While writing a 7-Zip reader for libarchive, I found that liblzma couldn't extract all the data
    from LZMA1+BCJ streams made by 7-Zip: liblzma didn't return the last three or four bytes of the data,
    nor did it return LZMA_STREAM_END. I looked into it and soon understood that liblzma needs either
    an EOPM (end-of-payload marker) or the uncompressed size to detect the end of the data and return
    LZMA_STREAM_END; moreover, if I understand correctly, LZMA_FILTER_X86 can't process fewer than five
    bytes of data. It seems the data 7-Zip produces has no EOPM, and there is no way to tell liblzma
    the uncompressed size when using lzma_raw_decoder(). I think that is why liblzma couldn't
    extract all the data from LZMA1+BCJ streams made by 7-Zip.
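To show why an x86 branch filter cannot finalize the last few bytes on its own, here is a simplified C sketch of E8/E9 (CALL/JMP) address conversion. It is an illustration under stated assumptions, not the BCJ code from liblzma or the LZMA SDK: the real x86 filter also tracks recent opcode bytes and only converts addresses whose high byte is 0x00 or 0xFF. The function `simple_x86_convert` and its helpers are hypothetical. What it does show is that an opcode byte needs four more bytes after it, so the filter reports how many leading bytes are final and leaves up to four trailing bytes unprocessed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value. */
static uint32_t read32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit little-endian value. */
static void write32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Simplified x86 branch converter (hypothetical, NOT the real BCJ filter).
 * encoding != 0: relative CALL/JMP targets -> absolute (compressor side).
 * encoding == 0: absolute -> relative (decompressor side).
 * start_pos is the offset of buf[0] within the whole uncompressed data.
 * Returns how many leading bytes of buf are final. The remaining bytes
 * (at most 4) might be the start of an E8/E9 instruction whose operand
 * has not arrived yet, so they must wait for more data or end of stream.
 */
static size_t simple_x86_convert(uint8_t *buf, size_t size,
                                 uint32_t start_pos, int encoding)
{
    size_t i = 0;

    assert(buf != NULL || size == 0);
    if (size < 5)
        return 0; /* not even one opcode plus a 32-bit operand */

    while (i <= size - 5) {
        if (buf[i] == 0xE8 || buf[i] == 0xE9) {
            /* The CPU adds the offset to the address of the next instruction. */
            uint32_t next_ip = start_pos + (uint32_t)i + 5;
            uint32_t v = read32le(buf + i + 1);
            write32le(buf + i + 1, encoding ? v + next_ip : v - next_ip);
            i += 5; /* skip the operand we just converted */
        } else {
            i++;
        }
    }
    return i;
}
```

Running it on a buffer whose E8 byte falls within the last four bytes returns a count that stops before that byte. Those bytes can only be emitted, unchanged, once the filter knows no more data follows, which is consistent with the missing trailing bytes described above.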

    Is there any way to work around this? Or will I be able, in the future, to tell liblzma the uncompressed size when
    using lzma_raw_decoder()?

     
  • Igor Pavlov
    2011-12-05

    Why would BCJ change anything?
    Try the same without BCJ.

     
  • Lasse Collin
    2011-12-06

    I think there's no way to do it with liblzma right now. The raw decoder API works only with streams that have an end of payload/stream marker. The raw stream APIs don't support much beyond what would be valid inside a .xz file.

    The .lzma file decoder (lzma_alone_decoder) works with files that have a known size in the header and no end marker. It's handled as a special case internally and is not exported to the raw decoder API; maybe it should be.

    If you want full support for .7z files, it might be better to use LZMA SDK or 7-Zip source code. liblzma doesn't include BCJ2 or PPMD code, so unless you get those algorithms from elsewhere, you won't be able to decompress many typical .7z files. At least BCJ2 is pretty common in .7z files.

    ipavlov: In liblzma, the BCJ decoder won't output the last bytes until LZMA1 has told the BCJ decoder that the end of the LZMA1 stream has been reached. With plain LZMA1 one gets the last bytes, but liblzma doesn't support decoding plain BCJ as a separate step, because BCJ doesn't support an end marker and it's not allowed inside .xz either. The raw APIs in liblzma simply are quite limited.
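The chained behaviour described here can be modelled with a small hypothetical C sketch: a filter stage keeps its unprocessed tail in a buffer and passes those bytes through unchanged only when the stage before it reports end of stream. The `tail_stage` type, the `end_of_input` flag, and the stand-in `hold_back_filter` (an identity transform that, like an x86 BCJ decoder, never finalizes its last four bytes) are assumptions for illustration, not liblzma internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LOOKAHEAD 4 /* bytes an x86-style filter may need after an opcode */
#define BUF_SIZE 64

/* Stand-in filter (hypothetical): leaves data unchanged but, like an
   x86 BCJ decoder, never finalizes the last LOOKAHEAD bytes it is given. */
static size_t hold_back_filter(unsigned char *buf, size_t size)
{
    (void)buf; /* identity transform: nothing to rewrite */
    return size > LOOKAHEAD ? size - LOOKAHEAD : 0;
}

struct tail_stage {
    unsigned char buf[BUF_SIZE];
    size_t len; /* bytes buffered but not yet output */
};

/*
 * Feed `in` (the previous stage's output) and copy finalized bytes to
 * `out`. Returns the number of bytes written. When end_of_input is true,
 * i.e. the previous stage has signalled its end, the unprocessed tail is
 * flushed as-is.
 */
static size_t tail_stage_code(struct tail_stage *s, const unsigned char *in,
                              size_t in_size, bool end_of_input,
                              unsigned char *out)
{
    size_t done;

    assert(s->len + in_size <= BUF_SIZE);
    if (in_size > 0) {
        memcpy(s->buf + s->len, in, in_size);
        s->len += in_size;
    }

    done = hold_back_filter(s->buf, s->len);
    if (end_of_input)
        done = s->len; /* nothing more is coming: tail passes through unchanged */

    memcpy(out, s->buf, done);
    memmove(s->buf, s->buf + done, s->len - done);
    s->len -= done;
    return done;
}
```

Calling it again with no new input and `end_of_input` still false returns 0: if the stage in front never signals its end, the last four bytes stay buffered indefinitely, which matches the symptom reported in the first post.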

     
  • ggcue
    2011-12-07

    I think it would be nice if liblzma provided an API through which I could give the uncompressed size to the raw decoder.

    The libarchive project prefers to use liblzma (and also libz and libbz2) as much as possible, so I took the BCJ code from
    the LZMA SDK and use it together with liblzma only when decoding LZMA1+BCJ. libarchive already has PPMd code for its RAR reader,
    so I can use that.

     
  • Lasse Collin
    2011-12-08

    Yes, it would be nice. The raw coding API also shouldn't impose .xz's restrictions on raw streams.

    The simplest way could be to make the LZMA1 decoder notice when the caller has specified LZMA_FINISH. After decoding all the input, it would check whether the decoder is in a state where a valid stream (without an end marker) might end. A similar LZMA_FINISH check could make it possible to allow a plain BCJ decoder in the raw decoder API.

    I will keep this in mind. There are many somewhat small things like this on my to-do queue already.