
#1580 Please add zstd support

Status: open
Owner: nobody
Labels: zstd (3)
Priority: 5
Updated: 2024-09-28
Created: 2022-08-08
Creator: Linda
Private: No

RHEL 9 RPM files now use zstd compression. I cannot view them in 7zFM any more.

Discussion

  • Igor Pavlov

    Igor Pavlov - 2022-08-08

    I am working on zstd support in 7-Zip now,
    so the next version of 7-Zip will probably support it.
    For now, you can try an external plugin with zstd support.

    A question for everyone:
    Which exact 7z method IDs did external plugins use for the zstd method in 7z archives?
    I know of 4F71101 with 5-byte properties in Tino Reichardt's plugin.
    Did anybody use the zip-space ID 4015D?

    If somebody has zip or 7z archives that use the zstd method, please attach small examples here or post a download link, and say which program created them.
    It can help with debugging.

     

    Last edit: Igor Pavlov 2022-08-08
    • Ninimu

      Ninimu - 2022-08-08

      A test file can be found in this comment. It is a zip using Zstd method, created by WinZip.

      Zip-space id of 4015D should be correct because the method ID is 93 (as specified in APPNOTE.TXT - .ZIP File Format Specification).

      Note 1: Apparently PKWARE assigned method ID 20 for Zstd at first, but WinZip uses 93 for Zstd (WinZip is probably the first major tool to implement Zstd in Zip format), so PKWARE later changed the ID to 93 officially. Other open-source libraries and tools, such as libzip and libarchive, also use ID 93.
      Note 2: cielavenir's 7-zip fork has Zstd support in Zip, and supports reading Zstd ZIPs using both the deprecated ID 20 and official ID 93, and writing Zstd ZIPs using ID 93.
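
To illustrate the two IDs discussed above, here is a minimal C sketch that reads the compression method from a ZIP local file header and accepts both 93 and the deprecated 20 when reading. The function names are hypothetical, and the `0x0401xx` mapping from ZIP method to 7z "zip-space" ID is an assumption inferred from this thread's statement that method 93 corresponds to 4015D.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ZIP_METHOD_ZSTD = 93,            /* official ID per APPNOTE.TXT */
    ZIP_METHOD_ZSTD_DEPRECATED = 20  /* earlier PKWARE assignment */
};

/* Returns the compression method stored in a ZIP local file header,
 * or -1 if the buffer is too short or lacks the "PK\3\4" signature.
 * The method is a 16-bit little-endian field at offset 8. */
int zip_local_header_method(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 10)
        return -1;
    if (buf[0] != 0x50 || buf[1] != 0x4B || buf[2] != 0x03 || buf[3] != 0x04)
        return -1;
    return buf[8] | (buf[9] << 8);
}

/* Reading: accept both IDs. Writing would use ZIP_METHOD_ZSTD only. */
int zip_method_is_zstd(int method)
{
    return method == ZIP_METHOD_ZSTD || method == ZIP_METHOD_ZSTD_DEPRECATED;
}

/* Assumed mapping: 7z zip-space ID = 0x0401xx, where xx is the ZIP method. */
uint32_t zip_space_7z_id(uint8_t method)
{
    return 0x040100u | method;
}
```

With this mapping, method 93 (0x5D) gives 0x04015D, which matches the 4015D ID mentioned in the question.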

       
      • Igor Pavlov

        Igor Pavlov - 2022-08-08

        We must decide which exact IDs we need to support for zstd in 7z decoding, and which IDs and properties to use for 7z encoding with zstd.
        We can use 4F71101, 4015D or some other new ID for zstd.
        I don't want to bloat the 7-Zip code with many supported IDs, so the best solution is probably a single supported ID for zstd. The zip code in 7-Zip will use the zstd code internally, so zip-zstd in 7-Zip will work even if 4015D is not defined. 4015D just allows an external DLL to work with the current 7-Zip, which doesn't support zstd internally.

        If we use 4F71101, do we need the 3/5-byte properties of that 4F71101 method?
        https://github.com/mcmilk/7-Zip-zstd/blob/master/DOC/Methods-Extern.md
        Are the 5-byte properties with version and level really useful?
        And are there many 7z archives already created with that 4F71101 ID?
        Is it worth caring about compatibility with those existing 4F71101 archives?

        About zstd in zip: who is responsible for the specification?
        Do they limit the supported zstd features?
        For example, the test zip archive from that thread doesn't use XXH64. So is it allowed to use XXH64 in zip-zstd?
        The attached file contains a zip created by the new 7-Zip. Is it allowed to use such compression settings for zstd in zip?
        WinRAR at least can unpack it.
        And does WinZip's zstd support allow changing any settings for archive creation?
        Also, can PKWARE programs create zip-zstd now?

         

        Last edit: Igor Pavlov 2022-08-08
        • Ninimu

          Ninimu - 2022-08-09

          WinZip 26 can extract your test file (a.zip) without any issues, and also based on the information from this page, I think they don't limit supported Zstd features.

          In WinZip I can choose what compression method to use in Zip (LZMA, XZ, Zstd, etc.), but I cannot choose the compression level, etc.

          I tried PKWARE's PKZIP for Windows 14.50.0010, but it does not support Zstd (shows "unknown compression method" error when extracting).

           
    • Sam Tansy

      Sam Tansy - 2022-08-09

      What exact 7z method ids in external plugins were used for zstd method in 7z archives?
      I know 4F71101 with 5 bytes properties in plugin of Tino Reichardt.

      jinfeihan57/p7zip uses 4F71101, following mcmilk/7-Zip-zstd. If there are any archives, they will have been made by one of the two, which comes down to 7-Zip-zstd, as it is the source of the plugin.

      BTW, what's wrong with that plugin? It's already made and uses mainstream zstd. The only thing, in my opinion, is that these plugins should be standalone rather than embedded. Not everyone needs or wants to use them. In fact, a minority do; most people don't.

       
    • h11p5g

      h11p5g - 2022-10-03

      Hi Igor,

      here are my test files: https://workupload.com/file/TAgu7pjptwp
      They were created on Linux.
      I prefer a single ID: jinfeihan57's, with -mlong support.

       
    • Michael

      Michael - 2022-10-22

      A bit late, but here's another archive sample, created using the zip-rs Rust crate (https://github.com/zip-rs/zip) with compression level 20.

      It's not directly usable as an application, but here's a sample CLI utility that uses it internally (slightly patched since the original is outdated; you'll need to compile it yourself) -> https://github.com/Zapeth/zippy

       

      Last edit: Michael 2022-10-22
  • Tino Reichardt

    Tino Reichardt - 2022-08-11

    Hello Igor,

    thank you for working on Zstandard inclusion into 7-Zip.

    To the Zstandard ZIP/ZIPX thing:
    In summer 2020 I wrote an e-mail to WinZip and PKWARE about the different IDs.

    WinZip's answer was:

    Hi Mr. Reichardt,

    Per our Development Team: zstd is only implemented in our Zipx format which created and that we define. PKware added it to the zip format. Therefore there is no conflict as ID=20 will only occur in Zip files and ID=93 will only occur in zipx files. WinZip will not create a zip file with ID=93 but however if PKWare is creating a Zipx file with ID=20 then you will have to contact pkware about correcting the PKWare implementation of the zipx format.

    But I also think that reading should be implemented for both ID 20 and ID 93, and that Zstd ZIPs should be written using ID 93.

    For 7-Zip ZS with method ID 4F71101 with 5 bytes properties... my version was defined this way:

     Byte _ver_major;   // currently 1
     Byte _ver_minor;   // currently 2
     Byte _level;       // currently 1..22 or 33..MaxFastLevel
     Byte _reserved[2]; // not given in 3 byte header
    

    So a 3 or 5 byte header is required for this .7z method. But you may choose a new ID without that extra header for the official Zstandard support in 7-Zip.
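
As an illustration of the layout described above, a minimal C sketch of reading the 3- or 5-byte properties could look like this. The names `ZstdProps` and `parse_zstd_props` are hypothetical and are not taken from the 7-Zip ZS sources; only the field order (major version, minor version, level, two reserved bytes) comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the fields of the 4F71101 properties. */
typedef struct {
    uint8_t ver_major;  /* zstd major version used by the encoder */
    uint8_t ver_minor;  /* zstd minor version used by the encoder */
    uint8_t level;      /* compression level; not needed for decoding */
} ZstdProps;

/* Parses a 3- or 5-byte properties blob.
 * Returns 0 on success, -1 for any other size.
 * The two reserved bytes of the 5-byte form are ignored. */
int parse_zstd_props(const uint8_t *data, size_t size, ZstdProps *out)
{
    assert(data != NULL && out != NULL);
    if (size != 3 && size != 5)
        return -1;
    out->ver_major = data[0];
    out->ver_minor = data[1];
    out->level     = data[2];
    return 0;
}
```

A decoder that also wanted to accept archives without properties would have to treat `size == 0` as valid, which this sketch deliberately does not do.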

    I will abandon my 7-Zip ZS fork, but it would be nice if you could also implement extraction for method 4F71101 with the 3- or 5-byte header.

    PS: You can reuse my 7-Zip ZS code under a public domain license if you want.

     
    • Igor Pavlov

      Igor Pavlov - 2022-08-11

      Why did you require the 3/5-byte properties in the decoder if the decoder doesn't use them?
      Why don't you allow archives without properties in the decoder now?
      Is it for compatibility with old zstd?
      But maybe it would have been simpler to use a new ID when zstd changed its format and signature from 0xFD2FB527 to 0xFD2FB528.
      That way we could still decode any stream without properties.

      Also, I think I may change the default zstd presets in some cases, for example if the user changes the dictionary size in the GUI.
      So the level property would not be very informative in those cases.

      So I'm still leaning toward a no-properties ID.

      Also there are some implementation related things:
      I think I can use ZSTD_CCtx_setPledgedSrcSize() for the zstd and zip formats, and ZSTD_c_srcSizeHint for 7z format compression.
      PledgedSrcSize can write the uncompressed size to the zstd header, which is the only way to detect the uncompressed size without decoding the archive.
      The idea of using PledgedSrcSize in archives is to be more byte-for-byte compatible with the original zstd program, which uses PledgedSrcSize.
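
To show what "detecting the uncompressed size without decoding" means in practice, here is a minimal C sketch that reads the optional Frame_Content_Size field from a zstd frame header, following the layout in RFC 8878. It does not use libzstd; the names `FrameInfo` and `zstd_read_frame_info` are hypothetical, and skippable frames and legacy formats are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ZSTD_MAGIC 0xFD2FB528u  /* stored little-endian in the stream */

typedef struct {
    int      has_content_size;  /* 1 if Frame_Content_Size is present */
    uint64_t content_size;      /* valid only if has_content_size is 1 */
    int      has_checksum;      /* 1 if a 32-bit XXH64-derived checksum ends the frame */
} FrameInfo;

/* Reads an n-byte little-endian unsigned integer (n <= 8). */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if buf does not hold a complete zstd frame header. */
int zstd_read_frame_info(const uint8_t *buf, size_t len, FrameInfo *out)
{
    static const size_t did_sizes[4] = {0, 1, 2, 4};
    static const size_t fcs_sizes[4] = {0, 2, 4, 8};

    assert(buf != NULL && out != NULL);
    if (len < 5 || read_le(buf, 4) != ZSTD_MAGIC)
        return -1;

    uint8_t  fhd            = buf[4];         /* Frame_Header_Descriptor */
    unsigned fcs_flag       = fhd >> 6;       /* bits 7-6 */
    unsigned single_segment = (fhd >> 5) & 1; /* bit 5 */
    unsigned did_flag       = fhd & 3;        /* bits 1-0 */
    if (fhd & 0x08)
        return -1;                            /* reserved bit must be zero */

    size_t fcs_size = fcs_sizes[fcs_flag];
    if (fcs_flag == 0 && single_segment)
        fcs_size = 1;                         /* 1-byte size in single-segment mode */

    size_t pos = 5;
    if (!single_segment)
        pos += 1;                             /* skip Window_Descriptor */
    pos += did_sizes[did_flag];               /* skip Dictionary_ID */
    if (len < pos + fcs_size)
        return -1;

    out->has_checksum     = (fhd >> 2) & 1;
    out->has_content_size = fcs_size != 0;
    out->content_size     = read_le(buf + pos, fcs_size);
    if (fcs_size == 2)
        out->content_size += 256;             /* 2-byte form is offset by 256 */
    return 0;
}
```

A frame written without a pledged size simply has no Frame_Content_Size field, so `has_content_size` comes back as 0 and the size is only known after decoding.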

      An archive written with PledgedSrcSize but without the content-size field and one written with srcSizeHint give almost the same result in many cases, except for some corner cases with 1 KiB, 2 KiB, 4 KiB, ... files, where the srcSizeHint version writes an additional empty 3-byte block at the end.

      A possible problem is that when we provide PledgedSrcSize or srcSizeHint for multiple files, zstd may need to reallocate its structures for each new stream. The same reallocation problem probably applies to decoding multiple streams. I hope it will not hurt performance.

       

      Last edit: Igor Pavlov 2022-08-11
      • Yann Collet

        Yann Collet - 2022-08-15

        Hi Igor,
        I'm the main author of Zstandard,
        and will try to help here for questions related to libzstd .

        • level : this information is not needed by the decoder. Whatever compression level is employed, the decoder will interpret the produced data correctly. This is similar in design to most LZ compressors available.

        • zstd's signature (also called magic number) : The Zstandard format is officially stable since v1.0, released in September 2016. Technically, even older v0.8.x versions are compatible, but we don't underline it, for simplicity. This stability is further entrenched by the publication of an IETF RFC : https://datatracker.ietf.org/doc/html/rfc8878 .
          In this document, the signature is clearly stated as 0xFD2FB528:

        Magic_Number:  4 bytes, little-endian format.  Value: 0xFD2FB528.
        

        That's the only signature I would suggest to support. All other numbers were short-lived development versions, which lasted typically a couple of months each. At this point in time, I would suggest to not bother supporting them. At this moment, libzstd, when compiled with its default build flags, supports decompressing v0.5, v0.6 and v0.7 development formats, but for the record, we plan on removing these last traces in a future v1.6.0 version of libzstd, leaving only the official v1.0. All versions after v1.0 support the same format, and are therefore both backward and forward compatible, for both write and read.

        • Note on internal structures within zstd compression state : you seem to have already a pretty good understanding of how this works. Yes, PledgedSrcSize and srcSizeHint will have a direct impact on how much resources are needed to compress next input. Yes, if the current amount of memory allocated for zstd compression state is not large enough, it will be resized. This is technically some work, and therefore can slow down processing. However, this effect is so small that it only matters for very small files. Moreover, while zstd reallocates whenever it needs more memory, it is more conservative when less memory is needed, and will try to re-use already allocated memory, thus avoiding another free/malloc round-trip, and generally escaping even initialization stage, which ends up saving a significant amount of work when repetitively compressing small data. zstd will only size down its memory usage once it's convinced that it's really reserving too much memory for too long, i.e. there was a one-off use case employing a lot of memory for one large file, but then, all other jobs are about small files and require less memory : then it will size down.
          I suspect that none of these effects will have a tangible impact in an archiver use case, though maybe there are limit scenarios that could benefit from it. In data bases or serializer scenarios though, it happens all the time, and that's where these optimizations make sense.
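
As a small illustration of the signature point above, here is a C sketch that reads the first four bytes of a stream as a little-endian value and compares it with the official magic number. The enum and function names are hypothetical; the value 0xFD2FB527 is included only because it is the earlier signature mentioned in the question, and a reader following the advice above would simply reject it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    MAGIC_UNKNOWN      = 0,
    MAGIC_ZSTD_V1      = 1,  /* 0xFD2FB528, the stable format of RFC 8878 */
    MAGIC_ZSTD_OLD_DEV = 2   /* 0xFD2FB527, a pre-1.0 development signature */
} MagicKind;

/* Classifies the first 4 bytes of buf, interpreted as little-endian. */
MagicKind classify_magic(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return MAGIC_UNKNOWN;

    uint32_t magic = (uint32_t)buf[0]
                   | (uint32_t)buf[1] << 8
                   | (uint32_t)buf[2] << 16
                   | (uint32_t)buf[3] << 24;

    if (magic == 0xFD2FB528u)
        return MAGIC_ZSTD_V1;
    if (magic == 0xFD2FB527u)
        return MAGIC_ZSTD_OLD_DEV;
    return MAGIC_UNKNOWN;
}
```

Because the value is little-endian, a v1 stream starts with the bytes 28 B5 2F FD on disk.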
         

        Last edit: Yann Collet 2022-08-15
      • Jeff Han

        Jeff Han - 2022-08-23

        _level is not used in decode.
        But I think _ver_major and _ver_minor are needed. The version can determine whether decompression is supported.

         
  • Anonymous

    Anonymous - 2023-02-04

    Hi Igor. I don't know if it's the same thing, but I'd like to ask you to add Tar Zstandard (tar.zst, tzst) for decompression. It's widely used now for Linux and cross-platform software (manuals, sources, packages), so I have to use the Windows version of bsdtar (libarchive) to unpack them. So I think that if you've started to work with Zstandard, maybe you could add this format too.
    Best Regards.

     
  • csm10495

    csm10495 - 2023-07-04

    Any updates on this front? I just successfully used: https://github.com/mcmilk/7-Zip-zstd but it would be nice if this was integrated into regular 7-zip.

     
  • Igor Pavlov

    Igor Pavlov - 2023-07-04

    I'll add a zstd extraction feature (.zst files) in one of the next versions of 7-Zip.
    Compressing to zstd is a more complicated thing. There are many problems to solve with that zstd code, including compiler compatibility and other problems.

     

    Last edit: Igor Pavlov 2023-07-04
  • Uri

    Uri - 2023-10-05

    Do you plan to add multimedia compression (e.g. jpeg, wav and mp3)? I saw that there are the open-source wavpack and packJPG. I have no idea about mp3. Maybe I am wrong, but I believe that WinZip uses open-source compression methods for its zipx format. Thank you in advance.

     
  • Igor Pavlov

    Igor Pavlov - 2023-10-05

    I have no plans for jpeg and mp3 codecs now.
    Uncompressed wav is supported by the Delta filter now.
    The Delta filter doesn't provide the best compression ratio, but it's still better than compressing without it.

     

    Last edit: Igor Pavlov 2023-10-05
  • soldatovaua

    soldatovaua - 2023-10-07

    Sorry to bother you.
    When is a 7-Zip version with zstd support planned for release?
    Thank you!

     
  • JK

    JK - 2023-12-28

    7zip on Linux got some cool features. Would be nice to have zstd available and supported, so I can get rid of tar :)

     
  • J7N

    J7N - 2024-04-23

    Maybe 7-Zip could try to apply a delta filter on all data, with a switch. Compute something like a "histogram" and check if the distribution of values became more favorable after delta, with small values around 0. Also a simple 16-bit delta would further improve compression.
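
The heuristic suggested above could be sketched roughly as follows in C: apply a byte-wise delta with a given distance, then compare how many values fall close to zero before and after. This is only an illustration of the idea, not how 7-Zip's Delta filter is selected; the function names and the `radius` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise delta: each output byte is the input byte minus the byte
 * `dist` positions earlier (the first `dist` bytes are taken relative to 0). */
void delta_encode(const uint8_t *in, uint8_t *out, size_t n, size_t dist)
{
    assert(in != NULL && out != NULL && dist > 0);
    for (size_t i = 0; i < n; i++) {
        uint8_t prev = (i >= dist) ? in[i - dist] : 0;
        out[i] = (uint8_t)(in[i] - prev);
    }
}

/* Counts bytes whose value, read as a signed quantity, lies within
 * `radius` of zero (0..radius, or 256-radius..255 for negatives). */
size_t count_near_zero(const uint8_t *buf, size_t n, unsigned radius)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned v = buf[i];
        if (v <= radius || v >= 256u - radius)
            count++;
    }
    return count;
}

/* Returns 1 if delta moves more bytes close to zero than the raw data has.
 * `scratch` must have room for n bytes. */
int delta_looks_helpful(const uint8_t *in, uint8_t *scratch, size_t n,
                        size_t dist, unsigned radius)
{
    delta_encode(in, scratch, n, dist);
    return count_near_zero(scratch, n, radius) > count_near_zero(in, n, radius);
}
```

A real implementation would probably sample blocks rather than scan everything, and would try several distances (1, 2, 4) before deciding.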

    Plain WAV files are uncommon in archives. But sound and graphics data occurs within games, part of bigger files. Old games still exist; sometimes DDS is uncompressed even today. Competing archivers can generally handle this without knowing anything about the format. Sometimes this hurts overall compression though if the input contains repetitions, so there is always a checkbox for multimedia.

    Delta compression still survives in WinRAR even after they declared multimedia obsolete, and works fast.

    I'm pleased to see that the speed of "delta:4 Copy" has improved between v17 and v23 by around 30%.

     
  • mirh

    mirh - 2024-09-28

    This landed in version 24.01 and can be closed.

     
