I'm working on zstd support in 7-Zip now.
So the next version of 7-Zip will probably support it.
For now you can try an external plugin with zstd support.
A question to everyone:
Which exact 7z method IDs did external plugins use for the zstd method in 7z archives?
I know of 4F71101 with 5-byte properties in Tino Reichardt's plugin.
Did anybody use the zip-space ID 4015D?
If somebody has zip or 7z archives that use the zstd method, please attach small examples here or post a download link, and say which program created these archives.
It can help with debugging.
Last edit: Igor Pavlov 2022-08-08
A test file can be found in this comment. It is a zip using Zstd method, created by WinZip.
Zip-space id of 4015D should be correct because the method ID is 93 (as specified in APPNOTE.TXT - .ZIP File Format Specification).
Note 1: Apparently PKWARE assigned method ID 20 for Zstd at first, but WinZip uses 93 for Zstd (WinZip is probably the first major tool to implement Zstd in the Zip format), so PKWARE later changed the ID to 93 officially. Other open-source libraries and tools, such as libzip and libarchive, also use ID 93.
Note 2: cielavenir's 7-zip fork has Zstd support in Zip, and supports reading Zstd ZIPs using both the deprecated ID 20 and official ID 93, and writing Zstd ZIPs using ID 93.
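The relationship between ZIP method 93 and the 7z "zip-space" ID 4015D can be sketched as below. This is only an illustration, assuming the convention from 7-Zip's method list that ZIP methods are placed under the 0x0401xx prefix. The function names are hypothetical, and the accept-20-and-93 check simply mirrors the reading behavior described in Note 2.

```python
# Hypothetical helper names; only the numeric values come from the thread.
ZIP_SPACE_PREFIX = 0x040100      # assumed prefix for ZIP methods in the 7z ID space

ZIP_METHOD_ZSTD = 93             # official method ID for Zstandard in ZIP
ZIP_METHOD_ZSTD_DEPRECATED = 20  # ID assigned first, later deprecated


def zip_method_to_7z_id(zip_method: int) -> int:
    """Map a one-byte ZIP method number to a 7z zip-space method ID."""
    if not 0 <= zip_method <= 0xFF:
        raise ValueError("zip-space IDs only cover one-byte method numbers")
    return ZIP_SPACE_PREFIX | zip_method


def is_zstd_zip_method(zip_method: int) -> bool:
    """Accept both the official and the deprecated ID when reading."""
    return zip_method in (ZIP_METHOD_ZSTD, ZIP_METHOD_ZSTD_DEPRECATED)


print(format(zip_method_to_7z_id(ZIP_METHOD_ZSTD), "X"))  # prints 4015D
```

A writer following the same policy would always emit `ZIP_METHOD_ZSTD` (93) and never the deprecated value.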
We must decide exactly which IDs we need to support for zstd in 7z decoding, and which IDs and properties to use for 7z encoding with zstd.
We can use 4F71101, 4015D or some other new ID for zstd.
I don't want to bloat the 7-Zip code with many supported IDs, so the best solution is probably a single supported ID for zstd. The zip code in 7-Zip will use the zstd code internally, so zip-zstd in 7-Zip will work even if 4015D is not defined. 4015D just allows an external DLL to work with the current 7-Zip, which doesn't support zstd internally.
If we use 4F71101, do we need the 3/5-byte properties of that 4F71101 method? https://github.com/mcmilk/7-Zip-zstd/blob/master/DOC/Methods-Extern.md
Are the 5-byte properties with version and level really useful?
And is there a large number of 7z archives already created with that 4F71101 ID?
Is it worth caring about compatibility with these existing 4F71101 archives?
About zstd in zip: who is responsible for the specification?
Do they limit the supported zstd features?
For example, the test zip archive from that thread doesn't use XXH64. So is it allowed to use XXH64 in zip-zstd?
The attached file contains a zip created by the new 7-Zip. Is it allowed to use such compression settings for zstd in zip?
WinRAR at least can unpack it.
And does WinZip's zstd support allow changing any settings for archive creation?
Also, can PKWARE programs create zip-zstd now?
Last edit: Igor Pavlov 2022-08-08
WinZip 26 can extract your test file (a.zip) without any issues, and also based on the information from this page, I think they don't limit supported Zstd features.
In WinZip I can choose what compression method to use in Zip (LZMA, XZ, Zstd, etc.), but I cannot choose the compression level, etc.
I tried PKWARE's PKZIP for Windows 14.50.0010, but it does not support Zstd (shows "unknown compression method" error when extracting).
"What exact 7z method ids in external plugins were used for zstd method in 7z archives?
I know 4F71101 with 5 bytes properties in plugin of Tino Reichardt."
jinfeihan57/p7zip uses 4F71101, following mcmilk/7-Zip-zstd. If there are any archives, they were made by one of the two, which comes down to 7-Zip-zstd, as it's the source of the plugin.
BTW, what's wrong with that plugin? It's already made and uses mainstream zstd. The only thing, in my opinion, is that these plugins should be standalone rather than embedded. Not everyone needs or wants to use them. In fact a minority do; most people don't.
Hello Igor,
thank you for working on Zstandard inclusion into 7-Zip.
Regarding the Zstandard ZIP/ZIPX issue:
In summer 2020 I wrote an e-mail to WinZip and PKWARE because of the different IDs.
WinZip's answer was:
Hi Mr. Reichardt,
Per our Development Team: zstd is only implemented in our Zipx format which created and that we define. PKware added it to the zip format. Therefore there is no conflict as ID=20 will only occur in Zip files and ID=93 will only occur in zipx files. WinZip will not create a zip file with ID=93 but however if PKWare is creating a Zipx file with ID=20 then you will have to contact pkware about correcting the PKWare implementation of the zipx format.
But I also think that reading should be implemented for both ID 20 and ID 93, and that Zstd ZIPs should be written using ID 93.
For 7-Zip ZS, with method ID 4F71101 and 5-byte properties, my version was defined this way:
    Byte _ver_major;   // currently 1
    Byte _ver_minor;   // currently 2
    Byte _level;       // currently 1..22 or 33..MaxFastLevel
    Byte _reserved[2]; // not given in 3 byte header
So a 3- or 5-byte header is required for this .7z method. But you may choose a new ID without that extra header for the official Zstandard support in 7-Zip.
I will abandon my 7-Zip ZS fork, but it would be nice if you could also implement extraction for method 4F71101 with the 3- or 5-byte header.
PS: You can reuse my 7-Zip ZS code under a public domain license if you want.
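The 3- or 5-byte property layout described in this post could be read with something like the following. It is a minimal sketch, assuming the field order given above (major version, minor version, level, then two reserved bytes that are absent in the 3-byte form). The `ZstdProps` type and `parse_zstd_props` function are hypothetical names, and no attempt is made to interpret the level value.

```python
from typing import NamedTuple, Optional


class ZstdProps(NamedTuple):
    """Hypothetical container for the 4F71101 method properties."""
    ver_major: int
    ver_minor: int
    level: int
    reserved: Optional[bytes]  # None when the 3-byte form is used


def parse_zstd_props(props: bytes) -> ZstdProps:
    """Parse the 3- or 5-byte property blob; reject any other length."""
    if len(props) not in (3, 5):
        raise ValueError(f"expected 3 or 5 property bytes, got {len(props)}")
    reserved = bytes(props[3:5]) if len(props) == 5 else None
    return ZstdProps(props[0], props[1], props[2], reserved)


print(parse_zstd_props(bytes([1, 2, 3])))
```

Since the decoder does not need the level, a reader could ignore everything except the length check.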
Why did you require the 3/5-byte properties in the decoder if the decoder doesn't use them?
Why don't you allow no-properties archives in the decoder now?
Is it for compatibility with old zstd?
But maybe it would have been simpler to use a new ID when zstd changed its format and its signature from 0xFD2FB527 to 0xFD2FB528.
That way we could still decode any stream without properties.
Also, I think I may change the default zstd presets in some cases, for example if the user changes the dictionary size in the GUI.
So the level property will not be very informative in these cases.
So I'm still thinking about a no-properties ID.
There are also some implementation-related things:
I think I can use ZSTD_CCtx_setPledgedSrcSize() for the zstd and zip formats, and ZSTD_c_srcSizeHint for 7z format compression. PledgedSrcSize can write the uncompressed size to the zstd header, which is the only way to detect the uncompressed size without decoding the archive.
The idea of using PledgedSrcSize in an archive is to be more byte-to-byte compatible with the original zstd program, which uses PledgedSrcSize.
An archive with PledgedSrcSize but without the content-size field and one with srcSizeHint also give almost the same result in many cases, except for some corner cases with 1 KiB, 2 KiB, 4 KiB, ... files, where the srcSizeHint version writes an additional empty block of 3 bytes at the end.
A possible problem is that when we provide PledgedSrcSize or srcSizeHint for multiple files, zstd may need to reallocate its structures for each new stream. The same reallocation problem probably applies to decoding multiple streams. I hope it will not hurt performance.
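To illustrate how the uncompressed size can be detected from the header without decoding, here is a small sketch that reads the optional Frame_Content_Size field of a Zstandard frame. It follows the frame header layout in RFC 8878 as I understand it and does not use libzstd. The function name is hypothetical, and the example headers are hand-built rather than taken from a real archive.

```python
import struct
from typing import Optional

ZSTD_MAGIC = 0xFD2FB528  # stored little-endian at the start of a frame


def read_frame_content_size(data: bytes) -> Optional[int]:
    """Return the size declared in the first frame header, or None if absent."""
    if len(data) < 5 or struct.unpack_from("<I", data, 0)[0] != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")
    descriptor = data[4]
    fcs_flag = descriptor >> 6               # Frame_Content_Size_flag
    single_segment = (descriptor >> 5) & 1   # Single_Segment_flag
    dict_id_flag = descriptor & 3            # Dictionary_ID_flag

    pos = 5
    if not single_segment:
        pos += 1                             # skip Window_Descriptor
    pos += (0, 1, 2, 4)[dict_id_flag]        # skip Dictionary_ID

    if fcs_flag == 0:
        fcs_size = 1 if single_segment else 0
    else:
        fcs_size = 1 << fcs_flag             # 2, 4 or 8 bytes
    if fcs_size == 0:
        return None                          # size was not written
    if len(data) < pos + fcs_size:
        raise ValueError("truncated frame header")
    value = int.from_bytes(data[pos:pos + fcs_size], "little")
    if fcs_size == 2:
        value += 256                         # 2-byte form is stored with an offset
    return value


magic = bytes.fromhex("28b52ffd")
print(read_frame_content_size(magic + bytes([0x20, 5])))     # prints 5
print(read_frame_content_size(magic + bytes([0x00, 0x58])))  # prints None
```

A frame written with only a size hint would typically look like the second case: the header is valid, but no content size is stored.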
Last edit: Igor Pavlov 2022-08-11
Hi Igor,
I'm the main author of Zstandard,
and will try to help here with questions related to libzstd.
level : this information is not needed by the decoder. Whatever compression level is employed, the decoder will interpret the produced data correctly. This is similar in design to most LZ compressors available.
zstd's signature (also called magic number): The Zstandard format has been officially stable since v1.0, released in September 2016. Technically, even older v0.8.x versions are compatible, but we don't emphasize it, for simplicity. This stability is further entrenched by the publication of an IETF RFC: https://datatracker.ietf.org/doc/html/rfc8878
In this document, the signature is clearly stated as 0xFD2FB528.
That's the only signature I would suggest to support. All other numbers were short-lived development versions, which lasted typically a couple of months each. At this point in time, I would suggest to not bother supporting them. At this moment, libzstd, when compiled with its default build flags, supports decompressing v0.5, v0.6 and v0.7 development formats, but for the record, we plan on removing these last traces in a future v1.6.0 version of libzstd, leaving only the official v1.0. All versions after v1.0 support the same format, and are therefore both backward and forward compatible, for both write and read.
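A check for that single signature is short. The sketch below assumes only what RFC 8878 states, namely that the magic number is the 4-byte little-endian value 0xFD2FB528 at the start of a frame. The function name is hypothetical.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528  # the only signature suggested for support


def has_zstd_magic(data: bytes) -> bool:
    """Return True if data starts with the Zstandard v1.0+ magic number."""
    if len(data) < 4:
        return False
    (magic,) = struct.unpack_from("<I", data, 0)  # little-endian 32-bit value
    return magic == ZSTD_MAGIC


print(has_zstd_magic(bytes.fromhex("28b52ffd")))  # prints True
print(has_zstd_magic(bytes.fromhex("27b52ffd")))  # prints False (older 0xFD2FB527)
```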
Note on internal structures within zstd compression state: you seem to already have a pretty good understanding of how this works. Yes, PledgedSrcSize and srcSizeHint will have a direct impact on how many resources are needed to compress the next input. Yes, if the current amount of memory allocated for the zstd compression state is not large enough, it will be resized. This is technically some work, and therefore can slow down processing. However, this effect is so small that it only matters for very small files. Moreover, while zstd reallocates whenever it needs more memory, it is more conservative when less memory is needed, and will try to re-use already allocated memory, thus avoiding another free/malloc round-trip, and generally skipping even the initialization stage, which ends up saving a significant amount of work when repeatedly compressing small data. zstd will only size down its memory usage once it's convinced that it's really reserving too much memory for too long, i.e. there was a one-off use case employing a lot of memory for one large file, but then all other jobs are about small files and require less memory: then it will size down.
I suspect that none of these effects will have a tangible impact in an archiver use case, though maybe there are edge scenarios that could benefit from it. In database or serializer scenarios though, it happens all the time, and that's where these optimizations make sense.
Last edit: Yann Collet 2022-08-15
_level is not used in decoding.
But I think _ver_major and _ver_minor are needed. The version can determine whether decompression is supported.
Anonymous - 2023-02-04
Hi Igor. I don't know if it's the same thing, but I'd like to ask you to add Tar Zstandard (tar.zst, tzst) for decompression. It's widely used now for Linux and cross-platform software (manuals, sources, packages), so I have to use the Windows version of bsdtar (libarchive) to unpack them. So I think, since you've started to work with Zstandard, maybe you could add this format.
Best Regards.
Any updates on this front? I just successfully used: https://github.com/mcmilk/7-Zip-zstd but it would be nice if this was integrated into regular 7-zip.
I'll add a zstd extraction feature (.zst files) in upcoming versions of 7-Zip.
Compressing to zstd is a more complicated thing. There are many problems to solve with that zstd code, including compiler compatibility and other problems.
Last edit: Igor Pavlov 2023-07-04
Do you plan to add multimedia compression (e.g. jpeg, wav and mp3)? I saw that there are open-source WavPack and packJPG. I have no idea about mp3. Maybe I am wrong, but I believe that WinZip uses open-source compression methods for its zipx format. Thank you in advance.
I have no plans for jpeg and mp3 codecs now.
Uncompressed wav is supported by the Delta filter now.
The Delta filter doesn't provide the best compression ratio, but it's still better than compressing without the Delta filter.
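For readers unfamiliar with the idea, a byte-wise delta filter can be sketched as below. This is a generic illustration of the technique and not 7-Zip's actual implementation: each byte is replaced by its difference from the byte `distance` positions earlier, so slowly changing samples turn into small values that a later compressor handles better. A distance of 4 matches one frame of 16-bit stereo PCM.

```python
def delta_encode(data: bytes, distance: int = 1) -> bytes:
    """Replace each byte with its difference from the byte `distance` earlier."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    out = bytearray(data)
    for i in range(distance, len(data)):
        out[i] = (data[i] - data[i - distance]) & 0xFF
    return bytes(out)


def delta_decode(data: bytes, distance: int = 1) -> bytes:
    """Invert delta_encode by accumulating the differences."""
    if distance < 1:
        raise ValueError("distance must be at least 1")
    out = bytearray(data)
    for i in range(distance, len(out)):
        out[i] = (out[i] + out[i - distance]) & 0xFF
    return bytes(out)


sample = bytes([10, 20, 30, 40, 11, 21, 31, 41, 12, 22, 32, 42])
encoded = delta_encode(sample, 4)
print(list(encoded))  # prints [10, 20, 30, 40, 1, 1, 1, 1, 1, 1, 1, 1]
assert delta_decode(encoded, 4) == sample
```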
Last edit: Igor Pavlov 2023-10-05
Maybe 7-Zip could try to apply a delta filter on all data, with a switch. Compute something like a "histogram" and check if the distribution of values became more favorable after delta, with small values around 0. Also a simple 16-bit delta would further improve compression.
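One way to read the histogram suggestion is to compare the byte entropy of the data before and after a delta pass and keep the filter only when the gain is clear. The sketch below is my own interpretation of that idea, not an existing 7-Zip feature. The function names, the candidate distances and the `min_gain` threshold are all assumptions, and the 16-bit delta mentioned above is not covered.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def delta(data: bytes, distance: int) -> bytes:
    """Byte-wise delta with the given distance."""
    out = bytearray(data)
    for i in range(distance, len(data)):
        out[i] = (data[i] - data[i - distance]) & 0xFF
    return bytes(out)


def best_delta_distance(data: bytes,
                        candidates: Sequence[int] = (1, 2, 4),
                        min_gain: float = 0.5) -> Optional[int]:
    """Return the distance with the lowest entropy, or None if delta does not help.

    A candidate must beat the unfiltered data by at least min_gain bits per byte.
    """
    best, best_entropy = None, byte_entropy(data) - min_gain
    for distance in candidates:
        entropy = byte_entropy(delta(data, distance))
        if entropy < best_entropy:
            best, best_entropy = distance, entropy
    return best


ramp = bytes(range(256)) * 4      # every byte value equally frequent
print(best_delta_distance(ramp))  # prints 1
```

The threshold guards against the case raised in the thread, where delta hurts data that a dictionary compressor would otherwise match as repetitions. Entropy alone cannot see such repetitions, so this check is at best a rough first filter.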
Plain WAV files are uncommon in archives. But sound and graphics data occurs within games, part of bigger files. Old games still exist; sometimes DDS is uncompressed even today. Competing archivers can generally handle this without knowing anything about the format. Sometimes this hurts overall compression though if the input contains repetitions, so there is always a checkbox for multimedia.
Delta compression still survives in WinRAR even after they declared multimedia obsolete, and works fast.
I'm pleased to see that the speed of "delta:4 Copy" has improved between v17 and v23 by around 30%.
Hi Igor,
here are my test files: https://workupload.com/file/TAgu7pjptwp
They were created on Linux.
I prefer a single ID, the one used by jinfeihan57, with support for -mlong.
A bit late, but here's another archive sample, created using the zip-rs Rust crate (https://github.com/zip-rs/zip) with compression level 20.
It's not directly usable as an application, but here's a sample CLI utility that uses it internally (slightly patched since the original is outdated; you'll need to compile it yourself): https://github.com/Zapeth/zippy
Last edit: Michael 2022-10-22
Sorry to bother you.
Could you tell me when a 7-Zip version with zstd support is planned for release?
Thank you!
7zip on Linux got some cool features. It would be nice to have zstd available and supported, so I can get rid of tar :)
This landed in version 24.01 and can be closed.