Bug with .bz2 archives created by lbzip2 (a Linux command-line, multi-threaded alternative to the standard bzip2 archiver).
Setup: Ubuntu 14.04.3 LTS 64-bit with lbzip2 2.3-1 (64-bit) for compressing; 7-Zip 9.20 64-bit on Windows 7 64-bit and 7-Zip 15.14 32-bit on Windows 7 32-bit for decompressing.
Compressing and decompressing a small .txt file works fine.
When I compress a big file (for example mirror.yandex.ru/altlinux/old/GIVC/livecd.iso) and then try to decompress it with 7-Zip (several versions), 7-Zip reports that the archive is corrupted. Depending on the file and the 7-Zip version, the error appears at a different point (percentage) of the decompression progress.
At the same time, decompressing this file on Linux with tar or lbunzip2 works fine.
The same file, compressed to .bz2 on the same system with the standard bzip2 archiver, decompresses with 7-Zip without errors.
Please provide an example of such a bz2 file.
You can split the "bad" file into parts and compress each part separately; that way the "bad" bz2 file will probably be smaller.
I experienced the same issue.
A 128 MB tar.bz2 created on RHEL 7 (kernel 3.10.0-693.2.2.el7.x86_64) with:
tar x86_64, epoch 2, version 1.26, release 32.el7
lbzip2 x86_64, version 2.5, release 1.el7
Attempted to test/extract with 7-Zip 64-bit 16.04 on Windows 7 64-bit and with 7-Zip 64-bit 16.02 on RHEL 7.
tar.bz2 created with standard bzip2: test/extract OK
tar.bz2 created with lbzip2: test/extract fails quickly with 'Data error : <filename>.tar'
tar.bz2 created with lbzip2: test/extract OK with tar and lbunzip2 on RHEL 7
tar.bz2 created with lbzip2: test/extract OK with WinRAR 64-bit on Windows 7 64-bit
lbzip2 file
https://www.dropbox.com/s/gm77vafmhxk93ph/lbzip2-R4.x267.000.0003.tar.bz2?dl=0
md5 - 85fc812e5b99d44ba50c93d532f4d278
sha256 - 802c85b9230968cbf71701fef8513dff6aee875aa300993e2d6fbb9df3b962f2
standard bz2 file
https://www.dropbox.com/s/mx9cufpilnvs2k6/nonlbzip2-R4.x267.000.0003.tar.bz2?dl=0
md5 - cc40f6ba61d4c796b2f88cbb30e71dd2
sha256 - 2a41b8d7da89779424ace383ffa46fe29e8c85d0593f32982adb934cd7fa563c
OK.
I've sent a question to the lbzip2 developer.
Technical description of lbzip2/7-Zip compatibility problems:
lbzip2 - Problem 1: The number of selectors
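A bzip2 block needs at most 18001 selectors (900000/50 + 1), because a block holds at most 900000 bytes and each selector covers a group of 50 symbols.
But the "number of selectors" is stored in a 15-bit field, so the stream can announce any value up to 32767.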
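The 18001 figure follows from the block format. Assuming, as in bzip2, a block of at most 900000 bytes, at most one MTF symbol per input byte plus the end-of-block symbol, and one selector per group of 50 symbols, the worst case is:

$$
N_{\text{sel}} \le \left\lceil \frac{900000 + 1}{50} \right\rceil = 18001 \;<\; 2^{15} - 1 = 32767
$$

So the 15-bit field can announce far more selectors than a valid block ever needs, and a decoder with fixed-size selector arrays has to range-check the value it reads.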
How many selectors each decoder accepts:
bzip2 1.0.6: the decoder doesn't check the exact number of selectors, so it can overflow the selector and selectorMtf arrays. However, other arrays follow the selectorMtf array in the same structure, so the overflowing data is written into them and the bzip2 C decoder still works correctly.
Some Java bzip2 implementations: they allocate only 18002 items in the selector arrays, and these arrays can overflow.
lbzip2: the decoder supports up to 32767 selectors.
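To make the overflow concrete, here is a minimal Python sketch of a selector reader with an explicit capacity check. It assumes the bzip2 layout: a 15-bit count, then each selector as a move-to-front index written in unary (j one-bits followed by a zero bit). The names BitReader, read_selectors and pack_selectors, the capacity default of 18002 and the strict flag are illustrative assumptions, not code from bzip2, lbzip2 or 7-Zip. Strict mode rejects a count above the array size; lenient mode still consumes every selector but keeps only the first capacity of them. That is one possible way for a decoder to accept such files without writing past its arrays, and it is only harmless if the dropped selectors are never needed, which is presumably the case for padding selectors.

```python
from __future__ import annotations


class BitReader:
    """Reads bits most-significant-bit first, the bit order bzip2 uses."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit offset into data

    def bit(self) -> int:
        byte = self.data[self.pos >> 3]  # IndexError if the stream is truncated
        value = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return value

    def bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bit()
        return value


def read_selectors(r: BitReader, n_groups: int,
                   capacity: int = 18002, strict: bool = True) -> list[int]:
    """Hypothetical reader: 15-bit count, then unary-coded MTF indices."""
    n_selectors = r.bits(15)
    if strict and n_selectors > capacity:
        raise ValueError(f"{n_selectors} selectors exceed capacity {capacity}")
    order = list(range(n_groups))  # MTF list of Huffman table numbers
    selectors: list[int] = []
    for _ in range(n_selectors):
        j = 0
        while r.bit():  # count one-bits until the terminating zero
            j += 1
            if j >= n_groups:
                raise ValueError("selector MTF index out of range")
        table = order.pop(j)  # undo move-to-front
        order.insert(0, table)
        if len(selectors) < capacity:  # lenient mode drops the extras
            selectors.append(table)
    return selectors


def pack_selectors(selectors: list[int], n_groups: int) -> bytes:
    """Hypothetical encoder used only to build test streams."""
    bits = [int(c) for c in format(len(selectors), "015b")]
    order = list(range(n_groups))
    for table in selectors:
        j = order.index(table)
        bits += [1] * j + [0]
        order.insert(0, order.pop(j))
    bits += [0] * (-len(bits) % 8)  # pad to a whole byte
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


if __name__ == "__main__":
    stream = pack_selectors([0, 1, 1, 0, 2], 3)
    print(read_selectors(BitReader(stream), 3))  # [0, 1, 1, 0, 2]
```

A stream that announces more selectors than capacity raises ValueError in strict mode and is cut down to capacity entries in lenient mode.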
The lbzip2 encoder can write up to 18001 + 7 selectors.
It can add up to 7 dummy selectors to make the block size a multiple of 8 bits; these additional dummy selectors can improve speed.
So I suggested that lbzip2 reduce the maximum block size by the amount covered by 7 selectors, i.e. allow at most 18002 - 7 real selectors. That is a reduction of 350 bytes (7 × 50). Then a block will never have more than 18002 selectors, even with the additional dummy selectors.
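The dummy-selector arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: the worst case of one MTF symbol per byte plus end-of-block, a nominal 900000-byte block, and (marked ASSUMPTION in the code) that each dummy selector costs exactly one bit, which would explain why at most 7 are needed for byte alignment. The helper names are hypothetical.

```python
G_SIZE = 50              # symbols covered by one selector (bzip2 format)
MAX_BLOCK = 900_000      # nominal largest bzip2 block, in bytes
ARRAY_CAPACITY = 18_002  # selector array size in bzip2 1.0.6 and some Java ports
MAX_DUMMIES = 7          # dummy selectors lbzip2 may append


def max_real_selectors(block_bytes: int) -> int:
    """Worst case: one MTF symbol per input byte plus the end-of-block symbol."""
    return -(-(block_bytes + 1) // G_SIZE)  # ceiling division


def dummy_selectors_for_alignment(bit_length: int) -> int:
    """Dummies needed to pad a block to a whole number of bytes.

    ASSUMPTION: each dummy is written as MTF index 0, i.e. a single 0 bit,
    so k dummies add exactly k bits. Not taken from lbzip2's source.
    """
    return -bit_length % 8


worst_today = max_real_selectors(MAX_BLOCK) + MAX_DUMMIES
proposed_block = MAX_BLOCK - MAX_DUMMIES * G_SIZE  # 350 bytes smaller
worst_proposed = max_real_selectors(proposed_block) + MAX_DUMMIES

print(worst_today, worst_proposed)  # 18008 18001
```

With the full block the worst case exceeds an 18002-entry array by 6; with the 350-byte reduction it fits.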
lbzip2 - Problem 2: Dummy Huffman tree
lbzip2 uses a dummy Huffman tree.
But all code length values in these tables are equal to 20 (MAX_CODE_LENGTH),
and such lengths don't cover the whole range of Huffman bit codes.
7-Zip checks this when building the Huffman tree, and reports a data error.
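A common way to test whether Huffman code lengths cover the whole code space is the Kraft sum: a prefix code is complete exactly when the sum of 2^-length over all symbols equals 1. The sketch below assumes 7-Zip's check is equivalent to this test and uses hypothetical helper names; the 258-symbol table (the largest bzip2 alphabet) is an illustrative assumption about the dummy table's size. It shows that setting every length to 20 fills only a tiny fraction of the code space.

```python
from __future__ import annotations

MAX_CODE_LENGTH = 20  # longest Huffman code bzip2 allows


def kraft_sum_scaled(lengths: list[int]) -> int:
    """Kraft sum scaled by 2**MAX_CODE_LENGTH so it stays an integer."""
    for length in lengths:
        if not 1 <= length <= MAX_CODE_LENGTH:
            raise ValueError(f"code length {length} out of range")
    return sum(1 << (MAX_CODE_LENGTH - length) for length in lengths)


def is_complete(lengths: list[int]) -> bool:
    """True when the codes fill the whole code space (Kraft sum exactly 1)."""
    return kraft_sum_scaled(lengths) == 1 << MAX_CODE_LENGTH


dummy_table = [MAX_CODE_LENGTH] * 258  # every length 20, 258-symbol alphabet
print(kraft_sum_scaled(dummy_table), 1 << MAX_CODE_LENGTH)  # 258 1048576
print(is_complete(dummy_table))  # False
print(is_complete([1, 2, 2]))    # True
```

An oversubscribed table (Kraft sum above 1) also fails the check, since it cannot be a valid prefix code at all.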
Probably the lbzip2 encoder can be changed to write some "good" lengths that form a full-range tree.
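As one possible choice of "good" lengths (an assumption, not what lbzip2 actually does), a dummy table can use a nearly balanced tree, which is always complete and stays far below the 20-bit limit. The function name complete_lengths is hypothetical.

```python
from __future__ import annotations


def complete_lengths(n: int) -> list[int]:
    """Code lengths for n symbols that form a complete prefix code.

    Start from a balanced tree with 2**k leaves (k = ceil(log2 n)) and merge
    (2**k - n) pairs of sibling leaves into their parent, so exactly n
    leaves remain and the Kraft sum stays exactly 1.
    """
    if not 2 <= n <= 258:  # bzip2 alphabets have at most 258 symbols
        raise ValueError("alphabet size out of range")
    k = (n - 1).bit_length()  # smallest k with 2**k >= n
    short = (1 << k) - n      # symbols that get a code one bit shorter
    return [k - 1] * short + [k] * (n - short)


lengths = complete_lengths(258)
print(lengths.count(8), lengths.count(9))  # 254 4
```

For the largest alphabet of 258 symbols this gives 254 codes of 8 bits and 4 codes of 9 bits; which symbols get the shorter codes does not matter for a table that is never used.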
7-Zip/lbzip2
I can fix both problems in the 7-Zip decoder so that 7-Zip will be able to unpack such bz2 archives, but I'm not sure that I want to do it.
I think the lbzip2 encoder must also be fixed, at least for problem 1.
Thanks for the quick response; I will use pbzip2 or pigz instead.
p7zip can also do multi-threaded bzip2 compression.
p7zip can also do multi-threaded xz compression.
Not related to the bug, but to your comment:
To make a tarball (tar.bz2), you can use GNU tar or bsdtar with pbzip2 or lbzip2 to create it directly with multi-threaded compression.
As far as I can tell, p7zip does not make tarballs. You can also use GNU tar with p7zip through an intermediate script that handles the -d/-c arguments, but this is not supported in bsdtar. So if you need to use bsdtar and want to create a tarball with multi-threaded compression, the way to go is pbzip2 or lbzip2. Anyway, I suppose adding support for the -d/-c parameters to p7zip wouldn't be hard.
Umm, you don’t have to do that. You can just use a pipe, e.g.
Or in reverse
Last edit: Ruarí Ødegaard 2018-04-15