"ERROR: Data Error". cpio archive is interpreted as cab archive. regression...
A free file archiver for extremely high compression
Brought to you by:
ipavlov
7z 23.01 fails to extract a specific cpio archive.
7z 22.01 successfully extracts the same file.
the problem:
7z 23.01 interprets the cpio archive as a cab archive, and extraction fails
downstream issue: https://github.com/timsutton/brigadier/issues/37
to reproduce:
cd $(mktemp -d)
# download *.pkg archive file
# warning: this file is 1.6 GB
# sorry, I don't have a smaller test file
wget http://swcdn.apple.com/content/downloads/62/58/041-98143-A_HN8B941A1T/nknv1gt3xcgylggwc11kl5e0j4296tjfo1/BootCampESD.pkg
sha256sum BootCampESD.pkg
# f72f3d43355321f35432a25432b2c56cd66df34997fa14a0e66830f456e5da7a BootCampESD.pkg
file -i BootCampESD.pkg
# BootCampESD.pkg: application/x-xar; charset=binary
# extract Payload~ from the xar archive BootCampESD.pkg
# this works with both 7z versions
7z e BootCampESD.pkg
file -i Payload~
# Payload~: application/x-cpio; charset=binary
# extract WindowsSupport.dmg from the cpio archive Payload~
# this works with 7z 22.01
# this fails with 7z 23.01
7z e Payload~
output of 7z 23.01 (this fails)
note: Type = Cab
$ 7z e -bd -bb3 Payload~
7-Zip (z) 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
64-bit locale=en_US.UTF-8 Threads:8 OPEN_MAX:1024
Scanning the drive for archives:
1 file, 1613847552 bytes (1540 MiB)
Extracting archive: Payload~
WARNINGS:
There are data after the end of archive
--
Path = Payload~
Type = Cab
WARNINGS:
There are data after the end of archive
Offset = 2177233
Physical Size = 794777
Tail Size = 1610875542
Method = MSZip
Blocks = 1
Volumes = 1
Volume Index = 0
ID = 0
- WSUSSCAN.cab
- Windows6.1-KB2685811-x64.cab
ERROR: Data Error : Windows6.1-KB2685811-x64.cab
- Windows6.1-KB2685811-x64-pkgProperties.txt
ERROR: Data Error : Windows6.1-KB2685811-x64-pkgProperties.txt
- Windows6.1-KB2685811-x64.xml
ERROR: Data Error : Windows6.1-KB2685811-x64.xml
Sub items Errors: 3
Archives with Errors: 1
Warnings: 1
Sub items Errors: 3
output of 7z 22.01 (this works)
note: Type = Cpio
$ 7z e -bd -bb3 Payload~
7-Zip (z) 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
64-bit locale=en_US.UTF-8 Threads:8, ASM
Scanning the drive for archives:
1 file, 1613847552 bytes (1540 MiB)
Extracting archive: Payload~
--
Path = Payload~
Type = Cpio
Physical Size = 1613847552
SubType = Portable ASCII
- ./
- ./Library/
- ./Library/Application Support/
- ./Library/Application Support/BootCamp/
- ./Library/Application Support/BootCamp/WindowsSupport.dmg
Everything is Ok
Folders: 4
Files: 1
Size: 1613846904
Compressed: 1613847552
$ sha256sum WindowsSupport.dmg
7c84ace54a095156e6f57da3ba825baf87b9f67308b39099c65b308a94461066 WindowsSupport.dmg
output of cpio (this works)
$ cpio --extract --verbose <Payload~
.
./Library
./Library/Application Support
./Library/Application Support/BootCamp
./Library/Application Support/BootCamp/WindowsSupport.dmg
3152046 blocks
In that cpio file there is an unusual value, rdev=255, in the rdev field (device major/minor numbers for special files).
But the files in this cpio archive are not special files.
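The two fields in question can be inspected directly. Below is a minimal bash sketch (the helper name odc_fields is hypothetical). It assumes the payload uses the POSIX odc layout, which is what 7z 22.01 reports as "SubType = Portable ASCII": a 76-byte ASCII header that starts with the magic 070707, with mode at byte offset 18 and rdev at byte offset 42, each stored as 6 octal digits.

```bash
# Hypothetical helper: print the mode and rdev fields of the first
# cpio header in a file, assuming the POSIX odc (portable ASCII) layout.
odc_fields() {
  local hdr mode rdev
  hdr=$(head -c 76 -- "$1")
  if [[ ${hdr:0:6} != 070707 ]]; then
    echo "not an odc cpio header" >&2
    return 1
  fi
  mode=${hdr:18:6}   # st_mode, 6 octal digits
  rdev=${hdr:42:6}   # st_rdev, 6 octal digits
  # 8#... makes bash parse the field as octal; 8#170000 masks the file type bits
  printf 'mode=%s (type bits %06o) rdev=%s (decimal %d)\n' \
    "$mode" $(( 8#$mode & 8#170000 )) "$rdev" $(( 8#$rdev ))
}
```

Usage: odc_fields Payload~. The helper reads only the first header, which per the 22.01 listing is presumably the ./ directory entry. Reaching later entries would mean skipping 76 + namesize + filesize bytes per entry.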
So I assumed that rdev must be zero, as in other cpio archives. 7-Zip checks that field because I want to reduce the probability that some garbage data is opened as a cpio file.
A cpio file has no strong signature, so 7-Zip checks values in other header fields to confirm that the file really is a cpio archive.
I can change that code and remove that check in the next version of 7-Zip. Then 7-Zip will open that file.
Do you know why they (Apple?) write rdev=255 in that file? When did they start to use rdev != 0 in cpio archives? Are there other cpio examples with rdev != 0 from Apple or another source?
Last edit: Igor Pavlov 2023-09-30
file -i Payload~ says application/x-cpio, so I would not blame Apple:
https://github.com/file/file/blob/de7d52dce3e7a0bb1a72f299a265c2b641187842/magic/Magdir/archive#L194
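For comparison, here is a check that looks only at the leading magic number, roughly the kind of test a magic database entry performs. The helper name cpio_variant is hypothetical. The magic values are the standard cpio ones: the ASCII strings 070707 (odc), 070701 (newc) and 070702 (newc with CRC), plus the old binary format's 16-bit value 070707 octal (0x71c7), which can be stored in either byte order.

```bash
# Hypothetical helper: guess the cpio variant from the first bytes only,
# in the style of a magic-number test. Returns 1 if no cpio magic is found.
cpio_variant() {
  local ascii bin
  ascii=$(head -c 6 -- "$1")
  case $ascii in
    070707) echo "odc (portable ASCII)"; return 0 ;;
    070701) echo "newc (new ASCII)"; return 0 ;;
    070702) echo "newc with CRC"; return 0 ;;
  esac
  # old binary header: 16-bit magic 0x71c7, little- or big-endian
  bin=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
  case $bin in
    c771) echo "old binary, little-endian" ;;
    71c7) echo "old binary, big-endian" ;;
    *) echo "no cpio magic"; return 1 ;;
  esac
}
```

A test like this sees at most 6 ASCII bytes, and for the binary variant only 2 bytes, before declaring a file to be cpio.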
It's cpio with unusual values in some header fields.
If some program writes unusual values to header fields, it can be a bug in that program.
So we can try to find the origin of that problem:
what exact program was used to create that file, and why does that program write 255 to rdev?
If 7z rejects unusual values in header fields, that is a bug in 7z.
It's a trade-off.
7-Zip supports many archive formats, and we also search for archive data at many starting positions in the file.
So we try to reduce the probability of opening junk data as an archive, because that data could be an archive of another type instead.
So we check as many fields for correct data as possible.
This works OK in most cases when the data is correct.
When we see some unusual data in a real archive, we can change these conditions in 7-Zip. But that increases the probability of problems with other data.
So I still want to know what creation software writes unusual data into these archives. I suppose they didn't do it before, so something changed on their side. What is the origin of that change?
Last edit: Igor Pavlov 2023-10-15
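Searching for signatures at every offset, as described above, can be illustrated with GNU grep. The -a option treats binary data as text, -b prints the byte offset, -o prints only the matching part, and -F matches a fixed string. The helper name is hypothetical. If it is run on Payload~ with the cab signature MSCF, the offset 2177233 that 7z 23.01 reported for the Cab it opened would be expected among the hits. That expectation has not been checked.

```bash
# Hypothetical helper: list the byte offsets of every occurrence of a
# fixed signature string anywhere in a file (GNU grep assumed).
# LC_ALL=C makes grep compare raw bytes, not multibyte characters.
find_signature_offsets() {
  local sig=$1 file=$2
  LC_ALL=C grep -abo -F -- "$sig" "$file" | cut -d: -f1
}

# Example usage on the payload from the report:
#   find_signature_offsets MSCF Payload~ | head      # cab candidates
#   find_signature_offsets 070707 Payload~ | head    # odc cpio candidates
```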
This should be covered by tests.
https://github.com/file/file "knows the 'magic number' of several thousands of file types."
Ask Apple, but that's the wrong question: "file -i" says the file is a cpio archive, and "cpio --extract" works too, so the chance is 99.9% that the file is valid.
Last edit: milahu 2023-10-15
There are many formats that have no signature at the start of the file.
There are also archives without signatures, and archives that do not start at offset 0 (sfx archives).
We want to open such archives too,
so we use stronger conditions to detect the archive type.
Some cpio signatures are just 2 bytes, so we use additional conditions that reduce the probability of a false open.
If we think that it's not cpio, we can check other formats and other starting offsets to open that data as another type.
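A back-of-the-envelope estimate shows why a 2-byte signature needs extra conditions when every offset is a candidate. Assume uniformly random bytes, which is an idealization. Then a fixed k-byte value matches at a given offset with probability 1/256^k. A file the size of Payload~ (1613847552 bytes) would therefore contain about 24,625 chance matches of one 2-byte value. The count roughly doubles when both byte orders of the binary cpio magic are accepted. The helper name is hypothetical.

```bash
# Hypothetical helper: expected number of chance matches of a fixed k-byte
# signature in n uniformly random bytes, approximately n / 256^k
# (integer division; the last k-1 offsets are ignored).
expected_false_hits() {
  local n=$1 k=$2
  echo $(( n / (1 << (8 * k)) ))
}

expected_false_hits 1613847552 2   # 2-byte binary cpio magic: prints 24625
expected_false_hits 1613847552 6   # 6-byte ASCII magic "070707": prints 0 (under one)
```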
I have no contacts at Apple.
So I asked you, because you are probably closer to them if you work with such files.
Maybe there is a simple way to create such a file, if they could create it.
Or maybe there was a bug in their software, and that bug was fixed or will be fixed in the future. Then in the future there will be no cpio files with such unusual field values.
Additional checks in 7-Zip can be a good idea, because these checks help to find bugs in other software that creates such archives.
Last edit: Igor Pavlov 2023-10-15
Wrong. st_rdev is the "device type" of a file. It is zero only for regular files.
See also:
https://man7.org/linux/man-pages/man1/stat.1.html
https://man7.org/linux/man-pages/man3/stat.3type.html
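The distinction can be seen with GNU coreutils stat. Its %F format prints the file type, and %t and %T print the major and minor device numbers (in hex) taken from st_rdev. This sketch assumes GNU stat on Linux. There, /dev/null is a character special file (major 1, minor 3), and a freshly created temporary file is a regular file.

```bash
# Print the file type and st_rdev major/minor (hex) for a device node
# and for an ordinary temporary file (GNU coreutils stat assumed).
tmp=$(mktemp)
for f in /dev/null "$tmp"; do
  stat -c '%n: %F, major=%t minor=%T' -- "$f"
done
rm -f -- "$tmp"
```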
Yes, but the files inside this cpio archive are marked as regular files in the file type bits of the mode field.
7-Zip tests the file type with S_ISCHR/S_ISBLK and accepts non-zero rDevMinor and rDevMajor only for special (non-regular) files.
Note that this is a new problem. I don't know of other files that have such unusual field values.
That's why I want to find the origin of the problem.
Last edit: Igor Pavlov 2023-10-15
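The rule described above can be sketched in bash: a non-zero rdev is acceptable only when the file type bits of mode say character or block device. This is a hypothetical reconstruction for illustration, not 7-Zip's actual C++ code. It assumes the odc header layout (mode at byte offset 18, rdev at byte offset 42). It also assumes the POSIX type values S_IFCHR = 0020000 and S_IFBLK = 0060000, which S_ISCHR and S_ISBLK test for.

```bash
# Hypothetical sketch of a plausibility check for one odc cpio header.
# Returns 0 if the header looks like real cpio, 1 otherwise.
odc_header_plausible() {
  local hdr=$1
  [[ ${#hdr} -ge 76 && ${hdr:0:6} == 070707 ]] || return 1
  # every field after the magic must consist of octal digits
  [[ ${hdr:6:70} =~ ^[0-7]+$ ]] || return 1
  local mode=$(( 8#${hdr:18:6} )) rdev=$(( 8#${hdr:42:6} ))
  local ftype=$(( mode & 8#170000 ))
  # S_ISCHR / S_ISBLK: only device nodes may carry a non-zero rdev
  if (( rdev != 0 && ftype != 8#020000 && ftype != 8#060000 )); then
    return 1
  fi
  return 0
}
```

Under this sketch, a regular-file header with rdev 000377 (255) is rejected. That is consistent with 7z 23.01 not opening Payload~ as cpio. Dropping the rdev test, as Igor offers to do, would accept that header.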
https://discussions.apple.com/thread/255207794
I've noticed that this cpio file is old (2016).
So it's an old problem.
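The date can be read from the archive itself. In the odc layout, the 11-digit octal mtime field starts at byte offset 48 and holds seconds since the Unix epoch. This is a minimal sketch assuming GNU date (for -d @SECONDS), and the helper name is hypothetical. The thread does not say which field the 2016 date was taken from.

```bash
# Hypothetical helper: print the mtime stored in the first odc cpio
# header of a file, in UTC (GNU date assumed for -d @SECONDS).
odc_mtime() {
  local hdr
  hdr=$(head -c 76 -- "$1")
  [[ ${hdr:0:6} == 070707 ]] || { echo "not an odc header" >&2; return 1; }
  # mtime: 11 octal digits at byte offset 48
  date -u -d "@$(( 8#${hdr:48:11} ))" '+%Y-%m-%d %H:%M:%S'
}
```

Usage: odc_mtime Payload~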
Did you check similar files from Apple that were created before and after 2016?
Do they have the same problem?
no, I did not check other files