
#2414 "ERROR: Data Error". cpio archive is interpreted as cab archive. regression from 7z 22.01 to 7z 23.01

Milestone: None
Status: open-accepted
Owner: nobody
Labels: None
Priority: 5
Updated: 2023-10-16
Created: 2023-09-29
Creator: milahu
Private: No

7z 23.01 fails to extract a specific cpio archive
7z 22.01 extracts the same file successfully

the problem is:
7z 23.01 interprets the cpio archive as a cab archive, which fails

downstream issue: https://github.com/timsutton/brigadier/issues/37

to reproduce:

cd $(mktemp -d)

# download *.pkg archive file
# warning: this file is 1.6 GB
# sorry, i don't have a smaller test file
wget http://swcdn.apple.com/content/downloads/62/58/041-98143-A_HN8B941A1T/nknv1gt3xcgylggwc11kl5e0j4296tjfo1/BootCampESD.pkg

sha256sum BootCampESD.pkg 
# f72f3d43355321f35432a25432b2c56cd66df34997fa14a0e66830f456e5da7a  BootCampESD.pkg

file -i BootCampESD.pkg
# BootCampESD.pkg: application/x-xar; charset=binary

# extract Payload~ from the xar archive BootCampESD.pkg
# this works with both 7z versions
7z e BootCampESD.pkg

file -i Payload~ 
# Payload~: application/x-cpio; charset=binary

# extract WindowsSupport.dmg from the cpio archive Payload~
# this works with 7z 22.01
# this fails with 7z 23.01
7z e Payload~

output of 7z 23.01 (this fails)
note: Type = Cab

$ 7z e -bd -bb3 Payload~

7-Zip (z) 23.01 (x64) : Copyright (c) 1999-2023 Igor Pavlov : 2023-06-20
 64-bit locale=en_US.UTF-8 Threads:8 OPEN_MAX:1024

Scanning the drive for archives:
1 file, 1613847552 bytes (1540 MiB)

Extracting archive: Payload~

WARNINGS:
There are data after the end of archive

--
Path = Payload~
Type = Cab
WARNINGS:
There are data after the end of archive
Offset = 2177233
Physical Size = 794777
Tail Size = 1610875542
Method = MSZip
Blocks = 1
Volumes = 1
Volume Index = 0
ID = 0

- WSUSSCAN.cab
- Windows6.1-KB2685811-x64.cab
ERROR: Data Error : Windows6.1-KB2685811-x64.cab
- Windows6.1-KB2685811-x64-pkgProperties.txt
ERROR: Data Error : Windows6.1-KB2685811-x64-pkgProperties.txt
- Windows6.1-KB2685811-x64.xml
ERROR: Data Error : Windows6.1-KB2685811-x64.xml

Sub items Errors: 3

Archives with Errors: 1

Warnings: 1

Sub items Errors: 3

output of 7z 22.01 (this works)
note: Type = Cpio

$ 7z e -bd -bb3 Payload~

7-Zip (z) 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
 64-bit locale=en_US.UTF-8 Threads:8, ASM

Scanning the drive for archives:
1 file, 1613847552 bytes (1540 MiB)

Extracting archive: Payload~
--
Path = Payload~
Type = Cpio
Physical Size = 1613847552
SubType = Portable ASCII

- ./
- ./Library/
- ./Library/Application Support/
- ./Library/Application Support/BootCamp/
- ./Library/Application Support/BootCamp/WindowsSupport.dmg
Everything is Ok

Folders: 4
Files: 1
Size:       1613846904
Compressed: 1613847552

$ sha256sum WindowsSupport.dmg
7c84ace54a095156e6f57da3ba825baf87b9f67308b39099c65b308a94461066  WindowsSupport.dmg

output of cpio (this works)

$ cpio --extract --verbose <Payload~
.
./Library
./Library/Application Support
./Library/Application Support/BootCamp
./Library/Application Support/BootCamp/WindowsSupport.dmg
3152046 blocks
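
The two logs above differ mainly in the `Type = ...` line. To compare which handler a given 7z build selects without reading the whole log, that line can be filtered out. This is only a sketch: `archive_type` is a hypothetical helper name, and it assumes the 7z output keeps the `Type = ` prefix shown in the logs.

```shell
# Hypothetical helper: print the first "Type = ..." value found in
# 7z output read from stdin (for example from `7z l`).
archive_type() {
  sed -n 's/^Type = //p' | head -n 1
}
```

Usage would be something like `7z l Payload~ | archive_type`. Going by the logs above, this should print `Cab` with 23.01 and `Cpio` with 22.01.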

Discussion

  • Igor Pavlov

    Igor Pavlov - 2023-09-30

    That cpio file has an unusual value, rdev=255, in the field
    rdev: device major/minor for special files.
    But the files in this cpio archive are not special.
    So I supposed that this rdev must be zero, as in another CPIO archives.
    7-Zip checks that field because I want to reduce the probability that some garbage data file is opened as a cpio file.
    A cpio file has no signature, so 7-Zip checks some data from the header fields to detect that the file is really a cpio archive.
    I can change that code and remove that check in the next version of 7-Zip, so 7-Zip will open that file.

    Do you know why they (Apple?) write rdev=255 in that file?
    When did they start to use (rdev!=0) in cpio archives?
    Are there other cpio examples with (rdev!=0) from Apple or another source?
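
For anyone who wants to look for other archives with a non-zero rdev, the field can be read straight from the first header. The sketch below assumes the Portable ASCII (odc) layout described in cpio(5): a 76-byte header of octal ASCII fields, with rdev in bytes 43-48. It also assumes the archive starts at offset 0. `cpio_odc_rdev` is a hypothetical name, and only the first entry is inspected.

```shell
# Print the raw rdev field (six octal digits) of the first entry header,
# followed by its decimal value.
# Assumption: Portable ASCII (odc) archive starting at offset 0.
# odc header: magic[6] dev[6] ino[6] mode[6] uid[6] gid[6] nlink[6]
#             rdev[6] mtime[11] namesize[6] filesize[11]  (76 bytes)
cpio_odc_rdev() {
  header=$(head -c 76 "$1") || return 1
  magic=$(printf '%s' "$header" | cut -c 1-6)
  if [ "$magic" != "070707" ]; then
    echo "not an odc cpio header" >&2
    return 1
  fi
  rdev=$(printf '%s' "$header" | cut -c 43-48)
  # A leading 0 makes shell arithmetic read the digits as octal.
  printf '%s %d\n' "$rdev" "$((0$rdev))"
}
```

A header whose rdev field is `000377` would print `000377 255`.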

     

    Last edit: Igor Pavlov 2023-09-30
  • Igor Pavlov

    Igor Pavlov - 2023-09-30
    • status: open --> open-accepted
    • Group: -->
     
  • milahu

    milahu - 2023-09-30

    file -i Payload~ says application/x-cpio so i would not blame apple

    https://github.com/file/file/blob/de7d52dce3e7a0bb1a72f299a265c2b641187842/magic/Magdir/archive#L194

    # cpio archives
    #
    # Yes, the top two "cpio archive" formats *are* supposed to just be "short".
    # The idea is to indicate archives produced on machines with the same
    # byte order as the machine running "file" with "cpio archive", and
    # to indicate archives produced on machines with the opposite byte order
    # from the machine running "file" with "byte-swapped cpio archive".
    #
    # The SVR4 "cpio(4)" hints that there are additional formats, but they
    # are defined as "short"s; I think all the new formats are
    # character-header formats and thus are strings, not numbers.
    # URL:      http://fileformats.archiveteam.org/wiki/Cpio
    #       https://en.wikipedia.org/wiki/Cpio
    # Reference:    https://people.freebsd.org/~kientzle/libarchive/man/cpio.5.txt
    # Update:   Joerg Jenderek
    #
    # Reference:    http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cpio-bin.trid.xml
    # Note:     called "CPIO archive (binary)" by TrID, "cpio/Binary LE" by 7-Zip and "CPIO" by DROID via PUID fmt/635
    0   short       070707
    # skip DROID fmt-635-signature-id-960.cpio by looking for pathname of 1st entry
    >26 string      >\0     cpio archive
    !:mime  application/x-cpio
    # https://download.opensuse.org/distribution/leap/15.4/iso/openSUSE-Leap-15.4-NET-x86_64-Media.iso
    # boot/x86_64/loader/bootlogo
    # message.cpi
    !:ext   /cpio/cpi
    >>0 use cpio-bin
    # Reference:    http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cpio-bin-sw.trid.xml
    # Note:     called "CPIO archive (byte swapped binary)" by TrID and "Cpio/Binary BE" by 7-Zip
    0   short       0143561     byte-swapped cpio archive
    !:mime  application/x-cpio # encoding: swapped
    # https://telparia.com/fileFormatSamples/archive/cpio/skeleton2.cpio
    !:ext   cpio
    >0  use cpio-bin-be
    # Reference:    http://mark0.net/download/triddefs_xml.7z/defs/a/ark-cpio.trid.xml
    # Note:     called "CPIO archive (portable)" by TrID, "cpio/Portable ASCII" by 7-Zip and "cpio/odc" by GNU cpio
    0   string      070707      ASCII cpio archive (pre-SVR4 or odc)
    !:mime  application/x-cpio
    # https://telparia.com/fileFormatSamples/archive/cpio/ pthreads-1.60B5.osr5src.cpio cinema.cpi VOL.000.008 VOL.000.012
    !:ext   cpio/cpi/008/012
    # Note:     called "CPIO archive (portable)" by TrID, "cpio/New ASCII" by 7-Zip and "cpio/newc" by GNU cpio
    0   string      070701      ASCII cpio archive (SVR4 with no CRC)
    !:mime  application/x-cpio
    # https://telparia.com/fileFormatSamples/archive/cpio/MainActor-2.06.3.cpio
    !:ext   cpio
    # Note:     called "CPIO archive (portable)" by TrID, "cpio/New CRC" by 7-Zip and "cpio/crc" by GNU cpio
    0   string      070702      ASCII cpio archive (SVR4 with CRC)
    !:mime  application/x-cpio
    # http://ftp.gnu.org/gnu/tar/tar-1.27.cpio.gz
    # https://telparia.com/fileFormatSamples/archive/cpio/pcmcia
    !:ext   /cpio
    #   display information of old binary cpio archive
    # Note: verfied by 7-Zip `7z l -tcpio -slt *.cpio` and
    #   `cpio -ivt --numeric-uid-gid --file=clam.bin-le.cpio`
    0   name    cpio-bin
    # c_dev; device number; WHAT IS THAT?
    >2  uleshort    x       \b; device %u
    # c_ino; truncated inode number; use `ls --inode`
    >4  uleshort    x       \b, inode %u
    # c_mode; mode specifies permissions and file type like: ?622~?rw-r--r-- by `ls -l`
    >6  uleshort    x       \b, mode %o
    # c_uid; numeric user id; use `ls --numeric-uid-gid`
    >8  uleshort    x       \b, uid %u
    # c_gid; numeric group id
    >10 uleshort    x       \b, gid %u
    # c_nlink; links to this file; directories at least 2
    >12 uleshort    >1      \b, %u links
    # c_rdev; device number for block and character entries; zero for all other entries by writers
    # like 0x0440 for /dev/ttyS0
    >14 uleshort    >0      \b, device %#4.4x
    # c_mtime[2]; modification time in seconds since 1 January 1970; most-significant 16 bits first 
    >16 medate      x       \b, modified %s
    # c_filesize[2]; size of pathname; most-significant 16 bits first like: 544
    >22 melong      x       \b, %u bytes
    # c_namesize; bytes in the pathname that follows the header like: 9
    #>20    uleshort    x       \b, namesize %u
    # pathname of entry like: "clam.exe"
    >26 string      x       "%s"
    #   display information of old binary byte swapped cpio archive
    # Note: verfied by 7-Zip `7z l -tcpio -slt *.cpio` and
    #   `LANGUAGE=C cpio -ivt --numeric-uid-gid --file=clam.bin-be.cpio`
    0   name    cpio-bin-be
    >2  ubeshort    x       \b; device %u
    >4  ubeshort    x       \b, inode %u
    >6  ubeshort    x       \b, mode %o
    >8  ubeshort    x       \b, uid %u
    >10 ubeshort    x       \b, gid %u
    >12 ubeshort    >1      \b, %u links
    >14 ubeshort    >0      \b, device %#4.4x
    >16 bedate      x       \b, modified %s
    >22 ubelong     x       \b, %u bytes
    #>20    ubeshort    x       \b, namesize %u
    >26 string      x       "%s"
    
     
  • Igor Pavlov

    Igor Pavlov - 2023-09-30

    It's cpio with unusual values in some header fields.
    If some program writes unusual value to header fields, it can be bug of that program.
    So we can try to find the origin of that problem.
    What exact program was used to create that file, and why does that program write 255 to rdev?

     
  • milahu

    milahu - 2023-10-15

    If some program writes unusual value to header fields, it can be bug of that program.

    if 7z expects unusual value in header fields, it is a bug of 7z

     
  • Igor Pavlov

    Igor Pavlov - 2023-10-15

    It's trade-off.
    7-Zip supports many archive formats and we also search archive data in many starting positions in file.
    So we try to reduce the probability to open junk data as archive, because it can be archive of another type instead.
    So we check as many fields for correct data as possible.

    And it works ok in most cases when data is correct.
    When we see some unusual data in real archive, we can change these conditions in 7-Zip. But it increases the probability of problem with another data.
    So I still want to know what creation software writes unusual data for these archives. They didn't do it before as I suppose. So something was changed in their part. What the origin of that change?

     

    Last edit: Igor Pavlov 2023-10-15
  • milahu

    milahu - 2023-10-15

    we can change these conditions in 7-Zip. But it increases the probability of problem with another data.

    this should be covered by tests

    7-Zip supports many archive formats

    https://github.com/file/file "knows the 'magic number' of several thousands of file types."

    What the origin of that change?

    ask apple, but that's the wrong question. "file -i" says the file is a cpio archive, and "cpio --extract" works too, so the chance is 99.9% that the file is valid

     

    Last edit: milahu 2023-10-15
  • Igor Pavlov

    Igor Pavlov - 2023-10-15

    There are many formats that have no signatures at start of file.
    Also there are archives without signatures and archives that do not start from offset 0 (sfx archives).
    And we want to open such archives too.
    So we try to use stronger conditions to detect archive type.
    Some CPIO signatures are just 2 bytes. So we try to use additional conditions that reduce the probability of false open.
    If we think that it's not cpio, we can check another formats and another starting offsets to open that data as another type.

    I have no contacts at Apple.
    So I asked you, because you are probably closer to them if you work with such files.
    Maybe there is a simple way to create such a file, if they could create it.
    Or maybe there was a bug in their software, and that bug was fixed or will be fixed in the future. So in the future there will be no cpio files with such unusual field values.
    Some additional checks in 7-Zip can be a good idea, because these checks allow us to find bugs in other software that creates such archives.
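
To illustrate how little the signature alone says, here is a sketch that classifies a file purely by the cpio magic at offset 0, using the values from the magic rules quoted earlier in the thread. The binary variants are identified by only two bytes (octal 070707 = 0x71c7, in either byte order), so arbitrary data matches fairly easily. `cpio_variant` is a hypothetical name, and this is not how 7-Zip detects archives.

```shell
# Classify a file by the cpio magic at offset 0.
# The ASCII variants use a 6-character magic; the old binary variants
# use only a 16-bit value (octal 070707 = 0x71c7) in either byte order.
cpio_variant() {
  hex=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
  case "$hex" in
    303730373037) echo "odc (Portable ASCII)" ;;
    303730373031) echo "newc (New ASCII)" ;;
    303730373032) echo "crc (New CRC)" ;;
    c771*)        echo "binary, little-endian" ;;
    71c7*)        echo "binary, big-endian" ;;
    *)            echo "no cpio magic at offset 0"; return 1 ;;
  esac
}
```

Any file that happens to begin with the bytes `c7 71` is reported as a binary cpio archive here, which is the kind of false positive that additional header checks are meant to filter out.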

     

    Last edit: Igor Pavlov 2023-10-15
  • milahu

    milahu - 2023-10-15

    So I supposed that this rdev must be zero, as in another CPIO archives.

    wrong. st_rdev is the "device type" of a file. it is zero only for regular files

    touch regular-file 
    stat -c "%r = 0x%R" regular-file 
    # 0 = 0x0
    
    stat -c "%r = 0x%R" /dev/null 
    # 259 = 0x103
    
    stat -c "%r = 0x%R" /dev/random 
    # 264 = 0x108
    
    stat -c "%r = 0x%R" /dev/sda
    # 2048 = 0x800
    

    see also
    https://man7.org/linux/man-pages/man1/stat.1.html
    https://man7.org/linux/man-pages/man3/stat.3type.html
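
The numbers printed by `stat` above combine a major and a minor device number. A minimal sketch of splitting them, assuming the traditional 16-bit layout (high byte = major, low byte = minor), which only holds for small values like the ones shown. `split_rdev` is a hypothetical helper.

```shell
# Split a device number into major and minor.
# Assumption: traditional 16-bit layout, high byte = major, low byte = minor.
# Modern Linux dev_t values can be wider; this covers only small numbers.
split_rdev() {
  printf 'major=%d minor=%d\n' "$(( $1 >> 8 ))" "$(( $1 & 255 ))"
}

split_rdev 259    # /dev/null value from above
split_rdev 2048   # /dev/sda value from above
split_rdev 255    # value reported for the Apple archive
```

Under that assumption, 259 splits into major 1, minor 3, and 255 splits into major 0, minor 255.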

     
  • Igor Pavlov

    Igor Pavlov - 2023-10-15

    wrong. st_rdev is the "device type" of a file. it is zero only for regular files

    Yes, but the files inside this cpio archive are marked as regular files in the file type bits of the mode field.
    And 7-Zip checks whether the entry is a special file with
    S_ISCHR / S_ISBLK checks:

      /* v23.02: we have disabled rDevMinor check because real file
         from Apple contains rDevMinor==255 by some unknown reason */
      if (rDevMajor != 0
          // || rDevMinor != 0
          )
      {
        if (!MY_LIN_S_ISCHR(mode) &&
            !MY_LIN_S_ISBLK(mode))
          return k_IsArc_Res_NO;
      }
    

    7-Zip allows a non-zero rDevMajor (and, before this change, rDevMinor) only for special files (character or block devices).

    Note that this is a new problem. I don't know of other files that have such unusual field values.
    That's why I want to find the origin of the problem.
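
The effect of the quoted condition can be tried out with a small shell re-statement. This is a sketch, not 7-Zip code: it assumes `MY_LIN_S_ISCHR` and `MY_LIN_S_ISBLK` mirror the Linux `S_ISCHR`/`S_ISBLK` macros (file type bits 0020000 and 0060000 under the mask 0170000), and the optional `strict` argument re-enables the commented-out minor check. `rdev_check` is a hypothetical name.

```shell
# Arguments: mode (octal digits), rdev major, rdev minor, optional "strict".
# Returns 0 if the header passes the check, 1 if it would be rejected.
# "strict" also requires minor == 0, like the commented-out condition.
rdev_check() {
  mode=$((0$1)); major=$2; minor=$3
  s_ifmt=$((0170000)); s_ifchr=$((0020000)); s_ifblk=$((0060000))
  type=$(( mode & s_ifmt ))
  if [ "$type" -eq "$s_ifchr" ] || [ "$type" -eq "$s_ifblk" ]; then
    return 0          # device nodes may carry any rdev
  fi
  [ "$major" -eq 0 ] || return 1
  if [ "${4:-}" = "strict" ]; then
    [ "$minor" -eq 0 ] || return 1
  fi
  return 0
}
```

With these assumptions, `rdev_check 100644 0 255` succeeds, while `rdev_check 100644 0 255 strict` fails, which matches a regular file with minor 255 being rejected only when the minor check is active.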

     

    Last edit: Igor Pavlov 2023-10-15
  • Igor Pavlov

    Igor Pavlov - 2023-10-16

    I've noticed that this cpio file is old (2016).
    So it's some old problem.
    Did you check similar files from apple that were created before 2016 and after 2016?
    Do they have same problem?

     
  • milahu

    milahu - 2023-10-16

    no, i did not check other files

     
