
#102 "no images" in .tar.xz archives

Milestone: MComix_1.01
Status: closed-fixed
Owner: nobody
Labels: None
Priority: 3
Updated: 2016-01-27
Created: 2016-01-26
Creator: Wyatt Ward
Private: No

Running Debian Sid with MComix 1.01, I attempted to open a .tar.xz file containing several large .tif images.

TIFF images are supported, so the only explanation I can come up with is that MComix is not seeing them.
The archive in question contains a copyrighted work in high resolution (a manga scan), so I don't think it's wise to upload it here. I created the archive with tar cJf archive.tar.xz *.tif.
To give a sense of scale, each TIFF is about 12 megabytes. They open fine outside of the tarball.

I do have p7zip-full and xz-utils installed, so I'm not sure where the problem comes from. Nothing is printed to the console when I attempt to open the tarball. I have opened the archive in a hex editor, and the header matches the definition of an .xz file exactly.

Discussion

  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    The problem is that Python 2's tarfile module does not support LZMA:

    > python2 -c 'import sys, tarfile; tarfile.open(sys.argv[1], "r").list()' archive.tar.xz
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/usr/lib/python2.7/tarfile.py", line 1678, in open
        raise ReadError("file could not be opened successfully")
    tarfile.ReadError: file could not be opened successfully
    > python -c 'import sys, tarfile; tarfile.open(sys.argv[1], "r").list()' archive.tar.xz
    ?rw------- bpierre/bpierre      15268 2015-02-22 13:22:51 Tests/images/12bit.cropped.tif
    ?rw------- bpierre/bpierre      20274 2015-02-22 13:22:51 Tests/images/12in16bit.tif
    ?rw------- bpierre/bpierre       8302 2015-02-22 13:22:51 Tests/images/16bit.cropped.tif
    ?rw------- bpierre/bpierre       5112 2015-02-22 13:22:51 Tests/images/16bit.deflate.tif
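For comparison, a minimal sketch (not MComix code) showing that Python 3's tarfile, which gained xz support in Python 3.3, can write and list a .tar.xz archive. The member names and contents below are made-up test data:

```python
import io
import tarfile

def make_tar_xz(files):
    """Build an in-memory .tar.xz from a {name: bytes} dict (hypothetical test data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def list_members(archive_bytes):
    """Return member names; mode "r:*" auto-detects gzip, bz2 and xz compression."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        return tar.getnames()

archive = make_tar_xz({"page01.tif": b"II*\x00" + b"\x00" * 16,
                       "page02.tif": b"II*\x00" + b"\x00" * 16})
print(list_members(archive))  # ['page01.tif', 'page02.tif']
```

Under Python 2.7 the same `tarfile.open(..., "r:*")` call raises the ReadError shown above, because that tarfile has no xz backend.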
    
     
    • Wyatt Ward

      Wyatt Ward - 2016-01-26

      Yes, and I looked at the code: it uses 7-Zip to decompress these archives.
      So this should not be an issue.

          # Headers for TAR-XZ and TAR-LZMA that aren't supported by tarfile
          elif magic[0:5] == '\xFD7zXZ' or magic[0:5] == ']\x00\x00\x80\x00':
              return constants.SEVENZIP
      

      And as I said, I got no errors on the console. It was not using tarfile when it failed.

      The header in the .xz file is 0xFD followed by '7zXZ', as described in archive_tools.py.
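The same magic-byte check can be sketched in Python 3, where the header must be compared as bytes rather than as a Python 2 str. The function and constant names are hypothetical. Note that the second pattern only matches a legacy .lzma header written with the default properties byte (0x5D) and an 8 MiB dictionary:

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ"                   # first 5 of the 6 xz magic bytes (FD 37 7A 58 5A 00)
LZMA_ALONE_MAGIC = b"]\x00\x00\x80\x00"  # .lzma header: props 0x5D + 8 MiB dictionary (LE)

def sniff_compression(header):
    """Classify a file by its leading bytes, like the archive_tools.py check."""
    magic = header[:5]
    if magic == XZ_MAGIC:
        return "xz"
    if magic == LZMA_ALONE_MAGIC:
        return "lzma"
    return None

def sniff_file(path):
    """Read just enough of the file to classify it."""
    with open(path, "rb") as f:
        return sniff_compression(f.read(6))

print(sniff_compression(lzma.compress(b"page")))                            # xz
print(sniff_compression(lzma.compress(b"page", format=lzma.FORMAT_ALONE)))  # lzma
```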

       

      Last edit: Wyatt Ward 2016-01-26
  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    Indeed, except that in this case the listing output of 7z differs from the regular case, which is why MComix thinks the archive is empty. Can you copy/paste the output of 7z l -slt archive.tar.xz?

     
  • Wyatt Ward

    Wyatt Ward - 2016-01-26

    I can predict before trying it that it will not unpack the tarball inside; it will just turn the .tar.xz into a plain .tar file.

    Anyway, here's the output.

    7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
    p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,2 CPUs)
    
    Listing archive: Jitsu Wa Chapter 109 raw.tar.xz
    
    --
    Path = Jitsu Wa Chapter 109 raw.tar.xz
    Type = xz
    Method = LZMA2:23 CRC64
    
    ----------
    Size = 262318080
    Packed Size = 162604736
    Method = LZMA2:23 CRC64
    

    Is it perhaps not handling the secondary '.tar' unpacking? Might it need to be distinguished from a normal 'SEVENZIP' file and get an additional step?

    Extracting it with 7z x 'Jitsu Wa Chapter 109 raw.tar.xz' produces a .tar file, not the images inside the tarball.

     

    Last edit: Wyatt Ward 2016-01-26
  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    No, again, the problem is that the listing is different: there is no 'Path = Jitsu Wa Chapter 109 raw.tar' line. Secondary unpacking is handled correctly for a regular 7z archive.
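To see why a missing `Path` line makes the archive look empty, here is a minimal, hypothetical parser for `7z l -slt` output (not the actual MComix implementation), fed with the xz listing posted earlier in this thread. The single entry has sizes but no `Path` key, so there is no member name to hand back to 7z for extraction:

```python
def parse_slt(output):
    """Split `7z l -slt` output into (archive_props, entry_props_list).

    Hypothetical helper: lines before the '----------' separator describe the
    archive itself; each blank-line separated block after it describes one member.
    """
    def props(lines):
        result = {}
        for line in lines:
            key, eq, value = line.partition(" = ")
            if eq:
                result[key.strip()] = value.strip()
        return result

    head, _, body = output.partition("\n----------\n")
    archive = props(head.splitlines())
    entries = [p for p in (props(b.splitlines()) for b in body.split("\n\n")) if p]
    return archive, entries

# The xz listing posted above, minus the 7-Zip banner.
XZ_LISTING = """\
Listing archive: Jitsu Wa Chapter 109 raw.tar.xz

--
Path = Jitsu Wa Chapter 109 raw.tar.xz
Type = xz
Method = LZMA2:23 CRC64

----------
Size = 262318080
Packed Size = 162604736
Method = LZMA2:23 CRC64
"""

archive, entries = parse_slt(XZ_LISTING)
print(archive["Type"], [e.get("Path") for e in entries])  # xz [None]
```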

     
    • Wyatt Ward

      Wyatt Ward - 2016-01-26

      I compressed it with GNU tar in the format of tar cJf tarball.tar.xz *.tif.

      By the way, I just tried it again by first making a normal, uncompressed .tar and then running xz -z Jitsu\ Wa\ Chapter\ 109\ raw.tar on it.

      The output of 7z l -slt is the same; this is just how the xz program from xz-utils works.

      7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
      p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,2 CPUs)
      
      Listing archive: Jitsu Wa Chapter 109-7z.tar.xz
      
      --
      Path = Jitsu Wa Chapter 109-7z.tar.xz
      Type = xz
      Method = LZMA2:24 CRC32
      
      ----------
      Size = 262318080
      Packed Size = 162351392
      Method = LZMA2:24 CRC32
      

      Looks like a different method (LZMA2:24 CRC32 instead of LZMA2:23 CRC64), but otherwise identical.

      If anything, the problem seems to be that it expects all '.xz' archives to have been made by 7-Zip rather than by the original xz program.
      EDIT:
      I made an .xz archive with 7z a -txz and it has the same issue. Perhaps you have to nest 7-Zip inside tarfile's decompression function to get the files out.

      I'm still fairly sure the main change needed is a separate case for .tar.xz archives, distinct from .7z ones, even if both get partly unpacked with 7-Zip.
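The two-stage idea (strip the xz layer, then read the inner tar) can be sketched with Python 3's standard lzma module standing in for an external 7z process; that substitution is an assumption for illustration, not what MComix does. The function name and test data are hypothetical:

```python
import io
import lzma
import tarfile

def list_tar_xz_two_stage(xz_bytes):
    """Stage 1: .tar.xz -> .tar with lzma; stage 2: list the plain tar."""
    tar_bytes = lzma.decompress(xz_bytes)
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        return tar.getnames()

# Build a plain tar, then compress it, like running `xz -z` on a .tar file.
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w:") as tar:
    data = b"II*\x00"
    info = tarfile.TarInfo("page01.tif")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
xz_bytes = lzma.compress(raw.getvalue())

print(list_tar_xz_two_stage(xz_bytes))  # ['page01.tif']
```

Decompressing a 250 MB tarball into memory like this is wasteful; for real archives a streaming variant such as `tarfile.open(fileobj=lzma.open(path), mode="r|")` avoids holding the whole .tar at once.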

       

      Last edit: Wyatt Ward 2016-01-26
  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    The problem is that 7z's output is inconsistent with that for other types of archives.

    > xz -d -k archive.tar.xz
    > 7z a archive.tar.7z archive.tar
    7-Zip [64] 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03
    p7zip Version 9.38.1 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs,ASM)
    Scanning
    
    Creating archive archive.tar.7z
    
    Compressing  archive.tar
    
    Everything is Ok
    > 7z l -slt archive.tar.7z
    7-Zip [64] 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03
    p7zip Version 9.38.1 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs,ASM)
    
    Listing archive: archive.tar.7z
    
    --
    Path = archive.tar.7z
    Type = 7z
    Physical Size = 619805
    Headers Size = 130
    Method = LZMA2:20
    Solid = -
    Blocks = 1
    
    ----------
    Path = archive.tar
    Size = 962560
    Packed Size = 619675
    Modified = 2016-01-26 20:10:36
    Attributes = A_
    CRC = F666CA87
    Encrypted = -
    Method = LZMA2:20
    Block = 0
    > 7z l -slt test/files/archives/SolidFlat.tar.gz
    
    7-Zip [64] 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03
    p7zip Version 9.38.1 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs,ASM)
    
    Listing archive: test/files/archives/SolidFlat.tar.gz
    
    --
    Path = test/files/archives/SolidFlat.tar.gz
    Type = gzip
    Headers Size = 10
    
    ----------
    Path = SolidFlat.tar
    Size = 10240
    Packed Size = 857
    Modified = 2015-04-12 20:47:37
    Host OS = Unix
    CRC = 718CC7AD
    

    So of course this causes issues with the parsing code. IMHO it's a bug in 7z, since we don't know how the archive member will be named, e.g. when the file name does not end in .tar.xz:

    > 7z l xzarchivetar
    
    7-Zip [64] 9.38 beta  Copyright (c) 1999-2014 Igor Pavlov  2015-01-03
    p7zip Version 9.38.1 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs,ASM)
    
    Listing archive: xzarchivetar
    
    --
    Path = xzarchivetar
    Type = xz
    Physical Size = 619800
    Method = LZMA2:23 CRC64
    Streams = 1
    Blocks = 1
    
       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
                        .....       962560       619800  xzarchivetar~
    ------------------- ----- ------------ ------------  ------------------------
                                    962560       619800  1 files
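One possible workaround is to synthesize the missing name when the archive `Type` is `xz`: unlike gzip, whose header can carry the original file name (hence `Path = SolidFlat.tar` above), the .xz format stores no file name at all. The rule below is inferred only from the two listings in this thread ('X.tar.xz' is listed as 'X.tar', 'xzarchivetar' as 'xzarchivetar~'); it is a hypothetical sketch, not necessarily what the eventual fix does, and 7-Zip may apply other rules to other suffixes:

```python
import os

def guess_xz_member_name(archive_path):
    """Guess the name 7z reports for the single member of an .xz file.

    Hypothetical helper: strip a trailing '.xz', otherwise append '~',
    matching only the behavior observed in this thread.
    """
    name = os.path.basename(archive_path)
    if name.lower().endswith(".xz"):
        return name[:-3]
    return name + "~"

print(guess_xz_member_name("Jitsu Wa Chapter 109.tar.xz"))  # Jitsu Wa Chapter 109.tar
print(guess_xz_member_name("xzarchivetar"))                 # xzarchivetar~
```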
    
     
    • Wyatt Ward

      Wyatt Ward - 2016-01-26

      No; 7z archives cannot contain streams the way xz archives can, so that makes perfect sense. xz sees these things differently.

      quoting Igor Pavlov from here:
      https://sourceforge.net/p/sevenzip/discussion/45797/thread/920f3324/

      The main difference is stream feature (unavailable in .7z):
      1) you can pack xz to stdout.
      2) you can unpack xz from stdin.
      Also .xz supports CRC-64 and SHA-256 checksums.
      .xz can be popular in Linux because of
      - it's better than gzip/bzip2 in compression ratio
      - it's simpler and smaller than .7z.
      - it provides more stream features. Linux developers and users like it.
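The stream feature Igor describes means an .xz file can be decompressed incrementally, from stdin to stdout, without knowing anything about its contents. A minimal Python 3 sketch with the standard lzma module (single stream only; the function name is made up):

```python
import io
import lzma

def xz_stream_copy(src, dst, chunk_size=64 * 1024):
    """Decompress one .xz stream from src to dst chunk by chunk.

    src/dst are binary file objects, e.g. sys.stdin.buffer / sys.stdout.buffer,
    so the archive is processed incrementally rather than loaded whole.
    """
    decomp = lzma.LZMADecompressor()  # FORMAT_AUTO: accepts .xz and legacy .lzma
    while not decomp.eof:
        chunk = src.read(chunk_size)
        if not chunk:
            raise EOFError("compressed stream ended before the end marker")
        dst.write(decomp.decompress(chunk))

data = b"manga page " * 10000
src = io.BytesIO(lzma.compress(data))
dst = io.BytesIO()
xz_stream_copy(src, dst, chunk_size=1000)
print(dst.getvalue() == data)  # True
```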

       

      Last edit: Wyatt Ward 2016-01-26
  • Wyatt Ward

    Wyatt Ward - 2016-01-26

    Also, why is -slt necessary?
    It works fine without it:
    7z l Jitsu\ Wa\ Chapter\ 109\ raw.tar.xz

    7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
    p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,2 CPUs)
    
    Listing archive: Jitsu Wa Chapter 109.tar.xz
    
    --
    Path = Jitsu Wa Chapter 109.tar.xz
    Type = xz
    Method = LZMA2:23 CRC64
    
       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
                        .....    262318080    162623152  Jitsu Wa Chapter 109.tar
    ------------------- ----- ------------ ------------  ------------------------
                                 262318080    162623152  1 files, 0 folders
    
     
  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    I fail to see how this is relevant. Check the output without -slt: the archive member is correctly named. We need this name when we ask 7z to extract the entry.
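For reference, the member name in the plain `7z l` table can be recovered by fixed-width parsing. This hypothetical sketch assumes the Name column starts right after the last gap in the dashed rule, and that member rows sit between the first two rules, as in the listing quoted above; it is not MComix code:

```python
RULE_PREFIX = "-" * 19  # the Date/Time column rule that opens and closes the table

def names_from_plain_listing(output):
    """Extract member names from `7z l` (no -slt) output."""
    lines = output.splitlines()
    rules = [i for i, line in enumerate(lines) if line.startswith(RULE_PREFIX)]
    if len(rules) < 2:
        return []
    first, second = rules[0], rules[1]
    # The Name column begins after the last space in the dashed rule.
    name_col = lines[first].rstrip().rindex(" ") + 1
    return [line[name_col:] for line in lines[first + 1:second]]

# Rebuilt column by column from the `7z l` listing quoted above.
rule = " ".join(["-" * 19, "-" * 5, "-" * 12, "-" * 12]) + "  " + "-" * 24
row = (" " * 20 + "....." + "262318080".rjust(13) + "162623152".rjust(13)
       + "  " + "Jitsu Wa Chapter 109.tar")
header = "   Date      Time    Attr         Size   Compressed  Name"
listing = "\n".join([header, rule, row, rule, "1 files, 0 folders"])

print(names_from_plain_listing(listing))  # ['Jitsu Wa Chapter 109.tar']
```

Fixed-width parsing depends on 7z's column layout staying stable across versions, which is one reason to prefer the key/value `-slt` format.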

     
    • Wyatt Ward

      Wyatt Ward - 2016-01-26

      I'm trying to suggest solutions, not find more problems.
      But in the case of a .tar.xz file, it is a safe assumption that anything with the xz header will behave like a compressed data stream. If you need a file name, the tarball will be the only file inside.

      7z x -txz -si -so < 'Jitsu Wa Chapter 109 raw.tar.xz' > tarball.tar is a (bash/linux) command line that will extract the file to a tarball named tarball.tar that can then be unpacked or listed manually.

      On Windows (I don't have a computer immediately available to test), I believe this will work:

      type "Jitsu Wa Chapter 109 raw.tar.xz" | 7z.exe x -txz -si -so > tarball.tar

       

      Last edit: Wyatt Ward 2016-01-26
      • Benoit Pierre

        Benoit Pierre - 2016-01-26

        That's the problem: the need to add a special case because 7z's output is not consistent, which, again, I consider a bug.

         
        • Wyatt Ward

          Wyatt Ward - 2016-01-26

          Honestly, I don't know why we're still discussing this, since we seem to agree that it's a bug.

           
          • Benoit Pierre

            Benoit Pierre - 2016-01-26

            In 7z or in MComix? ;P

            Anyway, I'm looking into it.

             
  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    -slt is needed to support some features, like detecting whether the archive is solid, better encryption support, etc.
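As an illustration of the archive-level properties `-slt` exposes, a hypothetical helper that reads the `Solid` flag. In the .7z listing earlier in this thread it appears as `Solid = -`, while the xz listing has no such line at all:

```python
def archive_is_solid(slt_output):
    """Return True/False from the archive-level 'Solid = +/-' line, or None if absent."""
    head = slt_output.split("\n----------\n", 1)[0]  # archive properties only
    for line in head.splitlines():
        key, eq, value = line.partition(" = ")
        if eq and key.strip() == "Solid":
            return value.strip() == "+"
    return None

SLT_7Z = "Path = archive.tar.7z\nType = 7z\nSolid = -\nBlocks = 1\n\n----------\nPath = archive.tar\n"
SLT_XZ = "Path = archive.tar.xz\nType = xz\n\n----------\nSize = 962560\n"
print(archive_is_solid(SLT_7Z), archive_is_solid(SLT_XZ))  # False None
```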

     
    • Wyatt Ward

      Wyatt Ward - 2016-01-26

      Why do you need to know if the archive is solid if all you're going to do is decompress it? And BTW, all .tar.* files are solid. tar is a solid block of data.

      Also, xz does NOT support encryption, so that's of no concern.

       
      • Benoit Pierre

        Benoit Pierre - 2016-01-26

        The code is not there just to support tar archives... Yes, in this particular case it's solid and not encrypted...

         
  • Benoit Pierre

    Benoit Pierre - 2016-01-26

    Fix here.

     
  • Ark

    Ark - 2016-01-27

    @Benoit Pierre: Thanks for the fix, it seems to work. Please push it to SVN.

     
  • Benoit Pierre

    Benoit Pierre - 2016-01-27

    Fix pushed to SVN.

     
  • Benoit Pierre

    Benoit Pierre - 2016-01-27
    • status: open --> closed-fixed
     
