7z.exe cannot unpack some Arj archives

2018-06-25
2023-02-17
  • Dmitrii Evdokimov

    Today we noticed that our batch job could not unpack a few ARJ archives, although it had worked on all previous days. We use 7-Zip 18.05 x64 on Windows 10 x64. We quickly checked one of those files against each major version going backward, and the first one that succeeded was 9.20 x64.

    All the ARJ files were created with ARJ 3.08a (ARJ32) Copyright (c) 1990-2000 ARJ Software, Inc. Oct 11 2000.

    v18.05 wrongly reports:

    7-Zip 18.05 (x64) : Copyright (c) 1999-2018 Igor Pavlov : 2018-04-30
    
    Scanning the drive for archives:
    1 file, 7500 bytes (8 KiB)
    
    Extracting archive: BN307021806010001.arj
    
    Can't open as archive: 1
    Files: 0
    Size:       0
    Compressed: 0
    

    and v9.20 extracts it successfully:

    7-Zip [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
    
    Processing archive: BN307021806010001.arj
    
    Extracting  SFC014030702_783520180601_319400001800001530_700.xml
    Extracting  SFC014030702_783520180601_319400001800001531_700.xml
    Extracting  SFC014030702_783520180601_319400001800001532_700.xml
    Extracting  SFC014030702_783520180601_319400001800001533_700.xml
    Extracting  SFC014030702_783520180601_319400001800001534_700.xml
    Extracting  SFC014030702_783520180601_319400001800001535_700.xml
    
    Everything is Ok
    
    Files: 6
    Size:       6378
    Compressed: 7500
    

    I attach a sample ARJ file here. The XML files inside are encrypted, so do not expect to see readable XML in them.

     

    Last edit: Dmitrii Evdokimov 2018-06-25
  • Dmitrii Evdokimov

     

    Last edit: Dmitrii Evdokimov 2018-06-25
  • Vladimir Surguchev

    1) You can use 7zFM File -> Open Inside #
    2) Remove first 65 bytes from .arj file
    3) Use farmanager arclite
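
    Workaround 2 can be sketched in Python for a batch job. This is only an illustration, not how 7-Zip or Far Manager detect archives: it assumes the ARJ header marker bytes 60 EA followed by a little-endian basic header size of at most 2600, and the function names are hypothetical.

```python
from typing import Optional

ARJ_HEADER_ID = b"\x60\xea"   # ARJ header marker (assumed from the ARJ format notes)
MAX_BASIC_HEADER = 2600       # assumed upper bound on the basic header size


def find_arj_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first plausible ARJ header, or None."""
    pos = data.find(ARJ_HEADER_ID)
    while pos != -1:
        if pos + 4 <= len(data):
            # The two bytes after the marker hold the basic header size.
            size = int.from_bytes(data[pos + 2:pos + 4], "little")
            if 0 < size <= MAX_BASIC_HEADER:
                return pos
        pos = data.find(ARJ_HEADER_ID, pos + 1)
    return None


def strip_prefix(data: bytes) -> bytes:
    """Drop everything before the ARJ header (for example, a signature envelope)."""
    offset = find_arj_offset(data)
    if offset is None:
        raise ValueError("no ARJ header found")
    return data[offset:]
```

    Searching for the marker avoids hard-coding the 65-byte prefix, but it is only a heuristic: a matching byte pair inside the prefix could produce a false hit, and any data after the archive is left in place.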

     
    • Dmitrii Evdokimov

      Well, I like these tools too. But for BATCH jobs the list also includes:

      4) old ARJ
      5) old 7z 9.20

      Why did the NEW 7z stop reading ASN.1-format files?! This format, also known as PKCS#7, is the accepted future of signed data in Russia! And all the tools in this list work with that data inside. We liked 7z because it could read everything...

       
  • Igor Pavlov

    Igor Pavlov - 2018-06-27

    By default 7-Zip doesn't like any data before the archive.
    The reason:
    you can have a 1 GB arj file that contains another 1 KB arj file without compression.
    If the starting bytes of the big 1 GB arj file are corrupted, 7-Zip would open the 1 KB arj file instead, although you actually wanted to open the big 1 GB file. So it's better to show an error in such a case than to wrongly open the small 1 KB archive.

    But you can extract your archive this way:

    7z x BN307021806010001.arj -tarj
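
    For a batch job that does not know the type in advance, the explicit -t switch could be tried as a fallback after normal detection fails. The Python sketch below is an assumption about how such a wrapper might look: the list and order of candidate types, the function names, and the injectable `run` callback are all hypothetical, and only the `x`, `-o`, `-y` and `-t` switches come from 7-Zip itself.

```python
import subprocess
from typing import Callable, List, Optional

# Types to force when auto-detection fails; the list and its order are assumptions.
FALLBACK_TYPES = ("arj", "cab", "zip", "rar")


def build_attempts(archive: str, out_dir: str, exe: str = "7z") -> List[List[str]]:
    """Build the command lines to try: auto-detection first, then forced types."""
    base = [exe, "x", archive, f"-o{out_dir}", "-y"]
    return [base] + [base + [f"-t{t}"] for t in FALLBACK_TYPES]


def run_7z(cmd: List[str]) -> int:
    """Run one command and return its exit code (0 means no error)."""
    return subprocess.run(cmd, capture_output=True).returncode


def extract_with_fallback(
    archive: str,
    out_dir: str,
    run: Callable[[List[str]], int] = run_7z,
) -> Optional[List[str]]:
    """Return the first command line that exits with code 0, or None."""
    for cmd in build_attempts(archive, out_dir):
        if run(cmd) == 0:
            return cmd
    return None
```

    Forcing a type brings back the risk described above (opening a small embedded archive instead of the intended one), so such a fallback is only reasonable when the inputs are known to be wrapped rather than corrupted.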
    
     
    • Dmitrii Evdokimov

      Thanks for explaining the reason.
      But we rely on the ability of 7z to unpack everything our batch job receives, whatever its unknown type, from Cab to Arj, Zip, etc. (so the -t option is inapplicable), leaving out the PKCS#7 wrapper around it. Just as FAR does. So we will revert to 7z 9.20 as before.

       
    • Dmitrii Evdokimov

      It would be right for 7-Zip to recognize the strictly defined ASN.1 file format (please see the RFC for the Cryptographic Message Syntax, CMS) inside or around archives before processing them. It is really not just "any data before archive" that one handles by removing the first 65 bytes or by seeking known archive signatures.
      And supporting worldwide enterprise standards would be to the glory of 7-Zip itself, above the others!

       
  • Igor Pavlov

    Igor Pavlov - 2018-06-27

    1) If it's not pure ARJ, then you can probably use an additional extension, like file.arj.asn1
    2) If it's a popular format, then provide a link that describes its signatures and headers. Also write about the software that uses that format.

     
    • Dmitrii Evdokimov

      1) FAR opens everything without looking at extensions, which is a really nice feature. Nowadays the Bank of Russia and some Federal Services of Russia create more and more file formats with wrong extensions. So the main data store of its systems keeps various files whose extension is the time (.hhmmss) at which the file was received. 7z 9.20 worked nicely with that.
      2) An introduction to ASN.1 (in Russian) is at https://habr.com/post/150757/. I use the viewer at http://lapo.it/asn1js/ as a reference while developing http://dievdo.ru/PTK-PSD-Browser-hta/. Please load my attached file into this viewer to see the internal tree structure of this enveloping format. All data exchange with the Bank of Russia and the Federal Services is moving to this standardized PKCS#7 format, which encloses signed (and optionally encrypted) data. Please look at http://www.cbr.ru/collection/collection/file/4413/inf_mci_48(2015).pdf (in Russian) to see how this standard is being implemented everywhere. The Russian state bodies still require the old Arj, because it traditionally can make multivolume archives, or proprietary WinRAR, or conventional Zip for smaller files (and Cab as the transport envelope for all of them). We love that 7z eats them all in our batch jobs!

      If you cannot read Russian, I will provide international links. The Bank of Russia refers to RFC 3369 in its regulations. It is not the latest CMS standard (https://en.wikipedia.org/wiki/Cryptographic_Message_Syntax), but the Bank of Russia is now the main regulator for all banks and insurance companies in Russia.

       
  • Andrei Evgrafov

    Andrei Evgrafov - 2019-12-23
    We have the same issue. Is an alternative possible?
    
     
  • Dmitrii Evdokimov

    As in 2018, we still use 7z 9.20 for the ASN.1-encoded files that are widely used in cryptographic and signature services. For example, every TLS certificate is ASN.1 encoded, and more and more files are becoming digitally signed. We need a fast preview of archives that are nested deeply and recursively, some of them signed or encrypted, without any PKCS #7 software installed.

    Good old 7z 9.20 could easily extract a solid data part from a PKCS #7 signedData message, but now it fails to parse the same data when it is split by the separators of the ITU-T X.690 indefinite form (intended for streamed data), as specified at https://en.wikipedia.org/wiki/X.690

    How I do it in my software: if a file starts with "0" (0x30), it is very probably a strictly structured PKCS #7 file. It would be much better to parse the entire ASN.1 tree, but I just do a simple search for OID 1.2.840.113549.1.7.1 (hex sequence 06 09 2A 86 48 86 F7 0D 01 07 01), then either parse the length of the block and read the data (definite form of X.690), or parse the lengths of the chunks, each introduced by tag 04, until 00 00 ends the stream (indefinite form). I attach here both of these samples, parsed with the excellent ASN.1 JavaScript decoder by Lapo Luchini: http://lapo.it/asn1js/
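
    The search-and-read approach described above might look roughly like this in Python. It is a minimal sketch, not the JavaScript code mentioned in this thread: the function names are hypothetical, and it assumes that the first occurrence of the OID is followed by a [0] EXPLICIT wrapper (tag A0) and then either a primitive OCTET STRING (tag 04) or a constructed one (tag 24) made of 04 chunks.

```python
from typing import Optional, Tuple

# DER encoding of OID 1.2.840.113549.1.7.1, as quoted in the post above.
OID_PKCS7_DATA = bytes.fromhex("06092A864886F70D010701")


def read_length(buf: bytes, pos: int) -> Tuple[Optional[int], int]:
    """Read an X.690 length at pos; return (length, next_pos), None = indefinite."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    first = buf[pos]
    pos += 1
    if first < 0x80:                      # short definite form
        return first, pos
    if first == 0x80:                     # indefinite form
        return None, pos
    count = first & 0x7F                  # long definite form
    return int.from_bytes(buf[pos:pos + count], "big"), pos + count


def extract_pkcs7_data(buf: bytes) -> bytes:
    """Return the content that follows the first id-data OID in buf."""
    pos = buf.find(OID_PKCS7_DATA)
    if pos < 0:
        raise ValueError("OID 1.2.840.113549.1.7.1 not found")
    pos += len(OID_PKCS7_DATA)
    if buf[pos:pos + 1] != b"\xa0":       # assumed [0] EXPLICIT wrapper
        raise ValueError("no content after the OID")
    _, pos = read_length(buf, pos + 1)    # wrapper length is not needed here
    tag = buf[pos:pos + 1]
    length, pos = read_length(buf, pos + 1)
    if tag == b"\x04" and length is not None:
        return buf[pos:pos + length]      # definite form: one solid block
    if tag != b"\x24":
        raise ValueError("unexpected tag where OCTET STRING was expected")
    end = None if length is None else pos + length
    chunks = []
    while True:
        if end is None:
            if buf[pos:pos + 2] == b"\x00\x00":   # end-of-contents marker
                break
        elif pos >= end:
            break
        if buf[pos:pos + 1] != b"\x04":
            raise ValueError("expected an OCTET STRING chunk")
        size, pos = read_length(buf, pos + 1)
        if size is None:
            raise ValueError("nested indefinite chunk is not supported")
        chunks.append(buf[pos:pos + size])
        pos += size
    return b"".join(chunks)
```

    Like the approach it illustrates, this skips real ASN.1 tree parsing, so it can be fooled if the OID byte sequence appears earlier in the file by coincidence, and it does not handle detached signatures that carry no content.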

    So, those source files start with "0" (0x30) and enclose the "signedData" PKCS #7 structure. The CAB archive inside starts with its signature "MSCF" (4D 53 43 46 ...) in the "data" PKCS #7 field. This is the field whose data needs to be extracted as the destination content (a clean CAB file with an ARJ file enclosed inside, and so on deeper...)

    Sorry, for security reasons I do not attach those source files to a public forum (I attach just images of some parsed data).

    And finally, just a few notes on the usage of this format, from Wikipedia (link above):

    BER is a popular format for transmitting data, particularly in systems with different native data encodings.

    • The SNMP and LDAP protocols specify ASN.1 with BER as their required encoding scheme.
    • The EMV standard for credit and debit cards uses BER to encode data onto the card
    • The digital signature standard PKCS #7 also specifies ASN.1 with BER to encode encrypted messages and their digital signature or digital envelope.
    • Many telecommunication systems, such as ISDN, toll-free call routing, and most cellular phone services use ASN.1 with BER to some degree for transmitting control messages over the network.
    • GSM TAP (Transferred Account Procedures), NRTRDE (Near Real Time Roaming Data Exchange) files are encoded using BER.

    DER encoding is widely used to transfer digital certificates such as X.509.

     
  • Igor Pavlov

    Igor Pavlov - 2023-02-08

    I develop new code only if I have example files to test that new code. You must provide:
    1) test example files
    2) some proof that your problem is important for many users.
    Then I consider how difficult it is to implement the new code. If it's not difficult, I can try it.

     
    • Dmitrii Evdokimov

      1) Well, I attach those two files. Somewhat later I will ask you to remove them (or I will do it myself).
      2) These files are sent and received by every bank in the country, 100-300 files per day each. I maintain a popular repo for quickly browsing and checking them - https://dievdo.ru/PTK-PSD-Browser-hta/ Yesterday users from a few different cities asked me to quickly handle the sender's change when it moved to the new type of encoding. The official client handles these changes, but it is so unfriendly, and requires PKCS #7 software to be installed, that my browser replaces it entirely. There are many users, but they are not developers who could give me stars on GitHub :)

      My JavaScript code to extract PKCS #7 data is here. There is a JS hack to read byte values from UTF strings, but the comments explain it. Also, I do not parse the ASN.1 tree; I just seek the byte signature of the known OID and then parse the data from the position found.

      My code to run 7z (not so interesting for this topic, but it looks up different versions of 7z, c'est la vie) is here.

       
      • Igor Pavlov

        Igor Pavlov - 2023-02-08

        There are two ways to unpack that file:
        1) Parser mode:

        7z x a.cab -t#
        

        2) Rename file from cab extension to any other non-archive extension:

        move test_x690_bad.cab a.rrr
        7z x a.rrr
        
         
        • Dmitrii Evdokimov

          1) 7z x a.cab -t#

          7-Zip 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
          
          Scanning the drive for archives:
          1 file, 16007 bytes (16 KiB)
          
          Extracting archive: a.cab
          --
          Path = a.cab
          Type = #
          
          Everything is Ok
          
          Files: 3
          Size:       16007
          Compressed: 16007
          

          Result: 3 files (1, 2.12345_1.cab, 3) instead of the single AFN_MIFNS00_4030702_20230206_00005.arj

          2) 7z.exe x a.rrr

          7-Zip 22.01 (x64) : Copyright (c) 1999-2022 Igor Pavlov : 2022-07-15
          
          Scanning the drive for archives:
          1 file, 15266 bytes (15 KiB)
          
          Extracting archive: a.rrr
          
          WARNINGS:
          There are data after the end of archive
          
          --
          Path = a.rrr
          Type = Cab
          WARNINGS:
          There are data after the end of archive
          Offset = 57
          Physical Size = 14859
          Tail Size = 350
          Method = MSZip
          Blocks = 1
          Volumes = 1
          Volume Index = 0
          ID = 12345
          
          ERROR: Data Error : AFN_MIFNS00_4030702_20230208_00001.arj
          
          Sub items Errors: 1
          
          Archives with Errors: 1
          
          Warnings: 1
          
          Sub items Errors: 1
          

          Result: AFN_MIFNS00_4030702_20230208_00001.arj of 0 bytes

           
          • Igor Pavlov

            Igor Pavlov - 2023-02-10

            After x -t# you have 2.12345_1.cab, which is the required cab file.
            But then 7-Zip can't extract the arj from this cab.
            So it's not the original cab format, but some "modified cab" format.
            Is it encrypted?
            Who modified the cab format?
            Why did they do it?
            Who uses it?
            Where is this modification described?

            Please write more simply, without long messages.

             

            Last edit: Igor Pavlov 2023-02-10
            • Dmitrii Evdokimov

               

              Last edit: Igor Pavlov 2023-02-17
              • Igor Pavlov

                Igor Pavlov - 2023-02-17

                For each new feature (or format) there are many factors.
                It's like a scoring system where each factor can add or remove points:
                - how difficult it is to study, develop, debug, and support, and how many hours it requires.
                - how useful it will be, and how many users will use it.
                - what the code size of the new code is.
                For example, if the new code increases the program size by 1%, but only 0.001% of users will use it, I don't want to support it.

                For now I don't understand the complexity of your problem, I don't understand the required changes, and I don't understand the level of usefulness.

                 

                Last edit: Igor Pavlov 2023-02-17
                • Dmitrii Evdokimov

                  Well, I am just repeating the 2018 topic "7z.exe cannot unpack some Arj archives".
                  Stated more precisely now: 7z cannot unpack something enclosed in a PKCS #7 / CMS / ASN.1 / X.690 cryptographic industry-standard container, and its author does not want to simply add the unpacking of .p7s files to his glory.

                  It is still not a problem for me to unpack that with a few lines of pure JavaScript.

                   
