Recovering 7zip files is such a pain...

  • Anonymous - 2014-04-19

    Hi Igor,

    Is there any better tool than PhotoRec which is able to create/recover proper 7zip files?
    PhotoRec fails when deleted 7zip files are fragmented.

    I have a 190 MB block device with an ext4 filesystem on it.
    The logical block size is 1024 bytes.

    I was able to scan the device and find the start blocks of all archives using the "377a bcaf 271c" signature. Four files in total.
    Thanks to the information in "StartHeader" I was able to pinpoint the last block of every archive. I read the block device again and, for every block, checked whether NextHeaderCRC == crc32(NextHeaderSize bytes starting at block offset (NextHeaderOffset+32) mod 1024).
    The last block also helped me rule archives out. Three of the archives contained only one file, so 7zip stored the file name in the last block.
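A minimal Python sketch of the check described above, assuming the field layout given in 7zFormat.txt (6-byte signature, 2 version bytes, StartHeaderCRC, then NextHeaderOffset, NextHeaderSize and NextHeaderCRC, all little-endian). The function names are made up for illustration, and the block test only works when the end header does not straddle a block boundary.

```python
import struct
import zlib

SIGNATURE = bytes.fromhex("377abcaf271c")


def parse_start_header(first32: bytes) -> dict:
    """Split the 32-byte 7z start header into its fields."""
    if len(first32) < 32 or first32[:6] != SIGNATURE:
        raise ValueError("not a 7z start header")
    major, minor, start_crc = struct.unpack_from("<BBI", first32, 6)
    offset, size, next_crc = struct.unpack_from("<QQI", first32, 12)
    return {
        "version": (major, minor),
        "start_header_crc": start_crc,
        "next_header_offset": offset,  # counted from the end of these 32 bytes
        "next_header_size": size,
        "next_header_crc": next_crc,
    }


def next_header_matches(block: bytes, hdr: dict, block_size: int = 1024) -> bool:
    """True if `block` holds bytes with the expected CRC at the expected in-block offset."""
    pos = (hdr["next_header_offset"] + 32) % block_size
    candidate = block[pos:pos + hdr["next_header_size"]]
    return (len(candidate) == hdr["next_header_size"]
            and zlib.crc32(candidate) == hdr["next_header_crc"])
```

Fed the first 32 bytes of the header dump below, `parse_start_header` reports NextHeaderOffset 0x176b25, NextHeaderSize 36 and NextHeaderCRC 0x9202502E.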

    One archive left. Here is header.

    0000000: 377a bcaf 271c 0003 25ca 042b 256b 1700 7z..'...%..+%k..
    0000010: 0000 0000 2400 0000 0000 0000 2e50 0292 ....$........P..
    0000020: 0011 8840 22f8 2716 54e8 d6fa f761 ac47 ...@".'.T....a.G
    0000030: a049 cb0f f4f2 bb05 d7c8 3dee 2bed 7095 .I........=.+.p.
    0000040: a33a dc5a b150 b612 d477 73be 350e d24c .:.Z.P...ws.5..L
    0000050: fe7a dac0 ce8a 4a57 1524 004c c64b bfc3 .z....JW.$.L.K..
    0000060: b119 d68f 452f b898 6672 aaf9 3094 8763 ....E/..fr..0..c
    0000070: 01ce a2f4 149c 8cc8 00b2 8ede 09f5 5976 ..............Yv
    0000080: 5e5b 57de 8273 340d 499a 2869 3608 38c8 ^[W..s4.I.(i6.8.
    0000090: d404 9fa0 e7d8 2506 94b8 8269 68ae 86c4 ......%....ih...
    00000a0: 05d2 8ffe 4f14 1475 135e 6e9d 9590 378e ....O..u.^n...7.
    00000b0: 38cc f629 b44f cd80 cc71 331b 8c07 066c 8..).O...q3....l
    00000c0: 2b1c b5e8 c83b eb96 41bc bb9e f7b7 e89f +....;..A.......
    00000d0: 46f5 98b0 c05e 2220 b6d1 c91e e90b 4d32 F....^" ......M2
    00000e0: fa37 9cfb ec50 2626 6bf2 c2bd 9f41 fb0c .7...P&&k....A..
    00000f0: e4d9 15d1 dd47 89f8 6601 870a 5569 c092 .....G..f...Ui..

    Some decoded data:
    NextHeaderOffset is 0x176b25 (1,534,757). It is counted from the end of the 32-byte start header, so I had to add 32 to get the file offset.
    NextHeaderSize is 0x24 (36)
    NextHeaderCRC is 0x9202502E

    And here is NextHeader content:
    1706d799650109858c00070b01000123030101055d002000000c9a5c0a0179f2e3460000
    If I calculate the crc32 checksum of it, I get the expected value 0x9202502E.

    Could you please decode it for me?
    I'm looking for any checksums that would help me reconstruct the proper data stream.
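The 36 bytes above can be walked by hand with the number encoding and property IDs from 7zFormat.txt. The sketch below is an illustrative reading, not 7-Zip's parser: `read_number` and `decode_encoded_header` are hypothetical names, and the code only handles the simple shape seen here (kEncodedHeader with one pack stream, one folder, one coder).

```python
import struct


def read_number(buf: bytes, pos: int) -> tuple:
    """Decode one 7z variable-length UINT64; return (value, next position)."""
    first = buf[pos]
    pos += 1
    mask = 0x80
    value = 0
    for i in range(8):
        if (first & mask) == 0:
            # The remaining low bits of the first byte are the top bits.
            value |= (first & (mask - 1)) << (8 * i)
            return value, pos
        value |= buf[pos] << (8 * i)
        pos += 1
        mask >>= 1
    return value, pos


def decode_encoded_header(nh: bytes) -> dict:
    pos = 0

    def take(expected: int) -> None:
        nonlocal pos
        if nh[pos] != expected:
            raise ValueError(f"offset {pos}: expected {expected:#04x}, got {nh[pos]:#04x}")
        pos += 1

    take(0x17)                               # kEncodedHeader
    take(0x06)                               # kPackInfo
    pack_pos, pos = read_number(nh, pos)     # relative to the end of the start header
    n_streams, pos = read_number(nh, pos)
    if n_streams != 1:
        raise ValueError("sketch handles a single pack stream only")
    take(0x09)                               # kSize
    pack_size, pos = read_number(nh, pos)
    take(0x00)                               # kEnd
    take(0x07)                               # kUnPackInfo
    take(0x0B)                               # kFolder
    n_folders, pos = read_number(nh, pos)
    take(0x00)                               # external flag, 0 = inline
    n_coders, pos = read_number(nh, pos)
    if (n_folders, n_coders) != (1, 1):
        raise ValueError("sketch handles one folder with one coder only")
    flags = nh[pos]
    pos += 1
    if flags & 0x10:
        raise ValueError("complex coders are not handled in this sketch")
    id_size = flags & 0x0F
    codec_id = nh[pos:pos + id_size]
    pos += id_size
    props = b""
    if flags & 0x20:                         # coder has properties
        props_size, pos = read_number(nh, pos)
        props = nh[pos:pos + props_size]
        pos += props_size
    take(0x0C)                               # kCodersUnPackSize
    unpack_size, pos = read_number(nh, pos)
    take(0x0A)                               # kCRC
    take(0x01)                               # all CRCs defined
    (crc,) = struct.unpack_from("<I", nh, pos)
    pos += 4
    take(0x00)                               # kEnd of kUnPackInfo
    take(0x00)                               # kEnd of the stream info
    info = {"pack_pos": pack_pos, "pack_size": pack_size, "codec_id": codec_id,
            "unpack_size": unpack_size, "unpack_crc": crc}
    if codec_id == bytes.fromhex("030101") and len(props) == 5:   # LZMA
        d = props[0]
        info.update(lc=d % 9, lp=(d // 9) % 5, pb=d // 45,
                    dict_size=struct.unpack_from("<I", props, 1)[0])
    return info
```

Read this way, the record describes a second, LZMA-packed header: 1420 packed bytes located 1,533,337 bytes after the start header, unpacking to 6748 bytes with CRC 0x46E3F279. If that reading is right, the only checksum in these 36 bytes covers the unpacked header, not the file data.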

    I was able to build a 7zip archive which contains:
    - a few blocks starting at the SignatureHeader block
    - a zero-filled gap
    - a few blocks ending at the proper end of the archive (NextHeader)

    Now 7zip is able to list the archive contents:

    $ 7za l t2.7z

    7-Zip (A) [64] 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18
    p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,2 CPUs)

    Listing archive: t2.7z

    --
    Path = t2.7z
    Type = 7z
    Method = LZMA
    Solid = +
    Blocks = 1
    Physical Size = 1534825
    Headers Size = 1488

       Date      Time    Attr         Size   Compressed  Name
    ------------------- ----- ------------ ------------  ------------------------
    2014-04-06 02:33:57 ....A 3593 1533337 stardust/rip
    2014-03-20 21:45:49 ....A 118344 stardust/in/sd1.001
    2014-03-20 22:02:51 ....A 58740 stardust/in/sd2.001
    2014-03-20 21:49:27 ....A 36790 stardust/in/sd3.001
    2014-03-20 21:45:51 ....A 126138 stardust/in/sd1.002
    2014-03-20 22:02:51 ....A 61440 stardust/in/sd2.002
    2014-03-20 21:49:28 ....A 11192 stardust/in/sd3.002
    2014-03-20 21:45:52 ....A 479136 stardust/in/sd1.003
    2014-03-20 22:02:52 ....A 17542 stardust/in/sd2.003
    2014-03-20 21:49:29 ....A 2142 stardust/in/sd3.003
    2014-03-20 21:45:54 ....A 520868 stardust/in/sd1.004
    2014-03-20 22:02:53 ....A 1536 stardust/in/sd2.004
    2014-03-20 21:49:29 ....A 24024 stardust/in/sd3.004
    2014-03-20 21:45:56 ....A 58740 stardust/in/sd1.005
    2014-03-20 22:02:54 ....A 1536 stardust/in/sd2.005
    2014-03-20 21:49:30 ....A 15660 stardust/in/sd3.005
    2014-03-20 21:45:57 ....A 58740 stardust/in/sd1.006
    2014-03-20 22:02:55 ....A 1536 stardust/in/sd2.006
    2014-03-20 21:49:31 ....A 9462 stardust/in/sd3.006
    2014-03-20 21:45:58 ....A 58740 stardust/in/sd1.007
    2014-03-20 22:02:55 ....A 1536 stardust/in/sd2.007
    2014-03-20 21:49:32 ....A 4668 stardust/in/sd3.007
    2014-03-20 21:45:59 ....A 58740 stardust/in/sd1.008
    2014-03-20 22:02:56 ....A 1536 stardust/in/sd2.008
    2014-03-20 21:49:32 ....A 1488 stardust/in/sd3.008
    2014-03-20 21:46:00 ....A 58740 stardust/in/sd1.009
    2014-03-20 22:02:57 ....A 1536 stardust/in/sd2.009
    2014-03-20 21:49:33 ....A 9564 stardust/in/sd3.009
    2014-03-20 22:02:57 ....A 1536 stardust/in/sd2.010
    2014-03-20 21:49:34 ....A 43986 stardust/in/sd3.010
    2014-03-20 22:02:58 ....A 1536 stardust/in/sd2.011
    2014-03-20 21:49:35 ....A 8172 stardust/in/sd3.011
    2014-03-20 22:02:59 ....A 512 stardust/in/sd2.012
    2014-03-20 21:49:36 ....A 13686 stardust/in/sd3.012
    2014-03-20 22:03:00 ....A 1176 stardust/in/sd2.013
    2014-03-20 21:49:36 ....A 24066 stardust/in/sd3.013
    2014-03-20 22:03:00 ....A 392 stardust/in/sd2.014
    2014-03-20 21:49:37 ....A 28842 stardust/in/sd3.014
    2014-03-20 22:03:01 ....A 2208 stardust/in/sd2.015
    2014-03-20 21:49:38 ....A 31620 stardust/in/sd3.015
    2014-03-20 22:03:02 ....A 1260 stardust/in/sd2.016
    2014-03-20 21:49:39 ....A 8130 stardust/in/sd3.016
    2014-03-20 22:03:02 ....A 420 stardust/in/sd2.017
    2014-03-20 21:49:39 ....A 23826 stardust/in/sd3.017
    2014-03-20 22:03:03 ....A 3108 stardust/in/sd2.018
    2014-03-20 21:49:40 ....A 18732 stardust/in/sd3.018
    2014-03-20 22:03:04 ....A 1036 stardust/in/sd2.019
    2014-03-20 21:49:41 ....A 45000 stardust/in/sd3.019
    2014-03-20 22:03:04 ....A 1704 stardust/in/sd2.020
    2014-03-20 21:49:42 ....A 45000 stardust/in/sd3.020
    2014-03-20 22:03:05 ....A 568 stardust/in/sd2.021
    2014-03-20 21:49:43 ....A 69096 stardust/in/sd3.021
    2014-03-20 22:03:06 ....A 1416 stardust/in/sd2.022
    2014-03-20 21:49:44 ....A 254800 stardust/in/sd3.022
    2014-03-20 22:03:07 ....A 472 stardust/in/sd2.023
    2014-03-20 21:49:45 ....A 149296 stardust/in/sd3.023
    2014-03-20 22:03:07 ....A 1098 stardust/in/sd2.024
    2014-03-20 21:49:47 ....A 134056 stardust/in/sd3.024
    2014-03-20 22:03:08 ....A 2616 stardust/in/sd2.025
    2014-03-20 21:49:48 ....A 142920 stardust/in/sd3.025
    2014-03-20 22:03:09 ....A 2892 stardust/in/sd2.026
    2014-03-20 21:49:49 ....A 194232 stardust/in/sd3.026
    2014-03-20 22:03:09 ....A 4608 stardust/in/sd2.027
    2014-03-20 22:03:10 ....A 20298 stardust/in/sd2.028
    2014-03-20 22:03:11 ....A 10308 stardust/in/sd2.029
    2014-03-20 22:03:12 ....A 11880 stardust/in/sd2.030
    2014-03-20 22:03:12 ....A 6792 stardust/in/sd2.031
    2014-03-20 22:03:13 ....A 5940 stardust/in/sd2.032
    2014-03-20 22:03:14 ....A 41634 stardust/in/sd2.033
    2014-03-20 22:03:15 ....A 7068 stardust/in/sd2.034
    2014-03-20 22:03:15 ....A 2334 stardust/in/sd2.035
    2014-03-20 22:03:16 ....A 39800 stardust/in/sd2.036
    2014-03-20 22:03:17 ....A 7960 stardust/in/sd2.037
    2014-03-20 22:03:18 ....A 2440 stardust/in/sd2.038
    2014-03-20 22:03:18 ....A 488 stardust/in/sd2.039
    2014-03-20 22:03:19 ....A 312 stardust/in/sd2.040
    2014-03-20 22:03:20 ....A 104 stardust/in/sd2.041
    2014-03-20 22:03:21 ....A 300 stardust/in/sd2.042
    2014-03-20 22:03:21 ....A 100 stardust/in/sd2.043
    2014-03-20 22:03:22 ....A 300 stardust/in/sd2.044
    2014-03-20 22:03:23 ....A 100 stardust/in/sd2.045
    2014-03-20 22:03:23 ....A 312 stardust/in/sd2.046
    2014-03-20 22:03:24 ....A 104 stardust/in/sd2.047
    2014-03-20 22:03:25 ....A 852 stardust/in/sd2.048
    2014-03-20 22:03:26 ....A 708 stardust/in/sd2.049
    2014-03-20 22:03:26 ....A 480 stardust/in/sd2.050
    2014-03-20 22:03:27 ....A 4200 stardust/in/sd2.051
    2014-03-20 22:03:28 ....A 840 stardust/in/sd2.052
    2014-03-20 22:03:30 ....A 11358 stardust/in/sd2.054
    2014-03-20 22:03:30 ....A 20512 stardust/in/sd2.055
    2014-03-20 22:03:31 ....A 3072 stardust/in/sd2.056
    2014-03-20 22:03:32 ....A 1944 stardust/in/sd2.057
    2014-03-20 22:03:33 ....A 3570 stardust/in/sd2.058
    2014-03-20 22:03:34 ....A 3570 stardust/in/sd2.059
    2014-03-20 22:03:34 ....A 3072 stardust/in/sd2.060
    2014-03-20 22:03:35 ....A 3072 stardust/in/sd2.061
    2014-03-20 22:03:36 ....A 18480 stardust/in/sd2.062
    2014-03-20 22:03:37 ....A 15072 stardust/in/sd2.063
    2014-03-20 22:03:37 ....A 51648 stardust/in/sd2.064
    2014-03-20 22:03:38 ....A 23376 stardust/in/sd2.065
    2014-03-20 22:03:39 ....A 12288 stardust/in/sd2.066
    2014-03-20 22:03:40 ....A 11064 stardust/in/sd2.067
    2014-03-20 22:03:40 ....A 9536 stardust/in/sd2.068
    2014-03-20 22:03:41 ....A 12684 stardust/in/sd2.069
    2014-03-20 22:03:42 ....A 45624 stardust/in/sd2.070
    2014-03-20 22:03:43 ....A 12096 stardust/in/sd2.071
    2014-03-20 22:03:44 ....A 6210 stardust/in/sd2.072
    2014-03-20 22:03:44 ....A 12156 stardust/in/sd2.073
    2014-03-20 22:03:45 ....A 9660 stardust/in/sd2.074
    2014-03-20 22:03:46 ....A 11484 stardust/in/sd2.075
    2014-03-20 22:03:46 ....A 2868 stardust/in/sd2.076
    2014-03-20 22:03:48 ....A 11712 stardust/in/sd2.077
    2014-03-20 22:03:49 ....A 9504 stardust/in/sd2.078
    2014-04-06 02:36:41 D.... 0 0 stardust/out
    2014-04-03 20:56:32 D.... 0 0 stardust/in
    2014-04-06 02:36:34 D.... 0 0 stardust


    ------------------- ----- ------------ ------------  ------------------------
                                   3559011      1533337  113 files, 3 folders
    

    All I need is the "stardust/rip" file, 3593 bytes long.

    If I run "7za t t2.7z" I get "Data Error" for every file.

    Do I need the whole LZMA block to recover this first file?
    If not, which part of the archive should I parse, and how do I get this file?

    So far I have the first and last blocks of the archive, and I'm 100% sure about them.
    Is there some clever way of guessing the block order from the pool of device blocks? Or at least of checking whether a candidate block can be appended next?

    Yes, I've tried to read the "7zFormat.txt" and "CPP/7zip/Archive/7z/7zIn.cpp" files, but it looks like I'm unable to guess all the things which might be obvious to you.

    Thanks for help,
    Mike

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-20

    7-Zip writes 7z archives as follows:
    1) it writes the 32-byte start header
    2) it writes the data
    3) it writes the end header
    4) it rewrites the start header.

    Maybe when it rewrites the start header at step 4, your system creates fragmentation.
    Try creating a new 7z archive on a fresh ext4 filesystem and look at how it is fragmented and at the exact positions of all fragments.

    And another thing.
    Load the lzma decoder in a debugger and look at the exact offset in the data stream where lzma thinks there is an error. Maybe it will be about 1024, if the first block is fragmented.

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-20

    Note also that 7-Zip rewrites only the first 32 bytes.
    So the other 512-32 bytes (or 1024-32 bytes) of the first sector are identical in both versions of that sector.

    You can try to search the disk for the first version of the start of the archive. It must also contain the signature 377abcaf271c, with zeros in the fields that refer to the next header.

    If you can find the first version of the start of the archive, try to use these sectors (maybe there is no fragmentation there).
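The search suggested above could be sketched in Python as follows. It assumes the next-header fields occupy bytes 12 to 31 of the start header (as in 7zFormat.txt) and that the first version of the header begins on a block boundary. `find_preliminary_headers` is a hypothetical helper name.

```python
SIGNATURE = bytes.fromhex("377abcaf271c")


def find_preliminary_headers(image: bytes, block_size: int = 1024) -> list:
    """Return block numbers that start with the 7z signature and still have
    all-zero NextHeaderOffset / NextHeaderSize / NextHeaderCRC fields."""
    hits = []
    for start in range(0, len(image) - 31, block_size):
        head = image[start:start + 32]
        if head[:6] == SIGNATURE and head[12:32] == bytes(20):
            hits.append(start // block_size)
    return hits
```

The StartHeaderCRC field (bytes 8 to 11) is deliberately not checked, since the thread does not say what it contains in the first version.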

     
  • Anonymous - 2014-04-20

    Igor,

    Thanks for the fast reply.

    More background information.

    Every month or two I made a sector-level backup of this media. It is only 190 MB, so I can keep about 20 of them.
    The most recent backup has the file system about 65% full, but without my new files.
    I also have the media in its current state, with the additional files/data, but everything is deleted.

    The first thing I did was create an additional (third) image which contains only the differences between the two images above.
    The algorithm is simple: if (data block from backup == data block from current media) write zeros, else write the data block from the deleted media.
    Now I have an image with only the relevant data inside.
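As a rough Python sketch of that block comparison (the function name and the in-memory approach are assumptions; a 190 MB image fits in RAM, but the same loop works on two files read block by block):

```python
def diff_image(backup: bytes, current: bytes, block_size: int = 1024) -> bytes:
    """Keep only the blocks of `current` that differ from `backup`;
    unchanged blocks are replaced by zeros so offsets stay the same."""
    out = bytearray()
    zero = bytes(block_size)
    for pos in range(0, len(current), block_size):
        cur = current[pos:pos + block_size]
        old = backup[pos:pos + block_size]
        out += zero[:len(cur)] if cur == old else cur
    return bytes(out)
```

Because unchanged blocks are zeroed rather than dropped, block numbers in the diff image still match block numbers on the device.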

    PhotoRec was helpful for non-fragmented files. I was able to recover a lot.
    So I decided to create the "additional/third/diff image" again, but this time with all recovered files excluded.
    I scanned the directory with the recovered files, split every file into 1024-byte chunks and stored the md5 checksum of each chunk in an array. The last chunk of each file was padded with 0x00 up to the block boundary (1024 bytes).
    During "diff_image" creation I calculate the md5 of every data block from the "deleted media". If it matches an md5 in my table of recovered blocks, zeros are written instead.
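A minimal sketch of that exclusion step, with hypothetical function names. A `set` is used here instead of the array mentioned above so that lookups stay fast; that is a design choice of this sketch, not something taken from the post.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 1024) -> set:
    """md5 of every block of a recovered file; the last block is zero-padded."""
    digests = set()
    for pos in range(0, len(data), block_size):
        chunk = data[pos:pos + block_size].ljust(block_size, b"\x00")
        digests.add(hashlib.md5(chunk).digest())
    return digests


def zero_known_blocks(image: bytes, known: set, block_size: int = 1024) -> bytes:
    """Blank out every block of `image` whose md5 is in `known`."""
    out = bytearray(image)
    for pos in range(0, len(image), block_size):
        block = image[pos:pos + block_size]
        if hashlib.md5(block).digest() in known:
            out[pos:pos + len(block)] = bytes(len(block))
    return bytes(out)
```

In practice `known` would be the union of `block_digests(...)` over every recovered file.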

    The next task was to write my own cheap PhotoRec. I made a tool which writes every run of non-zeroed blocks to its own file. Now I have files like:
    "13322", size: 24 blocks of 1 KB
    "15725", size: 57 blocks of 1 KB.
    The file name is the start block number.
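Such a carving tool might look like the following sketch, which assumes that consecutive non-zero blocks belong in one file (the post does not spell this out). `nonzero_runs` and `write_runs` are illustrative names.

```python
import os


def nonzero_runs(image: bytes, block_size: int = 1024) -> list:
    """Return (start_block, block_count) for every run of non-zero blocks."""
    runs = []
    zero = bytes(block_size)
    start = None
    n_blocks = (len(image) + block_size - 1) // block_size
    for i in range(n_blocks):
        block = image[i * block_size:(i + 1) * block_size]
        is_zero = block == zero[:len(block)]
        if not is_zero and start is None:
            start = i                         # a new run begins here
        elif is_zero and start is not None:
            runs.append((start, i - start))   # the run ended at the previous block
            start = None
    if start is not None:
        runs.append((start, n_blocks - start))
    return runs


def write_runs(image: bytes, out_dir: str, block_size: int = 1024) -> None:
    """Write each run to a file named after its start block number."""
    for start, count in nonzero_runs(image, block_size):
        with open(os.path.join(out_dir, str(start)), "wb") as fh:
            fh.write(image[start * block_size:(start + count) * block_size])
```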

    I also scanned the deleted data (at block level) for:
    - the 7z start header, and then in a second pass for
    - the end header, with crc32 validation
    - the scan also printed some useful information, such as the end header offset, which is the approximate archive size
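The two scan passes listed above could be combined roughly as in this sketch. It assumes archives start on a block boundary and that the end header fits inside a single block. The result keys and the function name are made up for illustration.

```python
import struct
import zlib

SIGNATURE = bytes.fromhex("377abcaf271c")


def scan_archives(image: bytes, block_size: int = 1024) -> list:
    """Find 7z start headers, then CRC-test every block as a possible last block."""
    n_blocks = len(image) // block_size
    results = []
    for b in range(n_blocks):
        head = image[b * block_size:b * block_size + 32]
        if head[:6] != SIGNATURE:
            continue
        offset, size, crc = struct.unpack_from("<QQI", head, 12)
        end = offset + 32                 # file offset of the end header
        pos = end % block_size            # its position inside the last block
        candidates = []
        if 0 < size <= block_size - pos:  # skip headers that straddle blocks
            for c in range(n_blocks):
                start = c * block_size + pos
                if zlib.crc32(image[start:start + size]) == crc:
                    candidates.append(c)
        results.append({
            "start_block": b,
            "end_header_offset": end,
            "archive_size": end + size,
            "end_block_candidates": candidates,
        })
    return results
```

A start header with no candidates, or with implausible sizes, would be what the post calls a ghost header.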

    There were only four 7zip files in total, so no ghost headers.

    With the data above I was able to guess which chunks of data should be merged.
    All three archives mentioned in my previous post (each containing only one file) were easy to recover. Just a simple "cat file1 file2 file3 > test_it.7z" and win.

    Where am I now?

    There is about 20 MB of data left to analyse. I have to find a 1.5 MB 7z file in this mess. Fragmentation is heavy because this file was added last, when there was not much space left.
    I have the starting and ending blocks of the 7z file 100% confirmed.
    All the remaining data looks random, so there is not much more to exclude.

    I know that there are three more EXE files to throw away. I have their md5 sums and versions, so I'm waiting on another forum for the application developer to dig up the older versions of their software.
    If that succeeds, I guess another 10 MB of data can be thrown away.

    Loading the LZMA decoder into a debugger is overkill. Luckily we have the sources of p7zip.
    All I need, and what I'm kindly asking you to help with, is one printf which will report
    "lzma decoder read chunk of XXX bytes at file offset XXX"

    I think this one line of code will help me a lot.

    Thanks,
    Mike

     
  • Anonymous - 2014-04-20

    According to strace 7zip reads:

    32 bytes at 0 - file header
    36 bytes at 1534789 - header at end
    1420 bytes at 1533369 - archive list content
    1048576 bytes at 32 - first LZMA block

    It looks like I have to glue together 1 MB of data before I can check whether it works.
    That is about 68% of the whole file.
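The offsets in the strace output line up with numbers already posted in this thread (NextHeaderOffset and NextHeaderSize from the start header, the compressed size from the 7za listing). A small worked check in Python, where `archive_layout` is only an illustrative helper:

```python
START_HEADER_SIZE = 32


def archive_layout(next_header_offset: int, next_header_size: int,
                   compressed_size: int) -> dict:
    """Derive the read offsets seen in strace from the header fields."""
    end_header_at = START_HEADER_SIZE + next_header_offset
    packed_header_at = START_HEADER_SIZE + compressed_size
    packed_header_size = end_header_at - packed_header_at
    physical_size = end_header_at + next_header_size
    return {
        "end_header_at": end_header_at,            # "36 bytes at 1534789"
        "packed_header_at": packed_header_at,      # "1420 bytes at 1533369"
        "packed_header_size": packed_header_size,
        "physical_size": physical_size,            # "Physical Size" in the listing
        "headers_size": START_HEADER_SIZE + packed_header_size + next_header_size,
        "first_read_share": (1 << 20) / physical_size,
    }


layout = archive_layout(0x176B25, 0x24, 1533337)
```

With these inputs the helper gives a physical size of 1534825 and a headers size of 1488, matching the listing, and the first 1 MiB read covers about 68% of the file.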

    Igor, maybe there is a way to check an LZMA block partially?

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-21

    CPP\7zip\Compress\LzmaDecoder.cpp

    HRESULT CDecoder::CodeSpec(ISequentialInStream *inStream, ISequentialOutStream *outStream, ICompressProgressInfo *progress)
    
      next = (_state.dicBufSize - _state.dicPos < _outBufSize) ? _state.dicBufSize : (_state.dicPos + _outBufSize);
    
      if (res != 0)
        return S_FALSE;
    

    You will probably get the error at that line.
    And you can printf the _inSizeProcessed variable there.

     
  • Anonymous - 2014-04-23

    Here is the patch:

    --- p7zip_9.20.1/CPP/7zip/Compress/LzmaDecoder.cpp      2014-04-23 16:52:02.004098335 +0200
    +++ p7zip_9.20.1-modify/CPP/7zip/Compress/LzmaDecoder.cpp       2014-04-23 16:52:35.627098113 +0200
    @@ -140,7 +140,10 @@
           next = (_state.dicBufSize - _state.dicPos < _outBufSize) ? _state.dicBufSize : (_state.dicPos + _outBufSize);
    
           if (res != 0)
    +      {
    +        printf("LZMA decode error at %d %d %d %d %d %d %d\n",_inSizeProcessed,dicPos,curSize,_inBuf,_inPos,startInProgress,_inSize);
             return S_FALSE;
    +      }
           RINOK(res2);
           if (stopDecoding)
             return S_OK;
    

    And output:

    LZMA decode error at 5 0 3559011 16304256 5 0 1048576

    How should I interpret this "5" value?

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-24

    The LZMA decoder reads 5 bytes at the start of decoding.
    I don't know why you get an error at that position.
    The first bytes of the lzma stream look OK to me:
    0000020: 0011 8840 22f8 2716
    Do you call
    7za t t2.7z
    ?

    Does printf with "%d" work for 64-bit integers?

     
  • jno

    jno - 2014-04-24

    %d may not work for longs.
    %ld should do.

     
  • Anonymous - 2014-04-24

    Yes, I test the archive with the command "7za t t2.7z".

    For %d, %ld, %lld, %u, %lu, %llu I get the same result, which is 5.

    $ gcc -v
    gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC)

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-25

    Maybe you should try a debugger.

     
  • Anonymous - 2014-04-26

    I created a test archive of about the same size (1.5 MB).
    If I modify some bytes near the end of the first 1 MB LZMA block, I still get "5".
    So I'm guessing that the printf might be placed in the wrong code location/path.

     
    Last edit: Anonymous 2014-04-26
  • Igor Pavlov

    Igor Pavlov - 2014-04-26

    Yes, the lzma decoder doesn't update the processed size in case of an error.
    But you can try to reduce the decoding step size:
    _inBufSize(1 << 20),
    _outBufSize(1 << 22),
    Just reduce either of these variables (or both) to a small value.
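Why a smaller step helps can be illustrated outside p7zip. The sketch below feeds a stream to Python's `lzma` module a few bytes at a time, so a failure is pinned to the chunk that triggered it. It is only an analogy: it uses the .lzma container (`FORMAT_ALONE`) as a stand-in for the raw LZMA stream inside a 7z archive, and `first_error_offset` is a hypothetical helper.

```python
import lzma


def first_error_offset(stream: bytes, step: int = 256) -> tuple:
    """Decode `stream` in `step`-byte pieces.

    Returns (decoded_bytes, error_offset). error_offset is None if no error
    was raised, otherwise the input offset of the chunk that failed.
    """
    decoder = lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)
    out = bytearray()
    for pos in range(0, len(stream), step):
        try:
            out += decoder.decompress(stream[pos:pos + step])
        except lzma.LZMAError:
            return bytes(out), pos
        if decoder.eof:
            break
    return bytes(out), None
```

The reported offset is only accurate to within one step, and a decoder may run on for a while producing garbage before it notices damaged input, so the true damage can sit somewhat earlier than the reported chunk.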

     
  • Anonymous - 2014-04-26

    _inBufSize(1 << 8),
    _outBufSize(1 << 12),

    Instant WIN.
    Thank You very much.

    I think 7zip needs better recovery support.
    Simple crc32 hashes calculated for every 16 KB block of the final archive would be nice, just like the piece hashes in torrent files.

    AFAIK a more user-friendly option, such as support for https://en.wikipedia.org/wiki/Forward_error_correction, is not available in 7zip :(

     
