Compressing uncompressed files in compressed archives

adxx
2014-03-04
2014-04-13
  • adxx

    adxx - 2014-03-04

    I had a quick look but couldn't find any way to compress compressed files efficiently.

    My situation is a directory of PDFs and 7z archives. They are a series of updates sent to a client, so I want to keep the original files and their date metadata without loss. 7-Zipping the directory only takes it from 21 MB to 17 MB, which isn't worth it.

    I could extract the contents of the 7z files and somehow expand the /FlateDecode streams in the PDFs, and I'd guess (I have done similar tests) that a solid archive of the result would be vastly smaller than one of the compressed files.

    So what I want is a compressor that can decompress every file type back to its rawest form (including PCX, PNG, GIF, JPEG, ZIP, ARC, DOCX, etc.) so that it can be recompressed with a superior compressor into much smaller solid archives, without being lossy or very time-consuming. Of course, this also means that extraction will require compression. I know it can be done for JPEG and a few others, but there is nothing recursive or generic about it, and so there is no precedent yet for including such support in a future format/method. The fact that 7-Zip can already extract so many different archive formats makes it an obvious candidate for this technique. I assume this has been suggested plenty of times before (or maybe just once); in any case, here's another vote for it.
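    The guess above (that a solid archive of the raw contents beats one of the already-compressed files) can be illustrated with a minimal Python sketch using only the standard library. The "documents" are synthetic stand-ins, assumed to be successive updates of one text with scattered edits; each is deflated on its own, as a PDF /FlateDecode stream or a ZIP member would be, and then the two solid LZMA archives are compared. The function names are hypothetical.

```python
import lzma
import random
import zlib


def make_versions(n_versions=5, n_words=8000, seed=0):
    """Build hypothetical successive 'updates' of one document: every version
    shares most of its text with a common base but has scattered word edits."""
    rng = random.Random(seed)
    vocab = [f"word{i}" for i in range(60)]
    base = [rng.choice(vocab) for _ in range(n_words)]
    versions = []
    for v in range(n_versions):
        words = list(base)
        # Replace about 1 word in 100, at positions chosen independently per version.
        for pos in rng.sample(range(n_words), n_words // 100):
            words[pos] = f"edit{v}"
        versions.append(" ".join(words).encode())
    return versions


def solid_size(blobs):
    """Size of a single LZMA stream over all blobs concatenated (a 'solid' archive)."""
    return len(lzma.compress(b"".join(blobs), preset=9))


versions = make_versions()
# Compress each version on its own, like a PDF /FlateDecode stream or a ZIP member.
deflated = [zlib.compress(v, 6) for v in versions]

raw_solid = solid_size(versions)
deflated_solid = solid_size(deflated)
print(f"solid LZMA of deflated files: {deflated_solid} bytes")
print(f"solid LZMA of raw contents:   {raw_solid} bytes")
```

    In deflate output, a small edit typically changes the Huffman tables and shifts the bit alignment of everything after it, so the solid compressor finds little to match across the deflated versions, while the raw versions share long identical runs. Real PDFs and 7z archives will of course give different numbers.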

     
    Last edit: adxx 2014-03-04
    • Shell

      Shell - 2014-03-09

      It is a great (though not new) idea. However, it has one major drawback: many compressed streams cannot be reconstructed byte-for-byte identically. For one, no LZ-based method stores encoder settings such as 7-Zip's fb (fast bytes) parameter in the stream. For example, a zlib stream (as used by the PDF /FlateDecode filter) records only two parameters in its header, the window (dictionary) size and a compression-level hint, so a recompressor cannot deduce the others (other than by trying the zlib defaults).
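      The drawback can be made concrete with a brute-force sketch in Python: decompress a zlib stream, then try every combination of the level, memLevel and strategy settings exposed by Python's zlib module until one reproduces the original bytes exactly. This assumes the stream was produced by the same zlib library; streams from other deflate encoders (or other zlib versions) will usually not be matched, which is exactly the problem described above. The helper name is hypothetical.

```python
import itertools
import zlib

STRATEGIES = [zlib.Z_DEFAULT_STRATEGY, zlib.Z_FILTERED, zlib.Z_HUFFMAN_ONLY,
              zlib.Z_RLE, zlib.Z_FIXED]


def find_deflate_params(stream: bytes):
    """Search for zlib settings that re-create `stream` byte for byte.

    Returns (level, memLevel, strategy), or None if no tried combination matches.
    """
    raw = zlib.decompress(stream)
    # The zlib header's CINFO nibble stores log2(window size) - 8, so the
    # window size is the one parameter that can be read instead of guessed.
    wbits = (stream[0] >> 4) + 8
    # Everything else must be guessed: 10 levels x 9 memLevels x 5 strategies.
    for level, mem_level, strategy in itertools.product(range(10), range(1, 10), STRATEGIES):
        comp = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level, strategy)
        if comp.compress(raw) + comp.flush() == stream:
            return level, mem_level, strategy
    return None


data = b"The quick brown fox jumps over the lazy dog. " * 200
print(find_deflate_params(zlib.compress(data, 9)))
```

      A recompressor built this way would have to store the found parameters alongside the raw data, and fall back to keeping the original compressed stream whenever no combination matches.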

       
      • adxx

        adxx - 2014-04-13

        Ah yes, I realised that some time after posting (it could have been after reading your response; I can't remember now!). I was hoping the popular compressors could be tried, perhaps learning which ones are most likely to be used on the user's system, or even calling those compressors directly, since presumably the user still has them around (which implies they would need to keep them around in order to extract...). Messy, and I don't know whether it could even work. A while ago I did some experiments checking various compressors for determinism (if that's the right word), for related reasons. I can't remember the results, and they certainly weren't enough to tell whether this particular idea might work. Oh well.

         
  • therube

    therube - 2014-03-06

    ARJ has a REARJ component.

    "REARJ is an archive conversion program designed to facilitate the
    conversion of LZH, ZIP, PAK, ARC, DWC, HYP, LZS, and ZOO archives to
    the ARJ format."

    Though it's more generic than that, so it could be a start, or at least give you some ideas. (I also seem to recall that some general [Windows] GUI-based "recompressors" of various sorts are out there...?)

    REARJ can be found within the ARJ archiver.

    http://www.arjsoftware.com/arj32.htm
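    The following is not REARJ; it is only a hypothetical Python sketch of the same archive-conversion idea, using the standard zipfile and tarfile modules to repack a ZIP into one solid .tar.xz while keeping file names and modification dates (one of the requirements in the original post). Note that the original ZIP bytes cannot be rebuilt from the result, which is the limitation discussed above.

```python
import io
import tarfile
import time
import zipfile


def zip_to_tar_xz(zip_bytes: bytes) -> bytes:
    """Repack a ZIP archive into a single solid .tar.xz, keeping names and mtimes.

    Only the member contents and dates survive; the original ZIP is not recoverable.
    """
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf, \
            tarfile.open(fileobj=out, mode="w:xz") as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            member = tarfile.TarInfo(name=info.filename)
            member.size = len(data)
            # ZIP stores a local-time (Y, M, D, h, m, s) tuple; pad it to the
            # 9-tuple time.mktime expects (weekday/yearday ignored, DST unknown).
            member.mtime = int(time.mktime(info.date_time + (0, 0, -1)))
            tf.addfile(member, io.BytesIO(data))
    # The tar (and its xz stream) is closed and flushed once the with-block exits.
    return out.getvalue()
```

    Because tar.xz compresses all members as one stream, similar files in the same archive can share matches, unlike ZIP, which compresses each member separately.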

     
