Bad compression with solid archive when source files are bigger than 1GB?

Forum: Help
Started by colnector on 2014-04-12 · last post 2014-05-02
  • colnector

    colnector - 2014-04-12

    Here's the scenario: I have database dumps (.sql text) that are created daily. I compress many of them together into a solid archive with 7z, which works wonders. Since the database doesn't change completely every day, the files are similar enough that the solid archive tremendously cuts the size.

    PROBLEM/BUG: once the .sql files grew bigger than 1GB, the compression became MUCH worse. A 7z archive that should have compressed to about 180MB came out at about 2000MB. I have witnessed this behavior with two different database dumps, and the problem started at a different time for each; the only common factor I could find is that 1GB file size. Am I hitting a limit? Any way around it?

    Using 7z on Linux with the switches -mx9 -ms=on.
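    For reference, the invocation looks roughly like this (the archive and dump file names here are just placeholders):

        7z a -mx9 -ms=on daily-dumps.7z dump-2014-04-*.sql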

    Thanks

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-12

    Try to increase the dictionary size:
    -md30
    or
    -md29

     
  • colnector

    colnector - 2014-04-12

    Thanks for answering. I previously had -md1000m, and as a result it was taking over 12GB of RAM, which eventually made the computer start swapping. I thought adding a 1GB dictionary shouldn't take much more than 1GB.
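    (If LZMA compression really needs roughly ten times the dictionary size in RAM, a 1000 MB dictionary alone would account for about 10 GB of that.)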

    This problem has happened with -md1000m and happens without it. What would -md30 stand for? 30MB?

     
  • Igor Pavlov

    Igor Pavlov - 2014-04-12

    -md30 means -md1024m
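    A bare number after -md is an exponent: the dictionary is 2^N bytes, so -md30 = 2^30 bytes = 1024 MB and -md29 = 512 MB. Combined with your other switches it would be something like this (archive and file names are just examples):

        7z a -mx9 -ms=on -md30 daily-dumps.7z dump-2014-04-*.sql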

     
  • colnector

    colnector - 2014-04-12

    Thanks, but I doubt 1024m would make much of a difference compared to 1000m; if that were the cause, the size increase would simply show up with slightly bigger files. Any other suggestions for what might be tweaked? If not, this is likely a weakness of the LZMA algorithm and should be reported.

     
  • colnector

    colnector - 2014-04-12

    Igor, I see in an old post ( https://sourceforge.net/p/sevenzip/discussion/45797/thread/8f6e4356/ ) you wrote "7-zip with 512 MB dictionary can find matches in files only when distance between these matches is less than 512 MB.", which clarifies the weakness. The files are very similar, but as soon as they grow bigger than the dictionary size, the whole advantage of a solid archive is lost. Could any fix be made for such a situation? For example, comparing the different files and putting "similar" chunks next to each other so that they compress much better? Another option I can think of is splitting the files manually and having one 7z archive for all the first parts, another for all the second parts, and so on, as sketched below. Obviously not convenient, but the compression difference is huge here.
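    Something along these lines is what I mean by splitting (the 512 MB piece size and the file names are just examples):

        # split each dump into 512 MB pieces: dump-...sql.part-aa, .part-ab, ...
        for f in dump-*.sql; do split -b 512M "$f" "$f.part-"; done
        # archive matching pieces together so similar data stays within one dictionary
        7z a -mx9 -ms=on parts-aa.7z ./*.sql.part-aa
        7z a -mx9 -ms=on parts-ab.7z ./*.sql.part-ab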

     
  • colnector

    colnector - 2014-04-23

    Bump.
    Any idea how to overcome this dictionary size limitation?

     
  • colnector

    colnector - 2014-05-02

    Nothing?

     
