Here's the scenario: I have database dumps (.sql text files) that are created daily. I compress many of them together into a solid archive with 7z, which works wonders. Since the database doesn't change completely every day, the files are similar enough that the solid archive cuts the size tremendously.
PROBLEM/BUG: once the .sql files grew bigger than 1 GB, compression became MUCH worse. A 7z archive that should have compressed to about 180 MB came out at 2000 MB. I have witnessed this behavior with two different database dumps, and the problem started at a different time for each; the only common factor I could find was that 1 GB file size. Am I hitting a limit? Is there any way around it?
Using 7z on Linux with the switches -mx9 -ms=on
Try increasing the dictionary size:
Thanks for answering. I previously used -md1000m, and as a result it took over 12GB of RAM, which eventually pushed the computer into swap. I had thought a 1GB dictionary shouldn't take much more than 1GB.
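A rough sanity check on the memory figure, assuming (per the 7-Zip help) that LZMA compression with the default BT4 match finder needs about 10.5 times the dictionary size:

```shell
# Assumption: LZMA compression memory is roughly 10.5x the dictionary size
# (7-Zip help, BT4 match finder). A 1000 MB dictionary then wants about
# 10.5 GB just for the compressor, consistent with the ~12 GB observed.
echo $(( 1000 * 105 / 10 ))   # prints 10500 (MB)
```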
This problem has happened with -md1000m and it happens without it too. What would -md30 stand for? 30 MB?
-md30 means -md1024m
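This follows from the -md switch syntax: a bare number is a base-2 exponent, so -md30 requests a dictionary of 2^30 bytes. A quick check of the arithmetic:

```shell
# -mdN with no unit suffix is interpreted as an exponent: 2^N bytes.
# So -md30 = 2^30 bytes = 1024 MiB, slightly more than -md1000m.
echo $(( (1 << 30) / (1024 * 1024) ))   # prints 1024
```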
Thanks, but I doubt 1024m would make a huge difference compared to 1000m; if the dictionary size really is the cause, the size increase will just reappear with slightly bigger files. Any other suggestions for what might be tweaked? If not, this is likely a weakness of the LZMA algorithm and should be reported.
Igor, I see in an old post ( https://sourceforge.net/p/sevenzip/discussion/45797/thread/8f6e4356/ ) you've written "7-zip with 512 MB dictinary can find matches in files only when distance between these matches is less than 512 MB.", which clarifies the weakness. The files are very similar, but once they grow bigger than the dictionary size, the whole advantage of a solid archive is lost. Could any fix be made for such a situation? For example, comparing the different files and putting "similar" chunks next to each other so that they compress much better? Another option I can think of is splitting the files manually and having one 7z archive for all the first parts, another for all the second parts, and so on. Obviously not comfortable, but the compression difference is huge here.
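The manual-splitting idea above can be sketched as follows (all file names and chunk sizes are hypothetical; tiny stand-in files are used so the demo runs anywhere):

```shell
# Sketch of the manual-splitting workaround: cut each daily dump into
# chunks smaller than the dictionary, then build one solid archive per
# chunk index, so the matching regions of different days stay within
# dictionary range of each other.
cd "$(mktemp -d)"
for dump in dump_day1.sql dump_day2.sql; do
    head -c 26 /dev/zero > "$dump"        # tiny stand-in dumps for the demo
    split -b 16 -d "$dump" "$dump.part"   # use e.g. -b 512m on real dumps
done
ls dump_day*.sql.part*
# Then, for each chunk index (00, 01, ...), something like:
#   7z a -mx9 -ms=on parts_00.7z dump_day*.sql.part00
```

With GNU split, `-d` produces numeric suffixes (part00, part01, ...), so the Nth chunk of every day can be globbed into the same archive.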
Any idea on how to overcome this dictionary size limitation?