Hello Igor!
First of all, I'm a big fan of 7zip. It's unbeatable at what it does; none of the commercial archivers come even close.
Recently, I upgraded from 9.38 to the 15.xx branch of 7-zip, because why not. But I found out that there is a compression regression. I don't know to what extent my scenario is an extreme/specific one, but this issue renders 15.xx far worse than 9.38 for me.
Scenario:
Imagine a backup of your data - not incremental, but rather, say, a monthly backup of your folder. Various types of data - pictures (jpeg, png, etc.), exe files, documents, videos, even some archives. You change the contents of this folder from time to time, but a lot of these files remain as they are - not changing, or maybe just moving from folder A to folder B, but still staying within the "root" folder of this backup.
Imagine your 'base' folder is ~50 GB - so if you have 10 months of backups, you will have 500 GB of backup.
In 9.38, when I compress (LZMA2, dictionary 1024M, word size 256, 3 threads, solid block), the compressor works like this:
(assume file A is always the same file):
2015_01_01/folder/a.jpg
2015_02_01/folder/a.jpg
2015_03_01/folder/a.jpg
2015_01_01/folder/b.jpg
2015_02_01/folder/b.jpg
2015_03_01/folder/b.jpg
..... you get the idea. It compresses the first file and then simply goes "yes, this is the same file here", so the compressed size for all X copies of the same file is essentially that of a single file.
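For reference, those settings should map to a command line roughly like the following sketch (the archive and folder names are placeholders, and I'm matching the GUI fields to -m switches from memory):

```python
import subprocess

# Sketch of the command-line equivalent of the GUI settings above.
# "backup.7z" and "backup_root/" are placeholder names.
#   -m0=lzma2  compression method    -md=1024m  dictionary size
#   -mfb=256   word size             -mmt=3     number of threads
#   -ms=on     solid archive
subprocess.run(
    ["7z", "a", "-t7z",
     "-m0=lzma2", "-md=1024m", "-mfb=256", "-mmt=3", "-ms=on",
     "backup.7z", "backup_root/"],
    check=True,
)
```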
**The 15.xx version compressor goes like this:**
2015_01_01/folder/a.jpg
2015_01_01/folder/b.jpg
2015_01_01/folder/c.jpg
2015_01_01/folder/d.jpg
2015_02_01/folder/a.jpg
2015_02_01/folder/b.jpg
2015_02_01/folder/c.jpg
2015_02_01/folder/d.jpg
Which I assume is okay if the dictionary can cover the entire size of the archive. But imagine that the folders are each ~50 GB - I *think* that by the time the compressor gets to the next month's folder, it no longer has the required 'word' in the dictionary to pack file a.jpg by simply referencing the copy it already compressed earlier.
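To make that intuition concrete, here is a toy sketch of the sliding-window idea (an illustration only, not how LZMA2 is actually implemented):

```python
# Toy model of an LZ sliding window ("dictionary"): a compressor can only
# emit a back-reference to bytes that are still inside the window behind
# the current position in the stream.
def can_backreference(offset_back_gb: float, dictionary_gb: float) -> bool:
    """True if an identical earlier copy is still inside the window."""
    return offset_back_gb <= dictionary_gb

# 9.38 "type" order: the previous copy of a.jpg sits right next to it.
print(can_backreference(offset_back_gb=0.01, dictionary_gb=1.0))  # True

# 15.xx path order: the previous copy is a whole monthly folder (~50 GB)
# behind in the stream, far outside a 1 GB dictionary.
print(can_backreference(offset_back_gb=50.0, dictionary_gb=1.0))  # False
```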
Now, I don't know how this algorithm works - whether it compares hashes, or whether it just looks for identical filenames and sorts the files by name.
It feels like the 15.xx compression suffers because the compressor is trying to prioritize file order (in the archive) over compression.
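If it is just a filename sort, the difference between the two orderings might look something like this (my guess at the grouping criteria, not 7-Zip's actual comparator):

```python
import os

paths = [
    "2015_01_01/folder/a.jpg", "2015_01_01/folder/b.jpg",
    "2015_02_01/folder/a.jpg", "2015_02_01/folder/b.jpg",
    "2015_03_01/folder/a.jpg", "2015_03_01/folder/b.jpg",
]

# 15.xx default: plain path order, matching the on-disk layout.
path_order = sorted(paths)

# Old "type"-style order (guessed criteria): group by extension and base
# name, so identical files from different months become neighbors.
type_order = sorted(paths, key=lambda p: (os.path.splitext(p)[1],
                                          os.path.basename(p), p))

print(type_order[:3])
# ['2015_01_01/folder/a.jpg', '2015_02_01/folder/a.jpg', '2015_03_01/folder/a.jpg']
```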
For illustration: 9.38 was able to pack my 800 GB archive into ~94 GB. With 15.xx, the job was not even 40% done, yet the compressed output for the same files was already at 150 GB.
Last edit: Little Vulpix 2015-11-18
There are some reasons why the order was changed:
1) It's better for some types of data (for example, if you compress a source code tree).
2) It's better for HDDs (it reduces seek time) when the order of files in the archive matches the order of files on the disk.
You can use the old "type" order, if you write
qs
in the Parameters field.
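On the command line this should correspond to the qs parameter of the -m switch; a minimal sketch, with the same placeholder names as above:

```python
import subprocess

# Same settings as before, plus qs to restore the old sort-by-type order.
subprocess.run(
    ["7z", "a", "-t7z",
     "-m0=lzma2", "-md=1024m", "-mfb=256", "-mmt=3", "-ms=on",
     "-mqs",  # sort files by type within the solid archive, as 9.38 did
     "backup.7z", "backup_root/"],
    check=True,
)
```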
Aha! Thank you! This works for me. The -mqs / qs is a new parameter, right? I've never seen it before. But it does exactly what I needed. You can close this.