7-Zip / Discussion / Open Discussion: 7-zip deflate (15.08 vs 4.32)

lorents - 2015-10-04

I tested old and new versions of 7-Zip and noticed something interesting.

7za a -mx=9 -mfb=257 -mpass=5

1842.png
7-zip 15.08 - 1842.png.gz - 328 KB (336 784 bytes)
7-zip 4.32 - 1842.png.gz - 336 KB (344 826 bytes)

Difference: 2.33% (15.08 is smaller)

2248.png
7-zip 15.08 - 2248.png.gz - 415 KB (425 781 bytes)
7-zip 4.32 - 2248.png.gz - 403 KB (413 352 bytes)

Difference: 2.91% (15.08 is larger)

How can I make 15.08 compress no worse than 4.32?

Last edit: lorents 2015-10-04


Igor Pavlov - 2015-10-04

Probably some condition pushes the encoder onto a non-optimal path.
I'm not ready to investigate it further right now. Maybe later.

I suppose you tested more PNG examples.
How often does the new version give a worse compression ratio?


lorents - 2015-10-04

I ran two tests, on 3800 and 8600 PNG images.

Test 1 - 3800 PNG
7-zip 4.32 - 1 186 798 089 bytes
7-zip 15.08 - 1 186 056 550 bytes
MIN (7-zip 4.32, 7-zip 15.08), taking the smaller result for each file - 1 184 771 210 bytes

Test 2 - 8600 PNG
7-zip 4.32 - 605 934 986 bytes
7-zip 15.08 - 607 720 425 bytes
MIN (7-zip 4.32, 7-zip 15.08), taking the smaller result for each file - 605 332 081 bytes

I am the developer of iCatalyst, a small project for image optimization and compression. It needs 7-Zip's Deflate implementation to compress PNG images.

I understand that Deflate compression isn't a priority for you, but I very much hope you can help me with this.
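The "MIN" rows above appear to be the total you get by keeping, for each file, whichever of the two versions produced the smaller output. A minimal sketch of that bookkeeping follows; the function name `bestOfTwoTotal` and the vector inputs are hypothetical and not part of 7-Zip or iCatalyst.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum of the per-file minimum sizes across two encoders.
// sizesA[i] and sizesB[i] are the compressed sizes of file i (hypothetical inputs).
std::uint64_t bestOfTwoTotal(const std::vector<std::uint64_t>& sizesA,
                             const std::vector<std::uint64_t>& sizesB)
{
    // Both lists must describe the same files in the same order.
    assert(sizesA.size() == sizesB.size());

    std::uint64_t total = 0;
    for (std::size_t i = 0; i < sizesA.size(); ++i)
        total += std::min(sizesA[i], sizesB[i]);
    return total;
}
```

For the two sample files from the first post, `bestOfTwoTotal({344826, 413352}, {336784, 425781})` keeps 336 784 for 1842.png and 413 352 for 2248.png, giving 750 136 bytes.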

Last edit: lorents 2015-10-04

test1.csv

test2.csv


Igor Pavlov - 2015-10-05

It may be difficult to pin down the exact cause of the problem right now.
I'll probably look at it later.

I suppose there was some tradeoff when I changed that algorithm, and the new code worked better for some test files.
Maybe PNG files have some atypical patterns.


lorents - 2015-10-05

Thank you! I will wait!


Shell - 2015-10-05

I have read lorents's post on the Ru-Board forum about the optimal use of the -mmc switch. It is not listed as a parameter for Deflate, but it does influence the compression ratio. In my tests, the size decreased monotonically with increasing -mmc until it eventually saturated. I suggest mentioning this parameter in 7-Zip's manual in the future.


lorents - 2015-10-22

I ran a new test using the application x128.ho.ua/ec-idat.zip
Optimization parameters: -mx=9 -mfb=257 -mpass=5

Test 1:
4.32 - 1 185 943 426 bytes
4.43 - 1 184 940 381 bytes
4.56 - 1 185 699 315 bytes
4.57 - 1 185 576 422 bytes
9.20 - 1 186 613 793 bytes
15.09 - 1 185 934 035 bytes
advdef - 1 185 952 066 bytes

min (4.32 + 4.43) = 1 183 808 935 bytes

Test 2:
4.32 - 605 473 569 bytes
4.43 - 607 034 271 bytes
4.56 - 608 231 878 bytes
4.57 - 608 052 839 bytes
9.20 - 608 931 061 bytes
15.09 - 607 743 946 bytes
advdef - 605 305 934 bytes

min (4.32 + 4.43) = 604 905 857 bytes

Last edit: lorents 2015-10-22

test1.csv

test2.csv

- Shell - 2015-10-22
  
  Will the winner remain the same if you take the maximal values: -mfb258 -mpass15? Also I've noticed that -mpass14 is sometimes better than 15.
  
  Concerning the latter, I have a question for Igor: what is a "pass"? Couldn't 7-Zip remember the best result from previous passes to prevent the compression ratio from degrading?
  
  - Igor Pavlov - 2015-10-23
    
    It's a combined parameter.
    
    ~~~~~
    static const unsigned kNumDivPassesMax = 10; // [0, 16); ratio/speed/ram tradeoff; use big value for better compression ratio.
    
    m_NumDivPasses = props.numPasses;
    if (m_NumDivPasses == 0)
      m_NumDivPasses = 1;
    if (m_NumDivPasses == 1)
      m_NumPasses = 1;
    else if (m_NumDivPasses <= kNumDivPassesMax)
      m_NumPasses = 2;
    else
    {
      m_NumPasses = 2 + (m_NumDivPasses - kNumDivPassesMax);
      m_NumDivPasses = kNumDivPassesMax;
    }
    ~~~~~
    
    m_NumPasses - the number of passes for optimal huffman parsing.
    
    m_NumDivPasses - how many times we try to split block into 2 small blocks. Sometimes small blocks are better than one big block.
    
    example numPasses = 15;
    m_NumPasses = 2 + (15 - 10) = 7;
    m_NumDivPasses = 10;
    
    - Shell - 2015-10-23
      
      Block division is a great idea. I suspect that other Deflate optimization programs rely on a fixed block count (though this count is usually user-controlled).
      
      If I understand the source code correctly, it is possible to select the match finder for Deflate (namely between bt3 and hc3). Please consider documenting this feature along with -mmc.
      
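The -mpass mapping that Igor Pavlov quoted earlier in this sub-thread can be restated as a standalone function, which makes it easier to see what a given value expands to. This is only a sketch built from the quoted snippet: `PassConfig`, `mapPasses` and `checkQuotedExample` are hypothetical names, and the real encoder keeps these values in member variables.

```cpp
#include <cassert>

// The two internal counters derived from the user-facing -mpass value.
struct PassConfig
{
    unsigned numPasses;     // passes of optimal Huffman parsing (m_NumPasses)
    unsigned numDivPasses;  // attempts to split a block in two (m_NumDivPasses)
};

// Same cap as in the quoted snippet.
const unsigned kNumDivPassesMax = 10;

// Restatement of the quoted mapping; not the actual 7-Zip function.
PassConfig mapPasses(unsigned requested)
{
    unsigned div = (requested == 0) ? 1 : requested;
    if (div == 1)
        return PassConfig{1, 1};
    if (div <= kNumDivPassesMax)
        return PassConfig{2, div};
    // Everything above the cap is spent on extra parsing passes.
    return PassConfig{2 + (div - kNumDivPassesMax), kNumDivPassesMax};
}

// Reproduces the worked example from the thread: numPasses = 15.
void checkQuotedExample()
{
    const PassConfig c = mapPasses(15);
    assert(c.numPasses == 7);
    assert(c.numDivPasses == 10);
}
```

Under this reading, -mpass values from 2 to 10 only change the number of split attempts, and values above 10 only add parsing passes.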

Igor Pavlov - 2015-10-23

> Couldn't 7-Zip remember the best result from previous passes to prevent compression ratio from degrading?

Maybe 7-Zip doesn't remember it.
I don't remember the details of the code now.

lorents - 2015-10-23

Igor Pavlov,
is it possible to update 7-Zip's Deflate to avoid this variation in compression?


lorents - 2015-10-24

I ran a compression speed test.

test1:
4.32 - 0.932 s
4.43 - 1.057 s
4.56 - 0.772 s
4.57 - 0.775 s
9.20 - 0.738 s
15.09 - 1.472 s

test2:
4.32 - 2.650 s
4.43 - 2.610 s
4.56 - 2.107 s
4.57 - 2.069 s
9.20 - 1.984 s
15.09 - 3.240 s

Last edit: lorents 2015-10-24

- Igor Pavlov - 2015-10-25
  
  Which files (number of files, sizes)?
  Which CPU?
  Which exact switches?
  
  Also try some big files.
  

lorents - 2015-10-25

Compression settings:

7za a -tgzip -mx=9 -mfb=257 -mpass=5

I tested different files. In almost all cases the results are the same: version 15.09 is the slowest and version 9.20 is the fastest.

Last edit: lorents 2015-10-25

Снимок.PNG

Igor Pavlov - 2015-10-25

There is a bug in some recent versions of 7-Zip for bzip2 and gzip archives.
If you specify parameters (-m switch) on the command line, 7-Zip parses the parameters:
x
mt
any_other_parameter

and 7-Zip 15.09 ignores every parameter after the first "any_other_parameter".

In your case, it probably used the default pass=10 (for x=9).

So wait for the next, fixed version of 7-Zip.
Or use the zip format instead of the gz format. I suppose the ZIP code does not have that bug.

THANKS for report!!!


lorents - 2015-11-01

THANKS!!!
The new version works perfectly.

Can I hope that you will update the Deflate compression algorithm?


New test (7-zip 15.10)

1842.png
7za a -tgzip -mx=9 -mfb=257 -mpass=5 1842.png.gz - 328 KB (336 727 bytes)
7za a -tgzip -mx=9 -mfb=258 -mpass=15 1842.png.gz - 336 KB (344 109 bytes)

So what is your question?
If you test different cases, try all combinations:
pass - [5 ... 15]
fb - [257 ... 258]
and then we can think about why some combination is better.
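One way to run the suggested sweep is to generate every command line up front and feed the list to a script. The sketch below only builds the strings. It reuses the switches already shown in this thread (-tgzip, -mx=9, -mfb, -mpass), while the function name `buildSweepCommands` and the archive naming scheme are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build one 7za command line per (fb, pass) combination:
// fb in [257, 258], pass in [5, 15].
std::vector<std::string> buildSweepCommands(const std::string& input)
{
    assert(!input.empty());

    std::vector<std::string> commands;
    for (int fb = 257; fb <= 258; ++fb)
    {
        for (int pass = 5; pass <= 15; ++pass)
        {
            // Hypothetical naming scheme so the outputs do not overwrite each other.
            const std::string archive = input + ".fb" + std::to_string(fb) +
                                        ".pass" + std::to_string(pass) + ".gz";
            commands.push_back("7za a -tgzip -mx=9 -mfb=" + std::to_string(fb) +
                               " -mpass=" + std::to_string(pass) +
                               " " + archive + " " + input);
        }
    }
    return commands;
}
```

Calling `buildSweepCommands("1842.png")` yields 22 command lines, starting with `7za a -tgzip -mx=9 -mfb=257 -mpass=5 1842.png.fb257.pass5.gz 1842.png`.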

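I have run the tests for lorents (sorted by increasing size):

| fb | pass | size (bytes) |
|-----|------|--------|
| 257 | 5 | 336727 |
| 257 | 11 | 336744 |
| 257 | 6-10 | 336784 |
| 257 | 15 | 337912 |
| 257 | 12 | 338046 |
| 258 | 14 | 338178 |
| 258 | 13 | 338198 |
| 258 | 12 | 338252 |
| 258 | 11 | 338349 |
| 258 | 15 | 344109 |
| 258 | 5-10 | 344220 |
| 257 | 14 | 344616 |
| 257 | 13 | 344633 |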
I cannot believe fb257 is better than 258! In a Deflate stream, a match of 258 repeated bytes carries no extra bits in its encoding, whereas a match of length 257 costs 5 extra bits per sequence.
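The point about 257 versus 258 follows from the length code table in the Deflate specification (RFC 1951, section 3.2.5). A small sketch of that table is given below; the function name `lengthExtraBits` is hypothetical. It counts only the extra bits after the length symbol. The Huffman code lengths of the symbols themselves depend on each block's statistics and are not modelled here.

```cpp
#include <cassert>

// Number of extra bits that follow the length symbol for a Deflate match
// length, per RFC 1951 section 3.2.5. Valid match lengths are 3..258.
unsigned lengthExtraBits(unsigned length)
{
    assert(length >= 3 && length <= 258);

    if (length == 258) return 0;  // symbol 285: dedicated code, no extra bits
    if (length <= 10)  return 0;  // symbols 257..264, lengths 3..10
    if (length <= 18)  return 1;  // symbols 265..268, lengths 11..18
    if (length <= 34)  return 2;  // symbols 269..272, lengths 19..34
    if (length <= 66)  return 3;  // symbols 273..276, lengths 35..66
    if (length <= 130) return 4;  // symbols 277..280, lengths 67..130
    return 5;                     // symbols 281..284, lengths 131..257
}
```

So `lengthExtraBits(257)` is 5 and `lengthExtraBits(258)` is 0, which is why the measured advantage of -mfb=257 looks surprising.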

lorents - 2015-11-04

I always thought that the higher the values of the -mfb and -mpass parameters, the better the compression, but that is not the case. I am now thinking about how to select the parameter values automatically.


lorents - 2015-11-20

As far as I know, the value of the block parameter strongly affects Deflate compression. Could you tell me how the -mfb and -mpass parameters relate to the block parameter, and how a specific block value can be set in 7-Zip's Deflate?

Last edit: lorents 2015-11-20


  m_ValueBlockSize = (7 << 10) + (1 << 12) * m_NumDivPasses;

But it also splits a block into sub-blocks.

static const unsigned kNumDivPassesMax = 10; // [0, 16); ratio/speed/ram tradeoff; use big value for better compression ratio.
static const UInt32 kDivideCodeBlockSizeMin = (1 << 7); // [1, (1 << 32)); ratio/speed tradeoff; use small value for better compression ratio.
static const UInt32 kDivideBlockSizeMin = (1 << 6); // [1, (1 << 32)); ratio/speed tradeoff; use small value for better compression ratio.

static const UInt32 kMaxUncompressedBlockSize = ((1 << 16) - 1) * 1; // [1, (1 << 32))

Maybe you can increase the compression ratio for some files if you increase kMaxUncompressedBlockSize in the source code.
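To see what the m_ValueBlockSize formula above gives for concrete settings, it can be evaluated on its own. The helper below is a hypothetical restatement, not 7-Zip code. It assumes numDivPasses has already been clamped to 1..10 as in the -mpass mapping quoted earlier in the thread, and it makes no claim about the unit of the result (the name suggests encoded values rather than bytes).

```cpp
#include <cassert>
#include <cstdint>

// Evaluates the quoted formula:
//   m_ValueBlockSize = (7 << 10) + (1 << 12) * m_NumDivPasses
// numDivPasses is assumed to be in [1, 10] (already clamped by the encoder).
std::uint32_t valueBlockSize(unsigned numDivPasses)
{
    assert(numDivPasses >= 1 && numDivPasses <= 10);
    return (7u << 10) + (1u << 12) * numDivPasses;
}
```

For example, `valueBlockSize(5)` is 7168 + 4096 * 5 = 27648, and `valueBlockSize(10)` is 48128, the largest value reachable through -mpass under that assumption.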

lorents - 2015-11-21

If I understood correctly, the block size comes out as 7168 + (4096 * mNumDivPasses).
And what is mNumDivPasses?
One more thing: how can I set the number of blocks rather than the block size?

Last edit: lorents 2015-11-21

- Shell - 2015-11-22
  
  mNumDivPasses seems to be the number of attempts to split Deflate blocks into smaller ones. I suppose you cannot directly specify the number of blocks in 7-Zip. kzip can do that; you may examine its source code to create your own Deflate implementation.
  
