I have run the tests for lorents (size increasing):
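| fb | pass | size |
|-----|------|--------|
| 257 | 5 | 336727 |
| 257 | 11 | 336744 |
| 257 | 6-10 | 336784 |
| 257 | 15 | 337912 |
| 257 | 12 | 338046 |
| 258 | 14 | 338178 |
| 258 | 13 | 338198 |
| 258 | 12 | 338252 |
| 258 | 11 | 338349 |
| 258 | 15 | 344109 |
| 258 | 5-10 | 344220 |
| 257 | 14 | 344616 |
| 257 | 13 | 344633 |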
I cannot believe fb257 is better than 258! In a Deflate stream, a match of 258 repeated bytes has no extra bits in its encoding, whereas a match of 257 has 5 extra bits per sequence.
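The extra-bit claim can be checked against the length code table in RFC 1951 (section 3.2.5). The following is only a sketch of that table lookup; the names `LengthCode` and `deflateLengthCode` are made up for illustration and are not taken from 7-Zip.

```cpp
#include <array>
#include <cassert>

// Deflate length code (257..285) and its number of extra bits.
struct LengthCode {
    int code;
    int extraBits;
};

// Look up the length code for a match length in [3, 258],
// using the base lengths and extra-bit counts from RFC 1951, section 3.2.5.
LengthCode deflateLengthCode(int length) {
    static const std::array<int, 29> base = {
        3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
        35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258};
    static const std::array<int, 29> extra = {
        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
        3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0};
    assert(length >= 3 && length <= 258);
    int i = 28;
    while (base[i] > length) {
        --i;  // walk down to the last base length not exceeding the match length
    }
    return {257 + i, extra[i]};
}
```

With this lookup, length 257 falls into code 284 with 5 extra bits, while length 258 has its own code 285 with none. This only counts extra bits; the Huffman code lengths of the two symbols still depend on the block's statistics.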
I tested old and new versions of 7-Zip and noticed something interesting.
7za a -mx=9 -mfb=257 -mpass=5
1842.png
7-zip 15.08 - 1842.png.gz - 328 KB (336 784 byte)
7-zip 4.32 - 1842.png.gz - 336 KB (344 826 byte)
Ratio 2,33%
2248.png
7-zip 15.08 - 2248.png.gz - 415 KB (425 781 byte)
7-zip 4.32 - 2248.png.gz - 403 KB (413 352 byte)
Ratio 2,91%
How can I make 15.08 compress no worse than 4.32?
Last edit: lorents 2015-10-04
Probably there is some point that triggers the encoder to take a non-optimal path.
I'm not ready to investigate it further now. Maybe later.
I suppose you have tested more PNG examples.
How often does the new version provide a worse compression ratio?
I ran two tests, on 3800 and 8600 PNG images.
Test 1 - 3800 PNG
7-zip 4.32 - 1 186 798 089 byte
7-zip 15.08 - 1 186 056 550 byte
MIN (7-zip 4.32 + 7-zip 15.08) - 1 184 771 210 byte
Test 2 - 8600 PNG
7-zip 4.32 - 605 934 986 byte
7-zip 15.08 - 607 720 425 byte
MIN (7-zip 4.32 + 7-zip 15.08) - 605 332 081 byte
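The MIN rows are not explained in the post. Since each MIN total is smaller than either version's total, a plausible reading is that the smaller of the two outputs is taken per file and then summed. The sketch below assumes exactly that; `sumOfPerFileMin` is a hypothetical helper, and the sample sizes are the two files reported earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum, over all files, of the smaller of the two compressed sizes.
// Assumption: both vectors list the same files in the same order.
std::uint64_t sumOfPerFileMin(const std::vector<std::uint64_t>& a,
                              const std::vector<std::uint64_t>& b) {
    std::uint64_t total = 0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        total += std::min(a[i], b[i]);
    }
    return total;
}
```

For 1842.png and 2248.png, passing the 4.32 sizes {344826, 413352} and the 15.08 sizes {336784, 425781} gives 750136 bytes, which is below both per-version totals (758178 and 762565).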
I am the developer of iCatalyst, a small project for image optimization and compression. I need 7-Zip's Deflate algorithm for compressing PNG images.
I understand that Deflate compression is not a priority for you, but I very much hope that you will help me with this matter.
Last edit: lorents 2015-10-04
It can be difficult to find the exact reason for the problem now.
I'll probably look into it later.
I suppose that there was some tradeoff when I changed that algorithm, and the new code worked better for some test files.
Maybe PNG files have some non-typical patterns.
Thank you! I will wait!
I have read lorents's post on the Ru-Board forum about the optimal use of the -mmc switch. It is not listed as a parameter for Deflate, but it does influence the compression ratio. In my tests, the size decreased monotonically to eventual saturation with increasing -mmc. I suggest mentioning this parameter in 7-Zip's manual in the future.
I made a new test using the application x128.ho.ua/ec-idat.zip
Optimization parameters -mx=9 -mfb=257 -mpass=5
Test 1:
4.32 - 1 185 943 426 bytes
4.43 - 1 184 940 381 bytes
4.56 - 1 185 699 315 bytes
4.57 - 1 185 576 422 bytes
9.20 - 1 186 613 793 bytes
15.09 - 1 185 934 035 bytes
advdef - 1 185 952 066 bytes
min (4.32 + 4.43) = 1 183 808 935 bytes
Test 2:
4.32 - 605 473 569 bytes
4.43 - 607 034 271 bytes
4.56 - 608 231 878 bytes
4.57 - 608 052 839 bytes
9.20 - 608 931 061 bytes
15.09 - 607 743 946 bytes
advdef - 605 305 934 bytes
min (4.32 + 4.43) = 604 905 857 bytes
Last edit: lorents 2015-10-22
Will the winner remain the same if you take the maximal values: -mfb258 -mpass15? Also I've noticed that -mpass14 is sometimes better than 15.
Concerning the latter fact, I have a question for Igor: what is a "pass"? Couldn't 7-Zip remember the best result from previous passes to prevent the compression ratio from degrading?
It's a combined parameter.
~~~~
static const unsigned kNumDivPassesMax = 10; // [0, 16); ratio/speed/ram tradeoff; use big value for better compression ratio.

m_NumDivPasses = props.numPasses;
if (m_NumDivPasses == 0)
  m_NumDivPasses = 1;
if (m_NumDivPasses == 1)
  m_NumPasses = 1;
else if (m_NumDivPasses <= kNumDivPassesMax)
  m_NumPasses = 2;
else
{
  m_NumPasses = 2 + (m_NumDivPasses - kNumDivPassesMax);
  m_NumDivPasses = kNumDivPassesMax;
}
~~~~
m_NumPasses - the number of passes for optimal Huffman parsing.
m_NumDivPasses - how many times we try to split a block into 2 smaller blocks. Sometimes small blocks are better than one big block.
Example: numPasses = 15;
m_NumPasses = 2 + (15 - 10) = 7;
m_NumDivPasses = 10;
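To see how every -mpass value maps onto the two internal counters, the quoted fragment can be wrapped in a small standalone function. This is only a sketch mirroring the logic above; `DeflatePasses` and `splitPasses` are illustrative names, not 7-Zip identifiers.

```cpp
#include <cassert>

// The two internal counters derived from the single user-visible "pass" value.
struct DeflatePasses {
    unsigned numPasses;     // passes for optimal Huffman parsing
    unsigned numDivPasses;  // attempts to split a block into two smaller blocks
};

// Mirrors the quoted logic: values up to 10 only raise the number of
// split attempts; values above 10 add extra parsing passes instead.
DeflatePasses splitPasses(unsigned requested) {
    const unsigned kNumDivPassesMax = 10;
    const unsigned div = (requested == 0) ? 1 : requested;
    if (div == 1) {
        return {1, 1};
    }
    if (div <= kNumDivPassesMax) {
        return {2, div};
    }
    return {2 + (div - kNumDivPassesMax), kNumDivPassesMax};
}
```

Under this mapping, -mpass=14 and -mpass=15 differ only in the number of parsing passes (6 versus 7) and both use 10 split attempts, so the difference between them observed in the thread comes from the parsing passes, not from block splitting.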
Block division is a great idea. I suspect that other Deflate optimization programs rely on a fixed block count (though this count is usually user-controlled).
If I understand the source code right, there is a possibility to select the match finder for Deflate (namely between bt3 and hc3). Please consider documenting this feature along with -mmc.
Maybe 7-Zip doesn't remember it.
I don't remember the details of the code now.
Igor Pavlov
Is it possible to update 7-Zip's Deflate to avoid this variation in compression?
I ran a test of compression speed:
test1:
4.32 - 0,932 s
4.43 - 1,057 s
4.56 - 0,772 s
4.57 - 0,775 s
9.20 - 0,738 s
15.09 - 1,472 s
test2:
4.32 - 2,650 s
4.43 - 2,610 s
4.56 - 2,107 s
4.57 - 2,069 s
9.20 - 1,984 s
15.09 - 3,240 s
Last edit: lorents 2015-10-24
What files (the number of files, size)?
What CPU?
What exact switches?
Try some big files also.
compression settings:
I tested different files. In almost all cases, the results are the same: version 15.09 is the slowest, and version 9.20 is the fastest.
Last edit: lorents 2015-10-25
There is a bug in some recent versions of 7-Zip for bzip2 and gzip archives.
If you specify parameters (-m switch) on the command line, 7-Zip parses the parameters:
x
mt
any_other_parameter
and 7-Zip 15.09 ignores any parameter after the first "any_other_parameter".
In your case, it probably used the default pass=10 (for x=9).
So wait for the next, fixed version of 7-Zip.
Or use the zip format instead of the gz format. I suppose that bug is not present in the ZIP code.
THANKS for the report!!!
THANKS!!!
The new version works perfectly.
Can I hope for an update to the Deflate compression algorithm?
New test (7-zip 15.10)
1842.png
7za a -tgzip -mx=9 -mfb=257 -mpass=5 1842.png.gz - 328 KB (336 727 byte)
7za a -tgzip -mx=9 -mfb=258 -mpass=15 1842.png.gz - 336 KB (344 109 byte)
So what is your question?
If you test different cases, try all combinations:
pass - [5 ... 15]
fb - [257 ... 258]
and then we can think about why some combination is better.
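The full grid suggested here (pass 5 to 15, fb 257 to 258) is 22 runs per file, so it is easier to generate the command lines than to type them. Below is a minimal sketch that only builds the strings, using the switches already shown in this thread; `buildTestCommands` and the output naming scheme (`.fbN.passM.gz`) are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Build one 7za command line per (fb, pass) combination for a given input file.
// The archive name encodes the parameters so that results do not overwrite each other.
std::vector<std::string> buildTestCommands(const std::string& input) {
    std::vector<std::string> commands;
    for (int fb = 257; fb <= 258; ++fb) {
        for (int pass = 5; pass <= 15; ++pass) {
            const std::string archive = input + ".fb" + std::to_string(fb) +
                                        ".pass" + std::to_string(pass) + ".gz";
            commands.push_back("7za a -tgzip -mx=9 -mfb=" + std::to_string(fb) +
                               " -mpass=" + std::to_string(pass) + " " +
                               archive + " " + input);
        }
    }
    return commands;
}
```

Each returned string can be written to a batch file or passed to a shell; comparing the sizes of the resulting archives then shows which combination wins for that file.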
I always thought that the higher the values of -mfb and -mpass, the higher the compression, but that is not so. I am now thinking about how to select the parameter values automatically.
As far as I know, the value of the block parameter strongly affects Deflate compression. Please tell me how the -mfb and -mpass parameters are related to the block parameter, and how a specific block value can be set in 7-Zip's Deflate.
Last edit: lorents 2015-11-20
But it also splits a block into sub-blocks.
Maybe you can increase the compression ratio for some files if you increase kMaxUncompressedBlockSize in the source code.
If I understood correctly, the block size works out to 7168 + (4096 * mNumDivPasses).
And what is mNumDivPasses?
One more thing: how can I set the number of blocks rather than the block size?
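For concreteness, the formula in the question above can be evaluated for a few values. This is only the poster's reading of the source, not something confirmed in the thread, and the thread does not say in which unit the size is counted; `blockSizeFromDivPasses` is a hypothetical name.

```cpp
#include <cassert>

// Block size according to the formula quoted in the question:
// 7168 + 4096 * mNumDivPasses. The unit is not stated in the thread.
unsigned blockSizeFromDivPasses(unsigned numDivPasses) {
    return 7168u + 4096u * numDivPasses;
}
```

If mNumDivPasses is capped at 10, as the code quoted earlier in the thread indicates, this formula cannot yield more than 48128, which would mean -mpass values above 10 no longer enlarge the block.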
Last edit: lorents 2015-11-21
mNumDivPasses seems to be the number of attempts to split Deflate blocks into smaller ones. I suppose you cannot directly specify the number of blocks in 7-Zip. kzip can do that; you may examine its source code to create your own Deflate.