
7-zip deflate (15.08 vs 4.32)

lorents
2015-10-04
2015-12-15
  • lorents

    lorents - 2015-10-04

    I tested old and new versions of 7-Zip and noticed something interesting.

    7za a -mx=9 -mfb=257 -mpass=5

    1842.png
    7-zip 15.08 - 1842.png.gz - 328 KB (336 784 bytes)
    7-zip 4.32 - 1842.png.gz - 336 KB (344 826 bytes)

    Difference: 2,33%

    2248.png
    7-zip 15.08 - 2248.png.gz - 415 KB (425 781 bytes)
    7-zip 4.32 - 2248.png.gz - 403 KB (413 352 bytes)

    Difference: 2,91%

    How can I make 15.08 compress no worse than 4.32?

     

    Last edit: lorents 2015-10-04
  • Igor Pavlov

    Igor Pavlov - 2015-10-04

    Probably something in the data triggers the encoder to take a non-optimal path.
    I'm not ready to investigate it further right now. Maybe later.

    I suppose you tested more PNG examples.
    How often does the new version give a worse compression ratio?

     
  • lorents

    lorents - 2015-10-04

    I ran two tests, on sets of 3800 and 8600 PNG images.

    Test 1 - 3800 PNG
    7-zip 4.32 - 1 186 798 089 bytes
    7-zip 15.08 - 1 186 056 550 bytes
    MIN (7-zip 4.32 + 7-zip 15.08) - 1 184 771 210 bytes

    Test 2 - 8600 PNG
    7-zip 4.32 - 605 934 986 bytes
    7-zip 15.08 - 607 720 425 bytes
    MIN (7-zip 4.32 + 7-zip 15.08) - 605 332 081 bytes
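
    The MIN rows appear to be the totals obtained by keeping, for each image, the smaller of the two outputs. A minimal C++ sketch of that bookkeeping, assuming the per-file sizes have already been measured; the helper name `sumOfPerFileMinimum` is made up for illustration and is not part of 7-Zip or iCatalyst:

    ```cpp
    #include <algorithm>
    #include <cstdint>
    #include <utility>
    #include <vector>

    // Each pair holds the compressed size of one file under two encoders,
    // for example 7-Zip 4.32 and 7-Zip 15.08. Hypothetical helper.
    std::uint64_t sumOfPerFileMinimum(
        const std::vector<std::pair<std::uint64_t, std::uint64_t>>& sizes)
    {
        std::uint64_t total = 0;
        for (const auto& s : sizes)
            total += std::min(s.first, s.second); // keep the smaller output
        return total;
    }
    ```

    For the two files from the first post, passing the pairs (344 826, 336 784) and (413 352, 425 781) keeps 336 784 from 15.08 and 413 352 from 4.32.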

    I am the developer of iCatalyst, a small project for optimizing and compressing images. It relies on 7-Zip's Deflate implementation to compress PNG images.

    I understand that the Deflate algorithm is not a priority for you, but I very much hope that you will help me with this.

     

    Last edit: lorents 2015-10-04
  • Igor Pavlov

    Igor Pavlov - 2015-10-05

    Right now it may be difficult to find the exact cause of the problem.
    I'll probably look into it later.

    I suppose there was some tradeoff when I changed that algorithm, and the new code worked better for some test files.
    Maybe PNG files have some atypical patterns.

     
  • lorents

    lorents - 2015-10-05

    Thank you! I will wait!

     
  • Shell

    Shell - 2015-10-05

    I have read lorents's post on the Ru-Board forum about the optimal use of the -mmc switch. It is not listed as a parameter for Deflate, but it does influence the compression ratio. In my tests, the size decreased monotonically with increasing -mmc until it eventually saturated. I suggest mentioning this parameter in 7-Zip's manual in the future.

     
  • lorents

    lorents - 2015-10-22

    I made a new test using the application x128.ho.ua/ec-idat.zip
    Optimization parameters -mx=9 -mfb=257 -mpass=5

    Test 1:
    4.32 - 1 185 943 426 bytes
    4.43 - 1 184 940 381 bytes
    4.56 - 1 185 699 315 bytes
    4.57 - 1 185 576 422 bytes
    9.20 - 1 186 613 793 bytes
    15.09 - 1 185 934 035 bytes
    advdef - 1 185 952 066 bytes

    min (4.32 + 4.43) = 1 183 808 935 bytes

    Test 2:
    4.32 - 605 473 569 bytes
    4.43 - 607 034 271 bytes
    4.56 - 608 231 878 bytes
    4.57 - 608 052 839 bytes
    9.20 - 608 931 061 bytes
    15.09 - 607 743 946 bytes
    advdef - 605 305 934 bytes

    min (4.32 + 4.43) = 604 905 857 bytes

     

    Last edit: lorents 2015-10-22
    • Shell

      Shell - 2015-10-22

      Will the winner remain the same if you take the maximal values: -mfb258 -mpass15? Also I've noticed that -mpass14 is sometimes better than 15.

      Concerning the latter, I have a question for Igor: what is a "pass"? Couldn't 7-Zip remember the best result from previous passes to prevent the compression ratio from degrading?

       
      • Igor Pavlov

        Igor Pavlov - 2015-10-23

        It's a combined parameter.

        ~~~~~
        static const unsigned kNumDivPassesMax = 10; // [0, 16); ratio/speed/ram tradeoff; use big value for better compression ratio.

        m_NumDivPasses = props.numPasses;
        if (m_NumDivPasses == 0)
          m_NumDivPasses = 1;
        if (m_NumDivPasses == 1)
          m_NumPasses = 1;
        else if (m_NumDivPasses <= kNumDivPassesMax)
          m_NumPasses = 2;
        else
        {
          m_NumPasses = 2 + (m_NumDivPasses - kNumDivPassesMax);
          m_NumDivPasses = kNumDivPassesMax;
        }
        ~~~~~

        m_NumPasses - the number of passes for optimal Huffman parsing.

        m_NumDivPasses - how many times we try to split a block into 2 smaller blocks. Sometimes small blocks are better than one big block.

        Example: numPasses = 15;
        m_NumPasses = 2 + (15 - 10) = 7;
        m_NumDivPasses = 10;
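
        The same branching can be tried out for other -mpass values with a small stand-alone function. This is only a sketch that mirrors the snippet quoted above; the names `DeflatePasses` and `splitPasses` are invented for illustration and are not 7-Zip identifiers, and the multi-statement constexpr function assumes C++14 or later:

        ```cpp
        struct DeflatePasses {
            unsigned numPasses;    // passes of optimal Huffman parsing (m_NumPasses)
            unsigned numDivPasses; // block-splitting attempts (m_NumDivPasses)
        };

        constexpr unsigned kNumDivPassesMax = 10;

        // Mirrors the branching in the quoted snippet; not the 7-Zip API.
        constexpr DeflatePasses splitPasses(unsigned requested)
        {
            unsigned div = (requested == 0) ? 1 : requested;
            if (div == 1)
                return {1, 1};
            if (div <= kNumDivPassesMax)
                return {2, div};
            return {2 + (div - kNumDivPassesMax), kNumDivPassesMax};
        }

        // Checks the worked example from the post: numPasses = 15.
        static_assert(splitPasses(15).numPasses == 7, "2 + (15 - 10)");
        static_assert(splitPasses(15).numDivPasses == 10, "capped at kNumDivPassesMax");
        ```

        Under this reading, -mpass values from 2 to 10 all give two parsing passes and differ only in the number of split attempts, while values above 10 add parsing passes and keep the split attempts at 10.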

         
        • Shell

          Shell - 2015-10-23

          Block division is a great idea. I suspect that other Deflate optimization programs rely on fixed block count (though this count is usually user-controlled).

          If I understand the source code right, there is a possibility to select the match finder for Deflate (namely between bt3 and hc3). Please, consider documenting this feature along with -mmc.

           
  • Igor Pavlov

    Igor Pavlov - 2015-10-23
    • Couldn't 7-Zip remember the best result from previous passes to prevent compression ratio from degrading?

    Maybe 7-Zip doesn't remember it.
    I don't remember the details of the code now.

     
  • lorents

    lorents - 2015-10-23

    Igor Pavlov
    Is it possible to update 7-Zip's Deflate to avoid this variation in compression?

     
  • lorents

    lorents - 2015-10-24

    I ran a compression speed test.

    test1:
    4.32 - 0,932 s
    4.43 - 1,057 s
    4.56 - 0,772 s
    4.57 - 0,775 s
    9.20 - 0,738 s
    15.09 - 1,472 s

    test2:
    4.32 - 2,650 s
    4.43 - 2,610 s
    4.56 - 2,107 s
    4.57 - 2,069 s
    9.20 - 1,984 s
    15.09 - 3,240 s

     

    Last edit: lorents 2015-10-24
    • Igor Pavlov

      Igor Pavlov - 2015-10-25

      What files (number of files, size)?
      What CPU?
      What exact switches?

      Try some big files as well.

       
  • lorents

    lorents - 2015-10-25

    compression settings:

    7za a -tgzip -mx=9 -mfb=257 -mpass=5
    

    I tested different files. In almost all cases the results are the same: version 15.09 is the slowest and version 9.20 is the fastest.

     

    Last edit: lorents 2015-10-25
  • Igor Pavlov

    Igor Pavlov - 2015-10-25

    There is a bug in some recent versions of 7-Zip for bzip2 and gzip archives.
    If you specify parameters (-m switch) on the command line, 7-Zip parses the parameters:
    x
    mt
    any_other_parameter

    and 7-Zip 15.09 ignores any parameter after the first "any_other_parameter".

    In your case, it probably used the default pass=10 (for x=9).

    So wait for the next, fixed version of 7-Zip.
    Or use the zip format instead of the gz format. I suppose that bug is not present in the ZIP code.

    THANKS for report!!!

     
  • lorents

    lorents - 2015-11-01

    THANKS!!!
    The new version works perfectly.

    Can I hope for an update to the Deflate compression algorithm?

     
  • lorents

    lorents - 2015-11-04

    New test (7-zip 15.10)

    1842.png
    7za a -tgzip -mx=9 -mfb=257 -mpass=5 1842.png.gz - 328 KB (336 727 bytes)
    7za a -tgzip -mx=9 -mfb=258 -mpass=15 1842.png.gz - 336 KB (344 109 bytes)

     
    • Igor Pavlov

      Igor Pavlov - 2015-11-04

      So what is your question?
      If you test different cases, try all combinations:
      pass - [5 ... 15]
      fb - [257 ... 258]
      and then we can think why some combination is better.
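
      The full grid suggested here is 2 x 11 = 22 runs per file. A minimal sketch that only builds the command lines, assuming the same `7za a -tgzip -mx=9` invocation used earlier in the thread; the function name `buildTestCommands` and the archive naming pattern are invented for illustration:

      ```cpp
      #include <string>
      #include <vector>

      // Builds one command line per (fb, pass) combination in the suggested ranges.
      // `input` is the file to compress; the output archive name is a made-up pattern.
      std::vector<std::string> buildTestCommands(const std::string& input)
      {
          std::vector<std::string> commands;
          for (int fb = 257; fb <= 258; ++fb)
              for (int pass = 5; pass <= 15; ++pass)
                  commands.push_back(
                      "7za a -tgzip -mx=9 -mfb=" + std::to_string(fb) +
                      " -mpass=" + std::to_string(pass) + " " +
                      input + ".fb" + std::to_string(fb) +
                      ".pass" + std::to_string(pass) + ".gz " + input);
          return commands;
      }
      ```

      Each string can then be run from a script and the resulting archive sizes recorded next to its fb and pass values.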

       
      • Shell

        Shell - 2015-11-04

        I have run the tests for lorents (size increasing):

        fb    pass    size
        257   5       336727
        257   11      336744
        257   6-10    336784
        257   15      337912
        257   12      338046
        258   14      338178
        258   13      338198
        258   12      338252
        258   11      338349
        258   15      344109
        258   5-10    344220
        257   14      344616
        257   13      344633

        I cannot believe fb257 is better than 258! In a Deflate stream, a match of 258 bytes needs no extra bits in its encoding, whereas a match of 257 bytes needs 5 extra bits per sequence.
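
        The extra-bit counts come from the length table in RFC 1951 (section 3.2.5), where length 258 has its own code 285 and lengths 227 to 257 share code 284. A small sketch of that lookup; the function name `lengthExtraBits` is made up here, and it counts only the extra bits, not the Huffman code for the length symbol itself (C++14 or later for the multi-statement constexpr):

        ```cpp
        // Extra bits that follow the length code for a match length in 3..258,
        // per the length table in RFC 1951, section 3.2.5. Returns -1 if out of range.
        constexpr int lengthExtraBits(int length)
        {
            if (length < 3 || length > 258) return -1;
            if (length == 258) return 0;   // code 285: dedicated code, no extra bits
            if (length <= 10)  return 0;   // codes 257..264
            if (length <= 18)  return 1;   // codes 265..268
            if (length <= 34)  return 2;   // codes 269..272
            if (length <= 66)  return 3;   // codes 273..276
            if (length <= 130) return 4;   // codes 277..280
            return 5;                      // codes 281..284 cover 131..257
        }

        static_assert(lengthExtraBits(258) == 0, "length 258 uses code 285");
        static_assert(lengthExtraBits(257) == 5, "length 257 uses code 284");
        ```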

         
      • lorents

        lorents - 2015-11-04

        I always thought that the higher the values of -mfb and -mpass, the better the compression, but that is not the case. I am now thinking about how to select the parameter values automatically.
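
        One simple form of automatic selection is brute force: compress with every combination, record the sizes, and keep the smallest. A minimal sketch of the selection step only, assuming the sizes have already been measured; `Trial` and `pickSmallest` are hypothetical names, not part of 7-Zip or iCatalyst:

        ```cpp
        #include <algorithm>
        #include <vector>

        struct Trial {
            int fb;    // value passed to -mfb
            int pass;  // value passed to -mpass
            long size; // resulting compressed size in bytes
        };

        // Returns the trial with the smallest output; `trials` must not be empty.
        Trial pickSmallest(const std::vector<Trial>& trials)
        {
            return *std::min_element(trials.begin(), trials.end(),
                [](const Trial& a, const Trial& b) { return a.size < b.size; });
        }
        ```

        Fed with the rows from the table above, this returns fb 257, pass 5 (336727 bytes) for 1842.png. The cost is one full compression run per combination and per file.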

         
  • lorents

    lorents - 2015-11-20

    As far as I know, the block parameter strongly influences Deflate compression. Could you tell me how the -mfb and -mpass parameters relate to the block parameter, and how a specific block value can be set in 7-Zip's Deflate?

     

    Last edit: lorents 2015-11-20
  • Igor Pavlov

    Igor Pavlov - 2015-11-21
      m_ValueBlockSize = (7 << 10) + (1 << 12) * m_NumDivPasses;
    

    But it also splits the block into subblocks.

    static const unsigned kNumDivPassesMax = 10; // [0, 16); ratio/speed/ram tradeoff; use big value for better compression ratio.
    static const UInt32 kDivideCodeBlockSizeMin = (1 << 7); // [1, (1 << 32)); ratio/speed tradeoff; use small value for better compression ratio.
    static const UInt32 kDivideBlockSizeMin = (1 << 6); // [1, (1 << 32)); ratio/speed tradeoff; use small value for better compression ratio.
    
    static const UInt32 kMaxUncompressedBlockSize = ((1 << 16) - 1) * 1; // [1, (1 << 32))
    

    Maybe you can increase compression ratio for some files, if you increase kMaxUncompressedBlockSize in source code.
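
    To see what the m_ValueBlockSize expression gives for a particular m_NumDivPasses, it can be evaluated on its own. A small sketch that repeats the quoted expression; the function name `valueBlockSize` is invented, and the unit of the result is not stated in the post:

    ```cpp
    #include <cstdint>

    // Same expression as the m_ValueBlockSize line quoted above:
    // (7 << 10) + (1 << 12) * m_NumDivPasses, i.e. 7168 + 4096 * numDivPasses.
    constexpr std::uint32_t valueBlockSize(std::uint32_t numDivPasses)
    {
        return (7u << 10) + (1u << 12) * numDivPasses;
    }

    static_assert(valueBlockSize(1) == 11264, "7168 + 4096");
    static_assert(valueBlockSize(10) == 48128, "7168 + 40960");
    ```

    Since m_NumDivPasses is capped at kNumDivPassesMax = 10, the value does not exceed 48128 unless that constant is changed in the source.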

     
  • lorents

    lorents - 2015-11-21

    If I understood correctly, the block size works out to 7168 + (4096 * m_NumDivPasses).
    And what is m_NumDivPasses?
    One more thing: how can I set the number of blocks rather than the block size?

     

    Last edit: lorents 2015-11-21
    • Shell

      Shell - 2015-11-22

      m_NumDivPasses seems to be the number of attempts to split Deflate blocks into smaller ones. I suppose you cannot directly specify the number of blocks in 7-Zip. kzip can do that; you may examine its source code to create your own Deflate.

       