LZMA2 Multi-Core Compression bug results in larger file size.

2013-07-05
  • Robert Readman
    2013-07-05

    Hello,

    Tested the same file on 3 systems: 1x E8400 CPU (Dual-Core), 1x E7500 (Dual-Core), and a server with 2x L5430 (Quad-Core x2).
    The reason for testing on more than one system came after I found something very odd.

    I backed up 3 of our accountancy files into a .tar, resulting in the following:

    file.tar came to 346.2 MiB.
    I then used the E8400 on Windows XP (32-bit) to create a .tar.xz with the maximum settings:
    Archive Format .xz, Compression Level Ultra, Compression Method LZMA2, Dictionary Size 64 MB, Word Size 273, Number of CPU Threads 2.
    The resulting file.tar.xz was 15.6 MiB.

    I then tried this on Windows Server SBS 2011 (64-bit) to see if I could get the file smaller by raising the Dictionary Size from 64 MB to 256 MB; the only other setting changed was the Number of CPU Threads, set to 8, as it has 2x Quad-Core CPUs.
    To my amazement the resulting file.tar.xz was 21 MiB.
    So, still on the server, I set the dictionary size back to 64 MiB to see if it would produce the same size as earlier on Windows XP.
    The resulting file was 22.6 MiB.
    Leaving everything the same as before but dropping from 8 cores to 4:
    The resulting file was 22.6 MiB.
    Leaving everything the same as before but dropping from 4 cores to 2:
    The file was now identical (byte for byte) to the one I created on Windows XP.
    The resulting file.tar.xz was 15.6 MiB.

    I then wondered if this was specific to the server, but I also have a Windows 7 Pro (64-bit) machine with the E7500.
    Obviously I could only test the same settings with 2 cores, and this produced a byte-for-byte identical file.tar.xz of 15.6 MiB.

    Doing all of the same (i.e. all with a 64 MB dictionary size) but with everything set to 1 core, all systems create the same file size of 15.6 MiB (in fact this produces a slightly smaller file, by just a few kilobytes).

    So is this a bug? 1 core and 2 cores appear to be fine on both the 32-bit and 64-bit versions, but once you try 4 or 8 cores the file size just jumps. I thought it would simply use more cores to process the data faster. Can Igor confirm the results above?

    Finally, on the server, I set the dictionary size to 256 MiB but left the cores set at 2.
    The resulting file was 13.1 MiB, instead of the 21 MiB with the cores set at 8.
    The 13.1 MiB is what I was expecting in the first place, compared to the 15.6 MiB.

    At least I know for now, to always use single core or dual core, but not 4 or 8.
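    For anyone who wants to reproduce the mechanism outside 7-Zip, here is a rough sketch using Python's lzma module as a stand-in for 7-Zip's LZMA2. The assumption (not confirmed against 7-Zip's source) is that each thread compresses its own chunk as an independent stream, so matches cannot cross chunk boundaries; 7-Zip's real chunking details differ.

    ```python
    # Sketch of the multi-thread size jump: compressing in
    # independent chunks (one per "thread") loses cross-chunk
    # matches. Python's lzma stands in for 7-Zip's LZMA2.
    import lzma
    import random

    random.seed(0)
    # 1 MB of data with long-range repetition, like a .tar of
    # similar files: one random 100 KB block repeated 10 times.
    block = bytes(random.randrange(256) for _ in range(100_000))
    data = block * 10

    def chunked_size(data: bytes, n_chunks: int) -> int:
        """Total size when each chunk is compressed independently
        (no shared dictionary across chunks)."""
        step = -(-len(data) // n_chunks)  # ceiling division
        return sum(
            len(lzma.compress(data[i:i + step], preset=6))
            for i in range(0, len(data), step)
        )

    one_thread = chunked_size(data, 1)
    four_threads = chunked_size(data, 4)
    print(one_thread, four_threads)
    # four_threads comes out much larger: each chunk must store
    # the random block again instead of referencing an earlier
    # copy held in the dictionary.
    ```

    The numbers will not match the .tar.xz results above, but the direction of the effect (more chunks, bigger output) should.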

    Version 9.25 in all cases: the 32-bit build on XP, and the 64-bit build of 7-Zip on Win7 Pro and SBS 2011.

    Thanks for reading; hopefully the above is detailed enough for others to run similar tests on their own files.

    Robert Readman.

     
    Last edit: Robert Readman 2013-07-05
  • Shell
    2013-07-05

    It is not a bug. The manual says: "If LZMA2 is set <...> it doesn't split stream to chunks. So you can get different compression ratio for different number of threads. You can get the best compression ratio, when you use 1 or 2 threads."

    If I understand the algorithm of multithreaded compression correctly, archive size can never benefit from additional threads. Usually, however, the size increases only slightly, whereas the compression time can drop almost proportionally.

    Concerning your case, check the chunk size that 7-Zip uses. If you have multiple files, you can roughly estimate the chunk size as the sum of the uncompressed sizes of the files in a single block (the block number is displayed in 7-Zip File Manager). Try increasing the block size up to the total size of the files - this should improve compression (at the expense of time, I think, since the threads will no longer be able to process different data simultaneously). If you need the best possible compression, use a single thread.
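    The block-size effect can be sketched the same way: compress the same data in independent chunks of increasing size and compare the totals. Python's lzma stands in for 7-Zip's LZMA2 here, and the chunk sizes are illustrative, not 7-Zip's actual defaults.

    ```python
    # Sketch: total compressed size shrinks as the chunk (block)
    # size grows, because later repeats can reference earlier
    # data held in the same stream's dictionary.
    import lzma
    import random

    random.seed(1)
    # 1 MB of data with a repeat every 50 KB.
    block = bytes(random.randrange(256) for _ in range(50_000))
    data = block * 20

    def total_size(data: bytes, chunk: int) -> int:
        """Compress data in independent chunks of `chunk` bytes."""
        return sum(
            len(lzma.compress(data[i:i + chunk], preset=6))
            for i in range(0, len(data), chunk)
        )

    for chunk in (125_000, 250_000, 500_000, len(data)):
        print(f"chunk={chunk:>9}: {total_size(data, chunk)} bytes")
    # The whole-file "chunk" wins: every repeat of the block
    # after the first is encoded as a cheap back-reference.
    ```

    This mirrors the advice above: pushing the block size toward the total input size trades parallelism for ratio, and a single thread (one block) gives the best ratio.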