
#1095 7-zip should use parallel decompress to avoid bad behavior

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2015-11-02
Created: 2012-04-23
Creator: Astara
Private: No

7-zip runs abysmally slowly on modern machines when decompressing, when it could easily run 4-6x faster on most machines and 8-12x faster on many.

As it isn't even close to being I/O bound (it writes decompressed files at 13 MB/s to disks capable of over 50 times that speed), this is like Detroit building a car for a modern highway that can only do 3 MPH...

Most people would consider that unacceptable performance -- even a bug -- but some will respond that it is performing according to its design specification, and by that definition it isn't a bug. From a performance standpoint, though, I don't think it would pass modern QA standards: I was getting 10 MB/s ten years ago with gzip, and that was on compression... here we are talking about decompression, which is supposed to be faster...

I guess I don't get why decompression is so slow...?

Anyway, as 'implied', it seems you wanted this added as a feature request, since running with an 86-93% performance penalty isn't considered a bug.

Ok.

I don't care... just as long as it gets addressed at some point -- but obviously, if it was important enough for me -- I'd do it myself and send you the patch or release it or something...;-)

I have a reasonably fast machine, a 3.33 GHz 6-core Xeon, yet when decompressing a file
7-zip achieves an astounding rate of 15 MB/s... fully broken.

The disk is a RAID0 of SSDs easily capable of well over 100 MB/s (200-400 MB/s is more typical).

Even a single HD can usually write over 100 MB/s, so parallel decompression
on a 6-core processor would not, for most people, saturate their HD. With increasingly
common RAID and SSD setups, it wouldn't even touch 1/3-1/4 of the available bandwidth
(my server has a 1 GB/s write rate). So 15 MB/s when unpacking a 4 GB archive
is unreasonably slow given current I/O speeds and parallel options.

This would ***presume*** the archive was not packed as one continuous block,
but instead used block sizes that made parallelism possible -- something I've learned
to do since I found (surprisingly) that a larger block size doesn't always yield a smaller archive.
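
To illustrate the idea (this is just my sketch, not 7-Zip's code; BlockInfo, read_block and decompress_block are hypothetical stand-ins for the real block table and LZMA2 block decoder), independently compressed blocks are separate units of work that can each go to their own thread:

    #include <cstddef>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Hypothetical description of one independent block, as read from the headers.
    struct BlockInfo {
        uint64_t packed_offset;   // where the compressed block starts in the archive
        uint64_t packed_size;     // compressed size of the block
        uint64_t unpacked_size;   // decompressed size, if recorded in the headers
    };

    // Placeholders for the real archive reader and LZMA2 block decoder.
    std::vector<uint8_t> read_block(const BlockInfo& block);
    std::vector<uint8_t> decompress_block(const std::vector<uint8_t>& packed);

    // Because the blocks share no decoder state, each one can be decoded on its own thread.
    std::vector<std::vector<uint8_t>> decompress_all(const std::vector<BlockInfo>& blocks) {
        std::vector<std::vector<uint8_t>> out(blocks.size());
        std::vector<std::thread> workers;
        for (std::size_t i = 0; i < blocks.size(); ++i)
            workers.emplace_back([&out, &blocks, i] {
                out[i] = decompress_block(read_block(blocks[i]));
            });
        for (auto& t : workers) t.join();
        return out;
    }

A real implementation would of course cap the number of in-flight blocks, since each decoder needs its own dictionary-sized working memory.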

By default I used a 1 GB block size (64 MB dictionary, word size 128)... I found this was quite a bit less than optimal.
(I'm running a program that often needs to access/extract parts of a .7z archive to repair damage
created when one switches around load orders (game = Oblivion). I figured a smaller block size
would mean it didn't have to unpack as many blocks to access the files it needed, whereas with a 1 GB
or solid block size it would need to decompress linearly from the beginning, as I understand it.)

So I ran several tests. Several smaller block sizes beat my previous defaults, with the best overall
being a solid archive at max word size (273) and a 1 GB dictionary. Chart:
src = 346,735,328 bytes (338601K, 338668K alloc)
test compressions
BS(MB)   dic(MB)   Word   size(K)   Overhead vs. solid
1024     64        128    270796    1.77% larger
64       32        64     269708    1.36% larger
64       32        32     269696    1.36% larger
64       32        24     269696    1.36% larger
64       32        16     269692    1.36% larger
64       48        32     269528    1.30% larger
64       48        24     269528    1.30% larger
64       48        16     269520    1.29% larger
64       64        32     269468    1.27% larger
64       64        24     269468    1.27% larger
64       64        16     268462    0.89% larger
solid    1024      273    266082    0.00% larger
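
(For reference, the overhead column is measured against the solid result, e.g. (270796 - 266082) / 266082 ≈ 1.77%.)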

----
Of course this would vary depending on the data compressed, but a 64 MB
block size with a 64 MB dictionary and a word size of 16 gave results darn close to optimal.

So... For these files I'm using a 64MB block size.

My test file above was 5.<fraction> x 64MB, meaning it likely could have been
decompressed in parallel given enough memory to buffer things.

That's something that would have to be taken into account when activating multiple decompression
threads unless the final uncompressed size of each block is known in advance.
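
If the headers do record each block's unpacked size (which I believe .7z does, but take that as my assumption), the bookkeeping is just a running sum. A minimal C++ sketch, again my own illustration rather than anything in 7-Zip:

    #include <cstdint>
    #include <numeric>
    #include <vector>

    // Given each block's unpacked size (assumed available from the archive headers),
    // compute the byte offset in the output where each block's data belongs.
    std::vector<uint64_t> block_output_offsets(const std::vector<uint64_t>& unpacked_sizes) {
        std::vector<uint64_t> offsets(unpacked_sizes.size());
        // Exclusive prefix sum: offsets[i] = unpacked_sizes[0] + ... + unpacked_sizes[i-1].
        std::exclusive_scan(unpacked_sizes.begin(), unpacked_sizes.end(),
                            offsets.begin(), uint64_t{0});
        return offsets;
    }
    // A worker that finishes block i can then write its output at offsets[i]
    // (e.g. with a positioned write such as pwrite) without buffering behind,
    // or waiting for, the blocks ahead of it.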

Regardless, the entire file's final uncompressed size *should* be pre-allocated (NOTE --
this should be done even in a non-parallel decompress, so the OS allocates one large
chunk to hold the whole file). Another risk of writing in small chunks is that some other
process might ask for allocations in the middle of your file, fragmenting it.
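
On POSIX systems, for example, that pre-allocation step could look something like the sketch below (my illustration, not what 7-Zip currently does; Windows has analogous calls such as SetFilePointerEx followed by SetEndOfFile):

    #include <fcntl.h>    // open, posix_fallocate
    #include <unistd.h>   // close
    #include <cstdint>

    // Reserve the final uncompressed size up front so the filesystem can give the
    // file one large extent instead of growing it (and fragmenting it) chunk by chunk.
    int create_preallocated(const char* path, uint64_t final_size) {
        int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (posix_fallocate(fd, 0, static_cast<off_t>(final_size)) != 0) {
            close(fd);
            return -1;
        }
        return fd;   // caller writes the decompressed data, then closes the fd
    }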

In any event, you can see that even with my small test file, 6 threads would help,
and on my server, with the 1 GB/s RAID and 12 threads, parallelism would kick butt!

;-)

Thanks,
Astara

Discussion

  • Igor Pavlov

    Igor Pavlov - 2012-04-23

    There are many reasons why LZMA decompression is still single-threaded:
    1) It's simpler.
    2) It's better for RAM consumption.
    3) Most users use only one HDD, and an HDD is not very fast when you decompress to the same HDD. For example, LZMA can decompress about 15 MB/s of compressed input into 30-60 MB/s of uncompressed output, and that 15 MB/s read plus 50 MB/s write is close to HDD speed.
    4) Modern CPUs can now increase their frequency when only one thread works, and that thread gets the full L3 cache to use.

    If you need 1 GB/s decompression, you must use something else.

    What data do you compress (type and size)?
    Did you use LZMA2 method?

     
    • Shmerl

      Shmerl - 2014-07-30

      These answers aren't convincing.

      > 1) It's simpler.

      Yes, but parallel decompression can achieve much better performance. Single-threaded compression is also simpler, yet 7z offers a parallel option there.

      > 2) It's better for RAM consumptation.

      RAM sizes gradually increase and modern machines can have gigabytes of RAM. So not implementing parallel decompression because of RAM concerns doesn't make sense (especially if it's optional).

      > 3) Most users use only one HDD. And HDD is not too fast when you decompress to same hdd.

      Not if it's an SSD or even a fast enough rotational HDD. This isn't so black and white, and different scenarios can still see big advantages from parallel decompression.

      > 4) modern CPUs now can increase CPU frequency when only one thread works.

      It still can't beat decompressing with many threads.

      All in all, not having a parallel decompressor is a very big downside.

       
  • Jacob G

    Jacob G - 2012-12-13

    Even if LZMA decompression remains single-threaded, there are large improvements that can be made by making simple zip decompression multi-threaded. Many years back, UberZip, a one-off Java proof of concept for multi-threaded unzip, was written. You can try it out here: http://www.matthicks.com/2008/01/multithreaded-unzip.html

    The speed-up that is available is impressive. Implementing multiple threads, even for simple unzip, could make 7-Zip the fastest unzip utility available.
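
    For illustration only (my sketch, not UberZip's or 7-Zip's code): because each zip entry is compressed independently, a pool of workers can simply claim entry indices from a shared counter. extract_entry() below is a hypothetical stand-in for the actual unzip routine, and each worker would need its own handle on the archive, since most zip readers aren't safe for concurrent use through one handle.

        #include <atomic>
        #include <cstddef>
        #include <thread>
        #include <vector>

        // Hypothetical per-entry extraction routine (opens its own view of the archive).
        void extract_entry(const char* archive_path, std::size_t entry_index);

        void extract_all(const char* archive_path, std::size_t entry_count, unsigned workers) {
            std::atomic<std::size_t> next{0};
            std::vector<std::thread> pool;
            for (unsigned w = 0; w < workers; ++w) {
                pool.emplace_back([&] {
                    // Each worker repeatedly claims the next unextracted entry index.
                    for (std::size_t i = next.fetch_add(1); i < entry_count; i = next.fetch_add(1))
                        extract_entry(archive_path, i);
                });
            }
            for (auto& t : pool) t.join();
        }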

     
  • mauromol

    mauromol - 2015-11-02

    I also think that nowadays parallel decompression would be valuable. The increase in frequency of a single core on modern processors won't ever be as effective as using all the cores of the CPU.
    The HDD speed limit doesn't justify the absence of parallel decompression, for multiple reasons:
    - when you just want to test archives, you don't need to write
    - even if you need to write, you may write to a different hard disk
    - SSDs are very affordable and common nowadays and they even work much better with parallel streams

    RAM is also very cheap, and I think decompression will consume less memory than compression in any case, won't it?

     
