I have a reasonably fast machine (3.33GHz, 6-core Xeon), yet on decompressing a file
7-zip achieves an astounding rate of 15MB/s... that's just broken.
The disk is a RAID0 of SSDs easily capable of well over 100MB/s (200-400 is more typical).
Even a single HD can usually write over 100MB/s, so parallel decompression
on a 6-core processor would, for most people, not saturate their HD. With increasingly
common RAID and SSD setups, it wouldn't even touch 1/3-1/4 of the bandwidth (my server
has a 1GB/s write rate). 15MB/s when unpacking a 4GB archive
is unreasonably slow given current I/O speeds and parallel options.
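To put rough numbers on it (a back-of-envelope sketch; the 4GB archive and 15MB/s rate are from above, and perfect scaling across threads is an assumption, not a measurement):

```python
archive_mb = 4096     # ~4GB archive, from above
single_rate = 15      # observed 7-zip decompress rate, MB/s
threads = 6           # cores on this machine

serial_minutes = archive_mb / single_rate / 60
parallel_rate = single_rate * threads  # assumes perfect scaling

print(f"serial: {serial_minutes:.1f} min at {single_rate} MB/s")
print(f"6 threads: {parallel_rate} MB/s -- still below a 100 MB/s HD")
```

Even with ideal 6x scaling, 90MB/s wouldn't saturate a single ordinary hard drive, let alone a RAID of SSDs.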
This would ***presume*** the archive was not packed as 1 continuous block,
but used, say, block sizes that made parallelism possible -- something I've learned
to do since I found (surprisingly) that a larger block size isn't always smaller.
By default I used a 1GB block size (64MB dictionary, word size 128)... I found this was quite a bit less than optimal.
(I'm running a program that often needs to access/extract parts of a .7z archive to repair damage
created when one switches around load orders (game=Oblivion). I figured a smaller block size
would mean it didn't have to unpack as many blocks to access the files it needed; with a 1GB
or solid block size, it would need to decompress linearly from the beginning, as I understand it.)
So I ran several tests. Several smaller block sizes beat my previous defaults, with the best
being a solid archive at max word size and a 1GB dictionary. Chart:
src = 346,735,328 bytes (338601K; 338668K allocated)

BS(MB)  dic(MB)  word  size(K)  overhead
 1024     64     128   270796   1.77% larger
   64     32      64   269708   1.36% larger
   64     32      32   269696   1.36% larger
   64     32      24   269696   1.36% larger
   64     32      16   269692   1.36% larger
   64     48      32   269528   1.30% larger
   64     48      24   269528   1.30% larger
   64     48      16   269520   1.29% larger
   64     64      32   269468   1.27% larger
   64     64      24   269468   1.27% larger
   64     64      16   268462   0.89% larger
solid   1024     273   266082   0.00% larger
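The overhead column is just each size relative to the solid baseline; a quick check (sizes in K, copied from the chart):

```python
solid = 266082  # solid / 1GB dict / word 273 baseline, in K
for bs, dic, word, size in [(1024, 64, 128, 270796),
                            (64, 64, 16, 268462)]:
    pct = (size / solid - 1) * 100
    print(f"BS={bs} dic={dic} word={word}: {pct:.2f}% larger")
```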
Of course this would vary depending on the data compressed, but a 64MB
block size with a 64MB dict and word size of 16 gave results darn close to optimal.
So... For these files I'm using a 64MB block size.
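For reference, those settings map onto 7z switches roughly like this (a sketch -- check your 7-zip version's docs; -ms is the solid block size, -md the dictionary size, -mfb the word size):

```shell
# pack with 64MB solid blocks, 64MB dictionary, word size 16
7z a -ms=64m -md=64m -mfb=16 archive.7z somedir/
```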
My test file above was 5.<fraction> x 64MB blocks, meaning it likely could have been
decompressed in parallel given enough memory to buffer things.
That's something that would have to be taken into account when activating multiple decompression
threads unless the final uncompressed size of each block is known in advance.
Regardless, the entire file's final uncompressed size *should* be pre-allocated (NOTE --
this should be done even in a non-parallel decompress, so the OS allocates 1 large
chunk to hold the whole file). Another risk if you write small chunks: some other
process might ask for allocations in the middle of your file and fragment it.
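A sketch of the idea, assuming each block's uncompressed size is known up front (zlib stands in for LZMA here, and the blocks/sizes interface is invented for illustration -- real 7-zip would read these from the archive's block table):

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

def parallel_unpack(blocks, sizes, out_path):
    """Decompress independently-compressed blocks in parallel,
    writing each one at its known offset in a pre-allocated file."""
    # Pre-allocate the whole file first, so the OS can hand us one
    # large chunk instead of interleaving it with other writers.
    with open(out_path, "wb") as f:
        f.truncate(sum(sizes))
    offsets = [sum(sizes[:i]) for i in range(len(blocks))]

    def unpack(i):
        data = zlib.decompress(blocks[i])
        assert len(data) == sizes[i]
        # each thread uses its own handle so seeks don't race
        with open(out_path, "r+b") as f:
            f.seek(offsets[i])
            f.write(data)

    with ThreadPoolExecutor(max_workers=os.cpu_count()) as ex:
        list(ex.map(unpack, range(len(blocks))))
```

Since every block lands at a fixed offset, the writes are independent and no inter-thread ordering is needed -- which is exactly why the per-block uncompressed sizes have to be known (or buffered) in advance.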
In any event, you can see that even on my small test file, 6 threads would help,
and on my server, with the 1GB/s RAID and 12 threads, parallelism would kick butt!