Any ideas on when multi-threaded compression will be available?
It's one of the most important features to add after XZ Utils 5.0.0 is out. I cannot say anything more exact, since it depends on how quickly or slowly I get things done.
Note that 7-Zip and p7zip 9 betas support the .xz format. They can use multi-threading (not limited to two threads) when compressing into the .xz format. It may be worth trying before XZ Utils catches up.
Now that 5.0.1 is out, I'm looking for a changelog, or a roadmap or something to indicate when xz will support multithreading…
I'm not able to give any schedule. This is a hobby for me. Recently I have been able to work on xz only a little.
The positive thing is that you can use p7zip, lxz, or pxz to do threaded compression. p7zip is included in most distros already.
Thank you for the response. Unfortunately, p7zip is only multithreaded in some distributions, and I was not able to get multithreaded operation from any of those packages (p7zip, lxz, pxz) on the platforms that I care about… So I created another one.
threadzip ( http://code.google.com/p/threadzip/ ) is implemented in Python and is therefore highly cross-platform. Unfortunately, the only compression library that ships with Python by default is zlib (as used by gzip), so you have to add pylzma as a separate module. This was very easy on some machines, but I haven't gotten it working on Solaris (yet).
The problem with most parallel implementations is that they are not drop-in replacements for the non-parallel ones. The program should take the same options and work well with stdin/stdout.
My pixz utility does multi-threaded compression and decompression, with a file format fully compatible with existing xz. I'd love any feedback.
Gah, I hate BBCode. That link should be https://github.com/vasi/pixz
Thanks, vasi and rahvee! I added these programs to the Arch Linux AUR:
https://aur.archlinux.org/packages.php?ID=49137
https://aur.archlinux.org/packages.php?ID=49138
Hi vasi,
Could you please support stdin and stdout ?
I am after a multithreaded decompressor (like pigz) for lzma/xz that works well with piped input and output. xz works, but even the latest 5.1.1 alpha doesn't do multithreaded decompression.
Thanks!
Threadzip does multithreaded compression and decompression on stdin/stdout.
If you want to use lzma with threadzip, you need to install pylzma.
Anyone working on an MT implementation should just put their effort toward the xz-utils code base. No other temporary implementation will reach that level of accomplishment.
I'd also like to know: what is the major implementation problem? Is MT support not in the LZMA SDK? More to the point, the code is not 20 years old like gzip, so why wasn't MT support considered from the start?
The problem is that I haven't worked on the code much in the past months. I don't use the LZMA SDK code as is: liblzma in XZ Utils has a different API (buffer-to-buffer, no callbacks), which affects the internal implementation too.
The current git snapshot is already kind of usable, though. RAM usage would be lower in a more sophisticated implementation without affecting anything else, and the progress indicator doesn't work very well. But it does compress in parallel, has decent performance, and shouldn't corrupt data. :-) Just to be safe, I still don't recommend it for production use.
Is this feature available now?
Last edit: Broken.zhou 2013-09-02
There is 5.1.2alpha which has threading. A few minor fixes have been made after that release and they are available in the git repository.
I still don't know when a stable release will be made. I don't plan many new things before 5.2.0; I just need to get it done.
I have tested the new version of xz and found it really great! Thanks for the work!
In order to use the multithreading feature, I need to add
XZ_DEFAULTS="--threads 0"
to my environment. Will this eventually be the default setting?
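To make that concrete, here is a minimal sketch for a POSIX shell; it assumes xz 5.2+, where `--threads=0` (or `-T0`) means "use as many threads as there are cores" and `XZ_DEFAULTS` is read before the command line, so explicit options still override it:

```shell
# Make threaded compression the default for this shell session
# (put the export in your shell profile to make it persistent).
export XZ_DEFAULTS="--threads=0"

# Equivalent one-off invocation without touching the environment:
xz -T0 bigfile
```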
Not in 5.2.0 and maybe never. The current implementation of threading makes the compression slightly worse. I currently guess that I will get fewer complaints by keeping the old behavior as the default, at least for the foreseeable future.
Can the 5.2.0 multithreading feature be used on Linux?
Threading in 5.1.3alpha (and 5.2.0, whenever it gets released) works on GNU/Linux and other OSes that support pthreads. For Windows there's support for native threading APIs.
Hi, and thanks for implementing the MT improvements in 5.2! I tested 5.2.1 with -T0 on a 16-core VM in -3 mode. Here are my findings vs. the default:
speed: 7.3x faster than the default (ideally it could be 16.0x)
size: the file size is 1.30x the default (ideally it should be 1.00x)
I managed to improve the MT compression ratio (to a slightly better size than without -T0) using --lzma2=preset=3,dict=16MiB (speed is also a bit faster(!), though it uses more RAM). However, using the same settings without -T0 I still get a much better compression ratio. So -T0 vs. -T1 with --lzma2=preset=3,dict=16MiB:
speed: 9.9x faster than the default (ideally it could be 16.0x)
size: the file size is 1.11x the default (ideally it should be 1.00x)
In the end MT is very good, but it still doesn't scale perfectly. Is there any hope of improving MT scaling, or can these limitations not be overcome?
Thanks again for this great tool!
The current threading method splits the data into blocks and compresses them independently. This makes the compression ratio worse, but usually the increase is only a few percent. With some types of data the effect on compression ratio is higher, and the worst case can be really bad (making threading useless).
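The cost of compressing blocks independently is easy to demonstrate with Python's `lzma` module. This is only a toy sketch of the principle, not how xz's threaded encoder works internally: a single compressor can match repeats anywhere in its dictionary, while independent chunks cannot refer back past their own start.

```python
import lzma
import random

# Input with long-range redundancy: one incompressible 64 KiB pattern
# repeated 8 times. The repeats span chunk boundaries.
rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
data = base * 8  # 512 KiB total

# One compressor over the whole input: copies 2..8 become cheap matches.
whole = len(lzma.compress(data))

# Four independent 128 KiB chunks: each must re-encode the pattern.
chunk_size = 128 * 1024
chunked = sum(
    len(lzma.compress(data[i:i + chunk_size]))
    for i in range(0, len(data), chunk_size)
)

print(f"whole: {whole} bytes, chunked: {chunked} bytes")
```

With this worst-case input the chunked total is several times the single-stream size; with typical data the gap is much smaller, matching the "few percent" figure above.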
Your 30 % increase in file size sounds bad. If you have many files like the one you tested, then searching for different compression options is a good idea like you have already done. The counterintuitive effect on speed may be explained by the change in compression ratio: with a bigger dictionary and block size the encoder might be able to find long repeated chunks of data more easily and thus need less time to analyze the data.
The default block size is 3 times the LZMA2 dictionary size or at least 1 MiB. The -3 preset uses 4 MiB dictionary and thus 12 MiB block size by default. To get 16 threads, you need at least 16 * 12 MiB = 192 MiB input file. If your input file isn't a multiple of that, the end of the file probably won't use all cores (depends on how fast the blocks finish). When you set 16 MiB dictionary, the default block size increases to 48 MiB.
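The arithmetic above can be written down in a few lines. The 3x-dictionary rule and the 1 MiB floor are taken from this thread's description of xz 5.2's defaults; treat them as that, not as a stable API guarantee:

```python
MIB = 1024 * 1024

def default_block_size(dict_size: int) -> int:
    """Default xz block size: 3x the LZMA2 dictionary, but at least 1 MiB."""
    return max(3 * dict_size, 1 * MIB)

def min_input_for_threads(dict_size: int, threads: int) -> int:
    """Smallest input that gives every thread a full block at once."""
    return threads * default_block_size(dict_size)

# -3 preset: 4 MiB dictionary -> 12 MiB blocks -> 192 MiB for 16 threads
print(default_block_size(4 * MIB) // MIB)        # 12
print(min_input_for_threads(4 * MIB, 16) // MIB)  # 192

# dict=16MiB raises the default block size to 48 MiB
print(default_block_size(16 * MIB) // MIB)       # 48
```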
Increasing the block size has smaller effect on memory usage than increasing the dictionary size. You could test e.g. "xz -3 --block-size=78MiB" which uses the same amount of memory as the settings you tested but the bigger block size might improve the file size.
For the current threading method (splitting into blocks) there's some hope of getting the memory usage down a little (not a big improvement), but I don't see much hope for improving compression ratio or performance. However, other threading methods should be implemented. 7-Zip has had match finder threading for as long as I can remember (I think over a decade), which scales to two cores without increasing memory usage. Combining it with the current method would allow using bigger dictionaries and block sizes with the same amount of memory.
The third planned threading method comes from pigz (parallel gzip). There's an old prototype (not intended for production use!) in case you are curious. See the comments in the code for more information.