When compressing files, I often find myself manually compressing the same files three times, once with each supported compression method (LZMA, PPMd, BZip2), and keeping the one with the best compression ratio.
Is it possible to somehow do it automatically?
Some sort of preview perhaps, where 7-Zip quickly analyzes the data.
IMHO it's not just a question of which method, LZMA or PPMd, is best for a single file type, but which is best for a mix of different types.
If you take the time to try different settings (dictionary/word size), you can achieve a remarkable improvement, but I don't think this can be achieved with just a reference table, because compressibility will change. E.g.:
I tried to archive a folder with 7-Zip and WinRAR consisting of 2025 files in 139 folders (968 705 491 / 973 701 120):
type    files   size
.jpg    718     718/719 MB
.png    361     105/106 MB
.htm*   31      1.25/1.32 MB
.html   131     337/3.66 MB
.gif    339     796 kB/1.74 MB
.doc    7       13.1 MB
As you can see, most formats provide about the same compression ratio until I decreased the dictionary and word size; that's when I got the large improvement.
packer  format  time    size     notes
7z      zip     10:40   927516
rar     rar     6:03    926089
7z      7z      20~~    920647   BZip2
rar     zip     1:49    920422
7z      7z              905435   PPMd, 256 dic, 32 word, 4GB
7z      7z      9:40    901917   LZMA, 64 MB dic, 256 word, 4GB, 923/880, 1632 kB/s
7z      7z      26:37   894956   916 434 753; PPMd, 1024 dic, 32 word, 4GB
7z      7z      26:56   890906   PPMd, 1024 MB, 8 MB word, 1GB, 923/869
======== PPMd, 1024 MB, 8 MB word, 64GB
Please try this test with FreeArc to compare.
I ran a few tests; the best I achieved was 904232 kB.
Hmm... a larger size at maximum settings than with 7-Zip?
I wouldn't mind even full compression, as long as I don't have to do it manually! :)
Hello everyone,
A possible workaround:
- write a batch file with the three desired command lines, producing three different archives
- make the batch file compare them
- make the batch file delete the second- and third-best
Integrating such a function into 7-Zip could slow it down enormously and thus make it unusable in the eyes of the common user.
Interesting idea though!
Best regards!
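The workaround above (compress with every method, compare, keep the smallest) could be sketched roughly like this in Python. This is only an illustration under assumptions: it uses the codecs from the Python standard library as stand-ins, and since PPMd is not available there, Deflate takes the third slot. The names `CODECS`, `compress_all` and `keep_best` are made up for this sketch; a real batch file would call the 7-Zip command line once per method instead.

```python
import bz2
import lzma
import zlib

# Stand-in codecs from the Python standard library (an assumption of this
# sketch). PPMd is not available there, so Deflate fills the third slot.
CODECS = {
    "LZMA": lambda data: lzma.compress(data, preset=9),
    "BZip2": lambda data: bz2.compress(data, compresslevel=9),
    "Deflate": lambda data: zlib.compress(data, 9),
}


def compress_all(data, codecs=CODECS):
    """Compress the data once per codec and return {name: compressed_bytes}."""
    return {name: compress(data) for name, compress in codecs.items()}


def keep_best(results):
    """Return (name, compressed_bytes) for the smallest result."""
    name = min(results, key=lambda n: len(results[n]))
    return name, results[name]


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog\n" * 2000
    results = compress_all(sample)
    # List every candidate, smallest first, then report the one we keep.
    for name, blob in sorted(results.items(), key=lambda kv: len(kv[1])):
        print(f"{name:8s} {len(blob):8d} bytes")
    winner, _ = keep_best(results)
    print("keeping:", winner)
```

The cost is the one mentioned above: every file is compressed once per method, so the run takes roughly the sum of all the individual compression times.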
For the record: a 7-Zip profiler that does something similar exists. See
http://sourceforge.net/forum/message.php?msg_id=3988231
Ddot
Yes, please test my profiler. It will test all file extensions in a directory with different compression methods to see which is best (where "best" means smallest size). Once you have 'profiled' all of the file types you want, you can compress certain extensions with one method, then compress the others with a different method and add those to the original archive. It should make efficient batch archiving easy.
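To give an idea of what such profiling involves, here is a minimal sketch. It is not the profiler's actual code: the function names are hypothetical, and the Python standard-library codecs are used as stand-ins (PPMd is not available there). It adds up the compressed size per extension and per method, then picks the smallest total for each extension.

```python
import bz2
import lzma
import os
import zlib
from collections import defaultdict

# Stand-in methods (assumption: stdlib codecs instead of 7-Zip's own).
METHODS = {
    "LZMA": lambda data: lzma.compress(data),
    "BZip2": lambda data: bz2.compress(data),
    "Deflate": lambda data: zlib.compress(data, 9),
}


def profile_directory(root, methods=METHODS):
    """Return {extension: {method: total_compressed_bytes}} for files under root."""
    totals = defaultdict(lambda: defaultdict(int))
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            # Group case-insensitively, so .TXT and .txt count as one type.
            ext = os.path.splitext(fname)[1].lower() or "(none)"
            with open(os.path.join(dirpath, fname), "rb") as fh:
                data = fh.read()
            for method, compress in methods.items():
                totals[ext][method] += len(compress(data))
    return {ext: dict(sizes) for ext, sizes in totals.items()}


def best_method_per_extension(profile):
    """Pick the method with the smallest total size for each extension."""
    return {ext: min(sizes, key=sizes.get) for ext, sizes in profile.items()}


if __name__ == "__main__":
    profile = profile_directory(".")
    for ext, method in sorted(best_method_per_extension(profile).items()):
        print(f"{ext:10s} -> {method}")
```

Note that this compresses each file on its own, so it says nothing about how the methods behave in a solid archive.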
I don't know how to use it...
Imagine this.....
You compress a DLL file. You expect LZMA to work best on this one. Wrong, PPMd worked better on that particular DLL file. When you try with the next one, LZMA works best.
Compressing data is so unpredictable, even within the same file type. You need some analysis by 7-Zip itself to be sure of what works best. Or an auto-switching algorithm that quickly analyzes the data and selects the proper algorithm.
I think...
And perhaps a small "learning" function. 7-Zip would store the results of the analyzed data in a text or HTML file and use that data to compare and recognize similar new data. Like an opening book for chess programs.
To speed things up. :)
- The Swede
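The "opening book" idea above might look something like the following sketch. Everything here is an assumption made for illustration: the book is a JSON text file keyed by file extension, and `load_book`, `record_result`, `save_book` and `suggest_method` are hypothetical names, not anything that exists in 7-Zip.

```python
import json
import os


def load_book(path):
    """Load the stored results, or start with an empty book."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def record_result(book, ext, method, ratio):
    """Remember the best (lowest) compressed/original ratio seen for ext."""
    entry = book.get(ext)
    if entry is None or ratio < entry["ratio"]:
        book[ext] = {"method": method, "ratio": ratio}
    return book


def save_book(book, path):
    """Write the book back as human-readable text."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(book, fh, indent=2, sort_keys=True)


def suggest_method(book, ext, default="LZMA"):
    """Look up the remembered method, falling back to a default."""
    entry = book.get(ext)
    return entry["method"] if entry else default


if __name__ == "__main__":
    book = load_book("book.json")
    # Hypothetical measurements, only to show the flow.
    record_result(book, ".dll", "LZMA", 0.42)
    record_result(book, ".dll", "PPMd", 0.40)
    save_book(book, "book.json")
    print(suggest_method(book, ".dll"))
```

Keying only on the extension runs straight into the objection raised above: two DLLs can favor different methods, so the book can only ever give a first guess.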
Hello everyone,
Correct me if I'm wrong, but hasn't there been an effort by Igor to sort files by a list to improve compression (at least for solid archives)?
Best regards!
Please tell me how to use this profiler.
"Imagine this.....
You compress a DLL file. You expect LZMA to work best on this one. Wrong, PPMd worked better on that particular DLL file. When you try with the next one, LZMA works best.
Compressing data is so unpredictable, even within the same file type. You need some analysis by 7-Zip itself to be sure of what works best. Or an auto-switching algorithm that quickly analyzes the data and selects the proper algorithm."
Like the powerful new WinZip 11.0!
And WinRAR!
>> You compress a DLL file. You expect LZMA to work best on this one. Wrong, PPMd worked better on that particular DLL file. When you try with the next one, LZMA works best.
That is why you profile using several DLLs: to find the setting that is best on average.
>> Or an auto-switching algorithm that quickly analyzes the data and selects the proper algorithm.
This can be achieved via pre-filters, I think.
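"Best on average" over several sample files can be made concrete with a small sketch. The function name and the numbers are hypothetical, chosen only to show that a method can lose on one file and still win overall.

```python
from statistics import mean


def best_on_average(samples):
    """samples: list of (original_size, {method: compressed_size}).

    Returns (best_method, {method: mean_ratio}), where ratio is
    compressed/original and lower is better.
    """
    methods = samples[0][1].keys()
    averages = {
        m: mean(sizes[m] / original for original, sizes in samples)
        for m in methods
    }
    return min(averages, key=averages.get), averages


if __name__ == "__main__":
    # Hypothetical sizes for three DLLs; PPMd wins on the first file only.
    samples = [
        (1000, {"LZMA": 400, "PPMd": 380}),
        (1000, {"LZMA": 350, "PPMd": 420}),
        (1000, {"LZMA": 410, "PPMd": 430}),
    ]
    best, averages = best_on_average(samples)
    print(best, averages)
```

Averaging the ratios weights every file equally; summing the compressed sizes instead would favor whatever works best on the largest files.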
http://www.winzip.com/whatsnew111.htm
Our new "best compression" option allows you to let WinZip decide the best compression method for each file based on the file type. This will ensure that you maximize the compression of every file that you add to your Zip file....
Are you planning to do it for 7-zip?
> Are you planning to do it for 7-zip?
Yes, but that task is more difficult for solid archives.
It has been a year since the promise to create autodetection...
I think it's no longer important.
Why?
Look at http://freearc.org
The powerful new archiver FreeArc, written by Bulat Ziganshin, can do it now!
- Overall, 11 compression algorithms and filters are included (compared to 3 in 7-Zip and 7 in RAR), and this number is still growing
- Includes LZMA, PPMd, TrueAudio and generic Multimedia compression algorithms with automatic switching by file type
7-Zip is a good archiver, but it's time for it to die.
I totally disagree with the "time to die" comment. FreeArc is still missing essential features for me, e.g. volumes, and it is written using other people's compression routines. For me 7-Zip is by far the best. Yes, it could do with some new features like the automatic file type handling, but the stuff under the hood is excellent.
Yes, maybe so, but with the next versions of FreeArc it will be EXACTLY time for 7-Zip to die!
Plans for FreeArc
Version 0.60
- dictionary of up to 1 GB in LZMA
- backup support (save file times / attributes / ACLs)
- Reed-Solomon codes for data recovery
- release freearc.dll to support .arc in other archivers
- integration with Explorer
- a full GUI, on par with WinRAR
Version 0.70: catch up with and overtake RAR!
- multi-volume and recovery volumes
- work with archives containing millions of files
- support for zip and other archive formats
Version 0.80: maximum compression!
- optimization for multi-core CPUs
- support for compression algorithms with multiple outputs
- bcj2
- splitting files into segments that are then compressed further with the lzma / ppmd / multimedia algorithms
- a compression algorithm for bmp/tif files
- changes to the archive format, in particular spreading the recovery record across the entire archive
>totally disagree with the time to die comment
Me too ;) Just look at the download counts: millions for 7-Zip and thousands for FA.
But you say that FA is written using existing compression algorithms and that this is a bad thing. I don't think so. FA uses the best free compression methods available, and I think that's very good. Just look at the picture:
- it includes the LZMA and PPMd algorithms, like 7-Zip
- it adds audio compression with the True Audio algorithm, one of the best lossless audio compressors
- it combines these with automatic file type detection, i.e. the program autodetects which algorithm to use. If you have experience optimizing 7-Zip archives, you know what this means
- next, it adds all the same filters as RAR and a few more: REP (which is like lrzip, a quite popular large-dictionary filter), DICT (which may be compared with XML-WRT), and LZP (which is recommended by the PPMd author to improve the compression ratio)
- even more, it adds two more algorithms specifically for fast compression modes
- using the file type detection feature, it skips compression of already compressed files
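For readers who haven't tried it, switching by file type and skipping already compressed files boils down to something like the sketch below. The tables are illustrative assumptions only, not FreeArc's (or WinZip's) actual rules, and `choose_method` and `plan` are hypothetical names.

```python
import os

STORE = "store"  # i.e. add the file without compressing it

# Illustrative mapping only (an assumption of this sketch).
METHOD_BY_EXT = {
    ".txt": "PPMd",
    ".htm": "PPMd",
    ".html": "PPMd",
    ".exe": "LZMA",
    ".dll": "LZMA",
    ".wav": "audio",
}

# Formats that are usually compressed already and rarely shrink further.
ALREADY_COMPRESSED = {".jpg", ".png", ".gif", ".zip", ".7z", ".rar", ".mp3"}


def choose_method(filename, default="LZMA"):
    """Pick a method from the file extension alone."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in ALREADY_COMPRESSED:
        return STORE
    return METHOD_BY_EXT.get(ext, default)


def plan(filenames):
    """Group files by chosen method, so each group can be compressed together."""
    groups = {}
    for name in filenames:
        groups.setdefault(choose_method(name), []).append(name)
    return groups


if __name__ == "__main__":
    for method, names in plan(["a.txt", "b.JPG", "c.dll", "d.bin"]).items():
        print(method, names)
```

Grouping by method also matters for solid archives, where files compressed with the same method can share one solid block.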
Just look at the list. These are exactly the features that you have been asking for for years. Of the 11 algorithms in FA, 5 are my own. These are small algorithms, mainly filters. Where possible, I've used existing algorithms and implemented myself only the methods that lack a free, open-source implementation.
My goal has always been the most optimal compression for users, not records, and I've reached this goal: look at http://www.maximumcompression.com/data/summary_mf2.php#data
Nevertheless, I don't think that FreeArc has *already* killed 7-Zip. It needs to become much more stable and provide a reasonable GUI, at the very least.
Also, I don't think that we compete with Igor. I think that Igor's main work is the LZMA algorithm itself. It's excellent. It is the thing he is paid for. 7-Zip by itself looks like a tool to show LZMA's potential to prospective clients. It's solid, highly reliable and has a huge user base, but nothing more.
FA, on the other hand, is focused not on showing LZMA's strength but on providing the best compression technologies available as well as the best archiving features: updatable solid archives, portability, a rich command line. So it also advertises LZMA by making it a necessary part of the best archiver on the planet. And you, the users, win, because you get excellent LZMA compression together with a rich set of other outstanding algorithms and a lot of archive processing features that you've also been asking for for years. We just work in cooperation, and I think that's much better for users than trying to develop everything from scratch.