I asked this question over on Superuser, but I figure you guys might know too.
I have a set of files I would like to compress that I know to be repetitive and compressible, but 7zip chooses a non-optimal order to compress the files and fails to take advantage of their compressibility. How can I get 7zip to compress the files in another order?
The files I want to compress are the following:
I know it is possible for 7zip to take advantage of the repetition between the PDF and the bare JPGs because when I archive just the PDF and the JPGs together, I get a compression ratio of 47%. But when I try to include the 500MB of other files, 7zip compresses the JPGs first, then the miscellaneous other data, and by the time it gets to the PDF, the compression algorithm must have 'forgotten' about the JPGs because the PDF is hardly compressed at all.
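As a rough illustration of this "forgetting" effect, here is a minimal sketch using Python's standard `lzma` module rather than 7-Zip itself. The sizes are scaled down and the data is random, so the numbers are only illustrative; the assumption is that a repeated block can only be matched while its earlier copy is still within the dictionary window.

```python
import lzma
import os

def compressed_size(chunks, dict_size):
    # LZMA2 with an explicit dictionary size, so the window is under our control.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(b"".join(chunks), format=lzma.FORMAT_XZ, filters=filters))

# Stand-ins for the real files (hypothetical, scaled down):
block = os.urandom(256 * 1024)    # data that appears twice (JPG bytes, also embedded in the PDF)
filler = os.urandom(1024 * 1024)  # unrelated data, larger than the dictionary

dict_size = 512 * 1024

# Repeated copies separated by more than one dictionary's worth of other data.
far = compressed_size([block, filler, block], dict_size)
# Repeated copies adjacent, so the second copy can reference the first.
near = compressed_size([block, block, filler], dict_size)

print("copies far apart:", far, "bytes")
print("copies adjacent: ", near, "bytes")
```

With the copies adjacent, the second copy should cost almost nothing; with the filler in between, it is compressed as if it had never been seen.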
With x64 7-zip 9.32 alpha on Windows 8.1, using the 7z archive format, ultra compression level, LZMA2 algorithm, 256MB dictionary size, 128 word size, 4GB solid block size, and 2 CPU threads, I get the following compression ratios:
Since the misc. files are compressible to 44% of their original size, and the PDFs and JPGs together are compressible to 47%, I would expect everything together to be compressible to somewhere on the lower end of 44-47%, but due to the poor ordering of files by 7zip, I get a significantly worse result.
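The expectation above is just a size-weighted average of the two ratios. A small sketch of that arithmetic follows; the 500 MB figure for the misc. files is from the post, while the 300 MB size for the PDF and JPGs together is a hypothetical placeholder, not a measured value.

```python
def combined_ratio(parts):
    """parts: iterable of (size, ratio) pairs; returns the size-weighted overall ratio."""
    total = sum(size for size, _ in parts)
    compressed = sum(size * ratio for size, ratio in parts)
    return compressed / total

misc = (500, 0.44)          # MB, ratio when compressed alone
pdf_and_jpgs = (300, 0.47)  # hypothetical size in MB, ratio when compressed together

expected = combined_ratio([misc, pdf_and_jpgs])
print(f"{expected:.1%}")
```

Because the misc. files make up the larger share, the result lands nearer 44% than 47%.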
I have tried to alter the order in which 7zip compresses files by playing with file creation, modification, and access dates. I have tried moving the files to another folder and copying them back so they are rewritten to disk consecutively. I have even tried archiving all the JPGs in a zip file with store-level compression, so that the file size approximately matches that of the PDF. No matter what I do, I can't seem to get 7zip to compress the PDF and the JPGs without the misc. files in between.
Any help or ideas would be greatly appreciated. I am unable to increase the dictionary size due to memory constraints.
Currently there is no way to change the order of files.
You can try increasing the dictionary size by using "fast" mode with 1 CPU thread instead of "ultra" mode.
Thanks Igor, I will try that. Even if there's no official way to change the compression order through the GUI, is there some way I could work around it? What is the internal method 7zip uses to decide compression order, and would I be able to coax it into doing what I want?
I managed to solve this problem. The solution was to create an archive containing only the miscellaneous files, then select both the PDF and the JPGs and choose "Add to archive" from the Explorer context menu. In the 7zip "Add to Archive" dialog, I chose the same compression settings and archive name as before.
This resulted in the overall 45% compression ratio I was looking for.
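The same two-step workaround could presumably be scripted with the command-line `7z` tool. The sketch below only builds the two command lines; the archive name and file paths are hypothetical, and the switches are my assumption of the command-line equivalents of the GUI settings listed in the question, so check them against your 7-Zip version before relying on them.

```python
import subprocess

# Assumed command-line equivalents of: 7z format, LZMA2, ultra, 256MB dictionary,
# word size 128, 4GB solid block size, 2 threads.
SETTINGS = ["-t7z", "-m0=lzma2", "-mx=9", "-md=256m", "-mfb=128", "-ms=4g", "-mmt=2"]

def two_step_commands(archive, misc_paths, related_paths, exe="7z"):
    """Return the two 'add' commands: misc files first, then the PDF and JPGs."""
    first = [exe, "a", *SETTINGS, archive, *misc_paths]
    second = [exe, "a", *SETTINGS, archive, *related_paths]
    return [first, second]

def run(commands):
    # Executes the commands in order; requires 7z to be installed and on PATH.
    for cmd in commands:
        subprocess.run(cmd, check=True)

# Hypothetical names, for illustration only.
commands = two_step_commands("backup.7z", ["misc"], ["scan.pdf", "scans/*.jpg"])
for cmd in commands:
    print(" ".join(cmd))
```

Calling `run(commands)` would perform both steps. The idea is that the second step adds the PDF and JPGs on their own, so they are compressed next to each other instead of being interleaved with the misc. files.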
7-Zip uses a built-in list of extensions to sort files:
" lzma 7z ace arc arj bz bz2 deb lzo lzx gz pak rpm sit tgz tbz tbz2 tgz cab ha lha lzh rar zoo"
" zip jar ear war msi"
" 3gp avi mov mpeg mpg mpe wmv"
" aac ape fla flac la mp3 m4a mp4 ofr ogg pac ra rm rka shn swa tta wv wma wav"
" swf "
" chm hxi hxs"
" gif jpeg jpg jp2 png tiff bmp ico psd psp"
" awg ps eps cgm dxf svg vrml wmf emf ai md"
" cad dwg pps key sxi"
" max 3ds"
" iso bin nrg mdf img pdi tar cpio xpi"
" vfd vhd vud vmc vsv"
" vmdk dsk nvram vmem vmsd vmsn vmss vmtm"
" inl inc idl acf asa h hpp hxx c cpp cxx rc java cs pas bas vb cls ctl frm dlg def"
" f77 f f90 f95"
" asm sql manifest dep "
" mak clw csproj vcproj sln dsp dsw "
" class "
" bat cmd"
" xml xsd xsl xslt hxk hxc htm html xhtml xht mht mhtml htw asp aspx css cgi jsp shtml"
" awk sed hta js php php3 php4 php5 phptml pl pm py pyo rb sh tcl vbs"
" text txt tex ans asc srt reg ini doc docx mcw dot rtf hlp xls xlr xlt xlw ppt pdf"
" sxc sxd sxi sxg sxw stc sti stw stm odt ott odg otg odp otp ods ots odf"
" abw afp cwk lwp wpd wps wpt wrf wri"
" abf afm bdf fon mgf otf pcf pfa snf ttf"
" dbf mdb nsf ntf wdb db fdb gdb"
" exe dll ocx vbx sfx sys tlb awx com obj lib out o so "
" pdb pch idb ncb opt";
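To see why the JPGs and the PDF end up far apart, here is a minimal sketch that orders file names by where their extension appears in the list above. It assumes the sort key is simply the position in the list, with unknown extensions placed last; the real comparison in 7-Zip may use additional keys. Only three of the lines are reproduced, and the file names are hypothetical.

```python
import os

# Three lines copied from the list above (images, text/documents, executables).
EXT_LINES = [
    " gif jpeg jpg jp2 png tiff bmp ico psd psp",
    " text txt tex ans asc srt reg ini doc docx mcw dot rtf hlp xls xlr xlt xlw ppt pdf",
    " exe dll ocx vbx sfx sys tlb awx com obj lib out o so ",
]

# Map each extension to its position in the flattened list (first occurrence wins).
RANK = {}
for ext in " ".join(EXT_LINES).split():
    RANK.setdefault(ext, len(RANK))

def sort_key(name):
    ext = os.path.splitext(name)[1][1:].lower()
    # Assumption: extensions not in the list sort after all known ones.
    return (RANK.get(ext, len(RANK)), ext, name)

files = ["scan.pdf", "page1.jpg", "notes.txt", "tool.exe", "page2.jpg", "data.xyz"]
ordered = sorted(files, key=sort_key)
print(ordered)
```

Under these assumptions the JPGs come first and the PDF only appears after every file whose extension is listed between `jpg` and `pdf`, which matches the behaviour described in the question.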