Exploring Windows unzippers

Last night I finally got a chance, or rather forced myself to make use of one :), to explore some freely available Windows .zip unzippers. The main reason was my promise to Paul that this project would get an update by the end of August 2009.

Paul had discussed CakeCmd with me a long time back, and that discussion was behind another SourceForge project, damageddocx2txt, an offshoot of this project.

To my surprise, CakeCmd extracts data from corrupted .zip archives where unzip, ZipReader, 7z and WinRAR fail with errors. As expected, the extracted data is not entirely clean. I tested these tools on two files Paul had sent me.

CakeCmd does report failures too, with messages like:

Processing document.xml...
Processing document.xml... Fail

but it still extracts more data than the other unzippers mentioned above, which recover little or nothing.
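I have not looked into how CakeCmd manages this, so here is only a rough Python sketch (the project script itself is Perl) of one way to get similar best-effort behaviour: ignore the central directory, scan for local file headers, and inflate each member in small chunks, keeping whatever was decoded before the first error. `salvage_zip` and `inflate_partial` are hypothetical names; data descriptors, ZIP64, encryption and compression methods other than stored/deflate are not handled.

```python
import struct
import zlib

# Fixed 30-byte part of a ZIP local file header (see PKWARE's APPNOTE.TXT):
# signature, version, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
LOCAL_SIG = b"PK\x03\x04"
STORED, DEFLATED = 0, 8


def inflate_partial(raw: bytes, chunk: int = 64) -> bytes:
    """Inflate a raw deflate stream, returning whatever was decoded
    before the first error instead of raising."""
    d = zlib.decompressobj(-zlib.MAX_WBITS)  # negative wbits: raw deflate, no header
    out = []
    for i in range(0, len(raw), chunk):
        try:
            out.append(d.decompress(raw[i:i + chunk]))
        except zlib.error:
            break  # corrupt data: keep what was already decoded
        if d.eof:
            break  # end of this member's deflate stream
    return b"".join(out)


def salvage_zip(data: bytes) -> dict:
    """Scan for local file headers and recover each member, best effort."""
    members = {}
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_HEADER.size <= len(data):
        fields = LOCAL_HEADER.unpack_from(data, pos)
        flags, method, csize = fields[2], fields[3], fields[7]
        name_len, extra_len = fields[9], fields[10]
        start = pos + LOCAL_HEADER.size
        encoding = "utf-8" if flags & 0x800 else "cp437"  # bit 11: UTF-8 names
        name = data[start:start + name_len].decode(encoding, "replace")
        body = start + name_len + extra_len
        if method == DEFLATED:
            members[name] = inflate_partial(data[body:])
        elif method == STORED:
            # Trusts the header's size; wrong if sizes live in a data descriptor.
            members[name] = data[body:body + csize]
        pos = data.find(LOCAL_SIG, body)  # look for the next member
    return members
```

Because the central directory sits at the end of an archive, a file that is cut short or damaged there defeats tools that rely on it (Python's own `zipfile` raises `BadZipFile` in that case), while a header scan like this can still return part of `word/document.xml`. The flip side is that nothing is validated, not even the CRC, so the output may contain garbage.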

One annoyance with CakeCmd is that it waits for the user to "Press <enter> to continue.", but that is easy to work around in the script. That's what pipes are for! :)
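In the script the workaround boils down to feeding a newline into CakeCmd's stdin (from a shell, e.g. piping `echo` into it). A minimal Python sketch of the same idea, assuming the prompt only ever needs a single <enter>: `run_unattended` is a hypothetical helper, and the timeout guards against a tool that asks for more input than it gets.

```python
import subprocess


def run_unattended(cmd: list) -> subprocess.CompletedProcess:
    """Run `cmd`, answering a "Press <enter> to continue." prompt by
    piping a newline into its stdin, and capture what it prints."""
    return subprocess.run(
        cmd,
        input="\n",           # the <enter> the tool is waiting for
        capture_output=True,  # collect stdout/stderr instead of showing them
        text=True,
        timeout=120,          # give up rather than hang if it wants more input
    )
```

Usage would be `run_unattended([...])` with whatever CakeCmd command line the script already builds; `result.stdout` then holds the "Processing ..." messages for the script to inspect.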

I also noticed that Windows unzippers generally lack an option to send the extracted file to stdout.

Also, Linux "unzip" is just as happy with a .docx file as with a .zip file, but that is not the case with CakeCmd and the others.
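A .docx file is a ZIP container, so a reader that looks at the file's contents rather than its extension opens it without complaint, and Linux `unzip -p` sends a member straight to stdout. A small Python stand-in for that combination, as a sketch only: `member_to_stream` is a hypothetical name, and `word/document.xml` is the member a docx-to-text script would normally want.

```python
import shutil
import sys
import zipfile


def member_to_stream(archive_path: str, member: str, out=None) -> None:
    """Copy one archive member to `out` (stdout by default), in the spirit
    of `unzip -p`. zipfile checks the file's contents, not its extension,
    so a .docx works just like a .zip."""
    if out is None:
        out = sys.stdout.buffer  # binary stdout, so bytes pass through unchanged
    with zipfile.ZipFile(archive_path) as zf, zf.open(member) as src:
        shutil.copyfileobj(src, out)
```

For example, `member_to_stream("report.docx", "word/document.xml")` (with a hypothetical file name) writes the raw XML to stdout, ready to be piped into the text-extraction step.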

This makes the enhancement I have wanted to make to this script for quite some time all the more worthwhile.

After trying out the modified Perl script with CakeCmd, another interesting task joins the list: exploring whether the garbled output from damaged .docx files can be sanitized to some extent.
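What "sanitized" should mean here is still open. One cheap first step, sketched in Python under the assumption that the salvaged data is meant to be UTF-8 XML: decode leniently, then drop bytes that were not valid UTF-8 and characters that XML 1.0 does not allow at all. `sanitize_salvaged` is a hypothetical helper; it does nothing to repair broken tags or recover lost text.

```python
import re

# Anything outside the XML 1.0 "Char" production (tab, LF, CR and the
# allowed Unicode ranges) is matched by this negated class.
_INVALID_XML = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_salvaged(raw: bytes) -> str:
    """Best-effort cleanup of bytes recovered from a damaged document.xml."""
    text = raw.decode("utf-8", errors="replace")  # bad bytes become U+FFFD
    text = text.replace("\ufffd", "")             # drop those replacements
    return _INVALID_XML.sub("", text)             # drop control characters etc.
```

Keeping this step separate from extraction means it can be tried on output from any unzipper, CakeCmd or otherwise, before the usual text conversion runs.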

Posted by Sandeep Kumar 2009-08-23