There is an issue with the way Unicode filenames are handled in the RAR handler. The filenames are not normalized to a consistent form (either precomposed or decomposed), which can lead to comparison failures. Here's a sample scenario that I encountered:
Say a RAR archive contains the following file path: "Café foobar/quux.txt" in decomposed form (i.e., the é is actually an ASCII "e" followed by a combining acute accent, U+0301).
From an external source (say, a list file), you read that the user wants to strip "Café foobar" from the path during extraction, and this string is in precomposed form (i.e., the single character é, U+00E9).
Now, when the extract callback's GetStream compares the removePathParts components using MyStringCompareNoCase, the comparison fails (because the precomposed é does not equal "e" plus a combining accent), and the operation fails.
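The mismatch is easy to reproduce outside of 7-zip. A minimal Python sketch of the same comparison failure (the strings mirror the scenario above; this is illustrative, not 7-zip code):

```python
# The path component as stored in the archive: decomposed form,
# "e" followed by the combining acute accent U+0301.
archived = "Cafe\u0301 foobar"

# The component supplied by the user: precomposed form, single U+00E9.
requested = "Caf\u00e9 foobar"

# The two strings render identically but compare unequal, and a naive
# case-insensitive comparison does not help, since lowercasing does
# not normalize.
print(archived == requested)                    # False
print(archived.lower() == requested.lower())    # False
```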
I've tested this on Mac OS X, and it appears to occur only with RAR archives, because for other formats MultiByteToUnicodeString is used, which in turn calls CFStringNormalize.
On OS X, a quick and naive fix would be to add the following line to CInArchive::ReadName(CItemEx &item, int nameSize) in RarIn.cpp:

+item.UnicodeName = MultiByteToUnicodeString(UnicodeStringToMultiByte(item.UnicodeName, 0), 0);

This round-trips the name through the multi-byte conversion, which on OS X normalizes via CFStringNormalize.
(Ideally, this normalization would occur in a more "core" location, so that removePathParts and all other strings are in a consistent form.)
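The general remedy is to normalize both sides to the same form before comparing. A short Python sketch using the standard library's unicodedata module (illustrative only; 7-zip itself would go through CFStringNormalize or an equivalent):

```python
import unicodedata

archived = "Cafe\u0301 foobar"   # decomposed, as read from the RAR header
requested = "Caf\u00e9 foobar"   # precomposed, as supplied by the user

# Normalizing both strings to one form (NFC here, though either form
# works as long as both sides agree) makes the comparison succeed.
nfc_archived = unicodedata.normalize("NFC", archived)
nfc_requested = unicodedata.normalize("NFC", requested)
print(nfc_archived == nfc_requested)   # True
```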
I suspect these composed/decomposed problems exist in many places in the 7-zip code.
I'm not ready to work on that broader problem right now; maybe later.