7z does not correctly handle Unicode file names that require a surrogate pair on Linux. For example, a file name containing U+2004E is stored as U+004E ("N") in a 7z archive created on Linux. On Windows, however, U+2004E is correctly encoded into the header as a surrogate pair.
The problem is caused by a difference in wchar_t. On Linux, sizeof(wchar_t) is 4, so mbstowcs converts the UTF-8 encoding of U+2004E into a single wchar_t. However, the code that writes the 7z header assumes sizeof(wchar_t) is 2 and keeps only the low 16 bits, so U+2004E becomes U+004E. On Windows, sizeof(wchar_t) is 2 and U+2004E is already represented as a surrogate pair, so the problem does not occur there.
My patch adds a conversion step that emits a surrogate pair when necessary. With this patch, 7z on Linux creates correct archives for file names containing characters such as U+2004E.
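The idea behind the conversion can be sketched as follows. This is only an illustration and not the code from the patch: the names AppendUtf16 and WideToUtf16 are hypothetical, and replacing out-of-range values with U+FFFD is an assumption of this sketch. It splits any code point above U+FFFF into a high and a low surrogate, and passes 16-bit values through unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not taken from the actual patch).
// Appends one code point to `out` as UTF-16 code units.
void AppendUtf16(std::uint32_t cp, std::vector<std::uint16_t> &out)
{
  if (cp < 0x10000)
  {
    // Fits in one 16-bit unit (this also covers 16-bit wchar_t input).
    out.push_back(static_cast<std::uint16_t>(cp));
  }
  else if (cp <= 0x10FFFF)
  {
    // Supplementary plane: split into a surrogate pair.
    cp -= 0x10000;
    out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));   // high surrogate
    out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
  }
  else
  {
    // Outside the Unicode range: emit U+FFFD (an assumption of this sketch).
    out.push_back(0xFFFD);
  }
}

// Hypothetical wrapper: converts a whole wide string.
// With a 32-bit wchar_t (typical on Linux) supplementary characters are
// split into pairs; with a 16-bit wchar_t the units are copied as they are.
std::vector<std::uint16_t> WideToUtf16(const std::wstring &src)
{
  std::vector<std::uint16_t> out;
  out.reserve(src.size());
  for (wchar_t wc : src)
    AppendUtf16(static_cast<std::uint32_t>(wc), out);
  // Each input element produces at most two code units.
  assert(out.size() <= 2 * src.size());
  return out;
}
```

For U+2004E this produces the two code units 0xD840 0xDC4E, whereas a plain cast to a 16-bit value would give 0x004E, which matches the wrong "N" seen in archives created on Linux.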