#3 String encoding related with encryption issue

open-accepted
nobody
None
5
2010-07-01
2009-12-26
No

I've found some more information about how passwords are encoded in encrypted archives.

See the link below.
http://www.artpol-software.com/ZipArchive/KB/0610051525.aspx#intro

It says that when a user creates an encrypted archive on Windows, the password is encoded with the current system ANSI codepage, not the OEM one.

Also see the codepage table below.
http://msdn.microsoft.com/en-us/goglobal/cc563921.aspx#XPHomeProx86

Languages whose ANSI and OEM codepages differ have a more complicated problem than the one I commented on before.

With this in mind, I ran a simple test with a Hebrew filename.
I created a small archive with the regional settings in Control Panel set to Hebrew.
I used the Windows 'Send to > Compressed (zipped) Folder' feature to create the archive, and set the password to 'אבג'.
( You can enter the password indirectly with copy and paste. )
Then I examined the bytes of 'אבג' in the filename stored in the zip file; they turned out to be 0x80 0x81 0x82.
These are characters from the OEM Hebrew codepage (CP862).
Compare them with http://en.wikipedia.org/wiki/Code_page_862
Note that they differ from the ANSI codepage (CP1255), where the same letters are 0xE0 0xE1 0xE2.
http://en.wikipedia.org/wiki/Windows-1255
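
The byte values above can be checked with a few lines of Python. This is only a small sketch using Python's built-in 'cp862' and 'cp1255' codecs; it has nothing to do with p7zip itself.

```python
# The password from the test: alef, bet, gimel (written as escapes to
# avoid right-to-left rendering problems in editors).
password = "\u05d0\u05d1\u05d2"

oem_bytes = password.encode("cp862")    # OEM Hebrew codepage
ansi_bytes = password.encode("cp1255")  # ANSI Hebrew codepage (Windows-1255)

def as_hex(data: bytes) -> str:
    """Format bytes the same way they are quoted in this report."""
    return " ".join(f"0x{b:02X}" for b in data)

print("CP862 :", as_hex(oem_bytes))
print("CP1255:", as_hex(ansi_bytes))
```

The same three letters give two different byte strings, so a tool has to know which codepage was used for the filename and which for the password.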

I tested the archive with my patched p7zip binary, without setting CP1255 as the password encoding.
In this case, 7za assumes the password encoding is the same as the '-mENC' value.
'7za t -mENC=cp862 -pאבג test.zip'

Here is the result. I got an error!
-------------------------------------------------------------------------------------
7-Zip (A) 9.04 beta Copyright (c) 1999-2009 Igor Pavlov 2009-05-30
p7zip Version 9.04 (locale=utf8,Utf16=on,HugeFiles=on,2 CPUs)

Processing archive: test.zip

Testing : test/01. ASCII Hello.txt
CRC Failed in encrypted file. Wrong password?
Testing : test/02. Hebrew אבג.txt
CRC Failed in encrypted file. Wrong password?

Sub items Errors: 2
-------------------------------------------------------------------------------------
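
Why does the wrong codepage end up as a CRC failure? The following is a rough sketch, assuming the archive uses the traditional PKWARE (ZipCrypto) encryption, which I have not verified for this test.zip. In that scheme the three cipher keys are derived from the raw password bytes, so a different encoding of the same letters gives different keys. The function names are my own, not p7zip's.

```python
def _make_crc_table():
    """Standard CRC-32 lookup table (reflected polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

_CRC_TABLE = _make_crc_table()

def _crc32_byte(crc: int, byte: int) -> int:
    """One raw CRC-32 step, without the usual initial/final inversion."""
    return _CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)

def zipcrypto_keys(password: bytes):
    """Initialize the three traditional-encryption keys from password bytes."""
    k0, k1, k2 = 0x12345678, 0x23456789, 0x34567890
    for b in password:
        k0 = _crc32_byte(k0, b)
        k1 = (k1 + (k0 & 0xFF)) & 0xFFFFFFFF
        k1 = (k1 * 134775813 + 1) & 0xFFFFFFFF
        k2 = _crc32_byte(k2, k1 >> 24)
    return k0, k1, k2

password = "\u05d0\u05d1\u05d2"  # the Hebrew test password
keys_oem = zipcrypto_keys(password.encode("cp862"))
keys_ansi = zipcrypto_keys(password.encode("cp1255"))
print(keys_oem != keys_ansi)
```

With the wrong keys the data decrypts to garbage, which the archiver can only notice as a failed check.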

In this case, even 'The Unarchiver' can't extract it,
because it assumes the filename and the password use the same codepage.
For Korean or Japanese that assumption is correct: their ANSI and OEM codepages are the same.
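
A small sketch of that difference between languages. The (ANSI, OEM) pairs below are the usual Windows defaults for these three system locales; the table and the function name are only for illustration.

```python
# (ANSI codepage, OEM codepage) for a few Windows system locales.
CODEPAGES = {
    "Hebrew": ("cp1255", "cp862"),
    "Korean": ("cp949", "cp949"),
    "Japanese": ("cp932", "cp932"),
}

def password_bytes_differ(locale: str, password: str) -> bool:
    """True if the ANSI and OEM encodings of the password are different bytes."""
    ansi, oem = CODEPAGES[locale]
    return password.encode(ansi) != password.encode(oem)

print(password_bytes_differ("Hebrew", "\u05d0\u05d1\u05d2"))  # alef, bet, gimel
print(password_bytes_differ("Korean", "\uac00"))              # Hangul syllable GA
print(password_bytes_differ("Japanese", "\u3042"))            # Hiragana A
```

Only locales like Hebrew, where the two codepages differ, need a separate password encoding setting.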

Next, I added the option '-mPWENC' to set the password encoding separately.
'7za t -mENC=cp862 -mPWENC=cp1255 -pאבג test.zip'

Here is the result I expected.
-------------------------------------------------------------------------------------
7-Zip (A) 9.04 beta Copyright (c) 1999-2009 Igor Pavlov 2009-05-30
p7zip Version 9.04 (locale=utf8,Utf16=on,HugeFiles=on,2 CPUs)

Processing archive: test.zip

Testing : test/01. ASCII Hello.txt
Testing : test/02. Hebrew אבג.txt

Everything is Ok

Files: 2
Size: 8
Compressed: 300
-------------------------------------------------------------------------------------
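
The same idea, treating the password encoding separately from the filename encoding, can be tried outside p7zip. Below is a minimal sketch with Python's standard zipfile module, which accepts the password as raw bytes and supports only the traditional zip encryption. The helper names and the default list of candidate encodings are my own assumptions; this is not how the '-mPWENC' patch is implemented.

```python
import zipfile
import zlib

def candidate_passwords(password, encodings):
    """Encode the password in each codepage; skip failures and duplicate bytes."""
    result = []
    for enc in encodings:
        try:
            data = password.encode(enc)
        except UnicodeEncodeError:
            continue  # this codepage cannot represent the password
        if all(data != seen for _, seen in result):
            result.append((enc, data))
    return result

def find_password_encoding(zip_path, password,
                           encodings=("cp862", "cp1255", "utf-8")):
    """Return the first encoding whose password bytes pass a full CRC test."""
    with zipfile.ZipFile(zip_path) as zf:
        for enc, pwd in candidate_passwords(password, encodings):
            zf.setpassword(pwd)
            try:
                if zf.testzip() is None:  # None means every member checked out
                    return enc
            except (RuntimeError, zipfile.BadZipFile, zlib.error):
                continue  # wrong password bytes; try the next codepage
    return None
```

For the archive in this report, something like find_password_encoding("test.zip", "אבג") would be the call to try.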

I have attached test.zip and the patched p7zip binary:
http://dl.dropbox.com/u/1364565/temp/p7zip%20codepage%20support%20binary.7z

Finally, I should say that these encoding workarounds might eventually be dropped in order to spread the new UTF-8 filename convention more widely.
It might also be better to use a UTF-8 Unicode password string in zip archives.

Discussion

  • ChangBeom Park

    ChangBeom Park - 2009-12-26

    Test archive containing a Hebrew filename, encrypted with the Hebrew password 'אבג' (without the quotes).

     
  • aONe

    aONe - 2009-12-30

    So UTF-8 is the way to go? Shall we suggest it to Pavlov?

    From what I see, you are writing special code for each language, as is done in The Unarchiver. I think it's better to solve the problem at the root in 7-Zip (p7zip) by using UTF-8.

     
  • ChangBeom Park

    ChangBeom Park - 2009-12-30

    Actually, we can't stop users from using non-ASCII characters in passwords.
    But this habit certainly causes problems when users of two different languages exchange files.

    So I think we have to make users aware of this issue, and then they will gradually use such passwords less.
    I'm not saying that p7zip should stop supporting extraction with these Windows locale related passwords right now. I wrote this thread to bring the issue to the attention of you and others.

    I have tested several archivers: WinRAR, 7-Zip, ALZip, V3Zip, WinZip, PKZip...
    In terms of Unicode support, each has its pros and cons, and some bugs as well.
    It's hard to say what the definitive solution for handling the zip format is.
    For now I think we should support all these ad hoc cases first, and then guide users toward a better way.
    That's all.

    >So the way is UTF-8? Let's comment it to Pavlov?

    I asked Pavlov and the Info-ZIP developers to support an encoding conversion option in 7-Zip and Info-ZIP:
    1. Convert filenames from UTF-8 on OS X to the Windows local codepage.
    2. Normalize filenames from OS X's variant of Unicode NFD to standard Unicode NFC.
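
The two conversions above can be sketched in Python as follows. This is only an illustration with the standard unicodedata module; the function names are made up, and OS X's decomposition is a variant of NFD, so plain NFC normalization is an approximation.

```python
import unicodedata

def to_nfc(name: str) -> str:
    """Compose a decomposed (NFD-style) filename into standard NFC."""
    return unicodedata.normalize("NFC", name)

def to_windows_bytes(name: str, codepage: str) -> bytes:
    """Normalize to NFC first, then encode in the Windows local codepage."""
    return to_nfc(name).encode(codepage)

# A Korean filename as OS X would typically store it: decomposed into jamo.
nfc_name = "\ud55c\uae00.txt"
nfd_name = unicodedata.normalize("NFD", nfc_name)

print(len(nfd_name), len(nfc_name))           # the decomposed form is longer
print(to_nfc(nfd_name) == nfc_name)           # composing restores the original
print(to_windows_bytes(nfd_name, "cp949"))    # bytes for a Korean Windows system
```

Without the NFC step, the decomposed jamo usually cannot be encoded in the local codepage as the intended syllables.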

    You can read the requests at the links below.
    https://sourceforge.net/projects/sevenzip/forums/forum/45797/topic/3332513
    http://www.info-zip.org/board/board.pl?m-1247398198

    At that time, they gave me a lukewarm answer. The Info-ZIP developer answered much like Pavlov.
    Actually, quite a few users had raised this issue before I did.
    I don't know what is happening in p7zip now.
    I don't think they were wrong to decline my suggestion at that time.
    I just think they have their own plans, and my view is a little different from theirs.
    It is also a communication problem: I'm not good at English, so I can't fully express my thoughts in English.

    Sincerely,
    ChangBeom Park

     
  • aONe

    aONe - 2010-06-30

    Hi ChangBeom Park,

    Any news on UTF-8? I'm going to update p7zip to version 9.13. Have you fixed it?

     
  • aONe

    aONe - 2010-07-01
    • status: open --> open-accepted
     