
#1675 Cannot select encoding of files that were detected as UTF-8

Trunk
closed-fixed
None
5
2008-02-24
2008-02-24
No

When comparing the attached files, whose codepage is 932 (Japanese), WinMerge wrongly detects them as UTF-8.
I then specified codepage 932 in the File Encoding dialog, but it had no effect.

This bug is caused by double encoding detection.
The first is by GuessCodepageEncoding().
The second is by UniFile::ReadBom().

The first one isn't called when the File Encoding dialog is used, but the second one is always called. If ReadBom() detects the file as UTF-8, the codepage specified in the File Encoding dialog is ignored.

I think ReadBom() should not be called when the codepage of the files has been specified explicitly.
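
The idea can be illustrated with a minimal sketch. This is not WinMerge's actual code and not the attached patch: the names `hasUtf8Bom`, `detectCodepage` and `chooseCodepage` are hypothetical, the detection step is reduced to a plain BOM check for brevity, and a user codepage of 0 is assumed to mean "not specified".

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; an illustration only, not WinMerge code.
constexpr int kCodepageUtf8 = 65001;  // Windows codepage identifier for UTF-8

// Returns true if the buffer starts with the UTF-8 byte order mark EF BB BF.
bool hasUtf8Bom(const std::string& bytes) {
    return bytes.size() >= 3 &&
           static_cast<unsigned char>(bytes[0]) == 0xEF &&
           static_cast<unsigned char>(bytes[1]) == 0xBB &&
           static_cast<unsigned char>(bytes[2]) == 0xBF;
}

// Stand-in for the second detection pass. Returns 0 when nothing is detected.
int detectCodepage(const std::string& bytes) {
    return hasUtf8Bom(bytes) ? kCodepageUtf8 : 0;
}

// Picks the codepage used to decode the file.
// userCodepage == 0 is assumed to mean "the user did not specify one".
int chooseCodepage(const std::string& bytes, int userCodepage, int fallbackCodepage) {
    assert(fallbackCodepage > 0);  // a usable fallback must always be supplied
    if (userCodepage != 0) {
        // An explicit choice wins: detection is skipped entirely.
        return userCodepage;
    }
    const int detected = detectCodepage(bytes);
    return detected != 0 ? detected : fallbackCodepage;
}
```

With this ordering, `chooseCodepage(data, 932, 1252)` returns 932 even when `data` would otherwise be detected as UTF-8, while passing 0 as the user codepage keeps the automatic detection.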

I am attaching the fix.

Discussion

  • Takashi Sawanaka

    Logged In: YES
    user_id=954028
    Originator: YES

    File Added: cp-932.7z

     
  • Kimmo Varis

    Kimmo Varis - 2008-02-24

    Logged In: YES
    user_id=631874
    Originator: NO

    - Can you fix the comment above the lines you are changing in MergeDoc.cpp, line 1683:
    > // Recognize Unicode files with BOM (byte order mark)
    > // or else, use the codepage we were given to interpret the 8-bit characters

    It now also detects UTF-8 files without a BOM...

    Otherwise looks OK to me.

     
  • Takashi Sawanaka

    • assigned_to: nobody --> sdottaka
    • status: open --> closed-fixed
     
  • Takashi Sawanaka

    Logged In: YES
    user_id=954028
    Originator: YES

    I committed the fix to SVN trunk, with the comment fixed. Completed: At revision: 5075

     
