#2159 UTF-8 encoding not reliably detected for files over 4K

Trunk
closed-fixed
None
5
2018-02-25
2013-12-18
Jim Maloy
No

Using WinMerge 2.14.0.0 (Unicode)

In the attached set of three text files (all UTF-8 encoded), at most three lines differ between any two of them. However, comparing C-UTF8WithFirstUCAfter4096.txt to either A-UTF8WithFirstUCBefore4096.txt or B-UTF8WithFirstUCBefore4096.txt shows several spurious differences. This happens because WinMerge incorrectly classifies file C as CP1252 rather than UTF-8, apparently because the first non-ASCII character in that file appears after byte 4096.
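The suspected failure mode can be reproduced with a minimal sketch. This is not WinMerge's actual detection code: `guess_encoding`, the 4096-byte sample window, and the fallback to CP1252 for all-ASCII samples are assumptions made for illustration, based only on the behaviour described above.

```python
import codecs

SAMPLE_SIZE = 4096  # assumed sample window, inferred from the 4K threshold in this report


def guess_encoding(data, sample_size=SAMPLE_SIZE):
    """Hypothetical detector: inspect at most sample_size bytes of data.

    Pass sample_size=None to inspect the whole buffer.
    """
    truncated = sample_size is not None and len(data) > sample_size
    sample = data[:sample_size]
    if sample.isascii():
        # No high bytes in the sample, so nothing marks it as UTF-8.
        # Assumed behaviour: fall back to the ANSI code page.
        return "cp1252"
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off at the sample boundary.
        decoder.decode(sample, final=not truncated)
    except UnicodeDecodeError:
        return "cp1252"
    return "utf-8"


# First non-ASCII character before vs. after byte 4096.
early = "é".encode("utf-8") + b"a" * 5000
late = b"a" * 5000 + "é".encode("utf-8")

print(guess_encoding(early))                   # utf-8
print(guess_encoding(late))                    # cp1252 (misdetected)
print(guess_encoding(late, sample_size=None))  # utf-8
```

Under these assumptions, a misdetected file has every multi-byte UTF-8 sequence decoded as separate CP1252 characters (for example, `é` becomes `Ã©`), which would account for lines being reported as different when compared against a correctly detected file.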

Discussion

  • Jim Maloy

    Jim Maloy - 2013-12-18

    Comparison files, plus configuration.

     
  • Jochen Tucht

    Jochen Tucht - 2013-12-21

    The situation is somewhat better in WinMerge 2011:
    The status bar shows the wrong encoding, but editing works correctly.

    https://bitbucket.org/jtuc/winmerge2011/commits/eb29a0a fixes the status bar issue.

     
  • Takashi Sawanaka

    • status: open --> closed-fixed
    • assigned_to: Takashi Sawanaka
    • Group: Branch --> Trunk
     
  • Takashi Sawanaka

    Seems to be fixed in the 2.15.2 experimental version.

     
