#704 I think codepage handling is really broken


Current CVS, non-Unicode build.

unicoder.cpp is getting its idea of the internal
codepage by calling "getDefaultCodepage()" (from
common/unicoder.cpp function getDefaultEncoding).

This function is implemented in codepage.cpp as using a
variable which was set by the Edit/Options/codepage tab.

So, we're letting the user's choice of default codepage
drive what unicoder thinks the internal Windows
codepage is.

I think this is badly broken, because unicoder needs to
convert to the Windows internal codepage -- the one
that ExtTextOut believes in, because the user is going
to see what ExtTextOut displays.

So if you set the codepage to anything besides the
system default, you have (a) told WinMerge how to
interpret most 8-bit files (which is fine), and (b)
told WinMerge what the Windows codepage is, which is
not fine -- now you are in WYSINWYH* hell, because you
are playing a make-believe game that ExtTextOut is not
playing with you. :)

*What You See Is Not What You Have (I just made this up :))

However, I've just figured this out, so I'm open to
debate, discussion, or being proven wrong.


  • Anonymous - 2004-09-29

    I tested to see what codepage is the one actually used by
    ExtTextOut, according to what I see, and unsurprisingly it
    is GetACP(). I saw no indication on the web or elsewhere
    that CP_THREAD_ACP (the other contender in my mind) is ever
    used, by anything.

    I set my thread codepage, user codepage, and system
    codepage, to three different codepages, and did the tests on
    a file starting off with the two bytes 0xBB 0xB7. These make
    up a single multibyte Hangul character in codepage 949 and a
    single multibyte Chinese character in codepage 950, and
    those were the system codepages in the two tests,
    respectively. The attached zip includes screenshots, the
    files compared, and text files explaining the setup on XP,
    and excerpts of WinMerge's reported configuration.

    I believe this proves that we should only ever use GetACP in
    unicoder.cpp, and that my code in unicoder.cpp that tries to
    use CP_THREAD_ACP is misguided and should be removed, and
    the code in unicoder.cpp that calls to codepage.cpp to get
    the default codepage is misguided and should be removed.
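
The two-byte test described above can be reproduced outside Windows as a rough sketch. It assumes that Python's built-in "cp949" and "cp950" codecs are close enough stand-ins for the Windows codepages of the same numbers; it does not call GetACP or ExtTextOut, and the variable names are only illustrative.

```python
# Illustration only: Python's "cp949" and "cp950" codecs stand in for the
# Windows codepages with the same numbers (an assumption of this sketch).
data = bytes([0xBB, 0xB7])  # the two bytes that start the test file

# The same bytes, read under each codepage.
as_korean = data.decode("cp949")
as_chinese = data.decode("cp950")

# Each decoding yields exactly one character, but not the same one.
for name, text in (("cp949", as_korean), ("cp950", as_chinese)):
    print(name, len(text), f"U+{ord(text):04X}")
```

Whichever codepage the display path actually uses decides which of the two characters ends up on screen, which is why the test can tell the candidate codepages apart.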

  • Anonymous - 2004-09-29

    zip of files, screenshots, and explanatory files

  • Kimmo Varis - 2004-09-29

    Can we live with this in the 2.2RC release? I'm intending to
    build that on Thursday evening (morning for you :) since I
    don't have time on Friday and probably no time on the weekend.

    And do we need UI changes to the codepage options to fix this?

  • Anonymous - 2004-09-29
    • assigned_to: nobody --> puddle
    • status: open --> closed-rejected
  • Anonymous - 2004-09-29

    See discussion in PATCH [ 1036683 ] (unicoder.cpp always use
    GetACP codepage). Closing this bug as rejected.

    We prefer to have "What You See Is Not What You Have", and
    produce the same output bytes that were input, which is what
    we have now, to what we used to have, which was "What You
    See Is What You Have, but not what you started with".
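
One way to read that trade-off, as a sketch only: the codepage numbers, the sample text, and the names `original`, `converted`, `shown`, and `saved` are assumptions made for illustration and are not taken from WinMerge. The file is assumed to be Cyrillic text in codepage 1251 and the system codepage is assumed to be 1252.

```python
# Hypothetical setup: a cp1251 file opened on a system whose codepage is cp1252.
original = "Привет".encode("cp1251")

# "What You See Is What You Have, but not what you started with":
# converting the text into the system codepage loses every character
# that codepage cannot represent.
converted = original.decode("cp1251").encode("cp1252", errors="replace")
print(converted)          # the Cyrillic letters have been replaced

# "What You See Is Not What You Have": treat the bytes as if they were
# already in the system codepage. The display is wrong, but writing the
# text back reproduces the input bytes exactly.
shown = original.decode("cp1252")
saved = shown.encode("cp1252")
print(shown, saved == original)
```

In the second case the screen shows the wrong characters, yet nothing is lost when the file is saved, which is the behaviour the closing comment says is preferred.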

