
#17 Unicode support on linux target

open
None
5
2008-01-28
2008-01-17
_root
No

I've discovered that OIS 1.0 (and 1.1 from CVS) does not support Unicode characters in KeyEvents. It fetches a buffer from X using XLookupString() and injects only the first character of that buffer.

I've also discovered that in a UTF-8 locale, XLookupString() stores a UTF-8-encoded character in the buffer passed to it.

So my solution is to set the locale to UTF-8 and convert the returned UTF-8 character to Unicode (UTF-32).

The attached patch does only the second part of the work: it assumes that a UTF-8 locale is set and converts the buffer returned by XLookupString() to a UTF-32 character.

The first part is not so trivial. During initialization, we have to call setlocale(LC_CTYPE, "utf8-locale-name"), and we don't know the second argument. In my program, this is handled by the configure script: it picks the first UTF-8 locale listed by "locale -a". I don't know how this problem should be solved in the "OIS style".
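
The conversion step described above can be sketched roughly as follows. This is not the code from the attached patch; the function name utf8ToUtf32 and the convention of returning 0 on bad input are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode the first UTF-8 sequence in buf[0..len) into a UTF-32 code point.
// Hypothetical helper: the name and the "return 0 on error" convention are
// illustrative and are not taken from the patch.
std::uint32_t utf8ToUtf32(const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr);
    if (len == 0) return 0;

    const unsigned char lead = buf[0];
    std::uint32_t cp;
    std::size_t extra;  // number of continuation bytes expected

    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else return 0;  // stray continuation byte or invalid lead byte

    if (extra >= len) return 0;  // sequence is truncated

    for (std::size_t i = 1; i <= extra; ++i) {
        if ((buf[i] & 0xC0) != 0x80) return 0;  // not a continuation byte
        cp = (cp << 6) | (buf[i] & 0x3F);
    }

    // Reject overlong encodings, surrogates, and values beyond U+10FFFF.
    static const std::uint32_t minForLength[] = { 0x0, 0x80, 0x800, 0x10000 };
    if (cp < minForLength[extra]) return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp > 0x10FFFF) return 0;
    return cp;
}
```

XLookupString() fills a char buffer and returns the number of bytes written, so a caller would cast that buffer to const unsigned char* and pass the returned count as len.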

Discussion

  • _root

    _root - 2008-01-17

    patch file

     
  • Phillip Castaneda

    Logged In: YES
    user_id=947604
    Originator: NO

    Thanks for posting this. It was always the plan to implement proper Unicode support :)

    Reading up on setlocale, the docs seem to say that calling setlocale(LC_CTYPE, "") sets the locale from the environment settings, which seems to work here. My system (a standard Ubuntu install) has LANG set to en_US.UTF-8, so it should work. I modified your patch a little. In Unicode text mode, it uses your UTF-8 to UTF-32 function. In ASCII mode, it simply uses the first byte as before. That still produces invalid characters for multibyte input (as before the patch), but it requires less processing. And of course, Off avoids any overhead when text translation is not needed.

    Could you please test the latest CVS v1_2 branch and make sure I did not break anything?
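
As a rough illustration of the three modes described above, the dispatch could look something like this. The TextMode enum, translateText, and decodeFirstUtf8 are hypothetical stand-ins, not the names used in the OIS source, and the decoder here skips validation and assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the three text translation modes discussed above.
enum class TextMode { Off, Unicode, Ascii };

// Minimal UTF-8 decoder without validation; assumes well-formed input.
static std::uint32_t decodeFirstUtf8(const unsigned char* buf, std::size_t len)
{
    if (len == 0) return 0;
    std::uint32_t cp = buf[0];
    std::size_t extra = 0;  // continuation bytes implied by the lead byte
    if      (cp >= 0xF0) { cp &= 0x07; extra = 3; }
    else if (cp >= 0xE0) { cp &= 0x0F; extra = 2; }
    else if (cp >= 0xC0) { cp &= 0x1F; extra = 1; }
    for (std::size_t i = 1; i <= extra && i < len; ++i)
        cp = (cp << 6) | (buf[i] & 0x3F);
    return cp;
}

// Map the bytes produced for one key press to the text value of the event.
std::uint32_t translateText(const unsigned char* buf, std::size_t len, TextMode mode)
{
    assert(buf != nullptr || len == 0);
    switch (mode) {
    case TextMode::Unicode: return decodeFirstUtf8(buf, len);  // full code point
    case TextMode::Ascii:   return len ? buf[0] : 0;           // first byte only
    case TextMode::Off:     break;                             // no translation
    }
    return 0;
}
```

For the two bytes 0xD0 0xAF (Cyrillic "Я"), Unicode mode yields 0x42F, while ASCII mode yields 0xD0, which is the invalid result for multibyte characters mentioned above.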

    Thanks again.

     
  • Phillip Castaneda

    • assigned_to: nobody --> pjcast
    • status: open --> pending
     
  • _root

    _root - 2008-01-28
    • status: pending --> open
     
  • _root

    _root - 2008-01-28

    Logged In: YES
    user_id=1112889
    Originator: YES

    I think it's better not to call setlocale() at all than to call setlocale(LC_CTYPE, ""), because the locale is set to the environment setting by default. Also, you shouldn't rely on the current locale setting. For example, my system is set to ru_RU.koi8r. The UTF-8 locale name may vary from system to system: en_US.utf8, en_US.UTF-8, en_US.UTF8, and so on. On some systems, a UTF-8 locale may not be defined at all.
    I think the choice of locale should always be the user's decision. You should only note in your documentation that a UTF-8 locale must be set in order to use Unicode keyboard translation.
    You can also check whether the current locale uses UTF-8 by calling nl_langinfo(CODESET): it should return "UTF-8" in such a locale.
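
A minimal sketch of the nl_langinfo(CODESET) check suggested above, assuming a POSIX system. The helper name currentLocaleIsUtf8 is made up for this example.

```cpp
#include <cassert>
#include <clocale>     // std::setlocale, which callers use before the check
#include <cstring>
#include <langinfo.h>  // nl_langinfo, CODESET (POSIX)

// Returns true when the current LC_CTYPE locale reports UTF-8 as its codeset.
// Hypothetical helper name; only nl_langinfo(CODESET) comes from the thread.
bool currentLocaleIsUtf8()
{
    const char* codeset = nl_langinfo(CODESET);
    assert(codeset != nullptr);
    return std::strcmp(codeset, "UTF-8") == 0;
}
```

The result reflects whatever locale the process has selected at the time of the call, so an application would typically call std::setlocale(LC_CTYPE, ...) first and check afterwards.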

    The v1_2 branch doesn't seem to be patched:
    http://wgois.cvs.sourceforge.net/wgois/ois/src/linux/LinuxKeyboard.cpp?revision=1.19.2.1&view=markup&pathrev=v1_2

    Am I looking at the correct file?

     
  • Phillip Castaneda

    Logged In: YES
    user_id=947604
    Originator: NO

    Hi,

    Thanks for looking at this. First, I accidentally ran the CVS commit from the demo directory, so it missed the keyboard changes :) I have just committed them, and the new version should be there now.

    Regarding setlocale: if I read the docs right, the locale for a process starts out as the C (or perhaps POSIX) locale, so calling setlocale should be required. As for the environment variable, I think it is safe to call it regardless. If the user has a UTF-8 locale set in the environment, they get UTF-8. If not, they don't (perhaps they don't want it?).
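
The point about the startup locale can be illustrated with a short sketch. The helper names currentCtypeLocale and adoptEnvironmentCtype are invented for this example; only std::setlocale itself is standard.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Name of the process's current LC_CTYPE locale.
// Passing nullptr only queries the locale and does not change it.
std::string currentCtypeLocale()
{
    const char* name = std::setlocale(LC_CTYPE, nullptr);
    assert(name != nullptr);
    return name;
}

// Adopt LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG).
// Returns false, leaving the locale unchanged, if the environment
// names a locale that is not available on the system.
bool adoptEnvironmentCtype()
{
    return std::setlocale(LC_CTYPE, "") != nullptr;
}
```

A program that never calls setlocale stays in the "C" locale regardless of LANG, which is why the explicit call is needed before X can hand back UTF-8.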

     
  • _root

    _root - 2008-01-28

    Logged In: YES
    user_id=1112889
    Originator: YES

    I've looked over the code; it's okay.

    About setlocale: oh, I'm sorry, I missed that the default locale is "C".
    The problem is that in, for example, Ogre+CEGUI+OIS, the only way to get correct non-English keyboard input is to use Unicode characters (I haven't found a way to set the font encoding in CEGUI; it seems to work only with Unicode fonts), but the end user might not know that. So the programmer should _always_ set a UTF-8 locale if he/she wants localized input in such an environment. I don't think it's an OIS problem, and maybe it's off-topic here, but it's a big problem.

     
