
#552 toupper limitations

open
5
2009-05-07
2009-05-07
Don Porter
No

kbk \u0149 should uppercase to \u02bc\u004e
dgp Tcl_UniCharToUpper() cannot represent that.
dgp one Tcl_UniChar in; one Tcl_UniChar out.
kbk The fundamental assumption underlying Tcl_UniCharToUpper is wrong.
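dgp's point is about the function's shape: a signature that takes one character and returns one character cannot express a full case mapping that expands to two or more characters. The sketch below shows one possible alternative shape, in which the caller supplies an output buffer and the function returns how many code points it wrote. `FullToUpper` and `SpecialUpper` are hypothetical names. They are not part of the Tcl or utf8proc API. The sketch also assumes a 32-bit code point type rather than `Tcl_UniChar`. The table holds only two rows copied from Unicode's SpecialCasing.txt. A real implementation would need all of SpecialCasing.txt plus the simple mappings from UnicodeData.txt.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical full-case-mapping entry: one code point in, up to three
 * out (three is the longest expansion in SpecialCasing.txt). */
typedef struct {
    uint32_t from;
    uint32_t to[3];
    size_t   len;
} SpecialUpper;

/* Two illustrative rows taken from SpecialCasing.txt. */
static const SpecialUpper specialUpper[] = {
    { 0x00DF, { 0x0053, 0x0053 }, 2 },  /* sharp s -> "SS" */
    { 0x0149, { 0x02BC, 0x004E }, 2 },  /* n preceded by apostrophe -> U+02BC U+004E */
};

/* Uppercases one code point into out[] (capacity >= 3) and returns the
 * number of code points written. Handles only the table above and ASCII
 * a-z; every other code point is copied through unchanged. */
static size_t FullToUpper(uint32_t ch, uint32_t out[3])
{
    size_t i, j;

    for (i = 0; i < sizeof(specialUpper) / sizeof(specialUpper[0]); i++) {
        if (specialUpper[i].from == ch) {
            for (j = 0; j < specialUpper[i].len; j++) {
                out[j] = specialUpper[i].to[j];
            }
            return specialUpper[i].len;
        }
    }
    out[0] = (ch >= 'a' && ch <= 'z') ? ch - ('a' - 'A') : ch;
    return 1;
}
```

With this shape, `FullToUpper(0x0149, buf)` returns 2 and leaves U+02BC U+004E in `buf`. A string-level uppercasing routine can then size its output from the returned counts instead of assuming the result has the same length as the input.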

Discussion

  • Jan Nijtmans

    Jan Nijtmans - 2009-05-08

    In the long run, I don't think it's wise for Tcl to do all of its
    UTF-8 processing with its own special-purpose code when better
    external solutions exist, e.g. utf8proc:
    http://freshmeat.net/projects/utf8proc/
    I would be in favour of including utf8proc in Tcl and deprecating
    Tcl's own UTF-8 handling functions.

    Does anyone know of (better) alternatives to
    utf8proc? This is worth a TIP!

     
  • Donal K. Fellows

    The definitive link is:
    http://www.flexiguided.de/publications.utf8proc.en.html

    My principal concern is that it doesn't look to be actively maintained. Perhaps it is perfect...? :-)

     
  • Jan Nijtmans

    Jan Nijtmans - 2009-05-08

    yeah, like libtommath ;-)

     
  • Jan Nijtmans

    Jan Nijtmans - 2009-08-27

    utf8proc has a new home:
    http://www.public-software-group.org/utf8proc
    I think that answers the concern about the maintenance of this
    software. utf8proc is not perfect, and in its current form it is not
    even usable for Tcl (e.g. it doesn't compile with MSVC 6). But all
    of this is fixable, and having an organisation as a contact point
    is a big improvement over the earlier situation.

     
  • Donal K. Fellows

    OK, my non-technical objections are dealt with. Now it's more to do with the actual details of how to do the integration, but that's not something that I expect to be a show-stopper. Just takes effort.

    This probably ought to be done on a feature-dev branch so that we can get things sorted out (including what API changes we want) before rolling into the mainline, and that also means it won't make 8.6.

     
  • Jan Nijtmans

    Jan Nijtmans - 2009-08-27

    How about a simpler path? utf8proc already has Ruby bindings, but no Tcl binding. I am planning to write a stub-enabled Tcl extension for
    utf8proc, with full TEA support. This extension would simply wrap
    all the UTF-8 conversions we need. As soon as it works, we know that
    utf8proc can be integrated into Tcl. And the development can be done in the
    utf8proc repository. I just have to become a utf8proc submitter,
    and I am already talking to Jan Behrens about that.

    No, it certainly won't make 8.6. I wouldn't even try.