#544 Add locale support for case management

open
5
2009-01-30
2009-01-30
No

Tcl needs a locale-aware system for case folding and case-insensitive comparisons. In practice, this means adding a suitable -locale option to the following commands:
lsearch,
lsort,
regexp,
regsub,
string compare,
string equal,
string is (possibly),
string match,
string tolower,
string totitle,
string toupper,
switch -nocase

We may also need to consider fitting other things into the locale system (especially the number handling of [format] and [scan]), but those could be delegated to a separate issue.

[[THIS FRQ IS BEING USED TO COALESCE OTHERS]]

Discussion

  • Alexandre Ferrieux

    I understand comparisons are sensitive to the locale, but why are [string to*] included in this list ?

     
  • Donal K. Fellows

    Turkish has strange rules for dotted and dotless 'i's. Alas. There may be other oddities out there, but I've not done a proper literature search.
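
    To make the Turkish rule concrete: Turkish pairs dotted i with İ (U+0130) and dotless ı (U+0131) with I, whereas the locale-independent mapping pairs i with I. The following is only a minimal sketch; turkishToUpper is a hypothetical helper built on [string map], not an existing Tcl command or a proposed implementation.

    ```tcl
    # Hypothetical helper (not part of Tcl): uppercase a string using
    # the Turkish pairing of dotted and dotless i.
    proc turkishToUpper {s} {
        # Handle the two Turkish-specific letters explicitly:
        #   i (dotted)        -> U+0130 (capital I with dot above)
        #   U+0131 (dotless)  -> I
        set s [string map [list i \u0130 \u0131 I] $s]
        # Every other character uses the default, locale-independent mapping.
        return [string toupper $s]
    }

    # Default mapping versus the Turkish one for the same input character.
    set defaultUpper [string toupper i]
    set turkishUpper [turkishToUpper i]
    ```

    Here defaultUpper is the plain I, while turkishUpper is U+0130, so the same input character yields a different result depending on the rules applied.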

     
  • Alexandre Ferrieux

    Are the oddities locale-specific for a given character (e.g. [string toupper i]->I in English and ??? in Turkish), or character-specific ([string toupper bizarre-i-from-Turkish-codepage]->some-other-bizarre-char) ?

    If the latter, [string to*] don't need locale dependency, they just need love and care to properly handle the whole set of Unicode characters.

     
  • Jan Nijtmans

    Jan Nijtmans - 2009-01-30

    There is a document describing this:
    <http://www.unicode.org/reports/tr21/tr21-5.html#Caseless_Matching>

    In short, apart from "string tolower", "string toupper" and
    "string totitle", there should be a new "string fold"
    which 'folds' the string to a case-insensitive form, ready
    to be compared with another case-folded string.

    The new "string fold" function will do the same as
    "string tolower" for most locales, but Turkish is an
    example where things are different.
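
    As a rough illustration of that difference, here is a sketch of
    what such a fold could look like. The proc name stringFold, its
    optional locale argument and the "tr" tag are all assumptions made
    for this example, and [string tolower] is used only as an
    approximation of real Unicode case folding.

    ```tcl
    # Hypothetical stand-in for the proposed [string fold]. The name, the
    # optional locale argument and the "tr" tag are assumptions.
    proc stringFold {s {locale ""}} {
        if {$locale eq "tr"} {
            # Turkish folding: I -> U+0131 (dotless i), U+0130 -> i.
            set s [string map [list I \u0131 \u0130 i] $s]
        }
        # Approximation: simple lowercasing, not full Unicode case folding.
        return [string tolower $s]
    }

    # "TITLE" and "title" fold to the same string by default ...
    set sameByDefault [expr {[stringFold TITLE] eq [stringFold title]}]
    # ... but not under the Turkish rules, where I folds to dotless i.
    set sameInTurkish [expr {[stringFold TITLE tr] eq [stringFold title tr]}]
    ```

    With this sketch, sameByDefault is true and sameInTurkish is
    false, which is why a plain "string tolower" is not enough for
    caseless comparison in a Turkish locale.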

    Apart from that, I would expect new functions:
    string is folded $s => [expr {[string fold $s] eq $s}]
    string is title $s => [expr {[string totitle $s] eq $s}]
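
    The two predicates above can be tried out today as ordinary procs.
    This is only a sketch: fold, isFolded and isTitle are hypothetical
    names, and fold falls back to [string tolower] because no
    "string fold" exists yet.

    ```tcl
    # Hypothetical stand-in for the proposed [string fold]; lowercasing is
    # only an approximation of Unicode case folding.
    proc fold {s} {
        return [string tolower $s]
    }

    # Sketch of the proposed "string is folded".
    proc isFolded {s} {
        return [expr {[fold $s] eq $s}]
    }

    # Sketch of the proposed "string is title".
    proc isTitle {s} {
        return [expr {[string totitle $s] eq $s}]
    }
    ```

    For example, [isFolded hello] and [isTitle Hello] are true, while
    [isFolded Hello] and [isTitle hello] are false.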

    Looking at the Unicode case folding table:
    http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
    It seems that Turkish is the only special case.