From: SourceForge.net <no...@so...> - 2009-01-30 15:01:25
Feature Requests item #2549196, was opened at 2009-01-30 15:10
Message generated for change (Comment added) made by nijtmans
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=360894&aid=2549196&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: 44. UTF-8 Strings
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Donal K. Fellows (dkf)
Assigned to: Jan Nijtmans (nijtmans)
Summary: Add locale support for case management

Initial Comment:
Tcl needs a system for doing case folding and case-insensitive comparisons.
This works out as adding a suitable -locale option to the following commands:

    lsearch, lsort, regexp, regsub, string compare, string equal,
    string is (possibly), string match, string tolower, string totitle,
    string toupper, switch -nocase

We possibly also need to consider allowing other things to fit into the
locale system (especially the number handling of [format] and [scan]), but
those might be delegatable to another issue.

[[THIS FRQ IS BEING USED TO COALESCE OTHERS]]

----------------------------------------------------------------------

>Comment By: Jan Nijtmans (nijtmans)
Date: 2009-01-30 16:01

Message:
There is a document describing this:
<http://www.unicode.org/reports/tr21/tr21-5.html#Caseless_Matching>

In short, apart from "string tolower", "string toupper" and
"string totitle", there should be a new "string fold" which folds the
string to a case-insensitive form, ready to be compared with another
case-folded string. The new "string fold" function will do the same as
"string tolower" for most locales, but Turkish is an example where things
are different.
Apart from that, I would expect new functions:

    string is folded $s  =>  [expr {[string fold $s] eq $s}]
    string is title $s   =>  [expr {[string totitle $s] eq $s}]

Looking at the Unicode case folding table:
http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt
it seems that Turkish is the only special case.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2009-01-30 15:39

Message:
Are the oddities locale-specific for a given character (e.g.
[string toupper i]->I in English and ??? in Turkish), or character-specific
([string toupper bizarre-i-from-Turkish-codepage]->some-other-bizarre-char)?
If the latter, [string to*] don't need locale dependency; they just need
love and care to properly handle the whole set of Unicode characters.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2009-01-30 15:24

Message:
Turkish has strange rules for dotted and dotless 'i's. Alas. There may be
other oddities out there, but I've not done a proper literature search.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2009-01-30 15:19

Message:
I understand that comparisons are sensitive to the locale, but why are
[string to*] included in this list?

----------------------------------------------------------------------
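To make the difference between lowercasing and folding concrete, here is a minimal sketch. It is written in Python rather than Tcl, because the [string fold] command and the -locale option discussed above are only proposals. Python's built-in str.casefold(), str.lower() and str.upper() are real; the fold, to_upper and is_folded helpers and the "tr" locale argument are hypothetical names invented for this illustration, and the only locale rule they handle is the Turkish dotted/dotless i.

```python
# Illustration only: a Python stand-in for the proposed Tcl commands.
# fold(), to_upper(), is_folded() and the "tr" locale tag are hypothetical.

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def fold(s: str, locale: str = "") -> str:
    """Rough analogue of the proposed [string fold ?-locale tr?]."""
    if locale == "tr":
        # Turkish special case: I folds to dotless i, dotted capital I to i.
        s = s.replace("I", DOTLESS_SMALL_I).replace(DOTTED_CAPITAL_I, "i")
    # Locale-independent Unicode case folding for everything else.
    return s.casefold()


def to_upper(s: str, locale: str = "") -> str:
    """Rough analogue of [string toupper ?-locale tr?]."""
    if locale == "tr":
        # In Turkish, plain i uppercases to dotted capital I.
        s = s.replace("i", DOTTED_CAPITAL_I)
    return s.upper()


def is_folded(s: str, locale: str = "") -> bool:
    """Mirrors the thread's definition: [string fold $s] eq $s."""
    return fold(s, locale) == s


def equal_nocase(a: str, b: str, locale: str = "") -> bool:
    """Case-insensitive comparison built on folding both sides."""
    return fold(a, locale) == fold(b, locale)
```

Under these assumptions, equal_nocase("TITLE", "title") is true with the default rules but false with locale "tr", because the Turkish rules fold I to dotless i. That is the locale-specific (rather than character-specific) behaviour Alexandre's question is asking about. Folding also differs from lowercasing outside Turkish: "Straße" and "STRASSE" fold to the same string, while lowercasing them does not.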