From: SourceForge.net <no...@so...> - 2006-07-31 05:34:33
Feature Requests item #1511357, was opened at 2006-06-23 10:43
Message generated for change (Settings changed) made by dgp
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=360894&aid=1511357&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
>Category: 44. Parsing and Eval
>Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Don Porter (dgp)
Assigned to: Don Porter (dgp)
Summary: non-ASCII function names rejected

Initial Comment:
Since TIP 232 went Final, [expr] functions are a 1-1 map
with Tcl commands in the tcl::mathfunc namespace.

The attached script demonstrates that the [expr] parser
rejects function names with non-ASCII characters in them,
even though Tcl commands are perfectly happy to include
non-ASCII characters.

Seems inconsistent, and seems mildly attractive to be able
to define math functions that actually have the traditional
names instead of an ASCII transliteration.

----------------------------------------------------------------------

Comment By: Kevin B KENNY (kennykb)
Date: 2006-07-30 13:56

Message:
Logged In: YES
user_id=99768

Hmmm. I like the idea in general, but I think that we
really ought to write a TIP to address these issues across
the board. There are subtle incompatibilities here, many
of which come up in error reporting; how do you post an error
dialog or write a message to stderr reporting an unknown
mathfunc whose name isn't in the native character set?
(I know that we can address these issues, but we need
to think about them.)

In addition to math function names, the issues of
international characters in Tcl source code of which
I'm aware are:

- Variable names and $-substitution - Bug 408568.
  There appears to be no good reason that we shouldn't
  allow a variable named by a Greek letter, for instance.
- Numeric digits - should we recognize equivalents
  for the Indo-Arabic digits (Half-width and full-width
  digits, Arabic/Thai/Devanagari presentation forms of
  the digits, ...)? [I'm not suggesting dealing with
  non-positional notations like Chinese or Japanese,
  although we could argue whether [string is digit]
  ought to recognize these, too.] This would affect
  not only [expr] and its friends plus [scan]/[format]
  etc, but also bits like field lengths in [binary scan].
  I'm a little nervous about extending things this far;
  do we really want to inflict the maintainability issues
  that allowing numbers in non-European scripts would
  impose?

- Whitespace. There are a number of places where we
  look at 'isspace' and fail to count the various Unicode
  whitespace characters (breaking and nonbreaking spaces
  of various widths). (I've been bitten by this one
  already, when a Tcl source file was edited in a Unicode
  editor, and somehow two words got separated by a string
  of half-width spaces.)

- Don't even get me started on Unicode collation,
  normalization, case mapping, comparison, .... We will
  likely want to begin addressing these someday, but that
  ought to be a separate project from allowing i18n of
  our own sources.

In addition, we will need to pursue making international
variable and function names typable in character sets
that lack the characters. Right now, backslash
substitution happens at the wrong times or not at all:

% expr {s\u0069n(3.14159)}
invalid bareword "s" in expression (prepend $ for variable; append argument list for function call)
(parsing expression "s\u0069n(3.14159)")
% expr "s\u0069n(3.14159)"
2.653589793352726e-6
% set i 1
1
% puts $\u0069
$i
% puts [set \u0069]
1
% puts ${\u0069}
can't read "\u0069": no such variable

In summary: This may well be a great idea whose time
is come, but I'm profoundly uncomfortable about doing
it without a TIP.
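
The digit and whitespace questions raised in this comment can be probed with a short script. The following is only a sketch, not something posted in the thread: the helper name probeDigit and the sample characters (U+0663, U+FF13, U+2003) are chosen here for illustration, and the results are deliberately left unstated because they may vary with the Tcl version. It compares the character-class test [string is digit] with what the [expr] number parser accepts for the same character.

```tcl
# Hypothetical helper (not from the thread): report whether a character is
# classified as a digit and whether [expr] fails to use it as a number.
proc probeDigit {ch} {
    # Character-class test, the one mentioned in the comment
    set isDigit [string is digit -strict $ch]
    # catch returns 1 when [expr] raises an error for this operand
    set exprFailed [catch {expr {$ch + 1}} msg]
    return [list $isDigit $exprFailed]
}

# Sample characters, assumed for illustration:
#   \u0663  ARABIC-INDIC DIGIT THREE
#   \uff13  FULLWIDTH DIGIT THREE
foreach {label ch} [list ascii 3 arabic-indic \u0663 fullwidth \uff13] {
    lassign [probeDigit $ch] isDigit exprFailed
    puts "$label: string-is-digit=$isDigit expr-failed=$exprFailed"
}

# \u2003 is EM SPACE, one of the Unicode spaces of the kind described above
puts "em space: string-is-space=[string is space -strict \u2003]"
```

Any character for which the two columns disagree is a case where the character classes and the number parser do not line up, which is the sort of gap a TIP would have to settle.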
----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-06-23 10:53

Message:
Logged In: YES
user_id=80530

Also contributing to the desirability of this change
is the new [source -encoding] option in Tcl 8.5 that
for the first time really permits scripts to have
non-ASCII characters as originally read in. As
Unicode-aware editors get more widespread that
increases desirability of this change too.

Attached patch makes the demo script run successfully.
Please review.

----------------------------------------------------------------------
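
For readers without access to the attachments, the inconsistency described in the initial comment can be sketched as follows. This is not the attached demo script; the function name (the Greek letter sigma, U+03C3) and its logistic body are assumptions made for illustration. The command is created and called directly without trouble, while the [expr] call is wrapped in catch because whether the parser accepts the name depends on the Tcl version and on whether a patch like the one above is applied.

```tcl
# Hypothetical math function named with GREEK SMALL LETTER SIGMA (\u03c3).
# Creating the command works because command names may contain any character.
proc ::tcl::mathfunc::\u03c3 {x} {
    # Logistic function, chosen only as an example body
    expr {1.0 / (1.0 + exp(-$x))}
}

# Calling the command directly bypasses the [expr] parser entirely.
puts [::tcl::mathfunc::\u03c3 0]

# Build the script in double quotes so \u03c3 is substituted before [expr]
# sees it; the braces are then ordinary characters inside the string.
set script "expr {\u03c3(0)}"
if {[catch $script result]} {
    puts "expr rejected the name: $result"
} else {
    puts "expr accepted the name: $result"
}
```

Building the script in double quotes keeps the source file pure ASCII, which sidesteps the encoding of the file itself and isolates the behavior of the [expr] parser.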