From: <no...@so...> - 2001-03-28 22:32:29
Bugs item #411825, was updated on 2001-03-27 22:36
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=110894&aid=411825&group_id=10894

>Category: Compiler and Objects
Group: 8.3.1
>Status: Closed
Priority: 5
Submitted By: Adrian Robert (arobert3434)
>Assigned to: Don Porter (dgp)
Summary: Passing list w/UTF-8 from C can fail

Initial Comment:
On certain installations of Tcl/Tk 8.3.1, the passing of UTF-8
character-triplets ending in octal 240 (decimal 160, hex A0)
interferes with list delimitation when Tcl_AppendElement is used to
return a result from a C function. In particular, if a UTF-8 string
ending in octal 240 is appended to the result, and then another UTF-8
string is appended afterwards, the octal 240 seems to be interpreted
as a "forward delete" character of some kind, with the result that
the separation between the two list elements is erased and they are
interpreted as one.

The following C function, when called from Tcl, illustrates the
problem.

    int sendCharList(ClientData clientData, Tcl_Interp *interp,
                     int argc, char **argv)
    {
        char s1[5], s2[5], s3[5], s4[5];

        strcpy(s1, "\345\220\240");
        strcpy(s2, "\345\214\240");
        strcpy(s3, "\351\235\240");
        strcpy(s4, "\347\264\240");

        Tcl_ResetResult(interp);
        Tcl_AppendElement(interp, s1);
        Tcl_AppendElement(interp, s2);
        Tcl_AppendElement(interp, s3);
        Tcl_AppendElement(interp, s4);

        return TCL_OK;
    }

The Tcl calls:

    set s6 [sendCharList]
    puts "[llength $s6] , [string length $s6]"

should output "4 , 7" (4 list elements, each a single UTF-8 composite
character, plus 3 delimiters). On some systems it does. On others,
however, the output is "1 , 4", resulting from deletion of the list
delimiters somewhere during passage from C to Tcl.

A complete test program involving the above (plus some additional
tests, and using wish rather than tclsh) may be accessed at:

    ftp://zakros.ucsd.edu/arobert/Temp/testTclBug.tgz

(it is also attached).

A full application that exposes the bug (and led to its discovery)
may be found at:

    http://freshmeat.net/projects/hanzim

Unfortunately, I have not been able to isolate why some installations
exhibit the bug and some don't. A default SUSE 7.0 Linux installation
of 8.3.1 had the problem, while a default Slackware 7.1 installation
of the same Tcl/Tk version did not. Maybe it is a compilation flag
difference...? I'm also not sure whether it persists in 8.3.2 or 8.4.

----------------------------------------------------------------------

>Comment By: Don Porter (dgp)
Date: 2001-03-28 14:31

Message:
Logged In: YES
user_id=80530

TclNeedSpace() is not UTF-8 aware. That's why routines that call it,
like Tcl_AppendElement(), are deprecated. (See the documentation.)

Rewrite your command procedure like so:

    Tcl_Obj *resultPtr;
    ...
    Tcl_ResetResult(interp);
    resultPtr = Tcl_GetObjResult(interp);
    Tcl_ListObjAppendElement(interp, resultPtr,
            Tcl_NewStringObj(s1, -1));
    ...
    Tcl_ListObjAppendElement(interp, resultPtr,
            Tcl_NewStringObj(s4, -1));
    return TCL_OK;

----------------------------------------------------------------------
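
To illustrate how a separator check that is not UTF-8 aware could produce the "1 , 4" result described in the report, here is a minimal, self-contained C sketch. It is not Tcl's source code and does not call the Tcl library. It rests on an assumption that the report does not confirm: that on the failing installations a byte-wise whitespace test classifies byte 0xA0 as whitespace (as a Latin-1 style isspace() might), so no separator is written after an element ending in that byte. All function names below are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <string.h>

typedef int (*need_space_fn)(const char *);

/* ASSUMPTION: stands in for a locale-dependent isspace() that treats
 * byte 0xA0 (NO-BREAK SPACE in ISO-8859-1) as whitespace. */
int bytewise_is_space(unsigned char b)
{
    return b == ' ' || (b >= '\t' && b <= '\r') || b == 0xA0;
}

/* Hypothetical byte-wise check: a separator is needed unless the
 * buffer is empty or its last byte already looks like whitespace. */
int need_space_bytewise(const char *s)
{
    size_t n = strlen(s);
    return n > 0 && !bytewise_is_space((unsigned char)s[n - 1]);
}

/* Hypothetical UTF-8-aware check: a byte >= 0x80 belongs to a
 * multi-byte character, so it is never treated as whitespace. */
int need_space_utf8(const char *s)
{
    size_t n = strlen(s);
    unsigned char last;

    if (n == 0) return 0;
    last = (unsigned char)s[n - 1];
    if (last >= 0x80) return 1;
    return !(last == ' ' || (last >= '\t' && last <= '\r'));
}

/* Append the four strings from the report, using the given policy
 * to decide whether a separating space is written first. */
void build_list(char *buf, size_t cap, need_space_fn need_space)
{
    static const char *elems[] = {
        "\345\220\240", "\345\214\240", "\351\235\240", "\347\264\240"
    };
    size_t i;

    assert(cap > 0);
    buf[0] = '\0';
    for (i = 0; i < 4; i++) {
        if (need_space(buf)) strncat(buf, " ", cap - strlen(buf) - 1);
        strncat(buf, elems[i], cap - strlen(buf) - 1);
    }
}

/* Count elements separated by ASCII spaces. */
int count_elements(const char *s)
{
    int count = 0, in_elem = 0;

    for (; *s != '\0'; s++) {
        if (*s == ' ') in_elem = 0;
        else if (!in_elem) { in_elem = 1; count++; }
    }
    return count;
}

/* Count characters by skipping UTF-8 continuation bytes (10xxxxxx). */
int utf8_length(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    return n;
}
```

Under that assumption, build_list with need_space_bytewise produces a single 12-byte element of 4 characters, while build_list with need_space_utf8 produces 4 elements and 7 characters, matching the two outputs quoted in the report. The object-based rewrite suggested in the comment avoids the question entirely, because the list is built from separate objects rather than by scanning a string for a trailing separator.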