
Tcl

Tracker: Bugs

"eval" command corrupts "\xE0" (a grave) - ID: 227512
Last Update: Comment added (hobbs)

In Tcl 8.4a2 compiled on Linux/x86 (Red Hat 6.1):

% scan \xE0 %c i; puts $i
% eval scan \xE0 %c j; puts $j

prints:

224
195


Both results should be the same!
For the other character codes I tested, the two results are identical.

Tcl 8.3.1 works correctly for me; the bug appears in Tcl 8.3.2 and remains in 8.4a2.

- Claude.Barras@limsi.fr


Submitted By: Nobody/Anonymous (nobody)
Date Submitted: 2001-01-04 11:06
Priority: 6
Status: Closed
Resolution: Fixed
Assigned To: Jeffrey Hobbs
Category: 10. Objects
Group: obsolete: 8.4a2
Visibility: Public


Comments (2)

Date: 2001-07-03 03:31
Sender: hobbs (SourceForge.net Subscriber, Project Admin)

Logged In: YES
user_id=72656

The easiest way to show the bug is simply [concat \xe0],
which returns \xc3 whenever isspace() is not running under the
C locale. Corrected with this patch:

Index: generic/tclUtil.c
===================================================================
RCS file: /cvsroot/tcl/tcl/generic/tclUtil.c,v
retrieving revision 1.19
diff -b -c -r1.19 tclUtil.c
*** generic/tclUtil.c 2001/06/04 01:25:04 1.19
--- generic/tclUtil.c 2001/07/03 03:30:10
***************
*** 1071,1078 ****
      for (i = 0; i < objc; i++) {
          objPtr = objv[i];
          element = Tcl_GetStringFromObj(objPtr, &elemLength);
!         while ((elemLength > 0)
!                 && (isspace(UCHAR(*element)))) { /* INTL: ISO space. */
              element++;
              elemLength--;
          }
--- 1071,1078 ----
      for (i = 0; i < objc; i++) {
          objPtr = objv[i];
          element = Tcl_GetStringFromObj(objPtr, &elemLength);
!         while ((elemLength > 0) && (UCHAR(*element) < 127)
!                 && isspace(UCHAR(*element))) { /* INTL: ISO C space. */
              element++;
              elemLength--;
          }
***************
*** 1083,1090 ****
           * this case it could be significant.
           */

!         while ((elemLength > 0)
!                 && isspace(UCHAR(element[elemLength-1])) /* INTL: ISO space. */
                  && ((elemLength < 2) || (element[elemLength-2] != '\\'))) {
              elemLength--;
          }
--- 1083,1090 ----
           * this case it could be significant.
           */

!         while ((elemLength > 0) && (UCHAR(element[elemLength-1]) < 127)
!                 && isspace(UCHAR(element[elemLength-1])) /* INTL: ISO C space. */
                  && ((elemLength < 2) || (element[elemLength-2] != '\\'))) {
              elemLength--;
          }

In 8.4a3cvs.


Date: 2001-01-06 07:54
Sender: hobbs (SourceForge.net Subscriber, Project Admin)

OK, on further examination, \xe0 does happen to convert in UTF-8
to a byte followed by a whitespace byte:

(tkcon) 67 % scan [encoding convertto utf-8 \xe0] %c%c
195 160

160 (octal 240) is a whitespace character:

(tkcon) 57 % puts '[encoding convertto utf-8 \xe0]'
'à'

When I follow it through the debugger, the simple scan
hits Tcl_ScanObjCmd with objv[1] == 'à', but the scan
that follows eval just gets 'Ã'. BTW, I found that we
can get the same problem with \240 (hex a0):

(tkcon) 73 % puts '[encoding convertto utf-8 \xa0]'
' '

Ah, found it! It's occurring in Tcl_ConcatObj! It isn't
preserving the integrity of multi-byte UTF-8 characters: it
strips off a trailing whitespace byte as if it were an
independent character. See the following stack trace:

(gdb) p *objv[1]
$2 = {refCount = 1, bytes = 0x479e0 "scan", length = 4, typePtr = 0x0,
  internalRep = {longValue = 1633771873,
    doubleValue = 1.2217638442043777e+161, otherValuePtr = 0x61616161,
    twoPtrValue = {ptr1 = 0x61616161, ptr2 = 0x61616161}}}
(gdb) p *objv[2]
$3 = {refCount = 1, bytes = 0x4c608 "à ", length = 4, typePtr = 0xef78d3d0,
  internalRep = {longValue = 219384, doubleValue = 4.6553273179568089e-309,
    otherValuePtr = 0x358f8, twoPtrValue = {ptr1 = 0x358f8,
      ptr2 = 0x61616161}}}
(gdb) s
623 objPtr = Tcl_ConcatObj(objc-1, objv+1);
(gdb) s
624 result = Tcl_EvalObjEx(interp, objPtr, TCL_EVAL_DIRECT);
(gdb) p *objPtr
$4 = {refCount = 0, bytes = 0x34ce8 "scan à %c%c%c", length = 15,
  typePtr = 0x0, internalRep = {longValue = 1633771873,
    doubleValue = 1.2217638442043777e+161, otherValuePtr = 0x61616161,
    twoPtrValue = {ptr1 = 0x61616161, ptr2 = 0x61616161}}}

We lost the trailing space (\240) in what will be passed to eval.
Looking further, we can see the guilty code in action:

(gdb) p *objPtr
$3 = {refCount = 1, bytes = 0x4b730 "à ", length = 4, typePtr = 0xef78d3d0,
  internalRep = {longValue = 219384, doubleValue = 4.6553273179568089e-309,
    otherValuePtr = 0x358f8, twoPtrValue = {ptr1 = 0x358f8,
      ptr2 = 0x61616161}}}
(gdb) n
1075        && (isspace(UCHAR(*element)))) { /* INTL: ISO space. */
(gdb) p *element
$4 = -61 'Ã'
(gdb) n
1088        && ((elemLength < 2) || (element[elemLength-2] != '\\'))) {
(gdb) n
1089    elemLength--;
(gdb) l 1085
1080    /*
1081     * Trim trailing white space. But, be careful not to trim
1082     * a space character if it is preceded by a backslash: in
1083     * this case it could be significant.
1084     */
1085
1086    while ((elemLength > 0)
1087        && isspace(UCHAR(element[elemLength-1])) /* INTL: ISO space. */
1088        && ((elemLength < 2) || (element[elemLength-2] != '\\'))) {
1089    elemLength--;

This isn't a problem in the C locale because 160 (\240) is not a
space character there. The reason that <= 8.3.1 works OK is that the
magic we used to have in tclUnixInit.c (getting the current value of
LC_ALL, initializing LC_ALL to "" and then resetting it to the previous
value) actually set it to C (which messed up the guys in Russia and Japan).

So Tcl_ConcatObj is really guilty here: it walks backwards through
strings that might be UTF-8, testing single bytes for whitespace.

BTW, a workaround is:
eval [list scan \xe0 %c]

which takes a different code path and avoids Tcl_ConcatObj.
Alternatively, set your LC_CTYPE environment variable to "C".



Changes (7)

Field          Old Value                                 Date              By
status_id      Open                                      2001-07-03 03:31  hobbs
resolution_id  None                                      2001-07-03 03:31  hobbs
summary        "eval" command corrupts "\xE0" (a grave)  2001-07-03 03:31  hobbs
close_date     -                                         2001-07-03 03:31  hobbs
category_id    81. Bundled Packages                      2001-02-07 16:10  dkf
assigned_to    nobody                                    2001-01-06 07:54  hobbs
priority       5                                         2001-01-06 07:54  hobbs