I have found a bug in Tcl 8.4.1 for platforms with 64-bit
integer support
enabled, i.e. where "#ifndef TCL_WIDE_INT_IS_LONG"
is true.
The problem is that the format internal function treats
integers
(%08x format) as 32-bit, and updates object's
internal "cache" value to be
tclIntType instead of tclWideIntType.
The docs for format say:
# The l modifier is ignored for real values and on 64-bit
platforms,
# which are always converted as if the l modifier were
present (i.e.
# the types double and long are used for the internal
representation
# of real and integer values, respectively). If the h
modifier is
# specified then integer values are truncated to short
before
# conversion. Both h and l modifiers are ignored on all
other
# conversions.
but I can get format to use a 32-bit internal
representation on a 64-bit
system, which contradicts the docs (the docs match my
idea of what should
be happening.)
The code to reproduce the issue is as follows:
proc tcl_ok {} {
set a 0xaaaaaaaa
set b_1 0xaaaa
set b_2 aaaa
set b "${b_1}${b_2}"
if {$a != $b} {
puts "Test ok_1: a NOT EQUAL b"
} else {
puts "Test ok_1: a EQUALS b"
}
set x [format %08lx $a]
if {$a != $b} {
puts "Test ok_2: a NOT EQUAL b"
} else {
puts "Test ok_2: a EQUALS b"
}
}
proc tcl_error {} {
set a 0xaaaaaaaa
set b_1 0xaaaa
set b_2 aaaa
set b "${b_1}${b_2}"
if {$a != $b} {
puts "Test error_1: a NOT EQUAL b"
} else {
puts "Test error_1: a EQUALS b"
}
set x [format %08x $a]
# format has changed $a to int internal representation
# b has no type, so is interpreted as wide_int
# a is sign-extended 32->64 to match b's size
# hence this comparison fails
if {$a != $b} {
puts "Test error_2: a NOT EQUAL b"
} else {
puts "Test error_2: a EQUALS b"
}
}
puts $tcl_patchLevel
tcl_ok
tcl_error
=========================================
===================
which prints:
8.4.1
Test ok_1: a EQUALS b
Test ok_2: a EQUALS b
Test error_1: a EQUALS b
Test error_2: a NOT EQUAL b
=========================================
===================
The issue is the code in tclCmdAH.c in function
Tcl_FormatObjCmd (at line 2187 in Tcl 8.4.1)
This says:
#ifndef TCL_WIDE_INT_IS_LONG
if (useWide) {
if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &wideValue) != TCL_OK) {
goto fmtError;
}
whichValue = WIDE_VALUE;
size = 40 + precision;
break;
}
#endif /* TCL_WIDE_INT_IS_LONG */
if (Tcl_GetLongFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &intValue) != TCL_OK) {
goto fmtError;
}
=========================================
===================
But it should say something more like:
#ifndef TCL_WIDE_INT_IS_LONG
if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &wideValue) != TCL_OK) {
goto fmtError;
}
if (useWide) {
whichValue = WIDE_VALUE;
size = 40 + precision;
break;
}
else {
value_used_for_formatting = wide_value & 0xffffffff
// setup other formatting params here?
}
#else /* TCL_WIDE_INT_IS_LONG */
if (Tcl_GetLongFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &intValue) != TCL_OK) {
goto fmtError;
}
#endif /* TCL_WIDE_INT_IS_LONG */
=========================================
===================
i.e. on 64-bit architectures, Tcl should *always* interpret
strings
as 64-bit numbers, but follow the wording of the docs on
the format
h/l modifiers more precisely by truncating 64->32/16 bits
depending on
whether h/l are specified, instead of interpreting the
number as
64/32/(16?) depending on whether h/l are specified.
On a more general note, a lot of the Tcl code says this:
#if HAVE_A_64_BIT_TYPE
do_something_64
bail out, or whatever
#endif
do_something_32
This leads to this type of bug. It'd be better if everything
said this:
#if HAVE_A_64_BIT_TYPE
do_something_64
#else
do_something_32
#endif
any_common_code
Plus, shouldn't the tclIntType be completely removed if
64-bit numbers
are enabled? Or, make tclIntType a #define/whatever to
tclInt32Type or
tclInt64Type, so that there's never ever a possibility of
using the
wrong internal representation for an integer.
Logged In: YES
user_id=79902
On the general issues at the end...
This is not quite as straight-forward as it appears, because
Tcl has to support building on both 64-bit and 32-bit
architectures, and we're mostly defined to use the native
machine-word size (as opposed to it being specifically one
size or another, which only really matters in a few spots.)
To support that mode of behaviour, on 64-bit architectures
the wide-int stuff is actually defined to be the same as the
int things; the #ifdefs cut out what would otherwise be
wasted code (and some compilers are almost unbelievably
picky about such stuff.) Couple that with the fact that
much 32-bit and 64-bit code is the same depending on the
platform, and you get back to where you are right now.
Logged In: YES
user_id=79902
Hmm, that's odd. What platform are you using? It works for
me with 8.4.2 on IRIX64...
% tcl_ok
Test ok_1: a EQUALS b
Test ok_2: a EQUALS b
% tcl_error
Test error_1: a EQUALS b
Test error_2: a EQUALS b
% parray tcl_platform
tcl_platform(byteOrder) = bigEndian
tcl_platform(machine) = IP35
tcl_platform(os) = IRIX64
tcl_platform(osVersion) = 6.5
tcl_platform(platform) = unix
tcl_platform(user) = zzcgudf
tcl_platform(wordSize) = 8
% info patch
8.4.2
Logged In: YES
user_id=728162
Here's the info on the machine I'm using:
% parray tcl_platform
tcl_platform(byteOrder) = littleEndian
tcl_platform(machine) = i686
tcl_platform(os) = Linux
tcl_platform(osVersion) = 2.4.7-10custom
tcl_platform(platform) = unix
tcl_platform(user) = swarren
tcl_platform(wordSize) = 4
% info patch
8.4.1
I also downloaded 8.4.2 and had the same issue.
(Perhaps 64-bit (native) platform isn't the issue - it's just
whether tclWideIntType is not the same as tclIntType, or
whether the wide int representation isn't the same as the
compiler's long type)
I replace the code with this and it solved my issue, although it
probably needs cleanup for the product since I wasn't very
concerned about corner cases, architecture differences etc.
etc.
---------------------------------------
switch (*format) {
case 'i':
newPtr[-1] = 'd';
case 'd':
case 'o':
case 'u':
case 'x':
case 'X':
#ifndef TCL_WIDE_INT_IS_LONG
if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &wideValue) != TCL_OK) {
goto fmtError;
}
if (useWide) {
whichValue = WIDE_VALUE;
}
else {
whichValue = INT_VALUE;
intValue = (wideValue & 0xffffffffU);
}
#else /* TCL_WIDE_INT_IS_LONG */
if (Tcl_GetLongFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &intValue) != TCL_OK) {
goto fmtError;
}
whichValue = INT_VALUE;
#endif /* TCL_WIDE_INT_IS_LONG */
... as before
---------------------------------------
Logged In: YES
user_id=79902
Hmm. Note to self: need to make sure that the format
doesn't enforce one (or the other) kind of internal
representation of the numeric value, perhaps using similar
techniques to those used in TclExecuteByteCode?
Logged In: YES
user_id=79902
Your suggestion causes other non-obvious faults as it forces
values to become wides all the time. What we really want to
do is to peek for an existing type and work with that if it
is there. Alas, there's no pretty way to do this; you have
to peek the current type of the values directly to know what
the right course of action is...
Fixed in the 8.4 branch. Will close once fix is in the HEAD too.
Logged In: YES
user_id=728162
Donal,
Some more feedback on this issue now I've investigated
things a little further.
I assume that your most recent response indicates that
you've updated the format command to use
GET_WIDE_OR_INT
internally to convert any strings to integers, so that
it uses the "appropriate" internal representation.
I say this after looking at how TclExecuteByteCode does
the same thing for e.g. INST_ADD.
Then, the format code will look at what type that macro
created, and format based on that, without changing the
internal representation based on the format specification?
If so, will the useWide flag be used for %d/%x etc. any
more? I'm not sure what it would do except change default
precision, which wouldn't make sense, since the default
would just be to print the entire number?
More general comments:
I'm assuming GET_WIDE_OR_INT is intended to be used
throughout the Tcl code where conversions from string to
integer are required, and that the intent of that function
is to store the converted result into the "appropriate"
internal rep. On architectures with just long, it'll use
that. On architectures where long and wide are different,
it'll use the smallest type that can represent the
converted value correctly.
I'm still unclear why having both TclIntType and
TclWideIntType is a good idea. Your previous responses
state that updating format to always convert to wides
would cause problems. Can you expand on this? I would
have thought the best implementation would have been
something like:
#if long_is_native_64_bit_type
typedef long TclIntType;
#else if something_else_provides_64_bits
typedef something_else TclIntType
#else
typedef int TclIntType
#endif
That way, there's only ever one internal representation
on integers, that always provides the maximal "capacity"
that the underlying CPU/toolchain/OS supports.
There appears to be a problem with the current design
where multiple types are used. If I have two variables
setup as shown below:
set a 0x7fffffff
set b 0x80000000
then if I interpret these as integers somewhere, the
internal rep for a becomes TclIntType, whereas the
internal rep for b becomes TclWideIntType.
Now, there's no problem with this from the point of view
of a Tcl script writing provided this is completely
transparent to them. However, there are subtle
side-effects that are visible from Tcl scripts.
For example, see the following script:
--------------------
set a 0x7fffffff
set b [expr $a + 1]
set c 0x80000000
puts "b: [format %d $b] [format %ld $b] [format %x $b]
[format 0x%lx $b]"
puts "c: [format %d $c] [format %ld $c] [format %x $c]
[format 0x%lx $c]"
--------------------
On my system (same as platform info in a previous
response below), this produces the output below (using
a copy of Tcl without the hack I mentioned in a previous
response to this bug report)
--------------------
b: -2147483648 -2147483648 80000000 0xffffffff80000000
c: -2147483648 2147483648 80000000 0x80000000
--------------------
Given that I didn't tell Tcl which storage size to use
(I don't think there's any way to do that right?) and
given I know Tcl is capable of storing 64bit integers
on my platform, I would expect both b and c to contain
the exact same value, or at least appear that way in
any Tcl operation, such as formatting or equality tests.
I see three possible fixes for this integer overflow
problem.:
a) Update INST_ADD (and much other code) to cover all
the corner cases requiring promotion of the internal
type. This would probably be a lot of work -
complicated and error-prone.
b) Update all Tcl internal math to use wides, then
back-convert to int/long if it'll fit on architectures
where they're different. I'm not sure what the point
of having two types would be then.
c) Get rid of the two types and just have one.
On top of this, the Tcl code has a huge amount of
conditional code to support the two different internal
reps of integer values. The code could be *drastically*
simpler with a single maximal-size representation -
fewer bugs and easier maintenance!
From a general philosphical perspective on how/why Tcl
deals with platforms where long isn't 64bit but something
else is, can you comment on all this so I understand why
everything is the way it is, and if there's anything I
can do to write my Tcl code so it behaves more like I
expect it to in the code sample above.
Having just found TIP72, I see there is some supposed
need for a 32-bit type to remain for backwards
compatibility. However, I don't know that anything ever
stated that Tcl integers were categorically 32-bit before
this TIP was implemented. Indeed, the code looks like
integers were 32-bit or 64-bit depending on what the
underlying CPU/toolchain/OS supported as "long".
As such, I contend that any extensions etc. that rely on
int being 32-bit are broken and I would rather see *that*
code be fixed, instead of massively complicate the Tcl
code to continue supporting this incorrect extension
coding.
I don't honestly see how Tcl can be easily 100% fixed
with the current dual-type structure - and even if it is
fixed now, it's just too easy to accidentally introduce
more problems of this exact same type in the future.
Thanks for any comments!
Logged In: YES
user_id=80530
Is this report still valid with
the current HEAD (after
commit of the Patch for
Tcl Bug 713562) ?
Logged In: YES
user_id=728162
Donal,
Some more feedback on this issue now I've investigated
things a little further.
I assume that your most recent response indicates that
you've updated the format command to use
GET_WIDE_OR_INT
internally to convert any strings to integers, so that
it uses the "appropriate" internal representation.
I say this after looking at how TclExecuteByteCode does
the same thing for e.g. INST_ADD.
Then, the format code will look at what type that macro
created, and format based on that, without changing the
internal representation based on the format specification?
If so, will the useWide flag be used for %d/%x etc. any
more? I'm not sure what it would do except change default
precision, which wouldn't make sense, since the default
would just be to print the entire number?
More general comments:
I'm assuming GET_WIDE_OR_INT is intended to be used
throughout the Tcl code where conversions from string to
integer are required, and that the intent of that function
is to store the converted result into the "appropriate"
internal rep. On architectures with just long, it'll use
that. On architectures where long and wide are different,
it'll use the smallest type that can represent the
converted value correctly.
I'm still unclear why having both TclIntType and
TclWideIntType is a good idea. Your previous responses
state that updating format to always convert to wides
would cause problems. Can you expand on this? I would
have thought the best implementation would have been
something like:
#if long_is_native_64_bit_type
typedef long TclIntType;
#else if something_else_provides_64_bits
typedef something_else TclIntType
#else
typedef int TclIntType
#endif
That way, there's only ever one internal representation
on integers, that always provides the maximal "capacity"
that the underlying CPU/toolchain/OS supports.
There appears to be a problem with the current design
where multiple types are used. If I have two variables
setup as shown below:
set a 0x7fffffff
set b 0x80000000
then if I interpret these as integers somewhere, the
internal rep for a becomes TclIntType, whereas the
internal rep for b becomes TclWideIntType.
Now, there's no problem with this from the point of view
of a Tcl script writing provided this is completely
transparent to them. However, there are subtle
side-effects that are visible from Tcl scripts.
For example, see the following script:
--------------------
set a 0x7fffffff
set b [expr $a + 1]
set c 0x80000000
puts "b: [format %d $b] [format %ld $b] [format %x $b]
[format 0x%lx $b]"
puts "c: [format %d $c] [format %ld $c] [format %x $c]
[format 0x%lx $c]"
--------------------
On my system (same as platform info in a previous
response below), this produces the output below (using
a copy of Tcl without the hack I mentioned in a previous
response to this bug report)
--------------------
b: -2147483648 -2147483648 80000000 0xffffffff80000000
c: -2147483648 2147483648 80000000 0x80000000
--------------------
Given that I didn't tell Tcl which storage size to use
(I don't think there's any way to do that right?) and
given I know Tcl is capable of storing 64bit integers
on my platform, I would expect both b and c to contain
the exact same value, or at least appear that way in
any Tcl operation, such as formatting or equality tests.
I see three possible fixes for this integer overflow
problem.:
a) Update INST_ADD (and much other code) to cover all
the corner cases requiring promotion of the internal
type. This would probably be a lot of work -
complicated and error-prone.
b) Update all Tcl internal math to use wides, then
back-convert to int/long if it'll fit on architectures
where they're different. I'm not sure what the point
of having two types would be then.
c) Get rid of the two types and just have one.
On top of this, the Tcl code has a huge amount of
conditional code to support the two different internal
reps of integer values. The code could be *drastically*
simpler with a single maximal-size representation -
fewer bugs and easier maintenance!
From a general philosphical perspective on how/why Tcl
deals with platforms where long isn't 64bit but something
else is, can you comment on all this so I understand why
everything is the way it is, and if there's anything I
can do to write my Tcl code so it behaves more like I
expect it to in the code sample above.
Having just found TIP72, I see there is some supposed
need for a 32-bit type to remain for backwards
compatibility. However, I don't know that anything ever
stated that Tcl integers were categorically 32-bit before
this TIP was implemented. Indeed, the code looks like
integers were 32-bit or 64-bit depending on what the
underlying CPU/toolchain/OS supported as "long".
As such, I contend that any extensions etc. that rely on
int being 32-bit are broken and I would rather see *that*
code be fixed, instead of massively complicate the Tcl
code to continue supporting this incorrect extension
coding.
I don't honestly see how Tcl can be easily 100% fixed
with the current dual-type structure - and even if it is
fixed now, it's just too easy to accidentally introduce
more problems of this exact same type in the future.
Thanks for any comments!
Logged In: YES
user_id=728162
Damn. Stupid back button in the browser?!
Anyway, yes, this still happens on the latest CVS HEAD
assuming that's what I get if I type this:
cvs -z3 -
d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/tcl co tcl
The script given in my initial comment, followed by the one in
my latest comment, now print this (i.e. unchanged)
tcl_patchLevel: 8.5a0
tcl_platform(byteOrder) = littleEndian
tcl_platform(machine) = i686
tcl_platform(os) = Linux
tcl_platform(osVersion) = 2.4.7-10smp
tcl_platform(platform) = unix
tcl_platform(user) = swarren
tcl_platform(wordSize) = 4
Test ok_1: a EQUALS b
Test ok_2: a EQUALS b
Test error_1: a EQUALS b
Test error_2: a NOT EQUAL b
b: -2147483648 -2147483648 80000000 0xffffffff80000000
c: -2147483648 2147483648 80000000 0x80000000
Logged In: YES
user_id=79902
What an unpleasant type demotion bug this has been.
Logged In: YES
user_id=728162
Hmmm. This bug is closed, but I don't see any comments
indicating why. Is it fixed in a new release, or was it
accidentally closed?
Thanks.
Logged In: YES
user_id=79902
It's closed because I fixed it. :^)
The fix went in in time for 8.4.3, so any version of Tcl
from then on will have it.