[Tcl-bugs] [ tcl-Bugs-699060 ] format %08x 32/64bit confusion

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Bugs item #699060, was opened at 2003-03-06 23:09
Message generated for change (Comment added) made by srwarren
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=699060&group_id=10894

Category: 11. Conversions from String
Group: = 8.4.1
Status: Open
Resolution: Fixed
Priority: 5
Submitted By: Stephen Warren (srwarren)
Assigned to: Donal K. Fellows (dkf)
Summary: format %08x 32/64bit confusion

Initial Comment:
I have found a bug in Tcl 8.4.1 for platforms with 64-bit 
integer support
enabled, i.e. where "#ifndef TCL_WIDE_INT_IS_LONG" 
is true.

The problem is that the format internal function treats 
integers
(%08x format) as 32-bit, and updates object's 
internal "cache" value to be
tclIntType instead of tclWideIntType.

The docs for format say:

# The l modifier is ignored for real values and on 64-bit 
platforms,
# which are always converted as if the l modifier were 
present (i.e.
# the types double and long are used for the internal 
representation
# of real and integer values, respectively). If the h 
modifier is
# specified then integer values are truncated to short 
before
# conversion. Both h and l modifiers are ignored on all 
other
# conversions. 

but I can get format to use a 32-bit internal 
representation on a 64-bit
system, which contradicts the docs (the docs match my 
idea of what should
be happening.)

The code to reproduce the issue is as follows:

=========================================
===================
proc tcl_ok {} {
    set a 0xaaaaaaaa

    set b_1 0xaaaa
    set b_2 aaaa
    set b "${b_1}${b_2}"

    if {$a != $b} {
        puts "Test ok_1: a NOT EQUAL b"
    } else {
        puts "Test ok_1: a EQUALS b"
    }

    set x [format %08lx $a]

    if {$a != $b} {
        puts "Test ok_2: a NOT EQUAL b"
    } else {
        puts "Test ok_2: a EQUALS b"
    }
}

proc tcl_error {} {
    set a 0xaaaaaaaa

    set b_1 0xaaaa
    set b_2 aaaa
    set b "${b_1}${b_2}"

    if {$a != $b} {
        puts "Test error_1: a NOT EQUAL b"
    } else {
        puts "Test error_1: a EQUALS b"
    }

    set x [format %08x $a]

    # format has changed $a to int internal representation
    # b has no type, so is interpreted as wide_int
    # a is sign-extended 32->64 to match b's size
    # hence this comparison fails
    if {$a != $b} {
        puts "Test error_2: a NOT EQUAL b"
    } else {
        puts "Test error_2: a EQUALS b"
    }
}

puts $tcl_patchLevel
tcl_ok
tcl_error
=========================================
===================

which prints:

=========================================
===================
8.4.1
Test ok_1: a EQUALS b
Test ok_2: a EQUALS b
Test error_1: a EQUALS b
Test error_2: a NOT EQUAL b
=========================================
===================

The issue is the code in tclCmdAH.c in function
Tcl_FormatObjCmd (at line 2187 in Tcl 8.4.1)

This says:

=========================================
===================
#ifndef TCL_WIDE_INT_IS_LONG
        if (useWide) {
            if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl 
source. */
                objv[objIndex], &wideValue) != TCL_OK) {
            goto fmtError;
            }
            whichValue = WIDE_VALUE;
            size = 40 + precision;
            break;
        }
#endif /* TCL_WIDE_INT_IS_LONG */
        if (Tcl_GetLongFromObj(interp,        /* INTL: Tcl 
source. */
            objv[objIndex], &intValue) != TCL_OK) {
            goto fmtError;
        }
=========================================
===================

But it should say something more like:

=========================================
===================
#ifndef TCL_WIDE_INT_IS_LONG
        if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl 
source. */
            objv[objIndex], &wideValue) != TCL_OK) {
            goto fmtError;
        }
        if (useWide) {
            whichValue = WIDE_VALUE;
            size = 40 + precision;
            break;
        }
        else {
            value_used_for_formatting = wide_value & 0xffffffff
            // setup other formatting params here?
        }
#else /* TCL_WIDE_INT_IS_LONG */
        if (Tcl_GetLongFromObj(interp,        /* INTL: Tcl 
source. */
            objv[objIndex], &intValue) != TCL_OK) {
            goto fmtError;
        }
#endif /* TCL_WIDE_INT_IS_LONG */
=========================================
===================

i.e. on 64-bit architectures, Tcl should *always* interpret 
strings
as 64-bit numbers, but follow the wording of the docs on 
the format
h/l modifiers more precisely by truncating 64->32/16 bits 
depending on
whether h/l are specified, instead of interpreting the 
number as
64/32/(16?) depending on whether h/l are specified.

On a more general note, a lot of the Tcl code says this:

#if HAVE_A_64_BIT_TYPE
    do_something_64
    bail out, or whatever
#endif
    do_something_32

This leads to this type of bug. It'd be better if everything 
said this:

#if HAVE_A_64_BIT_TYPE
    do_something_64
#else
    do_something_32
#endif
    any_common_code

Plus, shouldn't the tclIntType be completely removed if 
64-bit numbers
are enabled? Or, make tclIntType a #define/whatever to 
tclInt32Type or
tclInt64Type, so that there's never ever a possibility of 
using the
wrong internal representation for an integer.

----------------------------------------------------------------------

>Comment By: Stephen Warren (srwarren)
Date: 2003-04-18 18:49

Message:
Logged In: YES 
user_id=728162

Donal,

Some more feedback on this issue now I've investigated
things a little further.

I assume that your most recent response indicates that
you've updated the format command to use 
GET_WIDE_OR_INT
internally to convert any strings to integers, so that
it uses the "appropriate" internal representation.
I say this after looking at how TclExecuteByteCode does
the same thing for e.g. INST_ADD.

Then, the format code will look at what type that macro
created, and format based on that, without changing the
internal representation based on the format specification?
If so, will the useWide flag be used for %d/%x etc. any
more? I'm not sure what it would do except change default
precision, which wouldn't make sense, since the default
would just be to print the entire number?

More general comments:

I'm assuming GET_WIDE_OR_INT is intended to be used
throughout the Tcl code where conversions from string to
integer are required, and that the intent of that function
is to store the converted result into the "appropriate"
internal rep. On architectures with just long, it'll use
that. On architectures where long and wide are different,
it'll use the smallest type that can represent the
converted value correctly.

I'm still unclear why having both TclIntType and
TclWideIntType is a good idea. Your previous responses
state that updating format to always convert to wides
would cause problems. Can you expand on this? I would
have thought the best implementation would have been
something like:

#if long_is_native_64_bit_type
typedef long TclIntType;
#else if something_else_provides_64_bits
typedef something_else TclIntType
#else
typedef int TclIntType
#endif

That way, there's only ever one internal representation
on integers, that always provides the maximal "capacity"
that the underlying CPU/toolchain/OS supports.

There appears to be a problem with the current design
where multiple types are used. If I have two variables
setup as shown below:

set a 0x7fffffff
set b 0x80000000

then if I interpret these as integers somewhere, the
internal rep for a becomes TclIntType, whereas the
internal rep for b becomes TclWideIntType.

Now, there's no problem with this from the point of view
of a Tcl script writing provided this is completely
transparent to them. However, there are subtle
side-effects that are visible from Tcl scripts.

For example, see the following script:

--------------------
set a 0x7fffffff
set b [expr $a + 1]

set c 0x80000000

puts "b: [format %d $b] [format %ld $b] [format %x $b] 
[format 0x%lx $b]"
puts "c: [format %d $c] [format %ld $c] [format %x $c] 
[format 0x%lx $c]"
--------------------

On my system (same as platform info in a previous
response below), this produces the output below (using
a copy of Tcl without the hack I mentioned in a previous
response to this bug report)

--------------------
b: -2147483648 -2147483648 80000000 0xffffffff80000000
c: -2147483648 2147483648 80000000 0x80000000
--------------------

Given that I didn't tell Tcl which storage size to use
(I don't think there's any way to do that right?) and
given I know Tcl is capable of storing 64bit integers
on my platform, I would expect both b and c to contain
the exact same value, or at least appear that way in
any Tcl operation, such as formatting or equality tests.

I see three possible fixes for this integer overflow
problem.:

a) Update INST_ADD (and much other code) to cover all
   the corner cases requiring promotion of the internal
   type. This would probably be a lot of work -
   complicated and error-prone.

b) Update all Tcl internal math to use wides, then
   back-convert to int/long if it'll fit on architectures
   where they're different. I'm not sure what the point
   of having two types would be then.

c) Get rid of the two types and just have one.

On top of this, the Tcl code has a huge amount of
conditional code to support the two different internal
reps of integer values. The code could be *drastically*
simpler with a single maximal-size representation -
fewer bugs and easier maintenance!

From a general philosphical perspective on how/why Tcl
deals with platforms where long isn't 64bit but something
else is, can you comment on all this so I understand why
everything is the way it is, and if there's anything I
can do to write my Tcl code so it behaves more like I
expect it to in the code sample above.

Having just found TIP72, I see there is some supposed
need for a 32-bit type to remain for backwards
compatibility. However, I don't know that anything ever
stated that Tcl integers were categorically 32-bit before
this TIP was implemented. Indeed, the code looks like
integers were 32-bit or 64-bit depending on what the
underlying CPU/toolchain/OS supported as "long".

As such, I contend that any extensions etc. that rely on
int being 32-bit are broken and I would rather see *that*
code be fixed, instead of massively complicate the Tcl
code to continue supporting this incorrect extension
coding.

I don't honestly see how Tcl can be easily 100% fixed
with the current dual-type structure - and even if it is
fixed now, it's just too easy to accidentally introduce
more problems of this exact same type in the future.

Thanks for any comments!

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-03-14 23:32

Message:
Logged In: YES 
user_id=79902

Your suggestion causes other non-obvious faults as it forces
values to become wides all the time.  What we really want to
do is to peek for an existing type and work with that if it
is there.  Alas, there's no pretty way to do this; you have
to peek the current type of the values directly to know what
the right course of action is...

Fixed in the 8.4 branch. Will close once fix is in the HEAD too.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-03-09 22:10

Message:
Logged In: YES 
user_id=79902

Hmm.  Note to self: need to make sure that the format
doesn't enforce one (or the other) kind of internal
representation of the numeric value, perhaps using similar
techniques to those used in TclExecuteByteCode?

----------------------------------------------------------------------

Comment By: Stephen Warren (srwarren)
Date: 2003-03-08 03:59

Message:
Logged In: YES 
user_id=728162

Here's the info on the machine I'm using:

% parray tcl_platform
tcl_platform(byteOrder) = littleEndian
tcl_platform(machine)   = i686
tcl_platform(os)        = Linux
tcl_platform(osVersion) = 2.4.7-10custom
tcl_platform(platform)  = unix
tcl_platform(user)      = swarren
tcl_platform(wordSize)  = 4

% info patch
8.4.1

I also downloaded 8.4.2 and had the same issue.

(Perhaps 64-bit (native) platform isn't the issue - it's just 
whether tclWideIntType is not the same as tclIntType, or 
whether the wide int representation isn't the same as the 
compiler's long type)

I replace the code with this and it solved my issue, although it 
probably needs cleanup for the product since I wasn't very 
concerned about corner cases, architecture differences etc. 
etc.

---------------------------------------
        switch (*format) {
            case 'i':
                newPtr[-1] = 'd';
            case 'd':
            case 'o':
            case 'u':
            case 'x':
            case 'X':
#ifndef TCL_WIDE_INT_IS_LONG
                if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl 
source. */
                        objv[objIndex], &wideValue) != TCL_OK) {
                    goto fmtError;
                }
                if (useWide) {
                    whichValue = WIDE_VALUE;
                }
                else {
                    whichValue = INT_VALUE;
                    intValue = (wideValue & 0xffffffffU);
                }
#else /* TCL_WIDE_INT_IS_LONG */
                if (Tcl_GetLongFromObj(interp,        /* INTL: Tcl 
source. */
                        objv[objIndex], &intValue) != TCL_OK) {
                    goto fmtError;
                }
                whichValue = INT_VALUE;
#endif /* TCL_WIDE_INT_IS_LONG */
... as before
---------------------------------------

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-03-07 11:25

Message:
Logged In: YES 
user_id=79902

Hmm, that's odd.  What platform are you using?  It works for
me with 8.4.2 on IRIX64...

% tcl_ok
Test ok_1: a EQUALS b
Test ok_2: a EQUALS b
% tcl_error
Test error_1: a EQUALS b
Test error_2: a EQUALS b
% parray tcl_platform
tcl_platform(byteOrder) = bigEndian
tcl_platform(machine)   = IP35
tcl_platform(os)        = IRIX64
tcl_platform(osVersion) = 6.5
tcl_platform(platform)  = unix
tcl_platform(user)      = zzcgudf
tcl_platform(wordSize)  = 8
% info patch
8.4.2

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-03-07 11:16

Message:
Logged In: YES 
user_id=79902

On the general issues at the end...

This is not quite as straight-forward as it appears, because
Tcl has to support building on both 64-bit and 32-bit
architectures, and we're mostly defined to use the native
machine-word size (as opposed to it being specifically one
size or another, which only really matters in a few spots.) 
To support that mode of behaviour, on 64-bit architectures
the wide-int stuff is actually defined to be the same as the
int things; the #ifdefs cut out what would otherwise be
wasted code (and some compilers are almost unbelievably
picky about such stuff.)  Couple that with the fact that
much 32-bit and 64-bit code is the same depending on the
platform, and you get back to where you are right now.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=699060&group_id=10894

[Tcl-bugs] [ tcl-Bugs-699060 ] format %08x 32/64bit confusion

The Tool Command Language implementation

[Tcl-bugs] [ tcl-Bugs-699060 ] format %08x 32/64bit confusion