Menu

#2225 format %08x 32/64bit confusion

obsolete: 8.4.1
closed-fixed
5
2014-08-17
2003-03-06
No

I have found a bug in Tcl 8.4.1 for platforms with 64-bit
integer support
enabled, i.e. where "#ifndef TCL_WIDE_INT_IS_LONG"
is true.

The problem is that the format internal function treats
integers
(%08x format) as 32-bit, and updates object's
internal "cache" value to be
tclIntType instead of tclWideIntType.

The docs for format say:

# The l modifier is ignored for real values and on 64-bit
platforms,
# which are always converted as if the l modifier were
present (i.e.
# the types double and long are used for the internal
representation
# of real and integer values, respectively). If the h
modifier is
# specified then integer values are truncated to short
before
# conversion. Both h and l modifiers are ignored on all
other
# conversions.

but I can get format to use a 32-bit internal
representation on a 64-bit
system, which contradicts the docs (the docs match my
idea of what should
be happening.)

The code to reproduce the issue is as follows:

=========================================

proc tcl_ok {} {
set a 0xaaaaaaaa

set b_1 0xaaaa
set b_2 aaaa
set b "${b_1}${b_2}"

if {$a != $b} {
puts "Test ok_1: a NOT EQUAL b"
} else {
puts "Test ok_1: a EQUALS b"
}

set x [format %08lx $a]

if {$a != $b} {
puts "Test ok_2: a NOT EQUAL b"
} else {
puts "Test ok_2: a EQUALS b"
}
}

proc tcl_error {} {
set a 0xaaaaaaaa

set b_1 0xaaaa
set b_2 aaaa
set b "${b_1}${b_2}"

if {$a != $b} {
puts "Test error_1: a NOT EQUAL b"
} else {
puts "Test error_1: a EQUALS b"
}

set x [format %08x $a]

# format has changed $a to int internal representation
# b has no type, so is interpreted as wide_int
# a is sign-extended 32->64 to match b's size
# hence this comparison fails
if {$a != $b} {
puts "Test error_2: a NOT EQUAL b"
} else {
puts "Test error_2: a EQUALS b"
}
}

puts $tcl_patchLevel
tcl_ok
tcl_error
=========================================
===================

which prints:

=========================================

8.4.1
Test ok_1: a EQUALS b
Test ok_2: a EQUALS b
Test error_1: a EQUALS b
Test error_2: a NOT EQUAL b
=========================================
===================

The issue is the code in tclCmdAH.c in function
Tcl_FormatObjCmd (at line 2187 in Tcl 8.4.1)

This says:

=========================================

#ifndef TCL_WIDE_INT_IS_LONG
if (useWide) {
if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &wideValue) != TCL_OK) {
goto fmtError;
}
whichValue = WIDE_VALUE;
size = 40 + precision;
break;
}
#endif /* TCL_WIDE_INT_IS_LONG */
if (Tcl_GetLongFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &intValue) != TCL_OK) {
goto fmtError;
}
=========================================
===================

But it should say something more like:

=========================================

#ifndef TCL_WIDE_INT_IS_LONG
if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &wideValue) != TCL_OK) {
goto fmtError;
}
if (useWide) {
whichValue = WIDE_VALUE;
size = 40 + precision;
break;
}
else {
value_used_for_formatting = wide_value & 0xffffffff
// setup other formatting params here?
}
#else /* TCL_WIDE_INT_IS_LONG */
if (Tcl_GetLongFromObj(interp, /* INTL: Tcl
source. */
objv[objIndex], &intValue) != TCL_OK) {
goto fmtError;
}
#endif /* TCL_WIDE_INT_IS_LONG */
=========================================
===================

i.e. on 64-bit architectures, Tcl should *always* interpret
strings
as 64-bit numbers, but follow the wording of the docs on
the format
h/l modifiers more precisely by truncating 64->32/16 bits
depending on
whether h/l are specified, instead of interpreting the
number as
64/32/(16?) depending on whether h/l are specified.

On a more general note, a lot of the Tcl code says this:

#if HAVE_A_64_BIT_TYPE
do_something_64
bail out, or whatever
#endif
do_something_32

This leads to this type of bug. It'd be better if everything
said this:

#if HAVE_A_64_BIT_TYPE
do_something_64
#else
do_something_32
#endif
any_common_code

Plus, shouldn't the tclIntType be completely removed if
64-bit numbers
are enabled? Or, make tclIntType a #define/whatever to
tclInt32Type or
tclInt64Type, so that there's never ever a possibility of
using the
wrong internal representation for an integer.

Discussion

  • Jeffrey Hobbs

    Jeffrey Hobbs - 2003-03-06
    • assigned_to: hobbs --> dkf
     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    On the general issues at the end...

    This is not quite as straight-forward as it appears, because
    Tcl has to support building on both 64-bit and 32-bit
    architectures, and we're mostly defined to use the native
    machine-word size (as opposed to it being specifically one
    size or another, which only really matters in a few spots.)
    To support that mode of behaviour, on 64-bit architectures
    the wide-int stuff is actually defined to be the same as the
    int things; the #ifdefs cut out what would otherwise be
    wasted code (and some compilers are almost unbelievably
    picky about such stuff.) Couple that with the fact that
    much 32-bit and 64-bit code is the same depending on the
    platform, and you get back to where you are right now.

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Hmm, that's odd. What platform are you using? It works for
    me with 8.4.2 on IRIX64...

    % tcl_ok
    Test ok_1: a EQUALS b
    Test ok_2: a EQUALS b
    % tcl_error
    Test error_1: a EQUALS b
    Test error_2: a EQUALS b
    % parray tcl_platform
    tcl_platform(byteOrder) = bigEndian
    tcl_platform(machine) = IP35
    tcl_platform(os) = IRIX64
    tcl_platform(osVersion) = 6.5
    tcl_platform(platform) = unix
    tcl_platform(user) = zzcgudf
    tcl_platform(wordSize) = 8
    % info patch
    8.4.2

     
  • Donal K. Fellows

    • status: open --> open-works-for-me
     
  • Stephen Warren

    Stephen Warren - 2003-03-08

    Logged In: YES
    user_id=728162

    Here's the info on the machine I'm using:

    % parray tcl_platform
    tcl_platform(byteOrder) = littleEndian
    tcl_platform(machine) = i686
    tcl_platform(os) = Linux
    tcl_platform(osVersion) = 2.4.7-10custom
    tcl_platform(platform) = unix
    tcl_platform(user) = swarren
    tcl_platform(wordSize) = 4

    % info patch
    8.4.1

    I also downloaded 8.4.2 and had the same issue.

    (Perhaps 64-bit (native) platform isn't the issue - it's just
    whether tclWideIntType is not the same as tclIntType, or
    whether the wide int representation isn't the same as the
    compiler's long type)

    I replace the code with this and it solved my issue, although it
    probably needs cleanup for the product since I wasn't very
    concerned about corner cases, architecture differences etc.
    etc.

    ---------------------------------------
    switch (*format) {
    case 'i':
    newPtr[-1] = 'd';
    case 'd':
    case 'o':
    case 'u':
    case 'x':
    case 'X':
    #ifndef TCL_WIDE_INT_IS_LONG
    if (Tcl_GetWideIntFromObj(interp, /* INTL: Tcl
    source. */
    objv[objIndex], &wideValue) != TCL_OK) {
    goto fmtError;
    }
    if (useWide) {
    whichValue = WIDE_VALUE;
    }
    else {
    whichValue = INT_VALUE;
    intValue = (wideValue & 0xffffffffU);
    }
    #else /* TCL_WIDE_INT_IS_LONG */
    if (Tcl_GetLongFromObj(interp, /* INTL: Tcl
    source. */
    objv[objIndex], &intValue) != TCL_OK) {
    goto fmtError;
    }
    whichValue = INT_VALUE;
    #endif /* TCL_WIDE_INT_IS_LONG */
    ... as before
    ---------------------------------------

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Hmm. Note to self: need to make sure that the format
    doesn't enforce one (or the other) kind of internal
    representation of the numeric value, perhaps using similar
    techniques to those used in TclExecuteByteCode?

     
  • Donal K. Fellows

    • status: open-works-for-me --> open-fixed
     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    Your suggestion causes other non-obvious faults as it forces
    values to become wides all the time. What we really want to
    do is to peek for an existing type and work with that if it
    is there. Alas, there's no pretty way to do this; you have
    to peek the current type of the values directly to know what
    the right course of action is...

    Fixed in the 8.4 branch. Will close once fix is in the HEAD too.

     
  • Stephen Warren

    Stephen Warren - 2003-04-18

    Logged In: YES
    user_id=728162

    Donal,

    Some more feedback on this issue now I've investigated
    things a little further.

    I assume that your most recent response indicates that
    you've updated the format command to use
    GET_WIDE_OR_INT
    internally to convert any strings to integers, so that
    it uses the "appropriate" internal representation.
    I say this after looking at how TclExecuteByteCode does
    the same thing for e.g. INST_ADD.

    Then, the format code will look at what type that macro
    created, and format based on that, without changing the
    internal representation based on the format specification?
    If so, will the useWide flag be used for %d/%x etc. any
    more? I'm not sure what it would do except change default
    precision, which wouldn't make sense, since the default
    would just be to print the entire number?

    More general comments:

    I'm assuming GET_WIDE_OR_INT is intended to be used
    throughout the Tcl code where conversions from string to
    integer are required, and that the intent of that function
    is to store the converted result into the "appropriate"
    internal rep. On architectures with just long, it'll use
    that. On architectures where long and wide are different,
    it'll use the smallest type that can represent the
    converted value correctly.

    I'm still unclear why having both TclIntType and
    TclWideIntType is a good idea. Your previous responses
    state that updating format to always convert to wides
    would cause problems. Can you expand on this? I would
    have thought the best implementation would have been
    something like:

    #if long_is_native_64_bit_type
    typedef long TclIntType;
    #else if something_else_provides_64_bits
    typedef something_else TclIntType
    #else
    typedef int TclIntType
    #endif

    That way, there's only ever one internal representation
    on integers, that always provides the maximal "capacity"
    that the underlying CPU/toolchain/OS supports.

    There appears to be a problem with the current design
    where multiple types are used. If I have two variables
    setup as shown below:

    set a 0x7fffffff
    set b 0x80000000

    then if I interpret these as integers somewhere, the
    internal rep for a becomes TclIntType, whereas the
    internal rep for b becomes TclWideIntType.

    Now, there's no problem with this from the point of view
    of a Tcl script writing provided this is completely
    transparent to them. However, there are subtle
    side-effects that are visible from Tcl scripts.

    For example, see the following script:

    --------------------
    set a 0x7fffffff
    set b [expr $a + 1]

    set c 0x80000000

    puts "b: [format %d $b] [format %ld $b] [format %x $b]
    [format 0x%lx $b]"
    puts "c: [format %d $c] [format %ld $c] [format %x $c]
    [format 0x%lx $c]"
    --------------------

    On my system (same as platform info in a previous
    response below), this produces the output below (using
    a copy of Tcl without the hack I mentioned in a previous
    response to this bug report)

    --------------------
    b: -2147483648 -2147483648 80000000 0xffffffff80000000
    c: -2147483648 2147483648 80000000 0x80000000
    --------------------

    Given that I didn't tell Tcl which storage size to use
    (I don't think there's any way to do that right?) and
    given I know Tcl is capable of storing 64bit integers
    on my platform, I would expect both b and c to contain
    the exact same value, or at least appear that way in
    any Tcl operation, such as formatting or equality tests.

    I see three possible fixes for this integer overflow
    problem.:

    a) Update INST_ADD (and much other code) to cover all
    the corner cases requiring promotion of the internal
    type. This would probably be a lot of work -
    complicated and error-prone.

    b) Update all Tcl internal math to use wides, then
    back-convert to int/long if it'll fit on architectures
    where they're different. I'm not sure what the point
    of having two types would be then.

    c) Get rid of the two types and just have one.

    On top of this, the Tcl code has a huge amount of
    conditional code to support the two different internal
    reps of integer values. The code could be *drastically*
    simpler with a single maximal-size representation -
    fewer bugs and easier maintenance!

    From a general philosphical perspective on how/why Tcl
    deals with platforms where long isn't 64bit but something
    else is, can you comment on all this so I understand why
    everything is the way it is, and if there's anything I
    can do to write my Tcl code so it behaves more like I
    expect it to in the code sample above.

    Having just found TIP72, I see there is some supposed
    need for a 32-bit type to remain for backwards
    compatibility. However, I don't know that anything ever
    stated that Tcl integers were categorically 32-bit before
    this TIP was implemented. Indeed, the code looks like
    integers were 32-bit or 64-bit depending on what the
    underlying CPU/toolchain/OS supported as "long".

    As such, I contend that any extensions etc. that rely on
    int being 32-bit are broken and I would rather see *that*
    code be fixed, instead of massively complicate the Tcl
    code to continue supporting this incorrect extension
    coding.

    I don't honestly see how Tcl can be easily 100% fixed
    with the current dual-type structure - and even if it is
    fixed now, it's just too easy to accidentally introduce
    more problems of this exact same type in the future.

    Thanks for any comments!

     
  • Don Porter

    Don Porter - 2003-04-18

    Logged In: YES
    user_id=80530

    Is this report still valid with
    the current HEAD (after
    commit of the Patch for
    Tcl Bug 713562) ?

     
  • Stephen Warren

    Stephen Warren - 2003-04-18

    Logged In: YES
    user_id=728162

    Donal,

    Some more feedback on this issue now I've investigated
    things a little further.

    I assume that your most recent response indicates that
    you've updated the format command to use
    GET_WIDE_OR_INT
    internally to convert any strings to integers, so that
    it uses the "appropriate" internal representation.
    I say this after looking at how TclExecuteByteCode does
    the same thing for e.g. INST_ADD.

    Then, the format code will look at what type that macro
    created, and format based on that, without changing the
    internal representation based on the format specification?
    If so, will the useWide flag be used for %d/%x etc. any
    more? I'm not sure what it would do except change default
    precision, which wouldn't make sense, since the default
    would just be to print the entire number?

    More general comments:

    I'm assuming GET_WIDE_OR_INT is intended to be used
    throughout the Tcl code where conversions from string to
    integer are required, and that the intent of that function
    is to store the converted result into the "appropriate"
    internal rep. On architectures with just long, it'll use
    that. On architectures where long and wide are different,
    it'll use the smallest type that can represent the
    converted value correctly.

    I'm still unclear why having both TclIntType and
    TclWideIntType is a good idea. Your previous responses
    state that updating format to always convert to wides
    would cause problems. Can you expand on this? I would
    have thought the best implementation would have been
    something like:

    #if long_is_native_64_bit_type
    typedef long TclIntType;
    #else if something_else_provides_64_bits
    typedef something_else TclIntType
    #else
    typedef int TclIntType
    #endif

    That way, there's only ever one internal representation
    on integers, that always provides the maximal "capacity"
    that the underlying CPU/toolchain/OS supports.

    There appears to be a problem with the current design
    where multiple types are used. If I have two variables
    setup as shown below:

    set a 0x7fffffff
    set b 0x80000000

    then if I interpret these as integers somewhere, the
    internal rep for a becomes TclIntType, whereas the
    internal rep for b becomes TclWideIntType.

    Now, there's no problem with this from the point of view
    of a Tcl script writing provided this is completely
    transparent to them. However, there are subtle
    side-effects that are visible from Tcl scripts.

    For example, see the following script:

    --------------------
    set a 0x7fffffff
    set b [expr $a + 1]

    set c 0x80000000

    puts "b: [format %d $b] [format %ld $b] [format %x $b]
    [format 0x%lx $b]"
    puts "c: [format %d $c] [format %ld $c] [format %x $c]
    [format 0x%lx $c]"
    --------------------

    On my system (same as platform info in a previous
    response below), this produces the output below (using
    a copy of Tcl without the hack I mentioned in a previous
    response to this bug report)

    --------------------
    b: -2147483648 -2147483648 80000000 0xffffffff80000000
    c: -2147483648 2147483648 80000000 0x80000000
    --------------------

    Given that I didn't tell Tcl which storage size to use
    (I don't think there's any way to do that right?) and
    given I know Tcl is capable of storing 64bit integers
    on my platform, I would expect both b and c to contain
    the exact same value, or at least appear that way in
    any Tcl operation, such as formatting or equality tests.

    I see three possible fixes for this integer overflow
    problem.:

    a) Update INST_ADD (and much other code) to cover all
    the corner cases requiring promotion of the internal
    type. This would probably be a lot of work -
    complicated and error-prone.

    b) Update all Tcl internal math to use wides, then
    back-convert to int/long if it'll fit on architectures
    where they're different. I'm not sure what the point
    of having two types would be then.

    c) Get rid of the two types and just have one.

    On top of this, the Tcl code has a huge amount of
    conditional code to support the two different internal
    reps of integer values. The code could be *drastically*
    simpler with a single maximal-size representation -
    fewer bugs and easier maintenance!

    From a general philosphical perspective on how/why Tcl
    deals with platforms where long isn't 64bit but something
    else is, can you comment on all this so I understand why
    everything is the way it is, and if there's anything I
    can do to write my Tcl code so it behaves more like I
    expect it to in the code sample above.

    Having just found TIP72, I see there is some supposed
    need for a 32-bit type to remain for backwards
    compatibility. However, I don't know that anything ever
    stated that Tcl integers were categorically 32-bit before
    this TIP was implemented. Indeed, the code looks like
    integers were 32-bit or 64-bit depending on what the
    underlying CPU/toolchain/OS supported as "long".

    As such, I contend that any extensions etc. that rely on
    int being 32-bit are broken and I would rather see *that*
    code be fixed, instead of massively complicate the Tcl
    code to continue supporting this incorrect extension
    coding.

    I don't honestly see how Tcl can be easily 100% fixed
    with the current dual-type structure - and even if it is
    fixed now, it's just too easy to accidentally introduce
    more problems of this exact same type in the future.

    Thanks for any comments!

     
  • Stephen Warren

    Stephen Warren - 2003-04-18

    Logged In: YES
    user_id=728162

    Damn. Stupid back button in the browser?!

    Anyway, yes, this still happens on the latest CVS HEAD
    assuming that's what I get if I type this:

    cvs -z3 -
    d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/tcl co tcl

    The script given in my initial comment, followed by the one in
    my latest comment, now print this (i.e. unchanged)

    tcl_patchLevel: 8.5a0
    tcl_platform(byteOrder) = littleEndian
    tcl_platform(machine) = i686
    tcl_platform(os) = Linux
    tcl_platform(osVersion) = 2.4.7-10smp
    tcl_platform(platform) = unix
    tcl_platform(user) = swarren
    tcl_platform(wordSize) = 4
    Test ok_1: a EQUALS b
    Test ok_2: a EQUALS b
    Test error_1: a EQUALS b
    Test error_2: a NOT EQUAL b
    b: -2147483648 -2147483648 80000000 0xffffffff80000000
    c: -2147483648 2147483648 80000000 0x80000000

     
  • Donal K. Fellows

    • status: open-fixed --> closed-fixed
     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    What an unpleasant type demotion bug this has been.

     
  • Stephen Warren

    Stephen Warren - 2003-09-19

    Logged In: YES
    user_id=728162

    Hmmm. This bug is closed, but I don't see any comments
    indicating why. Is it fixed in a new release, or was it
    accidentally closed?

    Thanks.

     
  • Donal K. Fellows

    Logged In: YES
    user_id=79902

    It's closed because I fixed it. :^)
    The fix went in in time for 8.4.3, so any version of Tcl
    from then on will have it.