From: Lars H. <Lar...@re...> - 2012-09-28 08:34:30
MartinLemburg@Siemens-PLM wrote 2012-09-27 11.10:
> Hi,
>
> I needed to "format" a binary string with a series of "shorts" or WORDs.
>
> The values I provided to "binary format s" were out of range, but "binary format" accepted them.
>
> I am surprised that out-of-range integers are accepted and implicitly cast to the narrower integer type being formatted:
>
> % set v [clock seconds]
> 1348734108
> % format {%#x %#x} {*}[scan [set bin [binary format s $v]] {%c%c}]
> 0x9c 0xc
> % set s [expr {$v& 0xFFFF}]
> 3228
> % expr {$s == [binary scan $bin s s2; set s2]}
> 1
>
> This behavior is not documented, is it wanted or just a bug?

I'd bet it's being relied upon in places, at the very least to ignore the (un)signedness of numbers.

> And why the following:
>
> % binary format s [clock milliseconds]
> integer value too large to represent
>
> Is the acceptable integer range still limited to non-wide integers?

From reading the source (function FormatNumber in tclBinary.c), the numeric value is indeed being parsed by TclGetLongFromObj for 32-bit integers and less.

> Why not here casting implicitly?

Because this part of the Tcl runtime library (get number from Tcl_Obj) has only been partially adapted to the reality of (nearly) unlimited precision arithmetic, and therefore tends to get edge cases wrong. Realising this has however been a slow process, since many big names here still seem to believe that integer=long (and therefore had to come up with "entier" for the normal meaning of integer).

Concretely, it certainly makes sense that Tcl_GetIntFromObj, Tcl_GetLongFromObj, and Tcl_GetWideIntFromObj return something that fits in the named C types (no matter how compiler-dependent these technically are allowed to be) or throw a Tcl error if they can't, because chances are that a C function calling one of these functions will shortly do something with the value that requires it to have this specific type, and catching an error at that stage would be much harder.
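
To make the two behaviours in the quoted session concrete, here is a minimal C sketch. It does not use the Tcl C API: low16 and narrow_checked_i32 are hypothetical helpers, and the 32-bit target type is an assumption standing in for whatever "long" is on the platform in question. The first helper keeps only the low 16 bits, which matches what "binary format s" produced above; the second refuses values that do not fit, in the spirit of a range-checked getter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Tcl: keep only the low 16 bits of v.
 * This mirrors the implicit cast observed with "binary format s". */
static uint16_t low16(long long v)
{
    return (uint16_t)((unsigned long long)v & 0xFFFFu);
}

/* Hypothetical helper, not part of Tcl: range-checked narrowing.
 * Assumes a 32-bit target type. Returns 0 and stores the value in *out
 * when v fits, or -1 (leaving *out untouched) when it does not. */
static int narrow_checked_i32(long long v, int32_t *out)
{
    assert(out != NULL);
    if (v < INT32_MIN || v > INT32_MAX) {
        return -1;              /* "integer value too large to represent" */
    }
    *out = (int32_t)v;
    return 0;
}
```

With v = 1348734108 from the session, low16(v) is 3228 (bytes 0x9c and 0x0c), while a millisecond timestamp such as 1348734108000 fails the 32-bit range check before any truncation can happen.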
The problem is that the logic for can't/can (and if so how) that is applied doesn't match the needs of all that many use-cases (if any at all). Changing it for the existing functions is probably not much of an option, so what one would need is a new family of Tcl_Get* functions that lets the caller specify how the edge cases should be handled (e.g. using a flag argument).

"Ignore bits that are too high" (a.k.a. casting) is certainly appropriate in some cases. In others (particularly list and string indexing), one might rather want to clamp the result to the representable range, to avoid silly irregularities such as lindex x 1234567890 returning an empty string but lindex x 12345678901 throwing a syntax-looking error.

There is also the issue of signed versus unsigned that ought to be addressed; currently Tcl only has facilities for getting signed numbers out of a Tcl_Obj.

Lars Hellström
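
As an illustration of the flag-argument idea, the following C sketch shows one way a caller-selected edge-case policy might look. Everything here is hypothetical: RangePolicy and get_int32 are invented names, nothing like them exists in the Tcl C API, and the sketch takes a plain long long instead of a Tcl_Obj so that it stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy flags; not part of the Tcl C API. */
typedef enum {
    RANGE_ERROR,     /* refuse values that do not fit */
    RANGE_TRUNCATE,  /* ignore bits that are too high (casting) */
    RANGE_CLAMP      /* saturate at the nearest representable value */
} RangePolicy;

/* Hypothetical getter: narrow v to int32_t according to policy.
 * Returns 0 on success, -1 if the policy refuses the value. */
static int get_int32(long long v, RangePolicy policy, int32_t *out)
{
    assert(out != NULL);
    if (v >= INT32_MIN && v <= INT32_MAX) {
        *out = (int32_t)v;      /* fits: every policy agrees */
        return 0;
    }
    switch (policy) {
    case RANGE_TRUNCATE: {
        /* Keep the low 32 bits, then reinterpret them as two's
         * complement without relying on an out-of-range cast. */
        uint32_t low = (uint32_t)((unsigned long long)v & 0xFFFFFFFFu);
        if (low <= (uint32_t)INT32_MAX) {
            *out = (int32_t)low;
        } else {
            *out = (int32_t)(low - 0x80000000u) + INT32_MIN;
        }
        return 0;
    }
    case RANGE_CLAMP:
        *out = (v < 0) ? INT32_MIN : INT32_MAX;
        return 0;
    case RANGE_ERROR:
    default:
        return -1;
    }
}
```

Under this sketch an index-like caller would pass RANGE_CLAMP, so 1234567890 and 12345678901 both yield a usable (if out-of-bounds) index, while a "binary format"-like caller would pass RANGE_TRUNCATE. An unsigned variant would need its own target type and range check; that is left out here.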