[Tcl-bugs] [ tcl-Bugs-735364 ] Imprecise description of binary scan char 'a'

Bugs item #735364, was opened at 2003-05-09 21:05
Message generated for change (Comment added) made by setok
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=735364&group_id=10894

Category: 12. ByteArray Object
Group: = 8.4.2
Status: Open
Resolution: None
Priority: 5
Submitted By: Kristoffer Lawson (setok)
Assigned to: Donal K. Fellows (dkf)
Summary: Imprecise description of binary scan char 'a'

Initial Comment:
The manpage states of the scan character 'a', "The data
is a character string of length _count_". However it
does not specify what this length is referring to. Does
it mean number of characters or number of bytes? As it
talks of a character string, this would lead one to
believe that it means number of characters, yet the
implementation apparently is for number of bytes.

If it actually means bytes, this should be clearly
mentioned and instead of 'character string' the term
'byte array', or something similar, might be more
appropriate.

----------------------------------------------------------------------

>Comment By: Kristoffer Lawson (setok)
Date: 2003-07-06 04:15

Message:
Logged In: YES 
user_id=137542

I'm getting confused with the discussion here. Isn't it just
easiest to document the 'a' specifier as taking a count of
bytes? Assuming the string just contains a byte array. Why
does one need to bother about encoding? Take whatever is
there directly as a byte array. That's at least exactly the
behaviour I would want ...

IMO strings are always byte arrays! Just that one character
might use several bytes.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-07-04 15:06

Message:
Logged In: YES 
user_id=79902

Strictly, the behaviour of [binary scan] (or any other code 
that converts strings to ByteArrayObjs) is only fully defined 
when the input string only contains characters in the range 
\u0000-\u00FF.  Strings are not byte arrays, but byte arrays 
can be encoded in strings.

We do not define what encoding is used with the 'a' [binary 
scan] specifier; perhaps we should. (I think we use ISO8859-1, 
though [encoding system] would also be reasonable.)
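
The well-defined case described above can be sketched as follows. This is only an illustration under the assumption that every character of the sample string lies in \u0000-\u00FF; the variable names are made up for the example, and the ISO8859-1 comparison reflects the guess in this comment, not a documented guarantee.

```tcl
# Assumption: every character of the sample string is in \u0000-\u00FF.
set s "caf\u00e9"

# With 'a', the count (here '*') covers the bytes of the byte-array form of $s.
binary scan $s a* out

# 'c' yields signed 8-bit values; mask them to get the unsigned byte values.
binary scan $s c* codes
set unsigned {}
foreach c $codes {
    lappend unsigned [expr {$c & 0xff}]
}

# For comparison: the same string explicitly encoded as ISO8859-1.
set latin1 [encoding convertto iso8859-1 $s]

puts [list [string equal $out $s] $unsigned [string equal $latin1 $out]]
```

For a string restricted to this range, the scanned data compares equal to the input and to its ISO8859-1 encoding, so characters and bytes coincide one-to-one.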

----------------------------------------------------------------------

Comment By: Pat Thoyts (patthoyts)
Date: 2003-05-13 12:32

Message:
Logged In: YES 
user_id=202636

Let's illustrate this:
set s "\u266b\u266a"   ;# two Unicode characters
string length $s       -> 2
string bytelen $s      -> 6 (three UTF-8 bytes per character)
binary scan $s c* r    -> 1
set r                  -> 107 106  - so just the low byte of each character
binary scan $s a* r    -> 1
set r                  -> kj  - ASCII representation of the low byte of each char.

Maybe I'm missing something to do with encodings?
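
One way to avoid the truncation shown in this illustration is to choose the encoding explicitly before scanning. A minimal sketch, assuming UTF-8 is the byte form one wants (that choice is an assumption made for the example, not something the manpage prescribes):

```tcl
# Assumption: UTF-8 is the desired byte representation of the string.
set s "\u266b\u266a"                      ;# two Unicode characters

# Convert explicitly, so that every element of $bytes is a single byte.
set bytes [encoding convertto utf-8 $s]

# The count given to 'a' is a number of bytes of $bytes,
# not a number of characters of $s.
binary scan $bytes a3a3 first second

# Decode each 3-byte group back into one character.
set c1 [encoding convertfrom utf-8 $first]
set c2 [encoding convertfrom utf-8 $second]

puts [list [string length $s] [string length $bytes] [string equal $c1$c2 $s]]
```

Here the string has 2 characters but its encoded form has 6 bytes, and each a3 field consumes exactly the three bytes of one character.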

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2003-05-12 19:57

Message:
Logged In: NO 

If bytes are characters then this too has to be defined and
a new name invented for what I would consider characters.
Normally, with wide characters and Unicode I would not
consider characters to be the same as bytes. One unicode
character can use up more than one byte.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2003-05-12 11:42

Message:
Logged In: YES 
user_id=79902

Bytes are characters.  (I believe the conversion to byte
array truncates...)

----------------------------------------------------------------------
