Re: [TCLCORE] [Fwd: Re: UTF8 conversion problem in Tcl - tclUtf.c]


> There is one thing in Tcl that uses it: the encoding of the null byte.
> It is encoded as a non-shortest sequence. I don't know whether it is sent
> over the net in this form if a socket is configured with -encoding utf-8.

I think we should be careful in this discussion to distinguish
the working of the [encoding] command and the -encoding option to
[fconfigure] from the "internal encoding" used by Tcl's C API.

We've called that "internal encoding" UTF-8.  That's never been
entirely true, AIUI, and as the true UTF-8 standard has evolved,
it's apparently less true now.  Perhaps we should be more careful
in our descriptions and documentation (apparently CESU-8 is a
better name for things closer to what we're doing) so that we
don't mislead people, but I don't see any reason Tcl should need
to change its internals.  No standards body should care
about how Tcl's internals are organized.
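
To make the internal-versus-external distinction concrete, here is a
minimal C sketch of the one difference mentioned in the quoted text:
the internal form stores the null character as the two-byte non-shortest
sequence 0xC0 0x80, while RFC 3629 requires the single byte 0x00.  The
function name internal_to_strict_nul is hypothetical and is not part of
Tcl's API; the sketch ignores surrogate handling and every other
difference between the two forms.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Tcl's API: copy `len` bytes of
 * internal-form text from `src` to `dst`, rewriting the two-byte
 * non-shortest NUL (0xC0 0x80) as the single byte 0x00 that RFC 3629
 * requires.  `dst` must have room for `len` bytes.  Returns the number
 * of bytes written.  Surrogate handling is deliberately left out. */
size_t internal_to_strict_nul(const unsigned char *src, size_t len,
                              unsigned char *dst)
{
    size_t i = 0, out = 0;

    while (i < len) {
        if (src[i] == 0xC0 && i + 1 < len && src[i + 1] == 0x80) {
            dst[out++] = 0x00;      /* shortest form of U+0000 */
            i += 2;
        } else {
            dst[out++] = src[i++];  /* everything else copied as-is */
        }
    }
    return out;
}
```

A translation step of this kind would belong at the boundary (the
[encoding] command or a channel's -encoding), which is why the internal
representation itself need not change.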

What we perhaps do need to do is provide sufficient tools with
our [encoding] command and our -encoding option to allow Tcl
application programmers to create programs and libraries that
conform to the UTF-8 spec laid down in RFC 3629.  I think that
creation of a new encoding, "utf-8-rfc3629" might be sufficient to
address that issue.

When using that new encoding, which presumably would not accept invalid
UTF-8 input, we'd need to choose among Donal's options for how to
react to invalid UTF-8.
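
As a rough illustration of what "would not accept invalid UTF-8 input"
could mean, the following C sketch checks a byte buffer against the
RFC 3629 rules: no non-shortest forms, no surrogates, nothing above
U+10FFFF.  The function utf8_is_rfc3629 is a hypothetical name, not an
existing Tcl function, and the sketch only detects invalid input; which
reaction to take is left open.

```c
#include <stddef.h>

/* Hypothetical validator, not an existing Tcl function.  Returns 1 if
 * buf[0..len) is well-formed UTF-8 per RFC 3629, 0 otherwise. */
int utf8_is_rfc3629(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {
            i++;            /* ASCII, including a plain 0x00 */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;       /* stray continuation byte or 0xF8..0xFF */
        }

        if (len - i - 1 < need) {
            return 0;       /* sequence truncated at end of buffer */
        }
        for (size_t k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80) {
                return 0;   /* not a continuation byte */
            }
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if (cp < min) return 0;                      /* non-shortest form */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate */
        if (cp > 0x10FFFF) return 0;                 /* outside Unicode */

        i += need + 1;
    }
    return 1;
}
```

With this check, the two-byte null 0xC0 0x80 is rejected as a
non-shortest form, while a plain 0x00 byte is accepted.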

Note that none of Tcl's current encodings have any script-level
reaction to invalid input.  The TCL_CONVERT_SYNTAX return code
from Tcl_ExternalToUtf() is silently ignored.

| Don Porter          Mathematical and Computational Sciences Division |
| don...@ni...             Information Technology Laboratory |
| http://math.nist.gov/~DPorter/                                  NIST |
|______________________________________________________________________|
