[TCLCORE] Re: Unicode and Utf in Tcl

Laurent Duperval said:
 > Is there a document that summarises the interactions between Unicode,
 > UTF8 and ASCII in Tcl? Specifically, when do I know that I'm dealing
 > with a UTF string and when am I dealing with a Unicode string? I think
 > we don't do ASCII anymore, right?

I don't know of a good document that summarizes Tcl's character set 
handling features.  However, I should be able to answer your specific 
questions.  Tcl uses UTF8 internally as the standard string 
representation.  It also supports two other internal representation types: 
Unicode and binary.  The only interfaces that expect anything other than 
UTF8 are those specific to the Unicode or binary types.  The internal 
implementation of the regular expression engine also expects to operate on 
Unicode strings.  Other than that, everything operates in terms of 
null-terminated UTF8 strings.  Tcl still supports ASCII because ASCII is a 
strict subset of UTF8.  What it no longer supports directly is ISO8859-1 
(Latin 1).  The upper 128 characters (0x80 through 0xFF) now result in 
two-byte UTF8 sequences.  You need to perform an encoding conversion 
before passing 
Latin 1 strings into Tcl.
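
To make the Latin 1 point concrete, here is a rough standalone C sketch 
of what such a conversion does at the byte level.  This is not Tcl's own 
code, and the function name latin1_to_utf8 is made up for illustration.  
In a real extension you would more likely use Tcl's encoding routines 
(for example Tcl_ExternalToUtfDString with the iso8859-1 encoding) 
rather than hand-roll this.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Tcl): convert a null-terminated
 * ISO8859-1 string into a newly allocated null-terminated UTF8 string.
 * Returns NULL if allocation fails; the caller must free() the result.
 */
char *latin1_to_utf8(const char *src)
{
    size_t len = strlen(src);
    /* Worst case: every input byte expands to two output bytes. */
    char *dst = malloc(2 * len + 1);
    char *out = dst;
    size_t i;

    if (dst == NULL) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char) src[i];
        if (c < 0x80) {
            /* ASCII range: the byte is the same in Latin 1 and UTF8. */
            *out++ = (char) c;
        } else {
            /* Upper 128 characters: two bytes, 110000xx 10xxxxxx. */
            *out++ = (char) (0xC0 | (c >> 6));
            *out++ = (char) (0x80 | (c & 0x3F));
        }
    }
    *out = '\0';
    return dst;
}
```

For example, the four Latin 1 bytes 63 61 66 E9 ("caf" followed by 
e-acute) come out as the five UTF8 bytes 63 61 66 C3 A9, while a pure 
ASCII string passes through unchanged.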

--Scott

--
The TclCore mailing list is sponsored by Ajuba Solutions
To unsubscribe:  email tcl...@aj... with the 
                 word UNSUBSCRIBE as the subject.
