From: Scott S. <st...@aj...> - 2000-08-21 18:19:22
Laurent Duperval said:
> Is there a document that summarises the interactions between Unicode, UTF8
> and ASCII in Tcl? Specifically, when do I know that I'm dealing with a UTF
> string and when am I dealing with a Unicode string? I think we don't do
> ASCII anymore, right?

I don't know of a good document that summarizes Tcl's character set handling features. However, I should be able to answer your specific questions.

Tcl uses UTF8 internally as the standard string representation. It also supports two other internal representation types: Unicode and binary. The only interfaces that expect anything other than UTF8 are those specific to the Unicode or binary types. The internal implementation of the regular expression engine also operates on Unicode strings. Other than that, everything operates in terms of null-terminated UTF8 strings.

Tcl does still support ASCII, because ASCII is a strict subset of UTF8. What it no longer supports directly is ISO8859-1 (Latin 1): the upper 128 characters now result in two-byte UTF8 sequences, so you need to perform an encoding conversion before passing Latin 1 strings into Tcl.

--Scott
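To make the Latin 1 point concrete, here is a minimal sketch in plain C of what that conversion involves. The function name latin1_to_utf8 is hypothetical and not part of the Tcl API; in code linked against Tcl you would more likely go through Tcl's own encoding routines (Tcl_GetEncoding and Tcl_ExternalToUtfDString, as far as I know). The sketch assumes a null-terminated input with no embedded NUL bytes.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Convert a NUL-terminated ISO 8859-1 (Latin 1) string to UTF-8.
 * Hypothetical helper for illustration only; not part of the Tcl API.
 * Returns a malloc'd NUL-terminated buffer that the caller must free,
 * or NULL if the allocation fails.
 */
char *latin1_to_utf8(const char *src)
{
    size_t len = strlen(src);
    /* Worst case: every input byte becomes two output bytes. */
    unsigned char *dst = malloc(2 * len + 1);
    unsigned char *out = dst;
    const unsigned char *in = (const unsigned char *)src;

    if (dst == NULL) {
        return NULL;
    }
    for (; *in != '\0'; in++) {
        if (*in < 0x80) {
            /* ASCII range: the same single byte in UTF-8. */
            *out++ = *in;
        } else {
            /* Upper 128 characters: two bytes, 110xxxxx 10xxxxxx. */
            *out++ = (unsigned char)(0xC0 | (*in >> 6));
            *out++ = (unsigned char)(0x80 | (*in & 0x3F));
        }
    }
    *out = '\0';
    return (char *)dst;
}
```

Pure ASCII input comes back byte-for-byte unchanged, which is why ASCII strings need no conversion, while a Latin 1 byte such as 0xE9 (e with acute accent) becomes the two bytes 0xC3 0xA9.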