From: Alexandre F. <ale...@gm...> - 2009-02-09 12:28:51
TIP #345: KILL THE 'IDENTITY' ENCODING
========================================
 Version:      $Revision: 1.1 $
 Author:       Alexandre Ferrieux <alexandre.ferrieux_at_gmail.com>
 State:        Draft
 Type:         Project
 Tcl-Version:  8.7
 Vote:         Pending
 Created:      Thursday, 05 February 2009
 URL:          http://purl.org/tcl/tip/345.html
 WebEdit:      http://purl.org/tcl/tip/edit/345
 Post-History:
-------------------------------------------------------------------------

ABSTRACT
==========

This TIP proposes to remove the 'identity' encoding, which is the Pandora's
box of invalid UTF-8 string representations.

BACKGROUND
============

The contract of string representations in Tcl states that the /bytes/
field (the *strep*) of a Tcl_Obj must be a valid UTF-8 byte sequence.
Violating it leads, at best, to inconsistent and shimmer-sensitive string
comparisons.

Fortunately, nearly all of the Tcl code takes careful steps to enforce
this contract, with one exception: the 'identity' encoding. This encoding
allows any byte sequence to be copied verbatim into the strep of a value,
either as a side effect of computing the strep of a ByteArray while
[*encoding system*] == "identity", or through
[*encoding convertfrom identity*]. Hence an invalid UTF-8 sequence can
easily make its way into the strep and start wreaking havoc.

PROPOSED CHANGE
=================

This TIP proposes simply to close that single window onto the dark side
by removing the 'identity' encoding.

RATIONALE
===========

The risk of breaking compatibility is exceptionally low in this case,
since the 'identity' encoding has only ever been documented in tcltest.

REFERENCE EXAMPLE
===================

See Bug 2564363 [<URL:https://sourceforge.net/support/tracker.php?aid=2564363>]

COPYRIGHT
===========

This document has been placed in the public domain.

-------------------------------------------------------------------------
TIP AutoGenerator - written by Donal K. Fellows