Currently, Tcl never throws an error when:
1) Encountering an invalid byte sequence in the
source encoding when performing an "encoding
convertfrom" or while reading from a channel.
2) Encountering a Unicode character that cannot
be represented in the target encoding when
performing an "encoding convertto" or while
writing to a channel.
For example, these commands succeed
despite the data being incorrect:
$ tclsh
% encoding convertto iso8859-1 "\u4E24"
?
% encoding convertfrom utf-8 "\xC3\x28"
Ã(
This behaviour is often convenient if one wants
one's app to carry on "working" whatever happens,
but it's less desirable if one wants security or
correctness.
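As an aside, these silent substitutions can be detected today,
after the fact, by converting the result back and comparing it
with the input. The following is only a rough sketch of that
workaround. The proc names strictConvertto and strictConvertfrom
are made up for illustration and are not part of Tcl, and the
round-trip test is an assumption that holds for simple stateless
encodings but may reject valid data in encodings where several
byte sequences map to the same character.

```tcl
# Hypothetical helpers (not part of Tcl): emulate a strict conversion
# by checking that the result converts back to the original input.

proc strictConvertto {encoding string} {
    set bytes [encoding convertto $encoding $string]
    # An unrepresentable character is replaced (e.g. by "?"),
    # so the round trip no longer reproduces the input.
    if {[encoding convertfrom $encoding $bytes] ne $string} {
        error "string cannot be represented in encoding \"$encoding\""
    }
    return $bytes
}

proc strictConvertfrom {encoding data} {
    set string [encoding convertfrom $encoding $data]
    # An invalid byte sequence is mapped to some other character(s),
    # so re-encoding yields different bytes.
    if {[encoding convertto $encoding $string] ne $data} {
        error "data is not valid in encoding \"$encoding\""
    }
    return $string
}
```

With these, the two conversions shown earlier raise an error
that a caller can trap with catch, at the cost of doing every
conversion twice.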
I feel Tcl would benefit from allowing a user to
request stricter encoding conversions.
Perhaps one way to do this would be to add a -strict
option to the encoding convert* commands:
encoding convertfrom ?-strict? ?encoding? data
encoding convertto ?-strict? ?encoding? string
And add a "-strictEncoding boolean" option to
fconfigure and chan configure.
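For channels, the closest approximation I can think of with
current Tcl is to read the raw bytes and validate them before
use. A sketch, assuming Tcl 8.6 (for try) and a made-up helper
name readFileStrict. It reads the whole file in binary mode, so
it does no end-of-line translation and is unsuitable for large
files or streams, which is part of why a real channel option
would be preferable.

```tcl
# Hypothetical helper (not part of Tcl): read a file and fail if its
# contents do not survive a decode/encode round trip in $encoding.
proc readFileStrict {path encoding} {
    set chan [open $path rb]
    try {
        set bytes [read $chan]
    } finally {
        close $chan
    }
    set text [encoding convertfrom $encoding $bytes]
    # Same round-trip assumption as for a single conversion: invalid
    # input bytes do not re-encode to themselves.
    if {[encoding convertto $encoding $text] ne $bytes} {
        error "file \"$path\" is not valid in encoding \"$encoding\""
    }
    return $text
}
```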
I guess other Tcl commands that perform encoding
conversions might also need to be considered
("source" is one that comes to mind).
I'm not familiar enough with the C-level API to
propose changes there.
Alternatively, if this were targeted for Tcl 9 then
strictness could be made the default ...