#641 Ability to perform stricter encoding conversion checks

Status: open
Owner: nobody
Labels: None
Priority: 5
Updated: 2012-12-18
Created: 2012-12-18
Creator: Kieran
Private: No

Currently, Tcl never throws an error when:

1) Encountering an invalid byte sequence in the
source encoding when performing an "encoding
convertfrom" or while reading from a channel.

2) Encountering a Unicode character that cannot
be represented in the target encoding when
performing an "encoding convertto" or while
writing to a channel.

For example, these commands succeed
despite the data being incorrect:

$ tclsh
% encoding convertto iso8859-1 "\u4E24"
?
% encoding convertfrom utf-8 "\xC3\x28"
Ã(

This behaviour is often convenient if one wants
one's app to carry on "working" whatever happens,
but it's less desirable if one wants security or
correctness.

I feel Tcl would benefit from allowing a user to
request stricter encoding conversions.

Perhaps one way to do this would be to add a -strict
option to the encoding convert* commands:

encoding convertfrom ?-strict? ?encoding? data
encoding convertto ?-strict? ?encoding? string

A "-strictEncoding boolean" option could likewise
be added to fconfigure and chan configure.
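
Until something like -strict exists, the check for
the convert* commands can be approximated at script
level by converting and then converting back. The
sketch below is only an illustration: the proc names
strictConvertTo and strictConvertFrom are hypothetical,
not part of Tcl, and the round-trip test assumes the
encoding has a single canonical byte form (it may
wrongly reject valid data in stateful encodings such
as iso2022-jp).

```tcl
# Hypothetical helpers, not part of Tcl: emulate a strict
# conversion by checking that the result round-trips.

proc strictConvertTo {encoding string} {
    set bytes [encoding convertto $encoding $string]
    # A character with no mapping is replaced (e.g. by "?"),
    # so decoding the bytes will not give back the input.
    if {[encoding convertfrom $encoding $bytes] ne $string} {
        error "string cannot be represented in encoding \"$encoding\""
    }
    return $bytes
}

proc strictConvertFrom {encoding data} {
    set string [encoding convertfrom $encoding $data]
    # An invalid byte sequence is decoded leniently, so
    # re-encoding the result will not give back the input.
    if {[encoding convertto $encoding $string] ne $data} {
        error "data is not valid in encoding \"$encoding\""
    }
    return $string
}
```

With these helpers, both commands from the example
above raise an error instead of silently succeeding:

strictConvertTo iso8859-1 "\u4E24"
strictConvertFrom utf-8 "\xC3\x28"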

I guess there might be other Tcl commands that
perform encoding conversions that might need to
be considered ("source" is one that comes to mind).

I'm not familiar enough with the C-level API to
propose changes there.

Alternatively, if this were targeted for Tcl 9 then
strictness could be made the default ...
