
#641 Ability to perform stricter encoding conversion checks

open
nobody
None
5
2012-12-18
2012-12-18
Kieran
No

Currently, Tcl never throws an error when:

1) Encountering an invalid byte sequence in the
source encoding when performing an "encoding
convertfrom" or while reading from a channel.

2) Encountering a Unicode character that cannot
be represented in the target encoding when
performing an "encoding convertto" or while
writing to a channel.

For example, these commands succeed
despite the data being incorrect:

$ tclsh
% encoding convertto iso8859-1 "\u4E24"
?
% encoding convertfrom utf-8 "\xC3\x28"
Ã(

This behaviour is often convenient if one wants
one's app to carry on "working" whatever happens,
but it's less desirable if one wants security or
correctness.

I feel Tcl would benefit from allowing a user to
request stricter encoding conversions.

Perhaps one way to do this would be to add a -strict
option to the encoding convert* commands:

encoding convertfrom ?-strict? ?encoding? data
encoding convertto ?-strict? ?encoding? string

And add a "-strictEncoding boolean" option to
fconfigure and chan configure.
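
Until something like this exists, the effect can be roughly
approximated at script level with a round-trip check: convert,
convert back, and compare with the input. The sketch below is
only an illustration. The proc names strictConvertTo and
strictConvertFrom are hypothetical and not part of Tcl, and the
approach assumes the encoding round-trips cleanly, which may not
hold for stateful encodings such as the iso2022 family.

```tcl
# Hypothetical helpers (not part of Tcl): approximate a strict
# conversion by converting, converting back, and comparing.

proc strictConvertTo {enc str} {
    set bytes [encoding convertto $enc $str]
    # A lossy conversion (e.g. a "?" substitution) will not
    # convert back to the original string.
    if {[encoding convertfrom $enc $bytes] ne $str} {
        error "string cannot be represented in encoding \"$enc\""
    }
    return $bytes
}

proc strictConvertFrom {enc bytes} {
    set str [encoding convertfrom $enc $bytes]
    # An invalid byte sequence will not re-encode to the same bytes.
    if {[encoding convertto $enc $str] ne $bytes} {
        error "invalid byte sequence for encoding \"$enc\""
    }
    return $str
}
```

With these definitions, both of the example conversions above
raise an error instead of silently returning altered data, at
the cost of doing every conversion twice.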

I guess there might be other Tcl commands that
perform encoding conversions that might need to
be considered ("source" is one that comes to mind).

I'm not familiar enough with the C-level API to
propose changes there.

Alternatively, if this were targeted for Tcl 9 then
strictness could be made the default ...

Discussion
