From: Poor Y. <org...@po...> - 2023-01-03 01:39:20
On 2022-12-31 13:29, Schelte Bron wrote:
> On 30/12/2022 23:13, Poor Yorick wrote:
>> Someone who wanted to pinpoint an encoding error encountered using
>> [gets] could then switch to [read] for that purpose, picking up where
>> [gets] logically left off.
>
> If I understand correctly, the code would need to look something like
> this:
>
> set fd [open strictencoding.txt]
> fconfigure $fd -encoding utf-8 -strictencoding 1
> try {
>     set linenum 1
>     while {[gets $fd line] >= 0} {
>         puts $line
>         incr linenum
>     }
> } trap {POSIX EILSEQ} {err info} {
>     catch {read $fd} err info
>     set charnum [expr {[string length [dict get $info -result]] + 1}]
>     puts stderr "$err at line $linenum, character $charnum"
> }
> close $fd
>
> Running this with Ashok's example data (a\nb\xc0\nc\n) in
> strictencoding.txt should report the error is at line 2, character 2.
> It doesn't. It says line 2, character 1. That's because [dict get $info
> -result] returns "". Not "b" as I expected.
>
>
> Schelte.
>
Rolf reported this issue regarding [gets] hanging indefinitely:

https://core.tcl-lang.org/tcl/info/154ed7ce56

On the "trunk-encodingdefaultstrict" branch I've fixed that issue:

https://core.tcl-lang.org/tcl/info/003c9e1f2e53312b

As of that commit, the example you posted works as you described:

    set chan [open test22.data wb]
    puts -nonewline $chan a\nb\xc0\nc\n
    close $chan

    set fd [open test22.data]
    fconfigure $fd -encoding utf-8 -encodingstrict 1
    try {
        set linenum 1
        while {[gets $fd line] >= 0} {
            puts $line
            incr linenum
        }
    } trap {POSIX EILSEQ} {err info} {
        catch {read $fd} err info
        set charnum [expr {[string length [dict get $info -result]] + 1}]
        puts stderr "$err at line $linenum, character $charnum"
    }
    close $fd

The output is:

    a
    error reading "file*": illegal byte sequence at line 2, character 2

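For anyone who wants to cross-check a reported position without relying on any channel option, here is a minimal sketch (my own illustration, not code from the branch) that scans raw bytes for the first sequence that is not well-formed UTF-8 and reports its line and character. It counts each valid sequence as one character and restarts the count after each newline, matching how the example above computes charnum. The proc name firstInvalidUtf8 is hypothetical, and the sketch assumes the data is already a byte string, e.g. read from a channel opened in binary mode:

```tcl
# Hypothetical helper, not part of Tcl or the branch: return {line char}
# (both 1-based) of the first byte sequence in $bytes that is not
# well-formed UTF-8, or an empty list if all of $bytes is valid.
# Assumes $bytes is a byte string: every character is in 0..255.
proc firstInvalidUtf8 {bytes} {
    set line 1
    set char 1
    set i 0
    set n [string length $bytes]
    while {$i < $n} {
        scan [string index $bytes $i] %c b
        # The lead byte determines the length of the sequence.
        if {$b < 0x80} {
            set len 1
        } elseif {$b >= 0xC2 && $b <= 0xDF} {
            set len 2
        } elseif {$b >= 0xE0 && $b <= 0xEF} {
            set len 3
        } elseif {$b >= 0xF0 && $b <= 0xF4} {
            set len 4
        } else {
            # 0x80-0xC1 and 0xF5-0xFF can never start a sequence.
            return [list $line $char]
        }
        # Allowed range for the second byte. Lead bytes 0xE0, 0xED, 0xF0
        # and 0xF4 (224, 237, 240, 244) narrow it to reject overlong
        # forms, surrogates and values above U+10FFFF.
        set lo 0x80
        set hi 0xBF
        switch -exact -- $b {
            224 {set lo 0xA0}
            237 {set hi 0x9F}
            240 {set lo 0x90}
            244 {set hi 0x8F}
        }
        for {set k 1} {$k < $len} {incr k} {
            set j [expr {$i + $k}]
            if {$j >= $n} {
                # Truncated sequence at the end of the data.
                return [list $line $char]
            }
            scan [string index $bytes $j] %c c
            if {$k > 1} {
                # Third and fourth bytes are plain continuation bytes.
                set lo 0x80
                set hi 0xBF
            }
            if {$c < $lo || $c > $hi} {
                return [list $line $char]
            }
        }
        # Count the valid sequence as one character; a newline starts
        # a new line whose first character is number 1.
        if {$b == 0x0A} {
            incr line
            set char 1
        } else {
            incr char
        }
        incr i $len
    }
    return {}
}

# Ashok's example data: the 0xC0 byte on line 2 is invalid.
puts [firstInvalidUtf8 "a\nb\xc0\nc\n"]
```

With Ashok's example data this prints "2 2", i.e. the 0xC0 byte is at line 2, character 2, which agrees with the output above.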
Because "-nocomplain" has also been eliminated on that branch, the option "-strictencoding" has been changed to "-encodingstrict".

On that branch I am currently working on fixing every encoding/decoding behaviour that is less than ideal. Further examples of desired or undesired behaviour on that branch are very welcome.

--
Yorick