
#1241 fcopy does not respect encodings

Milestone: obsolete: 8.2.2
Status: closed-fixed
Priority: 6
Updated: 2001-05-19
Created: 2000-10-26
Creator: Anonymous
Private: No

OriginalBugID: 3662 Bug
Version: 8.2.2
SubmitDate: '1999-11-23'
LastModified: '1999-12-07'
Severity: SER
Status: Assigned
Submitter: techsupp
ChangedBy: hobbs
OS: All
FixedDate: '2000-10-25'
ClosedDate: '1999-12-06'

Name: Nikolai Saoukh

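# Write two Cyrillic capital A characters to k.txt, encoded as koi8-r.
set out [open k.txt w]
fconfigure $out -encoding koi8-r
puts $out "\u0410\u0410"
close $out

# Reopen k.txt for reading as koi8-r.
set in [open k.txt r]
fconfigure $in -encoding koi8-r

# Open u.txt for writing as utf-8.
set out [open u.txt w]
fconfigure $out -encoding utf-8

# fcopy ignores both -encoding settings, so u.txt receives the raw
# koi8-r bytes instead of UTF-8.
fcopy $in $out

close $in
close $out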

It's not certain whether this is an RFE or bug, as the fcopy
man page states that it only pays heed to the -translation
option, not to -encoding. This could be a bug, since -encoding
came after fcopy was originally written. The work-around is to
replace the above fcopy with:
puts -nonewline $out [read $in]
although there is no callback capability then.
-- 12/07/1999 hobbs
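
The missing callback can be approximated at script level with fileevent. The following is only a sketch, assuming an interpreter without the fix and a running event loop; the names encodedCopy and encodedCopyStep are hypothetical helpers, not Tcl built-ins. Each readable event moves one chunk through read and puts, so both -encoding settings are applied, and the callback receives the number of characters copied.

```tcl
# Hypothetical helpers; not part of Tcl itself.
# Start a background copy from $in to $out that honors both channels'
# -encoding settings, then call $callback with the character count.
proc encodedCopy {in out callback {chunk 4096}} {
    fconfigure $in -blocking 0
    fileevent $in readable \
        [list encodedCopyStep $in $out $callback $chunk 0]
}

# Copy at most one chunk per readable event; finish at end of file.
proc encodedCopyStep {in out callback chunk total} {
    set data [read $in $chunk]
    puts -nonewline $out $data
    incr total [string length $data]
    if {[eof $in]} {
        # Stop listening and report the total to the caller.
        fileevent $in readable {}
        uplevel #0 [linsert $callback end $total]
    } else {
        # Re-register so the next event sees the updated total.
        fileevent $in readable \
            [list encodedCopyStep $in $out $callback $chunk $total]
    }
}
```

With the channels from the report, something like `encodedCopy $in $out {set ::copied}` followed by `vwait ::copied` would run the copy and leave the character count in ::copied. Unlike fcopy, this sketch does not accept a -size limit.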

Discussion

  • I consider it a bug: an incomplete adaptation of the existing commands to the new i18n features, i.e. encodings. Recategorized to OtherIO, as this problem is not restricted to sockets.

     
    • priority: 5 --> 6
    • labels: 104250 --> 104247
     
    • labels: 104247 --> 24. Channel Commands
     
    • assigned_to: nobody --> andreas_kupries
     
  • Logged In: YES
    user_id=75003

    Another problem with the workaround: it uses much more
    memory than fcopy, because the whole file is loaded into the
    interpreter before being written back out.
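
The memory cost can be bounded by reading in fixed-size chunks instead of all at once. A minimal sketch, assuming blocking channels; chunkedCopy is a hypothetical helper name and the 4096-character default is an arbitrary choice:

```tcl
# Hypothetical helper (not a Tcl built-in): copy $in to $out in
# fixed-size chunks so that at most $chunk characters are held in
# the interpreter at a time. Returns the number of characters copied.
proc chunkedCopy {in out {chunk 4096}} {
    set total 0
    while {![eof $in]} {
        set data [read $in $chunk]
        puts -nonewline $out $data
        incr total [string length $data]
    }
    return $total
}
```

Because the data still passes through read and puts, the -encoding settings of both channels are applied, just as in the one-line workaround.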

     
  • Logged In: YES
    user_id=75003

    The fix for this report has to be done in "CopyData" in
    "tclIO.c", which currently uses "DoRead" and "DoWrite" to
    read from and write to the channels involved in the copy.
    Replacing these two calls with stripped-down versions of
    "Tcl_ReadChars" and "Tcl_WriteChars" should do the trick.
    "Stripped down" here means that we have to avoid the call to
    "CheckChannelErrors" in these two routines, as it flags their
    use on a channel involved in an "fcopy" as an error. I
    propose moving the meat of these two routines into two
    internal procedures, "DoReadChars" and "DoWriteChars", which
    are then called from the original routines. The originals
    would retain the error checking, and "CopyData" can use the
    internal procedures to get its own work done.

    Note the following consequences of the change:

    - The system will use UTF-8 internally when copying data,
    meaning that it will consume more memory, or copy less data
    per buffer.

    - Performance will be affected negatively because of the
    additional conversions to and from UTF-8. (Side note: Do we
    have performance tests for "fcopy" in "tclbench" ?).

    The code of the channel system uses 'statePtr->encoding ==
    NULL' as the signal that the encoding is binary, and the two
    "Tcl_*Chars" procedures have special provisions for that
    case. Unfortunately these are not very efficient, as they
    involve ByteArray objects.

    I would propose that "CopyData" should check for binary
    translation on _both_ channels and fall back to the old code
    in such a case. This would avoid quite a lot of conversions
    and copying. We shouldn't do this for a mixture of binary and
    non-binary encodings, as we still need an intermediate
    ByteArray to get the conversion right for these cases,
    causing additional complexity in the new code. Better to stay
    with the tried and true code for that for now.

     
  • Patch v1

     
  • Logged In: YES
    user_id=75003

    Uploading a patch fixing the bug, adding tests and extending
    the documentation.

     
  • Logged In: YES
    user_id=75003

    New patch, changing the fix slightly so that it doesn't do
    conversions which are not necessary. IOW, if both channels
    are set to the same encoding no conversion will occur and
    the transfer will run at the full old speed. The first patch
    did conversions in this case, lowering performance for a
    common case.
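
A script-level analogue of this shortcut, for interpreters without the fix, might look like the sketch below. It is not the C patch itself: needsConversion and copyRespectingEncoding are hypothetical names, and comparing the -encoding values reported by fconfigure is an assumption about what "same encoding" should mean here.

```tcl
# Hypothetical helper: true when the two channels are configured
# with different encodings, so the data must be converted.
proc needsConversion {in out} {
    set encIn  [fconfigure $in  -encoding]
    set encOut [fconfigure $out -encoding]
    return [expr {![string equal $encIn $encOut]}]
}

# Hypothetical helper: use the fast byte-wise fcopy when no
# conversion is needed, else go through read and puts so that both
# encodings are applied.
proc copyRespectingEncoding {in out} {
    if {[needsConversion $in $out]} {
        puts -nonewline $out [read $in]
    } else {
        fcopy $in $out
    }
}
```

The slow path here reads the whole input at once for brevity, so it carries the memory cost noted earlier in this thread.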

     
  • Patch v2

     