From: SourceForge.net <no...@so...> - 2006-06-20 20:36:48
Bugs item #219289, was opened at 2000-10-26 01:10
Message generated for change (Comment added) made by dgp
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=219289&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: 10. Objects
Group: obsolete: 8.1
Status: Open
Resolution: None
Priority: 3
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Jeffrey Hobbs (hobbs)
Summary: Tcl doesn't read utf-16 files properly

Initial Comment:
OriginalBugID: 2129 Bug
Version: 8.1
SubmitDate: '1999-05-26'
LastModified: '1999-10-21'
Severity: MED
Status: UnAssn
Submitter: pat
ChangedBy: hobbs
RelatedBugIDs: 2128
OS: Other
Machine: NA
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'

Name: Victor Wagner

ReproducibleScript:

    tclsh8.1 >testfile <<!
    fconfigure stdout -encoding unicode
    puts "\uFEFFSome stuff"
    !
    dd if=testfile of=testin conv=swab
    tclsh8.1 <<!
    set f [open testin]
    fconfigure \$f -encoding unicode
    fconfigure stdout -encoding [encoding system]
    puts [read \$f]
    !

ObservedBehavior:
When reading a utf-16 file, Tcl doesn't recognize the byte order
automatically. According to the utf-16 specification, each unicode
text file should contain "\uFEFF" as its first character, which
allows the byte order to be determined automatically. But Tcl
doesn't pay attention to this character. So, if a proper UTF-16
file is constructed on an MSB-first machine and then read by Tcl
on an LSB-first machine, or vice versa, it isn't read properly.
On the contrary, MS-Word 97 reads such files without problems (at
least ones with fewer than 256 chars).

DesiredBehavior:
When reading unicode files, Tcl should pay attention to the first
character and, if it is FFFE, switch to the opposite byte order.
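
The byte-order check described under DesiredBehavior can be sketched outside Tcl, in the same shell the reproduction script uses. This is only an illustration under stated assumptions: bom_order is a hypothetical helper name, not part of Tcl or of the original report, and it inspects nothing beyond the first two bytes of the file.

```shell
# Hypothetical helper: print the byte order implied by a UTF-16 BOM.
bom_order() {
    # Dump the first two bytes of the file as hex with no separators.
    sig=$(dd if="$1" bs=1 count=2 2>/dev/null | od -An -tx1 | tr -d ' \t\n')
    case "$sig" in
        fffe) echo little ;;   # FF FE: U+FEFF stored LSB-first
        feff) echo big ;;      # FE FF: U+FEFF stored MSB-first
        *)    echo unknown ;;  # no BOM in the first two bytes
    esac
}
```

Run against the two files produced by the reproduction script (bom_order testfile, then bom_order testin), the helper should report opposite byte orders, since dd conv=swab exchanges every pair of bytes, including the BOM.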
----------------------------------------------------------------------

>Comment By: Don Porter (dgp)
Date: 2006-06-20 16:36

Message:
Logged In: YES
user_id=80530

See 1165752

----------------------------------------------------------------------

Comment By: Jeffrey Hobbs (hobbs)
Date: 2001-10-09 18:21

Message:
Logged In: YES
user_id=72656

The user noted MS Word as an example, and this really is a
feature for the application, not the language. The developer
should test for \uFEFF or \uFFFE, but Tcl does need to
provide a way to handle utf-16 in MSB or LSB order (either
as two absolute encodings, or some option).

----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2001-10-09 11:16

Message:
Logged In: YES
user_id=75003

Also note that there are channels like pipes and sockets
which might use utf-16 but have no 'start of file' concept
per se. IMHO this is a problem to be handled by the
application and not the interpreter.

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2001-10-07 07:50

Message:
Logged In: YES
user_id=79902

This is not straightforward to fix, since the BOM should
only be read at the start of the file (this gets awkward
when working with files that are in several encodings).
Perhaps it needs some special kind of encoding or a
state flag in the channel structure? (Of course, the
encoding system does not know about channels at all. Maybe
the decision needs to be made there?)

----------------------------------------------------------------------