From: <no...@so...> - 2001-10-07 11:50:27

Bugs item #219289, was opened at 2000-10-25 22:10
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=110894&aid=219289&group_id=10894

Category: 10. Objects
Group: = 8.1
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
>Assigned to: Jeffrey Hobbs (hobbs)
Summary: Tcl doesn't read utf-16 files properly

Initial Comment:
OriginalBugID: 2129
Bug Version: 8.1
SubmitDate: '1999-05-26'
LastModified: '1999-10-21'
Severity: MED
Status: UnAssn
Submitter: pat
ChangedBy: hobbs
RelatedBugIDs: 2128
OS: Other
Machine: NA
FixedDate: '2000-10-25'
ClosedDate: '2000-10-25'
Name: Victor Wagner

ReproducibleScript:
tclsh8.1 >testfile <<!
fconfigure stdout -encoding unicode
puts "\uFEFFSome stuff"
!
dd if=testfile of=testin conv=swab
tclsh8.1 <<!
set f [open testin]
fconfigure \$f -encoding unicode
fconfigure stdout -encoding [encoding system]
puts [read \$f]
!

ObservedBehavior:
When reading a UTF-16 file, Tcl doesn't recognize the byte order automatically. According to the UTF-16 specification, a Unicode text file should start with the character "\uFEFF" (the byte order mark), which allows the byte order to be determined automatically. But Tcl ignores this character. So if a proper UTF-16 file is created on an MSB-first machine and then read by Tcl on an LSB-first machine, or vice versa, it is not read properly. By contrast, MS-Word 97 reads such files without problems (at least ones with fewer than 256 characters).

DesiredBehavior:
When reading Unicode files, Tcl should check the first character and, if it reads as \uFFFE (a byte-swapped BOM), switch to the opposite byte order.

----------------------------------------------------------------------

>Comment By: Donal K. Fellows (dkf)
Date: 2001-10-07 04:50

Message:
Logged In: YES
user_id=79902

This is not straightforward to fix, since the BOM should only be read at the start of the file (this gets awkward when working with files that are in several encodings).
Perhaps it needs some special kind of encoding, or a state flag in the channel structure? (Of course, the encoding system does not know about channels at all. Maybe the decision needs to be made there?)
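Until Tcl handles this itself, the byte-order check requested under DesiredBehavior can be approximated at script level. What follows is a minimal sketch, not a proposed fix for Tcl's encoding or channel code: the proc name decodeUtf16WithBom is hypothetical, and it assumes the file has been read with "-translation binary" and that the "unicode" encoding decodes in the native byte order reported by $tcl_platform(byteOrder).

```tcl
# Hypothetical helper (not part of Tcl): decode raw UTF-16 bytes, using a
# leading byte order mark (BOM) to choose the byte order.
# Assumption: the "unicode" encoding decodes in the machine's native order.
proc decodeUtf16WithBom {data} {
    # First two bytes as hex digits; stays "" if fewer than two bytes exist.
    set bom ""
    binary scan $data H4 bom
    if {$bom eq "feff"} {
        set fileOrder bigEndian
    } elseif {$bom eq "fffe"} {
        set fileOrder littleEndian
    } else {
        # No BOM: fall back to native order, which is what Tcl does now.
        return [encoding convertfrom unicode $data]
    }
    # Drop the BOM so U+FEFF does not appear in the decoded text.
    set body [string range $data 2 end]
    if {$fileOrder ne $::tcl_platform(byteOrder)} {
        # Swap every byte pair: scan as little-endian 16-bit values and
        # write them back as big-endian ones (a trailing odd byte is dropped).
        set units {}
        binary scan $body s* units
        set body [binary format S* $units]
    }
    return [encoding convertfrom unicode $body]
}
```

To use it, open the file, set "fconfigure $f -translation binary", read the whole file, and pass the result to the proc. Decoding the whole file at once sidesteps the question of when to look for the BOM, but it does not help a channel that is read incrementally or switches encodings part-way through; that case would still need the channel-level state discussed above.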