From: SourceForge.net <no...@so...> - 2011-12-29 15:42:41
Bugs item #3466099, was opened at 2011-12-27 09:31
Message generated for change (Settings changed) made by nijtmans
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3466099&group_id=10894

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

>Category: 44. UTF-8 Strings
Group: current: 8.5.11
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Donal K. Fellows (dkf)
Assigned to: Jan Nijtmans (nijtmans)
Summary: BOM in Unicode

Initial Comment:
I was reading about the problems that some people are having with Tcl scripts on Windows due to that platform's insistence on putting a byte-order mark at the start of a UTF-8 file. (Arguably wrong, but we're stuck with it.) For reference:
https://groups.google.com/group/comp.lang.tcl/browse_frm/thread/cb6fbae11b95fac6/c4211cabc90a8b30?hl=en#c4211cabc90a8b30

I was wondering if the most effective way of dealing with this would be to make Tcl treat a stray BOM as whitespace for the purpose of script parsing? I don't know exactly how practical this is, but it would make the most pressing part of the problem Go Away.

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2011-12-28 15:32

Message:
First attempt implemented in branch bug-3466099

Donal, do you see any negative effects of this?

The disadvantage is that any stream which does not contain a BOM will need to seek to the start, and be read again in (possibly) another encoding... Still, I think this is the way I would go.

Any feedback is highly appreciated!

----------------------------------------------------------------------

Comment By: Donal K. Fellows (dkf)
Date: 2011-12-28 10:13

Message:
Makes sense.
I think we can get away just fine with making Tcl_FSEvalFileEx assume that the file's contents are supposed to be a script and so do a bit more magic than normal. (Theoretically, we also ought to think about doing progressive evaluation of "large" files, say over 1MB. That's for another time.)

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2011-12-28 01:26

Message:
I think I would modify Tcl_FSEvalFileEx such that when it encounters a BOM as the first character (in any of the forms allowed by Unicode), it would switch the encoding accordingly. Then it would work with UTF-16 as well, in both little- and big-endian formats. It will be about the same amount of work.

----------------------------------------------------------------------
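The idea in Jan Nijtmans's 01:26 comment can be sketched in a few lines of C. The idea is to look at the first bytes of the file, switch encoding if they form a BOM, and otherwise go back to the start. This also shows the re-read cost mentioned in his 15:32 comment. This is not the code in branch bug-3466099. `BomInfo`, `DetectBom` and `SkipBom` are hypothetical names, and the encoding labels are illustrative strings, not names Tcl's encoding machinery is guaranteed to accept. The sketch assumes the stream was opened in binary mode and is positioned at offset 0. It handles only the UTF-8 and UTF-16 marks discussed in the thread.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical illustration only: not part of the Tcl C API and not the
 * implementation in branch bug-3466099.
 */
typedef struct {
    const char *encoding; /* illustrative label, or NULL when no BOM found */
    size_t bomLength;     /* number of bytes taken up by the BOM (0 if none) */
} BomInfo;

/* Inspect the first bytes of a buffer for a UTF-8 or UTF-16 byte-order mark. */
static BomInfo DetectBom(const unsigned char *buf, size_t len)
{
    BomInfo info = { NULL, 0 };

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        info.encoding = "utf-8";
        info.bomLength = 3;
    } else if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        /* FF FE 00 00 would be a UTF-32LE BOM; this sketch does not check it. */
        info.encoding = "utf-16le";
        info.bomLength = 2;
    } else if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        info.encoding = "utf-16be";
        info.bomLength = 2;
    }
    return info;
}

/*
 * Peek at the start of a binary-mode stream that is positioned at offset 0,
 * then reposition it just past any BOM. Without a BOM the stream is sought
 * back to offset 0 so the caller can read it again from the beginning.
 * Returns 0 on success, -1 if the stream cannot be repositioned.
 */
static int SkipBom(FILE *fp, BomInfo *out)
{
    unsigned char head[3];
    size_t n = fread(head, 1, sizeof head, fp);

    *out = DetectBom(head, n);
    return fseek(fp, (long) out->bomLength, SEEK_SET) == 0 ? 0 : -1;
}
```

A caller would hand the returned label to whatever encoding switch it uses. When `encoding` is NULL, it would keep the default encoding. A UTF-32LE BOM (`FF FE 00 00`) begins with the UTF-16LE one, so a version covering every Unicode form would need to test the four-byte patterns first.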