From: SourceForge.net <no...@so...> - 2011-12-28 18:13:29
Bugs item #3466099, was opened at 2011-12-27 09:31
Message generated for change (Comment added) made by dkf
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3466099&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: 45. Parsing and Eval
Group: current: 8.5.11
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Donal K. Fellows (dkf)
Assigned to: Jan Nijtmans (nijtmans)
Summary: BOM in Unicode

Initial Comment:
I was reading about the problems that some people are having with Tcl
scripts on Windows due to that platform's insistence on putting a
byte-order mark at the start of a UTF-8 file. (Arguably wrong, but we're
stuck with it.) For reference:
https://groups.google.com/group/comp.lang.tcl/browse_frm/thread/cb6fbae11b95fac6/c4211cabc90a8b30?hl=en#c4211cabc90a8b30

I was wondering if the most effective way of dealing with this would be to
make Tcl treat a stray BOM as whitespace for the purpose of script parsing.
I don't know exactly how practical this is, but it would make the most
pressing part of the problem Go Away.

----------------------------------------------------------------------

>Comment By: Donal K. Fellows (dkf)
Date: 2011-12-28 10:13

Message:
Makes sense. I think we can get away just fine with making
Tcl_FSEvalFileEx assume that the file's contents are supposed to be a
script and so do a bit more magic than normal.

(Theoretically, we also ought to think about doing progressive evaluation
of "large" files, say over 1MB. That's for another time.)

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2011-12-28 01:26

Message:
I think I would modify Tcl_FSEvalFileEx such that when it encounters a BOM
as the first character (in any of the forms allowed by Unicode), it would
switch the encoding accordingly. Then it would work with UTF-16 as well,
in both little- and big-endian formats. It will be about the same amount
of work.

----------------------------------------------------------------------
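
For illustration only, the BOM check proposed in the last comment could look roughly like the sketch below. It is not the change that was made to Tcl_FSEvalFileEx. The names BomKind and DetectBom are hypothetical and are not part of the Tcl C API. The sketch covers only the UTF-8 and UTF-16 marks mentioned in the thread and ignores UTF-32, whose little-endian mark begins with the same two bytes as the UTF-16 little-endian one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; none of these are Tcl API. */
typedef enum {
    BOM_NONE,      /* no byte-order mark found */
    BOM_UTF8,      /* EF BB BF */
    BOM_UTF16_LE,  /* FF FE */
    BOM_UTF16_BE   /* FE FF */
} BomKind;

/*
 * Inspect the first bytes of a buffer and report which byte-order mark,
 * if any, it starts with. On return, *bomLen holds the number of bytes
 * the caller should skip before handing the rest to the parser.
 */
BomKind
DetectBom(const unsigned char *buf, size_t len, size_t *bomLen)
{
    assert(buf != NULL && bomLen != NULL);
    *bomLen = 0;

    /* Check the three-byte UTF-8 mark first. */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bomLen = 3;
        return BOM_UTF8;
    }
    /* U+FEFF stored least significant byte first. */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bomLen = 2;
        return BOM_UTF16_LE;
    }
    /* U+FEFF stored most significant byte first. */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bomLen = 2;
        return BOM_UTF16_BE;
    }
    return BOM_NONE;
}
```

A caller in the spirit of the proposal would read the start of the script file as raw bytes, call DetectBom, pick the channel encoding from the result, and skip bomLen bytes before evaluation. How that maps onto Tcl's own encoding names is left open here.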