Re: [toolbox] Extended Attributes
From: Maxwell, A. R <ada...@pn...> - 2005-07-06 21:31:23
On Jul 6, 2005, at 13:42, Maarten Sneep wrote:

> On 5 Jul 2005, at 23:33, Maxwell, Adam R wrote:
>
>> I think XeTeX accepts either UTF-16 or UTF-8.
>
> I doubt it, I think it only eats UTF-8. Besides, UTF-16 is very
> recognisable when read as a sequence of bytes (a lot of zero's are
> interspersed in the sequence).

Doubt no longer:
<http://www.tug.org/pipermail/xetex/2004-October/001153.html>.

>> Also, trying to read Latin 1 files in as UTF-8 will cause an error
>> in Cocoa, and you get nothing back
>
> Well, read it as a sequence of unsigned bytes, discard all values > 127,
> and start searching for the marker that indicates the encoding. If you
> encounter a lot of zero's, you probably have UTF-16. Once you've found
> the real encoding, close the file, and re-open with the now known
> encoding.

I interpreted your earlier suggestion as "read all files as UTF-8," so I
was just trying to point out that this will not work in all cases.
Sniffing bytes to interpret a comment is probably possible, although
messy, but reading the encoding from a comment will still only work if
the information is up to date (i.e. machine-written when saving). For
that, I think you'll need some comment block that says "Don't edit
this," since users are notorious for unknowingly changing encodings.

Adam
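
For illustration, here is a rough sketch of the byte-sniffing approach described in the quoted text. It is written in Python rather than Cocoa. The marker format (`% encoding: <name>`), the 1024-byte window, the zero-byte threshold, and the names `sniff_encoding` and `read_text` are all assumptions made for this example; the thread does not specify any of them.

```python
import re

# Hypothetical marker comment, e.g. "% encoding: latin-1".
# The thread does not define a format; this one is assumed.
MARKER = re.compile(rb"%\s*encoding:\s*([A-Za-z0-9._-]+)")


def sniff_encoding(data, default="utf-8"):
    """Guess the encoding of raw file bytes without decoding them."""
    head = data[:1024]  # assumed window size

    # A byte-order mark settles the question immediately.
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"

    # Mostly-ASCII text stored as UTF-16 has a zero in every other byte.
    # The one-quarter threshold is an arbitrary choice for this sketch.
    if head and head.count(0) > len(head) // 4:
        even_zeros = head[0::2].count(0)
        odd_zeros = head[1::2].count(0)
        # Zeros in the second byte of each pair mean little-endian.
        return "utf-16-le" if odd_zeros > even_zeros else "utf-16-be"

    # Discard all values > 127, then search what is left for the marker.
    ascii_only = bytes(b for b in head if b < 128)
    match = MARKER.search(ascii_only)
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def read_text(path):
    """Read the raw bytes once, then decode with the sniffed encoding."""
    with open(path, "rb") as f:
        data = f.read()
    return data.decode(sniff_encoding(data))
```

The sketch trusts whatever name the marker gives, so a stale or misspelled marker would raise an error or produce mis-decoded text when decoding. That is the same up-to-date problem raised above.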