Re: [toolbox] Extended Attributes

On Jul 6, 2005, at 13:42, Maarten Sneep wrote:

> On 5 Jul 2005, at 23:33, Maxwell, Adam R wrote:
>
>
>> I think XeTeX accepts either UTF-16 or UTF-8.
>>
>
> I doubt it; I think it only eats UTF-8. Besides, UTF-16 is very
> recognisable when read as a sequence of bytes (a lot of zeros are
> interspersed in the sequence).

Doubt no longer:
<http://www.tug.org/pipermail/xetex/2004-October/001153.html>.

>> Also, trying to read Latin 1 files in as UTF-8 will cause an error
>> in Cocoa, and you get nothing back.
>>
>
> Well, read it as a sequence of unsigned bytes, discard all values >
> 127, and start searching for the marker that indicates the
> encoding. If you encounter a lot of zeros, you probably have
> UTF-16. Once you've found the real encoding, close the file, and
> reopen with the now known encoding.

I interpreted your earlier suggestion as "read all files as UTF-8,"  
so I was just trying to point out that this will not work in all cases.

Sniffing bytes to interpret a comment is probably possible, although  
messy, but reading the encoding from a comment will still only work  
if the information is up to date (i.e. machine-written when saving).   
For that, I think you'll need some comment block that says "Don't  
edit this," since users are notorious for unknowingly changing  
encodings.
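
For what it's worth, the byte-sniffing idea quoted above could look
roughly like the Python sketch below. It is only an illustration under
stated assumptions: the marker syntax ("% encoding: NAME" in a TeX
comment), the NUL-byte threshold, and the function names sniff_encoding
and decode_text are all made up for this example and are not anything
the thread or any editor has agreed on.

```python
import re

# Assumed marker syntax (hypothetical, not defined in this thread):
# a TeX comment near the top of the file such as
#   % encoding: ISO-8859-1
MARKER = re.compile(rb"%\s*encoding:\s*([A-Za-z0-9._-]+)")


def sniff_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of raw file contents from its first bytes."""
    head = raw[:1024]

    # A UTF-16 byte-order mark settles the question immediately.
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"

    # Lots of NUL bytes in mostly-ASCII text suggests UTF-16 without a
    # BOM. The one-quarter threshold is an arbitrary choice.
    if head and head.count(0) > len(head) // 4:
        odd_nuls = head[1::2].count(0)
        even_nuls = head[0::2].count(0)
        # ASCII in little-endian UTF-16 puts the NUL second in each pair.
        return "utf-16-le" if odd_nuls >= even_nuls else "utf-16-be"

    # Discard every byte above 127 so the search only sees ASCII.
    ascii_only = bytes(b for b in head if b < 128)
    match = MARKER.search(ascii_only)
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_text(raw: bytes) -> str:
    """Decode raw bytes using whatever encoding the sniffer reports."""
    return raw.decode(sniff_encoding(raw))
```

As noted above, this is only as reliable as the marker itself: if a
user re-saves the file in another encoding without updating the
comment, the sniffer will confidently return the wrong answer.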

Adam