Re: [gobo-eiffel-develop] UTF-16LE


>>>>> "Emmanuel" == Emmanuel Stapf [ES] <Emmanuel> writes:

    Emmanuel> Looks strange to me that the BOM is part of the
    Emmanuel> string. In my opinion it should not.

The Unicode Consortium decide these matters, not us.

    Emmanuel> Well, do you want to have a smart `make_from_utf16' that
    Emmanuel> uses the BOM to identify whether it is little endian or
    Emmanuel> big endian?

No, we already have that.

    Emmanuel> You are better off having `make_from_utf16
    Emmanuel> (a_content: STRING; a_bom: STRING)' (assuming your
    Emmanuel> make_from_utf16 takes a string).

No, we are not better off. There are three situations:

1) The byte-string that we receive from wherever is in Unicode
   Encoding Scheme UTF-16. In this case it may start with a BOM, or it
   may not. If the latter, then big-endian is assumed, as per the
   Unicode specification. If the former, then the BOM is used to
   determine the endianness, and then discarded.
   This is creation procedure make_from_utf16.

2) The byte-string that we receive from wherever is in Unicode
   Encoding Scheme UTF-16BE. In this case no BOM is permitted. If the
   first two bytes indicate a zero-width non-breaking space, then that
   will be the first character in the string - it is not treated as a
   BOM and is not discarded.
   This is creation procedure make_from_utf16be.

3) The byte-string that we receive from wherever is in Unicode
   Encoding Scheme UTF-16LE. In this case no BOM is permitted. If the
   first two bytes indicate a zero-width non-breaking space, then that
   will be the first character in the string - it is not treated as a
   BOM and is not discarded.
   This is creation procedure make_from_utf16le.
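To make the three cases concrete, here is a minimal Python sketch of the same decoding rules. The function names `from_utf16`, `from_utf16be` and `from_utf16le` are hypothetical stand-ins for the Eiffel creation procedures. They are not the Gobo implementation. Python's plain `utf-16` codec is avoided on purpose: in CPython, when no BOM is present, it falls back to the platform's native byte order instead of big-endian. The `utf-16-be` and `utf-16-le` codecs, by contrast, never strip a leading U+FEFF, which matches situations 2 and 3.

```python
# Hypothetical Python analogues of make_from_utf16, make_from_utf16be and
# make_from_utf16le; illustrative only, not the Gobo Eiffel code.

BOM_BE = b"\xfe\xff"  # U+FEFF serialized big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialized little-endian


def from_utf16be(data: bytes) -> str:
    """UTF-16BE scheme: no BOM is recognized, so a leading FE FF
    decodes to U+FEFF and stays as the first character."""
    return data.decode("utf-16-be")


def from_utf16le(data: bytes) -> str:
    """UTF-16LE scheme: no BOM is recognized, so a leading FF FE
    decodes to U+FEFF and stays as the first character."""
    return data.decode("utf-16-le")


def from_utf16(data: bytes) -> str:
    """UTF-16 scheme: an optional BOM selects the byte order and is
    discarded; without a BOM, big-endian is assumed."""
    if data[:2] == BOM_BE:
        return from_utf16be(data[2:])
    if data[:2] == BOM_LE:
        return from_utf16le(data[2:])
    return from_utf16be(data)


if __name__ == "__main__":
    print(repr(from_utf16(b"\xff\xfeA\x00")))    # BOM says little-endian: 'A'
    print(repr(from_utf16(b"\x00A")))            # no BOM, big-endian: 'A'
    print(repr(from_utf16be(b"\xfe\xff\x00A")))  # U+FEFF kept: '\ufeffA'
```

The same bytes `FE FF 00 41` therefore yield "A" under the UTF-16 scheme but "\ufeffA" under UTF-16BE. This is why a separate `a_bom` argument is unnecessary: the choice of creation procedure already says whether a BOM may be present.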
-- 
Colin Adams
Preston Lancashire