David Goodger wrote:
> U+FEFF is only interpretable as a byte order mark if it's the
> first character of a file/stream. Within the text, U+FEFF is
> a "zero-width non-breaking space".
U+FEFF's "use as an indication of non-breaking is deprecated; see U+2060
WORD JOINER instead." So we can reasonably assume that it is indeed a
BOM.
And BOMs do occur in the middle of a text, e.g. when BOM-marked files
are concatenated.
<http://www.unicode.org/faq/utf_bom.html#38> suggests that U+FEFF in the
middle of a text be treated as a zero-width space "for backwards
compatibility," with the intention that the character be effectless.
However, since a U+FEFF character can change the semantics of a reST
file significantly, this seems like a bad idea for reST.
The FAQ entry also says: "When designing a markup language or data
protocol, the use of U+FEFF can be restricted to that of Byte Order
Mark. In that case, any U+FEFF occurring in the middle of the file can
be ignored, or treated as an error." So for reST it seems most
appropriate to ignore (i.e. delete) U+FEFF characters.
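A minimal Python sketch of the "delete every U+FEFF" approach, assuming the input arrives as raw bytes in a known encoding. The function name decode_rest_source is hypothetical and is not the Docutils API. The example also illustrates the concatenation case mentioned above. Python's "utf-8-sig" codec strips only a leading BOM, so a BOM that ends up in the middle of the text survives decoding. It has to be removed explicitly.

```python
import codecs


def decode_rest_source(data: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes and delete every U+FEFF.

    U+FEFF is treated purely as a byte order mark, so occurrences at
    the start *and* in the middle of the text are removed.
    """
    text = data.decode(encoding)
    return text.replace("\ufeff", "")


# Simulate two BOM-marked UTF-8 files that were concatenated.
part_a = codecs.BOM_UTF8 + "Title\n=====\n".encode("utf-8")
part_b = codecs.BOM_UTF8 + "Some *text*.\n".encode("utf-8")
combined = part_a + part_b

# "utf-8-sig" removes only the leading BOM; the second one remains.
with_sig = combined.decode("utf-8-sig")
print(with_sig.count("\ufeff"))  # → 1

# Deleting all U+FEFF characters leaves clean reST source.
cleaned = decode_rest_source(combined)
print(cleaned.count("\ufeff"))  # → 0
```

Treating a stray U+FEFF as an error would mean replacing the str.replace call with a check that raises an exception. That is the other option the FAQ allows.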
When replying to my email address, please ensure
that the mail header contains 'Felix Wiemann'.