From: William F. D. <wil...@th...> - 2006-10-03 18:28:15
It was pointed out to me by a user that flexml handles character
references to non-ASCII characters incorrectly. This is a little worse
than we admit in the BUGS section of the man page ("The character set is
merely ASCII...") because your document might contain the pure-ASCII
sequence

  &#x200A;

but the only thing that will show up in {pcdata} is the single byte 0x0A
(that is, a \n character): flexml has thrown away the high byte
altogether.

Here is my question: what *should* flexml do with character references
like this? Possible alternatives:

1) Issue a warning and insert nothing into {pcdata};
2) Issue a warning and insert a dummy character into {pcdata};
3) Put the two bytes (0x20, 0x0A) into {pcdata};
4) Put the UTF-8 encoding of 0x200A into {pcdata};
5) Something else?

What do you think?
--
William F Dowling
wil...@th...
www.scientific.thomson.com