From: William F. D. <wil...@th...> - 2006-10-03 18:28:15
It was pointed out to me by a user that flexml handles character references to non-ASCII characters incorrectly. This is a little worse than we admit in the BUGS section of the man page ("The character set is merely ASCII...") because your document might contain a character reference to 0x200A (a nine-byte ASCII sequence), but the only thing that will show up in {pcdata} is the single byte 0x0A (that is, a \n character): flexml has thrown away the high byte altogether.

Here is my question: what *should* flexml do with character references like this? Possible alternatives:

1) Issue a warning and insert nothing into {pcdata};
2) Issue a warning and insert a dummy character into {pcdata};
3) Put the two bytes (0x20, 0x0A) into {pcdata};
4) Put the UTF-8 encoding of 0x200A into {pcdata};
5) Something else?

What do you think?

-- 
William F Dowling
wil...@th...
www.scientific.thomson.com
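
To make the difference between alternatives 3 and 4 concrete, here is a minimal sketch in C of what each would place into {pcdata} for the code point 0x200A. The function names `raw_two_bytes` and `utf8_encode` are hypothetical and are not part of flexml; this only illustrates the byte sequences involved, not how flexml's generated scanner would actually be changed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of flexml): alternative 3.
   Writes the code point as two raw bytes, high byte first.
   Only meaningful for code points up to 0xFFFF. */
size_t raw_two_bytes(unsigned long cp, unsigned char *out)
{
    out[0] = (unsigned char)((cp >> 8) & 0xFF);
    out[1] = (unsigned char)(cp & 0xFF);
    return 2;
}

/* Hypothetical helper (not part of flexml): alternative 4.
   Encodes cp as UTF-8 into out, which must hold at least 4 bytes.
   Returns the number of bytes written, or 0 if cp is a surrogate
   or lies beyond 0x10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For 0x200A, alternative 3 yields the two bytes 0x20 0x0A (a space followed by a newline), while alternative 4 yields the three bytes 0xE2 0x80 0x8A. The behavior described above corresponds to keeping only the low byte, 0x0A.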