I have taken an XML file (containing numeric character references) and transformed it into another XML file with treebeard, using a stylesheet. Both input files (the XML and the stylesheet) are UTF-8, and the output is UTF-8.
Now when I take that output file and feed it back into treebeard, it complains about invalid UTF-8 sequences. How is that possible? The file was created by treebeard, so it should be completely well-formed. I am not editing the output of the first stage, so I am quite literally passing the data straight back in again.
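
To check independently of treebeard whether the output really is valid UTF-8, a small script along these lines could locate the first offending byte. This is only a minimal sketch: the function name `find_invalid_utf8` and the file name `output.xml` are made up for illustration and are not part of treebeard.

```python
from typing import Optional, Tuple


def find_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for the first invalid
    UTF-8 sequence in data, or None if data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start and err.end delimit the bytes the decoder rejected.
        return err.start, data[err.start:err.end]
    return None
```

It could be used as `find_invalid_utf8(open("output.xml", "rb").read())`. If the reported byte is a lone value in the range 0x80 to 0xFF, that would suggest the character was written in a single-byte encoding such as ISO-8859-1 rather than UTF-8, though that is only one possible explanation.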