Re: [Simple-support] Valid XML characters and String recoding
From: Dawid W. <daw...@gm...> - 2012-09-13 10:06:02
> Are you suggesting there is a bug?
Depends how you look at it. It's a confluence of several things: Java
Strings can contain invalid Unicode (e.g. unpaired surrogates), XML
does not allow every Unicode character to appear in a document, etc. I
think it would be better if the framework maintained, by default, the
invariant of _always_ producing valid XML, and from that point of view
it is a bug. Try to serialize:
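import org.simpleframework.xml.Attribute;
import org.simpleframework.xml.Root;

@Root(name = "abc")
public class Abc {
    // U+0000 is not a legal XML 1.0 character, yet it ends up in the output
    @Attribute
    public String s = "\u0000";
}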
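and you'll see that this produces XML that no conforming parser will
accept: U+0000 is not a legal character anywhere in an XML 1.0
document, not even as a character reference. The output is simply
invalid XML.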
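For reference, XML 1.0 defines the legal characters with its `Char` production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. A minimal sketch of a check against that production follows; the `XmlChars` class is hypothetical, not part of simple-xml. It also catches unpaired surrogates, because `String.codePointAt` returns a lone surrogate as its own value (0xD800-0xDFFF), which falls outside every allowed range.

```java
/**
 * Hypothetical helper (not part of simple-xml): checks a Java String
 * against the XML 1.0 "Char" production.
 */
public final class XmlChars {

    private XmlChars() {}

    /** True if cp matches XML 1.0 Char:
     *  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True if every code point in s is a legal XML 1.0 character.
     *  A lone surrogate is returned by codePointAt as-is and is rejected. */
    public static boolean isValidXml(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidXml("abc"));          // true
        System.out.println(isValidXml("\u0000"));       // false: NUL
        System.out.println(isValidXml("a\uD800b"));     // false: lone high surrogate
        System.out.println(isValidXml("\uD83D\uDE00")); // true: valid pair (U+1F600)
    }
}
```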
> I think for faster processing something could be done in the org.simpleframework.xml.stream.Formatter object
Yes, from a quick look at it, it could also be used for this purpose.
> You could also write your own java.io.Writer object. If these are not suitable then I think your Transform<String> is probably best.
I don't think a custom Writer would be a better solution. The
restriction comes from the XML specification itself, so if simple-xml
wanted to be strictly conformant it should, for example, reject
attribute and element names that contain characters not allowed in
XML names, and remap or reject disallowed characters inside attribute
values and text blocks.
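A minimal sketch of the "remap" option, assuming the caller picks the replacement string (U+FFFD is a common choice). `XmlSanitizer` and `remap` are hypothetical names, not simple-xml API; logic like this could be called from the custom `Transform<String>` approach mentioned above, though this sketch does not depend on simple-xml at all.

```java
/**
 * Hypothetical sanitizer (not part of simple-xml): replaces every code point
 * that is not a legal XML 1.0 character, including unpaired surrogates,
 * with a caller-supplied replacement string.
 */
public final class XmlSanitizer {

    private XmlSanitizer() {}

    /** XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns s itself if it is already valid, otherwise a copy with each
     *  illegal code point replaced by {@code replacement}. */
    public static String remap(String s, String replacement) {
        StringBuilder out = null; // allocated only once an illegal char is seen
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) {
                if (out != null) {
                    out.appendCodePoint(cp);
                }
            } else {
                if (out == null) {
                    // copy the valid prefix seen so far
                    out = new StringBuilder(s.length()).append(s, 0, i);
                }
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out == null ? s : out.toString();
    }

    public static void main(String[] args) {
        System.out.println(remap("a\u0000b", "\uFFFD").equals("a\uFFFDb")); // true
        System.out.println(remap("x\uDC00y", "?"));                        // x?y
    }
}
```

Returning the original instance when nothing needs remapping keeps the common case allocation-free; rejecting instead of remapping would simply throw at the point where `out` is first allocated.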
The attached patch solves the problem for me; I'm just pointing out
the issue in case you want to tackle it in a more general way. A
randomized test case that checks whether it works is also here:
http://goo.gl/GyDWm
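The linked test isn't reproduced here. As a rough, hypothetical stand-in (class and method names are illustrative, not from the patch), the sketch below generates random UTF-16 strings with a fixed seed, drops illegal code points, writes the result into an attribute, and parses it back with the JDK's built-in DOM parser. Tab, LF and CR are escaped as character references because a conforming parser otherwise normalizes them to spaces in attribute values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Random;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical randomized round-trip check (not the linked test). */
public final class RandomXmlRoundTrip {

    /** XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops illegal code points (one possible policy). */
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Escapes markup characters, plus tab/LF/CR so that attribute-value
     *  normalization does not turn them into spaces. */
    static String escapeAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\t': out.append("&#9;");   break;
                case '\n': out.append("&#10;");  break;
                case '\r': out.append("&#13;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Runs random trials; returns how many parsed back to the sanitized value. */
    static int run(int trials, long seed) {
        DocumentBuilder parser;
        try {
            parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Random rnd = new Random(seed);
        int passed = 0;
        for (int t = 0; t < trials; t++) {
            // arbitrary UTF-16 units, including NUL and lone surrogates
            char[] raw = new char[rnd.nextInt(20)];
            for (int i = 0; i < raw.length; i++) raw[i] = (char) rnd.nextInt(0x10000);
            String clean = sanitize(new String(raw));
            String xml = "<abc s=\"" + escapeAttribute(clean) + "\"/>";
            try {
                Document doc = parser.parse(new InputSource(new StringReader(xml)));
                if (doc.getDocumentElement().getAttribute("s").equals(clean)) passed++;
            } catch (SAXException | IOException e) {
                // a parse failure means the output was still not well-formed XML
            }
        }
        return passed;
    }

    public static void main(String[] args) {
        System.out.println(run(1000, 42L) + "/1000 round-tripped");
    }
}
```

Dropping characters is only one policy; a remapping or rejecting variant can be exercised the same way by swapping out `sanitize`.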
Dawid