Thanks for reporting this. Until recently the serialization specs were
more liberal about this kind of thing: the serialization specs have been one
of the last things to stabilize and Saxon hasn't yet caught up in all cases.
If the input is <x><!-- é --></x> then Saxon 6.5.5 behaves the same as
8.7 - it allows Java to replace the é with a substitute character. In your
case the comment preceded the first element, and this then falls foul of some
horrible code for dealing with comments and PIs that are output before the
system knows whether the output is going to be HTML or XML. (Saxon 6.5.5
outputs these internally as a text node using disable-output-escaping, and
the spec says d-o-e can be ignored for a text node containing ...)
I've now documented the bug and fixed the code for both versions, and =
added test cases.
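For anyone wanting to see the substitution outside Saxon, here is a minimal Java sketch of the underlying behaviour. It is not Saxon code: the class and method names are made up for illustration. It relies only on the standard library, where String.getBytes(Charset) replaces unmappable characters with the charset's replacement byte ('?' for US-ASCII) and CharsetEncoder.canEncode reports whether a string is representable.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Saxon.
public class AsciiSubstitution {

    /** Lenient path: unmappable characters are silently replaced by '?'. */
    public static String lenientAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Returns true only if every character is representable in US-ASCII. */
    public static boolean isAsciiSafe(String text) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        String comment = " \u00e9 ";               // the e-acute from the test file
        System.out.println(lenientAscii(comment)); // the e-acute comes out as '?'
        System.out.println(isAsciiSafe(comment));  // false
    }
}
```

A serializer that wants to report the problem rather than hide it would have to make the second kind of check before writing the comment.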
> -----Original Message-----
> From: saxon-help-admin@...
> [mailto:saxon-help-admin@...] On Behalf Of David Carlisle
> Sent: 04 April 2006 14:05
> To: saxon-help@...
> Subject: [saxon] Re: [xsl] I18N / UTF-8 versus US-ASCII
> I just sent this to xsl-list as it's part of a thread there, but since I
> seem to have raised a bug report for both saxon6 and 8 I thought I
> should send it here as well.
> I wrote
> > Of course the other cases where you can not use a restricted encoding
> > are cases where the element or attribute names use non-ascii characters.
> or in comments or processing instructions or CDATA sections.
> An XSL system will just avoid using CDATA sections if it needs to write a
> character reference, but even an "identity" transform will die if there
> is a non-ascii character in a comment in the source and the stylesheet
> has <xsl:output encoding="US-ASCII"/>
> After writing the above I made a small test file to demonstrate this but.....
> I hope this gets through without having the non-ascii character mangled;
> the xml source is supposed to have a latin1-encoded e acute.
> <?xml version="1.0" encoding="iso-8859-1"?>
> <!-- é -->
> <x/>
> and the stylesheet just copies everything:
> <xsl:stylesheet
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
> <xsl:output encoding="US-ASCII"/>
> <xsl:template match="/">
> <xsl:copy-of select="."/>
> </xsl:template>
> </xsl:stylesheet>
> unfortunately I think both saxon 6 and 8 get this wrong, I'll forward
> this to saxon's bug reporting list.
> $ saxon comment.xml comment.xsl
> <?xml version="1.0" encoding="US-ASCII"?><!-- é --><x/>
> saxon6.5.4 seems to have made the comment into text so that it could use a
> character reference for the e-acute.
> $ saxon8 comment.xml comment.xsl
> Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
> <?xml version="1.0" encoding="US-ASCII"?><!-- ? --><x/>
> saxon 8.7J keeps the comment but converts the non printable character to
> a ?, I think that it's supposed to moan with err:SERE0008
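As a rough illustration of the stricter behaviour the quoted message asks for, the Java sketch below refuses to write a comment whose content cannot be represented in the output encoding. This is an assumption about how such a check could look, not Saxon's actual fix: the class name, method name, and exception type are hypothetical, and only the error code SERE0008 comes from the message above. Character references cannot be used inside a comment, so there is no escape route other than reporting the error.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not Saxon's serializer.
public class StrictCommentWriter {

    /**
     * Returns the comment markup, or throws if any character of the
     * content cannot be encoded in the requested output encoding.
     */
    public static String serializeComment(String content, Charset encoding) {
        CharsetEncoder encoder = encoding.newEncoder();
        for (int i = 0; i < content.length(); ) {
            int cp = content.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch)) {
                // No character reference is possible inside a comment.
                throw new IllegalStateException(String.format(
                    "SERE0008: character U+%04X in a comment cannot be represented in %s",
                    cp, encoding.name()));
            }
            i += Character.charCount(cp);
        }
        return "<!--" + content + "-->";
    }

    public static void main(String[] args) {
        // Representable in ISO-8859-1, so this succeeds.
        System.out.println(serializeComment(" ok ", StandardCharsets.ISO_8859_1));
        // Not representable in US-ASCII, so this throws.
        serializeComment(" \u00e9 ", StandardCharsets.US_ASCII);
    }
}
```

With the test file from the message, the US-ASCII call fails on U+00E9 instead of emitting a "?".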