Thanks Mike.


From: Michael Kay [mailto:mike@saxonica.com]
Sent: Wednesday, June 08, 2011 6:41 PM
To: saxon-help@lists.sourceforge.net
Subject: Re: [saxon] XSLT throwing Illegal HTML character exception

On 08/06/2011 12:39, Jayarajan, Divya wrote:
Thanks Mike.
As you suggested, the illegal character has to be removed from the XML before transforming it. Am I correct?
One surprising thing: the xsl:output method in my XSLT was initially "html", and when I changed it to "xml" it worked. Does that mean the change solved the issue, or what behaved differently once we changed it that way?
 
XML allows the character #137; HTML does not. As I explained, this rule is enforced by the HTML serializer, so it is not invoked when you specify xsl:output method="xml". Of course, the character is wrong either way, but HTML will reject it and XML won't.
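A toy model of the behaviour described above, sketched in Python. It assumes the HTML check covers the control characters U+007F to U+009F (the range named in serialization error SERE0014) and that the XML method has no such check; `check_output` and `SerializationError` are hypothetical names, not Saxon's API. It shows why the same parsed text fails with method="html" but passes with method="xml":

```python
import re

# Assumption: the HTML output method rejects the control characters
# U+007F..U+009F; the XML output method performs no such check.
HTML_ILLEGAL = re.compile(r"[\x7f-\x9f]")


class SerializationError(Exception):
    """Hypothetical stand-in for the serializer's error."""


def check_output(text, method):
    """Raise if `text` cannot be written with the given output method."""
    if method == "html":
        m = HTML_ILLEGAL.search(text)
        if m:
            raise SerializationError(
                f"Illegal HTML character: decimal {ord(m.group())}")
    # For method == "xml" nothing is checked here: U+0080..U+009F are
    # legal XML 1.0 characters, so the same text is written out unchanged.
    return None


if __name__ == "__main__":
    parsed = "avis \u00b7\u00b4\u0089"  # &#137; becomes U+0089 once the XML is parsed
    check_output(parsed, "xml")         # no error
    try:
        check_output(parsed, "html")
    except SerializationError as e:
        print(e)  # Illegal HTML character: decimal 137
```

Switching the method only skips the check; the character itself is still in the output.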

Michael Kay
Saxonica
Thanks,
Divya


From: Michael Kay [mailto:mike@saxonica.com]
Sent: Wednesday, June 08, 2011 4:49 PM
To: saxon-help@lists.sourceforge.net
Subject: Re: [saxon] XSLT throwing Illegal HTML character exception

On 08/06/2011 11:51, Jayarajan, Divya wrote:

Your XML source explicitly includes &#137;, which, as the error message says, is not a legal character in HTML. A rather controversial decision in the W3C XSLT 2.0 specification was that the HTML serializer is required to report an error if an attempt is made to output an illegal HTML character; at some time between 8.4 and 8.9, this decision was implemented in Saxon.

The reason for the rule, and the reason your code is failing, is that use of a character such as #137 is nearly always a mistaken attempt to use a Windows CP1252 code point in place of a Unicode code point. 137 in CP1252 is the per-mille sign (‰), which is x2030 in Unicode. Numeric character references in XML should always be Unicode code points, not CP1252 code points. Use of a code such as #137 (which has no meaning in Unicode) is therefore almost certainly incorrect, and the intent of disallowing it in HTML serialization is to enable you to discover and correct the error. It's a bit draconian, I know, and I wasn't in favour of this change to the spec, but I don't allow my own views to get in the way when W3C makes a decision on such a point.
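If you do want to clean the input before transforming it, one possible pre-processing step is to rewrite each reference in the range 128 to 159 as the Unicode code point that CP1252 assigns to that byte. This is a sketch in Python that assumes every such reference really was meant as a CP1252 byte; `repair_cp1252_refs` is a hypothetical helper, not part of Saxon:

```python
import re

# Matches decimal (&#137;) and hexadecimal (&#x89;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def repair_cp1252_refs(xml_text):
    """Rewrite refs in 128..159 as the Unicode code point CP1252 intended.

    Hypothetical pre-processing step. References whose byte is undefined in
    CP1252 (0x81, 0x8D, 0x8F, 0x90, 0x9D) are left unchanged so they still
    surface as errors instead of being silently guessed at.
    """
    def fix(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if not 0x80 <= code <= 0x9F:
            return m.group(0)  # not a C1 code point: keep as-is
        try:
            char = bytes([code]).decode("cp1252")  # e.g. 0x89 -> U+2030
        except UnicodeDecodeError:
            return m.group(0)  # undefined in CP1252: keep as-is
        return f"&#x{ord(char):X};"

    return CHAR_REF.sub(fix, xml_text)


if __name__ == "__main__":
    sample = "CallerName[28]avis &#137;&#128; &#132;&#129;[29]"
    print(repair_cp1252_refs(sample))
    # CallerName[28]avis &#x2030;&#x20AC; &#x201E;&#129;[29]
```

Given the other garbage in the sample, which looks like text possibly mis-converted from some other code page, CP1252 may not be the right guess for this particular data; the mapping only helps where the references really were CP1252 bytes.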

(There appears to be a lot of other garbage in this message that probably arises from incorrect character code conversions somewhere in the history of the data.)

Michael Kay
Saxonica


Hi,
We have recently upgraded to a product that internally uses Saxon 8.9; the earlier version used Saxon 8.4. One of the XSLT files throws the exception 'net.sf.saxon.trans.DynamicError: Illegal HTML character: decimal 137' with Saxon 8.9, but it worked fine with Saxon 8.4.
The xml that is being used is:
<ORIGINALMESSAGE>ST[28]RUSROSLH[29]TranType[28]01[29]CallType[28]2[29]CallerPhone[28]495-363 02 90[29]CallerName[28]avis ·´&#137;´½&#128;» &#132;²°&#131;[29]AssetID[28]S1ATM003[29]CallRef[28]ROAV031703[29]ProblemDesc[28]&#132;&#131;·&#137;»´³&#132;&#139;²º&#131;¿¼&#129;²&#132;°&#131; ¸&#132;°²º&#131; &#132;²&#139;º&#134;·&#137; ´»´&#137; ·&#137;´»³¾»²[29]Text[28]Additional Problem Description &#131;&#139;&#132;²º ·&#137;´&#136;»[29]Priority[28]2[29]StatusBytes[28]D12*000**G0*2*0002000000*2111[29]SE[28]RUSROSLH[29]</ORIGINALMESSAGE>
The XSLT code is :
<xsl:variable name="ORIGINALMESSAGE" select="//ORIGINALMESSAGE"/>
<TRANSACTION_MSG type="CLOB" dir="IN">
        <xsl:text disable-output-escaping="yes">&lt;![CDATA[</xsl:text>
                <xsl:copy-of select="$ORIGINALMESSAGE"/>
        <xsl:text disable-output-escaping="yes">]]&gt;</xsl:text>
</TRANSACTION_MSG>
 
Please let me know what is done wrong here.
Thanks,
Divya
------------------------------------------------------------------------------ EditLive Enterprise is the world's most technically advanced content authoring tool. Experience the power of Track Changes, Inline Image Editing and ensure content is compliant with Accessibility Checking. http://p.sf.net/sfu/ephox-dev2dev
_______________________________________________ saxon-help mailing list archived at http://saxon.markmail.org/ saxon-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/saxon-help