Michael,


Thanks for clarifying some misconceptions of mine:


(2) only ‘<’ and ‘&’ must be escaped in text nodes for an XML parser to parse the file correctly (the one exception being ‘>’ when it appears in the sequence ‘]]>’);
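For example, in the fragment below (element names borrowed from the test input quoted further down), the two e1 elements have identical text content once parsed. Only < and & had to be escaped in the first one:

```xml
<test>
  <!-- Only < and & are escaped; ', " and > appear literally -->
  <e1> ' &lt; > &amp; " </e1>
  <!-- The same text, with all five characters written as entity references -->
  <e1> &apos; &lt; &gt; &amp; &quot; </e1>
</test>
```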


(3) XML entity references in the string attribute of a mapping element are expanded by the parser before the string is inserted in the hash table, so the replacement string used during translation is not the one I thought was in use.
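To make point (3) concrete, here is a fragment modelled on the mycharmap entry in the stylesheet quoted below. It belongs inside an xsl:character-map, and the second line is a hypothetical variant added for comparison. The parser expands &lt; and &#60; alike, so both lines hand Saxon the same one-character replacement string "<", which is then written to the output unescaped:

```xml
<!-- Use either line, not both: after XML parsing they are identical.
     The string value is the single character "<", so "<" is mapped to
     itself and written to the output without escaping. -->
<xsl:output-character character="&#60;" string="&lt;"/>
<xsl:output-character character="&#60;" string="&#60;"/>
```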


Wayne Johnson


-----Original Message-----
From: saxon-help-admin@lists.sourceforge.net [mailto:saxon-help-admin@lists.sourceforge.net] On Behalf Of Michael Kay
Sent: Sunday, March 28, 2004 11:33 AM
To: saxon-help@lists.sourceforge.net
Subject: RE: [saxon] expansion of XML metacharacters


Firstly, they aren't called metacharacters - they are called entity references.


Secondly, you don't say why it's essential that they are preserved. Any XML parser treats " and &quot; as equivalent. Does this mean that you are reading the XML with something other than an XML parser? If so, why?


Thirdly, if you want to control this using character-map, you can do so. If you specify the target string as " (which is written &quot;) then it will be output as ". If you specify the target string as &quot; (which is written &amp;quot;) then it will be output as &quot;.
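Applied to the five characters in question, this gives something like the stylesheet below. It is a sketch that has not been run against Saxon 7.9.1: the map name escape-all is made up, and the identity template is copied from the stylesheet in the original question. Each string value is written with &amp; so that, after parsing, it holds the literal text of the entity reference, which the serializer then outputs verbatim:

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8" use-character-maps="escape-all"/>

  <!-- Hypothetical map name. Each string is output as-is, without further
       escaping, so "&amp;quot;" in the stylesheet source becomes &quot;
       in the result. -->
  <xsl:character-map name="escape-all">
    <xsl:output-character character="&lt;"   string="&amp;lt;"/>
    <xsl:output-character character="&gt;"   string="&amp;gt;"/>
    <xsl:output-character character="&amp;"  string="&amp;amp;"/>
    <xsl:output-character character="&quot;" string="&amp;quot;"/>
    <xsl:output-character character="&apos;" string="&amp;apos;"/>
  </xsl:character-map>

  <!-- Identity transform, as in the original stylesheet -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

A downstream XML parser will read this output exactly as it reads the unmapped output, so the map only makes a difference to a consumer that is not an XML parser.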


Michael Kay



From: saxon-help-admin@lists.sourceforge.net [mailto:saxon-help-admin@lists.sourceforge.net] On Behalf Of Johnson, Wayne
Sent: 28 March 2004 04:55
To: saxon-help@lists.sourceforge.net
Subject: [saxon] expansion of XML metacharacters

My application performs a series of XML-to-XML transformations. It is essential that the &quot; and &apos; metacharacters be preserved after each transformation. However, I find that while the <, > and & characters are written as entity references, Saxon (and Xalan) write the ' and " characters literally in the output serialization. If character maps (or entity property files) are used, the mappings have the effect of turning off the conversion to entity references for the XML metacharacters. For example, if I create a mapping from &#60; to &lt;, the output element contains the < character, not the &lt; I expect. (The mappings work as advertised for characters that are not XML metacharacters.)


What is the reason for this behavior, and what is the simplest way to ensure that all five XML metacharacters are written as entity references?


Wayne Johnson


Saxon 7.9.1


input file

<test>
<e1> &apos; &lt; &gt; &amp; &quot; </e1>
<e2> &#65; </e2>
</test>


output file

<?xml version="1.0" encoding="UTF-8"?>

<test>

   <e1> ' < &gt; &amp; " </e1>

   <e2> AAA </e2>

</test>


stylesheet

<xsl:stylesheet version="2.0"

  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output

    encoding='UTF-8'

    method='xml'

    indent="yes"

    use-character-maps="mycharmap"/>

  <xsl:strip-space elements="*"/>

 

  <xsl:character-map name="mycharmap">

<!--    <xsl:output-character character="&#34;" string="&quot;"/>-->

<!--    <xsl:output-character character="&#38;" string="&amp;"/>-->

<!--    <xsl:output-character character="&#39;" string="&apos;"/>-->

    <xsl:output-character character="&#60;" string="&lt;"/>

<!--    <xsl:output-character character="&#62;" string="&gt;"/>-->

    <xsl:output-character character="&#65;" string="AAA"/>

  </xsl:character-map>

 

  <xsl:template match="node() | @*">

    <xsl:copy>

      <xsl:apply-templates select="node() | @*"/>

    </xsl:copy>

  </xsl:template>

 

</xsl:stylesheet>