We do not currently let the user control the escaping of
markup, whitespace, and unencodable characters in character
data and attribute values. At present we apply the following
escapes, which produce conformant output but not necessarily
the output the user wants:
|   in XML character data     |   in HTML character data [1]      |
|-----------------------------+-----------------------------------|
| <     ==> &lt;              | <     ==> &lt;                    |
| >     ==> &gt;              | >     ==> &gt;                    |
| &     ==> &amp;             | &     ==> &amp;                   |
| (CR)  ==> &#13;             | (CR)  ==> &#13;                   |
| unencodable ==> decimal NCR | unencodable ==> HTML 4.01 entity, |
|                             |     if available, or              |
|                             |     decimal NCR                   |
|=================================================================|
|   in XML attribute values   |   in HTML attribute values        |
|-----------------------------+-----------------------------------|
| <     ==> &lt;              |                                   |
| &     ==> &amp;             | &     ==> &amp;                   |
| (TAB) ==> &#9;              | (TAB) ==> &#9;                    |
| (LF)  ==> &#10;             | (LF)  ==> &#10;                   |
| (CR)  ==> &#13;             | (CR)  ==> &#13;                   |
| "     ==> &quot; [2]        | "     ==> &quot; [2]              |
| '     ==> &apos; [2]        | '     ==> &#39; [2]               |
| unencodable ==> decimal NCR | unencodable ==> HTML 4.01 entity, |
|                             |     if available, or              |
|                             |     decimal NCR                   |
|=================================================================|
[1] except in SCRIPT or STYLE elements, or when
output escaping has been explicitly disabled, as
in XSLT's disable-output-escaping="yes"
[2] only when this kind of quote is the attribute value
delimiter
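
As an illustration only, the XML columns of the table above can be
sketched in Python as follows. The function names, the `encoding`
parameter, and the `delimiter` parameter are assumptions made for this
sketch, not the names used in the actual code; "unencodable" is taken
to mean "cannot be represented in the output encoding".

```python
# Hypothetical sketch of the XML escaping rules in the table above.
XML_TEXT = {"<": "&lt;", ">": "&gt;", "&": "&amp;", "\r": "&#13;"}
XML_ATTR = {"<": "&lt;", "&": "&amp;",
            "\t": "&#9;", "\n": "&#10;", "\r": "&#13;"}


def _escape(text, table, encoding):
    """Apply the fixed escapes in `table`; write any character that the
    output encoding cannot represent as a decimal NCR."""
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
            continue
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            out.append("&#%d;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def escape_xml_text(text, encoding="us-ascii"):
    """Escape XML character data."""
    return _escape(text, XML_TEXT, encoding)


def escape_xml_attr(value, encoding="us-ascii", delimiter='"'):
    """Escape and quote an XML attribute value. Only the quote
    character used as the delimiter is escaped (footnote [2])."""
    table = dict(XML_ATTR)
    table[delimiter] = {'"': "&quot;", "'": "&apos;"}[delimiter]
    return delimiter + _escape(value, table, encoding) + delimiter
```

For example, `escape_xml_text("a < b")` gives `a &lt; b`, and ">" is
left alone inside attribute values, as in the table.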
We should let the user specify the following:
- which attribute value delimiter is used (' or "); currently
  it is the double quote, to conform with DOM L3 Load & Save;
- whether NCRs are decimal or hex;
- whether unencodable HTML characters for which entity
  references are available are written using NCRs instead;
- whether HTML 4.01 or HTML 3.2 (Latin-1 range only) entities
  are used (this is a legacy browser compatibility issue);
- whether ">" is escaped at all.
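
One possible shape for these options, sketched in Python. Everything
named here (`EscapeOptions`, its fields, `escape_text`, `escape_attr`)
is hypothetical and only illustrates how the proposed settings might
interact; the defaults mirror the current behavior described above.
The sketch approximates "HTML 3.2 entities" as HTML 4.01 entity names
restricted to the Latin-1 range, and leaves the whitespace references
(&#9;, &#10;, &#13;) decimal regardless of the NCR setting.

```python
from dataclasses import dataclass
from html.entities import codepoint2name  # HTML 4.01 entity names


@dataclass(frozen=True)
class EscapeOptions:
    attr_delimiter: str = '"'      # ' or "
    hex_ncrs: bool = False         # &#xE9; instead of &#233;
    prefer_ncrs: bool = False      # HTML: NCR even if an entity exists
    html32_entities: bool = False  # HTML: Latin-1 range entities only
    escape_gt: bool = True         # whether ">" is escaped at all


def char_ref(ch, opts, html=False):
    """Reference for one unencodable character."""
    cp = ord(ch)
    if html and not opts.prefer_ncrs:
        name = codepoint2name.get(cp)
        if name is not None and (cp <= 0xFF or not opts.html32_entities):
            return "&%s;" % name
    return ("&#x%X;" if opts.hex_ncrs else "&#%d;") % cp


def _escape(text, table, opts, encoding, html):
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
            continue
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            out.append(char_ref(ch, opts, html))
        else:
            out.append(ch)
    return "".join(out)


def escape_text(text, opts, encoding="us-ascii", html=False):
    """Escape character data according to `opts`."""
    table = {"<": "&lt;", "&": "&amp;", "\r": "&#13;"}
    if opts.escape_gt:
        table[">"] = "&gt;"
    return _escape(text, table, opts, encoding, html)


def escape_attr(value, opts, encoding="us-ascii", html=False):
    """Escape and quote an attribute value according to `opts`."""
    table = {"&": "&amp;", "\t": "&#9;", "\n": "&#10;", "\r": "&#13;"}
    if not html:
        table["<"] = "&lt;"
    if opts.attr_delimiter == '"':
        table['"'] = "&quot;"
    else:
        table["'"] = "&#39;" if html else "&apos;"
    q = opts.attr_delimiter
    return q + _escape(value, table, opts, encoding, html) + q
```

With the defaults, `escape_text("a > b", EscapeOptions())` gives
`a &gt; b`; passing `EscapeOptions(escape_gt=False)` leaves the ">"
untouched.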