#78 non-BMP chars in text output method

Milestone: v6.5.2
Status: closed
Priority: 5
Updated: 2012-10-08
Created: 2002-04-28
Creator: Michael Kay
Private: No

Saxon cannot output non-BMP characters (code points
above 65535, which UTF-16 represents as surrogate
pairs) with the text output method. With the XML or
HTML output methods they are written as numeric
character references, but with the text output method
Saxon needs to encode each one as a surrogate pair so
that Java can output it as UTF-8 or UTF-16.
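
The surrogate-pair encoding described above can be sketched in Java as follows. This is only an illustration of the arithmetic, not the code in Saxon's TextEmitter or UnicodeCharacterSet modules; the class name SurrogatePairSketch and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; the class and method names are hypothetical
// and this is not Saxon's actual implementation.
final class SurrogatePairSketch {

    // Split a supplementary code point (above 0xFFFF) into a UTF-16
    // high/low surrogate pair.
    static char[] toSurrogatePair(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException(
                "not a supplementary code point: " + codePoint);
        }
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low = (char) (0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    // Let Java's own encoder turn the surrogate pair into UTF-8 bytes.
    static byte[] toUtf8(int codePoint) {
        return new String(toSurrogatePair(codePoint))
            .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+10300 is the character used in the example stylesheet.
        char[] pair = toSurrogatePair(0x10300);
        System.out.printf("surrogates: %04X %04X%n",
            (int) pair[0], (int) pair[1]);
        StringBuilder hex = new StringBuilder();
        for (byte b : toUtf8(0x10300)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println("utf-8: " + hex.toString().trim());
    }
}
```

The standard library call Character.toChars(int) performs the same split, so a real serializer would probably use that rather than hand-written arithmetic.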

Reported by James Clark on 28 April 2002.

Example:

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text" encoding="utf-8"/>

<xsl:template match="/">
<xsl:text>&#x10300;</xsl:text>
</xsl:template>

</xsl:stylesheet>

Applies to 6.5.2, 7.0, and all previous releases.

Test case added, bug89.

Discussion

  • Michael Kay

    Michael Kay - 2002-04-28

    Source fixed in 7.x branch (modules TextEmitter and
    UnicodeCharacterSet).

     
  • Michael Kay

    Michael Kay - 2002-04-30

    Fixed in 7.1; source code fixed in both branches.