From: Mike Brown <mike@sk...> - 2001-08-28 17:56:10
Ser Siew Keok wrote:
> Does anyone know about generating .html or .fo from Asian
> Language XML and XSL thro Saxon ? Any problem with Saxon when deal with
> double byte character ?
According to http://saxon.sourceforge.net/saxon6.4.3/conformance.html
The following is the list of encodings recognized by the built-in
AElfred parser (case-insensitive):
ISO-8859-1, 8859_1, ISO8859_1
ISO-10646-UCS-2, UTF-16, UTF-16BE, UTF-16LE
So you will need to feed Saxon its input using a different XML parser
such as Apache Xerces if your sources are not iso-8859-1, us-ascii,
utf-8, ucs-2 or utf-16.
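
A minimal sketch of the "different parser" route, assuming you drive the transformation through the standard JAXP interfaces and that JAXP resolves to the parser you want (Xerces, for example) and to Saxon as the transformer. The class name ExternalParserDemo and the sample document are made up for illustration, and the Shift_JIS input depends on your Java VM supporting that charset, so the sketch falls back to UTF-8 when it does not.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ExternalParserDemo {
    // Parse the bytes with whatever SAX parser JAXP selects, then pass the
    // resulting SAX events to the transformer instead of letting the
    // transformer pick its own built-in parser.
    static String transform(byte[] xmlBytes) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // XSLT requires namespace-aware parsing
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Identity transform (no stylesheet) keeps the sketch short.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        InputSource input = new InputSource(new ByteArrayInputStream(xmlBytes));
        t.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Shift_JIS is available in this JVM; otherwise use UTF-8.
        String enc = Charset.isSupported("Shift_JIS") ? "Shift_JIS" : "UTF-8";
        String xml = "<?xml version=\"1.0\" encoding=\"" + enc + "\"?>"
                   + "<doc>\u65E5\u672C\u8A9E</doc>";
        System.out.println(transform(xml.getBytes(enc)));
    }
}
```

The parser decodes the source bytes according to the XML declaration, so the transformer only ever sees Unicode characters.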
That page also says...
The encodings available on output are the intersection of:
ascii, us-ascii, utf-8, utf8, iso-8859-1, iso-8859-2
koi8-r, cp1250, windows-1250, cp1251, windows-1251
(again case-insensitive)
with whatever your Java VM supports.
So if you need to produce, for example, Shift-JIS, it should be possible
as long as your Java VM supports it. (That is, I think you just put the
Java name for the encoding in the encoding="..." attribute of xsl:output.)
I got this information from the documentation, not from actual practice.
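
To make the xsl:output suggestion concrete, here is a rough sketch that builds a small stylesheet with the requested encoding and serializes the result to bytes. It goes through the generic JAXP TransformerFactory, so which processor actually runs depends on your classpath. The class name OutputEncodingDemo is invented for the example, and the use of Shift_JIS in main assumes your Java VM supports it.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OutputEncodingDemo {
    // Copy the input document to the output, serialized in the encoding
    // named in the stylesheet's xsl:output element.
    static byte[] transform(String xml, String encoding) throws Exception {
        String xsl =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' encoding='" + encoding + "'/>"
          + "<xsl:template match='/'><xsl:copy-of select='.'/></xsl:template>"
          + "</xsl:stylesheet>";
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: Shift_JIS is available in this JVM; otherwise use UTF-8.
        String enc = Charset.isSupported("Shift_JIS") ? "Shift_JIS" : "UTF-8";
        byte[] bytes = transform("<doc>\u65E5\u672C\u8A9E</doc>", enc);
        System.out.println(enc + ": " + bytes.length + " bytes");
    }
}
```

Writing to a byte stream rather than a Writer matters here: with a Writer, the caller's own character encoding decides the bytes, not xsl:output.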
mike j. brown, fourthought.com | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | personal: http://hyperreal.org/~mike/
From: Ito Kazumitsu <kaz@ma...> - 2001-08-28 22:31:16
>>>>> ":" == Mike Brown <mike@...> writes:
:> According to http://saxon.sourceforge.net/saxon6.4.3/conformance.html
:> The encodings available on output are the intersection of:
:> ascii, us-ascii, utf-8, utf8, iso-8859-1, iso-8859-2
:> koi8-r, cp1250, windows-1250, cp1251, windows-1251
:> (again case-insensitive)
:> with whatever your Java VM supports.
Also see the document extensibility.html which says:
| Adding an output encoding
| If you want to use an output encoding that is not directly supported by Saxon
| (for a list of encodings that are supported, see conformance.html) you can do
| this by writing a Java class that implements the interface
| com.icl.saxon.output.PluggableCharacterSet. You need to supply two methods:
| inCharSet() which tests whether a particular Unicode character is present in
| the character set, and getEncodingName() which returns the name given to the
| encoding by your Java VM. The encoding must be supported by the Java VM. To use
| this encoding, specify the fully-qualified class name as the value of the
| encoding attribute in xsl:output.
| Alternatively, it is possible to specify the CharacterSet class to be used for
| a named output encoding by setting the system property, e.g.
| -D"encoding.EUC-JP"="EUC_JP"; the value of the property should be the name of a
| class that implements the PluggableCharacterSet interface. This indicates the
| class to be used when the xsl:output element specifies encoding="EUC-JP".
For what happens when your encoding is not directly supported
by Saxon, see com/icl/saxon/charcode/CharacterSetFactory.java.
From: Michael Kay <mhkay@ic...> - 2001-08-29 15:47:25
> Does anyone know about generating .html or .fo from Asian
> Language XML and XSL thro Saxon ? Any problem with Saxon when deal with
> double byte character ?
XML and XSLT, and therefore Saxon, define all processing in terms of the
Unicode character set.
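
A small illustration of that point: "double byte" is a property of a particular encoding, not of the characters themselves. The class and method names below are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UnicodeVsBytes {
    // Number of bytes the text occupies in a given encoding.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E"; // three Unicode characters
        // The character count is the same whatever the encoding on disk.
        System.out.println("characters: " + text.codePointCount(0, text.length()));
        // The byte count depends entirely on the encoding chosen.
        System.out.println("UTF-8 bytes: " + byteLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE bytes: " + byteLength(text, StandardCharsets.UTF_16BE));
    }
}
```

The stylesheet only ever works with the three characters; byte counts matter only when the parser reads the source and when the serializer writes the result.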