Please excuse some very newbie questions.
1> You mentioned "whether you are running Saxon from the command line or
from within an application."
I am using Instant Saxon. What I did was copy Saxon.exe into my working
folder and run the command line "C:\JavaProjects>Saxon sample.xml".
I am not sure how to set the Xerces parser to override the Aelfred parser.
Do I need to set the classpath in the Java batch file? And do I need to
copy xerces.jar into the java/lib folder?
-----my java batch file---
2> What does "within an application" (vs. the command line) mean? Any
example?
3> You wrote "So if you wanted, you could have Saxon use Xerces from
the command line."
What would the command line look like?
4> You wrote "let Java do the translation of the bytes in the file to
Unicode characters via a Reader."
Is this difficult to do, or does it require special skill? Or is this
method not recommended compared to the method above of using the Xerces
parser?
Thank you very much
From: Mike Brown [mailto:mike@...]
Sent: Monday, September 03, 2001 3:44 PM
To: Ser Siew Keok
Subject: Re: [saxon] Does Saxon support Encoding character set for
different languages
Ser Siew Keok wrote:
> Does Saxon come with XML parser that support Asian Language ?
To the extent that any language is supported by Unicode, any XML parser
will "support" it -- in the sense that it will support the UTF-8 and
UTF-16 character encodings. These encodings map any Unicode character to
specific byte sequences, and vice-versa. All XML parsers are required to
support both UTF-8 and UTF-16.
Big5 is another encoding, different from UTF-8 and UTF-16. Not all parsers
support it.
> You mention about own XML parser, is it need to write ourself ?
No, I just mean you can use one that is separate from the parser that
comes with Saxon.
> Without encoding added into the XML, can Saxon process correct FO file
> chinese XML with XSL from the localized operating system ?
No, I think you misunderstand. The encoding declaration in the XML (the
part that says encoding="big5") exists to help the parser know what the
encoding actually is. It does not change the encoding.
Your XML file is a big mess of bits & bytes. These bits and bytes mean
something -- they correspond to Unicode characters. The map of bytes to
characters is the encoding.
For example, in the iso-8859-1 encoding, an inverted exclamation mark is
the single byte (0xA1). In utf-8, it is the pair of bytes (0xC2 0xA1).
Big5 probably doesn't even allow for an inverted exclamation mark.
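To see those byte values for yourself, here is a minimal Java sketch. The
class name EncodingBytes and the helper hex() are made up for this
illustration; only the standard JDK is used.

```java
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Render a byte array as space-separated hex, e.g. "0xC2 0xA1".
    // (hex is a hypothetical helper written for this example.)
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00A1 INVERTED EXCLAMATION MARK: one character, two byte forms.
        String invertedExclamation = "\u00A1";
        System.out.println("iso-8859-1: "
                + hex(invertedExclamation.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println("utf-8:      "
                + hex(invertedExclamation.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The character is the same in both cases; only the bytes used to store it
differ.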
So you give this mess of bytes to an XML parser. The parser examines the
beginning of the file, and following a set of rules laid out in the XML
specification, it decides how to map those bits and bytes to a string of
Unicode characters. The encoding="big5" in the XML helps with this
decision.
Now the parser knows what the characters are. It reads the string and
reports to the application about the elements, attributes, comments,
processing instructions, and character data contained in the document. The
application, in this case, is the XSLT processor, Saxon. If you give the
XML document to Saxon on the command line, Saxon gives it to its own
parser, Aelfred. Aelfred does not know about Big5.
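As a rough illustration of a parser using the encoding declaration to
turn bytes into characters, here is a small Java sketch. It assumes the
JAXP parser bundled with the JDK rather than Aelfred, and it uses
iso-8859-1 instead of Big5 so that it runs anywhere; the class name
DeclaredEncoding and the method rootText() are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclaredEncoding {
    // Give the parser raw bytes and return the text of the root element.
    // The parser itself decides how to decode the bytes.
    static String rootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
                + "<greeting>\u00A1Hola!</greeting>";
        // Store the document as iso-8859-1 bytes, matching its declaration.
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        String text = rootText(bytes);
        // The single byte 0xA1 should come back as the character U+00A1.
        System.out.println(text.equals("\u00A1Hola!"));
    }
}
```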
You did not say whether you are running Saxon from the command line or
from within an application. The documentation explains how to set the
parser that Saxon uses. Apache Xerces is an XML parser that does support
Big5, so if you wanted, you could have Saxon use Xerces from the command
line. If you are calling Saxon from within a Java program, you do not
have to use a different parser if you don't want to: if you know that the
document you are parsing will always be Big5 encoded, you can let Java do
the translation of the bytes in the file to Unicode characters via a
Reader.
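The Reader approach might look roughly like the sketch below. It is only
an illustration under some assumptions: that your Java runtime includes
the Big5 charset, and that the JAXP parser bundled with the JDK stands in
for whatever parser your program uses. The class name Big5Reader and the
method rootText() are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Big5Reader {
    // Decode the bytes as Big5 ourselves, then hand the parser characters
    // instead of bytes. A SAX parser given a character stream reads it
    // directly and disregards the encoding declaration in the document.
    static String rootText(byte[] big5Bytes) throws Exception {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(big5Bytes), Charset.forName("Big5"));
        InputSource source = new InputSource(reader);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Two Chinese characters (U+4E2D U+6587) stored as Big5 bytes.
        String xml = "<?xml version=\"1.0\" encoding=\"big5\"?>"
                + "<doc>\u4E2D\u6587</doc>";
        byte[] bytes = xml.getBytes(Charset.forName("Big5"));
        System.out.println(rootText(bytes).equals("\u4E2D\u6587"));
    }
}
```

In a real program the bytes would come from a FileInputStream rather than
a byte array, and the same InputSource could be wrapped in a
javax.xml.transform.sax.SAXSource to feed a transformation. The parser
then never needs to understand Big5 itself.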
mike j. brown, fourthought.com | xml/xslt: http://skew.org/xml/
denver/boulder, colorado, usa | personal: http://hyperreal.org/~mike/