Thank you for the suggestions, Michael. I will give 9.1 a try.
Michael Kay wrote:
> The default XML parser in Saxon 6.5.5 is a version of AElfred - this dates
> back to the days when there was no XML parser included in the JDK. It
> wouldn't surprise me in the slightest to find that AElfred ignores anything
> in the HTTP header - though that's conjecture, it's years since I looked at
> it. I would suggest using a more up-to-date parser such as Xerces. You can
> configure the parser to be used either from the API (via JAXP mechanisms) or
> from the command line (-x option).
> Alternatively, switch to Saxon 9.1. Even though it's an XSLT 2.0 processor,
> it's probably at least as conformant to the XSLT 1.0 spec as 6.5.5 ever was.
> Saxon 9.1 by default uses the XML parser in the JDK - again, though, you'll
> probably get better conformance at the XML level by using the Apache version
> of Xerces.
> Michael Kay
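For the archives, here is a minimal sketch of the JAXP route Michael mentions: pick the SAX parser explicitly and hand it to the transformer through a SAXSource, rather than letting the processor choose its default. The class name ExplicitParser and the helper copyWith are made up for illustration, and the identity transform stands in for a real stylesheet. It uses only the parser and transformer that JAXP finds on the classpath, so it is not specific to any Saxon release.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class ExplicitParser {
    // Hypothetical helper: parse `input` with an explicitly chosen SAX parser
    // and serialize it back to a string with an identity transform.
    // parserFactoryClass == null means "whatever JAXP picks by default".
    static String copyWith(String parserFactoryClass, InputSource input) throws Exception {
        SAXParserFactory spf = (parserFactoryClass == null)
                ? SAXParserFactory.newInstance()
                : SAXParserFactory.newInstance(parserFactoryClass, null);
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // newTransformer() with no stylesheet argument is an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        // The SAXSource ties the chosen XMLReader to the input document.
        identity.transform(new SAXSource(reader, input), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        InputSource in = new InputSource(new StringReader("<doc>x</doc>"));
        System.out.println(copyWith(null, in));
    }
}
```

Passing "org.apache.xerces.jaxp.SAXParserFactoryImpl" as the first argument should select Apache Xerces, assuming its jar is on the classpath. On the command line, as far as I can tell, the -x option takes the class name of a SAX parser (org.apache.xerces.parsers.SAXParser for Xerces).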
>> -----Original Message-----
>> From: Chuck Bearden [mailto:cbearden@...]
>> Sent: 24 March 2009 21:10
>> To: saxon-help@...
>> Subject: [saxon] Question about charset declaration in HTTP header
>> From Appendix F.2 of the XML rec and sections 3.1 and
>> 3.2 of RFC 3023, it looks to me as if an XML processor
>> should prefer the charset parameter of the Content-Type
>> header in HTTP over the encoding declaration in the XML
>> prolog of an XML instance served up as application/xml, and
>> that it should ignore the encoding declaration altogether
>> when it is served up as text/xml.
>> In fact, I just learned that yesterday when I reported
>> what I thought was a bug in libxml2/libxslt.
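To make the precedence rule above concrete, here is a rough sketch of how I read RFC 3023 sections 3.1 and 3.2. The class Rfc3023Charset and the method externalCharset are hypothetical names, the header parsing is deliberately naive, and other media types are simply not handled. It is only an illustration of the rule as described, not a description of what any particular parser does.

```java
import java.util.Locale;

class Rfc3023Charset {
    // Returns the charset dictated by the HTTP Content-Type header, or null
    // when the parser should fall back to the BOM / encoding declaration
    // (the autodetection described in Appendix F of the XML rec).
    static String externalCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        String mime = parts[0].trim().toLowerCase(Locale.ROOT);

        // Naive parameter scan: look for charset=..., strip optional quotes.
        String charset = null;
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                charset = p.substring("charset=".length()).trim().replace("\"", "");
            }
        }

        if (mime.equals("text/xml")) {
            // Section 3.1: charset is authoritative; if omitted, the default
            // is us-ascii and the encoding declaration is ignored.
            return charset != null ? charset : "us-ascii";
        }
        if (mime.equals("application/xml")) {
            // Section 3.2: charset is authoritative when present; otherwise
            // the XML rec's own rules apply.
            return charset;
        }
        return null; // other media types are outside this sketch
    }

    public static void main(String[] args) {
        System.out.println(externalCharset("text/xml; charset=ISO-8859-15"));
        System.out.println(externalCharset("text/xml"));
        System.out.println(externalCharset("application/xml"));
    }
}
```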
>> In the course of trying to understand the behavior, I noticed
>> that Saxon 6.5.5 appears to ignore Content-type/charset. I
>> created a stylesheet that, when run with itself as source
>> doc, retrieves a UTF-8-encoded XML file with bytes that refer
>> to different Unicode codepoints in Latin-9 and UTF-8, in
>> permutations of charset (ISO-8859-15 or UTF-8) and MIME type
>> (application/xml, text/xml, or text/html). The XML prolog of
>> the file has no encoding declaration. With Saxon, all six
>> permutations have identical UTF-8-encoded strings. With
>> xsltproc, the Latin-9 versions of application/xml and
>> text/xml are different from the other permutations, which
>> seems to be in accordance with the above standards.
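The effect being tested above can be reproduced without a server. This small sketch decodes the same two bytes once as UTF-8 and once as ISO-8859-15 (Latin-9). The byte pair C3 A9 is my own choice of example, not necessarily what the test file contains, and the class name is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SameBytesTwoCharsets {
    // Decode raw bytes under the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Print each character of s as a U+XXXX code point.
    static void dump(String label, String s) {
        StringBuilder sb = new StringBuilder(label);
        s.codePoints().forEach(cp -> sb.append(String.format(" U+%04X", cp)));
        System.out.println(sb);
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of U+00E9 (e with acute accent).
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};

        // Read as UTF-8, the two bytes form a single character.
        dump("UTF-8:      ", decode(bytes, StandardCharsets.UTF_8));

        // Read as Latin-9, each byte is a character of its own.
        dump("ISO-8859-15:", decode(bytes, Charset.forName("ISO-8859-15")));
    }
}
```

A processor that honors charset=ISO-8859-15 would see two characters here, while one that ignores the header and assumes UTF-8 would see one, which is the kind of difference the six permutations are meant to expose.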
>> I'm not trying to complain or suggest that Saxon should
>> behave like xsltproc/libxslt in this regard--I'm just trying
>> to understand the standards and the decisions that developers
>> make with respect to them a little better.
>> Is Saxon's apparent divergence from the XML/RFC 3023 behavior
>> Are there differences of opinion among the XML cognoscenti
>> about what the right thing to do is in this connection? Is
>> this a question that would be better asked in some form on
>> xml-dev :-) ?
>> <http://www.ietf.org/rfc/rfc3023.txt>
>> <http://mail.gnome.org/archives/xml/2009-March/msg00040.html>
>> <http://cfbdev.cnx.rice.edu/~cbearden/encodings/saxon.xsl>
>> Chuck Bearden (cbearden@... ; 713.348.3661)
>> XML Engineer, Connexions http://cnx.org/