Thanks for your reply.
> The error is caused by the XML Parser not by Saxon.
You will note that I phrased this very carefully,
"... XML parser used by Saxon has been led to believe it is a UTF-8 file ...",
as I have often seen correspondents blame issues with the XML parser on Saxon, and just as often I've seen them corrected.
> For instance, you shouldn't receive that error when your XML starts with:
> <?xml version="1.0" encoding="windows-1252" ?>
That's one approach I've investigated. I did create a second Java class which I named "FetchDataAsXMLWindowsEncode".
OracleXMLQuery qry = new OracleXMLQuery(conn, sqt.getText());
The second line ensures that the file is encoded as "windows-1252" and that the encoding declaration is placed in the prolog.
When I compiled this version and changed the code in my ant build.xml file to call this function, the whole process went smoothly.
This is all part of an architecture to produce reports in MS Excel format from an Oracle database. My first version used an Oracle PL/SQL package to generate the XML. There doesn't appear to be a programmatic means of setting the encoding declaration in the prolog with the PL/SQL package. That would mean I'd have to manipulate it on a text level (not very appealing, but certainly doable).
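For what it's worth, the text-level manipulation could be as small as the sketch below. It is only an illustration: the class name PrologRewriter, the method withEncoding, and the ROWSET sample element are hypothetical, and it assumes an XML 1.0 document whose declaration, if present, sits at the very start of the text. It replaces an existing XML declaration or prepends one if there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Saxon or the Oracle packages.
class PrologRewriter {
    // Matches an XML declaration at the very start of the document text.
    private static final Pattern XML_DECL =
            Pattern.compile("^<\\?xml\\s.*?\\?>", Pattern.DOTALL);

    // Returns the document with a declaration naming the given encoding.
    // Assumption: the document is XML 1.0, so the version is hard-coded.
    static String withEncoding(String xml, String encoding) {
        String decl = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>";
        Matcher m = XML_DECL.matcher(xml);
        if (m.find()) {
            // Replace the existing declaration in place.
            return m.replaceFirst(Matcher.quoteReplacement(decl));
        }
        // No declaration present: put one on a line of its own at the top.
        return decl + "\n" + xml;
    }

    public static void main(String[] args) {
        System.out.println(withEncoding("<?xml version=\"1.0\"?><ROWSET/>", "windows-1252"));
    }
}
```

The declared encoding has to match the bytes actually written to the file; declaring "windows-1252" only helps because the generated file really is in that encoding.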
In my second version, I switched to JDBC for other reasons, but this does have (as you can see) a means of setting the encoding declaration. I haven't moved some of the less-frequently-run reports over to the JDBC version yet. I am poking about looking for all reasonable possibilities for dealing with the issue that arose today. Maybe it's the incentive I need to move everything over to the JDBC version posthaste.
cknell@... - email
From: Abel Braaksma <abel.online@...>
Sent: Mon, 27 Aug 2007 21:44:18 +0200
To: Mailing list for SAXON XSLT queries <saxon-help@...>
Subject: Re: [saxon] File encoding problem, at a loss as how to proceed
The error is caused by the XML Parser not by Saxon. You can get that
error when there's no default encoding present in the prolog of the XML.
For instance, you shouldn't receive that error when your XML starts with:
<?xml version="1.0" encoding="windows-1252" ?>
which is well understood by all Java XML parsers.
Your analysis of the offending character has a few misses. Yes, in
windows-1252 the byte xD1 represents the N + tilde. But in UTF-8 this
becomes a 2-byte sequence. Instead of the one-byte xC3, it should've
been encoded as xC3 x91 (or, if viewed as windows-1252 characters, this
would look like A + tilde and a backtick: Ã‘).
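The byte values above can be checked with a short Java sketch. The class and method names (EncodingDemo, encodeHex, isValidUtf8) are made up for illustration, and it assumes the JVM supports the "windows-1252" charset name. It encodes N + tilde (U+00D1) in both encodings and then tries a strict UTF-8 decode of a lone xD1 followed by a space.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for inspecting the bytes discussed above.
class EncodingDemo {
    // Encodes text in the named charset and returns the bytes as hex, e.g. "C3 91".
    static String encodeHex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Strict check: a fresh decoder reports malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeHex("\u00D1", "windows-1252")); // D1
        System.out.println(encodeHex("\u00D1", "UTF-8"));        // C3 91
        // xD1 announces a 2-byte sequence, but x20 is not a continuation byte.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xD1, 0x20})); // false
    }
}
```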
But that won't solve your problem. You should look at both bytes of your
offending character. It says byte 2 of 2-byte sequence. Can you isolate
the character in the source? I.e., pad it with spaces (each encoded
as the single byte x20) so you can easily see what this second byte looks like?
And whether xD1 is the starting or the ending according to the parser?
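As an alternative to padding by hand, the position of the first invalid sequence can be found programmatically. The following is a rough sketch only: Utf8Locator and firstInvalidOffset are hypothetical names, and it assumes the file content has already been read into a byte array. It runs a strict UTF-8 decoder over the bytes and returns the offset where decoding first fails.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for locating the first byte sequence that is not valid UTF-8.
class Utf8Locator {
    // Returns the offset of the first invalid sequence, or -1 if all bytes decode cleanly.
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position is left at the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without error
            }
            out.clear(); // output buffer full: discard decoded chars and continue
        }
    }

    public static void main(String[] args) {
        byte[] sample = {0x61, (byte) 0xD1, 0x20}; // 'a', lone xD1, space
        System.out.println(firstInvalidOffset(sample)); // 1
    }
}
```

Printing a few bytes on either side of the returned offset in hex would then show whether xD1 is the first or the second byte of the sequence the parser rejects.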
Ultimately, you will have to correct the error in your source, of course.
-- Abel Braaksma
> I was running a transformation of an XML document that is generated from an Oracle database using an Oracle-supplied stored procedure.
> This error popped up:
> "20070827.c.xml:66341:16: Fatal Error! Error reported by XML parser Cause: org.xml.sax.SAXParseException: Invalid byte 2 of 2-byte UTF-8 sequence."
> After investigating the file, I found the offending character to be D1 (Ñ). It appears that my XML input file is encoded as "windows-1252", but the XML parser used by Saxon has been led to believe it is a UTF-8 file, which wants C3 to encode the Ñ.
> I don't see any command-line switch to direct the parser to assume a particular encoding, so I am at a loss. I have written a Java package to extract XML via JDBC as an alternative to the Oracle-supplied PL/SQL stored procedure that the process is now using. I could edit and re-compile the Java class I used to extract the XML input document to direct the "windows-1252", or UTF-8, or whatever encoding, but is there something simpler?