I can't think of any intrinsic reason why the net.sf.saxon.Transform
command line and the s9api DocumentBuilder should work differently, in
terms of loading or initializing the XML parser. I would first check
whether they are in fact using the same parser. From the command line,
the -t option should report the parser in use; you could try
instantiating that class directly in your s9api application. Bear in
mind that it could be the same parser but a different version. As far
as I'm aware,
only rather old parsers have problems with a BOM appearing at the start
of a UTF-8 file.
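
To compare parsers from inside the application, something along these
lines may help. It is only a sketch using plain JAXP: the class and
method names (ParserCheck, defaultReader, sourceFor) are made up for
illustration, and the commented-out build() call assumes the s9api
DocumentBuilder from the repro quoted below. It prints the class of the
XMLReader that JAXP supplies in this JVM and wraps it in a SAXSource, so
that the parser is chosen explicitly rather than left to defaults.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

final class ParserCheck {

    // Returns the XMLReader that JAXP hands out by default in this JVM.
    static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Pairs an explicit XMLReader with a byte stream (not a Reader),
    // so the parser performs its own encoding detection.
    static SAXSource sourceFor(XMLReader reader, byte[] xml, String systemId) {
        InputSource in = new InputSource(new ByteArrayInputStream(xml));
        in.setSystemId(systemId);
        return new SAXSource(reader, in);
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = defaultReader();
        // Compare this class name with what the command line reports under -t.
        System.out.println("Parser class: " + reader.getClass().getName());

        SAXSource source = sourceFor(reader, "<a/>".getBytes("UTF-8"), "sample.xml");
        System.out.println("System id: " + source.getSystemId());
        // Assumption: with Saxon on the classpath, the source would then be
        // passed to the s9api builder, e.g. builder.build(source);
    }
}
```

If the class name printed here differs from the one reported on the
command line, that difference is the first thing to chase.
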
On 28/07/2010 21:06, Florent Georges wrote:
> I've run into some quite strange behaviour related to the XML
> parser. I know the error is reported by Saxon from the XML
> parser, but two different ways of invoking Saxon give different
> results. Maybe they do not use the same parser, or do not
> configure it the same way, or I have not found the right switches
> in Saxon to configure the underlying XML parser properly.
> The relevant stacktrace is:
> Caused by: net.sf.saxon.trans.XPathException:
> org.xml.sax.SAXParseException: Content is not allowed in
> at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:417)
> at net.sf.saxon.event.Sender.send(Sender.java:182)
> at net.sf.saxon.Configuration.buildDocument(Configuration.java:3272)
> at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:335)
> It occurs when the XML document, encoded as UTF-8, contains a
> BOM (so EF BB BF), and when I use S9API's DocumentBuilder. From
> the command line, everything is fine (e.g. by running an identity
> transform with the file as input).
> Given that input is a binary InputStream over the document
> content (e.g. a FileInputStream for a test), the following is
> a repro:
> Processor proc = new Processor(false);
> DocumentBuilder builder = proc.newDocumentBuilder();
> Reader reader = new InputStreamReader(input, "utf-8");
> Source src = new StreamSource(reader, "system-id");
> builder.build(src);
> I use Saxon HE 188.8.131.52, with Java 1.6.0_20 on Mac OS X.
> Did I do anything wrong? Or is it a bug, in Saxon or in
> the JAXP parser?
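
One difference worth ruling out in the repro above is that it hands the
parser a Reader rather than a byte stream. Java's UTF-8 decoder does not
strip a leading BOM, so the first character the parser receives is
U+FEFF; whether that is what triggers the error here is an assumption,
not something confirmed in this thread. The sketch below uses only the
JDK, and the names BomDemo, firstChar and skipUtf8Bom are invented for
illustration. It shows the decoded first character and one way to drop
the three BOM bytes before wrapping the stream in a Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;

final class BomDemo {

    // First character seen when the bytes are decoded as UTF-8 by InputStreamReader.
    static int firstChar(byte[] bytes) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8");
        return r.read();
    }

    // Returns a stream positioned just after a leading EF BB BF,
    // or at the original start if no UTF-8 BOM is present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) break;
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        System.out.printf("raw reader starts with U+%04X%n", firstChar(withBom));
        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.printf("after skipping starts with U+%04X%n", clean.read());
    }
}
```

A simpler alternative, where it is possible, is to pass the InputStream
itself to StreamSource (with the system id) and let the parser do its
own encoding detection.
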