
Fragments of source XML in output

Forum: Help

Creator: Tomaz Erjavec

Created: 2011-11-18

Updated: 2012-10-08

Tomaz Erjavec - 2011-11-18

I'm writing a fairly simple XSLT stylesheet to convert a TEI dictionary into
HTML, but a strange thing happens: here and there, fragments of the source XML
appear escaped in the HTML output. It seems to happen only when the file gets
large.
I'm using Saxon-HE, and the same behaviour appears with both 9.2 and 9.3.
I have put the source, XSLT and output at
http://nl.ijs.si/imp/bug/
The first error is at
http://nl.ijs.si/imp/bug/Lexicon_bug.html#lex.261b810f56f0d28229c6e2b019848ba9
but more can be found by searching for '<'.
Thanks for any help!
Tomaž
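Rather than searching the rendered page by hand, the generated HTML can be scanned for escaped tags: a fragment that shows up as a literal '<' in the browser is stored as `&lt;` in the HTML file. This is a rough sketch; the class name `FindEscapedMarkup` is hypothetical, and the pattern (`&lt;` followed by a letter or `/`) is a heuristic that will also flag any markup the dictionary text legitimately escapes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical checker: lists lines of an HTML file that contain escaped tags.
class FindEscapedMarkup {
    // "&lt;" followed by a letter or '/' looks like an escaped start or end tag.
    private static final Pattern ESCAPED_TAG = Pattern.compile("&lt;/?[A-Za-z]");

    /** Returns the 1-based numbers of the lines that contain an escaped tag. */
    static List<Integer> suspiciousLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (ESCAPED_TAG.matcher(lines.get(i)).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the generated HTML file (assumed to be UTF-8)
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (int n : suspiciousLines(lines)) {
            System.out.println(n + ": " + lines.get(n - 1).trim());
        }
    }
}
```

A plain `a &lt; b` in running text is not reported, because the `&lt;` is followed by a space rather than a tag name.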

Michael Kay - 2011-11-18

Are you using the default XML parser included in the JDK? If so, could you see
if the problem still occurs when you use the Xerces parser from Apache
instead? This looks similar to effects I've seen caused by bugs in the JDK
parser.
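To check which parser is actually being picked up, a small diagnostic along the following lines may help. It is a sketch, not part of Saxon: the class name `ParserCheck` is hypothetical, and it assumes the transformation obtains its parser through the standard JAXP factory lookup, which is what the `-D` system properties used later in this thread influence. With the JDK's built-in parser the SAX factory is typically reported as `com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl`; with Apache Xerces selected, as `org.apache.xerces.jaxp.SAXParserFactoryImpl`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic: reports which JAXP parser implementations this JVM resolves.
class ParserCheck {

    static String describe() throws Exception {
        // newInstance() follows the JAXP lookup order: the system property first,
        // then configuration files and service providers, then the JDK's built-in parser.
        SAXParserFactory sax = SAXParserFactory.newInstance();
        DocumentBuilderFactory dom = DocumentBuilderFactory.newInstance();

        // The XMLReader that does the actual parsing is reported as well.
        String reader = sax.newSAXParser().getXMLReader().getClass().getName();

        return "SAXParserFactory:       " + sax.getClass().getName() + "\n"
             + "DocumentBuilderFactory: " + dom.getClass().getName() + "\n"
             + "XMLReader:              " + reader;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe());
    }
}
```

Run it with the same `-D` options and classpath as the transformation itself; if it still names the `com.sun.org.apache.xerces.internal` classes, the JDK parser is the one in use.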

Tomaz Erjavec - 2011-11-18

This could well be it. I'm a bit helpless when it comes to Java, but I will
figure out how to install Xerces and try that.
I will get back to you if the problem persists.
Thanks!
Tomaž

Tomaz Erjavec - 2012-01-09

I've now tried using Xerces, but the problem persists. As before, a
sample of the bad output (search for '<') is at
http://nl.ijs.si/imp/bug
This is how I now call Saxon:

java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl \
     -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl \
     net.sf.saxon.Transform
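The two `-D` options only affect the JVM they are passed to. Below is a sketch of the same selection made from Java code; `UseXerces` and `selectXercesIfAvailable` are hypothetical names, while the property names and values are exactly those in the command above. It sets the properties only when the Xerces class can actually be loaded, because naming a factory class that is not on the classpath makes the JAXP lookup fail with a `FactoryConfigurationError`.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: programmatic equivalent of the two -D options above.
class UseXerces {
    static final String SAX_PROPERTY = "javax.xml.parsers.SAXParserFactory";
    static final String DOM_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";
    static final String XERCES_SAX = "org.apache.xerces.jaxp.SAXParserFactoryImpl";
    static final String XERCES_DOM = "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";

    /** Selects Apache Xerces for this JVM if it is on the classpath; returns whether it did. */
    static boolean selectXercesIfAvailable() {
        try {
            // Only the SAX class is probed; both factories ship in the same Xerces jar.
            Class.forName(XERCES_SAX);
        } catch (ClassNotFoundException e) {
            return false; // leave the JDK default in place rather than break the lookup
        }
        System.setProperty(SAX_PROPERTY, XERCES_SAX);
        System.setProperty(DOM_PROPERTY, XERCES_DOM);
        return true;
    }

    public static void main(String[] args) {
        // Should run before any parser is created, i.e. before the transformation starts.
        boolean selected = selectXercesIfAvailable();
        System.out.println("Xerces selected: " + selected);
        System.out.println("SAX factory now: " + SAXParserFactory.newInstance().getClass().getName());
    }
}
```

If a job consists of several separate `java` invocations, each one needs these options (or an equivalent call) again; setting them for one run does not carry over to the others.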

In case it would help, I can put the source and XSLT on the web as
well, but I'm afraid it is indeed some horrible XML parser or Java problem
that will be impossible to solve, which is pretty awful, actually.
Anyway, thanks for any help!
Tomaž

Michael Kay - 2012-01-09

If you can supply the information needed to reproduce this I will be happy to
investigate. Without that information, I can't really help.

Tomaz Erjavec - 2012-01-10

Thank you, you've already helped! There's nothing like preparing files to show
to others to find that the bug was actually on my side: I process the
dictionary in several steps, and one of them still used the default XML
parser.
So the conversion to HTML was completely OK; it just got input that was
already garbled. Sorry for crying wolf!
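One way to rule out this kind of mismatch is to run every step of such a pipeline through one shared parser configuration. The sketch below uses only the JDK's standard JAXP classes (`TransformerFactory`, `SAXSource`); `Pipeline`, `runStep`, `runPipeline` and the two example stylesheets are hypothetical, and this is not how the poster's actual pipeline is organised. Each intermediate result is re-parsed with an `XMLReader` from the same `SAXParserFactory`, so no step can silently fall back to a different parser. The stylesheets themselves are still parsed by the `TransformerFactory`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical multi-step pipeline in which every step parses its input the same way.
class Pipeline {
    // One factory for all steps: whichever implementation it resolves to is used throughout.
    private static final SAXParserFactory PARSERS = SAXParserFactory.newInstance();
    static {
        PARSERS.setNamespaceAware(true); // XSLT processing needs namespace-aware input
    }

    // Example step 1 (hypothetical): wraps the text of <a> in a <b> element.
    static final String STEP1 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/a'><b><xsl:value-of select='.'/></b></xsl:template>"
        + "</xsl:stylesheet>";

    // Example step 2 (hypothetical): outputs the text of <b> in brackets as plain text.
    static final String STEP2 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/b'>[<xsl:value-of select='.'/>]</xsl:template>"
        + "</xsl:stylesheet>";

    /** Parses xml with the shared parser configuration and applies one compiled stylesheet. */
    static String runStep(Templates stylesheet, String xml) throws Exception {
        XMLReader reader = PARSERS.newSAXParser().getXMLReader();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        StringWriter out = new StringWriter();
        stylesheet.newTransformer().transform(source, new StreamResult(out));
        return out.toString();
    }

    /** Applies the stylesheets in order, feeding each step's serialized output to the next. */
    static String runPipeline(String xml, String... stylesheets) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        String current = xml;
        for (String xsl : stylesheets) {
            Templates compiled = tf.newTemplates(new StreamSource(new StringReader(xsl)));
            current = runStep(compiled, current);
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runPipeline("<a>x &amp; y</a>", STEP1, STEP2));
    }
}
```

Because the intermediate document is serialized and re-parsed between steps, the `&amp;` survives step 1 as markup and is only turned into a literal `&` by the text output of step 2; no escaped `&lt;` fragments appear.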
