Fragments of source XML in output

Help
2011-11-18
2012-10-08
  • Tomaz Erjavec

    Tomaz Erjavec - 2011-11-18

    I'm writing a fairly simple XSLT to convert a TEI dictionary into HTML, but a
    strange thing happens: here and there, fragments of the source XML appear
    escaped in the HTML output. It seems to happen only when the file gets large.
    I'm using Saxon-HE, and the same behaviour appears with both 9.2 and 9.3.
    I put the source, XSLT and output on
    http://nl.ijs.si/imp/bug/
    The first error is at
    http://nl.ijs.si/imp/bug/Lexicon_bug.html#lex.261b810f56f0d28229c6e2b019848ba9

    but more can be found searching for '<'.
    Thanks for any help!
    Tomaž
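
    For anyone checking their own output for the same symptom, here is a
    minimal sketch of the "search for '<'" step. It assumes the stray fragments
    show up in the serialized HTML as the entity &lt; and that legitimate
    content never contains an escaped less-than sign. The class name, method
    name and sample string are made up for illustration and are not taken from
    the files linked above.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedMarkupScan {

    // Returns the 1-based numbers of the lines that contain an escaped '<'.
    // Assumption: any "&lt;" in the output is a leaked fragment of source XML.
    static List<Integer> linesWithEscapedMarkup(String html) {
        List<Integer> hits = new ArrayList<>();
        String[] lines = html.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            if (lines[i].contains("&lt;")) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample: the second line carries a leaked, escaped tag.
        String sample = "<p>ok</p>\n<p>&lt;form type=\"lemma\"&gt;</p>\n<p>fine</p>";
        System.out.println(linesWithEscapedMarkup(sample));
    }
}
```

    Running the same scan on the output of every intermediate step, not only
    on the final HTML, shows at which stage the text first becomes garbled.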

     
  • Michael Kay

    Michael Kay - 2011-11-18

    Are you using the default XML parser included in the JDK? If so, could you see
    if the problem still occurs when you use the Xerces parser from Apache
    instead? This looks similar to effects I've seen caused by bugs in the JDK
    parser.

     
  • Tomaz Erjavec

    Tomaz Erjavec - 2011-11-18

    This could well be it. I'm a bit helpless when it comes to Java, but I will
    figure out how to install Xerces and try that.
    I will get back to you if the problem persists.
    Thanks!
    Tomaž

     
  • Tomaz Erjavec

    Tomaz Erjavec - 2012-01-09

    I've tried using Xerces now, but the problem persists. As before, a
    sample of the bad output (search for '<') is at
    http://nl.ijs.si/imp/bug and Saxon is now called as shown below.

    java -Djavax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl net.sf.saxon.Transform

    In case it would be any help, I can put the source and XSLT on the web as
    well, but I am afraid that it is indeed some horrible XML parser or Java
    problem that will be impossible to solve. Which is pretty awful, actually.
    Anyway, thanks for any help!
    Tomaž

     
  • Michael Kay

    Michael Kay - 2012-01-09

    If you can supply the information needed to reproduce this I will be happy to
    investigate. Without that information, I can't really help.

     
  • Tomaz Erjavec

    Tomaz Erjavec - 2012-01-10

    Thank you, you've already helped! Nothing like preparing files to show to
    others to find that the bug was actually on my side - I process the
    dictionary in several steps, and one of them still used the default XML
    parser.
    So, the conversion to HTML was completely OK; it just received already
    garbled input. Sorry for crying wolf!
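
    Since the cause turned out to be one step of a multi-step pipeline still
    using the default parser, a small check run with the same classpath and
    -D options as each step can help spot this. The sketch below is only an
    illustration: the class and method names are hypothetical, and it assumes
    that the parser bundled with Oracle/OpenJDK lives under the package prefix
    com.sun.org.apache.xerces.internal. It reports what the standard JAXP
    lookup returns; a tool that is configured to pick its parser some other
    way will not be reflected here.

```java
import javax.xml.parsers.SAXParserFactory;

public class ParserCheck {

    // Assumption: package prefix of the Xerces fork bundled with the JDK.
    static final String JDK_BUILTIN_PREFIX = "com.sun.org.apache.xerces.internal.";

    // True if the given factory class name belongs to the JDK's built-in parser.
    static boolean isJdkBuiltIn(String factoryClassName) {
        return factoryClassName.startsWith(JDK_BUILTIN_PREFIX);
    }

    // Class name of the SAXParserFactory that the JAXP lookup selects in this JVM.
    static String activeSaxFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        String name = activeSaxFactory();
        System.out.println("SAX parser factory in use: " + name);
        if (isJdkBuiltIn(name)) {
            System.err.println("Warning: this step uses the JDK's built-in XML parser.");
        }
    }
}
```

    Invoked with the two -Djavax.xml.parsers... options from the command
    quoted earlier and with Xerces on the classpath, the reported factory
    should be the org.apache.xerces.jaxp one; any step that prints the
    warning is a candidate for the garbled input.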