java.io.UTFDataFormatException: Invalid byte

Help
Jack Bush
2010-08-17
2012-10-08
  • Jack Bush
    2010-08-17

    This is nothing to do with JDOM - you are on the wrong list. The problem
    is with a document that is being read using the document() function
    within your Saxon stylesheet. The error means that the XML parser is
    trying to read the document ("car.xml") as UTF-8, but it contains byte
    combinations that are not valid in UTF-8.

    You need to change the XML declaration of car.xml so that it correctly
    describes the actual encoding of the file.
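    To illustrate the point above, here is a minimal sketch (not part of the
    original reply) that parses the same text twice with the JDK's built-in
    DOM parser: once with a declaration that claims UTF-8 although the bytes
    are ISO-8859-1, and once with a declaration that matches the bytes. The
    class name, method name, and sample content are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingDeclarationDemo {

    /**
     * Builds a tiny document whose XML declaration names declaredEncoding,
     * serialises it to bytes using actualEncoding, and reports whether the
     * parser can read it. Sample content is made up for this sketch.
     */
    public static boolean parsesCleanly(String declaredEncoding, Charset actualEncoding) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declaredEncoding + "\"?>"
                + "<catalog>caf\u00e9 racer</catalog>";
        byte[] bytes = xml.getBytes(actualEncoding);
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (Exception e) {
            // A mismatch between declaration and bytes surfaces here.
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        // Declaration says UTF-8, bytes are ISO-8859-1: expected to fail.
        System.out.println(parsesCleanly("UTF-8", latin1));
        // Declaration matches the bytes: expected to succeed.
        System.out.println(parsesCleanly("ISO-8859-1", latin1));
    }
}
```

    The only difference between the two calls is the encoding named in the
    XML declaration; the bytes are identical.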

    Michael Kay
    Saxonica

    On 16/08/2010 14:30, Jack Bush wrote:

    Hi All,

    I am getting the following error when reading secondary XML documents using
    document() in an XSLT 2.0 stylesheet from within an EJB (Glassfish 2.1)
    container:

    Recoverable error on line 8
    FODC0002: java.io.UTFDataFormatException: Invalid byte 1 of 1-byte UTF-8
    sequence.

    The transformation snippet in Java is as follows:

    TransformerFactory transformerFactory =
        new net.sf.saxon.TransformerFactoryImpl();
    transformerFactory.setAttribute(
        "http://saxon.sf.net/feature/sourceParserClass",
        "org.apache.xml.resolver.tools.ResolvingXMLReader");

    FileInputStream stylesheet = new FileInputStream("C:/stylesheet.xsl");
    Transformer transformer =
        transformerFactory.newTransformer(new StreamSource(stylesheet));
    JDOMSource source = new JDOMSource(mainJDomDocument);
    JDOMResult result = new JDOMResult();

    transformer.transform(source, result);

    The relevant lines in C:/stylesheet.xsl are:

    1<?xml version="1.0" encoding="ISO-8859-1"?>
    2<xsl:stylesheet version="2.0"
    3 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    4 xmlns:xs="http://www.w3.org/2001/XMLSchema"
    5 exclude-result-prefixes="ns">
    6
    7<xsl:template match="/">
    8<xsl:apply-templates select="document('file:///C:/car.xml')/catalog">
    ...........
    15</xsl:apply-templates>
    16</xsl:template>
    17
    18 .........

    All XML documents have been encoded as UTF-8. Changing the encoding of
    C:/stylesheet.xsl to UTF-8, or reading it through
    InputStreamReader(styleSheetIS, "UTF-8"), did not make any difference. This
    application had been working fine until it was moved into an EJB 3.0
    (Glassfish 2.1) container. It appears that the EJB container has overridden
    the way these secondary documents are read, using a different method that
    checks the encoding. I believe from other threads that the same issue also
    occurs in a web container.

    I am running JDK 1.6.0_17, Netbeans 6.7, XSLT 2.0 (Saxon 9.1), JDOM 1.1 on
    Windows XP.

    Any suggestion would be appreciated.

    Thanks a lot,

    Jack

     
  • Jack Bush
    2010-08-17

    Hi Michael,

    Thank you for clarifying where this issue is coming from. Nevertheless, I
    don't understand why car.xml is not UTF-8 encoded, unless Saxon with the
    TagSoup parser has changed the format of car.xml (I am not sure) when
    converting it from HTML while running in the EJB container.

    car.xml looks the same as the one produced outside the EJB container, which
    worked (lots of text). Can you suggest a method to track down where this
    issue is coming from, i.e. to test whether car.xml is valid UTF-8? Once
    again, car.xml was generated by Saxon and TagSoup and read by document() as
    UTF-8 without a problem outside the EJB container.
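    One way to run such a test is sketched below, under stated assumptions:
    it decodes the raw bytes with a strict UTF-8 decoder and reports the
    offset of the first byte sequence that is not valid UTF-8. The class and
    method names are hypothetical, and main() uses java.nio.file.Files, which
    needs Java 7 or later (on JDK 1.6 the bytes would have to be read with a
    FileInputStream instead).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Check {

    /**
     * Returns the byte offset of the first sequence that is not valid UTF-8,
     * or -1 if the whole array decodes cleanly.
     */
    public static int firstInvalidUtf8Offset(byte[] bytes) {
        // REPORT makes the decoder stop on bad input instead of
        // silently substituting a replacement character.
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(4096);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // The input position marks the start of the bad sequence.
                return in.position();
            }
            if (result.isUnderflow()) {
                return -1; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and continue
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path to check, for example C:/car.xml.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int offset = firstInvalidUtf8Offset(bytes);
        System.out.println(offset < 0
                ? "Valid UTF-8"
                : "First invalid UTF-8 sequence at byte offset " + offset);
    }
}
```

    Running this against the car.xml produced inside the container and the
    one produced outside it should show whether the two files really contain
    the same bytes, and where the first offending byte sits if they do not.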

    I found a post,
    http://www.ibm.com/developerworks/forums/thread.jspa?messageID=14156807
    which appears to resemble this issue. I have posted the same question to the
    Glassfish forum to see if they can assist from a Java EE 5 perspective.

    I am doing my best to avoid cross-posting.

    As always, your valuable advice is very much appreciated.

    Thanks,

    Jack