#222 Serialization of Japanese characters corrupts XML

Karl Critz

I am constructing a DOM from Java that includes
Japanese characters. When I try to serialize this DOM,
the "<" character of a close tag after certain Japanese
text is not written properly. The text itself is also
not written properly.
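
The attached test file is not reproduced in this thread, so the following is only a minimal sketch of the kind of test described above. It assumes a modern JDK and the standard JAXP identity Transformer; the class name, method name, and element names are hypothetical. The Japanese text is written with \u escapes so that it does not depend on the encoding the compiler assumes for the source file, and the output encoding is requested explicitly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JapaneseDomSerialization {

    // Builds <doc><title>text</title></doc> and serializes it to UTF-8 bytes.
    // (Hypothetical document shape, chosen only for illustration.)
    static byte[] serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("doc");
            Element title = doc.createElement("title");
            title.appendChild(doc.createTextNode(text));
            root.appendChild(title);
            doc.appendChild(root);

            // Identity transform: copies the DOM to the output stream.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

            // Write to a byte stream, not a Writer, so the serializer
            // itself performs the character-to-byte encoding.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Three Japanese characters, written as escapes so the source
        // file itself stays pure ASCII.
        String japanese = "\u65E5\u672C\u8A9E";
        String xml = new String(serialize(japanese), StandardCharsets.UTF_8);
        System.out.println(xml.contains("<title>" + japanese + "</title>"));
    }
}
```

Decoding the bytes back as UTF-8 and comparing against the original string is a quicker check than inspecting the result file in an editor, which may apply its own guess about the encoding.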

I have attached several files which demonstrate the issue:

  • A simplified Java test file
  • A screenshot of the Japanese section of the file
  • An example of the result file
  • A screenshot of the result file

Interestingly enough, the result file is parseable by
Xerces, though JADE has trouble reading it.

Am I doing something wrong in my serialization, or is
this a legit bug in SAXON?


  • Karl Critz

    Karl Critz - 2004-06-04

    The Java file that shows the problem

  • Karl Critz

    Karl Critz - 2004-06-04

    PNG version of the Java file

  • Karl Critz

    Karl Critz - 2004-06-04

    Using SAXON 6.5.3, if you're interested

  • Michael Kay

    Michael Kay - 2004-06-05

    PLEASE do not enter suspected bugs in this area of the site
    until they have been confirmed. There is a bright yellow
    notice asking you not to do this on the "Submit New" page -
    I fail to see how people can fail to see this.

    I want people to be able to browse the bugs area knowing
    that it only contains real bugs.

    I'm afraid I can't see what's wrong with the output. It
    appears to be correctly encoded UTF-8, and is a well-formed
    XML file. I can't tell whether the output is correct,
    because I don't know what encoding your Java source file
    uses; it doesn't appear to be UTF-8, as far as I can see.
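
To illustrate how output can be valid UTF-8 and still contain the wrong text, here is a small sketch of one possible cause. It assumes, purely for illustration, that the source file was saved as UTF-8 but read by the compiler as ISO-8859-1; the actual encoding of the attached file is not known. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {

    // Simulates a compiler reading UTF-8 source bytes as ISO-8859-1:
    // every byte of the literal becomes one separate character.
    static String misreadAsLatin1(String intended) {
        byte[] sourceBytes = intended.getBytes(StandardCharsets.UTF_8);
        return new String(sourceBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String intended = "\u65E5\u672C\u8A9E"; // three Japanese characters
        String misread = misreadAsLatin1(intended);

        // Each character here needs three UTF-8 bytes, so the misread
        // string holds nine characters instead of three.
        System.out.println(intended.length());
        System.out.println(misread.length());

        // Serializing the misread string as UTF-8 yields a valid byte
        // sequence that decodes cleanly, just not to the intended text.
        byte[] output = misread.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(output, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(intended));
    }
}
```

If this is the cause, compiling with an explicit `javac -encoding` value that matches the file, or writing the literals as \u escapes, avoids the mismatch.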

    I am closing this bug because you raised it in the wrong
    place. Please use the saxon-help list or forum.

    Michael Kay

