#54 Problem with utf-8 characters in datastream

Fedora 2.2.1
General (6)
John Tatsis

Hello all

My name is John, and at my company we are using Fedora 2.2.1.
I have found a problem with UTF-8 XML datastreams that occurs only when the size is larger than 4K. Sometimes, at a 4K block boundary, a UTF-8 character is destroyed and shows up as two other characters.
I searched the source code and found an incorrect implementation for converting an InputStream to a StringBuffer.
The class fedora.server.storage.translation.FOXMLDOSerializer has the method appendXMLStream. This method reads 4096 bytes at a time from the InputStream and decodes each block as UTF-8 separately. This is wrong, because sometimes the first byte of a multi-byte UTF-8 character is the last byte of one buffer and the next byte is the first byte of the following buffer.
The right way to read the InputStream is to wrap it in an InputStreamReader (which decodes the InputStream as UTF-8) and then fill a char buffer from the reader.
I have modified that method like this:

// Needs java.io.InputStream, InputStreamReader, IOException and
// UnsupportedEncodingException.
private void appendXMLStream(InputStream in, StringBuffer buf, String encoding)
        throws ObjectIntegrityException, UnsupportedEncodingException,
        StreamIOException {
    if (in == null) {
        throw new ObjectIntegrityException("Object's inline xml "
                + "stream cannot be null.");
    }
    try {
        // Decode through a Reader so that a multi-byte character straddling
        // a buffer boundary is still decoded as a single character.
        // Passing the charset name directly lets an unknown encoding surface
        // as the UnsupportedEncodingException this method declares.
        InputStreamReader inReader = new InputStreamReader(in, encoding);
        int bufSize = 4096;
        char[] charBuf = new char[bufSize];
        int len;
        while ((len = inReader.read(charBuf, 0, bufSize)) != -1) {
            buf.append(charBuf, 0, len);
        }
    } catch (UnsupportedEncodingException uee) {
        throw uee;
    } catch (IOException ioe) {
        throw new StreamIOException("Error reading from inline xml datastream.");
    } finally {
        try {
            in.close();
        } catch (IOException closeProb) {
            throw new StreamIOException("Error closing read stream.");
        }
    }
}
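
The boundary problem can also be reproduced outside Fedora. The following is only a minimal sketch: the class Utf8BoundaryDemo, its helper methods, and the sample data are made up for illustration and are not part of the Fedora source. It places a two-byte character (Greek alpha) so that its first byte is byte 4096 of the stream, then compares decoding each 4096-byte block separately with decoding through an InputStreamReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Fedora.
public class Utf8BoundaryDemo {

    // Problematic pattern: every byte block is decoded on its own, so a
    // character split across two blocks is decoded as two broken pieces.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] byteBuf = new byte[chunkSize];
        int len;
        while ((len = in.read(byteBuf)) != -1) {
            out.append(new String(byteBuf, 0, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Reader-based pattern: the InputStreamReader keeps decoder state
    // between reads, so split characters are reassembled.
    static String decodeWithReader(InputStream in, int chunkSize) throws IOException {
        StringBuilder out = new StringBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] charBuf = new char[chunkSize];
        int len;
        while ((len = reader.read(charBuf, 0, chunkSize)) != -1) {
            out.append(charBuf, 0, len);
        }
        return out.toString();
    }

    // 4095 ASCII bytes, then a two-byte character whose first byte is
    // byte 4096 of the stream and whose second byte is byte 4097.
    static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4095; i++) {
            sb.append('a');
        }
        sb.append('\u03b1'); // Greek small letter alpha, 2 bytes in UTF-8
        sb.append("tail");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = sample();
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);

        String perChunk = decodePerChunk(new ByteArrayInputStream(bytes), 4096);
        String viaReader = decodeWithReader(new ByteArrayInputStream(bytes), 4096);

        System.out.println("per-chunk decode intact: " + original.equals(perChunk));   // false
        System.out.println("reader decode intact:    " + original.equals(viaReader));  // true
    }
}
```

In the per-chunk result the alpha is replaced by two replacement characters, which matches the symptom of one character showing up as two others.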

I hope this fix is useful to non-Latin users of Fedora.



  • birkland

    birkland - 2008-02-13
    • assigned_to: nobody --> birkland
  • Chris Wilper

    Chris Wilper - 2008-02-19
    • status: open --> open-fixed
  • Chris Wilper

    Chris Wilper - 2008-02-19

    Thanks for the detailed bug report, John!
    Aaron fixed this in maintenance-2.2 r6618 and trunk r6619.
    Fixed by appending XML in a uniform way (using char encoding)
    in DOTranslationUtility. This fixes the expected SIZEs
    in the trunk/maintenance TestAPIM system test class and adds
    a regression test to the trunk junits.

  • Daniel Davis

    Daniel Davis - 2008-03-04
    • status: open-fixed --> closed-fixed
