#54 Problem with utf-8 characters in datastream

Fedora 2.2.1
closed-fixed
birkland
General (6)
5
2008-03-04
2008-01-24
John Tatsis
No

Hello all

My name is John, and at my company we are using Fedora 2.2.1.
I have found a problem with UTF-8 XML datastreams that occurs only when the datastream is larger than 4 KB. Sometimes, at a 4 KB block boundary, a UTF-8 character is corrupted and shows up as two other characters.
I searched the source code and found an incorrect implementation of the InputStream-to-StringBuffer conversion.
The class fedora.server.storage.translation.FOXMLDOSerializer has the method appendXMLStream. This method first reads 4096 bytes from the InputStream and then decodes them as UTF-8. This is wrong, because a multi-byte UTF-8 character can straddle the buffer boundary: sometimes its first byte is the last byte of one buffer and its second byte is the first byte of the next buffer.
The right way to read the InputStream is to wrap it in an InputStreamReader (which decodes the InputStream as UTF-8) and then fill the buffer from the reader.
I have modified the method like this:

private void appendXMLStream(InputStream in, StringBuffer buf, String encoding)
        throws ObjectIntegrityException, UnsupportedEncodingException,
        StreamIOException {
    if (in == null) {
        throw new ObjectIntegrityException("Object's inline xml "
                + "stream cannot be null.");
    }
    try {
        // Decode through a Reader so that a multi-byte character split
        // across two reads is still decoded correctly. The String-name
        // constructor is used so that an unknown encoding raises the
        // declared UnsupportedEncodingException (Charset.forName would
        // throw the unchecked UnsupportedCharsetException instead).
        InputStreamReader inReader = new InputStreamReader(in, encoding);
        int bufSize = 4096;
        char[] charBuf = new char[bufSize];
        int len;
        while ((len = inReader.read(charBuf, 0, bufSize)) != -1) {
            buf.append(charBuf, 0, len);
        }
    } catch (UnsupportedEncodingException uee) {
        throw uee;
    } catch (IOException ioe) {
        throw new StreamIOException("Error reading from inline xml datastream.");
    } finally {
        try {
            in.close();
        } catch (IOException closeProb) {
            throw new StreamIOException("Error closing read stream.");
        }
    }
}
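
For anyone who wants to reproduce the boundary problem outside Fedora, here is a minimal, self-contained sketch. It is not the Fedora source: the class name ChunkedDecodeDemo and its methods are made up for illustration, and decodePerChunk only assumes that the original method decoded each 4096-byte block separately, as described above. The sample input places a two-byte character (é, bytes C3 A9) exactly across the 4096-byte boundary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Fedora.
public class ChunkedDecodeDemo {

    // Assumed buggy pattern: read a block of bytes, then decode that block alone.
    public static String decodePerChunk(InputStream in, int bufSize) throws IOException {
        StringBuilder out = new StringBuilder();
        byte[] byteBuf = new byte[bufSize];
        int len;
        while ((len = in.read(byteBuf, 0, bufSize)) != -1) {
            // A character split across two blocks is decoded as two broken halves.
            out.append(new String(byteBuf, 0, len, StandardCharsets.UTF_8));
        }
        return out.toString();
    }

    // Pattern from the proposed fix: let a Reader decode, then buffer chars.
    public static String decodeWithReader(InputStream in, int bufSize) throws IOException {
        StringBuilder out = new StringBuilder();
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        char[] charBuf = new char[bufSize];
        int len;
        while ((len = reader.read(charBuf, 0, bufSize)) != -1) {
            out.append(charBuf, 0, len);
        }
        return out.toString();
    }

    // 4095 ASCII bytes, then a two-byte character, then a short ASCII tail.
    public static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4095; i++) {
            sb.append('a');
        }
        sb.append('\u00e9'); // encodes to bytes 4096 and 4097 of the stream
        sb.append("tail");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = sample();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        String bad = decodePerChunk(new ByteArrayInputStream(bytes), 4096);
        String good = decodeWithReader(new ByteArrayInputStream(bytes), 4096);
        System.out.println("per-chunk matches original: " + bad.equals(text));  // false
        System.out.println("reader matches original:    " + good.equals(text)); // true
    }
}
```

With the per-chunk version the single é comes back as two replacement characters, which matches the "shows as 2 other characters" symptom. The reader version keeps the incomplete byte until the next read and returns the original text.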

I hope this fix is useful to non-Latin users of Fedora.

Cheers
John

Discussion

  • birkland
    2008-02-13

    • assigned_to: nobody --> birkland
     
  • Chris Wilper
    2008-02-19

    • status: open --> open-fixed
     
  • Chris Wilper
    2008-02-19

    Thanks for the detailed bug report, John!
    Aaron fixed this in maintenance-2.2 r6618 and trunk r6619.
    Fixed by appending XML in a uniform way (using char encoding)
    in DOTranslationUtility. This fixes the expected SIZEs
    in the trunk/maintenance TestAPIM system test class and adds
    a regression test to the trunk junits.

     
  • fcswa
    2008-03-04

    • status: open-fixed --> closed-fixed