UTF-8 String Incorrectly Read

Status: Alpha

Brought to you by: hackrobat

#2 UTF-8 String Incorrectly Read

Status: open

Owner: nobody

Labels: Interface (example) (2)

Priority: 5

Updated: 2007-01-18

Created: 2007-01-18

Creator: Anonymous

Private: No

The following UTF-8 string gets truncated such that the last 'd' is omitted.

Åland

The exact Hessian protocol data can be found in the attached file.

Note that this is merely an example of one UTF-8 string that was not correctly interpreted.

Discussion

Nobody/Anonymous - 2007-01-18

Hessian Protocol (sniffed packet data)

sniffed_hessian_packet.txt


vatel - 2007-09-28


We have this problem with UTF-8 too.
I investigated why it happens: when a String is read via the ByteArray.readUTF() function, Flex expects the string length to be in BYTES. But according to the Hessian serialization protocol, the string length is in CHARACTERS. That is why the problem occurs with non-ASCII characters: "Åland" is 5 characters but 6 bytes in UTF-8 (Å takes two bytes), so reading only 5 bytes drops the final 'd'.

http://livedocs.adobe.com/flex/2/langref/flash/utils/ByteArray.html#readUTF()
http://hessian.caucho.com/doc/hessian-serialization.html##string
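
To make the mismatch concrete, here is a small Python sketch (Python is used only for illustration; it is not the Flex client code). It assumes the precomposed form of Å (U+00C5) and shows what happens when a character count is used as a byte count:

```python
# "Åland" with the precomposed Å (U+00C5), as assumed for this illustration.
text = "\u00c5land"
encoded = text.encode("utf-8")

char_count = len(text)      # the length as counted in characters
byte_count = len(encoded)   # the length a byte-oriented reader needs

# Using the character count as a byte count stops one byte short.
truncated = encoded[:char_count].decode("utf-8")

print(char_count, byte_count, truncated)
```

The slice ends one byte early because Å occupies two bytes, which matches the missing 'd' reported in this ticket.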

Ideas how this can be fixed:

1) Do not use Flex's readUTF; instead read manually, byte by byte, using ByteArray.readByte() and a UTF-8 decoder, counting characters as you go.
For example, you can port Java's UTF-8 decoder to Flex:
https://openjdk.dev.java.net/source/browse/openjdk/jdk/trunk/j2se/src/share/classes/sun/nio/cs/UTF_8.java?rev=227&view=markup
Alternatively, there is another decoder in GNU libc (see iconv/gconv_simple.c in glibc).

2) It seems that each String or string part (chunk) is terminated by a zero byte (0x00) in Hessian. This could serve as an "end of string" mark.
That way you can calculate the number of bytes and then call the ByteArray.readUTFBytes() function in Flex (it accepts a byte-count parameter).
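
Idea 1 can be sketched as follows, again in Python for illustration rather than ActionScript. The function name read_chars is hypothetical, and the sketch assumes every character is encoded in one to three UTF-8 bytes; other lead bytes are rejected rather than handled. It walks the buffer by inspecting each lead byte until the requested number of characters has been consumed:

```python
from typing import Tuple


def read_chars(buf: bytes, start: int, char_count: int) -> Tuple[str, int]:
    """Decode char_count characters from buf beginning at start.

    Returns the decoded text and the offset of the first unread byte.
    Hypothetical helper: only 1-, 2- and 3-byte sequences are supported.
    """
    pos = start
    for _ in range(char_count):
        if pos >= len(buf):
            raise ValueError("buffer ended before all characters were read")
        lead = buf[pos]
        if lead < 0x80:                 # 0xxxxxxx: one byte
            pos += 1
        elif (lead & 0xE0) == 0xC0:     # 110xxxxx: two bytes
            pos += 2
        elif (lead & 0xF0) == 0xE0:     # 1110xxxx: three bytes
            pos += 3
        else:
            raise ValueError(f"unsupported lead byte 0x{lead:02x} at offset {pos}")
    if pos > len(buf):
        raise ValueError("last character is truncated")
    return buf[start:pos].decode("utf-8"), pos


data = "\u00c5land".encode("utf-8") + b"rest"
text, next_pos = read_chars(data, 0, 5)
print(text, next_pos)
```

In Flex, the byte count found this way (next_pos minus start) could presumably be passed to ByteArray.readUTFBytes() instead of decoding by hand.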

