From: Martin K. <mar...@fe...> - 2008-02-06 08:47:40
Hi there,

there's a bug report open on CPAN RT (http://rt.cpan.org//Ticket/Display.html?id=32952) about the incorrect handling of Unicode strings in SOAP::Lite.

With autotyping enabled, the current SOAP::Lite serializes utf8 strings as base64binary. On deserialization, the utf8 flag is not restored (which is correct, as base64binary data is a sequence of octets). Thus, a utf8 string that was sent arrives at the receiver as a sequence of octets.

There are two suggested resolutions:

1. Don't serialize utf8 strings as base64binary. This only works in perls >= 5.8, as earlier perls have no way to detect utf8 strings.
2. Introduce a "utf8binary" type, which behaves like base64binary, except that the utf8 flag is restored on deserialization.

I prefer 1), as there's no "utf8binary" type in the SOAP standard, and fixing it for perls before 5.8 is pretty useless (these can't handle utf8 data anyway). SOAP 1.2 demands the use of utf-8 or utf-16 in HTTP transports, so there should be no encoding problem, and the transport layer has to re-encode the envelope if needed (for example, as quoted-printable for e-mail).

The problem is that 1) may break existing SOAP::Lite clients and servers that rely on utf8 data being encoded as base64binary.

What do you think?

Regards,
Martin
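
For illustration, the round trip described in the message can be reproduced without SOAP::Lite. The following is only a minimal sketch using the core modules Encode and MIME::Base64 (perl >= 5.8). It is not SOAP::Lite's actual serializer code, and the variable names and sample string are assumptions made for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use MIME::Base64 qw(encode_base64 decode_base64);

# Build a character string with one non-ASCII character (U+00E4).
# Decoding UTF-8 octets yields a string with the utf8 flag set.
my $sent = decode('UTF-8', "M\xc3\xa4rz");

# Sender side (assumed model of base64binary serialization):
# characters -> UTF-8 octets -> base64 text, without line breaks.
my $wire = encode_base64(encode('UTF-8', $sent), '');

# Receiver side: base64 decoding returns plain octets, utf8 flag off.
my $received = decode_base64($wire);

# The receiver has to decode explicitly to get characters back.
my $restored = decode('UTF-8', $received);

for my $pair ([sent => $sent], [received => $received], [restored => $restored]) {
    my ($name, $value) = @$pair;
    printf "%-9s length %d, utf8 flag %s\n",
        $name, length($value), utf8::is_utf8($value) ? 'on' : 'off';
}
```

The received value is longer than the sent one because it holds the UTF-8 octets rather than the characters, and only the explicitly decoded copy compares equal to the original string. This is the step a "utf8binary" type would have to perform on deserialization, and the step that becomes unnecessary if utf8 strings are not serialized as base64binary in the first place.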