#168 serious utf8 bug


I am unable to comment on my bug report because you closed it, so I am opening a new one.

The problem is that utf8 strings are serialized as base64 binary data, so
the deserialized data never has the utf8 flag set.

One solution is to provide a separate serializer for utf8 strings,
something like:

$self->{_typelookup}->{'utf8string'} =
    [5, sub { Encode::is_utf8($_[0]) }, 'as_utf8string'];

sub as_utf8string {
    my ($self, $value, $name, $type, $attr) = @_;

    return [
        $name,
        {'xsi:type' => 'xsd:string', %$attr},
        #SOAP::Utils::encode_data ($value) this does not work for unknown
        HTML::Entities::encode_entities_numeric($value)
    ];
}

Please ask if you do not understand that code. The problem is serious and reproducible, and I even have a fix for it. So why did you close my bug report?



    Martin Kutter - 2010-02-28
    • status: open --> closed-invalid

    Martin Kutter - 2010-02-28

    This is not a bug.

    Base64-encoded strings are sequences of octets, not characters, so SOAP::Lite does not set the utf8 flag on decoded base64 data.

    The problem with your code - which I fully understand - is that it's fatally flawed:
    There is no such thing as a utf8string datatype in the SOAP specs. Introducing such a datatype would break the SOAP specs and thus break interoperability with every other SOAP library on this planet.

    If you want to transport utf8 strings (and receive them back as utf8) just transport them as strings.
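
The octets-versus-characters distinction above can be illustrated with a minimal sketch that uses only the core Encode and MIME::Base64 modules. It does not touch SOAP::Lite at all; the sample string and the variable names are assumptions made for illustration. It shows that base64 decoding yields octets with the utf8 flag off, and that turning them back into characters is a separate, explicit decode step.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# A character string; \x{263A} is above 0xFF, so Perl stores it with the utf8 flag on.
my $text = "Gr\x{FC}\x{DF}e \x{263A}";

# Base64 works on octets, so encode the characters to UTF-8 bytes first.
my $wire = encode_base64(encode('UTF-8', $text), '');

# Decoding base64 gives back octets: the utf8 flag is off.
my $received = decode_base64($wire);

# Turning octets back into characters is an explicit, separate step.
my $decoded = decode('UTF-8', $received);

printf "received: %d octets, utf8 flag %s\n",
    length($received), is_utf8($received) ? 'on' : 'off';
printf "decoded:  %d characters, utf8 flag %s\n",
    length($decoded), is_utf8($decoded) ? 'on' : 'off';
printf "round trip %s\n", $decoded eq $text ? 'ok' : 'FAILED';
```

If a peer does send text as base64 data, the receiving side would need a decode step like the last one; sending the value as a plain string avoids that step entirely.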


