#167 serious utf8 bug

0.71
closed-invalid
Byrne Reese
5
2009-09-29
2009-09-17
Dietmar Maurer
No

Encoding utf8 strings (utf8 flag set) with as_base64 drops the utf8 flag!

Deserializing such a string yields something different from the original string.

I attached an example to show that bug, which also contains a way to do it correctly.

Discussion

  • Dietmar Maurer
    2009-09-17

    buggy encode/decode example

     
  • Dietmar Maurer
    2009-09-17

    The problem is that utf8 strings are serialized as base64 binary data, so the deserialized data never has the utf8 flag set.

    One solution is to provide a separate serializer for utf8 strings, something like:

    # Register a lookup entry that matches strings with the utf8 flag set.
    $self->{_typelookup}->{'utf8string'} =
        [5, sub { Encode::is_utf8($_[0]) }, 'as_utf8string'];

    # Serialize such strings as xsd:string with numeric character entities.
    sub as_utf8string {
        my ($self, $value, $name, $type, $attr) = @_;

        return [
            $name,
            {'xsi:type' => 'xsd:string', %$attr},
            # SOAP::Utils::encode_data($value) does not work here, for unknown reasons
            HTML::Entities::encode_entities_numeric($value)
        ];
    }

    What do you think?

     
  • Martin Kutter
    2009-09-29

    As you correctly observed, the utf8 flag is dropped when encoding utf8 strings as base64. The reason is that base64 transports a sequence of octets, not a utf8 string.

    As XML Schema has no datatype like "utfstring", the suggested solution is not feasible.

    The recommended way of transporting utf8 strings is to use UTF-8 as the character encoding in the content type and simply transmit them as plain strings.

    There is no workaround - every other solution is fundamentally flawed.

    Martin
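
    The octets-versus-characters point in the reply above can be illustrated with a minimal sketch. It does not use SOAP::Lite at all; it only assumes the core Encode and MIME::Base64 modules, and the variable names are made up for illustration. It shows that a base64 round trip hands back octets, and that the receiver gets the original string back only by decoding those octets explicitly.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# A character string with non-ASCII characters; the code point above
# 0xFF forces Perl to store it with the utf8 flag set.
my $original = "Gr\x{fc}\x{df}e \x{263a}";

# base64 carries octets, so the sender first turns characters into
# UTF-8 octets (illustrative step, not SOAP::Lite's own code path).
my $octets  = encode('UTF-8', $original);
my $encoded = encode_base64($octets, '');

# The receiver gets the same octets back, without the utf8 flag.
my $received = decode_base64($encoded);

# Only an explicit decode restores the original character string.
my $restored = decode('UTF-8', $received);

print "utf8 flag on original:   ", (is_utf8($original) ? 1 : 0), "\n";
print "utf8 flag after base64:  ", (is_utf8($received) ? 1 : 0), "\n";
print "received eq original:    ", ($received eq $original ? 1 : 0), "\n";
print "restored eq original:    ", ($restored eq $original ? 1 : 0), "\n";
```

    In this sketch both sides have to agree out of band that the octets are UTF-8 text. Sending the value as a plain string avoids that extra agreement.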

     
  • Martin Kutter
    2009-09-29

    • status: open --> closed-invalid