#167 serious utf8 bug


Encoding utf8 strings (utf8 flag set) with as_base64 drops the utf8 flag!

Deserializing such a string yields something different from the original string.

I attached an example that shows the bug and also contains a way to do it correctly.
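
The attachment is not reproduced here. As a minimal sketch of the symptom, assuming only the core Encode and MIME::Base64 modules and not SOAP::Lite itself (the sample string and variable names are made up for illustration), a round trip through base64 looks roughly like this:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 is_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical sample: the wide character forces Perl to set the utf8 flag.
my $original = "Gr\x{fc}\x{df}e \x{263a}";

# base64 operates on octets, so the characters become UTF-8 bytes first.
my $wire = encode_base64(encode_utf8($original), '');

# Decoding returns plain octets; nothing marks them as UTF-8 text.
my $octets = decode_base64($wire);

printf "original: flag=%d length=%d\n",
    is_utf8($original) ? 1 : 0, length $original;
printf "decoded:  flag=%d length=%d\n",
    is_utf8($octets) ? 1 : 0, length $octets;
print "round trip differs from original\n" if $octets ne $original;
```

The decoded value is a byte string of a different length, which is why it no longer compares equal to the original.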


  • Dietmar Maurer

    Dietmar Maurer - 2009-09-17

    buggy encode/decode example

  • Dietmar Maurer

    Dietmar Maurer - 2009-09-17

    The problem is that utf8 strings are serialized as base64 binary data, so the deserialized data never has the utf8 flag set.

    One solution is to provide a separate serializer for utf8 strings, something like:

    use Encode ();
    use HTML::Entities ();

    $self->{_typelookup}->{'utf8string'} =
        [5, sub { Encode::is_utf8($_[0]) }, 'as_utf8string'];

    sub as_utf8string {
        my ($self, $value, $name, $type, $attr) = @_;

        return [
            $name,  # element name first, as in the other as_* serializers
            {'xsi:type' => 'xsd:string', %$attr},
            # SOAP::Utils::encode_data($value) does not work here, for unknown reasons
            HTML::Entities::encode_entities_numeric($value),
        ];
    }

    What do you think?

  • Martin Kutter

    Martin Kutter - 2009-09-29

    As you correctly observed, the utf8 flag is dropped when encoding utf8 strings as base64. The reason is that base64 transports a sequence of octets, not a utf8 string.

    As XML Schema has no datatype like "utfstring", the suggested solution is not feasible.

    The recommended way of transporting utf8 strings is to use utf8 as the content type and just transmit them as plain strings.
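
    The octets-versus-characters point can be sketched with the core Encode and MIME::Base64 modules. This is an illustration only, not SOAP::Lite code, and the sample bytes are an assumption chosen for the demo: the same base64 payload decodes to different character strings depending on which charset the receiver guesses, and the payload itself does not say which is right.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode);
    use MIME::Base64 qw(encode_base64 decode_base64);

    # Two octets on the wire; base64 carries them without any charset label.
    my $wire   = encode_base64("\xc3\xbc", '');
    my $octets = decode_base64($wire);

    # Interpreted as UTF-8 they form one character (U+00FC) ...
    my $as_utf8 = decode('UTF-8', $octets);

    # ... interpreted as ISO-8859-1 they form two (U+00C3, U+00BC).
    my $as_latin1 = decode('ISO-8859-1', $octets);

    printf "UTF-8: %d character(s), ISO-8859-1: %d character(s)\n",
        length $as_utf8, length $as_latin1;
    ```

    Both interpretations are valid for a base64 binary value, so the deserializer has no basis for setting the utf8 flag on its own.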

    There is no workaround - every other solution is fundamentally flawed.


  • Martin Kutter

    Martin Kutter - 2009-09-29
    • status: open --> closed-invalid
