#112 UTF8 content being returned with incorrect Content-Length

Milestone: 2.0x
Status: open
Owner: nobody
Priority: 5
Updated: 2014-08-14
Created: 2010-08-05
Creator: Anonymous
Private: No

SOAP/WSDL/Server/CGI.pm calculates the Content-Length from the unencoded data, then UTF-8 encodes the data that is returned. I believe it should return the length of the UTF-8 encoded data instead. Here's a diff of the changes I made on my system:

--- CGI.pm 2010-01-05 19:36:29.000000000 -0800
+++ CGI.pm.new 2010-08-05 09:14:12.000000000 -0700
@@ -82,10 +82,11 @@
     else {
         $response = HTTP::Response->new(200);
         $response->header('Content-type' => 'text/xml; charset="utf-8"');
-        $response->content( encode('utf8', $response_message ) );
+        my $response_message_enc = encode('utf8', $response_message);
+        $response->content($response_message_enc);
         {
             use bytes;
-            $response->header('Content-length', length $response_message);
+            $response->header('Content-length', length $response_message_enc);
         }
     }

-Tony <tferrante@barracuda.com>
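
The mismatch described in this report can be illustrated with a minimal sketch that uses only the core Encode module. The sample string and variable names are assumptions chosen for illustration and are not taken from CGI.pm. It compares the character count of a string with the byte count of its UTF-8 encoding, which is the value a Content-Length header needs:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed sample payload: two CJK characters followed by "abcd" (6 characters).
my $response_message = "\x{65e5}\x{672c}abcd";

# Encode once and keep the octets, so body and length come from the same data.
my $octets = encode('UTF-8', $response_message);

my $char_count = length $response_message;   # counts characters
my $byte_count = length $octets;             # counts bytes; UTF8 flag is off

print "characters: $char_count\n";
print "bytes for Content-Length: $byte_count\n";
```

Each CJK character takes three bytes in UTF-8, so the two counts differ whenever the payload contains non-ASCII characters.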

Discussion

  • Martin Kutter

    Martin Kutter - 2010-08-06

    Hi Tony,

    This sounds somewhat strange to me. The "use bytes" in the lower block enables byte semantics, which should cause the Unicode string "ü" to have a length of 2, so your patch should not actually have any effect.
    At least this is what "perldoc bytes" says.

    Do you have a test showing the erroneous behaviour before the patch?

    Thanks for your help,

    Martin
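
A small sketch of the byte-semantics behaviour discussed in this comment, offered as an illustration rather than a statement about what CGI.pm actually receives. The variable names are assumptions. Under "use bytes", length reports the size of Perl's internal representation, so the result for "ü" depends on whether the string is stored internally as Latin-1 or as UTF-8:

```perl
use strict;
use warnings;

# "ü" stored internally as a single Latin-1 byte (UTF8 flag off).
my $latin1 = "\x{fc}";

# The same character, forced into Perl's internal UTF-8 representation.
my $upgraded = "\x{fc}";
utf8::upgrade($upgraded);

# A character above 0xFF is always stored as UTF-8 internally.
my $wide = "\x{65e5}";

my ($len_latin1, $len_upgraded, $len_wide);
{
    use bytes;   # length now counts internal bytes, not characters
    $len_latin1   = length $latin1;
    $len_upgraded = length $upgraded;
    $len_wide     = length $wide;
}

print "latin1: $len_latin1, upgraded: $len_upgraded, wide: $len_wide\n";
```

Because the two representations of the same character give different results, measuring the already encoded octets avoids depending on the internal storage format.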

     
  • Anonymous

    Anonymous - 2010-08-09

    Looking closer, Encode::encode is encoding my content a second time.

    From the Encode documentation:
    CAVEAT: When you run "$octets = encode("utf8", $string)", then
    $octets may not be equal to $string. Though they both contain the
    same data, the UTF8 flag for $octets is always off. When you encode
    anything, UTF8 flag of the result is always off, even when it
    contains completely valid utf8 string. See "The UTF8 flag" below.

    use bytes;
    use Encode qw(encode decode);

    # Bytes that are already the UTF-8 encoding of two CJK characters, plus "abcd"
    my $message = "\x{e6}\x{97}\x{a5}\x{e6}\x{9c}\x{a3}abcd";
    print "message: $message ". length($message) ."\n";

    # Encoding again treats each byte as a Latin-1 character (double encoding)
    my $enc_message = encode('utf-8', $message);
    print "enc_message: $enc_message ". length($enc_message) ."\n";

    # decode utf8 to avoid double encoding issue
    my $dec_message;
    eval {
        $dec_message = decode('utf8', $message);
    };
    if ($@) {
        $dec_message = $message;
    }

    my $enc_message_new = encode('utf-8', $dec_message);
    print "enc_message_new: $enc_message_new ". length($enc_message_new) ."\n";
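
The decode-then-encode workaround above could be wrapped in a helper along the following lines. This is a sketch under stated assumptions: utf8_octets_once is a hypothetical name and is not part of SOAP::WSDL or Encode. It passes a strict CHECK value so that decode dies on input that is not valid UTF-8, since without a CHECK argument decode substitutes a replacement character instead of dying and the eval fallback would not be reached:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: return UTF-8 octets without encoding twice.
sub utf8_octets_once {
    my ($input) = @_;

    # Try to interpret the input as UTF-8 octets. FB_CROAK makes decode die
    # on invalid input; LEAVE_SRC keeps decode from modifying $input.
    my $chars = eval {
        decode('UTF-8', $input, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };

    # Not valid UTF-8 octets (or contains wide characters):
    # assume the input is already a character string.
    $chars = $input unless defined $chars;

    return encode('UTF-8', $chars);
}

my $already_encoded = "\xc3\xbc";   # UTF-8 octets for "ü"
my $latin1_char     = "\x{fc}";     # "ü" as a single character

print length(utf8_octets_once($already_encoded)), "\n";
print length(utf8_octets_once($latin1_char)), "\n";
```

The guess is heuristic: a character string whose code points happen to form a valid UTF-8 byte sequence would be treated as already encoded, so tracking whether data is decoded or encoded at the point where it enters the program is the more reliable approach.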

     
