#112 UTF8 content being returned with incorrect Content-Length

Milestone: 2.0x

Status: open

Owner: nobody

Labels: SOAP message parser (6)

Priority: 5

Updated: 2014-08-14

Created: 2010-08-05

Creator: Anonymous

Private: No

SOAP/WSDL/Server/CGI.pm calculates the Content-Length from the unencoded data, then UTF-8 encodes the data that is returned. I believe it should return the length of the UTF-8 encoded data instead. Here's a diff of the changes made on my system:

--- CGI.pm 2010-01-05 19:36:29.000000000 -0800
+++ CGI.pm.new 2010-08-05 09:14:12.000000000 -0700
@@ -82,10 +82,11 @@
     else {
         $response = HTTP::Response->new(200);
         $response->header('Content-type' => 'text/xml; charset="utf-8"');
-        $response->content( encode('utf8', $response_message ) );
+        my $response_message_enc = encode('utf8', $response_message);
+        $response->content($response_message_enc);
         {
             use bytes;
-            $response->header('Content-length', length $response_message);
+            $response->header('Content-length', length $response_message_enc);
         }
     }

-Tony <tferrante@barracuda.com>
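
To illustrate the mismatch described in this report, here is a minimal standalone sketch. It does not use SOAP::WSDL; the sample body and the variable names are assumptions made up for the example. It only shows that the character count of a string and the octet count of its UTF-8 encoding differ once non-ASCII characters are present.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical response body containing two CJK characters (U+65E5, U+672C).
my $response_message = "<name>\x{65e5}\x{672c}</name>";

# The octets that would actually be sent over the wire.
my $octets = encode('UTF-8', $response_message);

my $char_length = length $response_message;   # counts characters
my $byte_length = length $octets;             # counts encoded octets

print "characters: $char_length\n";
print "octets:     $byte_length\n";
```

An HTTP Content-Length header has to match the second number, because it describes the body as transmitted.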

Discussion

Martin Kutter - 2010-08-06

Hi Tony,

this sounds somewhat strange to me. The "use bytes" in the lower block enables byte semantics - which should cause the unicode string "ü" to have a length of 2 - so actually your patch should not have any effect.
At least this is what "perldoc bytes" says.

Do you have a test showing the erroneous behaviour before the patch?

Thanks for your help,

Martin
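
One possible explanation for the discrepancy, offered as a sketch rather than a confirmed diagnosis of this bug: under "use bytes", length reports the size of Perl's internal representation of the string, and that representation is not always UTF-8. A string whose characters all fit in one byte can be stored in either form. The variable names below are chosen for illustration only.

```perl
use strict;
use warnings;

# "\x{fc}" is the character u-umlaut. Perl stores it as a single byte
# here, because no character in the string needs more than 8 bits.
my $native = "\x{fc}";

# The same character, but with the internal storage switched to UTF-8.
my $upgraded = "\x{fc}";
utf8::upgrade($upgraded);

my ($native_bytes, $upgraded_bytes);
{
    use bytes;
    $native_bytes   = length $native;     # size of the one-byte storage
    $upgraded_bytes = length $upgraded;   # size of the UTF-8 storage
}

print "native:   $native_bytes\n";
print "upgraded: $upgraded_bytes\n";
print "equal as strings: ", ($native eq $upgraded ? "yes" : "no"), "\n";
```

The two strings compare equal, yet their byte lengths differ, so the byte length of the unencoded string is only a reliable Content-Length when the string happens to be stored internally as UTF-8.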

Anonymous - 2010-08-09

Looking closer, Encode::encode is encoding my content a second time.

From the Encode documentation:
CAVEAT: When you run "$octets = encode("utf8", $string)", then
$octets may not be equal to $string. Though they both contain the
same data, the UTF8 flag for $octets is always off. When you encode
anything, UTF8 flag of the result is always off, even when it contains
completely valid utf8 string. See "The UTF8 flag" below.

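use strict;
use warnings;
use bytes;    # length() below counts bytes, not characters
use Encode qw(encode decode);

# Already UTF-8 encoded: two three-byte sequences followed by "abcd"
my $message = "\x{e6}\x{97}\x{a5}\x{e6}\x{9c}\x{a3}abcd";
print "message: $message ". length($message) ."\n";

# Encoding the octets again double-encodes them
my $enc_message = encode('utf-8', $message);
print "enc_message: $enc_message ". length($enc_message) ."\n";

# decode utf8 to avoid double encoding issue
# (without a CHECK argument, decode() substitutes malformed input
# instead of dying, so the fallback below is rarely reached)
my $dec_message;
eval {
    $dec_message = decode('utf8', $message);
};
if ($@) {
    $dec_message = $message;
}

my $enc_message_new = encode('utf-8', $dec_message);
print "enc_message_new: $enc_message_new ". length($enc_message_new) ."\n";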
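
The decode-then-encode workaround above could be made stricter by asking decode() to die on malformed input. The following is a hedged sketch, not part of SOAP::WSDL: to_utf8_octets is a hypothetical helper name, and the approach is a heuristic, since a character string that merely looks like valid UTF-8 octets would be misread as already encoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return UTF-8 octets for $input, whether $input
# holds characters or octets that are already valid UTF-8.
sub to_utf8_octets {
    my ($input) = @_;

    # FB_CROAK makes decode() die on input that is not valid UTF-8;
    # LEAVE_SRC keeps decode() from modifying $input in place.
    my $chars = eval { decode('UTF-8', $input, FB_CROAK | LEAVE_SRC) };

    # If decoding failed, assume $input was a character string already.
    $chars = $input unless defined $chars;

    return encode('UTF-8', $chars);
}

my $already_encoded = "\x{e6}\x{97}\x{a5}abcd";   # UTF-8 octets for U+65E5, then "abcd"
my $characters      = "\x{65e5}abcd";             # the same text as characters

my $from_octets = to_utf8_octets($already_encoded);
my $from_chars  = to_utf8_octets($characters);

print "from octets: ", length($from_octets), "\n";
print "from chars:  ", length($from_chars), "\n";
print "identical:   ", ($from_octets eq $from_chars ? "yes" : "no"), "\n";
```

Both inputs produce the same octets here, so a Content-Length taken from the helper's return value would match the body in either case.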
