SOAP/WSDL/Server/CGI.pm calculates the Content-Length from the unencoded data, but then UTF-8 encodes the data it returns. I believe it should return the length of the UTF-8 encoded data instead. Here's a diff of the changes made on my system:
--- CGI.pm 2010-01-05 19:36:29.000000000 -0800
+++ CGI.pm.new 2010-08-05 09:14:12.000000000 -0700
@@ -82,10 +82,11 @@
else {
$response = HTTP::Response->new(200);
$response->header('Content-type' => 'text/xml; charset="utf-8"');
- $response->content( encode('utf8', $response_message ) );
+ my $response_message_enc = encode('utf8', $response_message);
+ $response->content($response_message_enc);
{
use bytes;
- $response->header('Content-length', length $response_message);
+ $response->header('Content-length', length $response_message_enc);
}
}
-Tony <tferrante@barracuda.com>
Hi Tony,
This sounds somewhat strange to me. The "use bytes" in the lower block enables byte semantics, which should cause the Unicode string "ü" to have a length of 2, so your patch should not actually have any effect.
At least this is what "perldoc bytes" says.
Do you have a test showing the erroneous behaviour before the patch?
Thanks for your help,
Martin
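A minimal sketch of how the two lengths can differ, assuming the response string contains characters in the U+0080 to U+00FF range that Perl still stores internally as single Latin-1 bytes. Under "use bytes", length counts the bytes of the internal representation, so it only matches the encoded length when the string happens to be stored as UTF-8 internally. The variable names below are illustrative and do not come from SOAP::WSDL.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{fc}" is U+00FC (u with diaeresis). Perl stores it as one Latin-1
# byte internally until something upgrades the string.
my $native   = "\x{fc}";
my $upgraded = $native;
utf8::upgrade($upgraded);    # same character, UTF-8 internal storage

# What actually goes over the wire after encoding.
my $octets = encode('UTF-8', $native);

my ($len_native, $len_upgraded);
{
    use bytes;    # byte semantics: length counts internal bytes
    $len_native   = length $native;
    $len_upgraded = length $upgraded;
}
my $len_octets = length $octets;

print "bytes length, native storage:   $len_native\n";
print "bytes length, upgraded storage: $len_upgraded\n";
print "length of encoded octets:       $len_octets\n";
```

The first number is 1 while the other two are 2, so a Content-Length taken from the unencoded string under "use bytes" can come out too short.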
Looking closer, Encode::encode is encoding my content a second time.
From the Encode documentation:
CAVEAT: When you run "$octets = encode("utf8", $string)", then
$octets may not be equal to $string. Though they both contain the
same data, the UTF8 flag for $octets is always off. When you encode
anything, UTF8 flag of the result is always off, even when it contains
completely valid utf8 string. See "The UTF8 flag" below.
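The caveat can be checked with a few lines. This is only a sketch using the core utf8::is_utf8 function to inspect the flag, and the variable names are illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character above U+00FF, so Perl must store it with the UTF8 flag on.
my $string = "\x{65e5}";
my $octets = encode('UTF-8', $string);

my $flag_string = utf8::is_utf8($string) ? 1 : 0;
my $flag_octets = utf8::is_utf8($octets) ? 1 : 0;

print "flag on character string: $flag_string\n";
print "flag on encoded octets:   $flag_octets\n";
print "octet count:              ", length($octets), "\n";
```

Because the flag is off on the result, a later call to encode has no way to tell that the octets are already UTF-8 and will treat each byte as a separate character.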
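# Test script: shows how the length changes when already-encoded octets are encoded again.
use strict;
use warnings;
use bytes;
use Encode qw(encode decode);

# Already UTF-8 encoded: two three-byte sequences followed by "abcd" (10 bytes).
my $message = "\x{e6}\x{97}\x{a5}\x{e6}\x{9c}\x{a3}abcd";
print "message: $message ". length($message) ."\n";

# Encoding again treats every byte as a Latin-1 character (16 bytes).
my $enc_message = encode('utf-8', $message);
print "enc_message: $enc_message ". length($enc_message) ."\n";

# Decode utf8 first to avoid the double encoding issue. FB_CROAK makes decode
# die on malformed input (without a CHECK argument the eval never fails);
# LEAVE_SRC keeps $message unmodified.
my $dec_message = eval {
    decode('utf8', $message, Encode::FB_CROAK | Encode::LEAVE_SRC);
};
# Fall back to the raw message if it was not valid UTF-8.
$dec_message = $message unless defined $dec_message;

# Encoding the decoded characters yields the original 10 bytes again.
my $enc_message_new = encode('utf-8', $dec_message);
print "enc_message_new: $enc_message_new ". length($enc_message_new) ."\n";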
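The idea behind the patch, encoding once and measuring the encoded octets, can be sketched on its own. The helper name body_and_length is hypothetical and not part of SOAP::WSDL, and the sketch assumes the input is a character string that has not been encoded yet:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode the characters exactly once and take the
# length from the resulting octets, so no "use bytes" is needed.
sub body_and_length {
    my ($characters) = @_;
    my $octets = encode('UTF-8', $characters);
    return ($octets, length $octets);
}

# Four characters, one of which needs two bytes in UTF-8.
my ($body, $content_length) = body_and_length("gr\x{fc}n");
print "Content-Length: $content_length\n";
```

Since the octets returned by encode never carry the UTF8 flag, a plain length on them already counts bytes, and the header and the body are guaranteed to agree.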