Re: [Pagekit-users] Character set or parsing issue?

Hi,

On 14.06.2006 at 22:10, Russell D. Weiss wrote:

> Hello all,
>
> Long time no post :-).
>
> We've encountered the following problem in our PageKit application.
> When pulling data from a database and using $model->fillinform to
> populate form fields, we're seeing problems when the data contains
> certain international characters, such as "á" -- as well as
> potentially some others.
>
> Basically, if the word "Testá" is pulled from the database, the HTML
> form field will look like:
>
> <input type="text" value="Test? name="blah">
>
> Obviously, this causes problems, as the value is not terminated
> properly with a quote, and the true value is not shown in the form
> field.  I tested HTML::FillInForm separately and this problem does not
> appear.  The problem may be due to some parsing that PageKit does
> after it runs the page through HTML::FillInForm.
>
> Boris and others, do you have any idea as to what might cause this?
>

I remember that problem. The reason is that the encoding of your
string got lost somewhere. MySQL, for example, always gets this wrong.
PageKit tries to work around the problem by removing the utf8 flag
before passing the data to fillinform and then forcing utf8 on the
result. There might be a bug, or some strings in your page are not in
the proper encoding. The other source of such errors is
Apache::Request, since anything stored there loses the utf8 flag.
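To make the "remove the flag, then force utf8 on the result" idea concrete, here is a minimal sketch using only the Encode module. It is not PageKit's actual code: process_bytes is a hypothetical stand-in for any byte-oriented processing step, and the use of UTF-8 as the octet encoding is an assumption.

```perl
use strict;
use warnings;
use Encode qw(encode decode is_utf8);

# Hypothetical stand-in for a byte-oriented processing step
# (NOT PageKit's or HTML::FillInForm's real code).
sub process_bytes {
    my ($value) = @_;
    return qq{<input type="text" value="$value" name="blah">};
}

my $chars = "Test\x{e1}";                 # 5 characters, the last one is U+00E1
my $bytes = encode( 'UTF-8', $chars );    # 6 octets, utf8 flag off

# Work on plain octets only, then turn the result back into characters.
my $out_bytes = process_bytes($bytes);
my $out_chars = decode( 'UTF-8', $out_bytes );

printf "flag on input octets: %s\n", is_utf8($bytes)     ? "on" : "off";
printf "flag on result:       %s\n", is_utf8($out_chars) ? "on" : "off";
```

The point of the sketch is that the processing step never sees a mix of flagged and unflagged strings, which is where the broken output usually comes from.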

I think some of your input is not in your default_input_charset. The
source of the problem is your database or the input parameters. As far
as I know, PageKit does the right thing for all sorts of input unless
you mix charsets.

I have working solutions for every kind of input, but it takes a while
to explain all of them ;-)

The first is to convert everything I read from the database to
default_input_charset with Encode::decode, or to use Postgres with
pg_enable_utf8 ;-)

The second is to get the right charset for form fields, since browsers
answer in a different charset from time to time. The trick is to send
a hidden field with a known character like 'á' and check that field
first. If it matches 'á' in latin1, you know the encoding of all the
other fields; otherwise compare it against another charset.

The third is to use your own Apache::Request object that handles the
utf8 flag correctly. I can show an example if you like.

It is really hard to handle the charset issue correctly. If there is a
mistake, you get a '?' in place of the character in question. The form
trick is explained, more or less, in my answers to this thread on
PerlMonks:

   http://www.perlmonks.com/index.pl?node_id=401315
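As a rough illustration of the hidden-field trick, the following sketch guesses the charset of a form submission from the raw octets of a sentinel field and then decodes the remaining fields with it. The field name _charset_check and the helpers guess_charset and decode_params are hypothetical, and only latin1 and UTF-8 are checked; a real form would probably need more candidates.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Octets a browser would send for the sentinel character U+00E1,
# keyed by candidate charset. Only these two are checked in this sketch.
my %sentinel_octets = (
    'iso-8859-1' => "\xE1",
    'UTF-8'      => "\xC3\xA1",
);

# Guess the charset from the raw octets of the hidden sentinel field.
# Returns undef if the field is missing or no candidate matches.
sub guess_charset {
    my ($raw_sentinel) = @_;
    return unless defined $raw_sentinel;
    for my $charset ( sort keys %sentinel_octets ) {
        return $charset if $raw_sentinel eq $sentinel_octets{$charset};
    }
    return;
}

# Decode every other field with the detected charset.
# "_charset_check" is a hypothetical name for the hidden field.
sub decode_params {
    my (%raw) = @_;
    my $charset = guess_charset( delete $raw{_charset_check} )
        or die "unknown form charset\n";
    return map { $_ => decode( $charset, $raw{$_} ) } keys %raw;
}

my %params = decode_params(
    _charset_check => "\xC3\xA1",       # browser answered in UTF-8
    name           => "Test\xC3\xA1",
);
print length( $params{name} ), "\n";    # 5 characters
```

Note that a single sentinel cannot tell apart charsets that encode 'á' identically (latin1 and Windows-1252 both use 0xE1), so a sentinel that differs between the charsets you expect is a better choice in that case.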

I know this is really confusing. Which version of PageKit do you use?
I remember there was a change to handle more of the wrong cases. Feel
free to ask more specific questions and I will try to come up with a
better description ;-)

The basic problem is this:

use Encode;
use DBI;

# setup a test database
my $dbh = DBI->connect( "dbi:SQLite:dbname=/tmp/dbfile",
                         "", "", { PrintError => 0, RaiseError => 1 } );
eval { $dbh->do(q{ CREATE TABLE t_storage ( id INTEGER, str VARCHAR(255) ) }) };
eval { $dbh->do(q{ DELETE FROM  t_storage }) };

# and our test strings
my $str      = 'test' . chr(0xe1);              # latin1 string
my $utf8_str = decode( 'iso-8859-1', $str );    # same string, but utf8

compare( "compare  \$str, \$utf8_str:\n", $str, $utf8_str );

# serializing the data into a database removes the utf8 flag; only postgres
# can handle this correctly on request
$dbh->do( q{ INSERT INTO t_storage VALUES ( 1, ? ) }, {}, $str );
my ($str_from_db) =
   $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 1 });
$dbh->do( q{ INSERT INTO t_storage VALUES ( 2, ? ) }, {}, $utf8_str );
my ($utf8_str_from_db) =
   $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 2 });

# compare again
compare( "compare \$str_from_db, \$utf8_str_from_db:\n",
          $str_from_db, $utf8_str_from_db );

# compare again
compare( "compare \$utf8_str, \$utf8_str_from_db:\n",
          $utf8_str, $utf8_str_from_db );

{
   use bytes;
   print "compare binary \$utf8_str, \$utf8_str_from_db:\n";
   print $utf8_str eq $utf8_str_from_db ? "same" : "different", $/, $/;
}

#

########
## Subs
########
sub compare {
   print shift;
   my ( $s1, $s2 ) = @_;

   # compare
   {
     use bytes;
     print length $s1, $/;    # length $str
     print length $s2, $/;    # length $utf8_str
   }

   # surprise for most people
   print $s1 eq $s2 ? "same" : "different", $/, $/;
}
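For comparison, a small sketch of what decoding at the database boundary does to the mismatch shown by the script. It does not use DBI: the round trip is simulated with Encode::encode, on the assumption that the driver stores UTF-8 octets and hands them back with the utf8 flag off. Real drivers differ, so treat this as an illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $utf8_str = decode( 'iso-8859-1', 'test' . chr(0xe1) );    # 5 characters

# Assumption: simulate a driver that stores UTF-8 octets and returns
# them with the utf8 flag off (6 octets for this string).
my $from_db = encode( 'UTF-8', $utf8_str );

# Characters compared against raw octets: not equal.
print $utf8_str eq $from_db ? "same" : "different", "\n";     # different

# Decoding at the boundary turns the octets back into characters.
my $decoded = decode( 'UTF-8', $from_db );
print $utf8_str eq $decoded ? "same" : "different", "\n";     # same
```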

> Thanks,
> Russell

--
Boris