From: Boris Z. <bz...@2b...> - 2006-06-15 21:33:00
Hi,

On 14.06.2006 at 22:10, Russell D. Weiss wrote:

> Hello all,
>
> Long time no post :-).
>
> We've encountered the following problem in our PageKit application. When
> pulling data from a database and using $model->fillinform to populate form
> fields, we're seeing problems when the data contains certain international
> characters, such as "á" -- as well as potentially some others.
>
> Basically, if the word "Testá" is pulled from the database, the HTML form
> field will look like:
>
> <input type="text" value="Test? name="blah">
>
> Obviously, this causes problems, as the value is not terminated properly
> with a quote, and the true value is not shown in the form field. I tested
> HTML::FillInForm separately and this problem does not appear. The problem
> may be due to some parsing that PageKit does after it runs the page
> through HTML::FillInForm.
>
> Boris and others, do you have any idea as to what might cause this?

I remember that problem. The reason is that the encoding of your string got lost somewhere; MySQL, for example, always gets this wrong. PageKit tries to work around the problem by removing the utf8 flag before we pass the data to fillinform, and then forcing utf8 on the result. There might be a bug, or some strings in your page may not be in the proper encoding. The other source of such errors is Apache::Request, since anything stored there loses the utf8 flag.

I think some of your input is not in your default_input_charset. The source of the problem is your database or your input parameters. As far as I know, PageKit does the right thing for all sorts of input unless you mix charsets.
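To illustrate what "losing the encoding" means in practice, here is a minimal sketch using only the core Encode module. It assumes the database hands back raw bytes, and the charset names ('iso-8859-1', 'UTF-8') stand in for whatever default_input_charset is set to; the helper `to_chars` is hypothetical and not part of PageKit. Decoding with the right charset gives the same character string for both byte forms, while decoding latin1 bytes as UTF-8 replaces 'á' with U+FFFD, which typically ends up on the page as a '?'-style placeholder.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn raw bytes (e.g. a value fetched from the
# database) into a Perl character string, given the charset they are in.
sub to_chars {
    my ( $bytes, $charset ) = @_;
    return decode( $charset, $bytes );
}

my $latin1_bytes = "Test\xe1";        # 'Testá' as ISO-8859-1 bytes
my $utf8_bytes   = "Test\xc3\xa1";    # 'Testá' as UTF-8 bytes

# Right charset: both become the same 5-character string "Test\x{e1}".
my $ok_latin1 = to_chars( $latin1_bytes, 'iso-8859-1' );
my $ok_utf8   = to_chars( $utf8_bytes,   'UTF-8' );

# Wrong charset: a lone 0xE1 byte is not valid UTF-8, so Encode
# substitutes the replacement character U+FFFD for it.
my $broken = to_chars( $latin1_bytes, 'UTF-8' );

print $ok_latin1 eq $ok_utf8 ? "same\n" : "different\n";
printf "last char of \$broken: U+%04X\n", ord substr( $broken, -1 );
```

Decoding once, at the point where bytes enter the program, keeps every later comparison and form fill-in working on characters rather than on bytes of unknown origin.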
I have working solutions for any kind of input, but it takes a while to explain all of them ;-)

One is to convert everything I read from the database to default_input_charset with Encode::decode, or to use PostgreSQL with pg_enable_utf8 ;-)

The next one is to determine the right charset for form fields, since browsers answer in a different charset from time to time. The trick is to send a hidden field with known characters like 'á' and check that field first. If it is the same as 'á' in latin1, you know the encoding of all the other fields; otherwise, compare it against another charset.

The third point is to use your own Apache::Request object that handles the utf8 flag correctly. I can show an example if you like.

It is really hard to handle the charset issue correctly. If there is a mistake, you get a '?' for the character in question. The form trick is explained to some extent in my answers in this thread on PerlMonks: http://www.perlmonks.com/index.pl?node_id=401315

I know it is confusing. What version of PageKit do you use? I remember there was a change to handle more of the broken cases. Feel free to ask something more specific and I will try to come up with a better description ;-)

The basic problem is this:

    use strict;
    use warnings;
    use Encode;
    use DBI;

    # set up a test database
    my $dbh = DBI->connect( "dbi:SQLite:dbname=/tmp/dbfile", "", "",
        { PrintError => 0, RaiseError => 1 } );
    eval { $dbh->do(q{ CREATE TABLE t_storage ( id INTEGER, str VARCHAR(255) ) }) };
    eval { $dbh->do(q{ DELETE FROM t_storage }) };

    # and our test strings
    my $str      = 'test' . chr(0xe1);           # latin1 string
    my $utf8_str = decode( 'iso-8859-1', $str );  # same string, but utf8

    compare( "compare \$str, \$utf8_str:\n", $str, $utf8_str );

    # serializing the data into a database removes the utf8 flag; only
    # PostgreSQL can handle this correctly, on request (pg_enable_utf8)
    $dbh->do( q{ INSERT INTO t_storage VALUES ( 1, ? ) }, {}, $str );
    my ($str_from_db) =
      $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 1 });

    $dbh->do( q{ INSERT INTO t_storage VALUES ( 2, ? ) }, {}, $utf8_str );
    my ($utf8_str_from_db) =
      $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 2 });

    # compare again
    compare( "compare \$str_from_db, \$utf8_str_from_db:\n",
        $str_from_db, $utf8_str_from_db );

    # compare again
    compare( "compare \$utf8_str, \$utf8_str_from_db:\n",
        $utf8_str, $utf8_str_from_db );

    {
        use bytes;
        print "compare binary \$utf8_str, \$utf8_str_from_db:\n";
        print $utf8_str eq $utf8_str_from_db ? "same" : "different", $/, $/;
    }

    ########
    ## Subs
    ########

    sub compare {
        print shift;
        my ( $s1, $s2 ) = @_;

        # compare the byte lengths
        {
            use bytes;
            print length $s1, $/;    # byte length of the first string
            print length $s2, $/;    # byte length of the second string
        }

        # surprise for most people: eq compares characters, not bytes
        print $s1 eq $s2 ? "same" : "different", $/, $/;
    }

> Thanks,
> Russell

--
Boris
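A minimal sketch of the hidden-field trick described above, using only the core Encode module. Everything named here is an assumption for illustration, not PageKit, Apache::Request, or PerlMonks code: the hidden field name `_probe`, the helpers `detect_form_charset` and `decode_form`, the candidate charset list, and the premise that the raw, undecoded bytes of every submitted field are available.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical setup: every form carries
#   <input type="hidden" name="_probe" value="á">
# and the raw (undecoded) bytes of all fields are available.
my $PROBE_CHAR = "\x{e1}";                   # 'á'
my @CANDIDATES = ( 'UTF-8', 'iso-8859-1' );  # charsets to try, in order

# Return the first candidate charset whose encoding of 'á' matches the
# bytes the browser sent back for the probe field, or undef if none does.
sub detect_form_charset {
    my ($raw_probe) = @_;
    return unless defined $raw_probe;
    for my $charset (@CANDIDATES) {
        return $charset if $raw_probe eq encode( $charset, $PROBE_CHAR );
    }
    return;
}

# Decode all raw fields with the charset detected from the probe field.
sub decode_form {
    my (%raw) = @_;
    my $charset = detect_form_charset( $raw{_probe} )
        or die "cannot determine form charset\n";
    return map { $_ => decode( $charset, $raw{$_} ) } keys %raw;
}

# Usage: a browser that answered in latin1 sends "Test\xe1" for 'Testá'.
my %form = decode_form( _probe => "\xe1", name => "Test\xe1" );
print length( $form{name} ), "\n";
```

Because the probe is checked before any other field is touched, a form submitted in an unexpected charset is either decoded correctly or rejected outright, instead of silently producing '?' characters. With more candidates (say, a Windows codepage that also encodes 'á' as a single 0xE1 byte), the order of `@CANDIDATES` decides which of the matching charsets wins.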