From: Boris Z. <bz...@2b...> - 2006-06-21 21:48:21
Hi Russell,

On 21.06.2006 at 20:48, Russell D. Weiss wrote:

> Thanks, Boris.
>
> It's a fairly large application, so modifying the entire app to use the
> "trick" to detect the encoding that the browser is using would work, but
> might be a bit of a pain to build-in to the entire app.
>

Not really. Just build your own class that acts like Apache2::Request and
point A::P in your global config to that class (see: request_class). That
way you have only one place to edit for your whole app.

> Actually, using Encode::encode and encoding to UTF8 before outputting via
> $model->fillinform() seemed to work and "fix" the output. I'm not sure that

Just override fillinform. This might fix the problem, but the source of the
error still remains. If all your data is in the right format, this is not
needed.

> this is the "proper" way of doing things though. Rather than modifying the
> entire app and each "fillinform()" call, I was considering modifying the

No, just replace fillinform in your model code and call
$m->SUPER::fillinform later.

> PageKit code that calls HTML::FillInForm and wrap the Encode::encode call
> into there. I'm not sure if this is a great way to go, or if it is safe in
> general. Actually, I don't pretend to be all that good when it comes to
> character sets and encoding in general.
>

Character encoding is a big pain. Your goal must be that all input is in
one charset. My choice is always utf8. This means that my database provides
utf8 data, my templates are in utf8, and my data too. Whenever my data comes
from an unsure data source like a database, I check once whether the utf8
flag is set for utf8 data. If so, I'm happy. Otherwise I check whether the
binary data is utf8. If so, I just set the utf8 flag and I'm done. Otherwise
I convert the data with the Encode::decode function. All this has to be done
only once.

Form data is a bit more difficult and needs more checks.
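A rough sketch of the three-step check described above, not PageKit code.
The helper name to_utf8_chars and the latin1 fallback charset are
assumptions made for illustration. Instead of setting the flag by hand with
Encode::_utf8_on, the sketch lets a strict Encode::decode validate the bytes
and return the flagged string in one step.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return a character string for data that came from
# an unsure source. $fallback is the charset assumed when the bytes are
# not valid UTF-8 (latin1 here is an assumption, use your own default).
sub to_utf8_chars {
    my ( $data, $fallback ) = @_;
    $fallback ||= 'iso-8859-1';
    return $data unless defined $data;

    # 1. utf8 flag already set: nothing to do
    return $data if Encode::is_utf8($data);

    # 2. flag not set, but the bytes are well-formed UTF-8: decode them.
    #    FB_CROAK makes decode() die on malformed input, and decode() may
    #    modify its argument when a CHECK value is given, so use a copy.
    my $bytes = $data;
    my $chars = eval { Encode::decode( 'UTF-8', $bytes, Encode::FB_CROAK ) };
    return $chars if defined $chars;

    # 3. not UTF-8 at all: convert from the fallback charset
    return Encode::decode( $fallback, $data );
}

# Both byte forms of the same word end up as the same character string.
my $from_latin1 = to_utf8_chars( "test\xe1" );        # latin1 bytes
my $from_utf8   = to_utf8_chars( "test\xc3\xa1" );    # utf8 bytes
print $from_latin1 eq $from_utf8 ? "same" : "different", "\n";
```

One caveat with this approach: latin1 bytes that happen to form a valid
UTF-8 sequence are taken as UTF-8 in step 2, so it only works when the
possible source charsets are known.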
The A::P::R request class is where those form-data checks belong. For an
example, look at the first lines of the Apache2/PageKit.pm file.

> What are your thoughts on this?

Your big helpers here are

  Encode::decode, Encode::encode, Encode::is_utf8, Encode::_utf8_on,
  Encode::_utf8_off

  Devel::Peek::Dump

and the DBI qw(:utils) functions

  DBI::data_string_desc
  DBI::data_diff

If you put some hours into your request_class, most if not all of your
problems go away.

>
> Thanks,
> Russell
>
> ----- Original Message -----
> From: "Boris Zentner" <bz...@2b...>
> To: "Russell D. Weiss" <rw...@in...>
> Cc: <pag...@li...>
> Sent: Thursday, June 15, 2006 5:32 PM
> Subject: Re: [Pagekit-users] Character set or parsing issue?
>
>
> Hi,
>
> On 14.06.2006 at 22:10, Russell D. Weiss wrote:
>
>> Hello all,
>>
>> Long time no post :-).
>>
>> We've encountered the following problem in our PageKit application. When
>> pulling data from a database and using $model->fillinform to populate form
>> fields, we're seeing problems when the data contains certain international
>> characters, such as "á" -- as well as potentially some others.
>>
>> Basically, if the word "Testá" is pulled from the database, the HTML form
>> field will look like:
>>
>> <input type="text" value="Test? name="blah">
>>
>> Obviously, this causes problems, as the value is not terminated properly
>> with a quote, and the true value is not shown in the form field. I tested
>> HTML::FillInForm separately and this problem does not appear. The problem
>> may be due to some parsing that pagekit does after it runs the page
>> through HTML::FillInForm.
>>
>> Boris and others, do you have any idea as to what might cause this?
>>
>
> I remember that problem. The reason is that you lost the encoding of
> your string somewhere. mysql, for example, always gets it wrong.
> PageKit tries to work around the problem by removing the utf8 flag
> before we pass the data to fillinform, and forcing utf8 on the result.
> There might be a bug, or some strings in your page are not in the
> proper encoding. The other source of such errors is Apache::Request,
> since anything stored there loses the utf8 flag.
>
> I think your input is somewhere not in your default_input_charset. The
> source of the problem is your database or the input params. PageKit does
> the right thing as far as I know for all sorts of input unless you
> mix the charsets.
>
> I have working solutions for any input, but it takes a while to
> explain all of them ;-) One is to convert all my inputs from the
> database to default_input_charset with Encode::decode, or I use
> postgres with pg_enable_utf8 ;-) The other one is the right charset
> from __from__ fields, since browsers answer in a different charset
> from time to time. The trick is to send a hidden field with known
> chars like 'á' and check that first. If it is the same as 'á' in
> latin1, you know the encoding for all other fields easily enough;
> otherwise compare to another charset. The third point is to use your own
> Apache::Request object to handle the utf8 flag correctly. I can show
> an example if you like. It is really hard to handle the charset issue
> correctly. If there is a mistake, you get a '?' for the char in
> question. The form trick is explained somewhat in my answers to this
> thread on pm:
>
> http://www.perlmonks.com/index.pl?node_id=401315
>
> I really know it is confusing. What version of pagekit do you use? I
> remember there was a change to handle more wrong cases. Feel free to
> ask more specifically and I will try to come up with a better
> description ;-)
>
> The basic problem is this:
>
> use Encode;
> use DBI;
>
> # set up a test database
> my $dbh = DBI->connect( "dbi:SQLite:dbname=/tmp/dbfile",
>     "", "", { PrintError => 0, RaiseError => 1 } );
> eval { $dbh->do(q{ CREATE TABLE t_storage ( id INTEGER, str VARCHAR(255) ) }) };
> eval { $dbh->do(q{ DELETE FROM t_storage }) };
>
> # and our test strings
> my $str      = 'test' . chr(0xe1);              # latin1 string
> my $utf8_str = decode( 'iso-8859-1', $str );    # same string, but utf8
>
> compare( "compare \$str, \$utf8_str:\n", $str, $utf8_str );
>
> # serializing the data into a database removes the utf8 flag; only
> # postgres can handle this correctly on request
> $dbh->do( q{ INSERT INTO t_storage VALUES ( 1, ? ) }, {}, $str );
> my ($str_from_db) =
>   $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 1 });
> $dbh->do( q{ INSERT INTO t_storage VALUES ( 2, ? ) }, {}, $utf8_str );
> my ($utf8_str_from_db) =
>   $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 2 });
>
> # compare again
> compare( "compare \$str_from_db, \$utf8_str_from_db:\n",
>     $str_from_db, $utf8_str_from_db );
>
> # compare again
> compare( "compare \$utf8_str, \$utf8_str_from_db:\n",
>     $utf8_str, $utf8_str_from_db );
>
> {
>     use bytes;
>     print "compare binary \$utf8_str, \$utf8_str_from_db:\n";
>     print $utf8_str eq $utf8_str_from_db ? "same" : "different", $/, $/;
> }
>
> #
>
> ########
> ## Subs
> ########
> sub compare {
>     print shift;
>     my ( $s1, $s2 ) = @_;
>
>     # compare byte lengths
>     {
>         use bytes;
>         print length $s1, $/;    # length $str
>         print length $s2, $/;    # length $utf8_str
>     }
>
>     # surprise for most people
>     print $s1 eq $s2 ? "same" : "different", $/, $/;
> }
>
>
>> Thanks,
>> Russell
>>
>> _______________________________________________
>> Pagekit-users mailing list
>> Pag...@li...
>> https://lists.sourceforge.net/lists/listinfo/pagekit-users
>
> --
> Boris

--
Boris