From: Russell D. W. <rw...@in...> - 2006-06-21 18:49:12
Thanks, Boris.

It's a fairly large application, so modifying the entire app to use the "trick" to detect the encoding that the browser is using would work, but it might be a bit of a pain to build into the entire app.

Actually, using Encode::encode to encode to UTF-8 before outputting via $model->fillinform() seemed to work and "fix" the output. I'm not sure that this is the "proper" way of doing things, though. Rather than modifying the entire app and each fillinform() call, I was considering modifying the PageKit code that calls HTML::FillInForm and wrapping the Encode::encode call in there. I'm not sure if this is a great way to go, or if it is safe in general.

Actually, I don't pretend to be all that good when it comes to character sets and encoding in general.

What are your thoughts on this?

Thanks,
Russell

----- Original Message -----
From: "Boris Zentner" <bz...@2b...>
To: "Russell D. Weiss" <rw...@in...>
Cc: <pag...@li...>
Sent: Thursday, June 15, 2006 5:32 PM
Subject: Re: [Pagekit-users] Character set or parsing issue?

Hi,

On 14.06.2006 at 22:10, Russell D. Weiss wrote:

> Hello all,
>
> Long time no post :-).
>
> We've encountered the following problem in our PageKit application. When
> pulling data from a database and using $model->fillinform to populate form
> fields, we're seeing problems when the data contains certain international
> characters, such as "á" -- as well as potentially some others.
>
> Basically, if the word "Testá" is pulled from the database, the HTML form
> field will look like:
>
> <input type="text" value="Test? name="blah">
>
> Obviously, this causes problems, as the value is not terminated properly
> with a quote, and the true value is not shown in the form field. I tested
> HTML::FillInForm separately and this problem does not appear. The problem
> may be due to some parsing that PageKit does after it runs the page
> through HTML::FillInForm.
>
> Boris and others, do you have any idea as to what might cause this?

I remember that problem. The reason is that you lost the encoding of your string somewhere. MySQL, for example, always gets this wrong. PageKit tries to work around the problem by removing the utf8 flag before we pass the data to fillinform, and forcing utf8 on the result. There might be a bug, or some strings in your page are not in the proper encoding. The other source of such errors is Apache::Request, since anything stored there loses the utf8 flag.

I think your input is somewhere not in your default_input_charset. The source of the problem is your database or your input params. PageKit does the right thing, as far as I know, for all sorts of input unless you mix the charsets.

I have working solutions for any input, but it takes a while to explain all of them ;-)

One is to convert all my inputs from the database to default_input_charset with Encode::decode, or I use postgres with pg_enable_utf8 ;-)

The other one is to get the right charset from form fields, since browsers answer in a different charset from time to time. The trick is to send a hidden field with known chars like 'á' and check that first. If it is the same as 'á' in latin1, you know the encoding for all the other fields easily enough; otherwise compare against another charset.

The third point is to use your own Apache::Request object to handle the utf8 flag correctly. I can show an example if you like.

It is really hard to handle the charset issue correctly. If there is a mistake, you get a '?' for the char in question. The form trick is explained, more or less, in my answers to this thread on PerlMonks:

http://www.perlmonks.com/index.pl?node_id=401315

I know it is really confusing. What version of PageKit do you use? I remember there was a change to handle more wrong cases.
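
The hidden-field trick described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from PageKit or from the PerlMonks thread: the helper names guess_form_charset and decode_fields are hypothetical, only latin1 and UTF-8 are checked, and the raw field values are assumed to arrive as undecoded byte strings.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess the charset the browser used from the raw bytes
# of a hidden sentinel field that was sent out containing 'á' (U+00E1).
sub guess_form_charset {
    my ($sentinel_bytes) = @_;
    return 'iso-8859-1' if $sentinel_bytes eq "\xE1";      # one byte in latin1
    return 'UTF-8'      if $sentinel_bytes eq "\xC3\xA1";  # two bytes in UTF-8
    return;                                                # unknown: caller decides
}

# Hypothetical helper: decode every raw field with the detected charset.
sub decode_fields {
    my ( $charset, %raw ) = @_;
    my %decoded;
    for my $name ( keys %raw ) {
        my $bytes = $raw{$name};    # work on a copy of the raw bytes
        $decoded{$name} = decode( $charset, $bytes );
    }
    return %decoded;
}

# Example: the browser answered in UTF-8, so the sentinel arrives as two bytes.
my %raw     = ( name => "Test\xC3\xA1" );
my $charset = guess_form_charset("\xC3\xA1");
my %fields  = decode_fields( $charset, %raw );
printf "%s: %d characters\n", $charset, length $fields{name};    # UTF-8: 5 characters
```

If the sentinel matches neither pattern, the sketch returns nothing and leaves the fallback (for example, default_input_charset) to the caller.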

Feel free to ask more specific questions and I will try to come up with a better description ;-)

The basic problem is this:

use Encode;
use DBI;

# set up a test database
my $dbh = DBI->connect( "dbi:SQLite:dbname=/tmp/dbfile", "", "",
    { PrintError => 0, RaiseError => 1 } );

# ignore the error if the table already exists
eval {
    $dbh->do(q{ CREATE TABLE t_storage ( id INTEGER, str VARCHAR (255) ) });
};
eval { $dbh->do(q{ DELETE FROM t_storage }) };

# and our test strings
my $str      = 'test' . chr(0xe1);            # latin1 string
my $utf8_str = decode( 'iso-8859-1', $str );  # same string, but utf8

compare( "compare \$str, \$utf8_str:\n", $str, $utf8_str );

# serializing the data into a database removes the utf8 flag;
# only postgres can handle this correctly on request
$dbh->do( q{ INSERT INTO t_storage VALUES ( 1, ? ) }, {}, $str );
my ($str_from_db) =
    $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 1 });

$dbh->do( q{ INSERT INTO t_storage VALUES ( 2, ? ) }, {}, $utf8_str );
my ($utf8_str_from_db) =
    $dbh->selectrow_array(q{ SELECT str FROM t_storage WHERE id = 2 });

# compare again
compare( "compare \$str_from_db, \$utf8_str_from_db:\n",
    $str_from_db, $utf8_str_from_db );

# compare again
compare( "compare \$utf8_str, \$utf8_str_from_db:\n",
    $utf8_str, $utf8_str_from_db );

{
    use bytes;
    print "compare binary \$utf8_str, \$utf8_str_from_db:\n";
    print $utf8_str eq $utf8_str_from_db ? "same" : "different", $/, $/;
}

########
## Subs
########

sub compare {
    print shift;
    my ( $s1, $s2 ) = @_;

    # compare the byte lengths
    {
        use bytes;
        print length $s1, $/;    # length $str
        print length $s2, $/;    # length $utf8_str
    }

    # surprise for most people
    print $s1 eq $s2 ? "same" : "different", $/, $/;
}

> Thanks,
> Russell

--
Boris

_______________________________________________
Pagekit-users mailing list
Pag...@li...
https://lists.sourceforge.net/lists/listinfo/pagekit-users
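
The same effect can be reproduced without a database, which may make the decode-on-input and encode-on-output options discussed in this thread easier to see. This is a minimal sketch under one loud assumption: the storage layer is simulated by Encode::encode, i.e. it is assumed to hand back the UTF-8 bytes with the utf8 flag dropped. Real drivers may behave differently.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $latin1 = 'test' . chr(0xE1);                # byte string, 5 bytes
my $chars  = decode( 'iso-8859-1', $latin1 );   # character string, 5 characters

# Assumption: simulate a round trip through storage that keeps the UTF-8
# bytes but forgets that they are UTF-8 (no utf8 flag on the result).
my $from_db = encode( 'UTF-8', $chars );        # 6 bytes

# Without decoding, the strings no longer compare equal.
print $chars eq $from_db ? "same" : "different", "\n";     # different

# Decoding at the input boundary restores the character string.
my $restored = decode( 'UTF-8', $from_db );
print $chars eq $restored ? "same" : "different", "\n";    # same

# Encoding once, right before output, yields well-defined UTF-8 bytes.
my $out = encode( 'UTF-8', $restored );
print length($out), "\n";                                  # 6
```

The design choice here is to keep character strings inside the application and convert only at the edges: decode once when data comes in, encode once when it goes out.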