From: Shimon R. <sh...@ru...> - 2005-11-23 15:43:19
Erik,

Unfortunately, I don't think there is a perfect solution to this. The browser is supposed to submit forms using the encoding you served the page in, but there are so many levels of second-guessing about character encodings that this isn't guaranteed.

For my site voo2do.com, I decided that if things weren't going to work perfectly, I might as well keep them simple. So I took the all-UTF-8 approach: I hacked PageKit to always send pages in UTF-8, regardless of what the browser requested. Now people whose browsers don't support UTF-8 can't use non-ASCII characters on my site... but the site has plenty of JavaScript that won't work on old browsers anyway, so that probably doesn't hurt anyone.

This isn't perfect, but serving in a non-Unicode encoding is problematic too. With an unhacked PageKit, my site would be served in Latin-1, because my browser prefers it to UTF-8 for some reason. If I type some non-Latin-1 characters, my browser will send HTML entity codes. Of course, there is no way to distinguish whether the user actually meant to type %u10123 or whether that's a trick the browser pulled. So I think it's best to just make everything Unicode.

A reasonable alternative might be to hack PageKit to serve in UTF-8 as long as it's one of the browser-supported encodings (even when it's not the preferred one), and only recode if UTF-8 is not supported at all. Then perhaps you have a slightly better chance of serving pre-UTF-8 browsers.

Good luck, and let us know how it goes.

shimon.

On 11/23/05, Erik Günther <eri...@bo...> wrote:
>
> Hi
>
> I have played with PageKit for some time now, and now I would like to
> have a site that uses UTF-8 internally. But how do I do that? The easy
> part is to have all files in UTF-8, save to the DB in UTF-8, and so
> on. But PageKit is smart and sends the page in the encoding the browser
> prefers. That is not a problem. But how do I handle the input from a
> form?
>
> I mean, how do I know what character encoding the web browser is sending
> in? I can't trust the outgoing encoding, because that is trivial to
> change in any browser. AFAIK there is no certain way to tell the encoding
> just by looking at the string.
>
> What are you doing to fix this? On my previous site I "converted"
> everything to Latin-1, but that was just an ugly hack. utf8::is_utf8() and
> Encode::is_utf8() won't help; they return false for every string passed
> by Apache. :/
>
> One way is to block PageKit and send everything in UTF-8, because most
> often the browser will send the reply in UTF-8... but that solution
> isn't bulletproof. The user can still send in e.g. Latin-1, or the
> browser may not handle UTF-8 (rare).
>
> Any ideas?
>
> --
> /erikg
>
> Erik Günther eri...@bo...
> System Developer Bokus AB
> +46 (0)40 - 35 21 19 icq: 160744619
>
> Fortune:
> 'Course, I haven't weighed in yet. :-)
> -- Larry Wall in <199...@wa...>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.4 (GNU/Linux)
>
> iD8DBQBDhG03q1HQ7Yl9BM8RAqUoAJ9cjBKEBmF1GSmMfMMJEPlHDf2mQQCfWXH6
> 3V6AtwghzOqYdFWEcf4fdb8=
> =4QYG
> -----END PGP SIGNATURE-----
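
P.S. For illustration, the alternative described in the reply (serve UTF-8 whenever the browser lists it as acceptable, even if it is not the preferred charset, and only fall back otherwise) could be sketched roughly as below. This is not PageKit code: it is a language-neutral sketch written in Python, the function names parse_accept_charset and choose_charset are made up for this example, and the ISO-8859-1 fallback is an assumption. It only looks at the Accept-Charset request header and its q-values.

```python
def parse_accept_charset(header):
    """Parse an Accept-Charset value into {lowercased charset: q-value}."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0  # a charset listed without a q parameter has q=1
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name.strip().lower()] = q
    return prefs


def choose_charset(header, fallback="iso-8859-1"):
    """Pick UTF-8 if the client accepts it at all, else its best alternative.

    Hypothetical helper: 'fallback' is an assumed default, not a PageKit setting.
    """
    if not header or not header.strip():
        return "utf-8"  # no header means the client states no restriction
    prefs = parse_accept_charset(header)
    # An explicit utf-8 entry wins over the "*" wildcard entry.
    utf8_q = prefs.get("utf-8", prefs.get("*", 0.0))
    if utf8_q > 0:
        return "utf-8"
    # UTF-8 is not acceptable: take the highest-ranked named charset instead.
    acceptable = [(name, q) for name, q in prefs.items() if q > 0 and name != "*"]
    if acceptable:
        return max(acceptable, key=lambda pair: pair[1])[0]
    return fallback


if __name__ == "__main__":
    # A browser that prefers Latin-1 but also accepts UTF-8.
    print(choose_charset("ISO-8859-1,utf-8;q=0.7,*;q=0.7"))
```

With a header like the one in the last lines, the sketch picks UTF-8 even though Latin-1 is ranked higher, which is the behaviour the reply suggests. It does nothing about what the browser later sends back in a form; that still has to be decoded and validated on the server side.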