From: Kutter M. <mar...@si...> - 2004-09-08 12:22:31
Hi Teemu, Hi *,

looks like we just found a not-so-small issue... While SPOPS::Tool::UTFConvert can handle conversion for SPOPS backends, there's nothing like that for the OI2 frontends (say, Template::Toolkit and the like).

My suggestion for the "whole OI2 i18n charset encoding" would be:

1. get the charset from the request
2. convert all parameters to UTF-8 when fetching them in the request object (all but uploads)
3. set the Content-Type charset="foo" header on the response (if needed)
4. encode all output in the Response object to the appropriate charset just before sending it (if needed)

Step 4 would probably be an issue for the Controller - OI2::Controller::Raw should never re-code anything, and alternative controllers - say, one for outputting PDFs - probably shouldn't recode their output either.

This would allow OI2 to use UTF-8 only in its internal processing, but serve frontends with potentially different character encodings. It would also remove the need for charset conversions in the SPOPS backends (as long as the backends are UTF-8 capable - most Perl modules should be): they would already hold the data in the appropriate form, and the number of supported charsets would go far beyond the current sad 'Latin1'.

Regards,

Martin

-----Original Message-----
From: Teemu Arina [mailto:te...@di...]
Sent: Wednesday, 8 September 2004 11:56
To: ope...@li...
Cc: Kutter Martin; 'ope...@li...'
Subject: Re: [Openinteract-dev] Small i18n issue

> I've been able to track this down to a charset problem.
> LDAP expects directoryString attributes to be in UTF-8 encoding. The
> perl-ldap interface (Net::LDAP) does not provide UTF-8 conversions by
> default, so these are to be done by the application using Net::LDAP. This
> is no big deal - just a
>
>   use Encode;
>   $value = decode($charset, $value);

I had a similar problem with DBD::mysql and UTF-8. DBI has no general policy for UTF-8, so it has to be implemented by the DBDs themselves, and DBD::mysql does nothing about it. If you store UTF-8 strings in the database and retrieve them back, the returned strings are not marked as UTF-8. Later they may get encoded to UTF-8 a second time, precisely because Perl never marked them as UTF-8 in the first place =) The Encode module helps fix this, for example in SPOPS::DBI::MySQL (define a post_fetch_action() that converts all data fields).

I wonder when you will be able to write UTF-8 compatible software without mucking with the internals on several layers... UTF-8 has been around for so many years and still many module writers ignore it. It also wasn't until MySQL 4.x that UTF-8 support was included for character type fields.

> The problem is that the only available way to get the charset used in
> the request is to grab it from the underlying Apache::Request or
> CGI::Request handle - not really easy and not really portable:
>
>   my $contentHeader = CTX->request->apache->headers_in()->{ 'Content-Type' };

I noticed the same thing. I also found another way to set it:

  CTX->response->content_type( 'text/html; charset=utf-8' );

CTX->response->charset() would be nice to have.

Greetings,

- Teemu
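
A minimal sketch of what steps 2 and 4 of Martin's proposal might look like in plain Perl, using only the Encode module already mentioned in the thread. The helper names, the hashref of parameters, and the way the charset is passed in are illustrative assumptions, not existing OI2 API; a Raw or PDF controller would simply skip the second helper.

  use strict;
  use Encode qw( decode encode );

  # Step 2: turn raw parameter bytes (in the request charset) into
  # Perl-internal (UTF-8-flagged) strings right after fetching them.
  sub decode_request_params {
      my ( $charset, $params ) = @_;   # e.g. 'iso-8859-1', hashref of raw values
      for my $name ( keys %$params ) {
          $params->{ $name } = decode( $charset, $params->{ $name } );
      }
      return $params;
  }

  # Step 4: just before sending, encode the generated content from
  # Perl-internal strings into the charset advertised in Content-Type.
  sub encode_response_body {
      my ( $charset, $content ) = @_;  # e.g. 'iso-8859-1', generated page
      return encode( $charset, $content );
  }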
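
A rough sketch of the kind of conversion Teemu describes for SPOPS::DBI::MySQL: after a fetch, mark the raw bytes coming back from DBD::mysql as UTF-8 so Perl does not encode them a second time. How this gets wired into a post_fetch_action rule, and how the field list is obtained, are assumptions about the SPOPS class in question rather than documented SPOPS behaviour.

  use strict;
  use Encode qw( decode );

  sub utf8_post_fetch {
      my ( $object, @fields ) = @_;    # blessed hashref plus its data fields
      for my $field ( @fields ) {
          next unless defined $object->{ $field };
          # raw bytes from the database -> Perl-internal UTF-8 string
          $object->{ $field } = decode( 'utf-8', $object->{ $field } );
      }
      return 1;                        # ruleset actions return true on success
  }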
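
A small illustration of the CTX->response->charset() idea: pull the charset out of a Content-Type header value with a regex instead of reaching into the Apache::Request/CGI handle directly. The header value is passed in as a plain string here; how to obtain it portably is exactly the open question in the thread.

  use strict;

  sub charset_from_content_type {
      my ( $content_type ) = @_;       # e.g. 'text/html; charset=utf-8'
      return unless defined $content_type;
      if ( $content_type =~ /charset\s*=\s*"?([\w.-]+)"?/i ) {
          return lc $1;                # 'utf-8'
      }
      return;                          # no charset parameter present
  }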