From: Kutter M. <mar...@si...> - 2004-09-08 12:22:31
Hi Teemu, Hi *,

looks like we just found a not-so-small issue... While SPOPS::Tool::UTFConvert can handle conversion for SPOPS backends, there's nothing like that for the OI2 frontends (say, Template::Toolkit and the like).

My suggestion for the "whole OI2 i18n charset encoding" would be:

1. get the charset from the request
2. convert all parameters to UTF-8 when fetching them in the request object (all but uploads)
3. set the Content-Type charset="foo" header on the response (if needed)
4. encode all output in the Response object to the appropriate charset just before sending it (if needed)

Step 4 would probably be an issue for the Controller - OI2::Controller::Raw should never re-code anything, and alternative controllers - say, one for outputting PDFs - probably shouldn't recode their output either.

This would allow OI2 to use UTF-8 only in its internal processing, but serve frontends with potentially different character encodings. It would also remove the need for charset conversions in the SPOPS backends (as long as the backends are UTF-8 capable - most Perl modules should be): they would already hold the data in the appropriate form, and the number of supported charsets would go far beyond the current sad 'Latin1'.

Regards,

Martin

-----Original Message-----
From: Teemu Arina [mailto:te...@di...]
Sent: Wednesday, 8 September 2004 11:56
To: ope...@li...
Cc: Kutter Martin; 'ope...@li...'
Subject: Re: [Openinteract-dev] Small i18n issue

> I've been able to track this down to a charset problem.
> LDAP expects directoryString attributes to be in UTF-8 encoding. The
> perl-ldap interface (Net::LDAP) does not provide UTF-8 conversions by
> default, so these are to be done by the application using Net::LDAP. This
> is no big deal - just a
>
>   use Encode;
>   $value = decode($charset, $value);

I had a similar problem with DBD::mysql and UTF-8. DBI has no general policy for UTF-8, so it has to be implemented by the DBDs themselves, and DBD::mysql does nothing about it. If you store UTF-8 strings in the database and retrieve them back, the returned strings are not marked as UTF-8. Later they may get encoded to UTF-8 a second time, precisely because Perl never marked them as UTF-8 in the first place =) The Encode module helps fix this, for example in SPOPS::DBI::MySQL (define a post_fetch_action() that converts all data fields).

I wonder when you will be able to write UTF-8 compatible software without mucking with the internals on several layers... UTF-8 has been around for so many years and still many module writers ignore it. It also wasn't until MySQL 4.x that UTF-8 support was included for character type fields.

> The problem is that the only available way to get the charset used in
> the request is to grab it from the underlying Apache::Request or
> CGI::Request handle - not really easy and not really portable:
>
>   my $contentHeader = CTX->request->apache->headers_in()->{ 'Content-Type' };

I noticed the same thing. I also found another way to set it:

  CTX->response->content_type( 'text/html; charset=utf-8' );

CTX->response->charset() would be nice to have.

Greetings,

- Teemu
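
A minimal sketch of what steps 2 and 4 of Martin's proposal might look like in plain Perl, using only the Encode module already mentioned in the thread. The helper names, the hashref of parameters, and the way the charset is passed in are illustrative assumptions, not existing OI2 API; a Raw or PDF controller would simply skip the second helper.

  use strict;
  use Encode qw( decode encode );

  # Step 2: turn raw parameter bytes (in the request charset) into
  # Perl-internal (UTF-8-flagged) strings right after fetching them.
  sub decode_request_params {
      my ( $charset, $params ) = @_;   # e.g. 'iso-8859-1', hashref of raw values
      for my $name ( keys %$params ) {
          $params->{ $name } = decode( $charset, $params->{ $name } );
      }
      return $params;
  }

  # Step 4: just before sending, encode the generated content from
  # Perl-internal strings into the charset advertised in Content-Type.
  sub encode_response_body {
      my ( $charset, $content ) = @_;  # e.g. 'iso-8859-1', generated page
      return encode( $charset, $content );
  }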
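
A rough sketch of the kind of conversion Teemu describes for SPOPS::DBI::MySQL: after a fetch, mark the raw bytes coming back from DBD::mysql as UTF-8 so Perl does not encode them a second time. How this gets wired into a post_fetch_action rule, and how the field list is obtained, are assumptions about the SPOPS class in question rather than documented SPOPS behaviour.

  use strict;
  use Encode qw( decode );

  sub utf8_post_fetch {
      my ( $object, @fields ) = @_;    # blessed hashref plus its data fields
      for my $field ( @fields ) {
          next unless defined $object->{ $field };
          # raw bytes from the database -> Perl-internal UTF-8 string
          $object->{ $field } = decode( 'utf-8', $object->{ $field } );
      }
      return 1;                        # ruleset actions return true on success
  }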
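
A small illustration of the CTX->response->charset() idea: pull the charset out of a Content-Type header value with a regex instead of reaching into the Apache::Request/CGI handle directly. The header value is passed in as a plain string here; how to obtain it portably is exactly the open question in the thread.

  use strict;

  sub charset_from_content_type {
      my ( $content_type ) = @_;       # e.g. 'text/html; charset=utf-8'
      return unless defined $content_type;
      if ( $content_type =~ /charset\s*=\s*"?([\w.-]+)"?/i ) {
          return lc $1;                # 'utf-8'
      }
      return;                          # no charset parameter present
  }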