Re: [Modeling-users] working in unicode?


On Sunday, Apr 20, 2003, at 14:58 Europe/Amsterdam, Sebastien Bigaret
wrote:
> Mario Ruggier <ma...@ru...> wrote:
>> I would like to be able to write and read unicode (as transparently as
>> possible) to a text attribute. Postgres/PyPgSQL supports this (see the
>> 'Databases' section in http://dalchemy.com/opensource/unicodedoc/,
>> an article I found very useful). However, I have no idea about
>> MySQL.
>>
>> It seems that Modeling operates in latin-1. Is this true?
>> Can I work around this in any way, or configure the framework to
>> assume UTF-8 as the encoding? The problem with latin-1 is that it
>> is not a Unicode encoding, so even if most of the special characters
>> I need to handle now are latin-1, sooner or later there will
>> be problems.
>
>   latin-1?  No, I never assumed a particular encoding for strings. It
>   may be that the framework does not behave correctly with the
>   unicode type, but if this happens it is definitely a bug; it was
>   not intended.

It was me who was assuming a particular encoding when setting the
values ;)

> Two kinds of problems may arise:
>
> 1. the underlying adaptor itself (MySQLdb, psycopg, etc.) does not
>    handle unicode strings very well; I've no idea how adaptors handle
>    this, never tried it,

PyPgSQL handles it transparently (although internally it encodes to
UTF-8 anyway), so you can directly execute a query given as a unicode
string, and it will return results encoded in the client_encoding (see
the article referenced above).

What I was asking is whether I can tell the framework to set extra
parameters, such as the client_encoding, when connecting and creating
cursors.

> 2. mdl assumes that the strings' type is string, not unicode, and some
>    operations fail.
>
>   I never tried unicode strings, hence I do not have more to say on
>   that.
>
>     Maybe you could go ahead and try it, and then report?

OK:

If I try to send a unicode string value where a string is expected,
a ValidationException is raised for that attribute.

However, as expected, if I take care to encode all string values to
UTF-8 before sending them to the db via the framework, it all works
fine. Filtering of objects also works fine, e.g. if I specify a
qualifier such as 'someAtt=="someUtf8String"'.
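The workaround described above (encode every text value to UTF-8 before
handing it to the framework, decode it again after fetching) can be
sketched as two small helpers. This is written in present-day Python 3,
where `str` is text and `bytes` is encoded data (in the Python 2 of this
thread the roles were `unicode` and `str`); `encode_values`,
`decode_values` and the sample row are hypothetical, standing in for an
object's attribute values.

```python
from typing import Any, Dict

ENCODING = "utf-8"  # assumed encoding shared by the application and the database


def encode_values(values: Dict[str, Any]) -> Dict[str, Any]:
    """Encode every text value to UTF-8 bytes; leave other values untouched."""
    return {key: (val.encode(ENCODING) if isinstance(val, str) else val)
            for key, val in values.items()}


def decode_values(values: Dict[str, Any]) -> Dict[str, Any]:
    """Decode every bytes value back to text; leave other values untouched."""
    return {key: (val.decode(ENCODING) if isinstance(val, bytes) else val)
            for key, val in values.items()}


row = {"title": "Résumé", "pages": 2}
stored = encode_values(row)
print(stored)  # {'title': b'R\xc3\xa9sum\xc3\xa9', 'pages': 2}
assert decode_values(stored) == row  # the round trip is lossless
```

Keeping the conversion in one pair of functions at the application
boundary makes it harder to forget an attribute.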

I guess case-insensitive matches will not work, as Yannick pointed out
(thanks!). But case does not really mean much when applied generically
to Unicode (the concept of case does not apply to all languages, does
it?).
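A likely reason case-insensitive matching breaks when values are stored
as UTF-8 bytes: byte-level lowercasing only knows about ASCII letters.
A short illustration in present-day Python 3 (the sample strings are
arbitrary); it also shows that case mapping on decoded text is not
always one character to one character.

```python
# Case-insensitive comparison of UTF-8 *bytes* versus decoded text.

upper_b = "É".encode("utf-8")   # b'\xc3\x89'
lower_b = "é".encode("utf-8")   # b'\xc3\xa9'

# bytes.lower() only folds ASCII letters A-Z, so the encoded forms never match.
print(upper_b.lower() == lower_b)                    # False

# Decoded text folds correctly...
print("É".lower() == "é")                            # True

# ...but case mapping is not always one-to-one: German sharp s becomes "SS".
print("Straße".upper())                              # STRASSE
print("STRASSE".casefold() == "straße".casefold())   # True
```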

The only issue is that it is a bit of a pity to have to make sure that
all string values are sent and received as UTF-8 encoded strings. But in
fact this is a really small price to pay, especially if the input source
is a UTF-8 encoded web form, which automatically sends data as UTF-8 and
expects to receive it in UTF-8.
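For the web-form case, a form posted from a page served as UTF-8
arrives as percent-encoded UTF-8, so the standard library can decode it
directly. A sketch in present-day Python 3 using `urllib.parse`; the
field name and value are invented for the example.

```python
from urllib.parse import parse_qs, urlencode

# Simulate the body a browser sends for a form on a UTF-8 page.
body = urlencode({"name": "Zoë Ångström"}, encoding="utf-8")
print(body)  # name=Zo%C3%AB+%C3%85ngstr%C3%B6m

fields = parse_qs(body, encoding="utf-8")
name = fields["name"][0]          # decoded text: 'Zoë Ångström'
to_store = name.encode("utf-8")   # the UTF-8 bytes handed to the framework
```

Browsers normally submit a form in the encoding of the page containing
it, so serving pages with `Content-Type: text/html; charset=utf-8`
keeps both directions in UTF-8.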

mario