Re: [Modeling-users] working in unicode?
From: Mario R. <ma...@ru...> - 2003-04-21 11:14:11
On Sunday, Apr 20, 2003, at 14:58 Europe/Amsterdam, Sebastien Bigaret wrote:

> Mario Ruggier <ma...@ru...> wrote:
>> i would like to be able to write and read unicode (as transparently as
>> possible) to a text attribute. Postgres/PyPgSQL supports this (see the
>> 'Databases' section in http://dalchemy.com/opensource/unicodedoc/
>> -- an article I found very useful). However I have no idea about
>> mysql.
>>
>> It seems that modeling operates in latin-1? Is this true?
>> Can I work around this in any way, or configure the framework to
>> assume utf-8 as encoding? The problem with latin-1 is that it
>> is not a unicode encoding, so even if most of the special characters
>> I would need to handle now are latin-1, sooner than later there will
>> be problems.
>
> latin-1? No, I never assumed a particular encoding for strings. It
> can be that the framework does not behave correctly because of the
> unicode type, but if this happens this is definitely a bug, it was
> not intended.

It's me who was assuming a particular encoding, when setting the values ;)

> Two kinds of problems may arise:
>
> 1. the underlying adaptor itself (mysqldb, psycopg, etc.) do not handle
>    unicode strings very well -- I've no idea how adapters handle this,
>    never tried it,

PyPgSQL handles it transparently (but internally it encodes to UTF-8 anyway), so you can directly execute a query given as a unicode string, and it will return results encoded in client_encoding (see the article referenced above). What I was asking is: can I tell the framework to set extra parameters, such as the client_encoding, when connecting and creating cursors?

> 2. mdl assumes that the strings' type is string, not unicode, and some
>    operations fail.
>
> I never tried unicode strings, hence I do not have more to say on
> that.
>
> Maybe you could go ahead and try it, and then report?

OK: if I try to send a unicode string value where a string is expected, a ValidationException is generated for that attribute.
However, as expected, if I take care to encode all string values to utf-8 before sending them to the db via the framework, it all works fine. Filtering of objects also works fine, e.g. if I specify a qual such as 'someAtt=="someUtf8String"'. I guess case-insensitive matches will not work, as Yannick pointed out (thanks!). But case does not really mean much when applied generically to unicode (the concept of case does not exist in all languages?).

The only issue is that it is a bit of a pity to have to make sure that all string values are sent, and received, as utf-8 encoded strings. But in fact this is a really small price to pay -- especially if the input comes from a utf-8 encoded web form, which automatically sends data as utf-8 and expects to receive it in utf-8.

mario
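
P.S. For the archive, a minimal sketch of the "encode before sending, decode after reading" convention described above. It is written against Python 3 semantics, where `str` is unicode text and `bytes` is the encoded form, so the types differ from the string/unicode pair discussed in this thread. The names `ENCODING`, `to_db` and `from_db` are hypothetical helpers made up for illustration; they are not part of the Modeling framework or of any adaptor.

```python
# Hypothetical boundary helpers -- NOT part of the Modeling framework.
ENCODING = "utf-8"


def to_db(value):
    """Encode unicode text to UTF-8 bytes before handing it to the db layer."""
    if isinstance(value, str):
        return value.encode(ENCODING)
    return value  # already bytes, or not a string at all


def from_db(value):
    """Decode UTF-8 bytes read back from the db layer into unicode text."""
    if isinstance(value, bytes):
        return value.decode(ENCODING)
    return value


if __name__ == "__main__":
    original = "S\u00e9bastien \u20ac"   # includes a euro sign, which latin-1 lacks
    stored = to_db(original)             # what would be sent to the database
    assert from_db(stored) == original   # the round trip preserves the text
```

Applying both helpers at a single boundary (for example, where web form data enters and leaves the application) keeps the rest of the code free of encoding concerns.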