From: Leif K-B. <eu...@ec...> - 2004-11-30 09:50:43
|
I'm working on a database-heavy Web application. I really like SQLObject's features, so I would like to use it; unfortunately, the lack of Unicode support is a real problem. I would imagine that this could be fixed fairly easily by subclassing StrCol, but I'm really not familiar enough with SQLObject's internals to do it myself. Has anyone else tried that? Are there plans to natively support Unicode in SQLObject 0.6 or later? |
From: Oleg B. <ph...@ma...> - 2004-11-30 10:23:32
|
On Tue, Nov 30, 2004 at 04:52:02AM -0500, Leif K-Brooks wrote:
> I'm working on a database-heavy Web application. I really like
> SQLObject's features, so I would like to use it; unfortunately, the lack
> of Unicode support is a real problem. I would imagine that this could be
> fixed fairly easily by subclassing StrCol, but I'm really not familiar
> enough with SQLObject's internals to do it myself. Has anyone else tried

We did that, but I cannot show the code - it's part of a commercial program.

> that? Are there plans to natively support Unicode in SQLObject 0.6 or later?

I don't know of any. And, btw, what is "unicode support"? Before starting to implement "unicode support" we have to understand what it is. What are your use cases? In what encoding are you going to store strings in the db? Should there be an "encoding" attribute on a column? a table? a database? a db connection?

Oleg.
--
    Oleg Broytmann    http://phd.pp.ru/    ph...@ph...
    Programmers don't die, they just GOSUB without RETURN.
|
From: Max I. <ma...@uc...> - 2004-11-30 10:31:08
|
Leif K-Brooks wrote:
> I'm working on a database-heavy Web application. I really like
> SQLObject's features, so I would like to use it; unfortunately, the lack
> of Unicode support is a real problem. I would imagine that this could be
> fixed fairly easily by subclassing StrCol, but I'm really not familiar
> enough with SQLObject's internals to do it myself. Has anyone else tried
> that? Are there plans to natively support Unicode in SQLObject 0.6 or
> later?

I have invented a rather crude way to handle this:

    from sqlobject import SQLObject, StringCol

    def getstring(name):
        def getter(self):
            # Call the raw accessor, then decode byte strings to unicode.
            value = getattr(self, "_SO_get_" + name)()
            if isinstance(value, str):
                value = value.decode('utf8')
            return value
        return getter

    def setstring(name):
        def setter(self, value):
            # Encode unicode to utf-8 before passing it to the raw setter.
            if isinstance(value, unicode):
                value = value.encode('utf8')
            getattr(self, "_SO_set_" + name)(value)
        return setter

    class User(SQLObject):
        name = StringCol(length=96, default='')

        _get_name = getstring('name')
        _set_name = setstring('name')

This solution provides conversion from Unicode (used throughout the program) to the utf-8 encoding accepted by SQLObject.

Obviously, this is far from ideal. Therefore I'd like to see how others deal with this problem.

Btw, Oleg's proposal about getting rid of sqlobject's ad-hoc escaping could possibly solve the Unicode problem as well.
|
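[Editorial sketch] The getter/setter wrapping described above can be tried without a database. The following is a minimal Python 3 illustration, not SQLObject code: `Row`, `_raw_get`, `_raw_set` and `text_property` are hypothetical names standing in for the `_SO_get_*` / `_SO_set_*` accessors, and in Python 3 `str` plays the role of the old `unicode` type while `bytes` plays the role of the old `str`.

```python
class Row:
    """Hypothetical stand-in for a row object; it is not SQLObject."""

    def __init__(self):
        self._raw = {}  # simulates the encoded bytes held by the database

    def _raw_get(self, name):
        return self._raw[name]

    def _raw_set(self, name, value):
        self._raw[name] = value


def text_property(name, encoding="utf-8"):
    """Build a property that decodes on read and encodes on write."""

    def getter(self):
        value = self._raw_get(name)
        # Only bytes need decoding; anything else is passed through.
        return value.decode(encoding) if isinstance(value, bytes) else value

    def setter(self, value):
        # Only text needs encoding; bytes are stored untouched.
        if isinstance(value, str):
            value = value.encode(encoding)
        self._raw_set(name, value)

    return property(getter, setter)


class User(Row):
    name = text_property("name")


user = User()
user.name = "Zoë"
```

After the assignment, `user._raw["name"]` holds UTF-8 bytes while `user.name` reads back as text, which is the same split the `_get_name` / `_set_name` pair produces.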
From: Oleg B. <ph...@ma...> - 2004-11-30 10:39:37
|
On Tue, Nov 30, 2004 at 12:29:33PM +0200, Max Ischenko wrote:
> class User(SQLObject):
>     name = StringCol(length=96, default='')
>
>     _get_name = getstring('name')
>     _set_name = setstring('name')
>
> This solution provides convertion from Unicode (used throughout the
> probram) to the utf-8 encoding accepted by SQLObject.
>
> Obviously, this is far from ideal. Therefore I'd like to see how the
> others deal with this problem.

Subclass StrCol, and convert data from unicode to the db encoding and back in its validator.

> Btw, Oleg's proposal about getting rid of sqlobject ad-hoc escaping can
> possibly solve the Unicode problem as well.

In what way?! In any case you cannot store unicode directly in the db - you must convert it to a db encoding.

Oleg.
|
From: Oleg B. <ph...@ma...> - 2004-11-30 10:57:53
|
I've got permission to publish snippets of code from our program.

On Tue, Nov 30, 2004 at 01:39:31PM +0300, Oleg Broytmann wrote:
> Subclass StrCol, and convert data from unicode to db encoding and
> back in its validator.

    # db_encoding is the name of the database encoding, defined elsewhere.
    class UnicodeStringValidator(validators.Validator):

        def fromPython(self, value, state):
            return value.encode(db_encoding)

        def toPython(self, value, state):
            return unicode(value, db_encoding)

    stringValidator = UnicodeStringValidator()

    class SOUnicodeStringCol(SOStringCol):

        def __init__(self, **kw):
            SOStringCol.__init__(self, **kw)
            self.validator = validators.All.join(stringValidator, self.validator)

    class UnicodeStringCol(Col):
        baseClass = SOUnicodeStringCol

Oleg.
|
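[Editorial sketch] The validator idea above, reduced to something runnable on its own. This is a Python 3 illustration under stated assumptions, not the SQLObject or validators API: `EncodingValidator`, `TextColumn`, `from_python` and `to_python` are hypothetical names, and the encoding is passed in explicitly instead of being read from a global `db_encoding`.

```python
class EncodingValidator:
    """Hypothetical validator: text on the Python side, bytes on the DB side."""

    def __init__(self, db_encoding):
        self.db_encoding = db_encoding

    def from_python(self, value):
        # Python -> database: encode text into the database encoding.
        return value.encode(self.db_encoding)

    def to_python(self, value):
        # Database -> Python: decode stored bytes back into text.
        return value.decode(self.db_encoding)


class TextColumn:
    """Hypothetical column that delegates all conversion to its validator."""

    def __init__(self, validator):
        self.validator = validator
        self.stored = None  # what the database would hold

    def write(self, value):
        self.stored = self.validator.from_python(value)

    def read(self):
        return self.validator.to_python(self.stored)


column = TextColumn(EncodingValidator("koi8-r"))
column.write("Олег")
```

Keeping the conversion in one validator object means the column class itself needs no encoding logic, which is the point of attaching the validator in `__init__` in the snippet above.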
From: Sidnei da S. <si...@aw...> - 2004-11-30 11:26:56
|
On Tue, Nov 30, 2004 at 01:57:49PM +0300, Oleg Broytmann wrote:
| I've got a permission to publish snippets of code from our program.

Jeebus, if permission is required to publish that many lines of code, then something is severely broken in our world :(

--
Sidnei da Silva <si...@aw...>
http://awkly.org - dreamcatching :: making your dreams come true
http://www.enfoldsystems.com
http://plone.org/about/team#dreamcatcher
|
From: Oleg B. <ph...@ma...> - 2004-11-30 11:45:24
|
On Tue, Nov 30, 2004 at 09:25:59AM -0200, Sidnei da Silva wrote:
> On Tue, Nov 30, 2004 at 01:57:49PM +0300, Oleg Broytmann wrote:
> | I've got a permission to publish snippets of code from our program.
>
> Jeebus, If permission is required to publish that much lines of code,
> then something is severely broken in our world :(

Very broken, indeed. Property, and especially intellectual property. Copyrights, licences, patents, EULA, UCITA, DMCA... Unfortunately, no one has come up with a solution acceptable to everyone.

Oleg.
|
From: Max I. <ma...@uc...> - 2004-11-30 13:48:43
|
Oleg Broytmann wrote:
>> Subclass StrCol, and convert data from unicode to db encoding and
>> back in its validator.
>
> class UnicodeStringValidator(validators.Validator):
>
>     def fromPython(self, value, state):
>         return value.encode(db_encoding)
>
>     def toPython(self, value, state):
>         return unicode(value, db_encoding)

What's the value of db_encoding here?
|
From: Oleg B. <ph...@ma...> - 2004-11-30 14:04:45
|
On Tue, Nov 30, 2004 at 03:38:44PM +0200, Max Ischenko wrote:
> >class UnicodeStringValidator(validators.Validator):
> >
> >    def fromPython(self, value, state):
> >        return value.encode(db_encoding)
> >
> >    def toPython(self, value, state):
> >        return unicode(value, db_encoding)
>
> What's the value of db_encoding here?

The database encoding. I personally use koi8-r, but it can be anything your DB understands. If your DB understands UTF-8 (for ORDER BY, UPPER()/LOWER() and all that) - go with it.

Oleg.
|
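[Editorial sketch] Why the choice of db_encoding matters: the same text becomes different bytes under koi8-r and UTF-8, and bytes written with one codec generally cannot be read back with the other. A small Python 3 check using only the standard codecs; the helper name `decodes_as` is made up for this illustration.

```python
text = "Привет"

# The same six letters, encoded two ways.
koi8_bytes = text.encode("koi8-r")   # single-byte Cyrillic encoding
utf8_bytes = text.encode("utf-8")    # two bytes per Cyrillic letter


def decodes_as(data, encoding):
    """Return True if the bytes can be decoded with the given codec."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Reading back with the codec used for writing always works.
round_trip_ok = koi8_bytes.decode("koi8-r") == text

# Reading koi8-r bytes as UTF-8 fails for this sample.
mismatch_ok = decodes_as(koi8_bytes, "utf-8")
```

The encode and decode sides must therefore agree on a single encoding, and that encoding should be one the database itself understands if sorting and case conversion are to work.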
From: Max I. <ma...@uc...> - 2004-11-30 10:58:01
|
Oleg Broytmann wrote:
>
> Subclass StrCol, and convert data from unicode to db encoding and
> back in its validator.

Maybe.

>> Btw, Oleg's proposal about getting rid of sqlobject ad-hoc escaping can
>> possibly solve the Unicode problem as well.
>
> In what way?!

Well, I may be wrong here. I didn't follow that thread in detail, but AFAIU, SQLObject interpolates strings by itself, probably using str(). And this would give a UnicodeError for non-ASCII characters. IIRC, native db drivers handle unicode input without problems. At least those I've used. And moreover, they return data in a record set in unicode as well.

> In any case you cannot store unicode directly in the db
> - you must convert it to a db encoding.

Most modern dbs and python drivers handle this problem transparently. At least, that's my experience. And the issues discussed may be an indicator that the problem lies within SQLObject itself.

--max.
|
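[Editorial sketch] The failure mode suspected above can be reproduced in isolation. In the Python 2 of the time, calling str() on a unicode object encoded it with the ASCII default codec, so any non-ASCII character raised an error. The Python 3 sketch below makes that implicit step explicit; `to_ascii_bytes` and `try_convert` are hypothetical helper names, and nothing here is SQLObject code.

```python
def to_ascii_bytes(text):
    """Mimic an implicit ASCII conversion of a text value."""
    return text.encode("ascii")


def try_convert(text):
    """Return the encoded bytes, or the exception if conversion fails."""
    try:
        return to_ascii_bytes(text)
    except UnicodeEncodeError as exc:
        return exc


plain_result = try_convert("plain")      # pure ASCII converts cleanly
accented_result = try_convert("naïve")   # 'ï' has no ASCII representation
```

Any layer that builds SQL text by implicit conversion hits this error on the first non-ASCII character, which is why the encoding step has to be explicit somewhere.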
From: Oleg B. <ph...@ma...> - 2004-11-30 11:06:57
|
On Tue, Nov 30, 2004 at 12:55:41PM +0200, Max Ischenko wrote:
> IIRC, native db drivers handle
> unicode input without problems.

What is "unicode input"?

> > In any case you cannot store unicode directly in the db
> > - you must convert it to a db encoding.
>
> Most modern dbs and python drivers handle this problem transparently. At
> least, that's my experience.

Can you show an example? Using a snippet of code, with a DB API driver...

Oleg.
|
From: Max I. <ma...@uc...> - 2004-11-30 13:48:20
|
Oleg Broytmann wrote:
>> IIRC, native db drivers handle
>> unicode input without problems.
>
> What is "unicode input"?

Parameters that are passed in as a unicode string (u'bla-bla-bla').

>>> In any case you cannot store unicode directly in the db
>>> - you must convert it to a db encoding.
>>
>> Most modern dbs and python drivers handle this problem transparently. At
>> least, that's my experience.
>
> Can you show an example? Using a snippet of code, with a DB API
> driver...

Guess not. ;-)

Just checked with psycopg -- it does require a parameter to be encoded into something like utf-8.

Either my memory does me a disservice or this psycopg is somehow broken. ;-)
|
From: Ian B. <ia...@co...> - 2004-11-30 18:01:32
|
Max Ischenko wrote:
>>> Most modern dbs and python drivers handle this problem transparently.
>>> At least, that's my experience.
>>
>> Can you show an example? Using a snippet of code, with a DB API
>> driver...
>
> Guess not. ;-)
>
> Just checked with psycopg -- it does requre a parameter to be encoded
> into something like utf-8.
>
> Either my memory make me a disservice or this psycopg is somehow broken.
> ;-)

I suspect it's something about Unicode in databases being a pain in the ass. At least, that's what I'm guessing; I've never tried to do it, I've only stored ASCII and stuff that I treat as though it's binary data.

I might note that when I installed postgres on Debian, it asked me questions about encoding. This might imply that encoding is set up in an installation-wide (not per-database or per-session) fashion. Then it also asked me how I wanted to format my dates. I answered ISO, but what madness would happen if someone selected US format dates? I doubt psycopg knows anything about what date format the server is using. Maybe these are just defaults, and by explicitly setting up the configuration for the connection you can avoid the madness.

So maybe Unicode is like dates, messy and ad hoc. It might be better (or worse) in database servers that accept basic data types; psycopg does all its quoting on the client side.

Also, if you are willing to use a default encoding, you could change converters.StringLikeConverter:

    def UnicodeConverter(value, db):
        # Encode unicode to utf-8, then quote it like any other string.
        return StringLikeConverter(value.encode('utf-8'), db)
    registerConverter(unicode, UnicodeConverter)

You'll still have to decode strings as they come out of the database, but this will handle any queries you build. Probably the current behavior of using StringLikeConverter for unicode strings is bad, or at least not helpful, because you'll get all sorts of errors if you have any non-ASCII characters.

--
Ian Bicking / ia...@co... / http://blog.ianbicking.org
|
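[Editorial sketch] The converter registration suggested above follows a simple pattern: a table maps a Python type to a function that renders values of that type as a quoted SQL literal. The Python 3 sketch below shows only that pattern. `register_converter`, `sql_repr`, `quote_bytes` and `quote_text` are hypothetical names, not the SQLObject converters API, and the quoting is deliberately simplistic; real code should rely on the driver's parameter binding.

```python
_converters = {}  # maps a Python type to its literal-rendering function


def register_converter(py_type, func):
    """Associate a Python type with a function producing a SQL literal."""
    _converters[py_type] = func


def sql_repr(value):
    """Render a value using the converter registered for its exact type."""
    return _converters[type(value)](value)


def quote_bytes(value):
    # Simplistic quoting: double any single quotes and wrap in quotes.
    return b"'" + value.replace(b"'", b"''") + b"'"


def quote_text(value):
    # Text is encoded with a fixed default encoding, then quoted as bytes.
    return quote_bytes(value.encode("utf-8"))


register_converter(bytes, quote_bytes)
register_converter(str, quote_text)

literal = sql_repr("O'Neil é")
```

As noted in the message, a converter like this covers only the Python-to-database direction; values read back from the database still need decoding elsewhere.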
From: Max I. <ma...@uc...> - 2004-12-01 09:35:34
|
Ian Bicking wrote:
> So maybe Unicode is like dates, messy and ad hoc. It might be better
> (or worse) in database servers that accept basic data types; psycopg
> does all its quoting on the client side.

I see. Amen.

> Also, if you are willing to use a default encoding, you could change
> converters.StringLikeConverter:
>
> def UnicodeConverter(value, db):
>     return StringLikeConverter(value.encode('utf-8'), db)
> registerConverter(unicode, UnicodeConverter)

Hmm. Looks like there are a lot of ways to solve this particular problem. My only complaint is that they are missing from the docs. ;-)

Anyway, thanks a lot. I'll surely have to look deeper into sqlobject to find a sane replacement for my current approach.

As for "converters", will this affect only the db->python path, the python->db path, or both? I suspect the answer is UTSL. ;-)

I very much like the idea of being able to register some middle-man code between the python objects and the db that could solve these conversion issues transparently. I even have another use case at hand - making CurrencyCol understand my Money type. The "validators" approach feels a bit out of place for this, as does the idea of subclassing a particular *Col class.

> You'll still have to decode strings as they come out of the database,
> but this will handle any queries you build. Probably the current
> behavior of using StringLikeConverter for unicode strings is bad, or at
> least not helpful, because you'll get all sorts of errors if you have
> any non-ASCII characters.

OK.
|
From: Stuart B. <stu...@ca...> - 2004-12-03 08:54:11
|
Ian Bicking wrote:
| Max Ischenko wrote:
|
|>>> Most modern dbs and python drivers handle this problem
|>>> transparently. At least, that's my experience.
|>>
|>> Can you show an example? Using a snippet of code, with a DB API
|>> driver...
|>
|> Guess not. ;-)
|>
|> Just checked with psycopg -- it does requre a parameter to be encoded
|> into something like utf-8.

psycopg2 does it automatically, I believe.

|> Either my memory make me a disservice or this psycopg is somehow
|> broken. ;-)
|
| I suspect it's something about Unicode in databases being a pain in the
| ass. At least, that's what I'm guessing; I've never tried to do it,
| I've only stored ASCII and stuff that I treat as though its binary data.

We happily throw Unicode strings through SQLObject and it gives us nothing but Unicode strings back. Welcome to the new millennium :-)

To do this, we patched DBAPI._executeRetry and the StringValidator class:

    def _executeRetry(self, conn, cursor, query):
        if isinstance(query, unicode):
            query = query.encode('utf8')
        else:
            # raise UnicodeError if it is not valid utf8 already
            query.decode('utf8')
        return cursor.execute(query)

    class StringValidator(validators.Validator):
        def fromPython(self, value, state):
            if isinstance(value, unicode):
                return value.encode('utf8')
            return value
        def toPython(self, value, state):
            if isinstance(value, str):
                return value.decode('utf8')
            return value

This of course should really be done in the PostgreSQL driver somewhere, but the above hack is fine for our needs at the moment. And psycopg2 might make it all irrelevant anyway.

| I might note when I installed postgres on Debian, it asked me questions
| about encoding. This might imply that encoding setup in an
| installation-wide (not per-database or per-session) fashion. Then it
| also asked me about how I wanted to format my dates. I answered ISO,
| but what madness would happen if someone selected US format dates? I
| doubt psycopg knows anything about what format date the server is using.
| Maybe these are just defaults, and by explicitly setting up the
| configuration for the connection you can avoid the madness.

PostgreSQL hard codes the locale at initdb time, I believe because the locale you use for collation order affects index creation and cannot be changed. It is a pita, because most people really want to use the C locale and the only way of reverting is to blow away your data directory and recreate it. But the locale has nothing to do with the encoding - you can happily create databases with whatever encoding you like no matter what locale you selected at initdb time.

--
Stuart Bishop <stu...@ca...> http://www.canonical.com/
Canonical Ltd. http://www.ubuntulinux.com/
|
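[Editorial sketch] The check in the patched _executeRetry above does two things: it encodes text queries to UTF-8, and it rejects byte strings that are not already valid UTF-8 by attempting to decode them. A standalone Python 3 version of just that check follows; `ensure_utf8` is a hypothetical name, `str` here corresponds to the old `unicode` type and `bytes` to the old `str`.

```python
def ensure_utf8(query):
    """Return the query as UTF-8 bytes, rejecting invalid byte strings."""
    if isinstance(query, str):
        # Text is always encodable to UTF-8.
        return query.encode("utf-8")
    # Bytes are validated by a trial decode; the result is discarded.
    # This raises UnicodeDecodeError if the bytes are not valid UTF-8.
    query.decode("utf-8")
    return query


encoded = ensure_utf8("SELECT 'é'")
passthrough = ensure_utf8(b"SELECT 1")

try:
    ensure_utf8(b"SELECT '\xff'")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

Validating at the last step before execution catches byte strings in the wrong encoding no matter which code path built the query.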