Re: [cx-oracle-users] WITH_UNICODE in Python 2.x. Why ?
Brought to you by:
atuining
From: Anthony T. <ant...@gm...> - 2010-03-15 14:32:51
|
On Sat, Mar 13, 2010 at 1:54 PM, Michael Bayer <mi...@zz...> wrote: > First, a bug report involving WITH_UNICODE mode. I only am familiar with this option as one of my users (i.e. of SQLAlchemy) insists he must use this mode in production. > > cx_Oracle is generally pretty good at loudly rejecting non-unicode objects in this mode, unless you send across a string as a bind: > > import cx_Oracle > > conn = cx_Oracle.connect( > dsn = u'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=172.16.248.131)(PORT=1521)))(CONNECT_DATA=(SID=xe)))', > password=u'tiger', > user=u'scott' > ) > > cursor = conn.cursor() > > cursor.execute(u"select :name from dual", {'name':u'some name'}) > result = cursor.fetchall()[0][0] > assert result == u'some name', result # passes > > cursor.execute(u"select :name from dual", {'name':'some name'}) > result = cursor.fetchall()[0][0] > assert result == u'some name', result # does not raise an exception. Returns: 736F6D65206E616D65 > > > In SQLAlchemy, we're going to ensure that a plain string never gets passed, but it still seems inconsistent and dangerous that cx_Oracle simply corrupts the data in some circumstances. For this reason I'm planning on having SQLA issue a huge warning when the presence of this flag is detected, that they must never stray from the confines of SQLA's bind handling or fear the wrath of odd hex-encoded data being generated without their knowledge. Ok, this is not a "bug" but a "feature". :-) In Python 3.x strings are considered "binary" data and so Python 2.x in Unicode mode is doing the same. That could, however, be turned off and in Unicode mode insist that buffer objects be passed -- or the call to cx_Oracle.Binary() be used. I'd appreciate other people's feedback on whether this would be considered an improvement or not. I can see both sides of the argument, though. > Secondly, detecting that WITH_UNICODE was used is a trick ! Right now I just check that the version >= 5 and that cx_Oracle.UNICODE does not exist. It would be nice if an explicit bit of information were available to detect this mode. That is the only way to do so, currently. I could add a "WITH_UNICODE" boolean flag at the module level if that would be considered helpful. > and thirdly, *why* is this mode, in its current form, even available in Python 2.x ? The fact that it returns Python unicodes in result sets in all cases is great, and very useful. But the rigidity and arguable bugginess on the connect/statement/bind parameter side doesn't seem to have any clear rationale. It would be more performant and easier on the outside world if the unicode(x) call took place within cx_oracle's native guts and not within pure-Python libraries that wish to sanitize input. Well, the reason this mode was added is partly historical. While I was converting the internals of cx_Oracle to be able to support Python 3.x and Unicode I decided that the best way to do so was to add this mode so that I could test everything in Python 2.x before even having to consider Python 3.x and its own set of quirks. That worked out very well and I decided to leave it in place as a convenience to those who wish to use Unicode throughout without porting to Python 3.x. That is also the primary reason for allowing strings to be treated as binary data in Unicode mode. I'm not sure what you are referring to with the "rigidity and arguable bugginess on the connect/statement/bind parameter side" but be aware that Oracle has a "unicode" mode and in that mode only Unicode data will be accepted. If I were to accept strings I would have to convert them to Unicode in some fashion -- and that implies knowing the encoding which I can't claim to know. Thus the reason for in Unicode mode ONLY accepting Unicode data and all strings are therefore treated as binary data. Maybe that will help you understand the decisions made and the reasons behind some of the "rigidity". Note that there is another solution to getting all data returned as Unicode if that is what is desired. You can use "normal" or "non-Unicode" mode and simply specify a connection output type handler that tells cx_Oracle that all strings should be returned as Unicode. See the sample "ReturnUnicode.py" as an example. In that case, cx_Oracle will insist upon strings for connect strings, SQL statements, etc. -- but will accept Unicode for bind parameters and will return Unicode in result sets. Again, this is simply due to the way that the OCI deals with Unicode for what it terms "metadata". Hopefully I've helped answer your questions and concerns! Anthony |