Re: [Sqlalchemy-tickets] [sqlalchemy] #2911: rework of unicode conversion, re: "conditional" as wel
From: sqlalchemy <mi...@zz...> - 2014-01-17 21:44:20
#2911: rework of unicode conversion, re: "conditional" as well as cx_oracle
------------------------------+-------------------------------
Reporter: zzzeek | Owner: zzzeek
Type: defect | Status: new
Priority: high | Milestone: 0.9.xx
Component: cextensions | Severity: major - 1-3 hours
Resolution: | Keywords:
Progress State: in progress |
------------------------------+-------------------------------
Comment (by zzzeek):
Upcoming is a new ``to_conditional_unicode_processor_factory()`` in both
Python and C. Modify cx_oracle as follows:
{{{
#!python
@@ -749,8 +752,9 @@ class OracleDialect_cx_oracle(OracleDialect):
outconverter=self._detect_decimal,
arraysize=cursor.arraysize)
# allow all strings to come back natively as Unicode
- elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
- return cursor.var(util.text_type, size, cursor.arraysize)
+ #elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
+ # return cursor.var(util.text_type, size, cursor.arraysize)
}}}
cx_oracle then returns bytes or unicode (Py2K only) depending on the
column type (CHAR or NVARCHAR, etc.). In this case we want to do
"conditional" unicode returns, since we don't know when the user might be
placing Unicode() around a CHAR or NVARCHAR expression. Conditional
unicode returns are expensive, since they require an isinstance() check.
But when we have cx_oracle's converter in place, we pay the unicode
conversion overhead for all strings, not just unicode ones. For whatever
reason, cx_oracle on Py2K counts all the decodes as Python function calls;
on Py3K it does not, even if you have that converter in place. So there are
some less-than-ideal shenanigans going on inside of cx_oracle making us
look bad.
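To make the distinction concrete, here is a minimal pure-Python sketch of an "unconditional" versus a "conditional" result processor. The names `unconditional_unicode_processor` and `conditional_unicode_processor` are hypothetical, chosen for illustration; this is not the actual implementation behind ``to_conditional_unicode_processor_factory()``, and it is written for Python 3, where `str` stands in for Py2K's `unicode`.

```python
import codecs


def unconditional_unicode_processor(encoding, errors="strict"):
    """Hypothetical sketch: assume every non-None value is bytes and decode it."""
    decoder = codecs.getdecoder(encoding)

    def process(value):
        if value is None:
            return None
        # The codec decoder returns (decoded_text, bytes_consumed).
        return decoder(value, errors)[0]

    return process


def conditional_unicode_processor(encoding, errors="strict"):
    """Hypothetical sketch: decode only if the driver did not already return text."""
    decoder = codecs.getdecoder(encoding)

    def process(value):
        if value is None:
            return None
        # This isinstance() check is the per-row cost discussed above.
        if isinstance(value, str):
            return value
        return decoder(value, errors)[0]

    return process


conditional = conditional_unicode_processor("utf-8")
unconditional = unconditional_unicode_processor("utf-8")
```

The conditional variant tolerates a driver that sometimes hands back text and sometimes bytes, at the price of one extra type check per value in pure Python.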
If we standardize cx_oracle on "conditional" instead, we pay a price for
unicode conversion when the C extensions are not in place; however, when
the C extensions are present, the new "conditional" processor does the
check without any function call overhead. Results are as follows:
{{{
1. cx_oracle unicode, no C ext, no check, returning unicode - 200K
2. no cx_oracle unicode, no C ext, conditional check, returning unicode - 300K
3. no cx_oracle unicode, no C ext, unconditional check, returning unicode - 250K
4. cx_oracle unicode, no C ext, returning str - 200K
5. no cx_oracle unicode, no C ext, returning str - 100K
6. cx_oracle unicode, C ext, no check, returning unicode - 100K
7. no cx_oracle unicode, C ext, conditional check, returning unicode - 254
8. no cx_oracle unicode, C ext, unconditional check, returning unicode - 254
9. cx_oracle unicode, C ext, returning str - 100K
10. no cx_oracle unicode, C ext, returning str - 236
}}}
--
Ticket URL: <http://www.sqlalchemy.org/trac/ticket/2911#comment:2>
sqlalchemy <http://www.sqlalchemy.org/>
The Database Toolkit for Python