Re: [Sqlalchemy-tickets] [sqlalchemy] #2911: rework of unicode conversion, re: "conditional" as wel

#2911: rework of unicode conversion, re: "conditional" as well as cx_oracle
------------------------------+-------------------------------
      Reporter:  zzzeek       |      Owner:  zzzeek
          Type:  defect       |     Status:  new
      Priority:  high         |  Milestone:  0.9.xx
     Component:  cextensions  |   Severity:  major - 1-3 hours
    Resolution:               |   Keywords:
Progress State:  in progress  |
------------------------------+-------------------------------

Comment (by zzzeek):

 Upcoming is a new ``to_conditional_unicode_processor_factory()`` in both
 Python and C.  Modify cx_oracle as follows:

 {{{
 #!python
 @@ -749,8 +752,9 @@ class OracleDialect_cx_oracle(OracleDialect):
                              outconverter=self._detect_decimal,
                              arraysize=cursor.arraysize)
              # allow all strings to come back natively as Unicode
 -            elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
 -                return cursor.var(util.text_type, size, cursor.arraysize)
 +            #elif defaultType in (cx_Oracle.STRING, cx_Oracle.FIXED_CHAR):
 +            #    return cursor.var(util.text_type, size, cursor.arraysize)
 }}}

 With that handler disabled, cx_oracle returns bytes or unicode (Py2K only)
 depending on the column type (CHAR, NVARCHAR, etc.).  In this case we want
 "conditional" unicode returns, since we can't know whether the user has
 placed Unicode() around a CHAR or an NVARCHAR expression.  Conditional
 unicode returns are expensive since they require an isinstance() check.
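
 As a rough illustration of the difference, here is a minimal pure-Python
 sketch of an unconditional and a "conditional" decoder.  The function names
 are hypothetical and this is not the code being added for this ticket; it
 assumes Python 3, where ``str`` plays the role of unicode:

```python
import codecs


def to_unicode_processor(encoding, errors="strict"):
    """Hypothetical unconditional processor: always decodes, so it expects bytes."""
    decoder = codecs.getdecoder(encoding)

    def process(value):
        if value is None:
            return None
        # The decoder returns (text, length); only the text is needed.
        return decoder(value, errors)[0]

    return process


def to_conditional_unicode_processor(encoding, errors="strict"):
    """Hypothetical conditional processor: decodes only if not already text."""
    decoder = codecs.getdecoder(encoding)

    def process(value):
        if value is None:
            return None
        # This isinstance() check is the extra per-value cost in pure Python.
        if isinstance(value, str):
            return value
        return decoder(value, errors)[0]

    return process


conditional = to_conditional_unicode_processor("utf-8")
print(conditional(b"caf\xc3\xa9"))  # bytes are decoded
print(conditional("caf\u00e9"))     # text passes through untouched
```

 The unconditional version skips the check but fails on a value that is
 already text, which is why it is only safe when the driver's return type
 is known in advance.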

 But with cx_oracle's converter in place, we pay the unicode conversion
 overhead for all strings, not just the unicode ones.  For whatever reason,
 cx_oracle on Py2K counts all the decodes as Python function calls; on Py3K
 it does not, even with that converter in place.  So there are some
 less-than-ideal shenanigans going on inside cx_oracle that make us look
 bad.

 If we instead standardize cx_oracle on "conditional", we pay a price for
 unicode conversion when the C extensions are not in place; however, when
 the C extensions are present, the new "conditional" processor does the
 check without any function call overhead.  Results are as follows:

 {{{
 1. cx_oracle unicode, no C ext, no check, returning unicode - 200K
 2. no cx_oracle unicode, no C ext, conditional check, returning unicode - 300K
 3. no cx_oracle unicode, no C ext, unconditional check, returning unicode - 250K
 4. cx_oracle unicode, no C ext, returning str - 200K
 5. no cx_oracle unicode, no C ext, returning str - 100K
 6. cx_oracle unicode, C ext, no check, returning unicode - 100K
 7. no cx_oracle unicode, C ext, conditional check, returning unicode - 254
 8. no cx_oracle unicode, C ext, unconditional check, returning unicode - 254
 9. cx_oracle unicode, C ext, returning str - 100K
 10. no cx_oracle unicode, C ext, returning str - 236
 }}}
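
 The comment does not say how these figures were collected.  Assuming they
 are profiler function-call counts, the sketch below shows one way such a
 count can be taken with the standard library's ``cProfile``.  The helper
 names and the in-memory "rows" are hypothetical stand-ins; no database or
 cx_oracle is involved, so the numbers it prints are not comparable to the
 ones above:

```python
import cProfile
import codecs
import pstats


def conditional_processor(encoding):
    # Hypothetical pure-Python processor: isinstance() check, then decode.
    decoder = codecs.getdecoder(encoding)

    def process(value):
        if value is None or isinstance(value, str):
            return value
        return decoder(value)[0]

    return process


def count_calls(fn, *args):
    """Run fn(*args) under cProfile and return the total calls it recorded."""
    profiler = cProfile.Profile()
    profiler.enable()
    fn(*args)
    profiler.disable()
    return pstats.Stats(profiler).total_calls


def fetch_native(rows):
    # Stand-in for a driver that already hands back the desired type.
    return list(rows)


def fetch_processed(rows, process):
    # Stand-in for a pure-Python result processor applied to every value.
    return [process(value) for value in rows]


rows = [b"value %d" % i for i in range(1000)]
native_calls = count_calls(fetch_native, rows)
processed_calls = count_calls(fetch_processed, rows, conditional_processor("utf-8"))
print("native:", native_calls, "processed:", processed_calls)
```

 The point is only the shape of the result: a per-value Python processor
 shows up as at least one recorded call per row, while a path with no
 per-value Python code stays at a small constant.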

--
Ticket URL: <http://www.sqlalchemy.org/trac/ticket/2911#comment:2>
sqlalchemy <http://www.sqlalchemy.org/>
The Database Toolkit for Python
