Thread: [cx-oracle-users] issue with Unicode parameters
Brought to you by:
atuining
|
From: Valentin K. <vk...@gm...> - 2016-06-03 14:39:51
|
Hi,
I identified a problem with cx_Oracle driver and would like to understand how
it can be resolved. Basically, I observed a huge latencies with query if
I pass unicode string as binded parameter value. Below a stand-alone
code which reproduce a problem with our Oracle.
We run CERN Scientific Linux distribution (SLC 6.8), the code runs on
x86_64 platform with gcc493 and cx_Oracle 5.2.1
import cx_Oracle
def test_cx_oracle(uname, pwd, host, sql, params):
time0 = time.time()
connection = cx_Oracle.connect(uname, pwd, host)
cursor = connection.cursor()
cursor.execute(sql, **params)
lfn = params['logical_file_name']
print("lfn type=%s, cursor execute %s sec" % (type(lfn), time.time()-time0))
for row in cursor:
print row
cursor.close()
def main():
# read password from a file
ppp = open('ppp').readline().replace('\n','').replace('oracle://', '')
line = ppp.split('@')
host = line[-1]
uname, pwd = line[0].split(':')
sql = 'SELECT F.LOGICAL_FILE_NAME FROM cms_dbs3_int_global_owner.FILES F WHERE F.IS_FILE_VALID <> -1 AND F.LOGICAL_FILE_NAME = :logical_file_name'
lfn = '/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root'
params = {'logical_file_name': lfn}
test_cx_oracle(uname, pwd, host, sql, params)
params = {'logical_file_name': u'%s' % lfn}
test_cx_oracle(uname, pwd, host, sql, params)
if __name__ == '__main__':
main()
The output of this program is the following:
lfn type=<type 'str'>, cursor execute 0.0578980445862 sec
('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',)
lfn type=<type 'unicode'>, cursor execute 144.633666992 sec
('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',)
As you can see using string parameter we quickly get results from Oracle
in 0.06 sec, while using unicode we see huge latency 144sec.
Any suggestions why unicode parameters such poorly behaves and where the time is spent on?
Thanks,
Valentin.
|
|
From: Anthony T. <ant...@gm...> - 2016-06-03 14:59:37
|
Hi Valentin,
I see this is Python 2.7. For that version, cx_Oracle assumes that unicode
data implies your desire to bind as NCHAR data. If the data in the database
is VARCHAR2, though, and indexed that way, the index will be bypassed and a
full table scan will take place instead. You can see what is going on here:
conn = cx_Oracle.Connection("user/pw@tns")
cursor = conn.cursor()
stringVar = cursor.var(str)
stringVar.setvalue(0, u"Testing")
print stringVar
This will show you the result <cx_Oracle.STRING with value 'Testing'>
ncharVar = cursor.var(unicode)
ncharVar.setvalue(0, u"Testing")
print ncharVar
This will show you the result <cx_Oracle.NCHAR with value u'Testing'>
Note the difference between the two. They behave differently when bound.
So if you want to pass unicode data in Python 2.7 you need to use
setinputsizes() to indicate that this is a STRING value and not an NCHAR
value.
Note that this changes for Python 3.x. Since all strings are unicode
strings, the default binding is STRING and you need to use setinputsizes()
to indicate your desire to use NCHAR instead. I decided it would be far too
disruptive to change the Python 2.7 behaviour to match the Python 3.x
behaviour -- but it was not an easy call to make!
Hope that explains things!
Anthony
On Fri, Jun 3, 2016 at 8:39 AM, Valentin Kuznetsov <vk...@gm...>
wrote:
> Hi,
> I identified a problem with cx_Oracle driver and would like to understand
> how
> it can be resolved. Basically, I observed a huge latencies with query if
> I pass unicode string as binded parameter value. Below a stand-alone
> code which reproduce a problem with our Oracle.
>
> We run CERN Scientific Linux distribution (SLC 6.8), the code runs on
> x86_64 platform with gcc493 and cx_Oracle 5.2.1
>
> import cx_Oracle
> def test_cx_oracle(uname, pwd, host, sql, params):
> time0 = time.time()
> connection = cx_Oracle.connect(uname, pwd, host)
> cursor = connection.cursor()
> cursor.execute(sql, **params)
> lfn = params['logical_file_name']
> print("lfn type=%s, cursor execute %s sec" % (type(lfn),
> time.time()-time0))
> for row in cursor:
> print row
> cursor.close()
>
> def main():
> # read password from a file
> ppp = open('ppp').readline().replace('\n','').replace('oracle://', '')
> line = ppp.split('@')
> host = line[-1]
> uname, pwd = line[0].split(':')
> sql = 'SELECT F.LOGICAL_FILE_NAME FROM
> cms_dbs3_int_global_owner.FILES F WHERE F.IS_FILE_VALID <> -1 AND
> F.LOGICAL_FILE_NAME = :logical_file_name'
> lfn =
> '/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root'
> params = {'logical_file_name': lfn}
> test_cx_oracle(uname, pwd, host, sql, params)
> params = {'logical_file_name': u'%s' % lfn}
> test_cx_oracle(uname, pwd, host, sql, params)
>
> if __name__ == '__main__':
> main()
>
> The output of this program is the following:
>
> lfn type=<type 'str'>, cursor execute 0.0578980445862 sec
>
> ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',)
>
> lfn type=<type 'unicode'>, cursor execute 144.633666992 sec
>
> ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',)
>
> As you can see using string parameter we quickly get results from Oracle
> in 0.06 sec, while using unicode we see huge latency 144sec.
>
> Any suggestions why unicode parameters such poorly behaves and where the
> time is spent on?
>
> Thanks,
> Valentin.
>
>
>
>
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
> _______________________________________________
> cx-oracle-users mailing list
> cx-...@li...
> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users
>
|
|
From: Amaury F. d'A. <ama...@gm...> - 2016-06-03 15:00:37
|
Hi,
There is probably an index on this column, and for some reason it is not
used when the data is unicode (= NVARCHAR on server-side).
My guess is that since the "=" comparison receives different types,
Oracle needs to do a conversion before it can compare the operands.
By default this conversion cannot be unicode->str (because this would lose
information),
so it has to be str->unicode for each value in the table.
In this case the index cannot be used...
My suggestion is force the unicode->str comparison somehow:
- either in Python, with .encode(), this will crash if the value has
non-ascii characters
- or in SQL, with F.LOGICAL_FILE_NAME = TO_CHAR(:logical_file_name), this
can return wrong results,
because the conversion can lose characters and collide with other existing
data.
2016-06-03 16:39 GMT+02:00 Valentin Kuznetsov <vk...@gm...>:
> Hi,
> I identified a problem with cx_Oracle driver and would like to understand
> how
> it can be resolved. Basically, I observed a huge latencies with query if
> I pass unicode string as binded parameter value. Below a stand-alone
> code which reproduce a problem with our Oracle.
>
> We run CERN Scientific Linux distribution (SLC 6.8), the code runs on
> x86_64 platform with gcc493 and cx_Oracle 5.2.1
>
> import cx_Oracle
> def test_cx_oracle(uname, pwd, host, sql, params):
> time0 = time.time()
> connection = cx_Oracle.connect(uname, pwd, host)
> cursor = connection.cursor()
> cursor.execute(sql, **params)
> lfn = params['logical_file_name']
> print("lfn type=%s, cursor execute %s sec" % (type(lfn),
> time.time()-time0))
> for row in cursor:
> print row
> cursor.close()
>
> def main():
> # read password from a file
> ppp = open('ppp').readline().replace('\n','').replace('oracle://', '')
> line = ppp.split('@')
> host = line[-1]
> uname, pwd = line[0].split(':')
> sql = 'SELECT F.LOGICAL_FILE_NAME FROM
> cms_dbs3_int_global_owner.FILES F WHERE F.IS_FILE_VALID <> -1 AND
> F.LOGICAL_FILE_NAME = :logical_file_name'
> lfn =
> '/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root'
> params = {'logical_file_name': lfn}
> test_cx_oracle(uname, pwd, host, sql, params)
> params = {'logical_file_name': u'%s' % lfn}
> test_cx_oracle(uname, pwd, host, sql, params)
>
> if __name__ == '__main__':
> main()
>
> The output of this program is the following:
>
> lfn type=<type 'str'>, cursor execute 0.0578980445862 sec
>
> ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',)
>
> lfn type=<type 'unicode'>, cursor execute 144.633666992 sec
>
> ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',)
>
> As you can see using string parameter we quickly get results from Oracle
> in 0.06 sec, while using unicode we see huge latency 144sec.
>
> Any suggestions why unicode parameters such poorly behaves and where the
> time is spent on?
>
> Thanks,
> Valentin.
>
>
>
>
>
> ------------------------------------------------------------------------------
> What NetFlow Analyzer can do for you? Monitors network bandwidth and
> traffic
> patterns at an interface-level. Reveals which users, apps, and protocols
> are
> consuming the most bandwidth. Provides multi-vendor support for NetFlow,
> J-Flow, sFlow and other flows. Make informed decisions using capacity
> planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e
> _______________________________________________
> cx-oracle-users mailing list
> cx-...@li...
> https://lists.sourceforge.net/lists/listinfo/cx-oracle-users
>
--
Amaury Forgeot d'Arc
|