Thread: [cx-oracle-users] issue with Unicode parameters
Brought to you by:
atuining
From: Valentin K. <vk...@gm...> - 2016-06-03 14:39:51
|
Hi, I identified a problem with cx_Oracle driver and would like to understand how it can be resolved. Basically, I observed a huge latencies with query if I pass unicode string as binded parameter value. Below a stand-alone code which reproduce a problem with our Oracle. We run CERN Scientific Linux distribution (SLC 6.8), the code runs on x86_64 platform with gcc493 and cx_Oracle 5.2.1 import cx_Oracle def test_cx_oracle(uname, pwd, host, sql, params): time0 = time.time() connection = cx_Oracle.connect(uname, pwd, host) cursor = connection.cursor() cursor.execute(sql, **params) lfn = params['logical_file_name'] print("lfn type=%s, cursor execute %s sec" % (type(lfn), time.time()-time0)) for row in cursor: print row cursor.close() def main(): # read password from a file ppp = open('ppp').readline().replace('\n','').replace('oracle://', '') line = ppp.split('@') host = line[-1] uname, pwd = line[0].split(':') sql = 'SELECT F.LOGICAL_FILE_NAME FROM cms_dbs3_int_global_owner.FILES F WHERE F.IS_FILE_VALID <> -1 AND F.LOGICAL_FILE_NAME = :logical_file_name' lfn = '/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root' params = {'logical_file_name': lfn} test_cx_oracle(uname, pwd, host, sql, params) params = {'logical_file_name': u'%s' % lfn} test_cx_oracle(uname, pwd, host, sql, params) if __name__ == '__main__': main() The output of this program is the following: lfn type=<type 'str'>, cursor execute 0.0578980445862 sec ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',) lfn type=<type 'unicode'>, cursor execute 144.633666992 sec ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',) As you can see using string parameter we quickly get results from Oracle in 0.06 sec, while using unicode we see huge latency 144sec. Any suggestions why unicode parameters such poorly behaves and where the time is spent on? Thanks, Valentin. |
From: Anthony T. <ant...@gm...> - 2016-06-03 14:59:37
|
Hi Valentin, I see this is Python 2.7. For that version, cx_Oracle assumes that unicode data implies your desire to bind as NCHAR data. If the data in the database is VARCHAR2, though, and indexed that way, the index will be bypassed and a full table scan will take place instead. You can see what is going on here: conn = cx_Oracle.Connection("user/pw@tns") cursor = conn.cursor() stringVar = cursor.var(str) stringVar.setvalue(0, u"Testing") print stringVar This will show you the result <cx_Oracle.STRING with value 'Testing'> ncharVar = cursor.var(unicode) ncharVar.setvalue(0, u"Testing") print ncharVar This will show you the result <cx_Oracle.NCHAR with value u'Testing'> Note the difference between the two. They behave differently when bound. So if you want to pass unicode data in Python 2.7 you need to use setinputsizes() to indicate that this is a STRING value and not an NCHAR value. Note that this changes for Python 3.x. Since all strings are unicode strings, the default binding is STRING and you need to use setinputsizes() to indicate your desire to use NCHAR instead. I decided it would be far too disruptive to change the Python 2.7 behaviour to match the Python 3.x behaviour -- but it was not an easy call to make! Hope that explains things! Anthony On Fri, Jun 3, 2016 at 8:39 AM, Valentin Kuznetsov <vk...@gm...> wrote: > Hi, > I identified a problem with cx_Oracle driver and would like to understand > how > it can be resolved. Basically, I observed a huge latencies with query if > I pass unicode string as binded parameter value. Below a stand-alone > code which reproduce a problem with our Oracle. > > We run CERN Scientific Linux distribution (SLC 6.8), the code runs on > x86_64 platform with gcc493 and cx_Oracle 5.2.1 > > import cx_Oracle > def test_cx_oracle(uname, pwd, host, sql, params): > time0 = time.time() > connection = cx_Oracle.connect(uname, pwd, host) > cursor = connection.cursor() > cursor.execute(sql, **params) > lfn = params['logical_file_name'] > print("lfn type=%s, cursor execute %s sec" % (type(lfn), > time.time()-time0)) > for row in cursor: > print row > cursor.close() > > def main(): > # read password from a file > ppp = open('ppp').readline().replace('\n','').replace('oracle://', '') > line = ppp.split('@') > host = line[-1] > uname, pwd = line[0].split(':') > sql = 'SELECT F.LOGICAL_FILE_NAME FROM > cms_dbs3_int_global_owner.FILES F WHERE F.IS_FILE_VALID <> -1 AND > F.LOGICAL_FILE_NAME = :logical_file_name' > lfn = > '/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root' > params = {'logical_file_name': lfn} > test_cx_oracle(uname, pwd, host, sql, params) > params = {'logical_file_name': u'%s' % lfn} > test_cx_oracle(uname, pwd, host, sql, params) > > if __name__ == '__main__': > main() > > The output of this program is the following: > > lfn type=<type 'str'>, cursor execute 0.0578980445862 sec > > ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',) > > lfn type=<type 'unicode'>, cursor execute 144.633666992 sec > > ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',) > > As you can see using string parameter we quickly get results from Oracle > in 0.06 sec, while using unicode we see huge latency 144sec. > > Any suggestions why unicode parameters such poorly behaves and where the > time is spent on? > > Thanks, > Valentin. > > > > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and > traffic > patterns at an interface-level. Reveals which users, apps, and protocols > are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity > planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users > |
From: Amaury F. d'A. <ama...@gm...> - 2016-06-03 15:00:37
|
Hi, There is probably an index on this column, and for some reason it is not used when the data is unicode (= NVARCHAR on server-side). My guess is that since the "=" comparison receives different types, Oracle needs to do a conversion before it can compare the operands. By default this conversion cannot be unicode->str (because this would lose information), so it has to be str->unicode for each value in the table. In this case the index cannot be used... My suggestion is force the unicode->str comparison somehow: - either in Python, with .encode(), this will crash if the value has non-ascii characters - or in SQL, with F.LOGICAL_FILE_NAME = TO_CHAR(:logical_file_name), this can return wrong results, because the conversion can lose characters and collide with other existing data. 2016-06-03 16:39 GMT+02:00 Valentin Kuznetsov <vk...@gm...>: > Hi, > I identified a problem with cx_Oracle driver and would like to understand > how > it can be resolved. Basically, I observed a huge latencies with query if > I pass unicode string as binded parameter value. Below a stand-alone > code which reproduce a problem with our Oracle. > > We run CERN Scientific Linux distribution (SLC 6.8), the code runs on > x86_64 platform with gcc493 and cx_Oracle 5.2.1 > > import cx_Oracle > def test_cx_oracle(uname, pwd, host, sql, params): > time0 = time.time() > connection = cx_Oracle.connect(uname, pwd, host) > cursor = connection.cursor() > cursor.execute(sql, **params) > lfn = params['logical_file_name'] > print("lfn type=%s, cursor execute %s sec" % (type(lfn), > time.time()-time0)) > for row in cursor: > print row > cursor.close() > > def main(): > # read password from a file > ppp = open('ppp').readline().replace('\n','').replace('oracle://', '') > line = ppp.split('@') > host = line[-1] > uname, pwd = line[0].split(':') > sql = 'SELECT F.LOGICAL_FILE_NAME FROM > cms_dbs3_int_global_owner.FILES F WHERE F.IS_FILE_VALID <> -1 AND > F.LOGICAL_FILE_NAME = :logical_file_name' > lfn = > '/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root' > params = {'logical_file_name': lfn} > test_cx_oracle(uname, pwd, host, sql, params) > params = {'logical_file_name': u'%s' % lfn} > test_cx_oracle(uname, pwd, host, sql, params) > > if __name__ == '__main__': > main() > > The output of this program is the following: > > lfn type=<type 'str'>, cursor execute 0.0578980445862 sec > > ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',) > > lfn type=<type 'unicode'>, cursor execute 144.633666992 sec > > ('/store/generator/Summer11/ZZTo4e_7TeV_mll8_mZZ95-160-powheg15-pythia6/GEN/START311_V2-v2/00000/BA1D37D5-93EE-E211-ADCF-003048F174AC.root',) > > As you can see using string parameter we quickly get results from Oracle > in 0.06 sec, while using unicode we see huge latency 144sec. > > Any suggestions why unicode parameters such poorly behaves and where the > time is spent on? > > Thanks, > Valentin. > > > > > > ------------------------------------------------------------------------------ > What NetFlow Analyzer can do for you? Monitors network bandwidth and > traffic > patterns at an interface-level. Reveals which users, apps, and protocols > are > consuming the most bandwidth. Provides multi-vendor support for NetFlow, > J-Flow, sFlow and other flows. Make informed decisions using capacity > planning reports. https://ad.doubleclick.net/ddm/clk/305295220;132659582;e > _______________________________________________ > cx-oracle-users mailing list > cx-...@li... > https://lists.sourceforge.net/lists/listinfo/cx-oracle-users > -- Amaury Forgeot d'Arc |