From: Toby C. <Tob...@cr...> - 2015-12-10 03:43:03
Hi all,

I have been investigating a performance issue with a large chunk of data
being sent via an HTTP POST request, and the bottleneck appears to be
urlparse.unquote. The data is about 6 MB of XML, which takes on the order
of 10 minutes to run through urlparse.unquote() under Jython; the same
data is processed in seconds under CPython.

The difference seems to come down to slicing performance. For reference,
the unquote code is included below:
def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    # fastpath
    if len(res) == 1:
        return s
    s = res[0]
    for item in res[1:]:
        try:
            s += _hextochr[item[:2]] + item[2:]
        except KeyError:
            s += '%' + item
        except UnicodeDecodeError:
            s += unichr(int(item[:2], 16)) + item[2:]
    return s
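
As a side note, the loop above grows the result with repeated "s +="
string concatenation. CPython has a special-case optimisation that makes
that pattern effectively linear, but as far as I know Jython does not, so
the concatenation may be contributing alongside the slicing. Purely as a
sketch (untested as a drop-in replacement; _hextochr is assumed to be the
same module-level table urlparse already defines), a join-based variant
could look like:

def unquote_join(s):
    """Sketch: like unquote(), but collects the pieces in a list and
    joins once at the end, avoiding repeated += concatenation."""
    res = s.split('%')
    # fastpath
    if len(res) == 1:
        return s
    parts = [res[0]]
    for item in res[1:]:
        try:
            parts.append(_hextochr[item[:2]] + item[2:])
        except KeyError:
            parts.append('%' + item)
        except UnicodeDecodeError:
            # same unicode fallback as the original
            parts.append(unichr(int(item[:2], 16)) + item[2:])
    return ''.join(parts)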
To investigate the slicing performance I ran some simple test code through
both Jython and CPython. The timing is very ad hoc (no effort was made to
ensure the same level of background activity, etc.), but the magnitude of
the difference is large (3x or more), and it is even more pronounced when
running the full unquote code.

Is this the expected performance of Jython slicing, or is there something
that can be improved?
Regards,
Toby
The test code
======
import time

for x in range(4, 7):
    start = time.time()
    print x
    len([b[:2] + b[2:] for b in [u'%3casdf%3effff'] * 10**x])
    print time.time() - start
=======
results
$ python /tmp/a.py
4
0.00148105621338
5
0.0163309574127
6
0.152363061905
$ jython /tmp/a.py
4
0.0190000534058
5
0.0709998607635
6
0.469000101089
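
P.S. Since the timing above is ad hoc, here is a slightly more controlled
sketch using the stdlib timeit module, which also times the slicing and
the concatenation separately (it should run on both interpreters, though
JVM JIT warm-up will still skew short runs on Jython):

import timeit

setup = "b = u'%3casdf%3effff'"
for label, stmt in [('slice only', 'b[:2]; b[2:]'),
                    ('slice + concat', 'b[:2] + b[2:]')]:
    # one million iterations of each statement against the same input
    t = timeit.Timer(stmt, setup).timeit(number=10**6)
    print label, t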