From: Toby C. <Tob...@cr...> - 2015-12-10 03:43:03
Hi all,

I have been investigating a performance issue with a large chunk of data
being sent via an HTTP POST request, and the bottleneck appears to be
urlparse.unquote. The data is about 6 MB of XML, which takes on the order
of 10 minutes to run through urlparse.unquote() under Jython; the same
data is processed in seconds under CPython.

The difference seems to come down to slicing performance. For reference,
the unquote code is included below:
def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    # fastpath
    if len(res) == 1:
        return s
    s = res[0]
    for item in res[1:]:
        try:
            s += _hextochr[item[:2]] + item[2:]
        except KeyError:
            s += '%' + item
        except UnicodeDecodeError:
            s += unichr(int(item[:2], 16)) + item[2:]
    return s
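
As a side note, the loop above grows the result with repeated "s +="
string concatenation. CPython has a special-case optimisation that makes
that pattern effectively linear, but as far as I know Jython does not, so
the concatenation may be contributing alongside the slicing. Purely as a
sketch (untested as a drop-in replacement; _hextochr is assumed to be the
same module-level table urlparse already defines), a join-based variant
could look like:

def unquote_join(s):
    """Sketch: like unquote(), but collects the pieces in a list and
    joins once at the end, avoiding repeated += concatenation."""
    res = s.split('%')
    # fastpath
    if len(res) == 1:
        return s
    parts = [res[0]]
    for item in res[1:]:
        try:
            parts.append(_hextochr[item[:2]] + item[2:])
        except KeyError:
            parts.append('%' + item)
        except UnicodeDecodeError:
            # same unicode fallback as the original
            parts.append(unichr(int(item[:2], 16)) + item[2:])
    return ''.join(parts)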
To investigate the slicing performance I ran some simple test code through
both Jython and CPython. The timing is very ad hoc (no effort was made to
ensure the same level of background activity, etc.), but the magnitude of
the difference is large (3x or more), and it is even more pronounced when
running the full unquote code.

Is this the expected performance of Jython slicing, or is there something
that can be improved?
Regards,
Toby
The test code
======
import time

for x in range(4, 7):
    start = time.time()
    print x
    len([b[:2] + b[2:] for b in [u'%3casdf%3effff'] * 10**x])
    print time.time() - start
=======
results
$ python /tmp/a.py
4
0.00148105621338
5
0.0163309574127
6
0.152363061905
$ jython /tmp/a.py
4
0.0190000534058
5
0.0709998607635
6
0.469000101089
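
P.S. Since the timing above is ad hoc, here is a slightly more controlled
sketch using the stdlib timeit module, which also times the slicing and
the concatenation separately (it should run on both interpreters, though
JVM JIT warm-up will still skew short runs on Jython):

import timeit

setup = "b = u'%3casdf%3effff'"
for label, stmt in [('slice only', 'b[:2]; b[2:]'),
                    ('slice + concat', 'b[:2] + b[2:]')]:
    # one million iterations of each statement against the same input
    t = timeit.Timer(stmt, setup).timeit(number=10**6)
    print label, t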