From: Jim B. <jim...@py...> - 2015-12-10 22:30:43
Since it's an easy fix, I made the change. It will be part of 2.7.1 beta 3; see
https://hg.python.org/jython/rev/23f2c16b9dc7

I haven't done a detailed performance analysis, since going from quadratic to
linear is such an obvious improvement. But eyeballing it, Jython 2.7.1's
urllib.unquote is now twice as fast on my laptop as CPython 2.7.10's.

- Jim

On Thu, Dec 10, 2015 at 2:42 PM, Jim Baker <jim...@py...> wrote:

> Oliver,
>
> That's a good suggestion and likely matches Toby's use case, although it
> does return a unicode string, not a str. (But if the result is ASCII, just
> wrap it with str(...).) Re-running with URLDecoder.decode, I get performance
> comparable to CPython.
>
> However, escapes %80 through %FF are decoded differently, so it's not a
> replacement we can just use in Jython itself:
>
> - With urllib.unquote, we get \x80 through \xFF (high bytes)
> - With URLDecoder.decode, we get Unicode Character 'REPLACEMENT CHARACTER'
>   (U+FFFD), which is rendered as �
>
> - Jim
>
> On Thu, Dec 10, 2015 at 10:17 AM, olivier merlin <ome...@gm...> wrote:
>
>> As Jython is Java, why not use
>>     String result = java.net.URLDecoder.decode(url, "UTF-8");
>> in your code?
>>
>> On Thu, 10 Dec 2015 at 04:44, Toby Collett <Tob...@cr...> wrote:
>>
>>> Hi all,
>>> I have been investigating a performance issue with a large data chunk
>>> being sent via an HTTP POST request. The bottleneck seems to be in
>>> urlparse.unquote. The data is about 6 MB of XML, which takes on the
>>> order of 10 minutes to run through urlparse.unquote(). This completes
>>> in seconds in the CPython implementation.
>>>
>>> The performance difference seems to come down to slicing performance.
>>> For reference, the unquote code is included below:
>>>
>>> def unquote(s):
>>>     """unquote('abc%20def') -> 'abc def'."""
>>>     res = s.split('%')
>>>     # fastpath
>>>     if len(res) == 1:
>>>         return s
>>>     s = res[0]
>>>     for item in res[1:]:
>>>         try:
>>>             s += _hextochr[item[:2]] + item[2:]
>>>         except KeyError:
>>>             s += '%' + item
>>>         except UnicodeDecodeError:
>>>             s += unichr(int(item[:2], 16)) + item[2:]
>>>     return s
>>>
>>> To investigate the slicing performance I ran some simple test code
>>> through Jython and Python. The timing is very ad hoc (no effort made to
>>> ensure the same level of background activity, etc.), but the magnitude
>>> of the difference is large (3x or more), and it is even more obvious
>>> when running the full unquote code.
>>>
>>> Is this the expected performance of Jython slicing, or is there
>>> something that can be improved?
>>>
>>> Regards,
>>> Toby
>>>
>>> The test code
>>> ======
>>> import time
>>> for x in range(4, 7):
>>>     start = time.time()
>>>     print x
>>>     len([b[:2] + b[2:] for b in [u'%3casdf%3effff'] * 10**x])
>>>     print time.time() - start
>>> =======
>>> results
>>> $ python /tmp/a.py
>>> 4
>>> 0.00148105621338
>>> 5
>>> 0.0163309574127
>>> 6
>>> 0.152363061905
>>>
>>> $ jython /tmp/a.py
>>> 4
>>> 0.0190000534058
>>> 5
>>> 0.0709998607635
>>> 6
>>> 0.469000101089
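
For reference, a minimal sketch of the linear-time approach Jim describes above:
collect the decoded pieces in a list and join them once at the end, instead of
building the result with repeated "s += ..." str concatenation (which is
quadratic on Jython). This is an illustration of the technique only, not the
actual code in changeset 23f2c16b9dc7; unquote_linear is a made-up name, the
_hextochr table is rebuilt the way the Python 2 stdlib does it, and the
unicode-input fallback of the stdlib version is omitted.

======
_hexdig = '0123456789ABCDEFabcdef'
_hextochr = dict((a + b, chr(int(a + b, 16))) for a in _hexdig for b in _hexdig)

def unquote_linear(s):
    """unquote_linear('abc%20def') -> 'abc def'."""
    res = s.split('%')
    if len(res) == 1:            # fast path: no escapes at all
        return s
    parts = [res[0]]             # accumulate pieces instead of concatenating
    for item in res[1:]:
        try:
            parts.append(_hextochr[item[:2]] + item[2:])
        except KeyError:         # not a valid two-digit hex escape
            parts.append('%' + item)
    return ''.join(parts)        # single O(n) join at the end

print unquote_linear('abc%20def')   # -> abc def
print repr(unquote_linear('%80'))   # -> '\x80' (high byte preserved)
=======

And a small Jython-only snippet illustrating the %80 through %FF difference Jim
points out between urllib.unquote and java.net.URLDecoder.decode; 'caf%E9' is
just an example input, chosen because the byte 0xE9 on its own is not valid
UTF-8.

======
import urllib
from java.net import URLDecoder

s = 'caf%E9'
print repr(urllib.unquote(s))               # 'caf\xe9'    -- raw high byte
print repr(URLDecoder.decode(s, 'UTF-8'))   # u'caf\ufffd' -- REPLACEMENT CHARACTER
=======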