Re: [Jython-users] Performance of urlparse.unquote / string slicing

Olivier,

That's a good suggestion and likely matches Toby's use case, although it
returns a unicode string rather than a str. (If the result is pure ASCII,
just wrap it with str(...).) Re-running with URLDecoder.decode, I get
performance comparable to CPython.
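
A minimal sketch of that call from Python code, for anyone who wants to try
it. The helper name unquote_via_java is hypothetical, and the fallback branch
is only there so the snippet also runs outside Jython. Note that
URLDecoder.decode also turns '+' into a space, which is unquote_plus
behaviour rather than unquote behaviour.

```python
try:
    # Only importable when running under Jython.
    from java.net import URLDecoder
except ImportError:
    URLDecoder = None
    try:
        from urllib import unquote_plus        # Python 2
    except ImportError:
        from urllib.parse import unquote_plus  # Python 3


def unquote_via_java(s):
    """Percent-decode s (hypothetical helper, not part of Jython)."""
    if URLDecoder is not None:
        # Returns a unicode string; '+' is decoded as a space.
        return URLDecoder.decode(s, "UTF-8")
    # Fallback outside Jython; unquote_plus also maps '+' to a space.
    return unquote_plus(s)


print(unquote_via_java("abc%20def%3Casdf%3E"))
```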

However, escapes %80 through %FF are decoded differently, so it's not a
drop-in replacement we can use in Jython itself:

   - With urllib.unquote, we get \x80 through \xFF (high bytes)
   - With URLDecoder.decode, we get Unicode Character 'REPLACEMENT
   CHARACTER' (U+FFFD), which is rendered as �
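
The difference comes down to how the single byte behind an escape such as
%80 is interpreted. A small illustration in plain Python follows. It does not
call URLDecoder. It assumes that URLDecoder decodes the bytes as UTF-8 and
substitutes U+FFFD for malformed input, which is consistent with the result
above.

```python
raw = b"\x80"  # the single byte that the escape %80 stands for

# A lone 0x80 is not valid UTF-8, so decoding with replacement
# yields U+FFFD REPLACEMENT CHARACTER.
as_utf8 = raw.decode("utf-8", "replace")

# Decoding as Latin-1 maps the byte straight to code point U+0080,
# which preserves the original byte value.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
```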

- Jim

On Thu, Dec 10, 2015 at 10:17 AM, olivier merlin <ome...@gm...>
wrote:

> Since Jython runs on Java, why not use
> String result = java.net.URLDecoder.decode(url, "UTF-8");
> in your code?
>
>
> On Thu, Dec 10, 2015 at 04:44, Toby Collett <Tob...@cr...> wrote:
>
>> Hi all,
>> I have been investigating a performance issue with a large chunk of data
>> sent via an HTTP POST request. The bottleneck seems to be in
>> urlparse.unquote. The data is about 6MB of XML, which takes on the order
>> of 10 minutes to run through urlparse.unquote(). The same call completes
>> in seconds in CPython.
>>
>> The performance difference seems to come down to slicing performance.
>> For reference, the unquote code is included below:
>>
>> def unquote(s):
>>     """unquote('abc%20def') -> 'abc def'."""
>>     res = s.split('%')
>>     # fastpath
>>     if len(res) == 1:
>>         return s
>>     s = res[0]
>>     for item in res[1:]:
>>         try:
>>             s += _hextochr[item[:2]] + item[2:]
>>         except KeyError:
>>             s += '%' + item
>>         except UnicodeDecodeError:
>>             s += unichr(int(item[:2], 16)) + item[2:]
>>     return s
>>
>> To investigate the slicing performance I ran some simple test code
>> through Jython and CPython. The timing is very ad hoc (no effort was made
>> to ensure the same level of background activity, etc.), but the magnitude
>> of the difference is large (3 times or more), and it is even more obvious
>> when running the full unquote code.
>>
>> Is this the expected performance of jython slicing, or is there something
>> that can be improved?
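
One thing that may matter here more than slicing itself is the repeated
`s +=` in the loop, which can copy the whole accumulated string on every
iteration. That is an assumption, not something measured in this thread. A
sketch of the same loop that collects the pieces in a list and joins once is
below. The name unquote_join is hypothetical, _hextochr is a stand-in for the
stdlib's private table, and the UnicodeDecodeError branch is omitted, so
unicode input mixed with high-byte escapes is not handled.

```python
# Stand-in for the stdlib's private _hextochr table: maps every
# two-hex-digit string (any mix of case) to its character.
_HEX = "0123456789ABCDEFabcdef"
_hextochr = dict((a + b, chr(int(a + b, 16))) for a in _HEX for b in _HEX)


def unquote_join(s):
    """unquote_join('abc%20def') -> 'abc def'."""
    parts = s.split('%')
    if len(parts) == 1:  # fast path: nothing to decode
        return s
    res = [parts[0]]
    for item in parts[1:]:
        try:
            res.append(_hextochr[item[:2]])
            res.append(item[2:])
        except KeyError:
            # Not a valid escape: keep it verbatim.
            res.append('%')
            res.append(item)
    return ''.join(res)  # a single concatenation at the end


print(unquote_join('abc%20def%3Casdf%3E'))
```

Timing this against the original on the 6MB payload would show whether the
concatenation or the slicing dominates.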
>>
>> Regards,
>> Toby
>>
>> The test code
>> ======
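>> import time
>>
>> # Python 2 syntax, as used by both interpreters below.
>> for x in range(4, 7):
>>     start = time.time()
>>     print x
>>     # Slice and re-join the same short string 10**x times.
>>     len([b[:2] + b[2:] for b in [u'%3casdf%3effff'] * 10**x])
>>     print time.time() - start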
>> =======
>> results
>> $ python /tmp/a.py
>> 4
>> 0.00148105621338
>> 5
>> 0.0163309574127
>> 6
>> 0.152363061905
>>
>> $ jython /tmp/a.py
>> 4
>> 0.0190000534058
>> 5
>> 0.0709998607635
>> 6
>> 0.469000101089
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Jython-users mailing list
>> Jyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/jython-users
>>
>
>