From: Jim B. <jim...@py...> - 2015-12-10 21:43:05
Olivier,
That's a good suggestion and likely matches Toby's use case, although it
returns a unicode string, not a str. (If the result is ASCII, just wrap it
in str(...).) Re-running with URLDecoder.decode, I get performance comparable
to CPython.
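A minimal sketch of that approach, assuming ASCII-only input. The helper
name fast_unquote is hypothetical, and the fallback branch exists only so
the snippet also runs outside Jython. Note that URLDecoder.decode also turns
'+' into a space, so the fallback uses unquote_plus rather than unquote to
match that behaviour:

```python
try:
    # Only importable when running under Jython.
    from java.net import URLDecoder

    def _decode(s):
        # Returns a unicode string; malformed UTF-8 becomes U+FFFD.
        return URLDecoder.decode(s, "UTF-8")
except ImportError:
    # Fallback for CPython, so the sketch stays runnable anywhere.
    try:
        from urllib import unquote_plus as _decode  # Python 2
    except ImportError:
        from urllib.parse import unquote_plus as _decode  # Python 3


def fast_unquote(s):
    """Decode %XX escapes; the str() wrap is only safe for ASCII results."""
    return str(_decode(s))


print(fast_unquote('abc%20def'))
```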
However, escapes %80 through %FF are decoded differently, so it's not a
replacement we can just use in Jython itself:
- With urllib.unquote, we get \x80 through \xFF (high bytes)
- With URLDecoder.decode, we get Unicode Character 'REPLACEMENT
CHARACTER' (U+FFFD), which is rendered as �
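The U+FFFD result is what UTF-8 decoding rules would predict: a lone byte in
the range 0x80 through 0xFF does not form a valid UTF-8 sequence on its own,
so a decoder that substitutes on malformed input emits the replacement
character. A small sketch in plain Python (no Java involved; raw, high_byte
and decoded are just illustrative names) that mimics both behaviours for %80:

```python
import binascii

# The single byte that the escape %80 stands for.
raw = binascii.unhexlify('80')

# Byte-oriented decoding, as urllib.unquote does for a str: keep the byte.
high_byte = raw

# Charset-oriented decoding: treat the byte as UTF-8 and substitute
# U+FFFD wherever the input is malformed.
decoded = raw.decode('utf-8', 'replace')

print(repr(high_byte))
print(repr(decoded))
```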
- Jim
On Thu, Dec 10, 2015 at 10:17 AM, olivier merlin <ome...@gm...>
wrote:
> Since Jython runs on Java, why not use
> String result = java.net.URLDecoder.decode(url, "UTF-8");
> in your code?
>
>
> On Thu, Dec 10, 2015 at 04:44, Toby Collett <Tob...@cr...> wrote:
>
>> Hi all,
>> I have been investigating a performance issue with a large data chunk
>> being sent via an HTTP POST request. The outcome is that the
>> bottleneck seems to be in urlparse.unquote. The data is about 6MB of XML,
>> which takes on the order of 10 minutes to run through urlparse.unquote().
>> The same call takes seconds in the CPython implementation.
>>
>> The performance difference seems to come down to slicing
>> performance. For reference, the unquote code is included below:
>> def unquote(s):
>>     """unquote('abc%20def') -> 'abc def'."""
>>     res = s.split('%')
>>     # fastpath
>>     if len(res) == 1:
>>         return s
>>     s = res[0]
>>     for item in res[1:]:
>>         try:
>>             s += _hextochr[item[:2]] + item[2:]
>>         except KeyError:
>>             s += '%' + item
>>         except UnicodeDecodeError:
>>             s += unichr(int(item[:2], 16)) + item[2:]
>>     return s
>>
>> To investigate the slicing performance I ran some simple test code
>> through Jython and Python. The timing is very ad hoc (no effort was made to
>> ensure the same level of background activity, etc.), but the magnitude of
>> the difference is large (3 times or more), and it is even more obvious when
>> running the full unquote code.
>>
>> Is this the expected performance of Jython slicing, or is there something
>> that can be improved?
>>
>> Regards,
>> Toby
>>
>> The test code
>> ======
>> import time
>> for x in range(4,7):
>>     start = time.time()
>>     print x
>>     len([b[:2] + b[2:] for b in [u'%3casdf%3effff']*10**x])
>>     print time.time() - start
>> =======
>> results
>> $ python /tmp/a.py
>> 4
>> 0.00148105621338
>> 5
>> 0.0163309574127
>> 6
>> 0.152363061905
>>
>> $ jython /tmp/a.py
>> 4
>> 0.0190000534058
>> 5
>> 0.0709998607635
>> 6
>> 0.469000101089
>>
>>
>>
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Jython-users mailing list
>> Jyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/jython-users
>>