From: Toby C. <Tob...@cr...> - 2015-12-11 01:50:25
Thanks for the quick follow-up. My apologies for not including the version: I am working with 2.7.0, and had also tried the 2.7.1 beta 2. The Java build is openjdk version "1.8.0_45-internal".

I am using Django in the background, so I cannot directly modify the call to decode. However, the following, inspired by Olivier, has solved my immediate issue:

    import urlparse
    import java.net.URLDecoder
    urlparse.unquote = lambda s: java.net.URLDecoder.decode(s, "UTF-8")

I look forward to 2.7.1 being available next year. It looks like some great work has been going into it.

Regards,
Toby

From: Jim Baker <jim...@py...>
To: olivier merlin <ome...@gm...>,
Cc: "jyt...@li..." <jyt...@li...>, Toby Collett <Tob...@cr...>
Date: 11/12/2015 11:32 a.m.
Subject: Re: [Jython-users] Performance of urlparse.unquote / string slicing

Since it's an easy fix, I made the change. It will be part of 2.7.1 beta 3. See https://hg.python.org/jython/rev/23f2c16b9dc7

I haven't done a detailed performance analysis, since going from quadratic to linear is such an obvious improvement. But eyeballing it, Jython 2.7.1 on urllib.unquote is now twice as fast on my laptop as CPython 2.7.10.

- Jim

On Thu, Dec 10, 2015 at 2:42 PM, Jim Baker <jim...@py...> wrote:

Olivier,

That's a good suggestion and likely matches Toby's use case, although it does return a unicode string, not a str. (But if it is ASCII, just wrap it with str(...).) Re-running using URLDecoder.decode, I get performance comparable to CPython.

However, escapes %80 through %FF are decoded differently, so it's not a replacement we can just use in Jython itself:

    With urllib.unquote, we get \x80 through \xFF (high bytes)
    With URLDecoder.decode, we get Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD), which is rendered as �

- Jim

On Thu, Dec 10, 2015 at 10:17 AM, olivier merlin <ome...@gm...> wrote:

Since Jython runs on Java, why not use

    String result = java.net.URLDecoder.decode(url, "UTF-8");

in your code?

On Thu, Dec 10, 2015 at 04:44, Toby Collett <Tob...@cr...> wrote:

Hi all,

I have been investigating a performance issue with a large data chunk being sent via an HTTP POST request. The outcome is that the bottleneck seems to be in urlparse.unquote. The data is about 6MB of XML, which takes on the order of 10 minutes to run through urlparse.unquote(). The same call takes seconds in the CPython implementation.

The performance difference seems to come down to slicing performance. For reference, the unquote code is included below:

    def unquote(s):
        """unquote('abc%20def') -> 'abc def'."""
        res = s.split('%')
        # fastpath
        if len(res) == 1:
            return s
        s = res[0]
        for item in res[1:]:
            try:
                s += _hextochr[item[:2]] + item[2:]
            except KeyError:
                s += '%' + item
            except UnicodeDecodeError:
                s += unichr(int(item[:2], 16)) + item[2:]
        return s

To investigate the slicing performance, I ran some simple test code through jython and python. The timing is very ad hoc (no effort was made to ensure the same level of background activity, etc.), but the magnitude of the difference is large (3 times or more), and it is even more obvious when running the full unquote code.

Is this the expected performance of Jython slicing, or is there something that can be improved?

Regards,
Toby

The test code
======
    import time
    for x in range(4,7):
        start = time.time()
        print x
        len([b[:2] + b[2:] for b in [u'%3casdf%3effff']*10**x])
        print time.time() - start
=======

results

    $ python /tmp/a.py
    4
    0.00148105621338
    5
    0.0163309574127
    6
    0.152363061905

    $ jython /tmp/a.py
    4
    0.0190000534058
    5
    0.0709998607635
    6
    0.469000101089

------------------------------------------------------------------------------
_______________________________________________
Jython-users mailing list
Jyt...@li...
https://lists.sourceforge.net/lists/listinfo/jython-users
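
A minimal sketch of the quadratic-to-linear idea Jim mentions, under stated assumptions. The loop in the quoted unquote() grows s with += once per '%' escape, which can recopy the accumulated string on every pass; collecting the pieces in a list and joining them once avoids that. This is not the code from the linked changeset. The names unquote_linear and _HEX_TO_CHR are hypothetical, and the UnicodeDecodeError fallback of the original is left out for brevity.

```python
# Sketch only: a list-and-join variant of the unquote() loop quoted in the thread.
# unquote_linear and _HEX_TO_CHR are hypothetical names, not Jython's own.

_HEX_DIGITS = '0123456789ABCDEFabcdef'

# Map every two-digit hex string to its character, e.g. '3c' -> '<'.
_HEX_TO_CHR = dict((a + b, chr(int(a + b, 16)))
                   for a in _HEX_DIGITS for b in _HEX_DIGITS)


def unquote_linear(s):
    """unquote_linear('abc%20def') -> 'abc def'."""
    parts = s.split('%')
    if len(parts) == 1:
        # Fast path: no '%' in the input, nothing to decode.
        return s
    out = [parts[0]]
    for item in parts[1:]:
        try:
            # The lookup raises KeyError before anything is appended
            # if the first two characters are not a valid hex pair.
            out.append(_HEX_TO_CHR[item[:2]])
            out.append(item[2:])
        except KeyError:
            # Not a valid escape: keep the '%' and the text unchanged.
            out.append('%')
            out.append(item)
    # A single join copies each piece once, instead of recopying the
    # growing result on every iteration.
    return ''.join(out)
```

For example, unquote_linear('%3casdf%3effff') returns '<asdf>ffff', and input without a valid escape, such as '100%zz', is returned unchanged.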