|
From: Toby C. <Tob...@cr...> - 2015-12-10 03:43:03
|
Hi all,
I have been investigating a performance issue with a large chunk of data
sent via an HTTP POST request. The bottleneck seems to be in
urlparse.unquote. The data is about 6MB of XML, which takes on the order of
10 minutes to run through urlparse.unquote(). The same call takes seconds
in CPython.
The performance difference seems to come down to slicing performance. For
reference, the unquote code is included below:
def unquote(s):
    """unquote('abc%20def') -> 'abc def'."""
    res = s.split('%')
    # fastpath
    if len(res) == 1:
        return s
    s = res[0]
    for item in res[1:]:
        try:
            s += _hextochr[item[:2]] + item[2:]
        except KeyError:
            s += '%' + item
        except UnicodeDecodeError:
            s += unichr(int(item[:2], 16)) + item[2:]
    return s
To investigate the slicing performance I ran some simple test code through
Jython and Python. The timing is very ad hoc (no effort was made to ensure
the same level of background activity, etc.), but the magnitude of the
difference is large (3 times or more), and it is even more obvious when
running the full unquote code.
Is this the expected performance of jython slicing, or is there something
that can be improved?
Regards,
Toby
The test code
======
import time
for x in range(4,7):
    start = time.time()
    print x
    len([b[:2] + b[2:] for b in [u'%3casdf%3effff']*10**x])
    print time.time() - start
=======
results
$ python /tmp/a.py
4
0.00148105621338
5
0.0163309574127
6
0.152363061905
$ jython /tmp/a.py
4
0.0190000534058
5
0.0709998607635
6
0.469000101089
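A somewhat less ad hoc variant of the timing script above, as a sketch only: it wraps the same slicing expression in a function (the name slice_and_rejoin is made up here) and uses timeit.repeat, keeping the best of three runs to dampen background noise. It makes no attempt to account for JVM warm-up, so the Jython figure for the smallest size may still look worse than steady-state performance.

```python
import timeit


def slice_and_rejoin(n):
    """Slice each of n strings at index 2 and glue the halves back together."""
    return len([b[:2] + b[2:] for b in [u'%3casdf%3effff'] * n])


for exponent in range(4, 7):
    n = 10 ** exponent
    # Best of three single runs; the minimum is the least disturbed measurement.
    best = min(timeit.repeat(lambda: slice_and_rejoin(n), number=1, repeat=3))
    print('%d %.4f' % (exponent, best))
```

The same file runs unchanged under both interpreters, so the two columns of numbers stay directly comparable.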
|
|
From: olivier m. <ome...@gm...> - 2015-12-10 17:17:23
|
Since Jython runs on Java, why not use
String result = java.net.URLDecoder.decode(url, "UTF-8");
in your code?
|
|
From: Jim B. <jim...@py...> - 2015-12-10 22:30:43
|
Since it's an easy fix, I made the change. It will be part of 2.7.1 beta 3.
See https://hg.python.org/jython/rev/23f2c16b9dc7
I haven't done a detailed performance analysis, since going from quadratic
to linear is such an obvious improvement. But eyeballing it, Jython 2.7.1 on
urllib.unquote is now twice as fast on my laptop as CPython 2.7.10.
- Jim
|
|
From: Jim B. <jim...@py...> - 2015-12-10 20:02:41
|
Toby,
What I observed when running a slightly modified version of the unquote test
code is that it is quadratic (O(n^2)) with respect to the length of the
string (n). (Please note, you didn't specify the Jython version, but 2.7 is
what we would urge anyone to use now, and certainly what we would work on
for performance updates.)
But no surprise there: urllib.unquote is using a pattern of appending to a
string with in-place string concatenation (using +=), which does not work
well on Jython, and in general on Java. (The fact that indexing is somewhat
slower is probably not relevant here.) For the underlying Java API, we would
need to keep the string in a StringBuilder; for Python, we would typically
use StringIO.
Please note that PEP 8 even specifically recommends against this usage for
Python code:
> For example, do not rely on CPython's efficient implementation of in-place
> string concatenation for statements in the form a += b or a = a + b. This
> optimization is fragile even in CPython (it only works for some types) and
> isn't present at all in implementations that don't use refcounting. In
> performance sensitive parts of the library, the ''.join() form should be
> used instead. This will ensure that concatenation occurs in linear time
> across various implementations.
(https://www.python.org/dev/peps/pep-0008/#programming-recommendations)
I don't believe this issue has come up until now, however, because
urllib.unquote is most likely used for short strings in many/most
applications. But the poor performance could be readily fixed by using
StringIO instead.
- Jim
|
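The quadratic behaviour comes from rebuilding the whole result string on every +=. A minimal sketch of the ''.join() form that PEP 8 recommends is shown here: the pieces are collected in a list and joined once at the end. This is an illustration only, not the change that went into Jython; the names unquote_linear and _HEXTOCHR are made up for this example, and the UnicodeDecodeError branch of the original function is left out for brevity.

```python
_HEXDIG = '0123456789ABCDEFabcdef'
# Lookup table from two hex digits to the decoded character, e.g. '3c' -> '<'.
_HEXTOCHR = dict((a + b, chr(int(a + b, 16)))
                 for a in _HEXDIG for b in _HEXDIG)


def unquote_linear(s):
    """unquote_linear('abc%20def') -> 'abc def'."""
    parts = s.split('%')
    if len(parts) == 1:  # fast path: nothing to decode
        return s
    out = [parts[0]]
    for item in parts[1:]:
        try:
            decoded = _HEXTOCHR[item[:2]]
        except KeyError:
            # Not a valid escape: keep the '%' and the following text as-is.
            out.append('%')
            out.append(item)
        else:
            out.append(decoded)
            out.append(item[2:])
    # A single join copies each character once, so the total work is linear.
    return ''.join(out)


print(unquote_linear('abc%20def'))
```

Appending to a list is cheap on both CPython and Jython, so this form does not depend on the reference-counting optimization that makes += fast in CPython.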
|
From: Jim B. <jim...@py...> - 2015-12-10 21:43:05
|
Oliver,
That's a good suggestion and likely matches Toby's use case, although it
does return a unicode string, not a str. (But if ascii, just wrap with a
str(...)). Re-running using URLDecoder.decode I get performance comparable
to CPython.
However, escapes %80 through %FF are decoded differently, so it's not a
replacement we can just use in Jython itself:
- With urllib.unquote, we get \x80 through \xFF (high bytes)
- With URLDecoder.decode, we get Unicode Character 'REPLACEMENT
CHARACTER' (U+FFFD), which is rendered as �
- Jim
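The difference for %80 through %FF can be reproduced in plain Python without calling either function. The sketch below takes the single byte that %80 stands for and decodes it two ways: Latin-1 maps each byte to the code point with the same value (standing in for the "high byte" result), while UTF-8 with replacement yields U+FFFD because a lone 0x80 is not a valid UTF-8 sequence. It is an emulation of the two behaviours under that assumption, not a call into urllib or java.net.URLDecoder.

```python
raw = b'\x80'  # the single byte that the escape %80 stands for

# One-to-one byte-to-code-point mapping: the byte value is preserved.
as_high_byte = raw.decode('latin-1')

# 0x80 alone is malformed UTF-8, so the decoder substitutes U+FFFD.
as_utf8 = raw.decode('utf-8', 'replace')

print(repr(as_high_byte))
print(repr(as_utf8))
```

This is why swapping in a UTF-8 based decoder is only safe when the payload is known to be valid UTF-8 (or plain ASCII).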
|
|
From: Toby C. <Tob...@cr...> - 2015-12-11 01:50:25
|
Thanks for the quick follow-up. My apologies for not including the version:
I am working with 2.7.0, and had also tried 2.7.1 beta 2. The Java build is
openjdk version "1.8.0_45-internal".
I am using Django in the background, so I cannot directly modify the call to
decode; however, the following, inspired by Oliver, has solved my immediate
issue:
import urlparse
import java.net.URLDecoder
urlparse.unquote = lambda s: java.net.URLDecoder.decode(s, "UTF-8")
I look forward to 2.7.1 being available next year; it looks like some great
work has been going into it.
Regards,
Toby
|
|
From: Philip C. <phi...@or...> - 2015-12-15 17:52:46
|
Are there instructions for running Jython in IntelliJ?
thanks
phil
|
|
From: <fwi...@gm...> - 2015-12-15 19:57:51
|
On Tue, Dec 15, 2015 at 9:52 AM, Philip Cannata <phi...@or...> wrote:
> Are there instructions for running jython in IntelliJ?
The Python plugin has great Jython support:
https://plugins.jetbrains.com/plugin/?idea&pluginId=631
-Frank
|
|
From: Darjus L. <da...@gm...> - 2015-12-16 00:10:06
|
+1 Awesome product.
|
|
From: Philip C. <phi...@or...> - 2015-12-16 15:58:34
|
Thanks for the reply. However, I'd like to know how to load my own version
of Jython that I've built from source into IntelliJ and/or PyCharm.
Phil
|
|
From: Paul E. <pau...@me...> - 2015-12-16 16:05:08
|
Hi Phil. As disclosure, I’m the PyCharm Developer Evangelist.
It’s pretty easy. In PyCharm, under “Project Interpreters”, you add a new
interpreter and just point to your own jython executable.
- Paul
|
|
From: Philip C. <phi...@or...> - 2015-12-16 16:30:16
|
Yes, that worked and it was easy, nice! I'm teaching an "Elements of
Programming Languages" course at the University of Texas and I'm planning to
use this or, more likely, the PyCharm plugin for IntelliJ.
Thanks for the quick, helpful response.
Phil
|
|
From: Philip C. <phi...@or...> - 2015-12-16 17:17:19
|
I got PyCharm working in IntelliJ with my version of Jython, very nice!
Thanks
phil
|