I've created a pull request at https://bitbucket.org/jython/jython/pull-request/27/make-sre_state-cache-the-code-points-for-a/diff that implements caching of the code points, with the cache's properties configurable via the registry.


On 9 March 2014 08:56, Indra Talip <indra.talip@gmail.com> wrote:
Running the code through the NetBeans profiler again shows that the execution time now appears to be dominated by Jython internals rather than the SRE_STATE/PyUnicode as before.

The top 12 results from the profiling session are shown below, with PyTableCode.call, PyType$MethodCache.lookup_where and PyObject.object___findattr__ together taking ~16% of the total execution time.

"Hot Spots - Method","Self Time [%]","Self Time","Total Time","Invocations"
"org.python.core.PyTableCode.call(org.python.core.ThreadState, org.python.core.PyFrame, org.python.core.PyObject)","9.314184","4736.226 ms","46921.135 ms","267700"
"org.python.core.PyType$MethodCache.lookup_where(org.python.core.PyType, String, org.python.core.PyObject[])","4.092952","2081.25 ms","3450.315 ms","1702296"
"org.python.core.PyObject.object___findattr__(String)","2.7491088","1397.911 ms","6655.189 ms","976144"
"org.python.core.PyType.lookup(String)","1.7592899","894.592 ms","5180.861 ms","1693183"
"org.python.core.PyType.lookup_where(String, org.python.core.PyObject[])","1.6819717","855.276 ms","4306.611 ms","1702296"
"org.python.core.PyObject.__getattr__(String)","1.5009181","763.211 ms","9284.117 ms","1228640"
"org.python.core.PyStringMap.__finditem__(String)","1.4666681","745.795 ms","745.795 ms","2114545"
"org.python.core.PySequence.<init>(org.python.core.PyType)","1.3575838","690.326 ms","1522.962 ms","840393"
"org.python.core.PyFrame.getlocal(int)","1.3472611","685.077 ms","685.077 ms","3150368"
"org.python.modules.sre.SRE_STATE.SRE_MATCH(int[], int, int)","1.2096357","615.095 ms","1305.284 ms","401248"
"org.python.core.PyObject.<init>(org.python.core.PyType)","1.1715233","595.715 ms","595.715 ms","2388936"
"org.python.core.PyMethodDescr.method_descriptor___get__(org.python.core.PyObject, org.python.core.PyObject)","1.0598332","538.921 ms","2489.703 ms","495366"

At this stage it doesn't look like further optimisations to SRE_STATE/indexed ops would buy much improvement in the overall execution time.


On 8 March 2014 21:21, Indra Talip <indra.talip@gmail.com> wrote:
Thanks for the feedback. The attached patch moves the memoization to SRE_STATE using a guava com.google.common.cache.Cache with WeakReference keys.
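To illustrate the shape of the patch: the real code uses guava's com.google.common.cache.Cache built with weakKeys(), but the same idea can be sketched with only the JDK, using a WeakHashMap (weakly held keys, like guava's weakKeys()) whose values sit behind SoftReferences. Class and method names here (CodePointCache, toCodePoints) are illustrative, not Jython's actual API:

```java
import java.lang.ref.SoftReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class CodePointCache {
    // Keys are weakly held, so a cached int[] becomes collectable once its
    // string is gone; values are SoftReferences, so the GC may also reclaim
    // them under memory pressure. (guava's weakKeys() compares keys by
    // identity, whereas WeakHashMap uses equals() -- close enough for a sketch.)
    private static final Map<String, SoftReference<int[]>> CACHE =
            Collections.synchronizedMap(new WeakHashMap<String, SoftReference<int[]>>());

    public static int[] toCodePoints(String s) {
        SoftReference<int[]> ref = CACHE.get(s);
        int[] points = (ref == null) ? null : ref.get();
        if (points == null) {
            points = new int[s.codePointCount(0, s.length())];
            for (int i = 0, off = 0; i < points.length; i++) {
                points[i] = s.codePointAt(off);
                off += Character.charCount(points[i]);
            }
            CACHE.put(s, new SoftReference<int[]>(points));
        }
        return points;
    }
}
```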

It appears to have similar performance characteristics for the simple test case.
> JYTHONPATH=beautifulsoup4-4.3.2/ dist/bin/jython beautifulsoup.py 

If that looks alright I'll push it up to the bug tracker. One question I have is whether to make the concurrencyLevel configurable and, if so, what's the best way to do so? Is there a standard mechanism for looking up this sort of configuration item in Jython?

Regarding the performance of getting code points from the underlying UTF-16 string it looks like that will be built into Java 8 (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8012665).
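Assuming that enhancement lands as proposed (a CharSequence.codePoints() stream), the conversion becomes a one-liner on Java 8; the class name here is just for illustration:

```java
public class CodePoints8 {
    public static int[] toCodePoints(String s) {
        // Java 8: codePoints() streams code points straight off the
        // underlying UTF-16 chars, pairing surrogates as it goes.
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) is one code point but two UTF-16 chars.
        String s = "a\uD834\uDD1Eb";
        System.out.println(s.length());             // 4 chars
        System.out.println(toCodePoints(s).length); // 3 code points
    }
}
```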


On 8 March 2014 17:26, Jim Baker <jbaker@zyasoft.com> wrote:

Thanks for your analysis and patch! It's quite interesting; just today I was discussing this very issue with a colleague as a known performance problem in Jython. Incidentally, Nicholas Riley also mentioned in a comment on that thread (http://sourceforge.net/p/jython/mailman/message/23906076/) that in his work on Joni he saw the same overhead of converting to code points (from the underlying UTF-16), so it looks like we just didn't go far enough in looking at this issue nearly 5 years ago.

Memoization is certainly the right approach with how SRE is currently implemented. Rather than doing it in PyUnicode, I would think it makes sense to apply the memoization in SRE, at the cost of somewhat greater complexity.

One more thing to consider: toCodePoints is really only necessary for unicode objects that contain code points outside the Basic Multilingual Plane, so we could skip the conversion entirely for BMP-only strings. Even more radically, we could probably avoid it in general if we moved away from indexed ops in SRE and just used a next/prev approach.
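The BMP-only check is cheap: if the char count equals the code point count, there are no surrogate pairs, so char indices and code point indices coincide and no int[] is needed. A minimal sketch (BmpCheck/isBmpOnly are hypothetical names):

```java
public class BmpCheck {
    // String.codePointCount() counts each surrogate pair as one code point,
    // so equality with length() means the string is entirely within the BMP.
    public static boolean isBmpOnly(String s) {
        return s.length() == s.codePointCount(0, s.length());
    }
}
```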

Thanks again, this is great stuff!

- Jim

On Fri, Mar 7, 2014 at 9:56 PM, Indra Talip <indra.talip@gmail.com> wrote:
While looking into using BeautifulSoup with Jython I found that it is exceedingly slow. I came across the following thread from Nov 2009, http://sourceforge.net/p/jython/mailman/message/23893421/, which documented the same speed issue.

Poking at it with the NetBeans profiler showed that most of the time was spent in PyUnicode#toCodePoints, or more specifically in the calls to SubsequenceIteratorImpl#next(), because the HTMLParser/Jython SRE engine passes the entire file as a string multiple times (for different regexes) to the SRE_STATE constructor, which in turn calls PyUnicode#toCodePoints.

The attached patch takes a brute-force approach of memoizing the result of calling PyUnicode#toCodePoints(). This brings the run-time of parsing http://www.fixprotocol.org/specifications/fields/5000-5999 down from 500+ secs to ~6 secs. I've wrapped the returned code-point int[] in a SoftReference to allow it to be garbage collected, although this adds about 1 second of overhead.
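The memoization described above can be sketched as a per-instance cache behind a SoftReference; repeated SRE_STATE constructions reuse the array while the GC remains free to reclaim it under memory pressure. Names here (CachedUnicode, toCodePoints) are illustrative, not Jython's actual PyUnicode code:

```java
import java.lang.ref.SoftReference;

public class CachedUnicode {
    private final String value;
    // SoftReference lets the GC drop the cached array before an OOM,
    // trading a recomputation for memory headroom.
    private SoftReference<int[]> codePointsRef;

    public CachedUnicode(String value) {
        this.value = value;
    }

    public synchronized int[] toCodePoints() {
        int[] points = (codePointsRef == null) ? null : codePointsRef.get();
        if (points == null) {
            points = new int[value.codePointCount(0, value.length())];
            for (int i = 0, off = 0; i < points.length; i++) {
                points[i] = value.codePointAt(off);
                off += Character.charCount(points[i]);
            }
            codePointsRef = new SoftReference<int[]>(points);
        }
        return points;
    }
}
```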

The code used is as per the previous thread.

> cat beautifulsoup.py 
import time
from bs4 import BeautifulSoup
data = open("fix-5000-5999.html").read()
start = time.time()
soup = BeautifulSoup(data)
print time.time() - start

> JYTHONPATH=beautifulsoup4-4.3.2/ dist/bin/jython beautifulsoup.py 

Is this approach worthwhile? Are there any other approaches that I could take to try speeding Jython up while using BeautifulSoup?


Indra Talip

Jython-users mailing list
