Indra,

Thanks for working on this!

Memoization is the right approach for Jython's regex engine (SRE) as it is currently implemented using array indexed ops. Certain usages are clearly O(n^2) instead of O(n), which the memoization mitigates. My only concern is that the cache could potentially hold references to large objects. But this can be readily resolved: Guava's Cache mechanism supports an arbitrary weighing mechanism and a maximum weight for a cache. So we just need to assign each entry a weight equal to its length in codepoints, and we can constrain the cache to, say, on the order of 1MB, or whatever the user wants. Other issues like concurrency level are best left to the user, as you have done with this property.
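To make the weight idea concrete, here is a minimal Python sketch of a weight-bounded, LRU-evicting memoization cache; in the actual patch this would be Guava's CacheBuilder with maximumWeight plus a Weigher returning the code-point length. The class and method names below are mine, purely illustrative:

```python
from collections import OrderedDict

class CodePointCache:
    # Illustrative sketch only: each entry's weight is the length of its
    # code-point list, and least-recently-used entries are evicted once the
    # total weight exceeds max_weight.  Guava's CacheBuilder offers the same
    # behaviour via maximumWeight(...) plus a Weigher.
    def __init__(self, max_weight):
        self.max_weight = max_weight
        self.weight = 0
        self.entries = OrderedDict()  # string -> list of code points

    def get(self, key, compute):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = compute(key)
        self.entries[key] = value
        self.weight += len(value)
        # Evict from the LRU end until we are back under the weight cap.
        while self.weight > self.max_weight and len(self.entries) > 1:
            _, evicted = self.entries.popitem(last=False)
            self.weight -= len(evicted)
        return value
```

With the cap expressed in codepoints, a 1MB budget would correspond to a maximum weight of roughly 262,144 codepoints' worth of 4-byte ints.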

Longer term, it would be best if we didn't need to cache these codepoints at all. (Jython's runtime does not cache user data, although we certainly do cache metadata, especially around types.) I did take a look at JRuby's Joni project, which is their equivalent to what we do. In particular, let's look specifically at its regex bytecode VM (https://github.com/jruby/joni/blob/master/src/org/joni/ByteCodeMachine.java). It's actually quite close to SRE in design, but it implements the next/prev approach I suggested earlier (called step and stepBack, see https://github.com/jruby/jcodings/blob/master/src/org/jcodings/Encoding.java#L320). The other interesting thing Joni does is break up its methods so they are more inlineable and therefore more likely to run fast on the JVM. This is especially apparent when comparing the use of switch statements in Joni and SRE.
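To illustrate the next/prev idea, here is a rough Python sketch of stepping forward and backward over UTF-16 code units one code point at a time, the way step/stepBack avoids materializing a code-point array up front. Function names here are mine, not Joni's:

```python
def next_codepoint(units, i):
    # Decode one code point from UTF-16 code units starting at index i,
    # combining a surrogate pair when present; returns (codepoint, next_i).
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units):
        lo = units[i + 1]
        if 0xDC00 <= lo <= 0xDFFF:
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00), i + 2
    return u, i + 1

def prev_codepoint(units, i):
    # Step backward one code point, returning (codepoint, prev_i).
    u = units[i - 1]
    if 0xDC00 <= u <= 0xDFFF and i >= 2:
        hi = units[i - 2]
        if 0xD800 <= hi <= 0xDBFF:
            return 0x10000 + ((hi - 0xD800) << 10) + (u - 0xDC00), i - 2
    return u, i - 1

# "a<U+1F600>b" as UTF-16 code units: 'a', a surrogate pair, 'b'.
raw = "a\U0001F600b".encode("utf-16-be")
units = [int.from_bytes(raw[j:j + 2], "big") for j in range(0, len(raw), 2)]
```

In SRE this would replace `str[i]` indexing into the cached int[] with a cursor that advances by one or two UTF-16 code units.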

What does this mean for the pull request? Assuming weight support, it looks good and is certainly fine for 2.7.0. But ideally we can find some time to work on SRE. For the interested developer, this would be straightforward work that would introduce what it means to write a fast virtual machine on the JVM (and who doesn't like that idea, VMs on VMs?), along with issues of UTF-16 encoding for unicode vs just using bytes (Python allows regexes on both), while still being quite self-contained.

- Jim


On Tue, Mar 11, 2014 at 9:02 PM, Indra Talip <indra.talip@gmail.com> wrote:
Jim,
I've created a pull request at https://bitbucket.org/jython/jython/pull-request/27/make-sre_state-cache-the-code-points-for-a/diff that implements caching of the code points and where the properties of the cache are configurable via the registry.

Cheers
Indra


On 9 March 2014 08:56, Indra Talip <indra.talip@gmail.com> wrote:
Jim,
Running the code through the NetBeans profiler again shows that the execution time now appears to be dominated by Jython internals rather than the SRE_STATE/PyUnicode as before.

The top 12 results from the profiling session are shown below, with PyTableCode.call, PyType$MethodCache.lookup_where and PyObject.object___findattr__ taking ~16% of the total execution time.

"Hot Spots - Method","Self Time [%]","Self Time","Total Time","Invocations"
"org.python.core.PyTableCode.call(org.python.core.ThreadState, org.python.core.PyFrame, org.python.core.PyObject)","9.314184","4736.226 ms","46921.135 ms","267700"
"org.python.core.PyType$MethodCache.lookup_where(org.python.core.PyType, String, org.python.core.PyObject[])","4.092952","2081.25 ms","3450.315 ms","1702296"
"org.python.core.PyObject.object___findattr__(String)","2.7491088","1397.911 ms","6655.189 ms","976144"
"org.python.core.PyType.lookup(String)","1.7592899","894.592 ms","5180.861 ms","1693183"
"org.python.core.PyType.lookup_where(String, org.python.core.PyObject[])","1.6819717","855.276 ms","4306.611 ms","1702296"
"org.python.core.PyObject.__getattr__(String)","1.5009181","763.211 ms","9284.117 ms","1228640"
"org.python.core.PyStringMap.__finditem__(String)","1.4666681","745.795 ms","745.795 ms","2114545"
"org.python.core.PySequence.<init>(org.python.core.PyType)","1.3575838","690.326 ms","1522.962 ms","840393"
"org.python.core.PyFrame.getlocal(int)","1.3472611","685.077 ms","685.077 ms","3150368"
"org.python.modules.sre.SRE_STATE.SRE_MATCH(int[], int, int)","1.2096357","615.095 ms","1305.284 ms","401248"
"org.python.core.PyObject.<init>(org.python.core.PyType)","1.1715233","595.715 ms","595.715 ms","2388936"
"org.python.core.PyMethodDescr.method_descriptor___get__(org.python.core.PyObject, org.python.core.PyObject)","1.0598332","538.921 ms","2489.703 ms","495366"

At this stage it doesn't look like further optimisations to the SRE_STATE/indexed ops would buy much improvement in the overall execution time.

Cheers
Indra



On 8 March 2014 21:21, Indra Talip <indra.talip@gmail.com> wrote:
Jim,
Thanks for the feedback. The attached patch moves the memoization to SRE_STATE using a guava com.google.common.cache.Cache with WeakReference keys.

It appears to have similar performance characteristics for the simple test case.
> JYTHONPATH=beautifulsoup4-4.3.2/ dist/bin/jython beautifulsoup.py 
6.15499997139

If that looks alright I'll push it up to the bug tracker. One question I have is whether to make the concurrencyLevel configurable, and if so, what's the best way to do so? Is there a standard mechanism for looking up these sorts of configuration items in Jython?

Regarding the performance of getting code points from the underlying UTF-16 string it looks like that will be built into Java 8 (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8012665).

Cheers
Indra


On 8 March 2014 17:26, Jim Baker <jbaker@zyasoft.com> wrote:
Indra,

Thanks for your analysis and patch! It's quite interesting: I was discussing this very issue just today with a colleague as a known performance problem in Jython. Incidentally, Nicholas Riley also mentioned in a comment on that thread (http://sourceforge.net/p/jython/mailman/message/23906076/) that in his work on Joni he saw the same overhead of converting to codepoints (from the underlying UTF-16), so it looks like we just didn't go far enough when we looked at this issue nearly 5 years ago.

Memoization is certainly the right approach with how SRE is currently implemented. Instead of doing it in PyUnicode, I would think it makes sense to apply the memoization in SRE, at the cost of somewhat greater complexity.

One more thing to consider: toCodePoints is really only necessary for unicode objects that contain codepoints outside the basic plane, so we could avoid this construction for pure-BMP strings. Even more radically, we could probably avoid it in general if we moved away from indexed ops in SRE and just used a next/prev approach.
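As a sketch of that shortcut (in Python, though the real check would live in PyUnicode on the Java side), deciding whether a code-point array is needed is just a scan for astral characters:

```python
def needs_codepoint_array(s):
    # Only strings containing characters outside the Basic Multilingual
    # Plane need a separate code-point array; for pure-BMP strings every
    # code point is a single UTF-16 code unit, so the string can be
    # indexed directly.
    return any(ord(ch) > 0xFFFF for ch in s)
```

The scan itself is linear, but the one-bit answer could be computed once per string and stored, rather than rebuilding an int[] on every regex pass.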

Thanks again, this is great stuff!

- Jim



On Fri, Mar 7, 2014 at 9:56 PM, Indra Talip <indra.talip@gmail.com> wrote:
G'day,
While looking into using BeautifulSoup with Jython I found that it is exceedingly slow. I came across the following thread from Nov 2009 http://sourceforge.net/p/jython/mailman/message/23893421/ which also documented the issue with speed.

Poking at it with the NetBeans profiler showed that most of the time was spent in PyUnicode#toCodePoints, or more specifically in the calls to SubsequenceIteratorImpl#next(), because the HTMLParser/Jython SRE engine passes the entire file as a string multiple times (for different regexes) to the SRE_STATE constructor, which in turn calls PyUnicode#toCodePoints. The attached patch takes the brute-force approach of memoizing the result of calling PyUnicode#toCodePoints(). This brings the run-time of parsing http://www.fixprotocol.org/specifications/fields/5000-5999 down from 500+ seconds to ~6 seconds. I've wrapped the returned codepoints int[] in a SoftReference to allow it to be garbage collected, although this adds about 1 second of overhead.
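The memoization itself is simple; in Python terms the patch amounts to something like the sketch below. (The real patch does this on PyUnicode in Java and wraps the int[] in a SoftReference so the GC can reclaim it under memory pressure; Python has no direct equivalent of a soft reference, so this sketch just caches unconditionally.)

```python
_codepoint_cache = {}

def to_codepoints(s):
    # Compute the code-point list once per string and reuse it on later
    # regex passes; HTMLParser feeds the same large string through many
    # different regexes, so the second and later passes become cache hits.
    cps = _codepoint_cache.get(s)
    if cps is None:
        cps = [ord(ch) for ch in s]
        _codepoint_cache[s] = cps
    return cps
```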

The code used is as per the previous thread.

> cat beautifulsoup.py 
import time
from bs4 import BeautifulSoup
data = open("fix-5000-5999.html").read()
start = time.time()
soup = BeautifulSoup(data)
print time.time() - start

> JYTHONPATH=beautifulsoup4-4.3.2/ dist/bin/jython beautifulsoup.py 

Is this approach worthwhile? Are there any other approaches that I could take to try speeding Jython up while using BeautifulSoup?

Cheers
Indra

--
Indra Talip

_______________________________________________
Jython-users mailing list
Jython-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jython-users




