[Jython-dev] PyType publication with better concurrency

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Jim, all:

Apologies for the near repeat, but my first attempt to give this a life 
of its own on Jython-dev ended up nested under issue 2487 again. Hope 
this lands as a fresh topic.

On 05/08/2017 20:34, in Issue #2387 (http://bugs.jython.org/issue2487), 
Jim Baker wrote:
> ...
>
> Next, let's look at our caching, and its corresponding
> performance. Using weak keys/weak values in class_to_type means that
> when working with Java objects (calling methods, accessing
> attributes), Python code that does not maintain a strong reference to
> the Java class will cause the Jython runtime to constantly rebuild
> these entries in class_to_type. Here are the ways that such strong
> references would be established:
>
> 1. exposedTypes (so Jython's internals)
> https://hg.python.org/jython/file/1a14d360dc8d/src/org/python/core/PyType.java#l104 
>
>
> 2. imports of Java classes into a Python namespace
>
> 3. references to `type(java_obj)`
>
> 4. subclassing of a Java class by a Python class
>
> References to Java *objects*, including objects from factory methods;
> indirect construction, such as `java.util.HashMap()`; and callbacks
> (Netty, re/json/strptime) would not introduce such references to
> Python types for Java *classes*. But calling methods, using
> attributes, or otherwise using getting the Python type on the Java
> object, would require this entry in class_to_type to be added,
> potentially to be potentially quickly discarded by generational
> GC. Computing these entries is relatively expensive, especially if a
> given class C has inner classes (and this is a general type graph
> expansion).
>
> See also the analysis in https://bugs.openjdk.java.net/browse/JDK-6389107
> (scenario 3); our mapping in class_to_type is a common pattern seen in
> Java systems.
>
> What if instead of current one-level cache, we had a two-level cache?
> Level 1 uses weak key/weak value for the class/type entries (C, T);
> (C, T) entries that are expired from Level 1 are moved to Level 2;
> Level 2 uses an expiration time so a ClassLoader could be unloaded in
> say some reasonable amount of time.
>
> To implement:
>
> 1. Level 1 cache is a direct replacment of class_to_type with
>     CacheBuilder weak keys/weak values/removal listener. Also we will
>     not attempt to expose it as a Map going forward; adjust
>     class_to_type usage accordingly.
>
> 2. Level 2 cache is weak keys/strong values/expires after write. We
>     can tune this expiration as was done with the regex cache and the
>     python.sre.cachespec option,
> https://github.com/jythontools/jython/blob/master/src/org/python/modules/sre/SRE_STATE.java#L1253). 
>
>
> 3. For removalListener: (C, T) entries that are removed from Level 1
>     will use removalListener to get placed in Level 2 (safe because
>     Level 1 is concurrent; and obviously hard refs would prevent it
>     from being removed while under construction).
>
> 4. Try both caches when looking up the type T for class C. Because
>     synchronized, there's no visible inconsistency; if no entry for C
>     because it was removed, simply recompute T'. There is a small
>     window of time that it could be in the process of being moved to
>     Level 2, but this does not affect correctness; (C, T) will be
>     eventually expired from Level 2 and (C, T') is a valid repllacement
>     for it (because there can be no hard refs to T outside the level 2
>     cache itself).
>
> Lastly, 
> https://docs.oracle.com/javase/7/docs/api/java/lang/ClassValue.html
> looks potentially useful as an alternative for publishing PyType
> objects into a cache. But we need to address two key questions:
>
> 1. Whether ClassValue#computeValue would work properly in the
>     re-entrant case where the class graph from a given class C is
>     explored, as is done for inners, possibly referring back to C. This
>     is why we could not get the publication of the type in the map done
>     after the init (as one would usually want to do!) in
> https://github.com/jythontools/jython/blob/master/src/org/python/core/PyType.java#L1515 
>
>     (putIfAbsent); class C would be referenced, and it would attempt to
>     build again (and stack overflow). No solution I tried could break
>     this problem.
>
> 2. When to call ClassValue#removeValue ! We still have to keep track
>     of the cache with respect to PyType.

I have an idea that I can use ClassValue with this, but it sits as a 
(lock-free) cache in front of the same (properly locked) apparatus we 
have now.

Clients see a static PyType.forClass() that is not synchronised and 
simply calls the (ClassValue) pyType.get(c). A miss results in 
pyType.computeValue(c), which uses the existing synchronised apparatus. 
When we wrestle with types internally to the type registry, we call an 
internal Registry.forClass(), and other methods, that are essentially 
the ones we're familiar with. We don't use the ClassValue pyType for 
look-up internally because that would expose incomplete types to other 
threads. In the first instance, this means I can leave the existing code 
essentially unmodified, apart from turning it into a private delegate.

The limitation of ClassValue is that it doesn't let you supply 
information in the look-up call that can be passed to computeValue: it 
has to be possible using only the class c itself, and information 
provided beforehand. In our case, the extra argument is just the boolean 
hardRef that is supposed to decide whether the PyType created ends up in 
exposedTypes.

I think I can deal with it, awkwardly, but that API seems questionable: 
many places the first call is PyType.forClass(this, false) and a later 
(default) call has hardRef=true, but comes too late. The creation of 
'object' and 'type' are notable examples of core types missing from 
exposedTypes. In other places, an explicit false or implicit true is 
given when the thought may only be to look something up, and the 
possibility it will be created at that point may not have been 
considered. I wonder if we could decide, either by looking at the class 
(name) or through a list specified in advance, that certain classes 
should be strongly referenced?

I think I understand the intent of the two-level cache, and trying 
ClassValue doesn't invalidate it. It might do part of the work for us. 
In particular, a ClassValue does not prevent unloading of the class. 
(It's a weak-keys cache.) It keeps its value alive until the 
corresponding class is unloaded (unless the value is explicitly a weak 
reference).

Thing is, I don't understand in what circumstances we ought to create 
the strong reference: what need are we actually seeking to meet? Is the 
critical use case that the PyType could be freed when Python code stops 
using it, and yet the Java class has reason to remain?

Jeff