[Jython-dev] PyType.fromClass publication with more concurrency

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Moving this onto Jython-Dev for a wider readership. Rest of my message 
bottom-posted because you want to read Jim's first.

On 05/08/2017 20:34, in Issue #2387 (http://bugs.jython.org/issue2487), 
Jim Baker wrote:
> ...
>
> Next, let's look at our caching, and its corresponding
> performance. Using weak keys/weak values in class_to_type means that
> when working with Java objects (calling methods, accessing
> attributes), Python code that does not maintain a strong reference to
> the Java class will cause the Jython runtime to constantly rebuild
> these entries in class_to_type. Here are the ways that such strong
> references would be established:
>
> 1. exposedTypes (so Jython's internals)
> https://hg.python.org/jython/file/1a14d360dc8d/src/org/python/core/PyType.java#l104
>
> 2. imports of Java classes into a Python namespace
>
> 3. references to `type(java_obj)`
>
> 4. subclassing of a Java class by a Python class
>
> References to Java *objects*, including objects from factory methods;
> indirect construction, such as `java.util.HashMap()`; and callbacks
> (Netty, re/json/strptime) would not introduce such references to
> Python types for Java *classes*. But calling methods, using
> attributes, or otherwise using getting the Python type on the Java
> object, would require this entry in class_to_type to be added,
> potentially to be potentially quickly discarded by generational
> GC. Computing these entries is relatively expensive, especially if a
> given class C has inner classes (and this is a general type graph
> expansion).
>
> See also the analysis in https://bugs.openjdk.java.net/browse/JDK-6389107
> (scenario 3); our mapping in class_to_type is a common pattern seen in
> Java systems.
>
> What if instead of current one-level cache, we had a two-level cache?
> Level 1 uses weak key/weak value for the class/type entries (C, T);
> (C, T) entries that are expired from Level 1 are moved to Level 2;
> Level 2 uses an expiration time so a ClassLoader could be unloaded in
> say some reasonable amount of time.
>
> To implement:
>
> 1. Level 1 cache is a direct replacment of class_to_type with
>     CacheBuilder weak keys/weak values/removal listener. Also we will
>     not attempt to expose it as a Map going forward; adjust
>     class_to_type usage accordingly.
>
> 2. Level 2 cache is weak keys/strong values/expires after write. We
>     can tune this expiration as was done with the regex cache and the
>     python.sre.cachespec option,
>     https://github.com/jythontools/jython/blob/master/src/org/python/modules/sre/SRE_STATE.java#L1253).
>
> 3. For removalListener: (C, T) entries that are removed from Level 1
>     will use removalListener to get placed in Level 2 (safe because
>     Level 1 is concurrent; and obviously hard refs would prevent it
>     from being removed while under construction).
>
> 4. Try both caches when looking up the type T for class C. Because
>     synchronized, there's no visible inconsistency; if no entry for C
>     because it was removed, simply recompute T'. There is a small
>     window of time that it could be in the process of being moved to
>     Level 2, but this does not affect correctness; (C, T) will be
>     eventually expired from Level 2 and (C, T') is a valid repllacement
>     for it (because there can be no hard refs to T outside the level 2
>     cache itself).
>
> Lastly, https://docs.oracle.com/javase/7/docs/api/java/lang/ClassValue.html
> looks potentially useful as an alternative for publishing PyType
> objects into a cache. But we need to address two key questions:
>
> 1. Whether ClassValue#computeValue would work properly in the
>     re-entrant case where the class graph from a given class C is
>     explored, as is done for inners, possibly referring back to C. This
>     is why we could not get the publication of the type in the map done
>     after the init (as one would usually want to do!) in
>     https://github.com/jythontools/jython/blob/master/src/org/python/core/PyType.java#L1515
>     (putIfAbsent); class C would be referenced, and it would attempt to
>     build again (and stack overflow). No solution I tried could break
>     this problem.
>
> 2. When to call ClassValue#removeValue ! We still have to keep track
>     of the cache with respect to PyType.
>
Jim:

I've read this a couple of times now and understand the intent, if not 
the detail.

I'd like try ClassValue as a way to improve concurrency.

I see what you mean about the needsInners list. I've watched that work 
and I'm still a bit puzzled what it does. It stores PyJavaType objects 
whose initialisation we couldn't complete because they depend on other 
PyType objects existing first? I write "existing" here because in the 
creation of T for some class C, it must be acceptable to have  return a 
PyType the initialisation of which will be completed later, and only 
unacceptable to leave the monitor with T (or any PyType) incomplete. At 
any rate, that's what we do now, and I have half an idea how to 
reproduce it.

The other thing I'm not sure about is when we would like a PyType to be 
collected. A ClassValue will be freed when the Class C is freed. And the 
fact that the value (PyType T) hard-refers to C will not prevent this. 
If something else hard-refers to T, then C won't be unloaded. I dare say 
JVMs try to avoid premature unloading of C, from which T benefits. The 
difficult case may be where C continues in use, but we'll never want T 
again. T would be elegible for collection except it is in the ClassValue 
for C. Is this a significant case?

If it's common enough, then I think we need 
ClassValue<WeakReference<PyType>>, and if you want to defer that 
collection, a ReferenceQueue (not a thing I've used before) leading to a 
sort of recycling bin, your second cache. I'm pretty sure that could be 
thread safe, although it needs a secondary garbage collection thread to 
empty it. I'll try without this death-row cache, first.

Jeff