From: Jeff A. <ja...@fa...> - 2017-08-07 22:08:52
|
Moving this onto Jython-Dev for a wider readership. Rest of my message bottom-posted because you want to read Jim's first. On 05/08/2017 20:34, in Issue #2387 (http://bugs.jython.org/issue2487), Jim Baker wrote: > ... > > Next, let's look at our caching, and its corresponding > performance. Using weak keys/weak values in class_to_type means that > when working with Java objects (calling methods, accessing > attributes), Python code that does not maintain a strong reference to > the Java class will cause the Jython runtime to constantly rebuild > these entries in class_to_type. Here are the ways that such strong > references would be established: > > 1. exposedTypes (so Jython's internals) > https://hg.python.org/jython/file/1a14d360dc8d/src/org/python/core/PyType.java#l104 > > 2. imports of Java classes into a Python namespace > > 3. references to `type(java_obj)` > > 4. subclassing of a Java class by a Python class > > References to Java *objects*, including objects from factory methods; > indirect construction, such as `java.util.HashMap()`; and callbacks > (Netty, re/json/strptime) would not introduce such references to > Python types for Java *classes*. But calling methods, using > attributes, or otherwise using getting the Python type on the Java > object, would require this entry in class_to_type to be added, > potentially to be potentially quickly discarded by generational > GC. Computing these entries is relatively expensive, especially if a > given class C has inner classes (and this is a general type graph > expansion). > > See also the analysis in https://bugs.openjdk.java.net/browse/JDK-6389107 > (scenario 3); our mapping in class_to_type is a common pattern seen in > Java systems. > > What if instead of current one-level cache, we had a two-level cache? > Level 1 uses weak key/weak value for the class/type entries (C, T); > (C, T) entries that are expired from Level 1 are moved to Level 2; > Level 2 uses an expiration time so a ClassLoader could be unloaded in > say some reasonable amount of time. > > To implement: > > 1. Level 1 cache is a direct replacment of class_to_type with > CacheBuilder weak keys/weak values/removal listener. Also we will > not attempt to expose it as a Map going forward; adjust > class_to_type usage accordingly. > > 2. Level 2 cache is weak keys/strong values/expires after write. We > can tune this expiration as was done with the regex cache and the > python.sre.cachespec option, > https://github.com/jythontools/jython/blob/master/src/org/python/modules/sre/SRE_STATE.java#L1253). > > 3. For removalListener: (C, T) entries that are removed from Level 1 > will use removalListener to get placed in Level 2 (safe because > Level 1 is concurrent; and obviously hard refs would prevent it > from being removed while under construction). > > 4. Try both caches when looking up the type T for class C. Because > synchronized, there's no visible inconsistency; if no entry for C > because it was removed, simply recompute T'. There is a small > window of time that it could be in the process of being moved to > Level 2, but this does not affect correctness; (C, T) will be > eventually expired from Level 2 and (C, T') is a valid repllacement > for it (because there can be no hard refs to T outside the level 2 > cache itself). > > Lastly, https://docs.oracle.com/javase/7/docs/api/java/lang/ClassValue.html > looks potentially useful as an alternative for publishing PyType > objects into a cache. But we need to address two key questions: > > 1. Whether ClassValue#computeValue would work properly in the > re-entrant case where the class graph from a given class C is > explored, as is done for inners, possibly referring back to C. This > is why we could not get the publication of the type in the map done > after the init (as one would usually want to do!) in > https://github.com/jythontools/jython/blob/master/src/org/python/core/PyType.java#L1515 > (putIfAbsent); class C would be referenced, and it would attempt to > build again (and stack overflow). No solution I tried could break > this problem. > > 2. When to call ClassValue#removeValue ! We still have to keep track > of the cache with respect to PyType. > Jim: I've read this a couple of times now and understand the intent, if not the detail. I'd like try ClassValue as a way to improve concurrency. I see what you mean about the needsInners list. I've watched that work and I'm still a bit puzzled what it does. It stores PyJavaType objects whose initialisation we couldn't complete because they depend on other PyType objects existing first? I write "existing" here because in the creation of T for some class C, it must be acceptable to have return a PyType the initialisation of which will be completed later, and only unacceptable to leave the monitor with T (or any PyType) incomplete. At any rate, that's what we do now, and I have half an idea how to reproduce it. The other thing I'm not sure about is when we would like a PyType to be collected. A ClassValue will be freed when the Class C is freed. And the fact that the value (PyType T) hard-refers to C will not prevent this. If something else hard-refers to T, then C won't be unloaded. I dare say JVMs try to avoid premature unloading of C, from which T benefits. The difficult case may be where C continues in use, but we'll never want T again. T would be elegible for collection except it is in the ClassValue for C. Is this a significant case? If it's common enough, then I think we need ClassValue<WeakReference<PyType>>, and if you want to defer that collection, a ReferenceQueue (not a thing I've used before) leading to a sort of recycling bin, your second cache. I'm pretty sure that could be thread safe, although it needs a secondary garbage collection thread to empty it. I'll try without this death-row cache, first. Jeff |