Hi there,

In the past I have made some mailing list posts and filed bugs against jython w.r.t. its behavior in dynamic classloader environments like Tomcat or OSGi, where classloaders can come and (more importantly) go at runtime. Not having heard any significant feedback from the jython dev community on this topic, I have been digging into the source code to try to find the root cause(s) and potential solutions.

As I have stated previously, one of the biggest problem areas seems to be PyType.class_to_type - this is a static map that is added to but never removed from. It holds class objects both as keys and, indirectly, through some of its values (e.g. a python type that implements a java interface will hold a java proxy object which references that java interface by class). In case it isn't obvious, this is not an OK thing to do in environments like Tomcat or OSGi when reloading classes - in this case the old classloader is removed and a new classloader is created to load the new versions of the classes. The old classloader cannot be garbage collected until all references to its classes are gone, and this cannot happen while they are referenced by a static map like class_to_type. Beyond leaking memory (and consuming permgen space), references to the old classes can lead to buggy behavior, since certain object instances may be operating on a mix of old and new classes.
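
To make the pattern concrete, here is a minimal sketch of the kind of static cache I am describing. This is not jython's actual code: StaticClassCache and LeakyTypeInfo are hypothetical names I made up for illustration, and the real PyType is far more involved. The point is only that both the key and the value hold a strong reference to the Class object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not jython's real PyType code.
final class StaticClassCache {
    // Static and never pruned: every Class used as a key (and therefore its
    // ClassLoader) stays strongly reachable for as long as this class is loaded.
    private static final Map<Class<?>, LeakyTypeInfo> CLASS_TO_TYPE = new HashMap<>();

    static synchronized LeakyTypeInfo fromClass(Class<?> c) {
        // Entries are only ever added here; nothing removes them.
        return CLASS_TO_TYPE.computeIfAbsent(c, LeakyTypeInfo::new);
    }

    static synchronized int size() {
        return CLASS_TO_TYPE.size();
    }

    public static void main(String[] args) {
        LeakyTypeInfo first = fromClass(Runnable.class);
        fromClass(Comparable.class);
        LeakyTypeInfo again = fromClass(Runnable.class); // cache hit, no new entry
        System.out.println(first == again); // prints "true"
    }
}

// Hypothetical stand-in for a PyType-like value.
final class LeakyTypeInfo {
    // The value also references the Class, so replacing the map key with
    // something else would still leave this strong reference in place.
    final Class<?> javaClass;

    LeakyTypeInfo(Class<?> javaClass) {
        this.javaClass = javaClass;
    }
}
```

If the keys were classes loaded by a webapp or bundle classloader instead of JDK interfaces, each entry would keep that classloader alive after an undeploy.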

I have been thinking of a few workarounds/solutions here. Maybe someone on this mailing list can comment on these ideas or suggest other ones:
(1) Make the "key" in the PyType.class_to_type map be a string rather than a class object: since there is no longer a reference to the java class, this would allow the classloader to be garbage collected, but may introduce some issues if the same-named class is loaded by two different classloaders, since they will be indistinguishable in the map. Also, this doesn't fully solve the problem because I have found that some of the "values" in the map also contain references to the java class.

(2) Provide a method to explicitly clear out the class_to_type map: I would be happy to write code that calls this each time a classloader unload/reload occurs. It would be simple to write a function that clears out this map, but I am wondering what the ramifications of this would be (beyond the efficiency hit, which I am willing to take). Can someone familiar with PyType help fill in some details here? While perusing the code I noticed that a lot of classes cache, in static fields, the types returned from PyType.fromClass() (e.g. about 90 instances in the org.python.antlr package) - if we clear out and repopulate this map then there could be cases where a given python type has two PyType instances representing it in the heap. This could be a problem if anyone uses == to compare PyTypes (which does appear to happen in __builtin__.isinstance()) ... am I reading this right? Would this be a problem or not? Would there be any other issues?

(3) Change class_to_type from a static map to a map associated with a specific PythonInterpreter (or PySystemState) object. This is my preferred approach, since I can just make sure I destroy all my interpreters any time classloaders are reloaded. This also seems like the best long-term solution, since static variables cause a *lot* of problems, especially with dynamic classloaders.

(4) Anyone have any other ideas?
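
To illustrate what I mean in (2) and (3), here is a rough sketch under my own assumptions. SketchInterpreter, SketchType and clearTypes() are hypothetical names, not existing jython classes or methods. It shows the identity problem that clearing the map can cause, and what an interpreter-scoped map might look like.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in jython.
class ClearVsScopedDemo {
    public static void main(String[] args) {
        SketchInterpreter interp = new SketchInterpreter();

        // Option (2): a type obtained (and possibly cached in a static field)
        // before the clear...
        SketchType cached = interp.fromClass(Runnable.class);
        interp.clearTypes();
        // ...is not the instance the map hands out afterwards, so any
        // comparison using == between the two will fail.
        SketchType fresh = interp.fromClass(Runnable.class);
        System.out.println(cached == fresh); // prints "false"

        // Option (3): each interpreter owns its own map, so a new interpreter
        // starts empty and a discarded one takes its Class references with it.
        SketchInterpreter other = new SketchInterpreter();
        System.out.println(other.typeCount()); // prints "0"
    }
}

// Hypothetical stand-in for PyType.
final class SketchType {
    final Class<?> javaClass;

    SketchType(Class<?> javaClass) {
        this.javaClass = javaClass;
    }
}

// Hypothetical stand-in for a PythonInterpreter (or PySystemState) that
// holds the class-to-type map as instance state instead of a static.
final class SketchInterpreter {
    private final Map<Class<?>, SketchType> classToType = new HashMap<>();

    SketchType fromClass(Class<?> c) {
        return classToType.computeIfAbsent(c, SketchType::new);
    }

    // Option (2): explicit clear, to be called on classloader unload/reload.
    void clearTypes() {
        classToType.clear();
    }

    int typeCount() {
        return classToType.size();
    }
}
```

With (3), the map only becomes collectable once nothing else (for example a static field elsewhere) still references the interpreter or the types it handed out, which is why the statically cached types I mentioned in (2) matter here too.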


Thanks,
--matt

P.S.: If you are interested in this thread, please also take a look at bug #1522, which I recently commented on. This bug report shows that repeated execution of a jython script that includes a python class that implements a java interface or subclasses a java class will cause PyType.class_to_type to grow without bound, eventually exhausting all permgen space. For some reason a new proxy object is created every time the script is run and added to class_to_type with a unique name (e.g. MyJavaInterface$0, MyJavaInterface$1, etc.). I would really like to see this fixed as well, but I don't know what purpose the unique suffix serves ... maybe someone more knowledgeable about this code can explain the rationale for this and suggest a fix.