Thread: [java-gnome-hackers] The next stage in memory management
From: Andrew C. <an...@op...> - 2010-01-08 06:30:56
Way back in the dark ages, we started the current version of java-gnome. We [especially Vreixo] invested a huge amount of time in getting the memory management correct, specifically the use of ToggleRefs to manage the reference count that a java-gnome proxy has on a GObject, while being able to return the existing proxy for a GObject if there is one, using WeakReferences and the JNI equivalent. Magic no end. Through extensive experience, that work has held up nicely.

If you were to look in the base class Pointer, you'd see a nice little fragment I wrote four or so years ago as follows:

    /**
     * Parent release function. Will be called by the Java garbage collector
     * when it invokes the finalizer, so this is the time to release
     * references and free memory on the C side.
     */
    protected abstract void release();

    /*
     * This is a placeholder to remind us of the cleanup actions that will be
     * necessary, irrespective of the finalizer technique used.
     */
    protected void finalize() {
        release();
    }

Right from the get-go, I knew that we'd have to have our own code path to free resources, and I also knew full well that Java's finalize() mechanism is "unreliable" and that really if you need to clean up after yourself you should be using PhantomReferences (or WeakReferences or SoftReferences, depending).

Now, what most people mean when they talk about "unreliable" is that finalizers are not "guaranteed" to be called; a number of circumstances [/implementations] just terminate the Java VM process without calling the finalizers of every bloody object. So if (say) you're holding on to database client connections, they won't necessarily get cleared / closed / etc. Bad.

In normal usage, though, finalizers *do* indeed get called, and for our purposes they were fine. And so the placeholder above has been doing us.

I've started to run into a circumstance where this is not ideal, however.
I'm doing something that is [org.gnome.pango] Layout & LayoutLine heavy; down in Pango itself these use a lot of resources; ordinarily that's fine because they are transient, but the thing I am working on is generating a *lot* of them, and they aren't getting free()'d until ... well, last time I ran the app I got myself > 100 MB of writable memory before a full GC happened. Yeech.

The problem is that the full Java garbage collector [and I'm talking about OpenJDK HotSpot here] does not ordinarily run until the VM thinks it is out of heap. And that isn't going to happen anytime soon on most systems. Which means that finalize() isn't getting called until *MUCH* later than we'd like, and we're holding on to tons of memory (allocated by glib) unnecessarily.

Telling people just to call System.gc() doesn't really seem to address the weakness.

Which brings us back to the _other_ purpose of SoftReferences, WeakReferences, and PhantomReferences: taking actions on Java objects as they go through the lifecycle, including being able to do cleanup long before finalize() gets called by the GC in its second-last phase. See file:///usr/share/doc/openjdk-6-doc/api/java/lang/ref/package-summary.html#reachability for more details.

Presumably we want another WeakReference, plus a ReferenceQueue. The question is: what will poll the reference queue, and when?

Sometimes people use a separate thread for that. Another possibility would be to use an idle handler set up from the native side. A third option would be to poll the reference queue as a (generated) part of every JNI call.

Thoughts?

AfC
Sydney

P.S. What I *really* want to do is override g_object_ref() and g_object_unref(), but that'd only be possible with an LD_PRELOAD hack. But if they were pluggable, and in conjunction with setting the GLib allocator function [which is pluggable] to use (say) direct buffers, we could really get cool about leak detection on both sides. But that's another project.
-- 
Andrew Frederick Cowie

Operational Dynamics is an operations and engineering consultancy focusing on IT strategy, organizational architecture, systems review, and effective procedures for change management: enabling successful deployment of mission critical information technology in enterprises, worldwide.

http://www.operationaldynamics.com/

Sydney   New York   Toronto   London
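
A minimal sketch of the "WeakReference plus a ReferenceQueue" idea from the message above; it is an illustration under stated assumptions, not java-gnome's actual code. The names `PointerRef` and `ReleaseQueue` are hypothetical, and the `LongConsumer` stands in for whatever native call would really release the C side. The detail it tries to show: by the time a reference comes off the queue its referent is no longer available, so the native address has to be stored in the reference object itself. The non-blocking `drain()` could be invoked from any of the three places suggested (a separate thread, an idle handler, or the prologue of a JNI call).

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongConsumer;

/** Hypothetical weak reference that remembers the native address of its wrapper. */
final class PointerRef extends WeakReference<Object> {
    final long address;

    PointerRef(Object wrapper, long address, ReferenceQueue<Object> queue) {
        super(wrapper, queue);
        this.address = address;
    }
}

/** Hypothetical helper: tracks wrappers and releases their native side once they are unreachable. */
final class ReleaseQueue {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // The reference objects must stay strongly reachable themselves,
    // otherwise they could be collected and would never be enqueued.
    private final Set<PointerRef> live = Collections.synchronizedSet(new HashSet<>());

    // Assumption: stands in for the real native release call.
    private final LongConsumer releaser;

    ReleaseQueue(LongConsumer releaser) {
        this.releaser = releaser;
    }

    /** Start tracking a wrapper object and the native address it owns. */
    PointerRef track(Object wrapper, long address) {
        PointerRef ref = new PointerRef(wrapper, address, queue);
        live.add(ref);
        return ref;
    }

    /** Non-blocking: release everything the collector has enqueued so far. */
    int drain() {
        int count = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            PointerRef ref = (PointerRef) r;
            if (live.remove(ref)) {
                releaser.accept(ref.address);
                count++;
            }
        }
        return count;
    }
}
```

Because `drain()` never blocks, it is cheap to call often; whether the collector enqueues references soon enough to help is exactly the open question in this thread.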
From: Vreixo F. L. <met...@ya...> - 2010-01-12 17:29:34
Hi all,

I hope to take a further look at this asap, but I'd like to give you some impressions about this issue:

> Now, what most people mean when they talk about "unreliable" is that
> they are not "guaranteed" to be called; a number of circumstances
> [/implementations] just terminate the Java VM process without calling
> the finalizers of every bloody object

In any case, a "kill -9 whatever" will terminate the JVM without any cleanup (either in finalize() or by other means) so I doubt "special circumstances" should be handled at all by java-gnome.

> So if (say) you're holding on to
> database client connections, they won't necessarily get cleared /
> closed / etc. Bad.

That's not necessarily true. When a process terminates (either normally or by being killed) Linux will free the system resources it was using. In particular, TCP connections are closed as usual (I mean, a TCP FIN packet is sent; the application-level close, if any, is not, as expected).

> The problem is that the full Java garbage collector [and I'm talking
> about OpenJDK HotSpot here] does not ordinarily run until the VM thinks
> it is out of heap. And that isn't going to happen anytime soon on most
> systems. Which means that finalize() isn't getting called until *MUCH* later
> than we'd like, and we're holding on to tons of memory (allocated by
> glib) unnecessarily.

I wonder if what we should do is to launch any java-gnome application with specific GC-related JVM options.

> Telling people just to call System.gc() doesn't really seem to address
> the weakness.

Afaik, in modern JVMs System.gc() does nothing.

> Which brings us back to the _other_ purpose of SoftReferences,
> WeakReferences, and PhantomReferences: taking actions on Java objects as
> they go through the lifecycle, including being able to do cleanup long
> before finalize() gets called by the GC in its second-last phase.

In any case, doesn't the GC need to run in order for the JVM to detect that an object has changed its reachability?
In that case, using another kind of reference would be useless. Or am I missing something?

> Presumably we want another WeakReference, plus a ReferenceQueue. The
> question is: what will poll the reference queue, and when?

What exactly are you thinking about? Do you plan that when an object changes its reachability to, say, weak, a cleanup method will be invoked? If so, why do we need another WeakReference? Can't we just use the weak reference we have in Plumbing? In that case, the

    Plumbing.unregisterProxy(this);

we have in Proxy.finalize() can be invoked as part of such post-weak processing.

> Sometimes people use a separate thread for that. Another possibility
> would be to use an idle handler set up from the native side. A third
> option would be to poll the reference queue as a (generated) part of
> every JNI call.

The idle handler seems the best alternative to me, or at least the one with the least performance impact. On applications with high memory requirements, however, maybe the 3rd option is best. Anyway, we can offer this as a compilation or runtime option (maybe via a JVM parameter or environment var?)

Cheers
Vreixo

PS: I'm glad to write to this list again! Congratulations to all of you for your hard work.
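
One way the "runtime option (maybe via a JVM parameter or environment var?)" suggested above might look, as a rough sketch only. Everything named here is an assumption made for illustration: the `DrainStrategy` enum, the `javagnome.release.strategy` property, and the `JAVA_GNOME_RELEASE_STRATEGY` variable do not exist in java-gnome. The default of the idle handler simply mirrors the preference expressed in the message.

```java
import java.util.Locale;

/** Hypothetical selector for where the reference queue gets polled. */
enum DrainStrategy {
    THREAD, IDLE_HANDLER, JNI_CALL;

    // Both names are invented for this sketch.
    static final String PROPERTY = "javagnome.release.strategy";
    static final String ENV_VAR = "JAVA_GNOME_RELEASE_STRATEGY";

    /** Map a user-supplied value to a strategy; an unset or blank value means the idle handler. */
    static DrainStrategy parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return IDLE_HANDLER;
        }
        switch (value.trim().toLowerCase(Locale.ROOT)) {
            case "thread":
                return THREAD;
            case "idle":
                return IDLE_HANDLER;
            case "jni":
                return JNI_CALL;
            default:
                throw new IllegalArgumentException("Unknown release strategy: " + value);
        }
    }

    /** The JVM property wins; the environment variable is the fallback. */
    static DrainStrategy fromEnvironment() {
        String value = System.getProperty(PROPERTY);
        if (value == null) {
            value = System.getenv(ENV_VAR);
        }
        return parse(value);
    }
}
```

With something like this in place, an application with heavy native allocation could be started with `-Djavagnome.release.strategy=jni` without recompiling, while everything else keeps the cheaper default.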
From: Andrew C. <an...@op...> - 2010-01-12 23:23:11
On Tue, 2010-01-12 at 17:29 +0000, Vreixo Formoso Lopes wrote:
> Hi all,

Hey Vreixo, nice to see you.

> I hope to take a further look at this asap, but I'd like to give you
> some impressions about this issue:

Sure. Quick replies follow (and so anyone please feel free to re-reply to the original message, or this).

> In any case, a "kill -9 whatever" will terminate the JVM without any cleanup (either in finalize()
> or by other means) so I doubt "special circumstances" should be handled at all by java-gnome.

Sure. That said, full application termination isn't a case we're worried about here. If the release() -> g_object_unref() -> free() doesn't run at shutdown that doesn't really hurt because a few cycles later the process will be destroyed by the kernel and its memory freed. But that's GTK's & X's problem. As "users" of GTK, we need to Do The Right Thing in all other circumstances.

> > The problem is that the full Java garbage collector [and I'm talking
> > about OpenJDK HotSpot here] does not ordinarily run until the VM thinks
> > it is out of heap. And that isn't going to happen anytime soon on most
> > systems. Which means that finalize() isn't getting called until *MUCH* later
> > than we'd like, and we're holding on to tons of memory (allocated by
> > glib) unnecessarily.
>
> I wonder if what we should do is to launch any java-gnome application with specific
> GC-related JVM options.

I hadn't thought of that. Maybe... But -XX options aren't exactly something we can rely on. Still, if we know we're running HotSpot VM 6 [say], then we should tune it as best we know how.

And meanwhile there are the GLib memory related options.
But I don't think any of them (ie "always-malloc") are necessary for us [at least, not unless we mess with the GLib allocator functions].

> > Which brings us back to the _other_ purpose of SoftReferences,
> > WeakReferences, and PhantomReferences: taking actions on Java objects as
> > they go through the lifecycle, including being able to do cleanup long
> > before finalize() gets called by the GC in its second-last phase.
>
> In any case, does not the GC need to be called in order to the JVM detect an object
> has changed its reachability? In such case, using another kind of reference would be
> useless. Or am I missing something?

Good question. My understanding is that *finalize* [-> disposal] happens very late (if at all) and only in full GC. So in all the work we did a few years back in ValidateMemoryManagement etc, we were calling System.gc() and so manually triggering a full GC, and seeing our objects release() -> removeToggleRef() all correctly. Using a breakpoint in Eclipse's debugger, the finalize() methods are indeed getting called... but late in the game.

So my inference & observation is that

a) the GC runs fairly often, but the "full" GC does not.

b) ReferenceQueues are handled in a Thread separate from the GC, and that thread runs like any other thread. [see the code in java.lang.ref.Reference in OpenJDK, jdk/src/share/classes/java/lang/ref/Reference.java, which is where the ReferenceHandler thread gets started. I get that the GC treats it a bit specially, but it's still Java code]

Anyway, if (b) is true, then [say] WeakReferences being enqueued to the queue they are registered to happens automatically & soon after they are eligible. So we could react to that. Whereas finalize() happens MUCH MUCH later.
[which sucks, but there you have it]

> > Presumably we want another WeakReference, plus a ReferenceQueue. The
> > question is: what will poll the reference queue, and when?
>
> What exactly are you thinking about? Do you plan that when an object changes its reachability
> to, say, weak, a cleanup method will be invoked?

So, an interesting distinction here: in the GObject ToggleRef code, we switch everything to and from WeakReference. And later those go away. But ReferenceQueues are for _after_ something finishes at a given reachability. As far as I can tell, a Reference gets enqueued to the ReferenceQueue it is registered to after it _leaves_ (or is eligible to leave) that reachability level. Which would be all references to a given Java object being weak or less [which is when the ToggleRef is the last Ref, and we don't have any strong Java references in our Java code]. Which is when we are currently then waiting for finalize() to be invoked, calling release() -> g_object_remove_toggle_ref() -> free().

So, if we instead call release() in response to getting a [Weak]Reference enqueued to a ReferenceQueue, it should be the same point. But maybe I'm missing something :)

> If so, why do we need another WeakReference?
> Can't we just use the weak reference we have on Plumbing? In such case, the
>
>
> Plumbing.unregisterProxy(this);
>
> we have on Proxy.finalize() can be invoked as part of such post-weak processing.

Except that is a WeakReference for Proxy only, not the Pointers (Boxeds, etc, which is what GValues and PangoLayoutLines are, right?). So I figured it would have to be a WeakReference created when Pointers are constructed, ie call Plumbing.registerPointer() or so. Of course, if we have to have a WeakReference for everything, then maybe we can use it for registerProxy() too, rather than having a second one there.

But your suggestion (?) of polling the queue when unregister*Proxy*() is called could make sense. We just need to process the queue periodically.
Doesn't have to be in its own thread.

++

Anyway, ReferenceQueues are just one idea. I'm not saying it's the best way to go. It's possible we could be more aggressive when we handle window delete-event -> hide 'n all, or maybe special case handling of expose-event, or...

AfC
Sydney
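
The "one WeakReference for everything" idea from this reply can be sketched as a single table that serves both proxy lookup and native release, with the queue processed as a side effect of ordinary calls rather than in its own thread. This is a guess at a shape, not the real Plumbing class: `PointerRegistry`, its method names, and the `LongConsumer` releaser are all hypothetical, and it ignores the ToggleRef handshake entirely.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongConsumer;

/** Hypothetical registry: one weak reference per wrapper, used for lookup and for cleanup. */
final class PointerRegistry {
    private static final class Entry extends WeakReference<Object> {
        final long address;

        Entry(Object wrapper, long address, ReferenceQueue<Object> queue) {
            super(wrapper, queue);
            this.address = address;
        }
    }

    private final Map<Long, Entry> byAddress = new HashMap<>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // Assumption: stands in for the real native release call.
    private final LongConsumer releaser;

    PointerRegistry(LongConsumer releaser) {
        this.releaser = releaser;
    }

    /** Record a wrapper for a native address; returns the tracking reference. */
    synchronized Reference<Object> register(Object wrapper, long address) {
        processQueue();
        Entry entry = new Entry(wrapper, address, queue);
        byAddress.put(address, entry);
        return entry;
    }

    /** Return the existing wrapper for an address, or null if there is none. */
    synchronized Object lookup(long address) {
        processQueue();
        Entry entry = byAddress.get(address);
        return entry == null ? null : entry.get();
    }

    /** Release the native side of every wrapper the collector has given up on. */
    synchronized int processQueue() {
        int count = 0;
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Entry entry = (Entry) r;
            // Only act if the table still maps this address to this very entry.
            if (byAddress.remove(entry.address, entry)) {
                releaser.accept(entry.address);
                count++;
            }
        }
        return count;
    }
}
```

Piggybacking `processQueue()` on `register()` and `lookup()` is the same trick `java.util.WeakHashMap` uses to expunge stale entries, so no extra thread is needed; the trade-off is that nothing gets released while the application makes no calls at all.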