From: Ian R. <ian...@ma...> - 2006-03-01 17:14:15
|
Hi,

The Jikes RVM uses "--enable-portable-native-sync" with Classpath to try to force GTK to use JNI for locking. It seems this is buggy. Removing this line from jBuildClasspathJar will enable the simple Frame test below to run:

import java.awt.*;

public class test {
    public static void main(String[] args) {
        Frame f = new Frame();
        f.setVisible(true);
    }
}

The GTK peers assume a 1-to-1 mapping of Java threads to pthreads. Jikes RVM has green threads, and portable native sync was supposed to kludge GTK locking onto the Jikes RVM's own. I'm informed by the Classpath IRC channel that this is likely to cause problems for GTK applications trying to inter-operate with the Jikes RVM.

So, to solve the problem with the portable native syncing: is it possible to create a version of the Jikes RVM that has a 1-to-1 Java thread to pthread mapping? Currently there is a limit of 11 processors/pthreads being specified on the command line. Could we raise all the issues and build this into an RFE?

Thanks,
Ian |
From: David P G. <gr...@us...> - 2006-03-01 22:58:12
|
> So to solve the problem with the portable native syncing, is it
> possible to create a version of the Jikes RVM that has a 1-1 java
> thread to pthread mapping? Currently there is a limit of 11
> processors/pthreads being specified on the command line. Could we
> raise all the issues and build this into an RFE?

Possible: yes
Desirable: probably (my opinion only; others involved in the project may disagree)
Easy: no. Being optimistic, this is a several-month full-time effort and impacts every subsystem in the VM.

It would involve ripping out and replacing the whole threading/synchronization implementation and some parts of the JNI implementation, and has rippling implications throughout the VM, including the GC initiation process and the design of the adaptive system.

Historically (i.e., when Jikes RVM was being built initially in 1997-2000), native threading had fairly horrific scalability and synchronization performance. Therefore Jikes RVM was built using m-n threading and demonstrated significantly better SMP scalability than contemporary product JVMs that used native threading. In the last 5 years native thread implementations, especially on Linux, have gotten much better, so it's less clear that m-n threading is a big win. If we were starting with a clean slate in 2006, I think we might have done something different than what made sense in 1998. So, although this is a massive project, I personally think we should be working towards it.

--dave |
From: Ian R. <ian...@ma...> - 2006-03-02 12:15:25
|
Thanks Dave. As I understand the problem it goes like this:

1) We create a number of pthreads (0 to 11) which have VM_Processors that have list(s) of VM_Threads. This is historic, but there are also likely virtues to green threads. I can't imagine losing a green-threading ability would be desirable; for example, our work on using the Jikes RVM in an OS relies upon green threads at the moment.

2) GTK peers are built upon GLib, which may be built upon pthreads.

3) We try to override the GThreadFunctions interface for GLib so that it uses JNI and thereby the JVM's threading mechanism. This is called "enable portable native sync" in Classpath and is contained in the file "classpath/native/jni/gtk-peer/gthread-jni.c".

There is a problem with (3) in that overriding GLib could cause problems when GTK code interacts with other GTK code. I'm not sure how this happens, but it could be a fundamental reason for not wanting enable portable native sync.

What happens when I run an AWT application on the Jikes RVM?

1) With portable native sync: the application dies before drawing the first window. Various backtraces are attached to bug #1147592, and http://www.cs.man.ac.uk/~irogers/GtkPeerTracing is probably the most succinct. It appears the object locking has gotten itself into a mess, and this leads to an assertion failure.

2) Without portable native sync: the application runs but then becomes unresponsive. For example, SpecJVM will create its Frame but then lock up. Varying the number of VM_Processors alters how far execution proceeds.

So the fixes appear to be:

1) Try to get the portable native sync working again and ignore the fact that this may cause problems if trying to embed the Jikes RVM as part of another GTK application, or vice versa (apologies again for not really knowing what the issue is here).

2) Rewrite the Jikes RVM's threading so that it doesn't use an m-to-n model, but instead all threads are pthreads. Dave has explained that this would be a fairly fundamental rewrite.

3) Look for alternate fixes.

Is there much wrong with the Jikes RVM thread model that I'm ignoring? M-to-n threading strikes me as a good thing, as I like cheap context switching. So I'd rule out (2). There appear to be issues with (1). We may be able to patch it up again, but I'm led to believe it will only provide a half-way measure. So what alternatives are there?

The problem, I think, comes down to synchronisation; Java has one model of synchronisation, GLib/pthreads another. As I understand it, the two systems _don't_ need to interact for GTK peers to work. The problems come when a Java thread becomes blocked on something it has called via JNI. GTK peers do this a lot. The Jikes RVM will detect this situation. The process goes:

1) A pthread with a VM_Processor on it becomes blocked in JNI.
2) The native daemon thread notices the blocked thread (by decrementing a counter on it).
3) The native daemon thread marks the blocked VM_Thread as being blocked.
4) The native daemon makes the non-daemon pthread the native daemon, and the native daemon becomes non-daemon. The new non-daemon pthread can now start running threads from the previously blocked VM_Processor.
5) The previously blocked thread that has handled some JNI becomes unblocked and, on returning, recognizes it is on the native daemon processor and ships itself back onto the true processor.
6) The native daemon processor (now running on the other pthread) goes back to idly detecting deadlock.

So, the Jikes RVM with one processor can handle the situation where two threads may synchronize on a variable in JNI code. The value taken to notice a blocked thread is likely to be large; maybe we can improve detection of the above situation by using a smaller value when a "gnu.java.awt.peer.gtk.*" method is called, or by assuming calls to JNI from GTK code will become blocked.

So a remaining problem is how to handle synchronization by more than 2 threads on something in JNI code. Perhaps we can dynamically create native daemon processors and associated pthreads for this job. That would mean that many threads could block on a native lock without blocking the whole JVM. Native processors could be idled or possibly reclaimed as threads unblock. I'm also making an assumption that, although there's a unique mapping of Java thread to pthread when it's blocked, with this technique the Java threads will move around pthreads. This may cause problems if pthread numbers are being cached in the GTK peer code across separate JNI calls, but I don't believe this is the case.

What do people think?

Thanks,
Ian |
From: Ian R. <ian...@ma...> - 2006-03-02 17:43:42
|
I've been playing yet further. I can run the ScribbleFrame sample AWT application from Java in a Nutshell.

I've rewritten the gtkMain loop so that it is in Java and calls the gtk_main_iteration function. This allows the gtkMain loop to enter C and then return back to Java. Having a thread that is just always in C means the GC barrier never succeeds (I think). There's a picture here: http://www.cs.man.ac.uk/~irogers/scribble.png

Anyway, the remaining problem with this is that deadlock occurs. I believe this occurs because:

1) The thread with gtkMain on it enters C code.
2) Something handled here enters Java (possibly a JNI call or maybe an event dispatch).
3) The Java code enters GTK peer code again on the same pthread; as the contended lock is on the same pthread, we get deadlock.

A solution to this is to put a call at the entry/exit of all JNI code to the VM_Processor saying that the current thread wants to enter/leave some JNI code. The VM_Processor can then move the thread, if necessary, to another available VM_Processor so that we don't get a nested IN_NATIVE situation.

Any thoughts?

Thanks,
Ian |
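The restructured main loop can be sketched as follows. This is a self-contained model only: in the real peer the iteration call is a JNI native method wrapping GTK's gtk_main_iteration, which is stubbed out here so the sketch runs standalone.

```java
// Sketch of the restructured gtkMain: run one iteration of the native event
// loop at a time and return to Java between iterations, so the thread-switch
// and GC yieldpoints in Java code get a chance to fire.
public class GtkMainLoop {
    // Real peer (hypothetical signature): private static native boolean gtkMainIteration();
    private static int pending = 3;                      // stub: three queued events
    private static boolean gtkMainIteration() { return --pending > 0; }

    public static void main(String[] args) {
        int iterations = 0;
        boolean more = true;
        while (more) {
            more = gtkMainIteration(); // brief excursion into C, then back to Java
            iterations++;
            Thread.yield();            // a yieldpoint: GC or thread switch can happen here
        }
        System.out.println("iterations=" + iterations);
    }
}
```

The design point is that the thread is only transiently IN_NATIVE, rather than disappearing into gtk_main forever.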
From: David P G. <gr...@us...> - 2006-03-02 18:40:10
|
> as the contended lock is on the same pthread then we get deadlock. A
> solution to this is to put a call at the entry/exit of all JNI code to
> the VM_Processor saying that the current thread wants to enter/leave
> some JNI code. The VM_Processor can then move the thread, if necessary,
> to another available VM_Processor so that we don't get a nested
> IN_NATIVE situation. Any thoughts?

Several person-years were spent pursuing more or less this high-level design at IBM in 2000/2001. The basic functionality was there on PPC, never on IA32 (implementing it in Jikes RVM requires deep and fundamental changes to the platform-specific JNI compilers). It's really hard to get something working, especially with reasonable performance, stability and maintainability (the JNI compilers are hairy). You can't afford pthread context switches on all JNI-boundary crossings, so the engineering gets quite complex.

I think this whole approach of trying to make Jikes RVM's m-n threading play nicely with native code that assumes a JVM that uses native threading is quite hard, and not something that is worth attempting again. Maybe I'm overly negative, since I was here for the last attempt at this and saw how painful and ultimately fruitless it was, but I think this episode from the project's history shouldn't be lightly ignored.

If we're going to make a serious attempt at running AWT code again, then I only see two viable alternatives.

1. The point solution of getting the "native-sync" code working again. GTK is a nice special case of a native library because it explicitly factors out synchronization operations into a function vector, which you can replace with one of your own. The one we provide is probably out of date (old GTK version), but it shouldn't be that hard (relatively speaking) to update it and get this code working again.

2. The general solution of moving Jikes RVM to using native threads. It might be possible to do this in a backwards-compatible way by introducing much stronger abstraction boundaries around the JVM's threading/scheduling module and generally hiding the details of how threading works from most of the JVM. I'm not sure if supporting both in Jikes RVM is a viable option or not, but it should be attempted first. If we can confine the m-n vs. native threading awareness to a small set of classes, then it wouldn't be too bad to support both. If it can't be confined, then it's much less clear what is the right thing for the project to do.

--dave |
From: Ian R. <ian...@ma...> - 2006-03-03 08:39:53
|
David P Grove wrote:
> 1. The point solution of getting the "native-sync" code working again.
> 2. The general solution of moving Jikes RVM to using native threads.

The problem with 1 is that, I'm told, we end up with a GTK peers implementation that only works for the Jikes RVM. We're also not worrying about other peer implementations. I've looked at the GTK code and it all looks remarkably reasonable. I'm not sure it's worth pursuing this option.

I'm not sure how feasible it is to get 2 implemented. Your last e-mail implied it was a several-month job.

I'm not sure I'm understanding the problem. As I see it, it comes from the VM_Processor implementation:

1) We can handle one blocked JNI thread, but we can't handle two.
2) If a non-blocked JNI thread calls back into Java, then this can lead to a second Java thread calling into the JNI and bringing about deadlock in the JNI code. Example below.

In GTK peers the case is that a thread is trying to raise a window and another thread is trying to poll the GTK main loop. On entry into the GTK peer code a lock is always acquired for GDK threading. This code works fine if the two threads are on separate VM_Processors; I have merrily scribbled on the scribble pad for minutes. The problem comes when we start trying to have JNI code on a pthread nested within JNI code on the same pthread. This situation is likely always to be bad.

Solution? The problem comes when we have nesting of JNI threads on the same VM_Processor, so let's stop this from happening. This will only cost us in performance in the situation where we currently have successful concurrent JNI calls on the same pthread; I would imagine this is seldom, or not really impacted by serialising entry to the JNI.

What I propose: the JNI epilogue and prologue code will probably get a little bit larger in this situation, so moving it into a VM_Processor.JNIprologue() and VM_Processor.JNIepilogue() will initially make it more maintainable. The prologue code will be responsible for waiting until the processor isn't in a nested JNI call before allowing entry to the JNI code. As we have the code in a method, we can also easily add code to ship a thread trying to go into JNI onto a different VM_Processor and pthread.

The remaining problem is that we could exhaust our pool of pthreads. We can avoid that momentarily by specifying more processors on the command line (-X:processors=11). We can increase this value beyond 11 with ease (it's a constant), but then we end up with lots of idling pthreads, so we can probably do better.

Why do I like this solution:

1) It solves the problem beyond just GTK peers, so QT peers, for example, stand a chance of working. QT peers almost certainly don't use the GLib threading interface. There are probably more examples of JNI code having their own threading and locking semantics.
2) It seems like a small amount of code is needed. We're changing the JNI prologues and epilogues from just setting the IN_NATIVE flag to calling a VM_Processor method. The VM_Processor methods are a busy-wait loop that can possibly look to ship work onto other VM_Processors/pthreads. Possibly we can guard the generation of the calls in the JNI compiler so that we can fall back to the current situation should it seem useful for a benchmark.

Any remaining problems? The one I can think of is that we need all threads to be IN_JAVA for GC. Some JNI threads disappear into JNI code never to return; the current GTK peers does this, but it is a trivial job to alter it. When AWT has worked in the past, we were able to get into the GC as the GTK main loop would call a GLib thread function, which would call back into Java code in the JVM, and we'd hit a GC barrier. We can solve this problem by:

1) banning JNI calls that never return: easy and feasible but possibly unpopular;
2) altering GC so that we can special-case this situation so it doesn't deadlock the GC; the thread can be moved and a clean-up performed if it did ever return.

Given that (1) is easy and that we can modify Classpath, I prefer it as a solution.

I'm aware it's been a lot of effort to get what we have to where it is. Our (Jamaica) world differs from this in that we have many more processors. I don't think we need to start adding extra wrapper methods to all the threading code (my opinion is that this has already confused what's going on in the opt. compiler). I also want to see AWT working quickly and with minimal effort from me, as I need to get on with the day job. Feedback is definitely welcome. I notice the code has Julian Dolby and Steven Augart's names on it; I hope they can pass on what they know.

Many thanks,
Ian Rogers |
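The proposed JNIprologue()/JNIepilogue() serialisation can be sketched in plain Java. The class and method names mirror the proposal, but nothing here is the real VM_Processor code; a Semaphore stands in for the proposed busy-wait so the sketch is self-contained and testable.

```java
import java.util.concurrent.Semaphore;

// Sketch of the proposal: at most one thread per (virtual) processor may be
// in native code at a time, so nested JNI entries on the same pthread cannot
// deadlock on a native lock held by a descheduled thread.
public class JniGate {
    private final Semaphore inNative = new Semaphore(1);

    public void jniPrologue() throws InterruptedException {
        // Real version: busy-wait, or migrate the thread to another
        // VM_Processor/pthread rather than blocking here.
        inNative.acquire();
    }
    public void jniEpilogue() { inNative.release(); }

    public static void main(String[] args) throws Exception {
        JniGate gate = new JniGate();
        final int[] concurrent = {0};
        final int[] maxSeen = {0};
        Runnable nativeCall = () -> {
            try {
                gate.jniPrologue();
                synchronized (concurrent) { maxSeen[0] = Math.max(maxSeen[0], ++concurrent[0]); }
                Thread.sleep(10); // pretend to run native code
                synchronized (concurrent) { concurrent[0]--; }
                gate.jniEpilogue();
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        };
        Thread a = new Thread(nativeCall), b = new Thread(nativeCall);
        a.start(); b.start(); a.join(); b.join();
        System.out.println("maxConcurrentInNative=" + maxSeen[0]);
    }
}
```

Running two threads through the gate shows that native entries are serialised: the observed maximum concurrency in "native" is one.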
From: David P G. <gr...@us...> - 2006-03-03 15:03:48
|
We don't need all threads to be IN_JAVA for a GC. Threads are allowed to be BLOCKED_IN_NATIVE as well. The code to actually change the processor state to BLOCKED_IN_NATIVE might be dead/deleted (part of the code that was cleaned out when we decided to discard the old attempt to get m-n threading working with native calls), but the logic on the GC side is still there.

By design, a Java-native transition is a GC safepoint, so as long as we prevent the thread that is in native code from getting back to Java-land (IN_JAVA), then we can go ahead and scan its stack while it's out in C land.

--dave

> the one I can think of is that we need all threads to be IN_JAVA for GC.
> Some JNI threads disappear into JNI code never to return, the current
> GTk peers does this but it is a trivial job to alter it. When AWT has
> worked in the past we were able to get into the GC as the GTk main loop
> would call a glib thread function which would call back into Java code
> into the JVM and we'd hit a GC barrier. We can solve this problem by: |
From: Ian R. <ian...@ma...> - 2006-03-07 17:15:10
|
Thanks Dave,

I have created 2 patches that get AWT running on the Jikes RVM:

(1) Patch Classpath's gtkMain so that it doesn't become blocked in native, disable the portable native sync code, and patch the VM_Processor and VM_JNICompiler so that only one thread can transition from Java to native per VM_Processor.

(2) Patch Classpath's gdk_threads_enter (so that it java.lang.Thread.yields if it can't acquire the initial GDK threads lock), again with the patch to gtkMain, and again with portable native sync disabled.

Of these, (2) works by far the best.

It seems to me there's an issue with being blocked in native for gtkMain; the most probable reason is that if this occupies the daemon processor, we can't get another processor and become deadlocked. So there's likely to be a blocked-in-native bug (1).

Portable native syncing is broken, and as such I don't think we should use it. It also has the downside of only working if the JVM hasn't been created from an environment which has already set gthreads running (so running the Jikes RVM as a plugin would become an issue). I think this should be a new bug (2).

Classpath doesn't support m-to-n threading (portable native sync was a move in this direction; see bug 2). To stop two native threads contending for the same gdk_threads_enter lock, we can either modify entry to native code by the Jikes RVM, or remove the problem in Classpath. Of these, stopping the problem in Classpath works best, but Classpath needs modifying to conditionally include this patch if compiling for the m-to-n threading case. I think this is the (3)rd bug to solve.

Obviously running AWT code is a selling point for the Jikes RVM, if it can do it. I wonder if it's worth addressing the 3 bugs I've found (and worked around) or waiting for the lengthy rewrite of the threading code (to n-to-n) you proposed.

If we fix AWT we should find some way to regression test it too.

Regards,
Ian Rogers |
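The gdk_threads_enter patch in (2) can be modelled in plain Java: instead of blocking the pthread inside native code, spin with Thread.yield() so the green-thread scheduler can run the thread that holds the lock. This is an illustrative model only, not the actual C patch (which makes a JNI upcall to java.lang.Thread.yield).

```java
import java.util.concurrent.locks.ReentrantLock;

// Model of the yielding gdk_threads_enter: tryLock instead of a blocking
// acquire, so a failed attempt returns control to the thread scheduler
// rather than wedging the whole VM_Processor's pthread.
public class YieldingLock {
    static final ReentrantLock gdkLock = new ReentrantLock();

    static void gdkThreadsEnter() {
        while (!gdkLock.tryLock()) {
            Thread.yield(); // let the scheduler run another VM_Thread (e.g. the holder)
        }
    }
    static void gdkThreadsLeave() { gdkLock.unlock(); }

    public static void main(String[] args) throws Exception {
        Thread peer = new Thread(() -> { gdkThreadsEnter(); gdkThreadsLeave(); });
        gdkThreadsEnter();   // main holds the GDK lock
        peer.start();
        Thread.sleep(20);    // peer spins and yields instead of blocking in native
        gdkThreadsLeave();
        peer.join();
        System.out.println("done");
    }
}
```

With a blocking acquire and one pthread, the peer thread would never give the holder a chance to run; the yield in the retry loop is what breaks that cycle under green threads.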
From: Ian R. <ian...@ma...> - 2006-03-09 17:52:05
|
Kaffe has jthreads, which are a green-thread implementation. My understanding is they differ from the Jikes RVM threads in that they run on one underlying thread and yield on a timer tick, not on a counter. If this doesn't bring about blocking, then does this mean that we don't do a thread yield on a virtual processor on a timer tick? If we did, could this fix these problems?

Thanks,
Ian |
From: Eliot M. <mo...@cs...> - 2006-03-09 18:03:16
|
Ian -- I'm not sure what you mean by "on a timer tick". Jikes RVM will yield only at a yield point, but generally will yield at the next yield point after a regular timer interrupt. The counters you mention are used, I believe, only for so-called _deterministic thread switching_.

Best wishes -- Eliot |
From: Ian R. <ian...@ma...> - 2006-03-10 10:18:33
|
Hi Eliot/all, sorry, the timer tick events are externally generated interrupts handed to the RVM as signals. We current catch them with processTimerTick in sys.C. My understanding is that on timer ticks we will examine the stack to generate profile information as well as other book keeping. As you say we have deterministic thread switching and my point is that this differs from Kaffe (I believe). Currently the Jikes RVM is having issues with: 1) AWT - I have shown that patches to allow green thread yields and removing the main blocked native thread (gtkMain) will stop the AWT from hanging and it runs successfully. Unfortunately the GTK peers AWT implementation uses one native thread whereas the QT peers AWT implementation uses 2 threads which aren't as easily amenable to various Jikes RVM related threading changes. 2) A few mauve tests (I will quote as I won't pretend to have discovered or fully know the issues for these). From rvm/regression/tests/mauve/mauve-jikesrvm: <snip> # exclusions due to defects in Jikes RVM !java.net.DatagramSocket # hangs Jikes RVM !java.net.DatagramPacket # hangs Jikes RVM !java.net.MulticastSocket # hangs Jikes RVM !java.net.ServerSocket # hangs Jikes RVM !java.net.URLClassLoader.getResourceRemote # hangs Jikes RVM !java.awt.image.PixelGrabber # problem with gtkpeer function vector? </snip> From speaking to Mark Weilaard he asked me how the Jikes RVM handles none returning threads that didn't yield other than gtkMain? The specific example he thought of were ServerSockets. It seems to me that the mauve tests show we're not handling these properly. So, my interest switched to why Kaffe's gthreads pass mauve tests whereas the Jikes RVM doesn't? The interesting thing with Kaffe is that it has green threads too! It strikes me that the big difference is the timer tick vs. deterministic thread switching. So, could we adapt the deterministic model so that it too would yield on timer ticks? 
One idea maybe that if we're processing a timer tick and we're on a VM_Processor whose active thread is in native code then we could switch its active thread (yield). This would allow us possibly to avoid the situations we're getting at the moment of "hanging". The more I think about this problem I think the thread model is the cause of bugs. Another interesting point for Kaffe's jthreads is that (to my knowledge and a little bit of hunting in the source) they don't use the portable native sync wrapper to gthreads. Gthreads are a glib wrapper to the underlying threads of a system. The portable native sync code replaces the gthread functions with ones that call into the JVM. This is currently the default option for the Jikes RVM. Unfortunately this gives lots of warnings about locks and doesn't prevent the Jikes RVM hanging with even the most trivial AWT application - even before that application has had chance to draw a window to the screen. Steven Augurt first reported this bug in December 2004. It seems to me that maybe we can fix a number of bugs by adding a preemptive yield. Its only necessary for a Java thread that has entered native code (the deterministic approach will work for Java code). This may also mean we don't need classpath to change to support our thread model. Currently classpath only has portable native sync code for our benefit (ie Kaffe and other VMs don't need the gthread wrapper). My other ideas to fix AWT relied upon changing classpath to get the native thread to call into java.lang.Thread.yield(), (I did experiment with altering the JNI compiler but this doesn't work well) but some classpath developers are less than keen to see Jikes RVM fixes going into the source code - although the JNI spec does seem to lend support to an argument that JVMs shouldn't rely upon preemptive yielding. Preemptive yielding may also mean we don't need to find and fix these issues, possibly saving time :-). 
I really appreciate feedback here, even if it's just to say that I sound sane :-). I am also concerned that there may be unforeseen cans of worms that can be opened. I know others have a lot more experience of the Jikes RVM's threading issues than me. Thanks, Ian Rogers

Eliot Moss wrote:
> Ian -- I'm not sure what you mean by "on a timer tick". Jikes RVM will
> yield only at a yield point, but generally will yield at the next yield
> point after a regular timer interrupt. The counters you mention are used, I
> believe, only for so-called _deterministic thread switching_.
>
> Best wishes -- Eliot
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting language
> that extends applications into web and mobile media. Attend the live webcast
> and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Jikesrvm-core mailing list
> Jik...@li...
> https://lists.sourceforge.net/lists/listinfo/jikesrvm-core
> |
From: Eliot M. <mo...@cs...> - 2006-03-10 11:52:54
|
Dear Ian -- It could be that you understood what I was trying to say, but I am not sure ... Yielding after (I won't say "at" because we cannot yield at just any old place in the Jikes RVM design) a timer tick is, I believe, the default behavior. I know how it works on PPC; I am less certain about Intel, but believe it's the same (at a suitable level of abstraction):

The signal handler code sets a flag
This flag is regularly checked by running threads, i.e., at calls, returns, and back edges

This is NOT the same as _deterministic_ thread switching. It has tests at the same places in the code, but its tests count down a counter and yield if it has reached zero, rather than being driven by time. Still, you seem to be saying that you want Jikes RVM to be able to yield _at_ a timer tick interrupt, rather than _shortly after_ one. This is not, in general, possible. Our whole design is based on yielding only at certain points, for which we are prepared to generate GC maps (and other reasons). However! It should be possible to arrange that it is ok to yield if we are in the middle of any of various native calls. If we set a flag as we leave Jikes RVM code for these calls (or maybe we can identify them by program counter somehow), then we can allow yielding _at_ a timer tick. But I suspect we already support something like that, otherwise things would be worse than they are. So, I am now at a loss as to what you are proposing. I don't think we're going to go to a "yield at any instruction" model, so there needs to be some other solution. Best wishes -- Eliot |
From: Ian R. <ian...@ma...> - 2006-03-10 12:17:32
|
Hi Eliot, I think the important difference in what I'm saying is that currently we will only yield a Java thread if it is executing Java code. If we're executing native code that is blocked, trying to get a mutex held by another Java thread, or never going to return, then this can cause the RVM to hang. What I think needs to be different is that if we're processing a timer tick, then after a reasonable period of time, and if the VM_Processor is marked as IN_NATIVE (i.e. its current active thread is executing native code), we force it to switch its active thread. My understanding is that native method calls are GC points, so I don't think this change would necessarily break GC. I think the thread switching code currently doesn't support yielding to a stalled Java thread executing native code, so it would need altering to support this. The IN_NATIVE flag may well have to become part of the VM_Thread rather than the VM_Processor. If we do yield then we will also have to alter the context passed to the timer tick handler so that it returns to the appropriate VM_Processor thread. My understanding of the deterministic thread switching is that there are counters rather than a flag that is polled. I think the counters can be optimised to hardware counters. I think you're right that this could be an area of divergence between Intel and PowerPC :-) I hope this is clear. Many thanks, Ian Rogers

Eliot Moss wrote:
> Dear Ian -- It could be that you understood what I was trying to say, but I
> am not sure ...
>
> Yielding after (I won't say "at" because we cannot yield at just any old
> place in the JikesRVM design) a timer tick is, I believe, the default
> behavior. 
> I know how it works on PPC; I am less certain about Intel, but
> believe it's the same (at a suitable level of abstraction):
>
> The signal handler code sets a flag
> This flag is regularly checked by running threads, i.e.,
> at calls, returns, and back edges
>
> This is NOT the same as _deterministic_ thread switching. It has tests at
> the same places in the code, but its tests count down a counter and yield
> if it has reached zero, rather than being driven by time.
>
> Still, you seem to be saying that you want Jikes RVM to be able to yield
> _at_ a timer tick interrupt, rather than _shortly after_ one. This is not,
> in general, possible. Our whole design is based on yielding only at certain
> points, for which we are prepared to generate GC maps (and other reasons).
>
> However! It should be possible to arrange that it is ok to yield if we are
> in the middle of any of various native calls. If we set a flag as we leave
> Jikes RVM code for these calls (or maybe we can identify them by program
> counter somehow), then we can allow yielding _at_ a timer tick.
>
> But I suspect we already support something like that, otherwise things
> would be worse than they are.
>
> So, I am now at a loss as to what you are proposing. I don't think we're
> going to go to a "yield at any instruction" model, so there needs to be
> some other solution.
>
> Best wishes -- Eliot
|
From: David P G. <gr...@us...> - 2006-03-10 13:41:52
|
Hi Ian, Don't put much weight in the mauve configuration in Jikes RVM; it's a couple of years old. One way we handle blocking native calls (Linux/IA32 only) is that we interpose on select/poll and substitute non-blocking alternatives. So, when a thread in native code calls select/poll we get control back and turn it into a non-blocking call. This only works on Linux because it requires the linker/loader to be fairly permissive (in effect we're slipping in our own version of portions of libc to replace the ones the native code was actually compiled against). --dave |
From: Ian R. <ian...@ma...> - 2006-03-10 13:55:20
|
Thanks Dave, should there be a bug or RFE associated with the mauve configuration? There's no obvious one currently. I think the non-blocking optimization is great. As the optimization can only work for certain builds, presumably it wasn't invented as a workaround to blocking problems? Any thoughts on preempting Java threads that have gone native? :-) Ian

David P Grove wrote:
> Hi Ian,
>
> Don't put much weight in the mauve configuration in Jikes RVM;
> it's a couple of years old.
>
> One way we handle blocking native calls (Linux/IA32 only) is that
> we interpose on select/poll and substitute non-blocking alternatives. So,
> when a thread in native code calls select/poll we get control back and
> turn it into a non-blocking call. This only works on Linux because it
> requires the linker/loader to be fairly permissive (in effect we're
> slipping in our own version of portions of libc to replace the ones the
> native code was actually compiled against).
>
> --dave
|
From: Ian R. <ian...@ma...> - 2006-03-12 16:48:35
Attachments:
preempt-native-threads.diff
|
In the words of Vic Reeves, I just wouldn't let it lie...

UNIX signals can either be delivered on a separate stack (specified via sigaltstack) or on the stack of the pthread that's handling them. By delivering them on the same stack I've tried to set up the timer interrupt so that when it executes its stack looks like:

softwareSignalHandler
<signal handler frame>
native code
java code

in effect it "looks like" the native code has called into the softwareSignalHandler. I thought this would be a good point to then perform a VM_Thread yield if the current pthread's VM_Processor was marked as being in native code. By calling the yield via the JNI, the processor will become IN_JAVA again and we can potentially unblock the thread by scheduling another green thread. The attached patch performs this. I was hoping this might have the desired effect of unblocking native threads that don't return or are spinning on a mutex in native code that's held by another green thread. Unfortunately it doesn't. The errors include: class not found, a problem with MMTk alignment, faults outside the RVM address space, and stack walking problems. I disabled the IO waiting mechanism in case this related to file reading problems. Anyway, I've attached the stack back traces at the end of the e-mail. I'd appreciate it if anyone who may know the cause of these failures could let me know. I'm aware I will need to fix up the stack walker to handle the signal handler frame, but the errors seem to imply there's a bigger problem. 
Many thanks, Ian Rogers

---

Exception in thread "Jikes_RVM_Boot_Thread": java.lang.NoClassDefFoundError: Could not find the class com.ibm.JikesRVM.JikesRVMSocketImpl$3: com.ibm.JikesRVM.JikesRVMSocketImpl$3
at com.ibm.JikesRVM.classloader.VM_TypeReference.resolve(VM_TypeReference.java:529)
at com.ibm.JikesRVM.VM_Runtime.unresolvedNewScalar(VM_Runtime.java:275)
at com.ibm.JikesRVM.JikesRVMSocketImpl.boot(JikesRVMSocketImpl.java:607)
at com.ibm.JikesRVM.VM.finishBooting(VM.java:366)
at com.ibm.JikesRVM.VM.boot(VM.java:110)
Caused by: java.lang.ClassNotFoundException: com.ibm.JikesRVM.JikesRVMSocketImpl$3
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.findClass(VM_BootstrapClassLoader.java:199)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.loadClass(VM_BootstrapClassLoader.java:138)
at java.lang.ClassLoader.loadClass(ClassLoader.java:294)
at com.ibm.JikesRVM.classloader.VM_TypeReference.resolve(VM_TypeReference.java:527)
at com.ibm.JikesRVM.VM_Runtime.unresolvedNewScalar(VM_Runtime.java:275)
at com.ibm.JikesRVM.JikesRVMSocketImpl.boot(JikesRVMSocketImpl.java:607)
trying to yield
at com.ibm.JikesRVM.VM.finishBooting(VM.java:366)
at com.ibm.JikesRVM.VM.boot(VM.java:110)
Caused by: java.lang.NoClassDefFoundError: Could not find the class java.net.SocketImplFactory: java.net.SocketImplFactory
at com.ibm.JikesRVM.classloader.VM_TypeReference.resolve(VM_TypeReference.java:529)
at com.ibm.JikesRVM.classloader.VM_Class.<init>(VM_Class.java:854)
at com.ibm.JikesRVM.classloader.VM_ClassLoader.defineClassInternal(VM_ClassLoader.java:244)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.findClass(VM_BootstrapClassLoader.java:180)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.loadClass(VM_BootstrapClassLoader.java:138)
at java.lang.ClassLoader.loadClass(ClassLoader.java:294)
at com.ibm.JikesRVM.classloader.VM_TypeReference.resolve(VM_TypeReference.java:527)
at com.ibm.JikesRVM.VM_Runtime.unresolvedNewScalar(VM_Runtime.java:275)
at com.ibm.JikesRVM.JikesRVMSocketImpl.boot(JikesRVMSocketImpl.java:607)
at com.ibm.JikesRVM.VM.finishBooting(VM.java:366)
at com.ibm.JikesRVM.VM.boot(VM.java:110)
Caused by: java.lang.ClassNotFoundException: java.net.SocketImplFactory
at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:84)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.findClass(VM_BootstrapClassLoader.java:176)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.loadClass(VM_BootstrapClassLoader.java:138)
at java.lang.ClassLoader.loadClass(ClassLoader.java:294)
at com.ibm.JikesRVM.classloader.VM_TypeReference.resolve(VM_TypeReference.java:527)
at com.ibm.JikesRVM.classloader.VM_Class.<init>(VM_Class.java:854)
at com.ibm.JikesRVM.classloader.VM_ClassLoader.defineClassInternal(VM_ClassLoader.java:244)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.findClass(VM_BootstrapClassLoader.java:180)
at com.ibm.JikesRVM.classloader.VM_BootstrapClassLoader.loadClass(VM_BootstrapClassLoader.java:138)
at java.lang.ClassLoader.loadClass(ClassLoader.java:294)
at com.ibm.JikesRVM.classloader.VM_TypeReference.resolve(VM_TypeReference.java:527)
at com.ibm.JikesRVM.VM_Runtime.unresolvedNewScalar(VM_Runtime.java:275)
at com.ibm.JikesRVM.JikesRVMSocketImpl.boot(JikesRVMSocketImpl.java:607)
at com.ibm.JikesRVM.VM.finishBooting(VM.java:366)
at com.ibm.JikesRVM.VM.boot(VM.java:110)
vm internal error at:
-- Stack --
Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075
Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573
Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554
Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534
Lorg/mmtk/vm/Assert; _assert(Z)V at line 63
Lorg/mmtk/utility/alloc/Allocator; getMaximumAlignedSize(III)I at line 138
Lorg/mmtk/utility/alloc/Allocator; getMaximumAlignedSize(II)I at line 123
Lorg/mmtk/plan/PlanLocal; checkAllocator(III)I at line 110 
Lcom/ibm/JikesRVM/memoryManagers/mmInterface/MM_Interface; allocateScalar(I[Ljava/lang/Object;III)Ljava/lang/Object; at line 566 Lcom/ibm/JikesRVM/VM_Runtime; resolvedNewScalar(I[Ljava/lang/Object;ZIII)Ljava/lang/Object; at line 342 Lcom/ibm/JikesRVM/jni/VM_JNIGenericHelpers; createStringFromC(Lorg/vmmagic/unboxed/Address;)Ljava/lang/String; at line 78 Lcom/ibm/JikesRVM/jni/VM_JNIFunctions; GetStaticMethodID(Lcom/ibm/JikesRVM/jni/VM_JNIEnvironment;ILorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;)I at line 2594 <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> <native frame> Lgnu/java/awt/peer/gtk/GtkToolkit; gtkInit(I)V Lgnu/java/awt/peer/gtk/GtkToolkit; <clinit>()V at line 123 Lcom/ibm/JikesRVM/classloader/VM_Class; initialize()V at line 1408 Ljava/lang/Class; forNameInternal(Ljava/lang/String;ZLjava/lang/ClassLoader;)Ljava/lang/Class; at line 676 Ljava/lang/Class; forName(Ljava/lang/String;)Ljava/lang/Class; at line 72 Ljava/awt/Toolkit; getDefaultToolkit()Ljava/awt/Toolkit; at line 521 Ljava/awt/GraphicsEnvironment; getLocalGraphicsEnvironment()Ljava/awt/GraphicsEnvironment; at line 103 Ljava/awt/Window; <init>()V at line 161 Ljava/awt/Frame; <init>(Ljava/lang/String;)V at line 233 LScribbleFrame; <init>()V at line 30 LScribbleFrame; main([Ljava/lang/String;)V at line 23 Lcom/ibm/JikesRVM/MainThread; run()V at line 115 Lcom/ibm/JikesRVM/VM_Thread; run()V at line 200 Lcom/ibm/JikesRVM/VM_Thread; startoff()V at line 781 JikesRVM: TROUBLE. Got a signal (Segmentation fault; #11) from outside the VM's address space. JikesRVM: UNRECOVERABLE trapped signal 11 (Segmentation fault) handler stack 0x08063c7c si->si_addr 0x00000000 gs 0x00000033 fs 0x00000000 es 0xc010007b ds 0x0000007b edi -- JTOC? 0x4780cd00 esi -- PR/VP 0x476b574c ebp -- FP? 0x47879308 esp -- SP 0x5b0172e4 ebx 0x0000adb8 edx -- T1? 0x47081b98 ecx -- S0? 0x00000000 eax -- T0? 
0xa12e802c trapno 0x0000000e err 0x00000004 eip 0x00000000 cs 0x00000073 eflags 0x00010206 esp_at_signal 0x5b0172e4 ss 0x0000007b fpstate 0x08063d94 oldmask 0x00020000 cr2 0x00000000 fp0 0x00000000000000000000 fp1 0x00000000000000000000 fp2 0x00000000000000000000 fp3 0x00000000000000000000 fp4 0x00000000000000000000 fp5 0x00000000ffffffffc01d fp6 0x000000000000c0004004 fp7 0x000000000000c0004004 JikesRVM: internal error -- Stack -- Lcom/ibm/JikesRVM/memoryManagers/mmInterface/MM_Interface; allocateArray(III[Ljava/lang/Object;III)Ljava/lang/Object; at line 603 Lcom/ibm/JikesRVM/VM_Runtime; resolvedNewArray(III[Ljava/lang/Object;III)Ljava/lang/Object; at line 426 Lcom/ibm/JikesRVM/VM_Runtime; resolvedNewArray(ILcom/ibm/JikesRVM/classloader/VM_Array;)Ljava/lang/Object; at line 383 Lcom/ibm/JikesRVM/VM_Runtime; clone(Ljava/lang/Object;)Ljava/lang/Object; at line 452 Ljava/lang/Object; clone()Ljava/lang/Object; at line 25 Ljava/lang/String; toLowerCase(Ljava/util/Locale;)Ljava/lang/String; at line 1431 Ljava/lang/String; toLowerCase()Ljava/lang/String; at line 1455 Lgnu/java/nio/charset/Provider; charsetForName(Ljava/lang/String;)Ljava/nio/charset/Charset; at line 200 Ljava/nio/charset/Charset; charsetForName(Ljava/lang/String;)Ljava/nio/charset/Charset; at line 208 Ljava/nio/charset/Charset; forName(Ljava/lang/String;)Ljava/nio/charset/Charset; at line 188 Ljava/lang/String; <init>([BIILjava/lang/String;)V at line 348 Ljava/util/zip/ZipFile; readEntries()V at line 300 Ljava/util/zip/ZipFile; getEntries()Ljava/util/HashMap; at line 398 Ljava/util/zip/ZipFile; getEntry(Ljava/lang/String;)Ljava/util/zip/ZipEntry; at line 419 Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader; getResourceInternal(Ljava/lang/String;Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader$Handler;Z)Ljava/lang/Object; at line 298 Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader; getResourceAsStream(Ljava/lang/String;)Ljava/io/InputStream; at line 233 
Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader; findClass(Ljava/lang/String;)Ljava/lang/Class; at line 175 Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader; loadClass(Ljava/lang/String;Z)Ljava/lang/Class; at line 138 Ljava/lang/ClassLoader; loadClass(Ljava/lang/String;)Ljava/lang/Class; at line 294 Lcom/ibm/JikesRVM/classloader/VM_TypeReference; resolve()Lcom/ibm/JikesRVM/classloader/VM_Type; at line 527 Lcom/ibm/JikesRVM/classloader/VM_Class; <init>(Lcom/ibm/JikesRVM/classloader/VM_TypeReference;Ljava/io/DataInputStream;)V at line 854 Lcom/ibm/JikesRVM/classloader/VM_ClassLoader; defineClassInternal(Ljava/lang/String;Ljava/io/InputStream;Ljava/lang/ClassLoader;)Lcom/ibm/JikesRVM/classloader/VM_Type; at line 244 Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader; findClass(Ljava/lang/String;)Ljava/lang/Class; at line 180 Lcom/ibm/JikesRVM/classloader/VM_BootstrapClassLoader; loadClass(Ljava/lang/String;Z)Ljava/lang/Class; at line 138 Ljava/lang/ClassLoader; loadClass(Ljava/lang/String;)Ljava/lang/Class; at line 294 Lcom/ibm/JikesRVM/classloader/VM_TypeReference; resolve()Lcom/ibm/JikesRVM/classloader/VM_Type; at line 527 Lcom/ibm/JikesRVM/VM_Runtime; unresolvedNewScalar(I)Ljava/lang/Object; at line 275 Lcom/ibm/JikesRVM/JikesRVMSocketImpl; boot()V at line 607 Lcom/ibm/JikesRVM/VM; finishBooting()V at line 366 Lcom/ibm/JikesRVM/VM; boot()V at line 110 0x0080cd00 vm internal error at: -- Stack -- Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075 Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573 Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554 Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534 Lcom/ibm/JikesRVM/VM_CompiledMethods; getCompiledMethod(I)Lcom/ibm/JikesRVM/VM_CompiledMethod; at line 79 Lcom/ibm/JikesRVM/VM_StackTrace; walkFrames(ZI)I at line 102 Lcom/ibm/JikesRVM/VM_StackTrace; <init>(I)V at line 74 Ljava/lang/Throwable; fillInStackTrace()Ljava/lang/Throwable; 
at line 109 Ljava/lang/Throwable; <init>()V at line 53 Ljava/lang/Exception; <init>()V at line 67 Ljava/lang/RuntimeException; <init>()V at line 65 Ljava/lang/NullPointerException; <init>()V at line 70 Lcom/ibm/JikesRVM/VM_Runtime; deliverHardwareException(II)V at line 631 <hardware trap> Lcom/ibm/JikesRVM/VM_JavaHeader; getTIB(Ljava/lang/Object;)[Ljava/lang/Object; at line 138 0x0080cd00 Proc 1: Thread 5: VM.sysFail(): We're in an (unambiguously) recursive call to VM.sysFail(), 2 deep sysFail was called with the message: vm internal error at: vm internal error at: VM_Scheduler.dumpStack(): in a recursive call, 2 deep. -- Stack -- Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075 Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573 Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554 Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534 Lcom/ibm/JikesRVM/VM_CompiledMethods; getCompiledMethod(I)Lcom/ibm/JikesRVM/VM_CompiledMethod; at line 79 Lcom/ibm/JikesRVM/VM_Scheduler; dumpStack(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;)V at line 660 Lcom/ibm/JikesRVM/VM_Scheduler; dumpStack(Lorg/vmmagic/unboxed/Address;)V at line 621 Lcom/ibm/JikesRVM/VM_Scheduler; tracebackWithoutLock()V at line 592 Lcom/ibm/JikesRVM/VM_Scheduler; traceback(Ljava/lang/String;)V at line 570 Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075 Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573 Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554 Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534 Lcom/ibm/JikesRVM/VM_CompiledMethods; getCompiledMethod(I)Lcom/ibm/JikesRVM/VM_CompiledMethod; at line 79 Lcom/ibm/JikesRVM/VM_StackTrace; walkFrames(ZI)I at line 102 Lcom/ibm/JikesRVM/VM_StackTrace; <init>(I)V at line 74 Ljava/lang/Throwable; fillInStackTrace()Ljava/lang/Throwable; at line 109 Ljava/lang/Throwable; <init>()V at line 53 
Ljava/lang/Exception; <init>()V at line 67 Ljava/lang/RuntimeException; <init>()V at line 65 Ljava/lang/NullPointerException; <init>()V at line 70 Lcom/ibm/JikesRVM/VM_Runtime; deliverHardwareException(II)V at line 631 <hardware trap> Lcom/ibm/JikesRVM/VM_JavaHeader; getTIB(Ljava/lang/Object;)[Ljava/lang/Object; at line 138 0x0080cd00 Proc 1: Thread 5: VM.sysFail(): We're in an (unambiguously) recursive call to VM.sysFail(), 3 deep sysFail was called with the message: vm internal error at: vm internal error at: VM_Scheduler.dumpStack(): in a recursive call, 3 deep. -- Stack -- Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075 Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573 Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554 Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534 Lcom/ibm/JikesRVM/VM_CompiledMethods; getCompiledMethod(I)Lcom/ibm/JikesRVM/VM_CompiledMethod; at line 79 Lcom/ibm/JikesRVM/VM_Scheduler; dumpStack(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;)V at line 660 Lcom/ibm/JikesRVM/VM_Scheduler; dumpStack(Lorg/vmmagic/unboxed/Address;)V at line 621 Lcom/ibm/JikesRVM/VM_Scheduler; tracebackWithoutLock()V at line 592 Lcom/ibm/JikesRVM/VM_Scheduler; traceback(Ljava/lang/String;)V at line 570 Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075 Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573 Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554 Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534 Lcom/ibm/JikesRVM/VM_CompiledMethods; getCompiledMethod(I)Lcom/ibm/JikesRVM/VM_CompiledMethod; at line 79 Lcom/ibm/JikesRVM/VM_Scheduler; dumpStack(Lorg/vmmagic/unboxed/Address;Lorg/vmmagic/unboxed/Address;)V at line 660 Lcom/ibm/JikesRVM/VM_Scheduler; dumpStack(Lorg/vmmagic/unboxed/Address;)V at line 621 Lcom/ibm/JikesRVM/VM_Scheduler; tracebackWithoutLock()V at line 592 
Lcom/ibm/JikesRVM/VM_Scheduler; traceback(Ljava/lang/String;)V at line 570 Lcom/ibm/JikesRVM/VM; sysFail(Ljava/lang/String;)V at line 1075 Lcom/ibm/JikesRVM/VM; _assertionFailure(Ljava/lang/String;Ljava/lang/String;)V at line 573 Lcom/ibm/JikesRVM/VM; _assert(ZLjava/lang/String;Ljava/lang/String;)V at line 554 Lcom/ibm/JikesRVM/VM; _assert(Z)V at line 534 Lcom/ibm/JikesRVM/VM_CompiledMethods; getCompiledMethod(I)Lcom/ibm/JikesRVM/VM_CompiledMethod; at line 79 Lcom/ibm/JikesRVM/VM_StackTrace; walkFrames(ZI)I at line 102 Lcom/ibm/JikesRVM/VM_StackTrace; <init>(I)V at line 74 Ljava/lang/Throwable; fillInStackTrace()Ljava/lang/Throwable; at line 109 Ljava/lang/Throwable; <init>()V at line 53 Ljava/lang/Exception; <init>()V at line 67 Ljava/lang/RuntimeException; <init>()V at line 65 Ljava/lang/NullPointerException; <init>()V at line 70 Lcom/ibm/JikesRVM/VM_Runtime; deliverHardwareException(II)V at line 631 <hardware trap> Lcom/ibm/JikesRVM/VM_JavaHeader; getTIB(Ljava/lang/Object;)[Ljava/lang/Object; at line 138 0x0080cd00 Proc 1: Thread 5: VM.sysFail(): We're in an (unambiguously) recursive call to VM.sysFail(), 4 deep sysFail was called with the message: vm internal error at: VM.dieAbruptlyRecursiveSystemTrouble(): Dying abruptly; we're stuck in a recursive shutdown/exit. |
From: Eliot M. <mo...@cs...> - 2006-03-12 18:30:37
|
Dear Ian -- I wouldn't try to handle signals on the given thread's stack. I'd use the alt stack mechanism and then manipulate the context and the stack to make it appear that the thread called a routine we have set up to do the processing, i.e., to come back into our scheduler and switch to another thread. This gives better control, I think, and will leave less stuff on the stack to confuse a stack walker. It was what I was trying to suggest before. It is analogous to what we do in the case of a null pointer dereference: the signal handler forces a call to a routine whose only function is to construct and then throw the suitable exception object. Given that we made _that_ work, this strategy seems to me a better idea than leaving signal handling stuff on a thread stack .... Best wishes -- Eliot |
From: Ian R. <ian...@ma...> - 2006-03-12 22:08:48
|
Hi Eliot, sounds like a great idea! I will have a play, but it sounds like a bit of work to get it going. I imagine it could be an issue that the fake stack frame is going to have to return to the point where the signal occurred. The signal handler return would have cleared this up for me. This is assuming throwing an exception doesn't have the same problem. Any thoughts? Thanks! Ian

Eliot Moss wrote:
>Dear Ian -- I wouldn't try to handle signals on the given thread's
>stack. I'd use the alt stack mechanism and then manipulate the context and
>the stack to make it appear that the thread called a routine we have set up
>to do the processing, i.e., to come back into our scheduler and switch to
>another thread. This gives better control, I think, and will leave less
>stuff on the stack to confuse a stack walker. It was what I was trying to
>suggest before. It is analogous to what we do in the case of a null pointer
>dereference: the signal handler forces a call to a routine whose only
>function is to construct and then throw the suitable exception
>object. Given that we made _that_ work, this strategy seems to me a better
>idea than leaving signal handling stuff on a thread stack ....
>
>Best wishes -- Eliot
|
From: Eliot M. <mo...@cs...> - 2006-03-12 22:14:37
|
The throwing-an-exception case has exactly the same problem of creating a "fake" frame on the stack --- except it is in Java land, not C land .... Eliot |
From: Ian R. <ian...@ma...> - 2006-03-12 22:31:03
|
Thanks Eliot, after constructing the null pointer exception doesn't the code unwind the stack and branch directly to the catch block rather than returning to the instruction following the dereference? I know the optimizing compiler will try to remove instructions it knows can't be reached following a definite fault. Thanks again, Ian Eliot Moss wrote: >The throwing-an-exception case has exactly the same problem of creating a >"fake" frame on the stack --- except it is in Java land, not C land .... > >Eliot > > |
From: Eliot M. <mo...@cs...> - 2006-03-13 02:13:26
|
>>>>> "Ian" == Ian Rogers <ian...@ma...> writes:

Ian> Thanks Eliot,
Ian> after constructing the null pointer exception doesn't the code unwind
Ian> the stack and branch directly to the catch block rather than returning
Ian> to the instruction following the dereference? I know the optimizing
Ian> compiler will try to remove instructions it knows can't be reached
Ian> following a definite fault.

No, it fakes a call to VM_Runtime.deliverHardwareException. That routine creates the appropriate exception object and then calls deliverException, which walks the stack and looks for a catch block. You can find the C trap handling code in src/tools/bootImageRunner/IA/libvm.C (look for, say, SIGSEGV) and the Java routines in src/vm/runtime/VM_Runtime.java. (It would be good to make corresponding changes for PPC as well, of course.) I think the right thing to do is quite similar, but of course you want to do a forced yield rather than throwing an exception. You should probably go through and pick apart processTimerTick in src/tools/bootImageRunner/sys.C. Perhaps that is the place to decide whether this is the appropriate action to take. Hope this helps .... EM |
From: Ian R. <ian...@ma...> - 2006-03-13 09:49:39
|
Hi Eliot, thanks for your reply. My point about the yield, as opposed to an exception, is that the yield must return to the yield point rather than branch to a catch block. Branching to a catch block means that there are few/no assumptions about what's in the registers; returning from the yield code needs to appear as though all the registers are unchanged. Intel makes more of a distinction on this than PowerPC by having two instructions, ret and iret: ret is used in the case of a normal return, whereas iret is used to return from an interrupt and restores not only the program counter but also the flags. There are C extensions to mark functions that happen to be signal handlers so the compiler can generate different prologue/epilogue code to save/restore all registers. So what's needed on the stack following the timer interrupt is a stack frame/handler that will not only return but also restore all of the registers, so the native code is unaware anything has happened. My code attempted to take a shortcut to this by reusing the libc signal handler that calls our signal handling code. The return from the libc signal handler should restore all the registers and not be noticeable to the native code. I can create my own function to do this, as you say, and if I can keep to the stack conventions correctly then this can avoid stack walking problems. I'm not sure if that's the cause of the errors I've reported though? What may be the problem - and using the alternate signal stack would solve this - is that the Jikes RVM's use of the stack confuses the signal handler, so that its data/stack frame potentially clobbers values useful to the Jikes RVM. This may explain the randomness of the errors I've been producing. Not being overly intimate with the stack layout: if anyone knows this to definitely be the case, then I should abandon the approach of not using the alternate signal stack. Thanks for the feedback! The more thoughts on this the better. 
Regards, Ian

Eliot Moss wrote:
> No, it fakes a call to VM_Runtime.deliverHardwareException. That routine
> creates the appropriate exception object and then calls deliverException,
> which walks the stack and looks for a catch block.
>
> You can find the C trap handling code in
> src/tools/bootImageRunner/IA/libvm.C (look for, say, SIGSEGV) and the
> Java routines in src/vm/runtime/VM_Runtime.java. (It would be good to make
> corresponding changes for PPC as well, of course.)
>
> I think the right thing to do is quite similar, but of course you want to
> do a forced yield rather than throwing an exception. You should probably go
> through and pick apart processTimerTick in
> src/tools/bootImageRunner/sys.C. Perhaps that is the place to decide
> whether this is the appropriate action to take.
>
> Hope this helps .... EM
|
From: David P G. <gr...@us...> - 2006-03-13 13:18:59
|
You should look at the code for handling stack overflow: a trap is generated; it is handled on a separate signal stack; and execution is set to continue on the faulting thread's stack at VM_Runtime.deliverHardwareException. After growing the thread stack, execution resumes at the instruction after the faulting instruction. Same-stack vs. separate-stack signal handling is orthogonal to everything else you've been talking about. Also note that on most platforms, by default we do the "timer interrupts" not with a signal but with a separate pthread doing timed waits; this is more efficient and reduces the risk of interference with application signal handlers. --dave |
From: Ian R. <ian...@ma...> - 2006-03-13 14:32:28
Attachments:
preempt-native-threads.diff
|
I've attached a patch that gets things working, but with issues :^) The patch uses processTimerTick to send a signal to VM_Processors that are IN_NATIVE. I've tried SIGVTALRM and SIGUSR1; the patch uses SIGUSR1 - I'll explain why I was playing with this later. I'm using the alternate stack for the handler - the yield shouldn't inspect the stack, so I don't want to get bogged down worrying about it. The native thread that gets interrupted checks it's in native code and then calls out to a yield method. To ensure this didn't run into problems with the VM not being fully booted, I've added an extra hook at the end of the JNI table (java.lang.Thread will cause a SEGV). There's also some housekeeping to get classpath to build correctly.

So, you're probably wondering either: 1) why is Ian even bothering with this ugly hack, it'll never get into the repository ;-), or 2) wow, is this ever going to work? Well, in answer to (2), it is working but with a problem. The problem comes when pthreads start using mutexes and/or condition waits. For some reason when a pthread performs these, even though the thread gets hit with a signal, the signal handler doesn't process the signal and the thread doesn't yield. The ScribbleFrame program will run, but when a thread stops answering the signal it locks up. I'd say this is progress over the current repository (which just dies early), or the repository without portable native sync (which dies after drawing a single frame). I asked Dalibor Topic how jthreads (green threads in Kaffe) handle the gtk peer code, but apparently he's only testing AWT with pthreads. 
So, it could be that there is a genuine problem with Classpath's use of pthread locks with respect to green threads; it could be that my code needs to do more (i.e. perform some kind of wakeup on certain kinds of pthread); it could be that my code should be rewritten to have a better-looking stack frame; or we can wait for someone to rewrite the Jikes RVM's thread model altogether. Anyone wanting to try the code out, please do. Any suggestions as to how I can get the pthreads to answer their signals, please let me know. Any other ideas, let me know too :-)

Dave: the stack overflow code sounds a good place to look for fiddling with the stack. The signal code I've got now is run from the separate thread, but I need to handle the signal on the appropriate pthread for the yield. You're right that this does allow better control of which thread gets which signal.

Eliot: the call to the yield looks like a C to Java transition, and the Java to C transition is already OK for GC. I believe the C to Java transition is also OK for GC, as this situation can already be encountered. You're right that this code can effect a yield as soon as a processor becomes IN_NATIVE, and possibly this means we're not sufficiently IN_NATIVE to call out to the JNI, or the stack frames aren't straight. I guess I want to know what's going on with the signals before worrying too much about the stack.

Thanks again for your help. If I get some more input I may try some things out. I'm afraid this is a background job for me.

Dave: shouldn't bug #1224191 be closed now? Should bug #1147592 be reopened as a bug to do with the handling of native code? Do you know whether the mauve tests are right to be ignoring ServerSockets? Wiki?

Many thanks, Ian Rogers |
From: Eliot M. <mo...@cs...> - 2006-03-10 11:54:36
|
Oh, and here's a terminology proposal: Jikes RVM supports _precise_ yield points (or we could say it yields at _restricted_ points). Optionally, it supports _deterministic_ yielding as well. -- Eliot |