From: Filip P. <pi...@pu...> - 2009-01-22 23:14:13
Hi Ian,

Thanks for taking a look at this. I think that some of your concerns are due to nomenclature, and you're probably right that the documentation is thin in this particular part of the system. To address your concerns:

- You're right, the native thread branch performs state transitions in a very different way. That's because, unlike the green threads code, the states in the native threads code are *not* for scheduling, in the sense that the scheduling is left entirely up to the OS. They are almost exclusively for communicating with the GC. As such, there are fewer states, and the state transitions occur in far fewer places. In particular, the transition to IN_NATIVE only occurs deep within the code. Thus, while some additional assertions might be put in, the need for assertions is much smaller, as the transitions are only being done from highly privileged internal code.

- You're absolutely right about IN_NATIVE being incorrectly documented. I'll try to fix that - see the patch below. In particular, it's exactly true that the transition to IN_NATIVE is supposed to *only* happen from deep in the native thread code, and most of the VM code should never do it directly. As well, the IN_NATIVE name may indeed be the wrong one; something like IN_PRIVILEGED or IN_GC_SAFE would probably be better. I like IN_PRIVILEGED the best, as it does the best job of conveying that this is a very special state, not to be used lightly.

- I'm not sure about your concerns about enterNative(). To be clear, the state prior to the call can be IN_JAVA or IN_JAVA_TO_BLOCK; the latter occurs if the GC had already requested the thread to block. After the call, the state could be IN_NATIVE or BLOCKED_IN_NATIVE, depending on whether the previous state was IN_JAVA or IN_JAVA_TO_BLOCK, respectively.
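To make the enterNative() pre- and post-states concrete, here is a minimal standalone sketch. It is not the RVMThread code: the class ExecStatusSketch, the requestBlock() helper, and the use of java.util.concurrent.atomic.AtomicInteger in place of Synchronization.tryCompareAndSwap are all assumptions made for illustration, and the numeric value chosen for BLOCKED_IN_NATIVE is a guess. The real IN_JAVA_TO_BLOCK path also takes the thread's lock and broadcasts to waiters, which the sketch only mentions in a comment.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a thread's exec status; not the real RVMThread. */
final class ExecStatusSketch {
    // Values for IN_JAVA, IN_NATIVE and IN_JAVA_TO_BLOCK follow the patch;
    // the value of BLOCKED_IN_NATIVE is an assumption.
    static final int IN_JAVA = 1;
    static final int IN_NATIVE = 2;
    static final int IN_JAVA_TO_BLOCK = 4;
    static final int BLOCKED_IN_NATIVE = 5;

    private final AtomicInteger execStatus = new AtomicInteger(IN_JAVA);

    /** Simulates the GC (or another requester) asking a running thread to block. */
    boolean requestBlock() {
        return execStatus.compareAndSet(IN_JAVA, IN_JAVA_TO_BLOCK);
    }

    /**
     * IN_JAVA -> IN_NATIVE, or IN_JAVA_TO_BLOCK -> BLOCKED_IN_NATIVE.
     * Any other starting state is rejected. Returns the resulting state.
     */
    int enterNative() {
        for (;;) {
            int oldState = execStatus.get();
            int newState;
            if (oldState == IN_JAVA) {
                newState = IN_NATIVE;
            } else if (oldState == IN_JAVA_TO_BLOCK) {
                // The real code would also lock and notify waiting threads here.
                newState = BLOCKED_IN_NATIVE;
            } else {
                throw new IllegalStateException(
                    "entering native with wrong exec status: " + oldState);
            }
            // Retry if a requester changed the state between the read and the CAS.
            if (execStatus.compareAndSet(oldState, newState)) {
                return newState;
            }
        }
    }
}
```

The retry loop matters because a block request can arrive between reading the state and swapping it; in that case the second pass sees IN_JAVA_TO_BLOCK and ends up in BLOCKED_IN_NATIVE rather than IN_NATIVE.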
This is more complicated than the green threads code, and necessarily so - the green threads code was badly broken in the case of GC requests that occur after the last safepoint before entry into JNI code, and furthermore, the green threads code did not have a clean mechanism of idling a VP if no threads are runnable, which is exactly what the IN_NATIVE mechanism is for.

I've attached a patch with further documentation of the situation. I'd like to also change the IN_NATIVE state to something like IN_PRIVILEGED, but I'm not sure that this is a change worth making, and if it is, I'm not sure that IN_PRIVILEGED does a better job of conveying the semantics.

Let me know to what extent this addresses your concerns. I understand that this is hairy code - but it's the best I could come up with given the multitude of concerns it has to deal with (PPC isync, mutator flushes, suspend requests, park/wait/sleep, GC requests, etc.).

-F

Index: rvm/src/org/jikesrvm/scheduler/RVMThread.java
===================================================================
--- rvm/src/org/jikesrvm/scheduler/RVMThread.java (revision 15291)
+++ rvm/src/org/jikesrvm/scheduler/RVMThread.java (working copy)
@@ -77,6 +77,70 @@

 /**
  * A generic java thread's execution context.
+ * <p>
+ * Threads use a state machine to indicate to other threads, as well as VM
+ * services, how this thread should be treated in the case of an asynchronous
+ * request, for example in the case of GC. The state machine uses the
+ * following states:
+ * <ul>
+ * <li>NEW</li>
+ * <li>IN_JAVA</li>
+ * <li>IN_NATIVE</li>
+ * <li>IN_JNI</li>
+ * <li>IN_JAVA_TO_BLOCK</li>
+ * <li>BLOCKED_IN_NATIVE</li>
+ * <li>BLOCKED_IN_JNI</li>
+ * <li>TERMINATED</li>
+ * </ul>
+ * The following state transitions are legal:
+ * <ul>
+ * <li>NEW to IN_JAVA: occurs when the thread is actually started. At this
+ * point it is safe to expect that the thread will reach a safe point in
+ * some bounded amount of time, at which point it will have a complete
+ * execution context, and thus will be able to have its stack traced by GC.</li>
+ * <li>IN_JAVA to IN_JAVA_TO_BLOCK: occurs when an asynchronous request is
+ * made, for example to stop for GC, do a mutator flush, or do an isync on PPC.</li>
+ * <li>IN_JAVA to IN_NATIVE: occurs when the code opts to run in privileged mode,
+ * without synchronizing with GC. This state transition is only performed by
+ * HeavyCondLock, in cases where the thread is about to go idle while waiting
+ * for notifications (such as in the case of park, wait, or sleep).</li>
+ * <li>IN_JAVA to IN_JNI: occurs in response to a JNI downcall, or return from a JNI
+ * upcall.</li>
+ * <li>IN_JAVA_TO_BLOCK to BLOCKED_IN_NATIVE: occurs when a thread that had been
+ * asked to perform an async activity decides to go idle instead. This state
+ * always corresponds to a notification being sent to other threads, letting
+ * them know that this thread is idle. When the thread is idle, any asynchronous
+ * requests (such as mutator flushes) can instead be performed on behalf of this
+ * thread by other threads, since this thread is guaranteed not to be running
+ * any user Java code, and will not be able to return to running Java code without
+ * first blocking, and waiting to be unblocked (see BLOCKED_IN_NATIVE to IN_JAVA
+ * transition).</li>
+ * <li>IN_JAVA_TO_BLOCK to BLOCKED_IN_JNI: occurs when a thread that had been
+ * asked to perform an async activity decides to make a JNI downcall, or return
+ * from a JNI upcall, instead. In all other regards, this is identical to the
+ * IN_JAVA_TO_BLOCK to BLOCKED_IN_NATIVE transition.</li>
+ * <li>IN_NATIVE to IN_JAVA: occurs when a thread returns from idling or running
+ * privileged code to running Java code.</li>
+ * <li>BLOCKED_IN_NATIVE to IN_JAVA: occurs when a thread that had been asked to
+ * perform an async activity while running privileged code or idling decides to
+ * go back to running Java code. The actual transition is preceded by the
+ * thread first performing any requested actions (such as mutator flushes) and
+ * waiting for a notification that it is safe to continue running (for example,
+ * the thread may wait until GC is finished).</li>
+ * <li>IN_JNI to IN_JAVA: occurs when a thread returns from a JNI downcall, or makes
+ * a JNI upcall.</li>
+ * <li>BLOCKED_IN_JNI to IN_JAVA: same as BLOCKED_IN_NATIVE to IN_JAVA, except that
+ * this occurs in response to a return from a JNI downcall, or as the thread
+ * makes a JNI upcall.</li>
+ * <li>IN_JAVA to TERMINATED: the thread has terminated, and will never reach any
+ * more safe points, and thus will not be able to respond to any more requests
+ * for async activities.</li>
+ * </ul>
+ * Observe that the transitions from BLOCKED_IN_NATIVE and BLOCKED_IN_JNI to IN_JAVA
+ * constitute a safe point. Code running in BLOCKED_IN_NATIVE or BLOCKED_IN_JNI is
+ * "GC safe" but is not quite at a safe point; safe points are special in that
+ * they allow the thread to perform async activities (such as mutator flushes or
+ * isyncs), while GC safe code will not necessarily perform either.
  *
  * @see org.jikesrvm.scheduler.greenthreads.GreenThread
  * @see org.jikesrvm.mm.mminterface.CollectorThread
@@ -129,7 +193,8 @@

   /*
    * definitions for thread status for interaction of Java-native transitions
-   * and requests for threads to stop.
+   * and requests for threads to stop. THESE ARE PRIVATE TO THE SCHEDULER, and
+   * are only used deep within the stack.
    */
   /**
    * Thread has not yet started. This state holds right up until just before we
@@ -143,34 +208,50 @@
   public static final int IN_JAVA = 1;

   /**
-   * A state used by the scheduler to mark that a thread is in native code (a
-   * syscall). The point is that for now, the thread is not executing Java code
-   * and is effectively at a safe point, but it may transition back into Java
-   * code at any moment.
+   * A state used by the scheduler to mark that a thread is in privileged code
+   * that does not need to synchronize with the collector. This is a special
+   * state, similar to the IN_JNI state but requiring different interaction with
+   * the collector (as there is no JNI stack frame, the registers have to be
+   * saved in contextRegisters). As well, this state should only be entered
+   * from privileged code in the org.jikesrvm.scheduler package. Typically,
+   * this state is entered using a call to enterNative() just prior to idling
+   * the thread; though it would not be wrong to enter it prior to any other
+   * long-running activity that does not require interaction with the GC.
    */
   public static final int IN_NATIVE = 2;

   /**
-   * A thread is executing native code and is effectively at a GC safe point.
-   * Care must be taken if GC occurs and the native code execution finishes.
+   * Same as IN_NATIVE, except that we're executing JNI code and thus have a
+   * JNI stack frame and JNI environment, and thus the GC can load registers
+   * from there rather than using contextRegisters.
    */
   public static final int IN_JNI = 3;

   /**
-   * thread is in Java code but is expected to block. the point is that we're
-   * waiting for the thread to reach a safe point and expect this to happen in
-   * bounded time; but if the thread were to escape to native we want to know
-   * about it. thus, transitions into native code while in the IN_JAVA_TO_BLOCK
-   * state result in a notification (broadcast on the thread's monitor) and a
-   * state change to BLOCKED_IN_NATIVE. Observe that it is always safe to
-   * conservatively change IN_JAVA to IN_JAVA_TO_BLOCK.
+   * thread is in Java code but is expected to block. the transition from IN_JAVA
+   * to IN_JAVA_TO_BLOCK happens as a result of an asynchronous call by the GC
+   * or any other internal VM code that requires this thread to perform an
+   * asynchronous activity (another example is the request to do an isync on PPC).
+   * the point is that we're waiting for the thread to reach a safe point and
+   * expect this to happen in bounded time; but if the thread were to escape to
+   * native we want to know about it. thus, transitions into native code while
+   * in the IN_JAVA_TO_BLOCK state result in a notification (broadcast on the
+   * thread's monitor) and a state change to BLOCKED_IN_NATIVE. Observe that it
+   * is always safe to conservatively change IN_JAVA to IN_JAVA_TO_BLOCK.
    */
   public static final int IN_JAVA_TO_BLOCK = 4;

   /**
    * thread is in native code, and is to block before returning to Java code.
-   * the point is that the thread is guaranteed not to execute any Java code
-   * until:
+   * the transition from IN_NATIVE to BLOCKED_IN_NATIVE happens as a result
+   * of an asynchronous call by the GC or any other internal VM code that
+   * requires this thread to perform an asynchronous activity (another example
+   * is the request to do an isync on PPC). as well, code entering privileged
+   * code that would otherwise go from IN_JAVA to IN_NATIVE will go to
+   * BLOCKED_IN_NATIVE instead, if the state was IN_JAVA_TO_BLOCK.
+   * <p>
+   * the point of this state is that the thread is guaranteed not to execute
+   * any Java code until:
    * <ol>
    * <li>The state changes to IN_NATIVE, and
    * <li>The thread gets a broadcast on its monitor.
@@ -1393,9 +1474,8 @@
     }

     if (traceBlock)
-      VM
-          .sysWriteln("Thread #", threadSlot,
-              " has acknowledged soft handshakes");
+      VM.sysWriteln("Thread #", threadSlot,
+                    " has acknowledged soft handshakes");

     for (;;) {
       // deal with block requests
@@ -1706,6 +1786,20 @@
     return block(ba, false);
   }

+  /**
+   * Indicate that we'd like the current thread to be executing privileged code that
+   * does not require synchronization with the GC. This call may be made on a thread
+   * that is IN_JAVA or IN_JAVA_TO_BLOCK, and will result in the thread being either
+   * IN_NATIVE or BLOCKED_IN_NATIVE. In the case of an
+   * IN_JAVA_TO_BLOCK->BLOCKED_IN_NATIVE transition, this call will acquire the
+   * thread's lock and send out a notification to any threads waiting for this thread
+   * to reach a safepoint. This notification serves to notify them that the thread
+   * is in GC-safe code, but will not reach an actual safepoint for an indeterminate
+   * amount of time. This is significant, because safepoints may perform additional
+   * actions (such as handling handshake requests, which may include things like
+   * mutator flushes and running isync) that IN_NATIVE code will not perform until
+   * returning to IN_JAVA by way of a leaveNative() call.
+   */
   public static void enterNative() {
     RVMThread t = getCurrentThread();
     if (ALWAYS_LOCK_ON_STATE_TRANSITION) {
@@ -1721,6 +1815,9 @@
         if (VM.VerifyAssertions) VM._assert(oldState == IN_JAVA_TO_BLOCK);
         t.enterNativeBlocked();
         return;
+      } else {
+        VM.sysFail("entering native with wrong exec status");
+        return; // make javac happy
       }
     } while (!(Synchronization.tryCompareAndSwap(t, offset, oldState, newState)));
   }
@@ -1747,11 +1844,20 @@
         if (VM.VerifyAssertions)
           VM._assert(oldState == BLOCKED_IN_NATIVE || oldState == BLOCKED_IN_JNI);
         return false;
+      } else {
+        VM.sysFail("leaving native with wrong exec status");
+        return true; // make javac happy
       }
     } while (!(Synchronization.tryCompareAndSwap(t, offset, oldState, newState)));
     return true;
   }

+  /**
+   * Leave privileged code. This is valid for threads that are either IN_NATIVE,
+   * IN_JNI, BLOCKED_IN_NATIVE, or BLOCKED_IN_JNI, and always results in the thread
+   * being IN_JAVA. If the thread was previously BLOCKED_IN_NATIVE or BLOCKED_IN_JNI,
+   * the thread will block until notified that it can run again.
+   */
   @Unpreemptible("May block if the thread was asked to do so; otherwise does no actions that would lead to blocking")
   public static void leaveNative() {
     if (!attemptLeaveNativeNoBlock()) {

On Jan 21, 2009, at 7:18 PM, Ian Rogers wrote:

> 2009/1/21 Ian Rogers <ian...@ma...>
> There's a fix in r15290 but it seems the machine is locked up as
> we've not had regression results in two days. Could someone kill any
> runs on pre r15290 and restart them (it'll probably be Daniel, so
> thanks to Daniel in advance if it is).
>
> The native thread branch is not performing state transitions as we
> do in the green thread code. In the green thread code a helper
> method changeThreadState will transition from one state to another,
> the purpose of this is many: (1) it documents the transitions of
> state (2) it catches unexpected behaviour (3) consequently it helps
> find bugs (the code is less brittle).
> Adding basic assertions to
> some of the code showed that not doing things in this way is a
> serious regression, and the benchmarks that would fail the most
> obvious assertions were lusearch and xalan (that are failing in
> general on the native thread branch).
>
> There are clearly more jobs to do before this code is ready to merge
> with the trunk, but I understand this is everyone's priority. I've
> started adding sub-tasks to RVM-91 [1], if people could add to them
> and start knocking them off it'd be great.
>
> Thanks,
> Ian
>
> [1] http://jira.codehaus.org/browse/RVM-91
>
> Just to expand on my concerns, one thing that bothers me in the
> native branch is the IN_NATIVE state:
> 1) the javadoc implies it is a state transition that a call to a
> syscall will cause to occur, this isn't the case code voluntarily
> switches to this state, VMMath.sin won't transition the thread state
> therefore
> 2) the javadoc states that it is a GC safe point, this needs a huge
> health warning about the use of references that should only be to
> objects that aren't going to move
> 3) the name seems wrong as it appears to have more to do with the
> thread being at a safe point than in native code
> Code like enterNative [1] doesn't help (although this is slightly
> cleaned up), there's no javadoc or assertions to tell me what state
> the thread is going to be in following the call, could the state
> legitimately be blocked? Who will clean up the mess if it is?
>
> Ian
>
> [1] http://jikesrvm.svn.sourceforge.net/viewvc/jikesrvm/rvmroot/branches/RVM-PureNativeThread/workingMergeUp/rvm/src/org/jikesrvm/scheduler/RVMThread.java?revision=15290&view=markup#l_1709
>
> 2009/1/20 Ian Rogers <ian...@ma...>
> So the fix is that when leaving JNI the exit code shouldn't loop if
> the transition of IN_JNI to IN_JAVA fails. I'm not sure this is
> sensible in particular with PowerPC ll/sc, but I'll put a fix in and
> add more assertions.
>
> Regards,
> Ian
>
> 2009/1/20 Ian Rogers <ian...@ma...>
>
> 2009/1/20 Steve Blackburn <Ste...@an...>
>
> On 20/01/2009, at 9:55 PM, Ian Rogers wrote:
>
> > with the exception of FullAdaptiveStickyMS these have gone to 0%
> > because we're running sanity regressions on the native thread branch
> > and these configs were moved into the sanity-tier2 test-run. We
> > could add these tests back to sanity on the native thread branch or
> > also run the sanity-tier2 test-run on the branch.
>
> I wondered that. The change happened only recently (since Jan 9), a
> long time after the change of the composition of sanity, so I'm a bit
> confused. Perhaps the files on the nativethread test machine were
> for some reason not updated. Anyway, no big deal.
>
> --Steve
>
> It's no big mystery. Fil updated his branch to a release prior to my
> 64bit Intel work. A change to the sanity regression was made to
> trunk. On Jan 9th I merged changes from the trunk into Fil's native
> thread branch, which included the change to the sanity test-run. I
> was more focused on solving the merge conflicts in the JNI compiler
> (Fil changed how the IN_JAVA to IN_JNI transition occurred, I made
> the JNI compiler 64bit and utilized a helper routine to enter and
> leave JNI code).
>
> Regards,
> Ian