From: Anton K. <an...@sw...> - 2011-05-16 04:01:05
David Lichteblau <da...@li...> writes:

>> call-out VOP, making the SBCL compiler save all live registers. This
>> way, all GENCGC conservatism is reduced to scanning a stack, with
>> the exact range bounds available without any inter-thread
>
> This work seems to concern the following highly related but ultimately
> distinct questions:
>
> 1. Should we avoid SuspendThread() and GetThreadContext()?
> 2. Where can we find the ESP of any thread currently in a safe
>    region?
> 3. Possible optimization of the ESP storing operation using the Dice
>    et al. approach of mprotect instead of a barrier.
> 4. What synchronization approach is needed to halt threads leaving
>    their safe region during GC?

[...]

> (2) In my patchset, I simply store the current ESP into a field of the
> thread's structure on entry to the safe region. A memory barrier is
> required for safety, but otherwise I have to say that I'm somewhat
> partial to this approach because of its simplicity. (3) would be an
> interesting optimization making FFI calls faster -- but perhaps it
> could/should be a separate patch?

The per-thread location I use for ESP differs from your approach only in
its offset from the thread base (so it does have its own page). The field
is not part of the thread primitive object definition, but otherwise I
could describe it, too, as a field of the thread structure -- one that
just happens to be on an mprotect()ed page while GC is in progress.

(BTW, if you don't have access to the other thread's nonvolatile
registers, and if the code in call_into_c pops the Lisp return PC into
EBX -- as it does originally -- you have to ensure that this PC is
pinned, too. I have another per-thread field for that purpose: an
ordinary thread slot not subject to mprotect.)

In my code, I removed the explicit safepoint checks around FFI calls: if
GC is in progress, the thread trying to store its ESP traps and waits for
GC to complete. (Apropos, a similar trick is used in my
pseudo-pseudo-atomics on Windows: no checks and no branches at the end of
a section, just a memory access -- fetching the global safepoint page in
my version, though if I did this on Linux, it would have to be a
per-thread location as well.)

The problem with adding this optimization as a separate patch is that a
lot more code would have to be committed initially, only to be removed by
the separate optimization patch. That may contradict the common
understanding of "optimization" (some guru writing a thing that runs fast
but is hard to understand): my approach to refactoring the safepoint code
was to go from several problems solved by several pieces of code (here
and there) to an "optimized" solution eliminating them all at once. I've
also tried to ensure that the new code can be explained and understood
faster than the old bits and pieces it replaces; however, the _relative_
code complexity (defined as "time to understand what the code does / code
size") increased nevertheless.

Consider mprotect on the saved ESP location. It (1) ensures that the
non-mutator flag is visible to the GCing thread at the right time, (2)
notifies the GCing thread when a thread it is waiting for enters an FFI
call, (3) prevents threads in FFI from becoming mutators while GC is in
progress, and (4) provides GENCGC with the start address of the stack
region that should be used for pinning. If I removed this "optimization"
from my code, I would have to add something else to handle tasks (1)-(4),
and that something isn't going to be very simple.
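To make this concrete, here is a rough sketch of the mutator side,
assuming one dedicated page per thread whose first word holds the saved
ESP. Every identifier below (thread_extra, csp_page, enter_foreign_call,
leave_foreign_call) is illustrative only, not an identifier from my tree:

    #include <stdint.h>

    /* One dedicated page per thread; csp_page[0] holds the saved ESP while
     * the thread runs foreign code, 0 while it runs Lisp.  The PC slot is
     * an ordinary field, never mprotect()ed. */
    struct thread_extra {
        volatile uintptr_t *csp_page;
        void *pinned_lisp_pc;
    };

    /* On the way into C (call_into_c): publish ESP and pin the Lisp return
     * PC that was popped into EBX. */
    static void enter_foreign_call(struct thread_extra *self,
                                   uintptr_t esp_at_call,
                                   void *lisp_return_pc)
    {
        self->pinned_lisp_pc = lisp_return_pc;
        /* If GC has already made this page read-only, the store faults and
         * the fault handler simply waits for GC to finish -- no flag test,
         * no branch on the fast path. */
        self->csp_page[0] = esp_at_call;
    }

    /* On the way back from C: clearing the slot faults while GC is running,
     * so the thread cannot return to Lisp and become a mutator in the
     * middle of a collection. */
    static void leave_foreign_call(struct thread_extra *self)
    {
        self->csp_page[0] = 0;
    }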
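And a correspondingly rough sketch of the collector side: the GCing
thread makes every ESP page read-only, and once a thread's saved ESP is
published, the [saved ESP, stack base) range is all that GENCGC has to
scan conservatively for that thread. protect_csp_page and
maybe_pin_conservative_root are hypothetical stand-ins, not the actual
runtime functions:

    #include <stdint.h>
    #include <stddef.h>
    #ifdef _WIN32
    #  include <windows.h>
    #else
    #  include <sys/mman.h>
    #endif

    /* Hypothetical stand-in for the GENCGC pinning machinery. */
    extern void maybe_pin_conservative_root(uintptr_t stack_word);

    /* Flip a thread's ESP page between read-only (while GC runs) and
     * read-write (normal operation). */
    static void protect_csp_page(void *page, size_t pagesize, int read_only)
    {
    #ifdef _WIN32
        DWORD old_prot;
        VirtualProtect(page, pagesize,
                       read_only ? PAGE_READONLY : PAGE_READWRITE, &old_prot);
    #else
        mprotect(page, pagesize,
                 read_only ? PROT_READ : (PROT_READ | PROT_WRITE));
    #endif
    }

    /* With the page read-only, the published ESP cannot go stale and the
     * owning thread cannot re-enter Lisp, so conservative scanning covers
     * exactly [saved ESP, stack base) -- the stack grows down on x86. */
    static void scan_foreign_thread_stack(uintptr_t saved_esp,
                                          uintptr_t stack_base)
    {
        uintptr_t p;
        for (p = saved_esp; p < stack_base; p += sizeof(uintptr_t))
            maybe_pin_conservative_root(*(uintptr_t *)p);
    }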
>> communication. When a per-thread page (containing the ESP value) is
>> made read-only, it guarantees _both_ that GENCGC doesn't see an
>> obsolete value _and_ that any thread suddenly leaving a foreign call
>> during concurrent GC will trap and wait instead of returning to Lisp
>> and becoming an Evil Mutator.
>
> Question. Exit from a safe region already goes through an explicit
> safepoint call to inspect the suspend info -- at least in the version of
> the code I'm looking at. But from what you are saying above, it sounds
> to me like you are instead (additionally?) using the mprotect approach
> as a way to halt that thread. Can you explain why that is needed/better?

As of my current code (http://github.com/akovalenko/sbcl-win32-threads),
it's been a long time since I removed all explicit safepoint calls.
Beyond what I've already described above, this allows me to rely on the
OS saving the CPU context when a GC call is required. As an example,
consider a foreign call with a floating-point result (returned in %st(0)
on x86): with explicit safepoint calls, I had to save and restore %st(0)
in the FFI call epilogue (and woe to me if I tried to do that for a
function without a floating-point result: a stack underflow would
follow). Now %st(0) is returned directly on the fast path, and
saved/restored by the OS on the slow path (when a thread attempts to
clear the saved ESP location during GC).

If you're working with our old code, please take a closer look at the
explicit suspend_info.suspend check in gc_safepoint(). Does it return
early when suspend_info.suspend is zero, without attempting to take a
lock before testing this condition? Then it's a memory visibility bug.
When explicit safepoint calls are used, not only should the GCing thread
see an up-to-date "in-Lisp" state and up-to-date contexts of the other
threads (SuspendThread/GetThreadContext used to take care of that);
gc_safepoint() should also see an up-to-date state of
suspend_info.suspend, and that's where gc_safepoint() was wrong.

> - I think of lispobj as SBCL's current name for uintptr_t. (The
>   whole alpha situation notwithstanding.) SBCL just doesn't use that
>   type consistently yet. It would be somewhat odd to have three types
>   for the same purpose (uintptr_t == uword_t == lispobj).

u64 and os_vm_size_t normally have the right size too (though the latter
suggests a semantic difference, and the former -- being 32-bit on x86 --
hints at a long, glorious porting history).

--
Regards, Anton Kovalenko
+7(916)345-34-02 | Elektrostal' MO, Russia