From: Derek G. <fri...@gm...> - 2010-04-27 15:01:42
|
Moving to devel list. Good analysis here Roy... thanks for going through all of this... On Apr 26, 2010, at 3:03 PM, Roy Stogner wrote: > First: we only use one global spinlock right now. Derek is using said > spinlock correctly, to make sure that MeshBase::point_locator isn't > creating and leaking multiple PointLocatorTree objects. But the rest > of the code feels free to assume that it can also use the same > spinlock, because we assume we're never in nested Threads:: loops... Right... this whole idea of ONE global spinlock isn't right. With TBB each place where a spinlock is needed should have it's own spinlock defined just in the local compilation unit (often they do this at the top of the .C). We've been able to get away with this global spinlock for a while... but it doesn't scale and can be dangerous. As a library it's a bad idea because we no idea if our threaded loops might or might not call user code that might have threaded loops in it (or vice versa)... and then we immediately have a problem. > That assumption is clearly incorrect. And in this case it's > rightfully incorrect - we want all but 1 thread to give up on > constraint calculations until we've built our tree, but we'd like > every thread to join in where possible on constructing the tree. So > we need some way to handle multiple nested Threads:: loops, any of > which may need to do locking. Exactly. > Second: we had decided that we should never call MPI-using Parallel:: > functions from within TBB-using Threads:: functions. > > But here we are. bounding_box() has to use MPI to be accurate on > parallel meshes, and create_dof_constraints needs to be threaded to be > efficient. When bounding_box() needs MPI, every other thread should > be > either finished (the ones internal to bounding_box()) or locked (the > ones waiting to continue compute_*_constraints), so the Parallel:: > call should be safe. But asserting it's safe is no longer as simple > as asserting that we're not in the middle of a Threads:: call. We could just use a compilation unit local spinlock in the Parallel namespace where the MPI functions are and lock / unlock it as we go through the parallel calls. The only issue is speed... I don't know how much overhead that will add when we're calling those functions and we don't need to create / destroy the spinlock. > There are ways to fix all this (an IntIncrement instead of > BoolAcquire, a per-loop lock created by each Threads::parallel_* call, > and a Parallel-specific lock created for MPI calls) but I'm not sure > these are the best fixes; I need to give it some more thought. Anyone > with other ideas feel free to jump in. I'll give it all some more thought as well... but it sounds like we're thinking along the same lines. Derek |