From: Roy S. <roy...@ic...> - 2010-04-26 21:03:11
On Sat, 24 Apr 2010, Boyce Griffith wrote:

> It appears that one problem is that the innermost parallel_reduce is
> being called within a parallel_for.  Regardless of the number of
> threads, calls to parallel_XXX call BoolAcquire(in_threads), and so the
> innermost call to parallel_reduce results in an assertion failure.

That's definitely the root of the problem - BoolAcquire is not designed
to be used nested.

> I'm not sure that this is the best solution, but everything seems to
> work if I comment out the assertion failures in the constructor and
> destructor for BoolAcquire.

That would have been my solution, too, and it should be just fine in the
default, n_threads=1 case, but I suspect there may be a bug in the
n_threads>1 case which your workaround is just hiding.  Let's see:

>>> #8 0x00dbd674 in BoolAcquire (this=0xbfff85cc, b=@0x1b4e7da) at
>>> threads.h:64
>>> #9 0x01470936 in parallel_reduce<ConstNodeRange, <unnamed>::FindBBox>
>>> (range=@0xbfff8c40, body=@0xbfff8bf4) at threads.h:292
>>> #10 0x01466764 in MeshTools::bounding_box (mesh=@0xbfffd77c) at
>>> src/mesh/mesh_tools.C:290

Here's where the inner threaded loop is - bounding_box is multithreaded,
naturally.
>>> #11 0x0168afec in PointLocatorTree::init (this=0x35656c0,
>>> build_type=Trees::NODES) at src/utils/point_locator_tree.C:117
>>> #12 0x0168a96b in PointLocatorTree (this=0x35656c0, mesh=@0xbfffd77c,
>>> master=0x0) at src/utils/point_locator_tree.C:42
>>> #13 0x01686f03 in PointLocatorBase::build (t=MeshEnums::TREE,
>>> mesh=@0xbfffd77c, master=0x0) at src/utils/point_locator_base.C:64
>>> #14 0x01398e11 in MeshBase::point_locator (this=0xbfffd77c) at
>>> src/mesh/mesh_base.C:293
>>> #15 0x00dbd210 in PeriodicBoundaries::neighbor (this=0x354f764,
>>> boundary_id=3, mesh=@0xbfffd77c, e=0x3532340, side=3) at
>>> src/base/dof_map_constraints.C:1342
>>> #16 0x00eb2ff3 in FEBase::compute_periodic_constraints
>>> (constraints=@0x354f740, dof_map=@0x354f610, boundaries=@0x354f764,
>>> mesh=@0xbfffd77c, variable_number=0, elem=0x3532340) at
>>> src/fe/fe_base.C:1983
>>> #17 0x00fa3a86 in FEInterface::compute_periodic_constraints
>>> (constraints=@0x354f740, dof_map=@0x354f610, boundaries=@0x354f764,
>>> mesh=@0xbfffd77c, variable_number=0, elem=0x3532340) at
>>> src/fe/fe_interface.C:1995
>>> #18 0x00dab668 in operator() (this=0xbfffc6b4, range=@0xbfffc600) at
>>> src/base/dof_map_constraints.C:89
>>> #19 0x00dbd28f in parallel_for<ConstElemRange,
>>> <unnamed>::ComputeConstraints> (range=@0xbfffc600, body=@0xbfffc6b4) at
>>> threads.h:267
>>> #20 0x00dabdf9 in DofMap::create_dof_constraints (this=0x354f610,
>>> mesh=@0xbfffd77c) at src/base/dof_map_constraints.C:168

And here's the outer threaded loop - create_dof_constraints is also
naturally multithreaded.

Hmm.. and there's one possible bug (multiple PointLocator creation)
averted here, thanks to Derek, but there are still two very dangerous
situations:

First: we only use one global spinlock right now.  Derek is using said
spinlock correctly, to make sure that MeshBase::point_locator isn't
creating and leaking multiple PointLocatorTree objects.
But the rest of the code feels free to assume that it can also use the
same spinlock, because we assume we're never in nested Threads:: loops...
That assumption is clearly incorrect.  And in this case it's rightfully
incorrect - we want all but one thread to give up on constraint
calculations until we've built our tree, but we'd like every thread to
join in where possible on constructing the tree.  So we need some way to
handle multiple nested Threads:: loops, any of which may need to do
locking.

Second: we had decided that we should never call MPI-using Parallel::
functions from within TBB-using Threads:: functions.  But here we are.
bounding_box() has to use MPI to be accurate on parallel meshes, and
create_dof_constraints needs to be threaded to be efficient.  When
bounding_box() needs MPI, every other thread should be either finished
(the ones internal to bounding_box()) or locked (the ones waiting to
continue compute_*_constraints), so the Parallel:: call should be safe.
But asserting it's safe is no longer as simple as asserting that we're
not in the middle of a Threads:: call.

There are ways to fix all this (an IntIncrement instead of BoolAcquire,
a per-loop lock created by each Threads::parallel_* call, and a
Parallel-specific lock created for MPI calls) but I'm not sure these are
the best fixes; I need to give it some more thought.  Anyone with other
ideas, feel free to jump in.

We can take our time on this one, since it looks like nothing is broken
other than an overzealous assertion *right now*; it's just that we've
got a design which makes it too easy to accidentally add a deadlock
without realizing it.

---
Roy