From: Derek G. <fri...@gm...> - 2013-08-17 05:35:36
A couple of things:

1. This only happens with pthreads (i.e., it doesn't happen with TBB).
2. I can confirm that it is the same symptom on the Intel Phi card.
3. I can confirm that it is alleviated by commenting out those scoped_lock lines (on both machines).

Pthread locks work by being 0 in their unlocked state and anything else when they are locked. I can only guess that the constructor for those mutexes hasn't yet been called to set the initial value of the lock... therefore it's waiting there.

I won't have access to either of those machines to do more testing until next week. For now, I vote for removing those locks as they are unnecessary.

BTW - we're not creating a global LibMeshInit. It gets created in main like normal.

Derek

Sent from my iPad

On Aug 16, 2013, at 8:13 PM, Roy Stogner <roy...@ic...> wrote:

>
> On Fri, 16 Aug 2013, Derek Gaston wrote:
>
>> We're seeing hard locks on some machines in Singleton::Setup::Setup()!
>> The problem is that it's trying to create a scoped_lock using a mutex that is defined in that file.
>> Apparently that mutex is not guaranteed to have been initialized at the point where we're calling that function (or
>> something) and it is just hanging while trying to acquire that lock!
>
> Hmm... remote_elem_mtx should only get constructed at static
> initialization time before main() gets called, and
> RemoteElem::create() should only get called from
> LibMeshInit::LibMeshInit() afterwards.
>
> You're not creating a global LibMeshInit object, are you?
>
>> I commented out that scoped_lock line and then the binary runs just fine.
>
> Hmmm... would you replace that global mutex with two locals? Maybe
> there's some problem with a mutex constructor being called before we
> init TBB?
>
>> Why do we need to lock in those functions? Surely the
>> Singleton::Setup stuff is NOT going to get called in a loop.
>
> You're right; the Setup constructor should be called at static init
> time and the setup() call should be at LibMeshInit constructor time.
>
>> How do we want to proceed?
>
> It looks like we've got redundant locks that we can safely get rid
> of... but I'd like to actually *understand* the problem too, and that
> hasn't happened for me yet.
> ---
> Roy
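
For reference, the failure mode guessed at above (a lock taken on a mutex whose constructor has not run yet) is one form of the static initialization order problem. Below is a minimal sketch of one common way to sidestep it, the construct-on-first-use idiom. It is not libMesh's actual code and not necessarily the fix that was adopted: it assumes C++11 `std::mutex` in place of libMesh's threading wrappers, and every name in the `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a singleton registry; not libMesh's code.
namespace sketch {

// Construct-on-first-use: the mutex is a function-local static, so it is
// constructed the first time this function is called, not at some point
// during static initialization that the caller has to get right.
// C++11 guarantees that this initialization is thread-safe.
inline std::mutex & registry_mutex()
{
  static std::mutex mtx;
  return mtx;
}

// The registry itself uses the same idiom for the same reason.
inline std::vector<int> & registry()
{
  static std::vector<int> items;
  return items;
}

// Analogue of a Setup object that registers itself while holding the lock.
struct Setup
{
  explicit Setup(int id)
  {
    std::lock_guard<std::mutex> lock(registry_mutex());
    registry().push_back(id);
  }
};

// Namespace-scope objects: their constructors run before main(), and they
// can safely take the lock because the mutex is created on demand.
static Setup first_setup(1);
static Setup second_setup(2);

// Number of Setup objects registered so far.
inline std::size_t registered_count()
{
  std::lock_guard<std::mutex> lock(registry_mutex());
  return registry().size();
}

// Sanity check: both static Setup objects registered without hanging.
inline void self_check()
{
  assert(registered_count() == 2);
}

} // namespace sketch
```

Calling `sketch::self_check()` from `main()` confirms that both static `Setup` objects registered themselves. If the locks really are redundant, as suggested above, removing them is simpler still; the sketch only applies when a lock during static initialization is actually needed.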