A couple of things:
1. This only happens with pthreads (i.e. it doesn't happen with TBB).
2. I can confirm that it is the same symptom on the Intel Phi card.
3. I can confirm that it is alleviated by commenting out those
scoped_lock lines (for both machines).
As I understand it, a pthread lock is 0 in its unlocked state and
nonzero when it is locked. I can only guess that the constructor for
those mutexes hasn't yet been called to set the initial value of the
lock, so the call just sits there waiting to acquire it.
I won't have access to either of those machines to do more testing
until next week.
For now, I vote for removing those locks as they are unnecessary.
BTW - we're not creating a global LibMeshInit. It gets created in
main like normal.
On Aug 16, 2013, at 8:13 PM, Roy Stogner <roystgnr@...> wrote:
> On Fri, 16 Aug 2013, Derek Gaston wrote:
>> We're seeing hard locks on some machines in Singleton::Setup::Setup()!
>> The problem is that it's trying to create a scoped_lock using a mutex that is defined in that file.
>> Apparently that mutex is not guaranteed to have been initialized at the point where we're calling that function (or
>> something) and it is just hanging while trying to acquire that lock!
> Hmm... remote_elem_mtx should only get constructed at static
> initialization time before main() gets called, and
> RemoteElem::create() should only get called from
> LibMeshInit::LibMeshInit() afterwards.
> You're not creating a global LibMeshInit object, are you?
>> I commented out that scoped_lock line and then the binary runs just fine.
> Hmmm... would you replace that global mutex with two locals? Maybe
> there's some problem with a mutex constructor being called before we
> init TBB?
>> Why do we need to lock in those functions? Surely the
>> Singleton::Setup stuff is NOT going to get called in a loop.
> You're right; the Setup constructor should be called at static init
> time and the setup() call should be at LibMeshInit constructor time.
>> How do we want to proceed?
> It looks like we've got redundant locks that we can safely get rid
> of... but I'd like to actually *understand* the problem too, and that
> hasn't happened for me yet.