I suggest creating bug tickets.
Keep in mind that this was not the only place where a deadlock could occur.
Over the years I have encountered many such hidden issues in the ipmidirect plugin.
On Mon, 26 Sep 2011 21:47:12 +0400, David McKinley <dmckinley@...> wrote:
> Hi Anton,
> It appears your diagnosis is right on target. The problem seems to be
> in the IpmiSetAutoInsertTimeout() function in
> plugins/ipmidirect/ipmi.cpp. At line #1227 it calls the IfLeave()
> method of the cIpmi object, which calls ReadUnlock(), without first
> setting the read lock.
> The relevant change from 2.16 seems to be in openhpid/plugin.c. In the
> oh_create_handler() function, it now calls
> handler->abi->set_autoinsert_timeout(), which it previously did not do.
> This call maps to the IpmiSetAutoInsertTimeout() function - so it
> appears that this bug was present, but not hit prior to 2.17.
> As far as I can tell, there is no reason that the read lock needs to be
> set in this function, as it does not reference any data that is
> protected by that lock, so I just commented out the call to
> ipmi->IfLeave(), and things now seem to work.
> If you agree that this looks like a bug, then you should also look at
> the NewSimulatorGetAutoExtractTimeout() function in
> plugins/dynamic_simulator/new_sim.cpp, as it seems to have the same
> problem.
>> -----Original Message-----
>> From: Anton Pak [mailto:anton.pak@...]
>> Sent: Monday, September 26, 2011 12:33 AM
>> To: openhpi-devel@...; David McKinley
>> Subject: Re: [Openhpi-devel] Hang/Deadlock in 2.17 release
>> There is daemon-level code in thread #5 working with the handler list.
>> There are two locks there: one for the handler list and the other for
>> each handler.
>> To get access to these locks one should call one of the oh_xxx() functions
>> from the daemon.
>> A plug-in usually doesn't work with these locks at all.
>> To my recollection, ipmidirect is quite able to deadlock itself
>> without any help from other threads.
>> It often tries to writelock() a lock that has already been acquired,
>> or to release a lock that has not been acquired before (I suspect that is
>> your case).
>> POSIX says these operations lead to undefined behavior.
>> Anton Pak
>> On Mon, 26 Sep 2011 06:22:32 +0400, David McKinley wrote:
>> > Anton,
>> > First, I pulled down the latest code on the trunk from svn, and checked
>> > whether the problem still exists - it did.
>> > As you suggested, I ran it with gdb, and did back traces for all the
>> > threads. This was with the current trunk code. There does seem to be a
>> > deadlock between two threads, though I have not yet been able to trace
>> > it back to its root. I ran the test two times, and these two threads
>> > were both waiting to get locks in exactly the same places both times, so
>> > it seems pretty clear that they are deadlocked.
>> > In the attached, the "interesting" threads are #5 and #3. Thread #3 is
>> > the one that I can track progress on using the ipmidirect log file, and
>> > sure enough, it is blocked waiting on a "write lock" for the domain
>> > object immediately after reading the SEL entries.
>> > I'm guessing that the reason it cannot get that lock is because it is
>> > owned by Thread #5, but I have not been able to verify this. Thread #5
>> > is waiting on a mutex for the handler. The simplest case would be if
>> > thread #3 grabbed the handler lock first, then tried to get the domain
>> > lock, while Thread #5 held the domain lock and then went for the handler
>> > lock. Whether it is this simple, I don't know - but I would be
>> > surprised if the deadlock isn't somehow between these two threads.
>> > Anyway, I'm learning a lot as I look at it. I'm hoping, though, that
>> > you or someone else more familiar with the theory of operation here can
>> > see the problem quicker than I am likely to be able to figure it out.
>> > Regards,
>> > David
>> >> -----Original Message-----
>> >> From: Anton Pak [mailto:anton.pak@...]
>> >> Sent: Sunday, September 25, 2011 4:11 AM
>> >> To: openhpi-devel@...; David McKinley
>> >> Subject: Re: [Openhpi-devel] Hang/Deadlock in 2.17 release
>> >> I suggest running it under gdb and printing a stack trace for each
>> >> thread when it hangs.
>> >> Anton Pak
>> >> On Sun, 25 Sep 2011 07:43:04 +0400, David McKinley
>> >> <dmckinley@...>
>> >> wrote:
>> >> > Hello,
>> >> >
>> >> > On my platform, which is a Sun Netra, using the ipmidirect plugin,
>> >> > things seem to work fine with the 2.12, 2.14, and 2.16 release code,
>> >> > but with the 2.17 release code, it hangs during the discovery process.
>> >> > Looking at the log file created by the ipmidirect plugin, it gets
>> >> > through discovery to the point where it reads the SEL, but then never
>> >> > logs anything else, and in particular never logs the message, "BMC
>> >> > Discovery Done". Meanwhile, in clients, calls to saHpiDiscover() hang.
>> >> >
>> >> > Backing out all the code changes in the ipmidirect plugin between 2.16
>> >> > and 2.17 made no difference (there were very few, and apparently
>> >> > trivial). So, the problem seems to have been introduced elsewhere. I
>> >> > looked through the tracker, and did not see any problem like this
>> >> > reported.
>> >> >
>> >> > Given that I'm still very much a newbie in this codebase, I doubt that
>> >> > I'll be able to track this down very quickly - and if the plugin is
>> >> > working on other platforms, others should judge how much importance to
>> >> > attach to this issue. But, I did want to mention it, as it seems like
>> >> > some sort of regression, at least on this platform.
>> >> >
>> >> > David