I am experiencing a problem where openhpid becomes unresponsive to all client connections and returns SA_ERR_HPI_NO_RESPONSE for every API connection. This happens in both 3.2.1 and 3.4.0, using the IPMI direct plugin.
Here is what happens: We have a rogue process that connects to openhpid via the C API, crashes, and starts up again 1 second later. It repeats this cycle indefinitely. This causes openhpid to not release socket descriptors. An "lsof -p" on openhpid shows 1024 socket descriptors stuck in CLOSE_WAIT. (1024 is the max file descriptor limit for this user on this machine.) When we get into this situation and I send openhpid an ABRT signal, there are 1024 threads, almost all of which are blocked on:
Attached to this bug is /var/log/messages with "openhpid -v". The problem starts to happen at 18:43:08.
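
To watch the CLOSE_WAIT count grow before the descriptor limit is reached, something like the following may help. It is only a rough sketch, not part of OpenHPI: it assumes Linux, IPv4 connections, and text in the /proc/net/tcp layout (socket state in the fourth column, where hex 08 means CLOSE_WAIT). The function name count_close_wait and the port argument are made up for illustration; pass whatever port your openhpid listens on. Unlike "lsof -p", it counts by local port rather than by process.

```python
# Hypothetical helper, not part of OpenHPI: count sockets in CLOSE_WAIT
# on a given local port, from text in the Linux /proc/net/tcp format.

CLOSE_WAIT = "08"  # TCP state code used by the Linux kernel in /proc/net/tcp


def count_close_wait(proc_net_tcp_text, local_port):
    """Return the number of CLOSE_WAIT sockets whose local port matches."""
    count = 0
    # The first line of /proc/net/tcp is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        # fields[1] is the local address as HEXIP:HEXPORT, fields[3] the state.
        port_hex = fields[1].rsplit(":", 1)[-1]
        if int(port_hex, 16) == local_port and fields[3].upper() == CLOSE_WAIT:
            count += 1
    return count
```

Calling count_close_wait(open("/proc/net/tcp").read(), port) once a second while the rogue client is cycling should show the number climbing by roughly one per crash if the daemon is not closing its end of each connection.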