#1828 openhpid refuses new connections after some time

IPMI Direct plugin
Alex Jones

I am experiencing a problem where openhpid becomes unresponsive to all client connections: it returns SA_ERR_HPI_NO_RESPONSE for every API connection. This happens in both 3.2.1 and 3.4.0, using the IPMI Direct plugin.

Here is what happens:

We have a rogue process that connects to openhpid via the C API, crashes, and starts up again 1 second later. It repeats this cycle indefinitely.

This causes openhpid to stop releasing socket descriptors.

Running "lsof -p" on the openhpid process shows 1024 socket descriptors stuck in CLOSE_WAIT. (1024 is the maximum file descriptor limit for this user on this machine.)
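For anyone who wants to track this before the limit is reached, here is a rough sketch of how the check could be scripted. It assumes lsof prints the TCP state in parentheses at the end of each socket line, as it does on our machine. The function names and the sample lines are made up for illustration and are not taken from the real system.

```python
import resource


def count_close_wait(lsof_output: str) -> int:
    """Count lines of `lsof -p <pid>` output whose TCP state is CLOSE_WAIT."""
    return sum(
        1
        for line in lsof_output.splitlines()
        if line.rstrip().endswith("(CLOSE_WAIT)")
    )


def fd_soft_limit() -> int:
    """Soft limit on open file descriptors for the calling process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    # Invented sample lines in lsof's column layout, for illustration only.
    sample = (
        "openhpid 1234 root 7u IPv4 11111 0t0 TCP hostA:1111->hostA:2222 (CLOSE_WAIT)\n"
        "openhpid 1234 root 8u IPv4 11112 0t0 TCP hostA:1111->hostA:2223 (CLOSE_WAIT)\n"
        "openhpid 1234 root 5u IPv4 11110 0t0 TCP *:1111 (LISTEN)\n"
    )
    print(count_close_wait(sample), "sockets in CLOSE_WAIT")
    print("descriptor soft limit for this process:", fd_soft_limit())
```

In practice the text would come from running "lsof -p" against the openhpid PID, and the limit would have to be read in the daemon's own environment, since fd_soft_limit() reports the limit of the process that calls it.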

When we got into this situation I sent openhpid an ABRT signal. gdb shows 1024 threads, almost all of which are blocked in:

(gdb) bt
#0  0x00007f13236b41eb in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f1323dc74c5 in ?? () from /usr/lib64/libgthread-2.0.so.0
#2  0x00007f13238e0ebf in ?? () from /usr/lib64/libglib-2.0.so.0
#3  0x00007f13238e1711 in g_async_queue_timed_pop () from /usr/lib64/libglib-2.0.so.0
#4  0x0000000000424c0f in oh_dequeue_session_event ()
#5  0x00000000004197a4 in saHpiEventGet ()
#6  0x000000000040b820 in service_thread(void*, void*) ()
#7  0x00007f13239342d8 in ?? () from /usr/lib64/libglib-2.0.so.0
#8  0x00007f1323931db6 in ?? () from /usr/lib64/libglib-2.0.so.0
#9  0x00007f13236aff05 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f1322d8210d in clone () from /lib64/libc.so.6
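My reading of the backtrace, which I have not confirmed against the openhpid source, is that each service thread is waiting on the session event queue rather than on its socket. If so, it cannot notice that the client has gone away until that wait returns. The following is a minimal sketch of that mechanism using plain sockets. It is not openhpid code, and serve_one and demo are hypothetical names.

```python
import queue
import socket
import threading


def serve_one(conn, events, result):
    # Stand-in for a service thread: it blocks on an event queue, not on
    # the socket, so a peer that disappears goes unnoticed during the wait.
    try:
        events.get(timeout=0.5)
    except queue.Empty:
        pass
    # Only once the wait returns does the thread touch the socket again.
    # recv() returning b"" means the peer has closed its end.
    result["peer_closed"] = conn.recv(1) == b""
    # Until this close(), the server-side socket sits in CLOSE_WAIT.
    conn.close()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free local port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    conn, _addr = listener.accept()

    events = queue.Queue()  # never receives anything in this sketch
    result = {}
    worker = threading.Thread(target=serve_one, args=(conn, events, result))
    worker.start()

    client.close()  # simulates the client process dying
    worker.join()
    listener.close()
    return result["peer_closed"]


if __name__ == "__main__":
    print("peer closed while worker was blocked:", demo())
```

In the sketch the wait is short and the worker closes the socket afterwards. If a thread keeps going back to a long wait without checking the socket, the descriptor would stay in CLOSE_WAIT, which would match what lsof shows here.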

Attached to this bug is /var/log/messages from a run with "openhpid -v". The problem starts at 18:43:08.

