From: Alex J. <aj...@ge...> - 2014-03-04 15:56:51
Hi Mohan,

I am running the ipmidirect plugin. When we start to experience the problem, there are a lot of these error messages in /var/log/messages:

Mar 4 10:50:30 BL-SWR16-44-3 openhpid[3336]: openhpid: CRIT: server.cpp:237: 0x6927e0 Error or Timeout while reading socket.
Mar 4 10:50:30 BL-SWR16-44-3 openhpid[3336]: openhpid: CRIT: server.cpp:237: 0x691480 Error or Timeout while reading socket.

I will rerun the test with "-v" turned on to see if anything else shows up.

Alex

--------------------------------

Hi Alex,

syslog may contain many messages. Could you post those messages too? The -v option increases the verbosity of the messages. Which plugin are you running?

Filing a bug at https://sourceforge.net/p/openhpi/bugs/ with all the information will help.

Regards
Mohan

On 03/03/2014 03:55 PM, Alex Jones wrote:
> Hi All,
>
> I am experiencing a problem where openhpid becomes unresponsive to
> all client connections. It returns SA_ERR_HPI_NO_RESPONSE for all API
> connections. This happens in both 3.2.1 and 3.4.0.
>
> Here is what happens:
>
> We have a rogue process that connects to openhpid via the C API.
> Then it crashes, and starts up again 1 second later. Crashes, starts
> up, etc.
>
> This causes openhpid to not release its socket descriptors.
>
> An "lsof -p" shows 1024 socket descriptors stuck in CLOSE_WAIT.
>
> When we get into this situation, I send openhpid an ABRT signal,
> and there are 1024 threads, almost all of which are blocked on:
>
> (gdb) bt
> #0 0x00007f13236b41eb in pthread_cond_timedwait@@GLIBC_2.3.2 ()
>    from /lib64/libpthread.so.0
> #1 0x00007f1323dc74c5 in ?? () from /usr/lib64/libgthread-2.0.so.0
> #2 0x00007f13238e0ebf in ?? () from /usr/lib64/libglib-2.0.so.0
> #3 0x00007f13238e1711 in g_async_queue_timed_pop ()
>    from /usr/lib64/libglib-2.0.so.0
> #4 0x0000000000424c0f in oh_dequeue_session_event ()
> #5 0x00000000004197a4 in saHpiEventGet ()
> #6 0x000000000040b820 in service_thread(void*, void*) ()
> #7 0x00007f13239342d8 in ?? () from /usr/lib64/libglib-2.0.so.0
> #8 0x00007f1323931db6 in ?? () from /usr/lib64/libglib-2.0.so.0
> #9 0x00007f13236aff05 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f1322d8210d in clone () from /lib64/libc.so.6
>
> Can someone help me debug this?
>
> Alex