#1765 OpenHPI Daemon gets stuck if running for a couple of days

Status: closed-works-for-me
Priority: 9
Updated: 2012-12-21
Created: 2012-09-12
Creator: preeti sharma
Private: No

I followed these steps:

• I downloaded OpenHPI-3.0.0 and compiled it using the commands below:

./configure --prefix=/usr --sysconfdir=/etc --with-varpath=/var/lib/openhpi --enable-ipmi=yes --enable-ilo2_ribcl=yes --enable-oa_soap=yes --enable-debuggable=yes --enable-debug-msgs=yes --enable-cpp_wrappers=yes

make

• I also downloaded openssl-0.9.8x and compiled it using the commands below (no code changes made):

./config shared zlib-dynamic enable-camellia -fPIC -g
make depend
make

Created a directory lib in openssl-0.9.8x with the contents below:
[root@xp4dncc3 openssl-0.9.8x]# ls -l lib
total 7012
lrwxrwxrwx 1 root root 18 Aug 27 12:12 libcrypto.so -> libcrypto.so.0.9.8
-rwxr-xr-x 1 root root 5669613 Aug 26 05:27 libcrypto.so.0.9.8
lrwxrwxrwx 1 root root 15 Aug 27 12:12 libssl.so -> libssl.so.0.9.8
-rwxr-xr-x 1 root root 1488131 Aug 26 05:27 libssl.so.0.9.8

• I un-tarred OpenHPI-3.0.0 and openssl-0.9.8x in the directory /root/preeti/ORIG/
• The only change I made: I modified the OpenHPI-3.0.0 configure script to point to openssl-0.9.8x (attached with this mail; to track the changes, search for openssl-0.9.8x in the attachment)
• I am running the daemon using this command:
./openhpid -n -v -c openhpi.conf

(openhpi.conf is attached with this mail)

[root@xp4dncc3 lib-0.9.8x]# pwd
/root/preeti/ORIG/openhpi-3.0.0/lib-0.9.8x

[root@xp4dncc3 lib-0.9.8x]# ls -l
total 17540
lrwxrwxrwx 1 root root 15 Aug 26 05:51 liboa_soap.so -> liboa_soap.so.3
-rwxr-xr-x 1 root root 12588317 Aug 26 05:43 liboa_soap.so.3
-rwxr-xr-x 1 root root 382544 Aug 26 05:44 libopenhpimarshal.so.3
-rwxr-xr-x 1 root root 682411 Aug 26 05:45 libopenhpi.so.3
-rwxr-xr-x 1 root root 520965 Aug 26 05:44 libopenhpi_ssl.so.3
-rwxr-xr-x 1 root root 158444 Aug 26 05:44 libopenhpitransport.so.3
-rwxr-xr-x 1 root root 1451480 Aug 26 05:45 libopenhpiutils.so.3
-rw-r--r-- 1 root root 4006 Sep 10 14:36 openhpi.conf
-rwxr-xr-x 1 root root 2112715 Aug 26 05:43 openhpid

[root@xp4dncc3 lib-0.9.8x]# echo $LD_LIBRARY_PATH
/root/preeti/ORIG/openhpi-3.0.0/lib-0.9.8x:/root/preeti/ORIG/openssl-0.9.8x
• After running for almost a day and a half, the daemon got stuck. I discovered this by running the client program hpievents. The OpenHPI daemon initially had 6 threads. I ran hpievents twice; each time the utility got stuck and added a new thread to the daemon, which now has 8 threads running in total.

[root@xp4dncc3 clients]# ./hpievents
/root/preeti/ORIG/openhpi-3.0.0/clients/.libs/lt-hpievents - This program came with OpenHPI 3.0.0
SAF HPI Version B.03.02

(lt-hpievents:24690): baselib-CRITICAL **: conf.c:487: Client configuration file '/etc/openhpi/openhpiclient.conf' could not be opened
************** timeout:[0] ****************

[root@xp4dncc3 lib-0.9.8x]# ps -eafL|grep openhpi|grep -v grep|wc -l
8

• Attaching gdb shows that the daemon threads are stuck waiting on a lock (gdb log is attached with this mail)
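
The evidence described above (the thread count and the per-thread backtraces) could be captured together each time the hang is observed. The following is only a sketch: it assumes a Linux /proc filesystem and a gdb that supports the -batch and -ex options, and the function names thread_count and dump_threads are illustrative, not part of OpenHPI.

```shell
# Hypothetical helpers; the names are illustrative, not part of OpenHPI.

# Print the number of threads of process $1 by counting its /proc task entries.
thread_count() {
    ls "/proc/$1/task" 2>/dev/null | wc -l
}

# Append a timestamp, the thread count, and a backtrace of every thread
# of process $1 to the file $2. Attaching with gdb normally requires root.
dump_threads() {
    pid=$1
    out=$2
    {
        date
        echo "threads: $(thread_count "$pid")"
        gdb -p "$pid" -batch -ex "thread apply all bt"
    } >>"$out" 2>&1
}
```

For example, `dump_threads "$(pidof openhpid)" gdb.log` would append one snapshot to gdb.log, assuming pidof is available and exactly one openhpid process is running.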

Discussion

1 2 > >> (Page 1 of 2)
  • preeti sharma
    2012-09-12

    openhpi.conf file

     
    Attachments
  • preeti sharma
    2012-09-12

    openhpi configure file

     
    Attachments
  • preeti sharma
    2012-09-12

    gdb log with information of threads stuck in wait lock

     
    Attachments
  • dr_mohan
    2012-09-12

    • labels: --> HP c-Class Plugin
     
  • dr_mohan
    2012-09-12

    • assigned_to: nobody --> hemanthreddy
     
  • preeti sharma
    2012-09-12

    • priority: 5 --> 9
     
  • dr_mohan
    2012-09-12

    • assigned_to: hemanthreddy --> nobody
    • priority: 9 --> 5
     
  • dr_mohan
    2012-09-12

    Preeti,

    Could you please provide the OA version of the target? If you have hpitree output, please upload that file; we will take the OA version from there.

    Thanks
    Mohan

     
  • preeti sharma
    2012-09-12

     
    Attachments
  • preeti sharma
    2012-09-12

    Hi Mohan,
    I have uploaded the hpitree log.

    Regards,
    Preeti

     
  • dr_mohan
    2012-09-12

    • assigned_to: nobody --> hemanthreddy
     
  • Hi Preeti,

    Could you please upload /var/log/messages and/or the openhpid log messages produced when it is run with the -n -v options?

    Thanks & Regards
    Hemantha Reddy

     
    • priority: 5 --> 9
     
  • preeti sharma
    2012-09-13

    Hi Hemantha,
    I will upload /var/log/messages. The openhpid logs were coming to the console, though; do those logs get saved somewhere? Also, I have restarted the daemon.

    Regards,
    Preeti

     
  • Hi Preeti,
    You can run openhpid with -n -v -c and redirect all the messages to a file, or you can configure the console terminal to log to a file.

    Thanks & Regards,
    Hemantha Reddy
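
The redirection suggested above might look like the following sketch. The function name run_logged and the log file name openhpid.log are illustrative assumptions; tee is used so that the messages still appear on the console while also being appended to a file.

```shell
# Hypothetical helper: run a command, showing its stdout and stderr on the
# console and appending both to the log file given as the first argument.
run_logged() {
    log_file=$1
    shift
    "$@" 2>&1 | tee -a "$log_file"
}

# Example invocation (left commented out so the snippet has no side effects):
# run_logged openhpid.log ./openhpid -n -v -c openhpi.conf
```

Because the output passes through a pipe, the function returns the exit status of tee rather than that of the daemon; for simply collecting logs this is usually acceptable.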

     
  • preeti sharma
    2012-09-13

    Problem happened on Sep 9 around 4-5 pm and persisted through Sep 10th and 11th

     
    Attachments
  • preeti sharma
    2012-09-13

    Hi Hemantha,
    I have uploaded /var/log/messages. The problem occurred on Sep 9th around 4-5 pm and persisted through Sep 10th and 11th until I restarted the daemon (today, Sep 13th, the server power went down and we restored the power).

    I will re-run the daemon and collect the logs. Note that the frequency of the problem varies: sometimes the daemon enters the bad state within 20 minutes of being restarted, but mostly it takes 3-4 days (and we have also seen the daemon run without issue for 20 days).

    Regards,
    Preeti

     
  • preeti sharma
    2012-09-13

    var-log-messages2, log.txt (daemon -n -v log), gdb.log (stuck threads as shown in gdb)

     
    Attachments
  • preeti sharma
    2012-09-13

    Hi Hemantha,
    The daemon got stuck during today's session. I have uploaded failure-log.tar.gz, which contains the files below:

    failure-log/
    failure-log/var-log-messages2 (again sending /var/log/messages)
    failure-log/log.txt (Daemon -n -v log)
    failure-log/gdb.log (stuck threads detail in gdb captured)

    Please let us know if you need further details.
    Regards,
    Preeti

     
  • Praveen
    2012-09-26

    Hi,

    We have started analyzing this issue. We tried to reproduce it in our environment with a similar setup (OpenHPI 3.0.0 and Onboard Administrator version 3.55), using the same version of openssl (0.9.8x).
    We started the daemon with the above setup and kept it running for 6 days. We didn’t see any issue with the daemon even on the 7th day: the OpenHPI daemon and all the client programs were still running fine.
    We have gone through the attached logs but could not find much useful information related to OpenHPI that would point to the root cause of this issue.
    Could you please help us reproduce this issue, or provide the exact steps to reproduce it?

    Thanks!
    Praveen

     
  • preeti sharma
    2012-09-26

    Hi Praveen,
    Just providing more information here. After the 7th day, did you run the hpievents client to verify? In the good state the hpievents client shows details about the RPTs etc., but when the daemon is stuck it does not show those details. We are able to reproduce the problem so consistently that the main application, which depends on the OpenHPI daemon, is broken for us.

    Our platform details are given here:

    [root@xp4dncc3 lib-0.9.8x]# uname -a
    Linux xp4dncc3 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

    I am even ready to provide all the compiled OpenHPI/OpenSSL libraries or a complete zip.

    Regards,
    Preeti

     
  • Anton Pak
    2012-09-26

    Just an idea.
    I see that oa_handler->mutex is of GMutex type.
    The documentation says:

    ====================================
    GMutex is neither guaranteed to be recursive nor to be non-recursive, i.e. a thread could deadlock while calling g_mutex_lock(), if it already has locked mutex. Use GStaticRecMutex, if you need recursive mutexes.
    ====================================

    Could it be that a thread called g_mutex_lock() twice on that mutex?
    I find that non-recursive mutexes are rather difficult to use when you have many functions that call each other and the call stack is deep.

     
  • preeti sharma
    2012-09-26

    Hi Praveen,
    I am trying to upload a tar.gz of the ORIG directory (which is mentioned in the SPR details) but without success.

    The SPR steps are complete in the SPR text; I haven't missed anything there. Hence the only way I see is to share the compiled OpenHPI and OpenSSL directory tree. Please let me know how I can share or send it, as I could not upload it.

    Regards,
    Preeti

     
  • preeti sharma
    2012-09-27

    Hi Praveen,
    I am giving the SPR steps once again (hope this helps). We are facing this issue on at least 15 test beds and it is reproducible.

    1. Created a directory ORIG. Inside this ORIG directory, downloaded the OpenHPI and OpenSSL releases as mentioned in the SPR.
    2. Untarred OpenHPI and OpenSSL and compiled OpenSSL using the commands given in the SPR.
    3. Modified the configure script of OpenHPI (the modified script is in the attachments as configure-0.9.8x.zip). Using this configure script, compiled OpenHPI following the steps given in the SPR.

    4. Created a directory lib-0.9.8x in ORIG/openhpi-3.0.0. Copied the .so files and openhpid into this directory from .libs. The directory has the contents below (openhpi.conf is in the attachments):

    [root@xp4dncc3 lib-0.9.8x]# ls -l
    total 17540
    lrwxrwxrwx 1 root root 15 Aug 26 05:51 liboa_soap.so -> liboa_soap.so.3
    -rwxr-xr-x 1 root root 12588317 Aug 26 05:43 liboa_soap.so.3
    -rwxr-xr-x 1 root root 382544 Aug 26 05:44 libopenhpimarshal.so.3
    -rwxr-xr-x 1 root root 682411 Aug 26 05:45 libopenhpi.so.3
    -rwxr-xr-x 1 root root 520965 Aug 26 05:44 libopenhpi_ssl.so.3
    -rwxr-xr-x 1 root root 158444 Aug 26 05:44 libopenhpitransport.so.3
    -rwxr-xr-x 1 root root 1451480 Aug 26 05:45 libopenhpiutils.so.3
    -rw-r--r-- 1 root root 4006 Sep 10 14:36 openhpi.conf
    -rwxr-xr-x 1 root root 2112715 Aug 26 05:43 openhpid

    5. Set LD_LIBRARY_PATH

    [root@xp4dncc3 lib-0.9.8x]# echo $LD_LIBRARY_PATH
    /root/preeti/ORIG/openhpi-3.0.0/lib-0.9.8x:/root/preeti/ORIG/openssl-0.9.8x

    6. Ran the OpenHPI daemon using the command below (from the path ORIG/openhpi-3.0.0/lib-0.9.8x):

    ./openhpid -n -v -c openhpi.conf

    7. Verified by running the OpenHPI-provided client application hpievents at regular intervals. When the problem occurs, hpievents does not show the RPTs; otherwise it does. At that point, attach gdb to the OpenHPI process ID and inspect the threads (gdb log is in the attachments).

    8. Also, when the problem happens, each run of hpievents adds a new thread to the openhpi daemon's thread count.
    (This might or might not be related to the issue, but it has been my observation:
    I also noticed that while the daemon is in discovery mode (still coming up), each run of hpievents adds a new thread to the daemon's thread count, though these added threads are cleared once the daemon has come up fully.)

    9. Linux OS details of the machine on which I was running the OpenHPI daemon:
    [root@xp4dncc3 lib-0.9.8x]# uname -a
    Linux xp4dncc3 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

    I am ready to share the ORIG directory. Please let me know how I can do that.

    Regards,
    Preeti
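
The periodic check described in step 7 could be automated along these lines. This is a sketch under assumptions: it relies on the coreutils timeout command, which may be absent on older distributions, and it assumes that a healthy hpievents run exits within the chosen limit. The function name is_hung is illustrative.

```shell
# Hypothetical helper: succeed (exit 0) if the command had to be killed
# after $1 seconds, i.e. it looks hung. coreutils timeout exits with
# status 124 when the time limit is reached.
is_hung() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
    [ "$status" -eq 124 ]
}

# Example use (commented out; the 60-second limit is an arbitrary choice):
# if is_hung 60 ./hpievents; then echo "$(date): hpievents appears hung"; fi
```

Running such a check from cron or a sleep loop would record roughly when the daemon entered the bad state, which is the moment to attach gdb.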

     
  • preeti sharma
    2012-09-27

    Hi Praveen,
    The frequency of the problem varies: sometimes the daemon enters the bad state within 20 minutes of being restarted, but mostly it takes 3-4 days (and we have also seen the daemon run without issue for 20 days).

    Regards,
    Preeti

     