
#2441 smf: coredump and syslog flood after immnd crash

5.17.11
fixed
nobody
None
defect
smf
d
major
False
2017-10-30
2017-04-27
No

Seen in opensaf version: 183d7c379a8f
short ID: 8190

SMF should handle the return code ERR_BAD_HANDLE more gracefully, probably by reinitializing and creating a new handle. ERR_BAD_HANDLE can occur when IMMND has crashed and is still reinitializing.
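The reinitialization suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `Rc`, `Handle`, `CallWithReinit` and the `op`/`reinit` callbacks are hypothetical names, not part of SMF or the IMM API, and real code would also need a back-off between attempts while IMMND is restarting.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the return codes named in this ticket.
enum class Rc { kOk, kBadHandle, kBadOperation };

using Handle = std::uint64_t;

// Run `op` with the current handle. If it reports kBadHandle, ask `reinit`
// for a fresh handle and try again, at most `max_attempts` times in total.
// Any other return code is passed straight back to the caller.
Rc CallWithReinit(Handle& handle,
                  const std::function<Rc(Handle)>& op,
                  const std::function<Rc(Handle&)>& reinit,
                  int max_attempts) {
  Rc rc = Rc::kBadHandle;
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    rc = op(handle);
    if (rc != Rc::kBadHandle) return rc;

    Handle fresh = 0;
    if (reinit(fresh) != Rc::kOk) continue;  // service may still be down
    handle = fresh;  // only replace the handle once reinit succeeded
  }
  return rc;  // still kBadHandle: let the caller decide, do not assert
}
```

Bounding the number of attempts and returning the last code, instead of asserting or logging on every iteration, is one way to avoid both the coredump and the log flood described in this ticket.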

These lines are flooding the system and trace log:

5:09:43.862034 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0107] WA Lock failed eith EBUSY pthread_mutex_trylock for imm 16
5:09:43.862042 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0101] >> smfd_imm_trylock

SMF backtrace

### BT FULL ###
#0 0x00007f04a27e50c7 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f04a27e6478 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00007f04a4d1afee in __osafassert_fail (__file=<optimized out>, __line=<optimized out>, __func=<optimized out>, __assertion=<optimized out>) at ../../../../../../opensaf/osaf/libs/core/leap/sysf_def.c:281
No locals.
#3 0x0000000000411f8f in updateImmAttr (dn=<optimized out>, attributeName=0x47db5b "saSmfCmpgElapsedTime", attrValueType=SA_IMM_ATTR_SATIMET, value=0x1d89cb8) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_campaign_oi.cc:773
rc = SA_AIS_ERR_BAD_OPERATION
__FUNCTION__ = "updateImmAttr"
#4 0x000000000040f129 in SmfCampaign::updateElapsedTime (this=this@entry=0x1d89c90) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaign.cc:930
updateTime = <optimized out>
diffTime = <optimized out>
timeStamp = {
tv_sec = 1490843664,
tv_usec = 109102
}
#5 0x000000000040f169 in SmfCampaign::stopElapsedTime (this=0x1d89c90) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaign.cc:958
No locals.
#6 0x0000000000440bee in SmfCampState::changeState (this=this@entry=0x7f048c014b70, i_camp=i_camp@entry=0x7f048c001220, i_state=0x7f048c00d3d0) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampState.cc:224
__FUNCTION__ = "changeState"
newState = {
static npos = <optimized out>,
_M_dataplus = {
<std::allocator<char>> = {
<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
_M_p = 0x7f048c12ae48 "SmfCampStateExecFailed"
}
}
oldState = {
static npos = <optimized out>,
_M_dataplus = {
<std::allocator<char>> = {
<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
_M_p = 0x7f048c02f108 "SmfCampStateExecuting"
}
}
#7 0x0000000000443faa in SmfCampStateExecuting::procResult (this=0x7f048c014b70, i_camp=0x7f048c001220, i_procedure=<optimized out>, i_result=<optimized out>) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampState.cc:947
error = {
static npos = <optimized out>,
_M_dataplus = {
<std::allocator<char>> = {
<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider:
_M_p = 0x7f048c010f48 "Procedure safSmfProc=SingleStep_upgrade_SCs failed"
}
}
__FUNCTION__ = "procResult"
result = <optimized out>
#8 0x000000000042500a in SmfUpgradeCampaign::procResult (this=0x7f048c001220, i_procedure=0x7f048c10afe0, i_result=SMF_PROC_FAILED) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfUpgradeCampaign.cc:955
__FUNCTION__ = "procResult"
campResult = <optimized out>
#9 0x000000000040cdb1 in SmfCampaignThread::processEvt (this=0x1d8d310) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:653
evt = 0x7f0490001b40
#10 0x000000000040cf48 in SmfCampaignThread::handleEvents (this=this@entry=0x1d8d310) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:699
ret = <optimized out>
__FUNCTION__ = "handleEvents"
fds = {{
fd = 25,
events = 1,
revents = 1
}}
#11 0x0000000000408253 in SmfCampaignThread::main (this=this@entry=0x1d8d310) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:760
__FUNCTION__ = "main"
#12 0x0000000000408352 in SmfCampaignThread::main (info=0x1d8d310) at ../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/SmfCampaignThread.cc:109
__FUNCTION__ = "main"
self = 0x1d8d310
#13 0x00007f04a3a110a4 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#14 0x00007f04a289502d in clone () from /lib64/libc.so.6
No symbol table info available.

The following lines flooded the syslog on SC-1:
Mar 30 5:09:43.862001 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0101] >> smfd_imm_trylock
Mar 30 5:09:43.862034 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_main.c:0107] WA Lock failed eith EBUSY pthread_mutex_trylock for imm 16

Coredump happened:
Mar 30 5:14:24.109139 osafsmfd [27207:../../../../../../../opensaf/osaf/libs/agents/saf/imma/imma_oi_api.c:2519] ER ERR_BAD_OPERATION: The SaImmOiHandleT is not associated with any implementer name
Mar 30 5:14:24.109204 osafsmfd [27207:../../../../../../../opensaf/osaf/services/saf/smfsv/smfd/smfd_campaign_oi.cc:0772] ER updateImmAttr(): immutil_update_one_rattr FAILED, rc = 20, going to assert

Related

Wiki: ChangeLog-5.17.11

Discussion

  • Anders Widell

    Anders Widell - 2017-07-28
    • Milestone: 5.17.07 --> 5.17.10
     
  • Rafael Odzakow

    Rafael Odzakow - 2017-08-14
    • Priority: major --> minor
     
  • Rafael Odzakow

    Rafael Odzakow - 2017-08-14

    Setting it to minor until it shows up again.

     
  • elunlen

    elunlen - 2017-09-11
    • status: unassigned --> accepted
    • assigned_to: Rafael Odzakow --> elunlen
     
  • elunlen

    elunlen - 2017-10-06
    • Priority: minor --> major
     
  • elunlen

    elunlen - 2017-10-06

    The following seems to happen:
    SMF has lost the OI handle (BAD HANDLE) and has started recovering the OI in a background thread. Before the recovery has finished, an attempt is made to update a cached runtime attribute. This fails with BAD OPERATION since SMF is not (yet) the implementer of the runtime object it is trying to update. The window in which this can happen is probably after the OI handle has been recovered (saImmOiInitialize_2()) but before saImmOiImplementerSet() has been called.
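The window described in this comment can be pictured with a small toy model. It is a sketch only: `OiHandleModel` and `UpdateRuntimeAttr` are hypothetical names that mimic the three outcomes discussed here, and nothing in it calls the real IMM API.

```cpp
// Hypothetical stand-ins for the return codes named in this ticket.
enum class Rc { kOk, kBadHandle, kBadOperation };

// Toy model of the OI handle state during recovery.
struct OiHandleModel {
  bool initialized = false;      // true after saImmOiInitialize_2()
  bool implementer_set = false;  // true after saImmOiImplementerSet()
};

// Models what a runtime attribute update would return in each state.
Rc UpdateRuntimeAttr(const OiHandleModel& h) {
  if (!h.initialized) return Rc::kBadHandle;         // handle lost
  if (!h.implementer_set) return Rc::kBadOperation;  // the race window
  return Rc::kOk;                                    // fully recovered
}
```

In this model the middle state is the one that produced rc = 20 and the assert in updateImmAttr(): the handle is valid again, but no implementer name is associated with it yet.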

     

    Last edit: elunlen 2017-10-06
  • elunlen

    elunlen - 2017-10-18
    • status: accepted --> review
     
  • elunlen

    elunlen - 2017-10-19
    • status: review --> fixed
    • assigned_to: elunlen --> nobody
     
  • elunlen

    elunlen - 2017-10-19

    commit 1c58a2106a55ad212a8e296424b1f20508eeb9cd
    Author: Lennart Lund <lennart.lund@ericsson.com>
    Date: Thu Oct 19 15:17:27 2017 +0200

    smf: coredump and syslog flood after immnd crash [#2441]
    
    When reinitializing the OI handle, which is done in a separate thread, keep the
    new handle in a local variable until the whole OI setup, including the OI set, is done.
    When finished, the new handle can be published in the global cb structure.
    Also protect the global variable change with the imm lock mutex.
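The approach in this commit message could look roughly like the sketch below. It is not the actual patch: `ControlBlock`, `RecoverOiHandle` and the two callbacks are hypothetical stand-ins for the global cb structure and for the initialize and implementer-set steps, and `std::mutex` stands in for the imm lock.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>

using Handle = std::uint64_t;

// Hypothetical stand-in for the global control block.
struct ControlBlock {
  std::mutex imm_lock;   // stand-in for the imm lock mutex
  Handle oi_handle = 0;  // handle visible to all other threads
};

// Recover the OI in the background thread. The new handle stays in a local
// variable until every setup step has succeeded; only then is it published.
bool RecoverOiHandle(ControlBlock& cb,
                     const std::function<bool(Handle&)>& initialize,
                     const std::function<bool(Handle)>& set_implementer) {
  Handle fresh = 0;  // not visible to other threads yet
  if (!initialize(fresh)) return false;
  if (!set_implementer(fresh)) return false;

  // Publish under the lock so readers never see a half-initialized handle.
  std::lock_guard<std::mutex> guard(cb.imm_lock);
  cb.oi_handle = fresh;
  return true;
}
```

If either step fails, the previously published handle is left untouched, so other threads keep seeing a consistent (if stale) value until a later recovery attempt succeeds.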
    
     
