#2566 amfd: Segfaults in su/node_ccb_completed_delete_hdlr

5.17.07
fixed
Gary Lee
None
defect
amf
d
major
True
2017-09-15
2017-08-31
Hoa Le
No

The issue is that if an object in IMM is deleted just before the standby AMFD reads the configuration to build its own data, the callback that the standby AMFD receives afterwards may cause a segmentation fault when AMFD tries to delete the already deleted object.
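The ordering described above can be modelled in a few lines. This is only an illustrative sketch: `Su`, `SuDb`, `read_config` and `find_su` are hypothetical stand-ins, not the real amfd types or functions. It shows why the late delete callback gets no object back from its lookup.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for illustration; not the real amfd types.
struct Su {
  std::string dn;
};
using SuDb = std::map<std::string, Su>;

// Builds the standby's local database from what IMM holds at read time.
SuDb read_config(const std::set<std::string>& imm) {
  SuDb db;
  for (const std::string& dn : imm) db.emplace(dn, Su{dn});
  return db;
}

// Lookup as a delete callback would do it; nullptr when the DN is unknown.
const Su* find_su(const SuDb& db, const std::string& dn) {
  const auto it = db.find(dn);
  return it == db.end() ? nullptr : &it->second;
}

// Replays the reported ordering: the delete is committed in IMM first,
// the standby reads its configuration second, and the queued delete
// callback runs last.
bool late_callback_finds_object() {
  const std::string dn = "safSu=PL-3,safSg=NoRed,safApp=OpenSAF";
  std::set<std::string> imm{dn};
  imm.erase(dn);                     // CCB committed while the standby waits
  assert(imm.count(dn) == 0);        // the object is gone from IMM
  const SuDb db = read_config(imm);  // standby builds its data afterwards
  return find_su(db, dn) != nullptr; // late callback looks the DN up
}
```

In this model `late_callback_finds_object()` returns false, so a handler that asserts on or dereferences the lookup result will fail.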

Steps to reproduce:
1. Modify avd_standby_role_initialization() function as below:

diff --git a/src/amf/amfd/role.cc b/src/amf/amfd/role.cc
index ec13c3b..5486ff7 100644
--- a/src/amf/amfd/role.cc
+++ b/src/amf/amfd/role.cc
@@ -299,6 +299,10 @@ uint32_t avd_standby_role_initialization(AVD_CL_CB *cb) {
     goto done;
   }

+  LOG_NO("### 1. Waiting ....");
+  sleep(10);
+  LOG_NO("### 2. Waiting ....");
+
   if (avd_imm_config_get() != NCSCC_RC_SUCCESS) {
     LOG_ER("avd_imm_config_get FAILED, AMF will not start.");
     goto done;
  1. Start SC-1, then SC-2. SC-1 becomes the active SC and SC-2 the standby.

  2. Observe the syslog on SC-2. When the first "Waiting" log appears, run the following command on SC-1:
    immcfg -d safSu=PL-3,safSg=NoRed,safApp=OpenSAF

  3. osafamfd on SC-2 will crash and the node will be rebooted.
    2017-08-31 14:30:50.828 SC-2 osafamfd[246]: NO ### 1. Waiting ....
    2017-08-31 14:30:53.780 SC-2 osafimmnd[206]: NO Ccb 2 COMMITTED (immcfgSC-1491)
    2017-08-31 14:31:00.828 SC-2 osafamfd[246]: NO ### 2. Waiting ....
    2017-08-31 14:31:00.938 SC-2 osafamfd[246]: src/amf/amfd/su.cc:1664: su_ccb_completed_delete_hdlr: Assertion 'su != nullptr' failed.
    2017-08-31 14:31:00.942 SC-2 osafamfnd[256]: ER AMFD has unexpectedly crashed. Rebooting node

  4. Observe the syslog on SC-2 again. When the first "Waiting" log appears this time, run the following commands on SC-1:
    amf-adm lock safAmfNode=PL-3,safAmfCluster=myAmfCluster
    amf-adm lock-in safAmfNode=PL-3,safAmfCluster=myAmfCluster
    immcfg -a saAmfNGNodeList-=safAmfNode=PL-3,safAmfCluster=myAmfCluster safAmfNodeGroup=PLs,safAmfCluster=myAmfCluster
    immcfg -a saAmfNGNodeList-=safAmfNode=PL-3,safAmfCluster=myAmfCluster safAmfNodeGroup=AllNodes,safAmfCluster=myAmfCluster
    immcfg -d safAmfNode=PL-3,safAmfCluster=myAmfCluster

  5. osafamfd on SC-2 will crash again.
    2017-08-31 14:31:04.448 SC-2 osafamfd[246]: NO ### 1. Waiting ....
    2017-08-31 14:31:07.904 SC-2 osafimmnd[206]: NO Ccb 3 COMMITTED (immcfgSC-1516)
    2017-08-31 14:31:08.016 SC-2 osafimmnd[206]: NO Ccb 4 COMMITTED (immcfgSC-1519)
    2017-08-31 14:31:08.141 SC-2 osafimmnd[206]: NO Ccb 5 COMMITTED (immcfgSC-1522)
    2017-08-31 14:31:14.448 SC-2 osafamfd[246]: NO ### 2. Waiting ....
    2017-08-31 14:31:14.583 SC-2 osafamfnd[256]: ER AMFD has unexpectedly crashed. Rebooting node
    2017-08-31 14:31:14.584 SC-2 osafamfnd[256]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: AMFD has unexpectedly crashed. Rebooting node, OwnNodeId = 131599, SupervisionTime = 60

Attached are backtraces, logs and traces for these two segmentation faults.

3 Attachments

Related

Wiki: ChangeLog-5.17.11

Discussion

  • Hoa Le - 2017-08-31
    • Blocker: False --> True
     
  • Gary Lee - 2017-08-31
    • status: unassigned --> accepted
    • assigned_to: Gary Lee
     
  • Gary Lee - 2017-08-31

    This problem exists in most completed/apply callback handlers.

     
  • Gary Lee - 2017-09-05
    • status: accepted --> review
     
  • Gary Lee - 2017-09-15

    RELEASE

    commit 15bcca73288b37118d746852642e8fdb633602bd
    Author: Gary Lee <gary.lee@dektech.com.au>
    Date: Fri Sep 15 13:49:03 2017 +1000

    amfd: harden completed and apply delete callbacks [#2566]
    
    It is possible for an object to be deleted in IMM, before
    a standby SC finishes initialization. Now, if the related
    callbacks are processed by the standby late, then unnecessary
    assertions or null pointer accesses may occur.
    

    DEVELOP

    commit 3a7855b3d8164a0780158cedcd9a202c7ddea1a1
    Author: Gary Lee <gary.lee@dektech.com.au>
    Date: Fri Sep 15 13:49:03 2017 +1000

    amfd: harden completed and apply delete callbacks [#2566]
    
    It is possible for an object to be deleted in IMM, before
    a standby SC finishes initialization. Now, if the related
    callbacks are processed by the standby late, then unnecessary
    assertions or null pointer accesses may occur.
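
    The commit contents are not shown in this ticket, so the following is only a sketch of the kind of guard the commit message describes. All names (`Result`, `Su`, `SuDb`, `su_delete_completed`, `su_delete_apply`) are hypothetical, and treating an unknown DN as success in the completed step is an assumption about the intended behaviour.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not the real amfd types or API.
enum class Result { kOk, kBadOperation };
struct Su {
  bool has_assignments = false;
};
using SuDb = std::map<std::string, Su>;

// Completed (validation) step of a delete. An unknown DN is treated as
// "already deleted before this standby built its data" and is accepted
// instead of asserting that the object exists.
Result su_delete_completed(const SuDb& db, const std::string& dn) {
  const auto it = db.find(dn);
  if (it == db.end()) return Result::kOk;  // assumption: nothing to validate
  return it->second.has_assignments ? Result::kBadOperation : Result::kOk;
}

// Apply step of a delete. Returns early when the object is unknown, so
// no null or dangling object is ever touched.
void su_delete_apply(SuDb& db, const std::string& dn) {
  const auto it = db.find(dn);
  if (it == db.end()) return;  // nothing to remove on this standby
  db.erase(it);
  assert(db.find(dn) == db.end());  // postcondition: the object is gone
}

// Replays a late delete for a DN that the standby never loaded.
bool late_delete_is_harmless() {
  SuDb db;  // built after the object had already been deleted in IMM
  const std::string dn = "safSu=PL-3,safSg=NoRed,safApp=OpenSAF";
  const Result rc = su_delete_completed(db, dn);
  su_delete_apply(db, dn);
  return rc == Result::kOk && db.empty();
}
```

    The same early-return shape would apply to the other completed and apply handlers, since each one starts with a lookup that can miss on a standby that initialized late.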
    
     
  • Gary Lee

    Gary Lee - 2017-09-15
    • status: review --> fixed
     
