Ticket [#191] introduced generic support in OpenSAF for DNs longer than 255 bytes. Each individual OpenSAF service will also have to be adapted to support long DNs. This ticket is for AMF to support this feature.
Moreover, this ticket also contains some enhancements:
1) Refactor internal SaNameT to std::string in amfd, amfnd.
2) Refactor PATRICIA trees to std::map in amfnd.
Tickets: #1571
Tickets: #1614
Tickets: #1642
Tickets: #191
Tickets: #1989
Wiki: NEWS-5.1.0
AMFD patches
Last edit: Gary Lee 2016-07-25
AMFND part.
Attached are the crashes observed. Crashes in AMFND may not be reated to #1642.
amfd_crash2.tgz and amfd_crash2.tgz.
amfd_crash1.tgz
Praveen,
Can you reproduce amfd_crash1.tgz? Is it possible to run with valgrind to help me isolate which part of avsv_dnd_msg_free() is failing?
amfd_crash2.tgz should be fixed with "amfd: do not call memset on classes with string members" patch.
Hi Gary,
Attached is the configuration for reproducing amfd_crash1 on top of long dn patches.
Step to reproduce:on standby controller kill component of SU2.
Below patch fixes this crash:
diff --git a/osaf/services/saf/amf/amfd/sgproc.cc b/osaf/services/saf/amf/amfd/sgproc.cc
--- a/osaf/services/saf/amf/amfd/sgproc.cc
+++ b/osaf/services/saf/amf/amfd/sgproc.cc
@@ -1129,8 +1129,6 @@ void avd_su_si_assign_evh(AVD_CL_CB *cb,
- avsv_dnd_msg_free(n2d_msg);
- evt->info.avnd_msg = nullptr;
goto done;
} else if ((susi->state == SA_AMF_HA_QUIESCING)
&& (susi->fsm != AVD_SU_SI_STATE_UNASGN)) {
Publisehd patch "amfd: do not call memset on classes with string members" fixes second crash.
Thanks,
Praveen
Last edit: Praveen 2016-08-04
Hi Praveen,
Can you please also give us the steps to reproduce the amfnd coredumps?
I would like to check if it relates to #1642 or not since you did not apply the amfnd longDN patches in your test.
Thanks,
Long Nguyen.
Hi Long,
For amfnd_crash1, I have raised #1935. It is an older issue and #1935 contains details of steps etc.
I will get back to you for other amfnd crash.
Thanks,
Praveen
Hi Long, Gary.
In the published patches along with long dn, there chages related to patricia conversion to map in amfnd and internal names to std::string in AMFD. Please update the description of this ticket for these changes.
Thanks,
Praveen
Diff:
Related
Tickets:
#191Hi Praveen,
I have updated the ticket description.
Best regards,
Long Nguyen.
Hi Long,
I think amfnd_crash2 is also because of #1935.
From traces, here also after node-switchover recovery SUs were deleted and again added:
1)comp fault with node-swtichvoer recovery let to node geting disabled:
Jul 30 19:08:57.524107 osafamfnd [5465:di.cc:0929] << avnd_di_object_upd_send: 1
Jul 30 19:08:57.524124 osafamfnd [5465:err.cc:0403] NO 'safComp=Comp3,safSu=SU1,safSg=SG1,safApp=App1' faulted due to 'errorReport' : Recovery is 'nodeSwitchover'
Jul 30 19:08:57.524128 osafamfnd [5465:err.cc:0513] >> a
2)After clean up in SU1, AMFND infroms AMFD for nodeswitchover recovery:
Jul 30 19:08:57.616687 osafamfnd [5465:susm.cc:1465] NO Terminated all components in 'safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:08:57.616702 osafamfnd [5465:susm.cc:1467] NO Informing director of Nodeswitchover
Jul 30 19:08:57.616708 osafamfnd [5465:di.cc:0724] >> avnd_di_oper_send: SU '0x68ea50', recv '4'
Jul 30 19:08:57.616717 osafamfnd [5465:di.cc:0737] TR SU 'safSu=SU1,safSg=SG1,safApp=App1, su oper '2'
3) SU2 after removal of assignments was terminated by Lock-in operation:
Jul 30 19:09:27.675391 osafamfnd [5465:susm.cc:1322] TR SU term state is set to true
Jul 30 19:09:27.675396 osafamfnd [5465:susm.cc:1327] TR Marking SU as terminated by admin operation
4)After that SUs are deleted:Sg=SG1,safApp=App1'
Jul 30 19:09:31.888880 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp1,safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:09:31.889481 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp2,safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:09:31.890247 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp3,safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:09:31.890671 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp4,safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:09:31.891391 osafamfnd [5465:sudb.cc:0230] IN Deleted 'safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:09:31.922104 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp1,safSu=SU2,safSg=SG1,safApp=App1'
Jul 30 19:09:31.922951 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp2,safSu=SU2,safSg=SG1,safApp=App1'
Jul 30 19:09:31.924203 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp3,safSu=SU2,safSg=SG1,safApp=App1'
Jul 30 19:09:31.924691 osafamfnd [5465:compdb.cc:0570] IN Deleted 'safComp=Comp4,safSu=SU2,safSg=SG1,safApp=App1'
Jul 30 19:09:31.925298 osafamfnd [5465:sudb.cc:0230] IN Deleted 'safSu=SU2,safSg=SG1,safApp=App1'
5)After deletion SUs were again added:
6)Jul 30 19:16:25.542247 osafamfnd [5465:su.cc:0126] >> avnd_evt_avd_reg_su_evh
Jul 30 19:16:25.542252 osafamfnd [5465:sudb.cc:0082] >> avnd_sudb_rec_add: SU'safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:16:25.542258 osafamfnd [5465:compdb.cc:1837] >> avnd_comp_config_get_su: SU'safSu=SU1,safSg=SG1,safApp=App1'
Jul 30 19:16:25.567762 osafamfnd [5465:su.cc:0126] >> avnd_evt_avd_reg_su_evh
Jul 30 19:16:25.567767 osafamfnd [5465:sudb.cc:0082] >> avnd_sudb_rec_add: SU'safSu=SU2,safSg=SG1,safApp=App1'
Jul 30 19:16:25.567773 osafamfnd [5465:compdb.cc:1837] >> avnd_comp_config_get_su: SU'safSu=SU2,safSg=SG1,safApp=App1'
6)After this Sus went throrugh instantiation phase and as per lsat log in AMFND crashed at:
Jul 30 19:16:34.447354 osafamfnd [5465:di.cc:0929] << avnd_di_object_upd_send: 1
Jul 30 19:16:34.447360 osafamfnd [5465:susm.cc:1944] << avnd_su_pres_uninst_suinst_hdler: 1
Jul 30 19:16:34.447366 osafamfnd [5465:susm.cc:1457] T1 Exited SU presence state FSM: New State = 2
Which means perform_pending_nodeswitchover() was called inside avnd_su_pres_fsm_run(). Inside
perform_pending_nodeswitchover(), AMFND crashes illegal memory as avnd_cb->failed_su points to an illegal memory
AVND_SU *su = avnd_cb->failed_su;
for (comp = m_AVND_COMP_FROM_SU_DLL_NODE_GET(m_NCS_DBLIST_FIND_FIRST(&su->comp_list));
comp;
Thanks,
Praveen
Hi Gary,
Replacement of m_CMP_HORDER_SANAMET() with std::compare() in AMFD alters the way SI populates list_of_csi. Because of this old AMFD and new AMFD creates different COMPCSI.
Both AMFD creates assignment without violating any AMF criteria. But it can lead to problem from upgrade perspective.
I have attached traces and configuration from both new and old AMFD.
Steps to reproduce:
1)immcfg -f compare.xml.
2)Then immcfg -f csi_add.xml to add a new CSI in the same SI. At this point of time, SI in both AMFDs will have list_of_csi with different order.
3)When application was brought up both AMFDs prepared compcsis differenlty because of different ordering:
old_amfd:
Aug 8 12:26:52.156079 osafamfd [10702:csi.cc:1124] >> avd_compcsi_create: Comp'safComp=AmfDemo,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' and Csi'safCsi=Norm1,safSi=AmfDemo,safApp=AmfDemo1'
Aug 8 12:26:52.156114 osafamfd [10702:csi.cc:1124] >> avd_compcsi_create: Comp'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' and Csi'safCsi=Norm3,safSi=AmfDemo,safApp=AmfDemo1'
Aug 8 12:26:52.156137 osafamfd [10702:csi.cc:1124] >> avd_compcsi_create: Comp'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' and Csi'safCsi=Norm11,safSi=AmfDemo,safApp=AmfDemo1'
new amfd:
fDemo,safApp=AmfDemo1' and Csi'safCsi=Norm1,safSi=AmfDemo,safApp=AmfDemo1'
Aug 8 12:26:51.974686 osafamfd [18560:csi.cc:1154] >> avd_compcsi_create: Comp'safComp=AmfDemo1,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' and Csi'safCsi=Norm11,safSi=AmfDemo,safApp=AmfDemo1'
Aug 8 12:26:51.974736 osafamfd [18560:csi.cc:1154] >> avd_compcsi_create: Comp'safComp=AmfDemo2,safSu=SU1,safSg=AmfDemo,safApp=AmfDemo1' and Csi'safCsi=Norm3,safSi=AmfDemo,safApp=AmfDemo1'
In the Attachment find more information with extra traces.
This need to be checked at other places also wherever there is any such replacement in AMFD, AMFND and AMFA.
Thanks,
Praveen
Thanks - I will check and see if I can change the code to produce the same order.
Hi Praveen
I've sent a patch that seems to fix the issue raised with string::compare. Please check it.
Thanks
Hi Gary,
Patch looks fine. I will rerun some failed test cases.
Thanks,
Praveen
Hi Gary,
-Creation of CSI deps, dynamically, fails with error:
immcfg -a saAmfCSIDependencies+=safCsi=LOG,safSi=SC-2N,safApp=OpenSAF safCsi=CLM,safSi=SC-2N,safApp=OpenSAF
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Validation abort: Completed validation fails (ERR_BAD_OPERATION)
OI reports: 'safCsi=CLM,safSi=SC-2N,safApp=OpenSAF' is not in the same SI as 'safCsi=LOG,safSi=SC-2N,safApp=OpenSAF'
Minor correction is required. It should be allowed as both the CSIs are of same SI.
-Other case is related to deldeion of csi attriburte value dynamically.
1)immcfg -f amf_demo_csiattr.xml (In attachement).
2) immcfg -a saAmfCSIAttriValue+=CCCC safCsiAttr=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
3) immcfg -a saAmfCSIAttriValue-=CCCC safCsiAttr=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
error - saImmOmCcbApply FAILED: SA_AIS_ERR_FAILED_OPERATION (21)
OI reports: IMM: Validation abort: Completed validation fails (Error code: 21)
OI reports: csi attr name 'AmfDemo1' and value 'CCCC' doesn't exist
Deletion of added value should be allowed. It allows deletion of valued that were present when whole conf was added. Only dynamically added value is not being allowed for deletion which is a supported case.
For the csi deps isuse below patch works:
diff --git a/osaf/services/saf/amf/amfd/csi.cc b/osaf/services/saf/amf/amfd/csi.cc
--- a/osaf/services/saf/amf/amfd/csi.cc
+++ b/osaf/services/saf/amf/amfd/csi.cc
@@ -593,6 +593,7 @@ static SaAisErrorT csi_ccb_completed_mod
int i = 0;
AVD_CSI *csi = csi_db->find(Amf::to_string(&opdata->objectName));
const std::string object_name(Amf::to_string(&opdata->objectName));
+ std::string si_name;
@@ -629,7 +630,8 @@ static SaAisErrorT csi_ccb_completed_mod
}
- if (object_name.find(required_dn) == std::string::npos) {
+ avsv_sanamet_init(required_dn, si_name, "safSi");
+ if (object_name.find(si_name) == std::string::npos) {
report_ccb_validation_error(opdata,
"'%s' is not in the same SI as '%s'",
object_name.c_str(), required_dn.c_str());
AMF demo which also confgiures csiattr.
Hi Gary,
One minor issue I am seeing when running long dn supported amf demo. Sysylog says:
"The allocated memory is not large enough, the object will be truncated." This is while sending a state change notification. Since su and si names are long dn, length of additional text should be dynamically calcuated or increment to a long fixed value.currenlty ntf.h defines it as:
#define ADDITION_TEXT_LENGTH 320
Some minor comments and queries:
1) in avd_si_get(), trace leave should be above return statement.
2) In amfnd patches: Call to functions like amf_cbk_copy() from amfnd/util.cc are replaced by there equivalents like avsv_amf_cbk_copy() from osaf/libs/common/amf/n2avamsg.c.So these functions. I think. needs to be cleaned. is there any reason of keeping them
3) In amfnd patches, memory allocation with 'new' operator is replaced with calloc(). Any reason for doing this?
4) SInce new file amf_db_template.h is introduced, do we still need to keep old db_template.h at amfd.
Thanks,
Praveen
Hi Praveen,
Please see my answers for 2, 3, 4.
2) OK. We should clean up the util.cc.
3) Because the memory allocation is now free'ed in the common libs (using C), we use calloc to avoid mismatching between 'new' (C++ style) anf 'free' (C style).
4) No, we should use the new file.
Thanks,
Long Nguyen.