From: minh c. <min...@de...> - 2017-01-31 00:23:10
Hi Nagu,
I sent you my findings a week ago; I am copying them here again.
Thanks,
Minh
"
It looks like you are testing the Nway model; I haven't tested any headless
cases for the Nway and NpM models.
Jan 23 12:02:55.047416 osafamfd [8625:sg_nway_fsm.cc:0215] >> su_insvc:
'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1', 0
However, after SC1 failed over, the cluster init timer was activated again
on SC2 to fail over the absent assignment:
Jan 23 12:04:38.113581 osafamfd [9935:cluster.cc:0055] >>
avd_cluster_tmr_init_evh
Failover of the absent assignment of SU1 started here:
Jan 23 12:04:38.114051 osafamfd [9935:sg.cc:2270] >>
failover_absent_assignment: SG:'safSg=AmfDemo_2N,safApp=AmfDemo1'
Jan 23 12:04:38.114055 osafamfd [9935:su.cc:2451] >> any_susi_fsm_in:
SU:'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1', check_fsm:1
Jan 23 12:04:38.114060 osafamfd [9935:su.cc:2456] TR
SUSI:'safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1',
fsm:'1'
Jan 23 12:04:38.114064 osafamfd [9935:su.cc:2459] TR Found
Jan 23 12:04:38.114068 osafamfd [9935:su.cc:2462] << any_susi_fsm_in
Jan 23 12:04:38.114073 osafamfd [9935:sg_nway_fsm.cc:0474] >> node_fail: 0
SU2's assignment moved to active:
Jan 23 12:04:38.114123 osafamfd [9935:siass.cc:0753] >>
avd_susi_mod_send: SI 'safSi=AmfDemo,safApp=AmfDemo1', SU
'safSu=SU2,safSg=AmfDemo_2N,safApp=AmfDemo1' ha_state:1
The assignment of SU1 was deleted:
Jan 23 12:04:38.117784 osafamfd [9935:siass.cc:0586] >> avd_susi_delete:
safSu=SU1,safSg=AmfDemo_2N,safApp=AmfDemo1 safSi=AmfDemo,safApp=AmfDemo1
...
Jan 23 12:04:38.120109 osafamfd [9935:imm.cc:0275] >> exec: Delete
safCSIComp=safComp=AmfDemo\,safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safCsi=AmfDemo,safSi=AmfDemo,safApp=AmfDemo1
...
Jan 23 12:04:38.120440 osafamfd [9935:imm.cc:0275] >> exec: Delete
safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
So maybe "amf-state siass" was issued before the failover of the absent
assignment had finished on SC2?
"
On 30/01/17 16:04, Nagendra Kumar wrote:
> Any update ??
>
> Thanks
> -Nagu
>
>> -----Original Message-----
>> From: Nagendra Kumar
>> Sent: 23 January 2017 12:18
>> To: minh chau; han...@er...; Praveen Malviya;
>> gar...@de...
>> Cc: ope...@li...
>> Subject: Re: [devel] [PATCH 2 of 2] AMFND: Fix SC failover during headless
>> sync before standby AMFD comes up [#2162]
>>
>> The logs (Logs-tc.rar) are attached to the ticket.
>>
>> Thanks
>> -Nagu
>>
>>> -----Original Message-----
>>> From: minh chau [mailto:min...@de...]
>>> Sent: 16 January 2017 05:47
>>> To: Nagendra Kumar; han...@er...; Praveen Malviya;
>>> gar...@de...
>>> Cc: ope...@li...
>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>> sync before standby AMFD comes up [#2162]
>>>
>>> Hi Nagu,
>>>
>>> I misunderstood your point, and now I get it.
>>> In my test I see it works as expected: SU2 becomes Act and there is no
>>> assignment for SU1. I guess that in your test the cluster initiation
>>> timer somehow was not started on SC2 (the new active), so there could be
>>> a missing case in the patch.
>>> Could you please share the trace with me?
>>>
>>> Thanks,
>>> Minh
>>>
>>> On 13/01/17 21:48, Nagendra Kumar wrote:
>>>> Hi Minh,
>>>> Please check my response inlined with [Nagu].
>>>>
>>>> Thanks
>>>> -Nagu
>>>>> -----Original Message-----
>>>>> From: minh chau [mailto:min...@de...]
>>>>> Sent: 13 January 2017 03:53
>>>>> To: Nagendra Kumar; han...@er...; Praveen Malviya;
>>>>> gar...@de...
>>>>> Cc: ope...@li...
>>>>> Subject: Re: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>> sync before standby AMFD comes up [#2162]
>>>>>
>>>>> Hi Nagu,
>>>>>
>>>>> Thanks for reviewing, please see comments inline.
>>>>>
>>>>> Thanks,
>>>>> Minh
>>>>>
>>>>> On 12/01/17 21:48, Nagendra Kumar wrote:
>>>>>> Hi Minh,
>>>>>> Though I am not able to simulate the problem, I tested as below:
>>>>>> 1. Start SC1, SC2, PL-3 and PL-4. Configure SU1 on PL-3 as Act and
>>>>>> SU2 on PL-4 as Standby.
>>>>>> 2. Stop SC1 and SC2 and then stop PL-3.
>>>>>> 3. Start SC-1 and SC-2. When SC-2 prints Cold sync complete, stop
>>>>>> SC1. SC2 becomes Act.
>>>>> [M]: As SU1 is on PL3 and SU2 is on PL4, if PL-3 is stopped then only
>>>>> SU2 has an active assignment.
>>>> [Nagu]: PL-3 is stopped in step #2.
>>>>>> In this case, SC-2 contains both SU1(Act) and SU2(Standby) assignments.
>>>>>> Ideally, SU2 assignments should have been Act and there shouldn't be
>>>>>> an SU1 assignment.
>>>>> [M]: This seems to be another test where SU1 and SU2 are hosted on
>>>>> SC2; then both SU1 and SU2 should get assignments.
>>>> [Nagu]: I mean to say that the command 'amf-state siass' run on SC-1
>>>> displays both SU1 and SU2 assignments.
>>>> SU1 and SU2 are hosted on PL-3 and PL-4 respectively.
>>>> Isn't this similar to the test case mentioned in the ticket?
>>>>>> safSISU=safSu=SU1\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>>>> saAmfSISUHAState=ACTIVE(1)
>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>>>>
>>>>>> safSISU=safSu=SU2\,safSg=AmfDemo_2N\,safApp=AmfDemo1,safSi=AmfDemo,safApp=AmfDemo1
>>>>>> saAmfSISUHAState=STANDBY(2)
>>>>>> saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
>>>>>>
>>>>>> Please check.
>>>>>>
>>>>>> Thanks
>>>>>> -Nagu
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Minh Hon Chau [mailto:min...@de...]
>>>>>>> Sent: 08 November 2016 08:53
>>>>>>> To: han...@er...; Nagendra Kumar; Praveen Malviya;
>>>>>>> gar...@de...; min...@de...
>>>>>>> Cc: ope...@li...
>>>>>>> Subject: [PATCH 2 of 2] AMFND: Fix SC failover during headless
>>>>>>> sync before standby AMFD comes up [#2162]
>>>>>>>
>>>>>>> osaf/services/saf/amf/amfnd/di.cc | 7 +++++--
>>>>>>> osaf/services/saf/amf/amfnd/susm.cc | 6 ++++++
>>>>>>> 2 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>>
>>>>>>> This case of SC failover causes the new active AMFD to get stuck in
>>>>>>> node_up messages.
>>>>>>>
>>>>>>> Say the first active controller is SC1, which goes down during
>>>>>>> headless sync. The amfnd on SC2 therefore receives mds_down of AVD,
>>>>>>> and both is_avd_down and amfd_sync_required are set to true. When SC2
>>>>>>> takes over the active role, amfnd on SC2 receives mds_up, but only
>>>>>>> is_avd_down is set to false; amfd_sync_required remains true.
>>>>>>> When amfnd-SC2 finishes initiating the middleware SUs, it needs to
>>>>>>> send the su_oper message to AMFD, but the message is not sent out
>>>>>>> because amfd_sync_required is still true.
>>>>>>> In this SC failover scenario, amfd_sync_required needs to be set to
>>>>>>> false when amfnd on SC2 receives the su_pres message for middleware
>>>>>>> SUs. That means amfnd on the active controller does not need to wait
>>>>>>> for the set_leds message to learn that cluster initiation is done, so
>>>>>>> amfnd can send su_oper messages to AMFD. This logic also aligns with
>>>>>>> the normal headless scenario, where amfnd on the active controller
>>>>>>> has amfd_sync_required initially marked as false because no
>>>>>>> middleware SUs are initiated. When amfd_sync_required is true, all
>>>>>>> middleware SUs were initiated and assigned before headless, so amfnd
>>>>>>> needs to wait for cluster initiation after headless.
>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/di.cc b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>> --- a/osaf/services/saf/amf/amfnd/di.cc
>>>>>>> +++ b/osaf/services/saf/amf/amfnd/di.cc
>>>>>>> @@ -748,7 +748,8 @@ uint32_t avnd_di_oper_send(AVND_CB *cb,
>>>>>>>  if (avnd_diq_rec_add(cb, &msg) == nullptr) {
>>>>>>>  rc = NCSCC_RC_FAILURE;
>>>>>>>  }
>>>>>>> - LOG_NO("avnd_di_oper_send() deferred as AMF director is offline");
>>>>>>> + LOG_NO("avnd_di_oper_send() deferred as AMF director is offline(%d),"
>>>>>>> + " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>>>>  } else {
>>>>>>>  // We are in normal cluster, send msg to director
>>>>>>>  msg.info.avd->msg_info.n2d_opr_state.msg_id = ++(cb->snd_msg_id);
>>>>>>> @@ -881,7 +882,9 @@ uint32_t avnd_di_susi_resp_send(AVND_CB
>>>>>>>  rc = NCSCC_RC_FAILURE;
>>>>>>>  }
>>>>>>>  m_AVND_SU_ALL_SI_RESET(su);
>>>>>>> - LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline");
>>>>>>> + LOG_NO("avnd_di_susi_resp_send() deferred as AMF director is offline(%d),"
>>>>>>> + " or sync is required(%d)", cb->is_avd_down, cb->amfd_sync_required);
>>>>>>> +
>>>>>>>  } else {
>>>>>>>  // We are in normal cluster, send msg to director
>>>>>>>  msg.info.avd->msg_info.n2d_su_si_assign.msg_id = ++(cb->snd_msg_id);
>>>>>>> diff --git a/osaf/services/saf/amf/amfnd/susm.cc b/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>> --- a/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>> +++ b/osaf/services/saf/amf/amfnd/susm.cc
>>>>>>> @@ -1345,6 +1345,12 @@ uint32_t avnd_evt_avd_su_pres_evh(AVND_C
>>>>>>>  goto done;
>>>>>>>  }
>>>>>>>  } else { /* => instantiate the su */
>>>>>>> + // Do not need to wait for headless sync if there is no application SUs
>>>>>>> + // initiated. This is known because here we are receiving su_pres message
>>>>>>> + // for NCS SUs
>>>>>>> + if (su->is_ncs == true)
>>>>>>> + cb->amfd_sync_required = false;
>>>>>>> +
>>>>>>>  AVND_EVT *evt_ir = 0;
>>>>>>>  TRACE("Sending to Imm thread.");
>>>>>>>  evt_ir = avnd_evt_create(cb, AVND_EVT_IR, 0, nullptr, &info->su_name, 0, 0);
>> _______________________________________________
>> Opensaf-devel mailing list
>> Ope...@li...
>> https://lists.sourceforge.net/lists/listinfo/opensaf-devel