A circular dependency can be seen when performing a si-swap of safSi=SC-2N,safApp=OpenSAF:
So we have (1) depending on (2) depending on (3) depending on (1)
This results in a temporary deadlock that dramatically slows down NTFD's ability to process its main dispatch loop. The deadlock only lasts for approx. 1 second, when mbcsv_mds_send_msg() times out. But since there could be lots of MBC messages to send, sometimes osafntfimcnd is killed with SIGABRT generating a coredump. The si-swap operation will also timeout.
steps to reproduce
- Run loop of ntftest 32
root@SC-1:~# for i in {1..10}; do ntftest 32; done
- On another terminal, keep swapping 2N Opensaf SI, got coredump after couples of swaps
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
SC-2 (active) There are a lot of send failures. Each taking approx. 1 second to timeout. During these 1 second timeouts, NTFD cannot process the main dispatch loop. Dec 7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:45.575091 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:47.083548 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
SC-1 (standby) NTFD is trying to terminate osafntfimcnd. While it is doing that, it cannot reply to NTFD on SC-2. Meanwhile, osafntfimcnd is sending NTF notifications to NTFD on SC-1. Dec 7 11:01:35.453151 osafntfd [464:ntfs_imcnutil.c:0316] TR handle_state_ntfimcn: Terminating osafntfimcnd process Dec 7 11:01:45.474313 osafntfd [464:ntfs_imcnutil.c:0124] TR Termination timeout Dec 7 11:01:45.474375 osafntfd [464:ntfs_imcnutil.c:0130] << wait_imcnproc_termination: rc = -1, retry_cnt = 101 Dec 7 11:01:45.474387 osafntfd [464:ntfs_imcnutil.c:0168] TR Normal termination failed. Escalate to abort Dec 7 11:01:45.574703 osafntfd [464:ntfs_imcnutil.c:0172] TR Imcn successfully aborted Dec 7 11:01:45.574712 osafntfd [464:ntfs_imcnutil.c:0187] << timedwait_imcn_exit
Traces from Minh
Diff:
Hi Gary,
Is it the case when some configuration change was done and within not time switchover was triggered? In this case active IMCN was still processing IMM callbacks and sending notification for them. Since switchover got triggered when IMCN was processing CCB calbacks, it will get stuck in saNotificationSend() API as active NTFS was not available. As soon as active NTFS gets available, IMCN will clear its IMM callback list and will be able to act on term signal. But new active NTFS will have some notifications related to switchover for checkpointing to standby, so active NTFS will also get stuck.
Attach is the patch (2219_v0.patch) in which NTFS does not terminate IMCN in csi_set callback. It posts a local event with low priority to NTFS mailbox for terminating IMCN. Since standby will not stuck in csi set callback for IMCN termination, it will allow completion of si-swap operation and other high priority messages.
I think as a long term solution, termination of IMCN must be avoided.
Thanks,
Praveen
Diff:
Hi Praveen
The original problem was seen during an upgrade of OpenSAF, after one controller had been upgraded and SMF was about to upgrade the other. Minh's instruction will also replicate the problem.
I think moving the instantiation/termination out of CSI set callback might be problematic. We might temporarily have no or multiple active / standby ntfimmcn instances.
Also, if an OpenSAF component calls ntf_send() from CSI set callback, during si-swap, the si-swap may still timeout.
Last edit: Gary Lee 2016-12-09
Hi Praveen
Vu is writing a test patch that starts a new thread to kill ntfimmcn. Do you think that would work?
Minh and I talked about it, I think your patch can work if timedwait_imcn_exit() is changed a bit. So instead of waiting for X seconds, it could post another event to try again to keep the main loop flowing.
Hi Gary, MInh,
What I see here is when NTFS sends term singal to IMCN component, the singal handler sigterm_handler() sets the FD. But poll does not gets unblocked for this because for IMM Dispatch is being done with SA_DISPATCH_ALL, so until all callbacks are processesed IMCN will not come out of dispatch. If it comes out then only TERM_EVENT will be processed.
So even if we terminate IMCN in a sepatate thread in NTFS the situation may remain same.
Regarding changing timedwait_imcn_exit() to post another event after first try may work. But in this case also configuration changes are continoulsy being done (testcase suggested by MInh) then Dispatch will remain busy in that single call and will never come out.
I still did not get one thing why IMMDispatch() is not coming out. It is working like a blocking call.
There could be other solutions like:
1)to do IMMDispatch in IMCN in a different thread, so main thread will be ready for any other event. But it will require good amount of code change.
2)Do not terminated IMCN now. Instead it can subscribe for PG tracking of MW SI and based on role it will work. It currently sends notification when it is active. (I never got why this terminated always. IMM and notificatiohn handles are valid across switchvoer and failovers. )
3)Third way could be to post events for sending notifications in mailbox, just like we do in AMF job queues
Thanks,
Praveen
Hi Praveen,
Attached file bt printed from the coredump.
ntfimcnd could not get out of IMMDispatch because it was waiting for response of NtfSend() sync call.
Thanks
Minh
Just want to share our findings so far, not sure it's correct though.
Coredump tells that osafntfimcnd[2697] received changes of ccb 237, and was sending attributeChange notification which has numAdditionalInfo=8
CcbId = 237
...
lengthAdditionalText = 0, numAdditionalInfo = 8, notificationId = 0x6603f0,
CCB 237 committed at 11:01:35
2016-12-07 11:01:35 SC-1 osafimmnd[431]: NO Ccb 237 COMMITTED (multiCcbOwner)
2016-12-07 11:01:35 SC-2 osafimmnd[433]: NO Ccb 237 COMMITTED (multiCcbOwner)
ntfd-SC1 received QUIESCED csi callback at
Dec 7 11:01:34.502019 osafntfd [464:ntfs_amf.c:0175] >> amf_csi_set_callback
Dec 7 11:01:34.502022 osafntfd [464:ntfs_main.c:0271] >> initialize_for_assignment: ha_state = 3
ntfd-SC2 received ACTIVE csi callback at
Dec 7 11:01:35.414545 osafntfd [452:ntfs_amf.c:0175] >> amf_csi_set_callback
Dec 7 11:01:35.414553 osafntfd [452:ntfs_main.c:0271] >> initialize_for_assignment: ha_state = 1
ntf agent - amfd received MDS NEW ACTIVE
Dec 7 11:01:35.415337 osafamfd [486:ntfa_mds.c:0416] T2 MSG from NTFS NCSMDS_NEW_ACTIVE/UP
Dec 7 11:01:35.415356 osafamfd [486:ntfa_util.c:1550] TR Standby NTF server becomes new Active
So osafntfimcnd[2697] should have sent attributeChange around/after 11:01:35.415337
According to both ntfd traces, all notifications were preserved - no loss during switch over. So question is which notification was sent from osafntfimcnd[2697]
The last notification before switchover received at ntfd-SC1 has notifId = 537, and ntfd-SC2 (new active) continued receiving notifId starting from 538
The one sent from osafntfimcnd[2697] should have notifId=541 (because it was the only attributeChange notif having numAdditionalInfo=8 after 11:01:35:415337 and before osafntfimcnd[2697]'s death)
notif=541 was received at mds level
Dec 7 11:01:40.546538 osafntfd [452:ntfsv_enc_dec.c:0856] >> ntfsv_dec_not_msg
And processed in main thread at (delay time ~ 4 seconds)
Dec 7 11:01:44.572702 osafntfd [452:NtfAdmin.cc:0252] T2 New notification received, id: 541
osafntfimcnd[2697] died at
Dec 7 11:01:45.572387 osafntfd [452:ntfs_mds.c:0920] T8 MDS DOWN dest: 2020f000002b6, node ID: 2020f, svc_id: 29
So ntfd-SC2 could not send response to osafntfimcnd[2697]
Dec 7 11:01:45.575322 osafntfd [452:ntfs_com.c:0166] >> notfication_result_lib
Dec 7 11:01:45.575336 osafntfd [452:ntfs_com.c:0172] T4 not_id: 541, rv: 1
...
Dec 7 11:01:47.083431 osafntfd [452:ntfs_com.c:0176] TR ntfs_mds_msg_send FAILED
Looks like there's also a delay of notif 541 from sender-ntfimcnd[2697] to its receiver-[ntfd-SC2] at mds level, here it seems to be ~ 5 secs
MDS docs has mentioned this
Another delay in ntfd-SC2's main thread that could not process notif=541 in time is the circular dependency as Gary described.
We can see many mbscv checkpoints but could not be processed in ntfd-SC1, because it was stuck in termination of ntfimcnd, that results in 1 second delay top up ntfd-SC2 for every mbscv timeout
Last edit: Minh Hon Chau 2016-12-09
Hi Gary, Minh,
Attached 2219_v1.patch is the patch which performs IMM dispatch in a separate thread. I kept 2219_v0.patch also and used class_def.xml for class defintion. It is based on ntfsv_test_classes.xml with only one class and one attribute. object_create.xml creates one object of this class.
With 10 switchovers and continuously running "for i in {0..20000} ; do immcfg -a testUint32Cfg=21 stringRdnCfg=object1; done" no crash observed.
Thanks,
Praveen
Hi Praveen,
I repeat the same test, got ntfd coredump, but it looks like another bug? Thanks, Minh
Hi Minh,
Based on NTFS trace if this crash looks a different issue, please raise a new ticket for this.
Thanks,
Praveen
Sorry, I mean I repeat the same test with the patch V1, and got this coredump, will check the trace then.
V1 worked with the same test - no ntfimcnd coredump. Created #2223 for another ntfd coredump - different issue.
Hi Praveen
V1 looks OK - as an enhancement perhaps the OI handle could be created in the thread itself to avoid passing the CB into the thread, and issues with concurrent access to the OI handle.
Last edit: Gary Lee 2016-12-12
Hi Praveen,
I think, with your V1's patch, there is possiblity of ntf message lost as the IMCN main poll is non-blocking, then SIGTERM could be proceed while ntf msg not yet come to active NTFD.
Other thing, regarding your concern on handling IMCN terminating on a seperated thread,
you stated: "So even if we terminate IMCN in a sepatate thread in NTFS the situation may remain same."
We intend to not handle SIGABORT/SIGKILL in that thread, that thread will block until IMCN process proceed SIGTERM, and come up with new ha_state.
/Vu
I attached the patch here for your ref.
Hi Vu,
I raised the concern for termination in a separate thread because IMCN is blocked in one of the CCB callbacks while sending notification NotificationSend() API. This is happening because of circular depedency as standby NTFS is blocked. We need to remember that when IMCN is blocked, the test case is still continuing and making configuration changes and because of this IMCN is accumulating those callbacks (IMM is not blocked and sending those callbacks to IMCN). These callbacks plus already in the list will be processed once IMCN gets unlocked. Now moving termination in a separate thread will surely allow standby NTFS to proceed and will unblock IMCN, but IMCN component after getting unblocked will still be executing those remaining callbacks. Earliar NTFS used to abort IMCN when terminates gets timed out. So we need gaurantee that thread will surely terminated IMCN.
One more thing, Will the temination by NTFS in a separate thread not lead to accumulation of terminations itself? Suppose NTFS started the thread for termination and termination thread has still not exited and another switchver happens. In this way one more thread will get spawned for termination.
If both these problems are resolved and gauranteed, I do not see any problem with V3 because my gues is the problem of notifcation being lost has been always existed and it will remain so irrespective of v1 or v3 (refined version) approach. The reasons are:
1) NTFS responds to AMF in csi set callback before termination of IMCN component. Because of this during switchover, a time comes when active NTFS has just responded for active assignments and is terminating standby IMCN process. While this is happening quiesced NTFS has got the standby CSI callback and it also has started termination of active IMCN process. Since there is no syncronization between these two terminations though on different nodes, configuration changes may get missed and hence the notification.
2) IMCN is not role aware. So NTFS currently terminates IMCN when it gets standby role andMCN exits without any confirmation that it has cleared its notification list and CCB callback list. Neither patch v1 not v3 handles this.
Note: These is a ticket fot these two (#157 NTF: Improve HA handling in IMCN).
Thanks,
Praveen
Hi Praveen,
Below is my understanding regarding your concerns:
IMCN component after getting unblocked, i think, it will go to
handle_sigterm_event()
for SIGTERM signal handling and exit gracefully.Other thing is, when IMCN is blocking at saImmOiDispatch(), if any IMM configuration changes, all CCB callbacks to the blocking applier may get TIME_OUT, I guess. I see, IMCN sets
IMMA_SYNCR_TIMEOUT
IMM sync timeout to 01 second.This concern is valid. We should have one flag to prevent this happens.
/Vu
Hi Vu,
That was expected behaviour that IMCN after getting unblocked will process sig_term event. For this to happen, it has to come out of IMMDispatch(DISPATCH_ALL). I think IMMND just sends CCB callbacks for atleast create/delete/modify and complete on one go. For CCB Apply IMMND will have to wait for completed response which creates a dependency on IMCN. But I this not synschronous lilke callbacks in other services. So in our test application which is only one client making conf changes, no problem will occur. However in case of multiple clients are making CCB changes then IMMND will be sending these to IMCNand these will be piling up.
Anyways, I do not see much risk but still I would like Minh to clarify the problem enviornment whether configuration changes are done much faster and by multiple clients. I have seen in some IMM tickets where reporter was creating thousands of objects
Yes, by maintaining the flag spawning of mulitple threads can be avoided. But we need to remember here that maintaining this flag does not block a user to trigger swtichvers. So per switchover, restart of IMCN may get missed.
Thanks,
Praveen
Hi
The original specificatinon for IMCN said that it should be able to handle HA state changes. A first increment of IMCN without any HA handling except the possibility to inform the IMCN process if it shall run as active or standby when started. To handle a HA state change the IMCN process must be terminated and restarted given the correct state.
In the "IMCN NTF notification protocol" a special notification was defined (See Error Report Notification in the OpenSAF Extensions PR document) to be sent if there is a risk that any IMM changes may have been missed. This notification is sent every time IMCN is started as active. If a client receives such a notification it has to read all relevant IMM data in order to synchronize itself. After that it can rely on notifications from IMCN until the next Error report notification is received.
The intention was to make IMCN able to change role without beeing restarted and without loosing track of IMM changes (ticket #157). However the first increment was considered good enough so #157 was never implemented.
This means that IMCN can be terminated at any time also if executing an IMM callback function. It does not matter if any IMM change is lost and never reported. Since IMCN is a process and it is the process that is terminated there is no problem with memory leeks or resource leeks. IMM will detect that the client is gone and will finalize all resources.
In NTF the IMCN surveillance thread will also detect that the IMCN process is gone and start a new one and give it the correct state (which may no longer be active)
This means that any solution for immediate termination is Ok e.g. to handle the termination in a separate thread to avoid the described deadlock.
NOTE:
SInce there is no HA handling implemented it is actually not needed to start any IMCN process on any other node than the active. That this is done is also remainings from the original intention.
MUST BE FIXED!
I saw that the init_ntfimcn() has been move from the main initialize() to initialize_for_assignment(). This is NOT OK since initialize_for_assignment() is called more than once meaning that also init_ntfimcn() is called more than once. The init_ntfimcn() function starts the IMCN surveillance thread and that cannot be done more than once!
Other init functions that are called from initialize_for_assignment() has an "is_initialized flag" in order to prevent initialization more than once. This could be a solution to implement for init_ntfimcn() as well.
Thanks
Lennart
Hi Lennart,
Seems the global flag
fully_initialized
in control block is used to ensure thatinitialize_for_assignment
is called once.Another thing I would like to bring up here, that is TIMEOUT for all NTF API is set to 10 seconds (
#define NTFS_WAIT_TIME 1000
). Then, if calling to one of them e.gsaNtfNotificationSend()
, in worse case, the calling thread will be blocked up to 10 seconds and could lead to unexpected result as below syslog when perfoming si-swap:I am not sure if that setting time is ok or not.
/Vu
Hi Vu,
For the NtfSend API(), 10 secs can count on saflogging at server side, writing to disk, ... so I guess we should not change for NtfSend(). For the other APIs() we could see how many critical jobs the server get involved, then we can decide to reduce this timeout, perhaps for thos API() doing finalize jobs
thanks,
Minh
Hi,
I think both patches can solve the problem. V1 (Praveen's) removes the dependency in ntfimcnd, V3 (Vu's) removes the dependency in ntfd. Below is a summary I have collected from previous discussions:
- V3 (Vu's)
. There could be chances that we have no active ntfimcnd: the period of no active imcn happens when both termination threads are terminating imcnd at the same time (there should be one doing earlier), and the active imcnd is stuck in ntfInitialize()
. The period of 2 active ntfimcnd could be longer: In state transition from quiesced->standby, termination thread is ongoing, the old active imcnd could be a bit longer as active one
I think the difficulty in V3 is ntfd can not know things happening in both local and remote ntfimcnd
. When ntfimcnd gets sigterm, the job from dispatching IMM callback to sending notification could be interrupted. If we make sigterm_handler sync with IMM's thread, that could cause signal abort from ntfd due to latency
. But what if loss of notification is always assumpted, and we have special notification to inform clients at ntfimcnd's start up
. IMM thread needs to do synchronization with sigterm_handler on common stuffs: ie immOI handle, ...
Thanks
/Minh