A circular dependency can be seen when performing a si-swap of safSi=SC-2N,safApp=OpenSAF:
So we have (1) depending on (2) depending on (3) depending on (1)
This results in a temporary deadlock that dramatically slows down NTFD's ability to process its main dispatch loop. The deadlock only lasts for approx. 1 second, when mbcsv_mds_send_msg() times out. But since there could be lots of MBC messages to send, sometimes osafntfimcnd is killed with SIGABRT generating a coredump. The si-swap operation will also timeout.
steps to reproduce
- Run loop of ntftest 32
root@SC-1:~# for i in {1..10}; do ntftest 32; done
- On another terminal, keep swapping 2N Opensaf SI, got coredump after couples of swaps
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
SC-2 (active) There are a lot of send failures. Each taking approx. 1 second to timeout. During these 1 second timeouts, NTFD cannot process the main dispatch loop. Dec 7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e Dec 7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type MDS_SENDTYPE_REDRSP: Dec 7 11:01:45.575091 osafntfd [452:mbcsv_mds.c:0247] << mbcsv_mds_send_msg: failure Dec 7 11:01:47.083548 osafntfd [452:mbcsv_mds.c:0185] >> mbcsv_mds_send_msg: sending to vdest:e
SC-1 (standby) NTFD is trying to terminate osafntfimcnd. While it is doing that, it cannot reply to NTFD on SC-2. Meanwhile, osafntfimcnd is sending NTF notifications to NTFD on SC-1. Dec 7 11:01:35.453151 osafntfd [464:ntfs_imcnutil.c:0316] TR handle_state_ntfimcn: Terminating osafntfimcnd process Dec 7 11:01:45.474313 osafntfd [464:ntfs_imcnutil.c:0124] TR Termination timeout Dec 7 11:01:45.474375 osafntfd [464:ntfs_imcnutil.c:0130] << wait_imcnproc_termination: rc = -1, retry_cnt = 101 Dec 7 11:01:45.474387 osafntfd [464:ntfs_imcnutil.c:0168] TR Normal termination failed. Escalate to abort Dec 7 11:01:45.574703 osafntfd [464:ntfs_imcnutil.c:0172] TR Imcn successfully aborted Dec 7 11:01:45.574712 osafntfd [464:ntfs_imcnutil.c:0187] << timedwait_imcn_exit
Hi,
I think, as Minh has pointed out 10 second is plausbile time and may depend on the type of API. If NTFs reads notification from a file(not supported now), then a not only a reader API may have to wait for long time but also an API which is related to less critical activity. In this way, response time for less critical APIs may also increase . In that sense 10 seconds may become some average time uniformly for all the APIs. SAF API interface is designed for use by both threaded and non-threaded application processes, this can help application to optimize their service requests.
Regarding the patch for this ticket, I see both the approaches needs refinement. I think termination of IMCN can be avoided by making it role aware by way of RDE callback mechanism (Not the #157 way). Upon role change from standby to active, IMCN can send the error notification to the user giving impression that it has been restarted. If Imm handle goes BAD then IMCN will exit and survillence thread will start it in correct role. So in this case NTFS wll not be terminaing it. I am still evaluating this.
Thanks,
Praveen
Hi,
Attached is the patch 2219_v4.patch which improves following things:
1)IMCN subscribes for RDE callback for role change events. Also initial role also it takes calling RDE API.
2)IMCN will not be terminated on role change.
3)When IMCN goes from standby to active it will send the notification to inform user as if it is restarted.
NTFS will still be starting IMCN process and will be monitoring its thorugh suvillence thread. If IMCN process crashed or exits it will started by NTFS.
Thanks,
Praveen
Hi Praveen, I'm testing/looking through the patch. Thanks, Minh
Hi Praveen,
With the V4 patch looks good, but wondering below case is valid.
When the active IMCN on SC-2 is blocking at saImmOiDispatch(), then do si-swap. On SC-2, standby IMCN will take active role while there's still other active instance running on SC-2.
If any change on IMM attributes, will notify the change to these two active instances. As result, the NTF msg could be duplicated.
I think, if it is valid, we can consider to change SA_DISPATCH_ALL to SA_DISPATCH_ONE, so that the ha_state can be updated as quick as possible.
/Vu
Hi Praveen,
Other concern is that, regarding roaming SC. I am not sure how RDE callback works in case of roaming for IMCN.
With V4, NTFD will activate IMCN when getting RDE callback. Then IMCN will register itself to RDE. So, my point is IMCN could get notification for his late registration?
/Vu
Hi Praveen
I have also looked at this and have seen much the same as Vu already has pointed out. There is also another problem and I hope I manage to explain in an understandable way. Let's say that both active (Aa) and standby (Sb) are acting as appliers (only A will send notifications). Then both appliers will receive the IMM info and execute the CCB callbacks. However, they will not get the information at the same time. It is also not known which applier that gets the information first. This means that it is possible that when a state change happens the new active will get the apply callback for the same change information that the previous active has already got and sent a notification for. Let’s say we improve by only letting the active run as an applier but we still end up with a synchronization problem.
It is probably ok to use RDE callback to know when HA state is changed instead of having NTF terminate imcn as before. It is no big concern if a monitored IMM change happen without imcn sending a notification e.g. during restart as long as a "may have missed notifications" notification is sent.
A solution could be to not change role when a state change happen instead RDE callback handling could run in a separate thread in the imcn process and immediately when a state change is detected exit the imcn process. This will be detected by the imcn monitoring in NTF that will restart the imcn process. Sending a "may have missed notifications" notification at start up can be done as is. Imcn could detect its role by checking with RDE as in your patch and in this case information does not have to be provided by NTF.
I will also come back with a suggestion for how to install your patch in a bit "cleaner" way. I will do some modifications of the patch to explain what I mean.
Thanks
Lennart
Hi,
IMCN will not be started by NTFS on spare controller. NTFS starts IMCN only when it gets role. Suppose one standby controller goes down and there is a spare controller present in the system. Now AMF will gives standby role to all director coponents including RDE and NTFS. RDE upon receiving this standby role will send callback to all its clients. NTFS upon receving standby role will start IMCN process.
In IMCN patch v4 is taking intial role by calling rda_get_role() to take initial role.
I think the concern is if RDE is bit slow and NTFS starts IMCN and it gets started even when RDE has not processed its CSI set callback . For active role of IMCN this is not a problem because on active controller RDE's role is Active from the beginning, so there is no problem for active IMCN. For standby role on spare controller, this can still be handled by avoiding termination of IMCN: Default role of RDE is quiesced. So with rda_get_role(), IMCN will get either quiesced role standby role. If it is quiesced role then in the while loop a if condition can again call rda_get_role() to fetch the role, before proceeding for any Dispatch event. Most probalby IMCN will get the role in the beginning when it has not entered the while loop. As of now AMFD process is getting the role this way only on spare controller as it not a MW component.
Thanks,
Praveen
Hi,
It seems under "enhancement" view, the discussion will be taking a little while.
Let's get back this issue under "defect" view. I think the issue is coincidence of 10s timeout of saNtfNotificationSend and 10s timeout of SIGTERM on ntfimcnd. When both timers expire, sometimes we see coredump but most of the cases we don't.
As in the trace file, new active ntfd did still respond the notification_send request but at that time the ntfimcnd had died. If NtfSend() API expired a bit earlier (most of the case), main poll of ntfimcnd would have been released for sigterm.
The sigterm timeout should be set longer than the NtfSend timeout, as logically during termination supervision period, there could be many NtfSend() to be called.
It should be:
The sigterm timeout = NtfSend() tiemout x N
For now, we have N = 1, then both timers expired at once in some cases.
My suggestion is N should be 2 (at least)
NtfSend() is set 10s by default for ntfimcnd, that could be more than enough, since ntfimcnd does not send alarm/security alarm notification, which involve saf logging, timeout for NtfSend() should be less than 10s for ntfimcnd.
Currently NTFS_WAIT_TIME = 10 secs by default for all APIs, so changing timeout of NtfSend() could impact the other applications using different APIs. My another suggestion is using env var for NTFS_WAIT_TIME_ENV timeout (similar as IMMA_SYNCR_TIMEOUT), by default it still 10s.
Fix for this defect ticket could be:
- Introduce env var NTFS_WAIT_TIME_ENV, default is still 10s
- ntfimcnd set NTFS_WAIT_TIME_ENV=6s, sigterm timeout = 12s
- ntf api get timeout from NTFS_WAIT_TIME_ENV env
- ntfimcnd uses DISPATCH_ONE for immDispatch
I think this fix would add less complication to existing behaviour so that it could help to solve the issue at application side whose main concern is ntfimcnd coredump (the upgrade succeeded eventually though), while everyone can continue discussions as enhancement.
thanks,
Minh
Hi
I agree with Minh. Actually it does not matter how imcn is terminated the only thing is that a coredump is generated if escalated to step 3 and a coredump should of course be avoided.
However if this is fixed by adjusting timeouts the dependencies between the timeout times must be documented in the imcn README file
Thanks
Lennart
Hi,
I would like to go back to what I mentioned in an earlier comment about running imcn only on the active node. Imcn is doing nothing on any other node than the active only wasting resources.
I think this is still applicable also if using RDE callback and rda_get_role(). The current implementation of imcn does not guarantee that IMM events are not lost (missing notifications from imcn) e.g. during HA state change. To handle this the notification for reporting possible loss of information exist. If we don’t want to enhance imcn by removing this limitation, there is actually no need to run imcn on any node except the active. This means that it is ok to exit imcn when a state change happens and start imcn on the new active only. In order to make sure that imcn is terminated as soon as possible a rde thread in the imcn process could handle rde callbacks and immediately exit the imcn process when a state change is requested. This should prevent the no longer active imcn from sending any notification after the new imcn has been started and also see to that imcn is not running on the no longer active node.
Thanks
Lennart
Hi,
There are two aspects of this issue a) core is generated for IMCN. This is beacause IMCN as an NTF client got stuck in NtfSend() for about 10 seconds. Other aspect b) both active and standby NTFS got stuck because of circular dependecny. This second part means NTF service was not available for about 10 seconds. There may be other NTF clients (other than IMCN) that will also be waiting for using NTF service. Making timeout for IMCN to 6ses will reduce this unavailablity of NTFS to 6 seconds only but will not solve it completely. Other clients may crash.
Regarding the exit of IMCN process on role change so that there should not be two active IMCN in the system: it can be done when IMCN gets quisced role change information through RDE callback because if it waits for termination till standby then active would already be up. This seems to be the only possbility.
Thanks,
Praveen
Hi Praveen,
NTFS is not stuck and not unavailable for 10s (or 6s). The active NTFD can still handle incoming requests, but it is delayed for 1 sec for each MBCsv. We can see this in the trace of SC2. The other clients still use 10s for their APIs sync call, 6s is only for IMCN - it can't be worse than current 10s, since the dependency will be removed earlier. The most importance is there should be no loss of notification during switchover, the coredump is due to abort which works as design.
Thanks,
Minh
Hi
The absolutely easiest way to fix this should be to just call "_Exit(EXIT_SUCCESS)" in sigterm_handler(). _Exit() is a Async-signal-safe function (see Linux manual).
The current handle_sigterm_event() that is called in the poll loop is imcn_exit() which clears the special applier (clears the applier name) and _Exit(). The applier name should be cleared by IMM ist the owning process no longer exist.
Thanks
Lennart
Agree with Lennart. That could be easiest way to fix the issue.
/Vu
I has created the patch regarding Lenanrt's proposal.
/Vu
Hi Vu,
Please publish the patch officially. For further improvments there is already the ticket #157.
Note: Exiting in term signal itself may lead to some unsent notifications as IMCN would not be able to complete the current ImmDispatch() call. Since it is part of IMCN spec and it is notified to the user in the form of notification when active IMCN compes up, I think this information is enough for user to accept this solution.
Thanks,
Praveen
changeset: 8467:549487bc358f
tag: tip
parent: 8464:d7b94dc7cbf6
user: Vu Minh Nguyen vu.m.nguyen@dektech.com.au
date: Tue Dec 20 10:35:24 2016 +0700
summary: ntfd: fix circular dependency with osafntfimcnd [#2219]
changeset: 8466:c65c77763b0e
branch: opensaf-5.1.x
parent: 8463:8882a40c0e31
user: Vu Minh Nguyen vu.m.nguyen@dektech.com.au
date: Tue Dec 20 10:35:24 2016 +0700
summary: ntfd: fix circular dependency with osafntfimcnd [#2219]
changeset: 8465:f3fc63316da5
branch: opensaf-5.0.x
parent: 8462:c65b46e31000
user: Vu Minh Nguyen vu.m.nguyen@dektech.com.au
date: Tue Dec 20 10:35:24 2016 +0700
summary: ntfd: fix circular dependency with osafntfimcnd [#2219]
Related
Tickets:
#2219