The NTF service shall be able to recover if both SC nodes go down at the same time. This is not possible today; a cluster restart is needed.
NOTE: This is also applicable for the LOG service. [#1179]
Tickets: #1179
Tickets: #1180
Tickets: #1641
Tickets: #1707
Wiki: ChangeLog-5.0.0
Wiki: NEWS-5.0.0
I have a patch implementing the "INVALID HANDLE" idea, which is similar to #1179.
Comments are welcome.
--- Brief explanation ---
MDS events observed in Normal View toward the NTF Agent in the following cases:
Start ntfsubscribe
Nov 06 05:32:44 PL-3 ntfsubscribe: NO mds_cb_info->info.svc_evt.i_change 3 (NCSMDS_UP)
Stop SC-1, failover
Nov 06 05:33:09 PL-3 ntfsubscribe: NO mds_cb_info->info.svc_evt.i_change 1 (NCSMDS_NO_ACTIVE)
Nov 06 05:33:09 PL-3 ntfsubscribe: NO NTFS down
Active NTF Server is on SC-2
Nov 06 05:33:10 PL-3 ntfsubscribe: NO mds_cb_info->info.svc_evt.i_change 2 (NCSMDS_NEW_ACTIVE)
Nov 06 05:33:10 PL-3 ntfsubscribe: NO MSG from NTFS NCSMDS_NEW_ACTIVE/UP
Stop SC-2
Nov 06 05:33:41 PL-3 ntfsubscribe: NO mds_cb_info->info.svc_evt.i_change 1 (NCSMDS_NO_ACTIVE)
Nov 06 05:33:41 PL-3 ntfsubscribe: NO NTFS down
No Active NTF Server
Nov 06 05:33:41 PL-3 ntfsubscribe: NO mds_cb_info->info.svc_evt.i_change 4 (NCSMDS_DOWN)
Nov 06 05:33:41 PL-3 ntfsubscribe: NO NTFS down
Start SC-1 again, Active NTF Server is on SC-1
Nov 06 05:34:11 PL-3 ntfsubscribe: NO mds_cb_info->info.svc_evt.i_change 2 (NCSMDS_NEW_ACTIVE)
Nov 06 05:34:11 PL-3 ntfsubscribe: NO MSG from NTFS NCSMDS_NEW_ACTIVE/UP
Restart the cluster, start ntfsubscribe, then stop only SC-2: no MDS event is received
So @ntfa_ntfsv_state_t is introduced to track the server state based on the MDS events.
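To illustrate how a state like this could follow the MDS events in the log above, here is a minimal sketch in C. The event codes 1 to 4 are the ones printed in the log; the enum members, the names `next_state` and `replay`, and the transition rules are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* MDS change codes as printed in the log above. */
enum mds_change {
    MDS_NO_ACTIVE = 1,
    MDS_NEW_ACTIVE = 2,
    MDS_UP = 3,
    MDS_DOWN = 4
};

/* Hypothetical server states kept by the agent (names are assumptions). */
enum ntfsv_state {
    NTFSV_NONE,      /* no MDS event seen yet */
    NTFSV_UP,        /* an active server is reachable */
    NTFSV_NO_ACTIVE, /* active server lost, a standby may still take over */
    NTFSV_DOWN       /* no server left, e.g. both SCs are down */
};

/* Map one MDS event to the next server state. */
static enum ntfsv_state next_state(enum ntfsv_state cur, enum mds_change evt)
{
    switch (evt) {
    case MDS_UP:
    case MDS_NEW_ACTIVE:
        return NTFSV_UP;
    case MDS_NO_ACTIVE:
        return NTFSV_NO_ACTIVE;
    case MDS_DOWN:
        return NTFSV_DOWN;
    }
    return cur; /* unknown event code: keep the current state */
}

/* Apply a sequence of events and return the resulting state. */
static enum ntfsv_state replay(const enum mds_change *events, size_t n)
{
    enum ntfsv_state st = NTFSV_NONE;
    assert(events != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        st = next_state(st, events[i]);
    return st;
}
```

Replaying the sequence from the log (UP, NO_ACTIVE, NEW_ACTIVE, NO_ACTIVE, DOWN, NEW_ACTIVE) with `replay` ends in the "up" state, while stopping after the DOWN event leaves the sketch in the "down" state, which is the point where the agent would have to treat its handles as needing recovery.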
State handling:
Please note, we have been discussing this in the TLC and have not yet agreed upon the solution. I am still waiting for confirmation from AndersWidell on some questions around the solution, if not the use case.
One major criterion for this feature is that it must be configurable, i.e. the user should be able to turn the feature on and off.
changeset: 7342:7c969b351068
tag: tip
user: Minh Hon Chau minh.chau@dektech.com.au
date: Tue Mar 22 09:50:14 2016 +0100
summary: NTF: Add new README file for description of cloud resilience support [#1180] V2
rev: 7c969b3510681a7e5a30096fb70553cb30e6a067
changeset: 7341:e3814be9e4cc
user: Minh Hon Chau minh.chau@dektech.com.au
date: Tue Mar 22 09:49:47 2016 +0100
summary: NTF: Add tests for NTF cloud resilience feature [#1180] V2
rev: e3814be9e4cc3dba5f52db18672e63909afc87ed
changeset: 7340:81190bce2e01
user: Minh Hon Chau minh.chau@dektech.com.au
date: Tue Mar 22 09:49:29 2016 +0100
summary: NTF: Add wrapper for usage of NTF API in ntftools to handle TRY_AGAIN [#1180]
rev: 81190bce2e01c80fbf97b11c4593ba542ce8b087
changeset: 7339:1b6ced612cdd
user: Minh Hon Chau minh.chau@dektech.com.au
date: Tue Mar 22 09:49:04 2016 +0100
summary: NTF: Add support cloud resilience for NTF Agent [#1180] V3
rev: 1b6ced612cdd3b26dc1a2bf5df51beb4777b01cc
changeset: 7338:fc2b1ecfb6b0
user: Minh Hon Chau minh.chau@dektech.com.au
date: Tue Mar 22 09:48:52 2016 +0100
summary: NTF: Add support cloud resilience for NTF libs common [#1180]
rev: fc2b1ecfb6b0145ef7abb9c20eda52f44798f6ac
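Changeset 7340 mentions a wrapper in ntftools that handles TRY_AGAIN. A minimal sketch of what such a retry wrapper could look like in C follows; it is not the code from the changeset. The names `sketch_rc_t`, `call_with_retry` and `flaky_call`, the retry count and the delay are all assumptions, and the return codes are local stand-ins so that the sketch compiles without the OpenSAF headers (real code would use `SaAisErrorT` and `SA_AIS_ERR_TRY_AGAIN` from `saAis.h`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Local stand-ins for the AIS return codes (assumption for this sketch). */
typedef enum { RC_OK = 1, RC_TRY_AGAIN = 6 } sketch_rc_t;

/* Any API call, wrapped so that it takes one opaque argument. */
typedef sketch_rc_t (*api_call_fn)(void *arg);

/* Call `call` up to max_tries times, sleeping delay_ms between attempts
 * for as long as it keeps returning RC_TRY_AGAIN. */
static sketch_rc_t call_with_retry(api_call_fn call, void *arg,
                                   unsigned max_tries, long delay_ms)
{
    sketch_rc_t rc = RC_TRY_AGAIN;
    assert(call != NULL);
    for (unsigned i = 0; i < max_tries; i++) {
        rc = call(arg);
        if (rc != RC_TRY_AGAIN)
            break;
        if (i + 1 < max_tries) { /* no sleep after the final attempt */
            struct timespec ts = { delay_ms / 1000,
                                   (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    return rc;
}

/* Example callee: returns TRY_AGAIN until the counter reaches zero. */
static sketch_rc_t flaky_call(void *arg)
{
    int *remaining = arg;
    if (*remaining > 0) {
        (*remaining)--;
        return RC_TRY_AGAIN;
    }
    return RC_OK;
}
```

With a counter of 2, `call_with_retry(flaky_call, &counter, 5, 1)` succeeds on the third attempt. With a counter larger than the retry limit the wrapper gives up and hands TRY_AGAIN back to the caller, which keeps the tool from blocking forever while no NTF server is available.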