Environment Details:
OS : SUSE 64-bit
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 1PBE enabled ).
Backward Compatibility:
Opensaf versions on nodes:
SC-1 (5.0), SC-2 (5.1 FC), PL-3 (5.0), PL-4 (5.1 FC).
Summary: saMsgInitialize returns SA_AIS_ERR_TRY_AGAIN continuously after mqnd_imm_initialize fails with SA_AIS_ERR_TIMEOUT.
Steps followed & Observed behaviour:
The Mqsv test application was run while mqnd was being killed continuously.
Observations:
saMsgInitialize kept returning SA_AIS_ERR_TRY_AGAIN until the test's retry limit was exceeded. Below is the snapshot of the test output.
100|0| Version : B.3.1
100|0| RETRY : saMsgInitialize with all valid parameters
100|0| Return Value : SA_AIS_ERR_TRY_AGAIN
100|0|
100|0|
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1 Retry Count : 10
100|0|
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1 Retry Count : 20
100|0|
100|0| Version : B.3.1
100|0| Version Sun Sep 18 11:51:19 IST 2016
100|0|Sun Sep 18 11:51:19 IST 2016
100|0|Sun Sep 18 11:51:59 IST 2016
100|0|Sun Sep 18 11:51:59 IST 2016
100|0|Sun Sep 18 11:52:39 IST 2016
100|0|Sun Sep 18 11:52:39 IST 2016
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1 Retry Count : 30
100|0|
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1
100|0| Version : B.3.1 Retry Count : 40
100|0| Try again count exceeded**** TEST CASE FAILED ***
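The retry behaviour visible in the snapshot above (repeated attempts, then "Try again count exceeded") can be sketched roughly as follows. This is only an illustration and not the actual test suite code: the function names `retry_on_try_again` and `stub_initialize`, the callback type, and the retry limit passed in by the caller are all hypothetical, and the enum is a local stand-in whose values mirror saAis.h rather than the real header.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the SA Forum return codes; values mirror saAis.h. */
typedef enum {
    SA_AIS_OK = 1,
    SA_AIS_ERR_TIMEOUT = 5,
    SA_AIS_ERR_TRY_AGAIN = 6
} SaAisErrorT;

/* Hypothetical callback type wrapping the call under test
 * (for example a thin wrapper around saMsgInitialize). */
typedef SaAisErrorT (*init_fn)(void *ctx);

/* Call fn once, then retry while it returns SA_AIS_ERR_TRY_AGAIN,
 * up to max_retries additional attempts. The number of retries that
 * were performed is stored in *retries_out when it is not NULL. */
static SaAisErrorT retry_on_try_again(init_fn fn, void *ctx,
                                      unsigned max_retries,
                                      unsigned *retries_out)
{
    unsigned retries = 0;
    SaAisErrorT rc;

    assert(fn != NULL);
    rc = fn(ctx);
    while (rc == SA_AIS_ERR_TRY_AGAIN && retries < max_retries) {
        /* A real test would normally sleep here between attempts. */
        retries++;
        rc = fn(ctx);
    }
    if (retries_out != NULL)
        *retries_out = retries;
    return rc;
}

/* Stub used only for illustration: returns SA_AIS_ERR_TRY_AGAIN
 * until the counter behind ctx reaches zero, then SA_AIS_OK. */
static SaAisErrorT stub_initialize(void *ctx)
{
    unsigned *remaining = (unsigned *)ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return SA_AIS_ERR_TRY_AGAIN;
    }
    return SA_AIS_OK;
}
```

With a stub that never stops returning SA_AIS_ERR_TRY_AGAIN, the helper gives up after the configured number of retries and hands SA_AIS_ERR_TRY_AGAIN back to the caller, which corresponds to the "TEST CASE FAILED" outcome above.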
Below is a snippet of the syslog from SC-1:
Sep 18 11:48:32 SCALE_SLOT-41 osafimmnd[19813]: NO Implementer (applier) connected: 2462 (@OpenSafImmReplicatorA) <20504, 2010f>
Sep 18 11:48:32 SCALE_SLOT-41 osafntfimcnd[19819]: NO Started
Sep 18 11:48:39 SCALE_SLOT-41 osafamfd[1816]: NO Re-initializing with IMM
Sep 18 11:48:39 SCALE_SLOT-41 osafimmnd[19813]: NO Implementer connected: 2463 (safAmfService) <20506, 2010f>
Sep 18 11:48:39 SCALE_SLOT-41 osafamfd[1816]: NO Finished re-initializing with IMM
Sep 18 11:48:39 SCALE_SLOT-41 osafmsgnd[19792]: ER mqnd_imm_initialize Failed: 5
Sep 18 11:48:39 SCALE_SLOT-41 osafamfnd[1826]: 'safComp=MQND,safSu=SC-1,safSg=NoRed,safApp=OpenSAF'unregistered
Sep 18 11:48:39 SCALE_SLOT-41 osafmsgnd[19792]: CR Destroying the shared memory segment failed
Sep 18 11:48:39 SCALE_SLOT-41 osafmsgnd[19792]: ER saAmfComponentUnregister Failed with error 9
Sep 18 11:48:39 SCALE_SLOT-41 osafmsgnd[19792]: ER Cb is NULL
Sep 18 11:48:49 SCALE_SLOT-41 osafimmnd[19813]: NO Implementer connected: 2464 (MsgQueueService131343) <20507, 2010f>
Sep 18 11:48:49 SCALE_SLOT-41 osafimmnd[19813]: NO Implementer locally disconnected. Marking it as doomed 2464 <20507, 2010f> (MsgQueueService131343)
Attachments:
1) Syslog of SC-1.
This failure seems to need investigation from the IMM side, as "immutil_saImmOiInitialize_2()" is returning the SA_AIS_ERR_TIMEOUT error code.