
#2158 OSAF: IMMND dies before IMMND becomes AMF component

5.0.2
duplicate
nobody
None
defect
osaf
-
major
2016-12-23
2016-11-02
No

If IMMND dies during the OpenSAF startup phase, IMMND is not restarted by AMF. The issue has been observed in the following situation:
- Restart the cluster
- While the active controller is starting up, a critical component dies, which causes a node failfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: ER safComp=ABC,safSu=1,safSg=2N,safApp=ABC Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
- In the meantime, the standby controller is requested to become active
Oct 25 12:51:27 SC-2 tipclog[16221]: Lost link <1.1.2:eth0-1.1.1:eth0> on network plane A
Oct 25 12:51:27 SC-2 osafclmna[4336]: NO Starting to promote this node to a system controller
Oct 25 12:51:27 SC-2 osafrded[4387]: NO Requesting ACTIVE role
- IMMND also dies a bit later
Oct 25 12:51:29 SC-2 osafimmnd[4536]: ER MESSAGE:44816 OUT OF ORDER my highest processed:44814 - exiting
Oct 25 12:51:29 SC-2 osafamfnd[7414]: NO saClmDispatch BAD_HANDLE
- Other services could not initialize since IMMND is dead
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:39 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:51:39 SC-2 osafclmd[7386]: WA saImmOiImplementerSet returned 9
Oct 25 12:51:39 SC-2 osafntfd[7372]: WA saLogInitialize returns try again, retries...
Oct 25 12:51:39 SC-2 osaflogd[7358]: WA saImmOiImplementerSet returned SA_AIS_ERR_BAD_HANDLE (9)
Oct 25 12:51:39 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:51:49 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:50 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:50 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:00 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:20 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:20 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:20 SC-2 osafimmd[4489]: NO Extended intro from node 2210f

  • At the end, the AMFD heartbeat times out
    Oct 25 12:53:57 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
    Oct 25 12:54:01 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
    Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
    Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
    Oct 25 12:54:07 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
    Oct 25 12:54:11 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
    Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
    Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
    Oct 25 12:54:15 SC-2 osafamfnd[7414]: ER AMF director heart beat timeout, generating core for amfd

In the AMFND trace on SC-2, AMFND did not receive su_pres from AMFD, so AMFND could not instantiate the middleware components (including IMMND). As a result, AMFND was not aware of IMMND's death and could not restart it. The problem here is slightly different from #1828, which happened on a newly promoted SC (with the roaming SC feature) where AMFND had IMMND registered.

1 Attachment

Related

Tickets: #2204

Discussion

  • Praveen

    Praveen - 2016-11-08

    Hi Minh,

    Is this issue observed in a normal cluster (SC absence and spare SC features disabled)?
    In a normal cluster, in a failover situation FM makes the standby SC active. Since the other controller is still not standby, FM should reboot it as it did in #1334.

    Also, even if AMFND on the standby SC receives the su_pres message from AMFD, it will not be able to read the component configuration as IMMND is down.

    Thanks,
    Praveen

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-09

    Hi Praveen,

    This issue happened with the SC absence feature enabled but with no spare SC. It did not happen in the context of headless. In the context of #1334, does the standby SC reboot because a cold sync is in progress?

    In this ticket, after the cluster reboot, both SCs were coming up. SC1 (active) rebooted before SC2 was (supposedly) assigned standby. In the amfnd trace, as you said, immnd had died before amfnd-SC2 instantiated the middleware SUs. So amfnd-SC2 was not aware of the immnd process and could not restart immnd.

    I think this situation could also happen during the first active SC's start-up sequence (not only at failover), where immnd has already responded to NID and then dies before amfnd can monitor immnd's process.

    Should I change the component of this ticket to osaf? Any ideas for a solution?

    Thanks,
    Minh

     
  • Praveen

    Praveen - 2016-11-11

    Hi Minh,
    Since IMMND had not become an AMFND component, it cannot be restarted. I think the component of the ticket can be changed to "osaf".
    One solution could be that AMFND reads the IMMND pid from /var/run/opensaf/osafimmnd and checks the pid status as it does in the case of passive monitoring (see the sketch below). Suppose AMFND reads the pid and detects that IMMND is down, but can it then restart IMMND, as it does not have the NoRed SU and comp information?
    I think in the future, tickets like "#1645 amf: Template node configuration" can help to solve such problems.

    Thanks,
    Praveen
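
    A minimal sketch of the pid-file liveness check suggested above, assuming a pidfile path such as /var/run/opensaf/osafimmnd.pid (the path, names and error handling are illustrative only, not the actual AMFND or passive-monitoring code):

    /* Sketch: read a daemon pid from its pidfile and probe the process with
     * kill(pid, 0), the same liveness test used for passive monitoring. */
    #include <signal.h>
    #include <stdbool.h>
    #include <stdio.h>

    static bool process_is_alive(const char *pidfile)
    {
            FILE *fp = fopen(pidfile, "r");
            int pid = 0;

            if (fp == NULL)
                    return false;   /* no pidfile => assume the daemon never started */
            if (fscanf(fp, "%d", &pid) != 1)
                    pid = 0;
            fclose(fp);
            if (pid <= 0)
                    return false;
            /* Signal 0 performs error checking only: ESRCH means the pid is gone. */
            return kill(pid, 0) == 0;
    }

    /* Hypothetical usage: if (!process_is_alive("/var/run/opensaf/osafimmnd.pid")),
     * AMFND (or nid) could restart IMMND or escalate. */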

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-11
    • Component: amf --> osaf
     
  • Neelakanta Reddy

    Can you share the syslogs of all the nodes, controllers & payloads?

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-11

    The log files were collected from a real system with many running applications.
    I am trying to reproduce it with pure OpenSAF components.

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-15

    I have tried 2 runs and seem to see the reported problem, although the reasons for the node rebooting are different.
    Changeset: 8313:fa6b413693e3
    Steps:
    - Start cluster: SC1, SC2, PL3, PL4
    - Stop SCs
    - Restart SCs
    - Reboot SC1 while SC2 has not been promoted to standby
    - Meanwhile, kill immnd in SC2
    Observations: services cannot initialize with IMM since IMMND is not restarted; AMFND has not instantiated the IMMND component

    Run #1:
    2016-11-15 14:31:57 SC-2 osafimmnd[431]: exiting for shutdown
    2016-11-15 14:31:57 SC-2 osafimmd[421]: NO New IMMND process is on ACTIVE Controller at 2020f
    2016-11-15 14:31:57 SC-2 osafimmd[421]: NO MDS event from svc_id 25 (change:4, dest:565213401186735)
    2016-11-15 14:31:57 SC-2 osafntfimcnd[487]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
    2016-11-15 14:31:57 SC-2 osafimmd[421]: NO Successfully announced dump at node 2030f. New Epoch:116
    2016-11-15 14:31:57 SC-2 osafimmd[421]: NO ACT: New Epoch for IMMND process at node 2040f old epoch: 115 new epoch:116
    2016-11-15 14:32:07 SC-2 osafamfd[467]: WA saClmInitialize_4 returned 5
    2016-11-15 14:32:07 SC-2 osafamfnd[476]: WA saClmInitialize_4 returned 5
    2016-11-15 14:32:07 SC-2 osaflogd[440]: ER saClmInitialize failed with error: 5
    2016-11-15 14:32:07 SC-2 osafntfd[449]: ER saClmInitialize failed with error: 5
    2016-11-15 14:32:07 SC-2 osafclmd[458]: safNode=PL-4,safCluster=myClmCluster JOINED, init view=2, cluster view=2
    2016-11-15 14:32:12 SC-2 osafclmd[458]: ER saNtfNotificationSend() returned: SA_AIS_ERR_TRY_AGAIN (6)
    2016-11-15 14:31:54 SC-2 osafclmna[396]: NO Starting to promote this node to a system controller
    2016-11-15 14:32:12 SC-2 osafclmna[396]: NO safNode=SC-2,safCluster=myClmCluster Joined cluster, nodeid=2020f
    2016-11-15 14:32:12 SC-2 osafclmd[458]: safNode=SC-2,safCluster=myClmCluster JOINED, init view=3, cluster view=3
    2016-11-15 14:32:17 SC-2 osafclmd[458]: ER saNtfNotificationSend() returned: SA_AIS_ERR_TRY_AGAIN (6)
    2016-11-15 14:32:19 SC-2 osafclmd[458]: NO proc_initialize_msg: send failed. dest:2020f000001c1
    2016-11-15 14:32:20 SC-2 osafclmd[458]: NO proc_initialize_msg: send failed. dest:2020f000001b8
    2016-11-15 14:32:17 SC-2 osafamfnd[476]: WA saClmInitialize_4 returned 5
    2016-11-15 14:32:50 SC-2 osafamfnd[476]: CR saImmOmInitialize failed. Use previous value of nodeName.
    2016-11-15 14:32:17 SC-2 osafamfd[467]: WA saClmInitialize_4 returned 5
    2016-11-15 14:35:06 SC-2 osafamfd[467]: ER saImmOmSearchInitialize_2 failed: 6
    2016-11-15 14:35:06 SC-2 osafamfd[467]: ER Failed to read configuration, AMF will not start
    2016-11-15 14:35:06 SC-2 osafamfd[467]: ER avd_imm_config_get FAILED
    2016-11-15 14:35:06 SC-2 osafamfd[467]: ER avd_active_role_initialization FAILED
    2016-11-15 14:35:06 SC-2 osafamfd[467]: ER initialize_for_assignment FAILED 2
    2016-11-15 14:35:06 SC-2 osafamfnd[476]: WA AMF director unexpectedly crashed
    2016-11-15 14:35:06 SC-2 osafamfnd[476]: WA AMF director unexpectedly crashed
    2016-11-15 14:35:06 SC-2 osafamfnd[476]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
    2016-11-15 14:35:06 SC-2 opensaf_reboot: Rebooting local node; timeout=60

    Run #2:
    2016-11-15 14:45:00 SC-2 osafimmnd[430]: exiting for shutdown
    2016-11-15 14:45:00 SC-2 osafimmd[420]: NO MDS event from svc_id 25 (change:4, dest:565213401186734)
    2016-11-15 14:45:00 SC-2 osafntfimcnd[486]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
    2016-11-15 14:45:00 SC-2 osafamfnd[475]: NO saClmDispatch BAD_HANDLE
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO Sc Absence Allowed is configured (900) => IMMND coord at payload node:2030f dest566312912814476
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO Node 2010f request sync sync-pid:430 epoch:0
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO Successfully announced sync. New ruling epoch:135
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO ACT: New Epoch for IMMND process at node 2040f old epoch: 134 new epoch:135
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 0 new epoch:135
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO ACT: New Epoch for IMMND process at node 2030f old epoch: 134 new epoch:135
    2016-11-15 14:45:01 SC-2 osafimmd[420]: NO ACT: SaImmRepositoryInitModeT changed and noted as being: SA_IMM_KEEP_REPOSITORY
    2016-11-15 14:45:02 SC-2 osafimmd[420]: NO Successfully announced dump at node 2030f. New Epoch:136
    2016-11-15 14:45:02 SC-2 osafimmd[420]: NO ACT: New Epoch for IMMND process at node 2040f old epoch: 135 new epoch:136
    2016-11-15 14:45:02 SC-2 osafimmd[420]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 135 new epoch:136
    TEST: SC failover
    2016-11-15 14:45:04 SC-2 osafdtmd[382]: NO Lost contact with 'SC-1'
    2016-11-15 14:45:04 SC-2 osafimmd[420]: NO MDS event from svc_id 25 (change:4, dest:564113889558958)
    2016-11-15 14:45:04 SC-2 osafclmd[457]: NO Node 131343 went down. Not sending track callback for agents on that node
    2016-11-15 14:45:31 SC-2 osafamfnd[475]: CR saImmOmInitialize failed. Use previous value of nodeName.
    2016-11-15 14:47:40 SC-2 osafamfnd[475]: ER AMF director heart beat timeout, generating core for amfd
    2016-11-15 14:47:41 SC-2 osafamfnd[475]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: AMF director heart beat timeout, OwnNodeId = 131599, SupervisionTime = 60
    2016-11-15 14:47:41 SC-2 opensaf_reboot: Rebooting local node; timeout=60

     
    • Minh Hon Chau

      Minh Hon Chau - 2016-11-15

      Run #1

       
    • Minh Hon Chau

      Minh Hon Chau - 2016-11-15

      Run #2

       
  • Neelakanta Reddy

    2016-11-15 14:31:45 SC-2 osafrded[404]: NO RDE role set to QUIESCED

    2016-11-15 14:31:48 SC-2 osafdtmd[383]: NO Lost contact with 'SC-1'

    2016-11-15 14:31:48 SC-2 osafrded[404]: NO Requesting ACTIVE role
    2016-11-15 14:31:48 SC-2 osafrded[404]: NO RDE role set to Undefined

    2016-11-15 14:31:49 SC-2 osafrded[404]: NO RDE role set to QUIESCED
    2016-11-15 14:31:49 SC-2 osafrded[404]: NO Giving up election against 0x2010f with role Undefined. My role is now QUIESCED

    2016-11-15 14:31:49 SC-2 osafdtmd[383]: NO Established contact with 'SC-1'
    2016-11-15 14:31:49 SC-2 osafrded[404]: NO Peer up on node 0x2010f

    2016-11-15 14:31:54 SC-2 osafdtmd[383]: NO Lost contact with 'SC-1'

    2016-11-15 14:31:54 SC-2 osafrded[404]: NO Requesting ACTIVE role
    2016-11-15 14:31:54 SC-2 osafrded[404]: NO RDE role set to Undefined

    2016-11-15 14:31:56 SC-2 osafrded[404]: NO Running '/usr/local/lib/opensaf/opensaf_sc_active' with 0 argument(s)
    2016-11-15 14:31:56 SC-2 osafrded[404]: NO Switched to ACTIVE from Undefined

    2016-11-15 14:31:57 SC-2 osafimmnd[431]: exiting for shutdown

    Once RDE in the QUIESCED state detects that the active is down, RDE must promote this node to active.

     
  • Praveen

    Praveen - 2016-11-16

    Hi Neel,

    The last syslog message from RDE says its role was promoted to active. After that, IMMND exited.
    What is the reason IMMND exited?
    "2016-11-15 14:31:56 SC-2 osafrded[404]: NO Switched to ACTIVE from Undefined".
    Also one more question: suppose in this case the RDE and AMF combination distributes the active role to the director components and AMFND successfully restarts IMMND. Won't there be problems with IMMND syncing? With whom will it sync, as there is no other IMMND, and what about the coordinator status?

    Thanks,
    Praveen

     
  • Neelakanta Reddy

    Hi praveen,

    The reason for the IMMND exit is that it was killed (as part of the test).
    Since headless is enabled, the coordinator can also reside on a payload node.
    If RDE or AMF has promoted this node (SC-2) to active, then IMMND will not have any problem syncing.

    2016-11-15 14:35:10 SC-2 osafrded[405]: NO Switched to ACTIVE from Undefined
    The above log may be misleading, but the node is not promoted to active as no other director service has taken the active role.

    Thanks,
    Neel.

     
  • Praveen

    Praveen - 2016-11-16

    Hi Neel,

    Thanks for the update.
    I asked about the IMMND sync problem particularly because in this ticket it has been confirmed (see Minh's comment) that:
    It is not a headless context and the controllers were coming up after a cluster reset.

    Thanks,
    Praveen

     
  • Neelakanta Reddy

    2016-11-15 14:35:07 SC-2 osafimmd[422]: Started
    2016-11-15 14:35:07 SC-2 osafimmd[422]: NO * SC_ABSENCE_ALLOWED (Headless Hydra) is configured: 900 *****

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-16

    Hi,

    The reason for IMMND's death in my test is that I manually killed IMMND. In the reported log from real applications, IMMND died due to:

    Oct 25 12:51:29 SC-2 osafimmnd[4536]: ER MESSAGE:44816 OUT OF ORDER my highest processed:44814 - exiting

    I will run the test from cluster start-up, not coming from headless.

    Thanks,
    Minh

     
  • Neelakanta Reddy

    If headless is not configured, then one should get the below error if IMMND is killed or exits:
    "No IMMD service => cluster restart, exiting"

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-16

    It has the headless feature enabled, as in my previous comment:

    "
    This issue happened with SC absence feature enabled but had no spare SC. It did not happen in context of headless. In context of #1334, standby SC reboots because cold sync is in progress?

    In this ticket, after cluster reboot, both SCs were coming up. SC1 (active) reboot before SC2 (supposedly) was assigned standby.
    "

     
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-16

    Two other runs, with and without the headless feature, testing from cluster start-up:

    Run 3: with headless feature
    2016-11-16 22:31:14 SC-2 osafimmnd[429]: exiting for shutdown
    2016-11-16 22:31:14 SC-2 osafimmd[419]: NO New IMMND process is on ACTIVE Controller at 2020f
    2016-11-16 22:31:14 SC-2 osafimmd[419]: NO MDS event from svc_id 25 (change:4, dest:565213401186733)
    2016-11-16 22:31:14 SC-2 osafntfimcnd[485]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
    2016-11-16 22:31:14 SC-2 osafamfnd[475]: NO saClmDispatch BAD_HANDLE
    2016-11-16 22:31:14 SC-2 osafamfnd[475]: NO Starting hb supervision of local avd
    2016-11-16 22:31:14 SC-2 osafimmd[419]: NO Sc Absence Allowed is configured (900) => IMMND coord at payload node:2040f dest567412424442252
    2016-11-16 22:31:14 SC-2 osafimmd[419]: NO Node 2010f request sync sync-pid:431 epoch:0
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO Successfully announced sync. New ruling epoch:23
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 0 new epoch:23
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO ACT: New Epoch for IMMND process at node 2030f old epoch: 22 new epoch:23
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO ACT: New Epoch for IMMND process at node 2040f old epoch: 22 new epoch:23
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO ACT: SaImmRepositoryInitModeT changed and noted as being: SA_IMM_KEEP_REPOSITORY
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO Successfully announced dump at node 2040f. New Epoch:24
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO ACT: New Epoch for IMMND process at node 2030f old epoch: 23 new epoch:24
    2016-11-16 22:31:15 SC-2 osafimmd[419]: NO ACT: New Epoch for IMMND process at node 2010f old epoch: 23 new epoch:24
    TEST: SC failover
    2016-11-16 22:31:17 SC-2 osafdtmd[381]: NO Lost contact with 'SC-1'
    2016-11-16 22:31:17 SC-2 osafimmd[419]: NO MDS event from svc_id 25 (change:4, dest:564113889558959)
    2016-11-16 22:31:17 SC-2 osafclmd[457]: NO Node 131343 went down. Not sending track callback for agents on that node
    2016-11-16 22:31:44 SC-2 osafamfnd[475]: CR saImmOmInitialize failed. Use previous value of nodeName.
    2016-11-16 22:34:13 SC-2 osafamfd[466]: ER saImmOmSearchInitialize_2 failed: 6
    2016-11-16 22:34:13 SC-2 osafamfd[466]: ER Failed to read configuration, AMF will not start
    2016-11-16 22:34:13 SC-2 osafamfd[466]: ER avd_imm_config_get FAILED
    2016-11-16 22:34:13 SC-2 osafamfd[466]: ER avd_active_role_initialization FAILED
    2016-11-16 22:34:13 SC-2 osafamfd[466]: ER initialize_for_assignment FAILED 2

    Run 4: Without headless feature
    2016-11-16 22:38:33 SC-2 osafdtmd[383]: NO Lost contact with 'SC-1'
    2016-11-16 22:38:33 SC-2 osafamfnd[477]: WA AMF director unexpectedly crashed
    2016-11-16 22:38:33 SC-2 osafamfnd[477]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: local AVD down(Adest) or both AVD down(Vdest) received, OwnNodeId = 131599, SupervisionTime = 60
    2016-11-16 22:38:33 SC-2 osafimmnd[431]: ER No IMMD service => cluster restart, exiting
    2016-11-16 22:38:33 SC-2 opensaf_reboot: Rebooting local node; timeout=60

    Then SC2 comes back up from the reboot:

    2016-11-16 22:38:41 SC-2 osafimmnd[434]: exiting for shutdown
    2016-11-16 22:38:41 SC-2 osafimmd[424]: NO MDS event from svc_id 25 (change:4, dest:565213401186738)
    2016-11-16 22:38:41 SC-2 osafimmd[424]: WA IMMND coordinator at 2020f apparently crashed => electing new coord
    2016-11-16 22:38:41 SC-2 osafimmd[424]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:28
    2016-11-16 22:38:41 SC-2 osafimmd[424]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
    2016-11-16 22:38:41 SC-2 osafimmd[424]: ER IMM RELOAD => ensure cluster restart by IMMD exit at both SCs, exiting
    2016-11-16 22:38:41 SC-2 osafntfimcnd[473]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
    2016-11-16 22:41:50 SC-2 osafamfd[489]: ER saImmOmSearchInitialize_2 failed: 6
    2016-11-16 22:41:50 SC-2 osafamfd[489]: ER Failed to read configuration, AMF will not start
    2016-11-16 22:41:50 SC-2 osafamfd[489]: ER avd_imm_config_get FAILED
    2016-11-16 22:41:50 SC-2 osafamfd[489]: ER avd_active_role_initialization FAILED
    2016-11-16 22:41:50 SC-2 osafamfd[489]: ER initialize_for_assignment FAILED 2
    2016-11-16 22:41:50 SC-2 opensafd[378]: ER Failed #012 DESC:AMFD
    2016-11-16 22:41:50 SC-2 opensafd[378]: ER Going for recovery
    2016-11-16 22:41:50 SC-2 osafamfd[489]: ER initialize failed, exiting
    2016-11-16 22:41:50 SC-2 osafclmd[480]: exiting for shutdown
    2016-11-16 22:41:50 SC-2 osafclmna[399]: exiting for shutdown

    In all 4 runs, the common issue is that IMMND could not be restarted since IMMND had not yet become an AMF component.

     
    • Minh Hon Chau

      Minh Hon Chau - 2016-11-16

      run #3

       
    • Minh Hon Chau

      Minh Hon Chau - 2016-11-16

      run #4

       
  • Minh Hon Chau

    Minh Hon Chau - 2016-11-16
    • summary: AMF: IMMND dies at Opensaf start up phase causes AMFD heartbeat timeout --> OSAF: IMMND dies before IMMND becomes AMF component
     
  • Praveen

    Praveen - 2016-11-18

    There is one ticket related to an unmonitored middleware process:
    "#1857 amf: spare controller didnot get rebooted when amfd is killed on spare controller with headless feature enabled."
    Here there is nobody to restart the AMFD process or reboot the spare controller when a user kills the AMFD process.

     
  • Praveen

    Praveen - 2016-11-22

    I think this ticket must be classified as an enhancement, as it adds a new independent feature to handle any MW component failures pre-AMF.

     
  • Hans Nordebäck

    Hans Nordebäck - 2016-11-23

    The FIFO monitoring used by the transport monitor can also be used by nid to monitor services until the system is fully started. This issue must be considered a bug, as nid continues to start services when the system is in a faulty state, e.g. immnd has ‘crashed’. It should be a small task to add the FIFO monitoring to nid.
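
    A minimal sketch of the FIFO-based liveness idea, assuming the monitored daemon keeps the write end of a named FIFO open for its whole lifetime (the path and function name are illustrative, not the actual transport monitor or nid code):

    /* Sketch: the monitor opens the read end of the daemon's FIFO after the
     * daemon has opened the write end. When the daemon dies, all writers are
     * gone and poll() reports POLLHUP on the read end. */
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Returns 1 while the writer (monitored daemon) is alive, 0 when it has
     * exited, -1 on error. fifo_path is hypothetical,
     * e.g. "/var/run/opensaf/osafimmnd.fifo". */
    static int peer_alive(const char *fifo_path, int timeout_ms)
    {
            int fd = open(fifo_path, O_RDONLY | O_NONBLOCK);
            if (fd < 0) {
                    perror("open fifo");
                    return -1;
            }
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int alive = 1;
            if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLHUP))
                    alive = 0;  /* no writer left => the monitored process is gone */
            close(fd);
            return alive;
    }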

     
  • Hans Nordebäck

    Hans Nordebäck - 2016-11-23

    Ticket [#2204] created

     


