
#2160 fm: Add support for differentiating a hung node versus a stopped node

5.2.FC
fixed
None
enhancement
fm
d
major
2017-03-01
2016-11-02
No

When a node is stopped with opensafd stop, OpenSAF does not differentiate between a hung node and a stopped node.
This ticket will implement the suggestion by AndersW in ticket [#2094]:
1) Modify the opensafd script to lock the node using CLM. The CLM lock admin operation removes the node from cluster membership.
2) fm_main will then be able to differentiate between a hung node and a stopped node by checking whether the node is still a CLM cluster member.

Related

Tickets: #2094
Tickets: #2160
Wiki: NEWS-5.2.0

Discussion

  • Hans Nordebäck

    Hans Nordebäck - 2016-11-04
    • status: assigned --> review
     
  • Hans Nordebäck

    Hans Nordebäck - 2016-11-04

    The implementation doesn't modify the opensafd script to lock and unlock the CLM node. Instead, FM reads the saClmNodeIsMember attribute via IMM; if the node is still a member, fencing is performed. Changing the CLM admin state would require modifying the start and stop functions in the opensafd script, but commands such as immfind and immadm may not yet be installed when opensafd start runs.

     
  • Hans Nordebäck

    Hans Nordebäck - 2016-12-07
    • status: review --> fixed
     
  • Hans Nordebäck

    Hans Nordebäck - 2016-12-07

    changeset: 8419:cbc08a4a4735
    tag: tip
    user: Hans Nordeback hans.nordeback@ericsson.com
    date: Wed Dec 07 07:33:12 2016 +0100
    files: osaf/services/infrastructure/fm/fms/fm_cb.h osaf/services/infrastructure/fm/fms/fm_evt.h osaf/services/infrastructure/fm/fms/fm_main.c osaf/services/infrastructure/fm/fms/fm_mds.c
    description:
    fm: Add support for differentiating a hung node versus a stopped node V3 [#2160]

     

    Related

    Tickets: #2160

  • Hans Nordebäck

    Hans Nordebäck - 2017-03-01

    The solution implemented is the following:
    When FM is stopped with opensafd stop, it sends a GFM_GFM_EVT_PEER_IS_TERMINATING message to its peer. This lets the peer differentiate between a hung OpenSAF node and a "controlled" shutdown of OpenSAF.

     
