Re: [devel] Regarding FMS_RELAXED_NODE_PROMOTION in fmd.conf


Hi Thang Nguyen,

What are opensaf_consensus_lock and opensaf_write_test opensaf?

Why is only opensaf_consensus_lock always updated? I see that on reboot, opensaf_write_test opensaf is updated as well.

One more observation: when we set up this etcd, only the active node monitors it. The standby does not monitor it until its first switchover to active; after that, both nodes monitor etcd.

Regards,
Rakesh

________________________________
From: Thang Nguyen <tha...@en...>
Sent: Wednesday, July 17, 2024 13:48
To: Bangalore Ramesh, Rakesh <rak...@rb...>; ope...@li... <ope...@li...>
Subject: [EXTERNAL] RE: Regarding FMS_RELAXED_NODE_PROMOTION in fmd.conf

Hi Rakesh,

I assume your use case is invalid.

I suggest stopping/disabling the etcd nodes and testing again.

As for etcd, at least three etcd nodes are recommended.
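
The three-node recommendation follows from majority quorum: etcd needs more than half of its members to be reachable. A minimal sketch of that arithmetic (the function names are illustrative only, not part of etcd or OpenSAF):

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


for n in (1, 2, 3, 5):
    print(f"members={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

With three members the quorum is two, so one member may fail. Removing two of three members leaves a single member, which is below quorum.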

B.R/Thang

From: Bangalore Ramesh, Rakesh <rak...@rb...>
Sent: Wednesday, July 17, 2024 12:27 PM
To: Thang Nguyen <tha...@en...>; ope...@li...
Subject: Re: Regarding FMS_RELAXED_NODE_PROMOTION in fmd.conf

CAUTION - EXTERNAL EMAIL

Hi Thang Nguyen,

I have attached the related logs.

Scenario: I recreated the issue by deleting 2 of the 3 etcd nodes that make up the quorum we are using.

The fmd.conf file is as follows:

#########################################
# TARGET SYSTEM ARCHITECTURE
# Choices are:
#   ATCA (default)
#   HP_CCLASS
#   HP_PROLIANT
#########################################
export FM_TARGET_SYSTEM_ARCH="ATCA"

# Healthcheck keys
export FMS_HA_ENV_HEALTHCHECK_KEY="Default"

# Promote active timer
export FMS_PROMOTE_ACTIVE_TIMER=0

# To enable self fencing either comment the following line to get a default value of 10 seconds,
# or set an appropriate timeout value, (unit is milliseconds).
export FMS_NODE_ISOLATION_TIMEOUT=0

# To enable remote fencing change to 1
#export FMS_USE_REMOTE_FENCING=0

# To enable split brain prevention, change to 1
export FMS_SPLIT_BRAIN_PREVENTION=1

# Used with split brain prevention, this controls
# the expiration time of takeover requests (unit is seconds)
export FMS_TAKEOVER_REQUEST_VALID_TIME=20

# Full path to key-value store plugin
export FMS_KEYVALUE_STORE_PLUGIN_CMD=/opt/opensaf/osaf-etcd3.plugin

# In the event of SCs being split into network partitions, we can try to make
# the active SC reside in the largest network partition. If it is preferable
# to keep the current SC active, then set this to 0
# Default is 1
#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE=1

# If FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE is set to 1, wait until
# this number of seconds for MDS events before making a decision
# on partition size. Default is 4 seconds
#export FMS_TAKEOVER_PRIORITISE_PARTITION_SIZE_MDS_WAIT_TIME=4

# Default behaviour is not to allow promotion of this node to Active
# unless a lock can be obtained, if split brain prevention is enabled.
# Uncomment the next line to allow promotion of this node at cluster startup,
# if a peer SC can be seen and we have a lower node ID, in the event the
# consensus service is not available.
# Also if the consensus service is down, but a peer SC can be seen,
# then an active SC may remain active.
# This mode should not be used together with the roaming SC feature
# Default is 0
export FMS_RELAXED_NODE_PROMOTION=1

# FM will supervise transitions to the ACTIVE role when this variable is set to
# a non-zero value. The value is the time in the unit of 10 ms to wait for a
# role change to ACTIVE to take effect. If AMF has not give FM an active
# assignment within this time, the node will be rebooted.
#export FMS_ACTIVATION_SUPERVISION_TIMER=30000

# Uncomment the next line to enable info level logging
#args="--loglevel=info"

# Uncomment the next line to enable trace
#args="--tracemask=0xffffffff"

# Only log priority LOG_WARNING and higher to the system log file.
# All logging will be recorded in a new node local log file $PKGLOGDIR/osaf.log.
# Uncomment the next line to enable this service to log to OpenSAF node local log file.
# export OSAF_LOCAL_NODE_LOG=1

# THREAD_TRACE_BUFFER variable enables the tracing, writes the trace
# to thread based buffer in circular fashion. The trace buffers will
# be flushed to file if an abnormal end hits, i.e. LOG_ER is called
# The value of THREAD_TRACE_BUFFER indicates the number of trace strings
# in a buffer. The length of a trace string is at most 256 characters.
# It can be disabled if set THREAD_TRACE_BUFFER as 0, the maximum value
# can be set as 65535.
# export THREAD_TRACE_BUFFER=10240
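
Because the file mixes active `export` lines with commented-out `#export` defaults, it is easy to misread which values are in effect. A minimal sketch for listing only the active settings, assuming the file contains nothing more complex than the simple `export KEY=VALUE` lines shown in the listing (the helper name `effective_exports` and the shortened sample are illustrative, not part of OpenSAF):

```python
import re

# Matches an uncommented "export KEY=VALUE" line; "#export ..." does not match.
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def effective_exports(text: str) -> dict:
    """Return the variables that are actually exported by the given text."""
    values = {}
    for line in text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            # Drop surrounding whitespace and optional double quotes.
            values[match.group(1)] = match.group(2).strip().strip('"')
    return values


# Shortened sample taken from the listing above.
sample = """\
export FM_TARGET_SYSTEM_ARCH="ATCA"
#export FMS_USE_REMOTE_FENCING=0
export FMS_SPLIT_BRAIN_PREVENTION=1
export FMS_RELAXED_NODE_PROMOTION=1
"""

settings = effective_exports(sample)
for key in sorted(settings):
    print(f"{key}={settings[key]}")
```

For a real check, the contents of the node's own fmd.conf would be passed in instead of `sample`.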

Regards,
Rakesh

________________________________

From: Thang Nguyen <tha...@en...>
Sent: Wednesday, July 17, 2024 08:29
To: Bangalore Ramesh, Rakesh <rak...@rb...>; ope...@li... <ope...@li...>
Subject: [EXTERNAL] RE: Regarding FMS_RELAXED_NODE_PROMOTION in fmd.conf

Hi Rakesh,
From my understanding, if the following is exported in fmd.conf:
export FMS_RELAXED_NODE_PROMOTION=1

the SC will not be rebooted if it loses its connection to the etcd consensus service but can still see the peer SC.
Could you share the fmd.conf and the full syslog?
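
The behaviour described above, together with the comments in fmd.conf, can be read as a simple rule. The sketch below covers only the case where split brain prevention is enabled and the consensus service is unreachable. It illustrates that reading and is not how fmd is implemented; the function and parameter names are hypothetical.

```python
def active_sc_may_stay_active(relaxed_node_promotion: bool,
                              peer_sc_visible: bool) -> bool:
    """Reading of the fmd.conf comments for a consensus service outage.

    Assumes split brain prevention is enabled and the consensus service
    cannot be reached. Illustrative only, not OpenSAF code.
    """
    # Both conditions are needed: relaxed mode on, and the peer SC still seen.
    return relaxed_node_promotion and peer_sc_visible


for relaxed in (False, True):
    for peer in (False, True):
        result = active_sc_may_stay_active(relaxed, peer)
        print(f"relaxed={relaxed} peer_visible={peer} may_stay_active={result}")
```

Under this reading, enabling FMS_RELAXED_NODE_PROMOTION alone is not sufficient: the peer SC must also remain visible while the consensus service is down.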

B.R/
Thang D Nguyen

-----Original Message-----
From: Bangalore Ramesh, Rakesh <rak...@rb...>
Sent: Friday, July 12, 2024 1:49 PM
To: ope...@li...
Subject: [devel] Regarding FMS_RELAXED_NODE_PROMOTION in fmd.conf


Hi All,

I am trying to prevent OpenSAF from bringing the node down when there is no quorum in the etcd cluster.

I saw the following setting in fmd.conf. Even though I have enabled it,

# Default behaviour is not to allow promotion of this node to Active
# unless a lock can be obtained, if split brain prevention is enabled.
# Uncomment the next line to allow promotion of this node at cluster startup,
# if a peer SC can be seen and we have a lower node ID, in the event the
# consensus service is not available.
# Also if the consensus service is down, but a peer SC can be seen,
# then an active SC may remain active.
# This mode should not be used together with the roaming SC feature
# Default is 0
#export FMS_RELAXED_NODE_PROMOTION=0

OpenSAF fails with the error below:

Jul 11 23:24:15 rakeshgr01 osafamfd[6898]: NO (KeyValue::Execute): Executed '/opt/opensaf/osaf-etcd3.plugin unlock "SC-1"', returning 2
Jul 11 23:24:15 rakeshgr01 osafamfd[6898]: WA Unlock failed (6)
Jul 11 23:24:15 rakeshgr01 osafamfd[6898]: ER Failed to demote this node from consensus service
Jul 11 23:24:15 rakeshgr01 osafimmnd[5987]: WA Failed to retrieve search continuation, client died ?
Jul 11 23:24:15 rakeshgr01 osafimmnd[5987]: NO Implementer disconnected 4 <21, 2010f> (safAmfService)
Jul 11 23:24:15 rakeshgr01 osafimmnd[5987]: NO Implementer (applier) connected: 13 (@safAmfService2010f) <21, 2010f>
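
When going through a longer syslog, the key-value plugin calls can be picked out by their "Executed '...', returning N" pattern. A minimal sketch, assuming the message format shown in the excerpt above and assuming that a return value of 0 means success (the helper name `failed_plugin_calls` is illustrative, not part of OpenSAF):

```python
import re

# Captures the plugin command line and its return value from a syslog line.
PLUGIN_RE = re.compile(r"Executed '(?P<command>.+?)', returning (?P<rc>\d+)")


def failed_plugin_calls(lines):
    """Return (command, return_code) pairs for plugin calls that returned non-zero."""
    failures = []
    for line in lines:
        match = PLUGIN_RE.search(line)
        if match and int(match.group("rc")) != 0:
            failures.append((match.group("command"), int(match.group("rc"))))
    return failures


sample = [
    "Jul 11 23:24:15 rakeshgr01 osafamfd[6898]: NO (KeyValue::Execute): "
    "Executed '/opt/opensaf/osaf-etcd3.plugin unlock \"SC-1\"', returning 2",
    "Jul 11 23:24:15 rakeshgr01 osafamfd[6898]: WA Unlock failed (6)",
]

for command, rc in failed_plugin_calls(sample):
    print(f"rc={rc} command={command}")
```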

I need help with how to use this setting so that the instance does not reboot even if quorum is lost.

Regards,
Rakesh


_______________________________________________
Opensaf-devel mailing list
Ope...@li...
https://lists.sourceforge.net/lists/listinfo/opensaf-devel
