#2522 dtm: if TCP_USER_TIMEOUT closes socket, no attempt is made to reconnect

Milestone: 5.17.11

Status: fixed

Owner: Alex Jones

Labels: None

Type: defect

Component: dtm

Part: d

Version: 5.1

Priority: major

Blocker: False

Updated: 2017-10-30

Created: 2017-07-06

Creator: Alex Jones

Private: No

If TCP is used as the transport and TCP_USER_TIMEOUT is also set, a node that leaves the cluster because of a brief network outage does not come back into the cluster automatically.

For example, if TCP_USER_TIMEOUT is set to 1500 ms and the network outage on the link lasts 2000 ms, the node never comes back into the cluster.

Anders Widell - 2017-07-20

I believe DTM sends broadcast (or multicast) messages on the network for a while after it has started, to discover other nodes on the network. It then stops doing this, which would explain why it fails to reconnect after a network disturbance.

A solution could be:
* The node with the lowest node_id will never stop broadcasting the discovery messages.
* A node which is connected with another node with a lower node_id will never broadcast discovery messages.
* The node with the lowest node_id will inform all the other connected nodes about the topology of the cluster - in particular, if a new node has appeared.
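
The first two rules of this proposal can be sketched as a single predicate. This is only an illustration of the idea in the comment above, not code from DTM or from the eventual fix; the function name and the array-of-peers representation are assumptions. It also assumes one reading of the rules: a node keeps broadcasting exactly when none of its currently connected peers has a lower node_id, so an isolated node treats itself as the lowest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the proposed rule, not DTM code.
 * Returns true if this node should keep broadcasting discovery messages:
 * it does so only while no connected peer has a lower node_id. */
bool should_broadcast_discovery(uint32_t self_id,
                                const uint32_t *connected_ids, size_t n)
{
    assert(connected_ids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (connected_ids[i] < self_id) {
            /* Connected to a lower node_id: leave discovery to that node. */
            return false;
        }
    }
    /* Lowest node_id among connected nodes, or isolated: keep broadcasting. */
    return true;
}
```

Under this reading, a node that loses all its connections after an outage resumes broadcasting on its own, which is the behaviour the ticket is missing.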

A V Mahesh (AVM) - 2017-07-21

I don't think Alex is talking about the initial discovery process (topology node discovery),
but anyhow we can configure a very big value of DTM_INI_DIS_TIMEOUT_SECS in dtm.conf to verify.

Alex Jones - 2017-08-09

If I set DTM_INI_DIS_TIMEOUT_SECS to 5000s the nodes do relearn each other and come back into the cluster.

Alex Jones - 2017-08-11

status: unassigned --> accepted

assigned_to: Alex Jones

Part: - --> d

Alex Jones - 2017-08-11

status: accepted --> review

Alex Jones - 2017-08-15

status: review --> fixed

Alex Jones - 2017-08-15

commit 3ac6c452d30d2814f1704af578617f2a90f439b7
Author: Alex Jones <alex.jones@genband.com>
Date: Tue Aug 15 11:36:41 2017 -0400
