JGroups / Discussion / Open Discussion: Joining node is elected as coordinator. Looping back of JOIN Request

Mithil Aggarwal - 2016-11-17

I am using jgroups-2.6.15.GA.jar.
I have two nodes . The first node (192.168.0.74:7801) is up and is the coordinator. The second node (192.168.0.73:7801) now wants to join the cluster . So when node 192.168.0.73:7801 is brought up, it sends the GET_MBRS_REQ to the coordinator.

Log Snippet of the request sent by the node 192.168.0.73:7801 to join the cluster
28 Oct 2016 16:12:51,964 [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]

The coordinator node(192.168.0.74:7801) receives this message and sends the GET_MBRS_RSP to the joining node 192.168.0.73:7801

Logs of the coordinator for receiving request and sending response

28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]

28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]]

Here, it is the highlighted that the Coordinator node instead of sending its addresss in coord_addr is sending the joining node's address in the GET_MBRS_RSP response.

The joining node(192.168.0.73:7801) receives this message and sends the JOIN request to itself and this results in looping of this request and system never comes up.

Logs snippet of the joining node

28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver [192.168.0.73:18344 - 192.168.0.74:7801],dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]

28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] election results: {192.168.0.73:7801=1}
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to 192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers), size=0 bytes, flags=OOB]
28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst: 192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]

My question is why the joining node is being elected as the coordinator? How we can fix this issue?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Bela Ban - 2016-11-17
  
  Yes, this is clearly incorrect.
  
  However, I don't support such an old version, as this is a bug that was
  likely fixed in later version (after 2.8). Could have been caused by
  reincarnation.
  
  I suggest upgrade to 3.6.x (latest stable).
  
  On 17/11/16 11:32, Mithil Aggarwal wrote:
  
  I am using jgroups-2.6.15.GA.jar.
  I have two nodes . The first node (192.168.0.74:7801) is up and is the
  coordinator. The second node (192.168.0.73:7801) now wants to join the
  cluster . So when node 192.168.0.73:7801 is brought up, it sends the
  GET_MBRS_REQ to the coordinator.
  
  Log Snippet of the request sent by the node 192.168.0.73:7801 to join
  the cluster
  28 Oct 2016 16:12:51,964
  [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending
  msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING:
  [PING: type=GET_MBRS_REQ, arg=null], TCP:
  [channel_name=dataChannelSAS_CLUSTER]
  
  The coordinator node(192.168.0.74:7801) receives this message and sends
  the GET_MBRS_RSP to the joining node 192.168.0.73:7801
  
  Logs of the coordinator for receiving request and sending response
  
  28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver
  [192.168.0.74:7801 -
  192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG
  [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2
  headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING:
  type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
  
  28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver
  [192.168.0.74:7801 -
  192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG
  [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response
  [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801,
  /coord_addr=192.168.0.73:7801/, is_server=true]]
  
  Here, it is the highlighted that the Coordinator node instead of
  sending its addresss in coord_addr is sending the joining node's address
  in the GET_MBRS_RSP response.
  
  The joining node(192.168.0.73:7801) receives this message and sends the
  JOIN request to itself and this results in looping of this request and
  system never comes up.
  
  Logs snippet of the joining node
  
  28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver
  [192.168.0.73:18344 -
  192.168.0.74:7801],dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG
  [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801,
  coord_addr=192.168.0.73:7801, is_server=true]
  
  28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS]election results:
  {192.168.0.73:7801=1}
  28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending
  handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
  28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to
  192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS:
  GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP:
  [channel_name=dataChannelSAS_CLUSTER]
  28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back
  message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers),
  size=0 bytes, flags=OOB]
  28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst:
  192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes,
  flags=OOB], headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801,
  TCP: [channel_name=dataChannelSAS_CLUSTER]
  
  My question is why the joining node is being elected as the coordinator?
  How we can fix this issue?
  
  Joining node is elected as coordinator. Looping back of JOIN Request
  https://sourceforge.net/p/javagroups/discussion/18794/thread/8bd40cf3/?limit=25#3462
  
  Sent from sourceforge.net because you indicated interest in
  https://sourceforge.net/p/javagroups/discussion/18794/
  
  To unsubscribe from further messages, please visit
  https://sourceforge.net/auth/subscriptions/
  
  --
  Bela Ban, JGroups lead (http://www.jgroups.org)
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Mithil Aggarwal - 2016-11-17
    
    Thanks for your response.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

reeta aggarwal - 2019-02-11

Hi, i am using jgroups jgroups-3.5.0.Final.jar . when i am rebooting by secondary node . my system gets shutdown after some time. there are few messages as shown below in my server logs .there is ambious address coming up after reboot which is causing the issue.

DEBUG [NAKACK2] LABECSAP01-60365: removed 766961e5-ebd6-5e5f-6e27-e9321d827688 from xmit_table (not member anymore)

[TransferQueueBundler,controlChannelSAS_CLUSTER,LABECSAP01-60365] WARN [TCP] JGRP000032: LABECSAP01-60365: no physical address for 766961e5-ebd6-5e5f-6e27-e9321d827688, dropping message

After server reboot this digest was added by NACK2 where is this ambious address:
[main] DEBUG [NAKACK2]
[LABECSAP01-60365 setDigest()]
existing digest: []
new digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]
resulting digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]

Please help.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Joining node is elected as coordinator. Looping back of JOIN Request

Forums

Help

Joining node is elected as coordinator. Looping back of JOIN Request

Joining node is elected as coordinator. Looping back of JOIN Request

Forums

Help

Joining node is elected as coordinator. Looping back of JOIN Request document.SUBSCRIPTION_OPTIONS = { "thing": "topic", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Joining node is elected as coordinator. Looping back of JOIN Request