I am using jgroups-2.6.15.GA.jar.
I have two nodes . The first node (192.168.0.74:7801) is up and is the coordinator. The second node (192.168.0.73:7801) now wants to join the cluster . So when node 192.168.0.73:7801 is brought up, it sends the GET_MBRS_REQ to the coordinator.
Log Snippet of the request sent by the node 192.168.0.73:7801 to join the cluster
28 Oct 2016 16:12:51,964 [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
The coordinator node(192.168.0.74:7801) receives this message and sends the GET_MBRS_RSP to the joining node 192.168.0.73:7801
Logs of the coordinator for receiving request and sending response
28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]]
Here, it is the highlighted that the Coordinator node instead of sending its addresss in coord_addr is sending the joining node's address in the GET_MBRS_RSP response.
The joining node(192.168.0.73:7801) receives this message and sends the JOIN request to itself and this results in looping of this request and system never comes up.
Logs snippet of the joining node
28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver [192.168.0.73:18344 - 192.168.0.74:7801],dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] election results: {192.168.0.73:7801=1}
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to 192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers), size=0 bytes, flags=OOB]
28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst: 192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
My question is why the joining node is being elected as the coordinator? How we can fix this issue?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
However, I don't support such an old version, as this is a bug that was
likely fixed in later version (after 2.8). Could have been caused by
reincarnation.
I suggest upgrade to 3.6.x (latest stable).
On 17/11/16 11:32, Mithil Aggarwal wrote:
I am using jgroups-2.6.15.GA.jar.
I have two nodes . The first node (192.168.0.74:7801) is up and is the
coordinator. The second node (192.168.0.73:7801) now wants to join the
cluster . So when node 192.168.0.73:7801 is brought up, it sends the
GET_MBRS_REQ to the coordinator.
Log Snippet of the request sent by the node 192.168.0.73:7801 to join
the cluster
28 Oct 2016 16:12:51,964 [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending
msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
The coordinator node(192.168.0.74:7801) receives this message and sends
the GET_MBRS_RSP to the joining node 192.168.0.73:7801
Logs of the coordinator for receiving request and sending response
28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver
[192.168.0.74:7801 -
192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2
headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING:
type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver
[192.168.0.74:7801 -
192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801, /coord_addr=192.168.0.73:7801/, is_server=true]]
Here, it is the highlighted that the Coordinator node instead of
sending its addresss in coord_addr is sending the joining node's address
in the GET_MBRS_RSP response.
The joining node(192.168.0.73:7801) receives this message and sends the
JOIN request to itself and this results in looping of this request and
system never comes up.
Logs snippet of the joining node
28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver
[192.168.0.73:18344 -
192.168.0.74:7801],dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS]election results:
{192.168.0.73:7801=1}
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending
handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to
192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS:
GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back
message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers),
size=0 bytes, flags=OOB]
28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst:
192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes,
flags=OOB], headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801,
TCP: [channel_name=dataChannelSAS_CLUSTER]
My question is why the joining node is being elected as the coordinator?
How we can fix this issue?
Hi, i am using jgroups jgroups-3.5.0.Final.jar . when i am rebooting by secondary node . my system gets shutdown after some time. there are few messages as shown below in my server logs .there is ambious address coming up after reboot which is causing the issue.
DEBUG [NAKACK2] LABECSAP01-60365: removed 766961e5-ebd6-5e5f-6e27-e9321d827688 from xmit_table (not member anymore)
[TransferQueueBundler,controlChannelSAS_CLUSTER,LABECSAP01-60365] WARN [TCP] JGRP000032: LABECSAP01-60365: no physical address for 766961e5-ebd6-5e5f-6e27-e9321d827688, dropping message
After server reboot this digest was added by NACK2 where is this ambious address: [main] DEBUG [NAKACK2] [LABECSAP01-60365 setDigest()]
existing digest: []
new digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]
resulting digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]
Please help.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I am using jgroups-2.6.15.GA.jar.
I have two nodes . The first node (192.168.0.74:7801) is up and is the coordinator. The second node (192.168.0.73:7801) now wants to join the cluster . So when node 192.168.0.73:7801 is brought up, it sends the GET_MBRS_REQ to the coordinator.
Log Snippet of the request sent by the node 192.168.0.73:7801 to join the cluster
28 Oct 2016 16:12:51,964 [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
The coordinator node(192.168.0.74:7801) receives this message and sends the GET_MBRS_RSP to the joining node 192.168.0.73:7801
Logs of the coordinator for receiving request and sending response
28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]]
Here, it is the highlighted that the Coordinator node instead of sending its addresss in coord_addr is sending the joining node's address in the GET_MBRS_RSP response.
The joining node(192.168.0.73:7801) receives this message and sends the JOIN request to itself and this results in looping of this request and system never comes up.
Logs snippet of the joining node
28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver [192.168.0.73:18344 - 192.168.0.74:7801],dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] election results: {192.168.0.73:7801=1}
28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to 192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers), size=0 bytes, flags=OOB]
28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst: 192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
My question is why the joining node is being elected as the coordinator? How we can fix this issue?
Yes, this is clearly incorrect.
However, I don't support such an old version, as this is a bug that was
likely fixed in later version (after 2.8). Could have been caused by
reincarnation.
I suggest upgrade to 3.6.x (latest stable).
On 17/11/16 11:32, Mithil Aggarwal wrote:
--
Bela Ban, JGroups lead (http://www.jgroups.org)
Thanks for your response.
Hi, i am using jgroups jgroups-3.5.0.Final.jar . when i am rebooting by secondary node . my system gets shutdown after some time. there are few messages as shown below in my server logs .there is ambious address coming up after reboot which is causing the issue.
DEBUG [NAKACK2] LABECSAP01-60365: removed 766961e5-ebd6-5e5f-6e27-e9321d827688 from xmit_table (not member anymore)
[TransferQueueBundler,controlChannelSAS_CLUSTER,LABECSAP01-60365] WARN [TCP] JGRP000032: LABECSAP01-60365: no physical address for 766961e5-ebd6-5e5f-6e27-e9321d827688, dropping message
After server reboot this digest was added by NACK2 where is this ambious address:
[main] DEBUG [NAKACK2]
[LABECSAP01-60365 setDigest()]
existing digest: []
new digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]
resulting digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]
Please help.