Menu

Joining node is elected as coordinator. Looping back of JOIN Request

2016-11-17
2019-02-11
  • Mithil Aggarwal

    Mithil Aggarwal - 2016-11-17

    I am using jgroups-2.6.15.GA.jar.
    I have two nodes . The first node (192.168.0.74:7801) is up and is the coordinator. The second node (192.168.0.73:7801) now wants to join the cluster . So when node 192.168.0.73:7801 is brought up, it sends the GET_MBRS_REQ to the coordinator.

    Log Snippet of the request sent by the node 192.168.0.73:7801 to join the cluster
    28 Oct 2016 16:12:51,964 [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]

    The coordinator node(192.168.0.74:7801) receives this message and sends the GET_MBRS_RSP to the joining node 192.168.0.73:7801

    Logs of the coordinator for receiving request and sending response

    28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are TCPPING: [PING: type=GET_MBRS_REQ, arg=null], TCP: [channel_name=dataChannelSAS_CLUSTER]

    28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver [192.168.0.74:7801 - 192.168.0.73:18344],dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]]

    Here, it is the highlighted that the Coordinator node instead of sending its addresss in coord_addr is sending the joining node's address in the GET_MBRS_RSP response.

    The joining node(192.168.0.73:7801) receives this message and sends the JOIN request to itself and this results in looping of this request and system never comes up.

    Logs snippet of the joining node

    28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver [192.168.0.73:18344 - 192.168.0.74:7801],dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801, coord_addr=192.168.0.73:7801, is_server=true]

    28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] election results: {192.168.0.73:7801=1}
    28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
    28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to 192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]
    28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers), size=0 bytes, flags=OOB]
    28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst: 192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes, flags=OOB], headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP: [channel_name=dataChannelSAS_CLUSTER]

    My question is why the joining node is being elected as the coordinator? How we can fix this issue?

     
    • Bela Ban

      Bela Ban - 2016-11-17

      Yes, this is clearly incorrect.

      However, I don't support such an old version, as this is a bug that was
      likely fixed in later version (after 2.8). Could have been caused by
      reincarnation.

      I suggest upgrade to 3.6.x (latest stable).

      On 17/11/16 11:32, Mithil Aggarwal wrote:

      I am using jgroups-2.6.15.GA.jar.
      I have two nodes . The first node (192.168.0.74:7801) is up and is the
      coordinator. The second node (192.168.0.73:7801) now wants to join the
      cluster . So when node 192.168.0.73:7801 is brought up, it sends the
      GET_MBRS_REQ to the coordinator.

      Log Snippet of the request sent by the node 192.168.0.73:7801 to join
      the cluster
      28 Oct 2016 16:12:51,964
      [Timer-1,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG [TCP] sending
      msg to 192.168.0.74:7801, src=192.168.0.73:7801, headers are TCPPING:
      [PING: type=GET_MBRS_REQ, arg=null], TCP:
      [channel_name=dataChannelSAS_CLUSTER]

      The coordinator node(192.168.0.74:7801) receives this message and sends
      the GET_MBRS_RSP to the joining node 192.168.0.73:7801

      Logs of the coordinator for receiving request and sending response

      28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver
      [192.168.0.74:7801 -
      192.168.0.73:18344]
      ,dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG
      [TCP] message is [dst: 192.168.0.74:7801, src: 192.168.0.73:7801 (2
      headers), size=0 bytes, flags=OOB]
      , headers are TCPPING: [PING:
      type=GET_MBRS_REQ, arg=null]
      , TCP: [channel_name=dataChannelSAS_CLUSTER]

      28 Oct 2016 16:12:51,965 [ConnectionTable.Connection.Receiver
      [192.168.0.74:7801 -
      192.168.0.73:18344]
      ,dataChannelSAS_CLUSTER,192.168.0.74:7801] DEBUG
      [TCPPING] received GET_MBRS_REQ from 192.168.0.73:7801, sending response
      [PING: type=GET_MBRS_RSP, arg=[own_addr=192.168.0.74:7801,
      /coord_addr=192.168.0.73:7801/, is_server=true]
      ]

      Here, it is the highlighted that the Coordinator node instead of
      sending its addresss in coord_addr is sending the joining node's address
      in the GET_MBRS_RSP response.

      The joining node(192.168.0.73:7801) receives this message and sends the
      JOIN request to itself and this results in looping of this request and
      system never comes up.

      Logs snippet of the joining node

      28 Oct 2016 16:12:53,479 [ConnectionTable.Connection.Receiver
      [192.168.0.73:18344 -
      192.168.0.74:7801]
      ,dataChannelSAS_CLUSTER,192.168.0.73:7801] DEBUG
      [TCPPING] received GET_MBRS_RSP, rsp=[own_addr=192.168.0.74:7801,
      coord_addr=192.168.0.73:7801, is_server=true]

      28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS]election results:
      {192.168.0.73:7801=1}

      28 Oct 2016 16:12:54,960 [Main Thread] DEBUG [GMS] sending
      handleJoin(192.168.0.73:7801) to 192.168.0.73:7801
      28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] sending msg to
      192.168.0.73:7801, src=192.168.0.73:7801, headers are GMS:
      GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801, TCP:
      [channel_name=dataChannelSAS_CLUSTER]
      28 Oct 2016 16:12:54,961 [Main Thread] DEBUG [TCP] **looping back
      message [dst: 192.168.0.73:7801, src: 192.168.0.73:7801** (2 headers),
      size=0 bytes, flags=OOB]

      28 Oct 2016 16:12:54,962 [Main Thread] DEBUG [TCP] message is [dst:
      192.168.0.73:7801, src: 192.168.0.73:7801 (2 headers), size=0 bytes,
      flags=OOB]
      , headers are GMS: GmsHeader[JOIN_REQ]: mbr=192.168.0.73:7801,
      TCP: [channel_name=dataChannelSAS_CLUSTER]

      My question is why the joining node is being elected as the coordinator?
      How we can fix this issue?


      Joining node is elected as coordinator. Looping back of JOIN Request
      https://sourceforge.net/p/javagroups/discussion/18794/thread/8bd40cf3/?limit=25#3462


      Sent from sourceforge.net because you indicated interest in
      https://sourceforge.net/p/javagroups/discussion/18794/

      To unsubscribe from further messages, please visit
      https://sourceforge.net/auth/subscriptions/

      --
      Bela Ban, JGroups lead (http://www.jgroups.org)

       
      • Mithil Aggarwal

        Mithil Aggarwal - 2016-11-17

        Thanks for your response.

         
  • reeta aggarwal

    reeta aggarwal - 2019-02-11

    Hi, i am using jgroups jgroups-3.5.0.Final.jar . when i am rebooting by secondary node . my system gets shutdown after some time. there are few messages as shown below in my server logs .there is ambious address coming up after reboot which is causing the issue.

    DEBUG [NAKACK2] LABECSAP01-60365: removed 766961e5-ebd6-5e5f-6e27-e9321d827688 from xmit_table (not member anymore)

    [TransferQueueBundler,controlChannelSAS_CLUSTER,LABECSAP01-60365] WARN [TCP] JGRP000032: LABECSAP01-60365: no physical address for 766961e5-ebd6-5e5f-6e27-e9321d827688, dropping message

    After server reboot this digest was added by NACK2 where is this ambious address:
    [main] DEBUG [NAKACK2]
    [LABECSAP01-60365 setDigest()]
    existing digest: []
    new digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]
    resulting digest: LABECSAP00-50260: [11 (11)], 766961e5-ebd6-5e5f-6e27-e9321d827688: [5 (5)], LABECSAP01-60365: [0 (0)]

    Please help.

     

Log in to post a comment.