Re: [jgroups-users] Using Gossip Router but all 4 nodes do not communicate with each other
From: Bela B. <be...@ya...> - 2014-06-13 10:35:37
On 13/06/14 11:46, pw wrote:
> Hi,
>
> I have 4 nodes in the cluster named A, B, C and H and each node runs on a
> separate machine. I use gossip router for connecting since 2 machines are on
> one subdomain and the other two on another subdomain.

As an alternative, you could use TCP and TCPPING and list all 4 hosts in
TCPPING.initial_hosts.

> On each machine a gossip router is started and below is the gossiprouter.xml

Why? I think this is overkill, as each node needs to register with all 4
GossipRouters. Why not just 1 GR per subnet? Or use TCP:TCPPING as I
mentioned above.

> <config xmlns="urn:org:jgroups"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>         xsi:schemaLocation="urn:org:jgroups
>         http://www.jgroups.org/schema/JGroups-3.3.xsd">
>
>   <TUNNEL
>     gossip_router_hosts="${jgroups.tunnel.gossip_router_hosts:ltlnxp1u.nroot.com[12003],ltlnxp2u.nroot.com[12003],inflmp3d.test.nroot.net[12003],
>     inflmp5d.test.nroot.net[12003]}"/>
>   <PING num_initial_members="1" num_initial_srv_members="2"
>     force_sending_discovery_rsps="true" timeout="6000"/>
>   <MERGE2/>
>   <FD/>
>   <VERIFY_SUSPECT/>
>   <pbcast.NAKACK2 use_mcast_xmit="false"/>
>   <UNICAST3/>
>   <pbcast.STABLE/>
>   <pbcast.GMS/>
>   <UFC/>
>   <MFC/>
>   <FRAG2/>
>   <pbcast.STATE_TRANSFER/>
>   <pbcast.FLUSH timeout="2000"/>
> </config>
>
> The 4 nodes are able to form a cluster, but when I check the jgroups debug
> logs, it appears that one node is communicating with only 2 other nodes
> instead of 3 nodes.

What do you mean? Did you check that every node shows a view of {A,B,C,H}
and that they all have the same view-id? If so, everything should be fine.

It looks like you're referring to the FD communication below. This is
ring-structured, e.g. A sends heartbeats to B, who sends heartbeats to
C --> H --> back to A. So the communication you're seeing (at least for FD)
is OK.
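The ring can be modelled in a few lines. This is a simplified Java sketch, not JGroups code: the class and method names are hypothetical, and it assumes each member sends are-you-alive messages to the member that follows it in the view (wrapping around at the end), ignoring suspected members. Applied to the view printed in the logs quoted below, [LDP-C, LDP-A, LDP-B, LDP-H], it yields C --> A --> B --> H --> C, matching the FD lines. If the OOB "sent a message" lines are the heartbeat acks going back to each member's predecessor (an assumption), then every node talks to exactly two others, which is what the logs show.

```java
import java.util.List;

// Hypothetical model of FD's ring-shaped heartbeating (simplified: no suspects).
public class FdRing {

    // Member that `self` sends are-you-alive messages to: its successor in the view.
    public static String heartbeatTarget(List<String> view, String self) {
        int i = view.indexOf(self);
        if (i < 0) throw new IllegalArgumentException(self + " is not in the view");
        return view.get((i + 1) % view.size());
    }

    // Member that pings `self`: its predecessor in the view.
    // Assumption: `self` answers this member with a heartbeat ack.
    public static String heartbeatSource(List<String> view, String self) {
        int i = view.indexOf(self);
        if (i < 0) throw new IllegalArgumentException(self + " is not in the view");
        return view.get((i - 1 + view.size()) % view.size());
    }

    public static void main(String[] args) {
        // View order as printed in the logs
        List<String> view = List.of("LDP-C", "LDP-A", "LDP-B", "LDP-H");
        for (String member : view) {
            System.out.println(member + ": pings " + heartbeatTarget(view, member)
                    + ", acks " + heartbeatSource(view, member));
        }
    }
}
```

Each member ends up with one outgoing and one incoming heartbeat peer, so in a 4-node ring it never exchanges FD traffic with the member opposite it; the other protocols still reach all members.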
> Below are the JGroups logs:
> Cluster is formed: [LDP-C, LDP-A, LDP-B, LDP-H]
>
> Node A:
> [TUNNEL::OOB-1,LOCALTEST_MyCluster,LDP-A] - sent a message to LDP-C, GR used ltlnxp2u.nroot.com/182.124.212.169:12003
> [FD::Timer-4,LOCALTEST_MyCluster,LDP-A] - LDP-A: sending are-you-alive msg to LDP-B
> [TUNNEL::Timer-4,LOCALTEST_MyCluster,LDP-A] - sent a message to LDP-B, GR used ltlnxp1u.nroot.com/182.124.212.170:12003
>
> Node B:
> [FD::Timer-2,LOCALTEST_MyCluster,LDP-B] - LDP-B: sending are-you-alive msg to LDP-H
> [TUNNEL::Timer-2,LOCALTEST_MyCluster,LDP-B] - sent a message to LDP-H, GR used inflmp3d.test.nroot.net/189.187.177.58:12003
> [TUNNEL::OOB-2,LOCALTEST_MyCluster,LDP-B] - sent a message to LDP-A, GR used inflmp5d.test.nroot.net/189.187.177.60:12003
>
> Node C:
> [FD::Timer-2,LOCALTEST_MyCluster,LDP-C] - LDP-C: sending are-you-alive msg to LDP-A
> [TUNNEL::Timer-2,LOCALTEST_MyCluster,LDP-C] - sent a message to LDP-A, GR used ltlnxp2u.nroot.com/182.124.212.169:12003
> [TUNNEL::OOB-1,LOCALTEST_MyCluster,LDP-C] - sent a message to LDP-H, GR used inflmp3d.test.nroot.net/189.187.177.58:12003
>
> Node H:
> [FD::Timer-2,LOCALTEST_MyCluster,LDP-H] - LDP-H: sending are-you-alive msg to LDP-C
> [TUNNEL::Timer-2,LOCALTEST_MyCluster,LDP-H] - sent a message to LDP-C, GR used inflmp3d.test.nroot.net/189.187.177.58:12003
> [TUNNEL::OOB-1,LOCALTEST_MyCluster,LDP-H] - sent a message to LDP-B, GR used inflmp3d.test.nroot.net/189.187.177.58:12003
>
> What is the reason for this? How can this be fixed?
>
> I also did a small test by running the 4 nodes on one machine and running a
> single gossip router on that machine. Once again I observed that only 3
> nodes communicate with each other.
>
> All help will be appreciated.
>
> --
> View this message in context: http://jgroups.1086181.n5.nabble.com/Using-Gossip-Router-but-all-4-nodes-do-not-communicate-with-each-other-tp10259.html

-- 
Bela Ban, JGroups lead (http://www.jgroups.org)
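The TCP:TCPPING alternative suggested at the top of the reply needs every member listed in TCPPING.initial_hosts, written in the same host[port] notation that gossip_router_hosts uses. The hypothetical Java helper below only builds that comma-separated list. It assumes the four nodes run on the four GossipRouter machines named in the config and that TCP listens on port 7800; both are assumptions, so substitute your real hosts and your TCP bind_port.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: formats hosts as a comma-separated host[port] list,
// the notation used by TCPPING.initial_hosts.
public class InitialHosts {

    public static String format(List<String> hosts, int port) {
        return hosts.stream()
                .map(h -> h + "[" + port + "]")
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumption: nodes share the GossipRouter machines; 7800 is an assumed TCP port
        List<String> hosts = List.of(
                "ltlnxp1u.nroot.com", "ltlnxp2u.nroot.com",
                "inflmp3d.test.nroot.net", "inflmp5d.test.nroot.net");
        System.out.println("<TCPPING initial_hosts=\"" + format(hosts, 7800) + "\"/>");
    }
}
```

With a static list like this, every node contacts the others directly over TCP, so no GossipRouter process is needed at all.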