Re: [jgroups-users] Race condition in tcpgossip discovery?
Brought to you by:
belaban
From: Bela B. <be...@ya...> - 2012-12-27 17:24:22
|
Can you create a JIRA and attach the 2 logs to it ? On 12/15/12 1:11 AM, Ramakrishna Kandula wrote: > JGroups experts, > > We are observing a race condition in TCPGOSSIP based discovery. We have seen > this a few times but have not been able to determine how it gets into this > condition. When it does get into this condition, here is the behavior. > > A is gossip router and B is a node connecting to gossip router. > > Timer-1 on B starts connecting, does ping. > Timer-2 on B starts reconnecting > A sees this new connection and closes the old one from Timer-1 from B. > Timer-2 on B does ping > Timer-1 on B tries to do gossip_get and encounters broken pipe as the socket > is closed. > > This process goes on forever. Can you suggest how to avoid this? Timer-1 and > Timer-2 are jgroups timers and our code has no control over them. > > Thanks, > ramky > > > Logs from Gossip Router A > ========================= > > 2012-12-14 13:20:47,687 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter- > Accepted connection, socket is > Socket[addr=/192.168.1.11,port=21009,localport=12001] > 2012-12-14 13:20:47,687 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering > receive loop > 2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received > CONNECT(group=TestCluster, addr=dd-ace1-20957, > logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800) > 2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer: > /192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for > addr[dd-ace1-20957]. Closing old connection ... > 2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] connection handshake completed, added > dd-ace1-20957 to group TestCluster > 2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster, > addr=dd-ace1-20957, logical_name=dd-ace1-20957, > physical_addrs=192.168.1.11:7800) > 2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] received PING(group=null, addr=null) > 2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] processed PING(group=null, addr=null) > 2012-12-14 13:20:53,735 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] received CONNECT(group=TestCluster, > addr=dd-ace1-20957, logical_name=dd-ace1-20957, > physical_addrs=192.168.1.11:7800) > 2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer: > /192.168.1.11, logical_addrs: dd-ace1-20957]] for addr[dd-ace1-20957]. Closing > old connection ... > 2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] is being closed > 2012-12-14 13:20:53,838 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957, dd-ace1-20957] connection handshake completed, > added dd-ace1-20957 to group TestCluster > 2012-12-14 13:20:53,838 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957, dd-ace1-20957] processed > CONNECT(group=TestCluster, addr=dd-ace1-20957, > logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800) > 2012-12-14 13:21:04,023 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter- > Accepted connection, socket is > Socket[addr=/192.168.1.11,port=37357,localport=12001] > 2012-12-14 13:21:04,023 DEBUG [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering > receive loop > 2012-12-14 13:21:04,085 TRACE [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received > CONNECT(group=TestCluster, addr=dd-ace1-20957, > logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800) > 2012-12-14 13:21:04,085 DEBUG [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer: > /192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for > addr[dd-ace1-20957]. Closing old connection ... > 2012-12-14 13:21:04,086 DEBUG [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] connection handshake completed, added > dd-ace1-20957 to group TestCluster > 2012-12-14 13:21:04,086 TRACE [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster, > addr=dd-ace1-20957, logical_name=dd-ace1-20957, > physical_addrs=192.168.1.11:7800) > > Logs from Gossip Client B > ========================= > > 2012-12-14 13:20:47,687 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter- > Accepted connection, socket is > Socket[addr=/192.168.1.11,port=21009,localport=12001] > 2012-12-14 13:20:47,687 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering > receive loop > 2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received > CONNECT(group=TestCluster, addr=dd-ace1-20957, > logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800) > 2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer: > /192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for > addr[dd-ace1-20957]. Closing old connection ... > 2012-12-14 13:20:47,749 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] connection handshake completed, added > dd-ace1-20957 to group TestCluster > 2012-12-14 13:20:47,749 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster, > addr=dd-ace1-20957, logical_name=dd-ace1-20957, > physical_addrs=192.168.1.11:7800) > 2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] received PING(group=null, addr=null) > 2012-12-14 13:20:48,875 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] processed PING(group=null, addr=null) > 2012-12-14 13:20:53,735 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] received CONNECT(group=TestCluster, > addr=dd-ace1-20957, logical_name=dd-ace1-20957, > physical_addrs=192.168.1.11:7800) > 2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer: > /192.168.1.11, logical_addrs: dd-ace1-20957]] for addr[dd-ace1-20957]. Closing > old connection ... > 2012-12-14 13:20:53,736 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] is being closed > 2012-12-14 13:20:53,838 DEBUG [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957, dd-ace1-20957] connection handshake completed, > added dd-ace1-20957 to group TestCluster > 2012-12-14 13:20:53,838 TRACE [gossip-handlers--9882][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957, dd-ace1-20957] processed > CONNECT(group=TestCluster, addr=dd-ace1-20957, > logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800) > 2012-12-14 13:21:04,023 DEBUG [GossipRouter][] org.jgroups.stack.GossipRouter- > Accepted connection, socket is > Socket[addr=/192.168.1.11,port=37357,localport=12001] > 2012-12-14 13:21:04,023 DEBUG [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] entering > receive loop > 2012-12-14 13:21:04,085 TRACE [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11] received > CONNECT(group=TestCluster, addr=dd-ace1-20957, > logical_name=dd-ace1-20957, physical_addrs=192.168.1.11:7800) > 2012-12-14 13:21:04,085 DEBUG [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- Found old connection[ConnectionHandler[peer: > /192.168.1.11, logical_addrs: dd-ace1-20957, dd-ace1-20957]] for > addr[dd-ace1-20957]. Closing old connection ... > 2012-12-14 13:21:04,086 DEBUG [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] connection handshake completed, added > dd-ace1-20957 to group TestCluster > 2012-12-14 13:21:04,086 TRACE [gossip-handlers--9881][] > org.jgroups.stack.GossipRouter- ConnectionHandler[peer: /192.168.1.11, > logical_addrs: dd-ace1-20957] processed CONNECT(group=TestCluster, > addr=dd-ace1-20957, logical_name=dd-ace1-20957, > physical_addrs=192.168.1.11:7800) > > > > ------------------------------------------------------------------------------ > LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial > Remotely access PCs and mobile devices and provide instant support > Improve your efficiency, and focus on delivering more value-add services > Discover what IT Professionals Know. Rescue delivers > http://p.sf.net/sfu/logmein_12329d2d > _______________________________________________ > javagroups-users mailing list > jav...@li... > https://lists.sourceforge.net/lists/listinfo/javagroups-users > -- Bela Ban, JGroups lead (http://www.jgroups.org) |