Re: [javagroups-users] Member Join reject issue
From: Bela B. <be...@ya...> - 2008-11-22 06:54:28
One possible solution is the following:

* Set Channel.AUTO_RECONNECT to false
* Implement ChannelListener and register it: extend ChannelListenerAdapter and implement channelShunned()
* Whenever you're shunned, close the channel
* Wait for some seconds
* In a loop:
  o open the channel
  o connect
  o If you get the security exception "already connected", sleep a bit (then continue with the loop)
  o else break out of the loop

The issue is most likely caused by reincarnation, which will disappear with logical addresses (in 2.8), or so I hope...

vivek sar wrote:
> Hello Bela,
>
> We have 3 nodes; on one of the nodes we see the following exception,
>
> 2008-11-17 13:10:15,427 ERROR [CloserThread] JChannel - failure reconnecting to channel, retrying
> org.jgroups.ChannelException: connect() failed
>     at org.jgroups.JChannel.connect(JChannel.java:373)
>     at org.jgroups.JChannel$CloserThread.run(JChannel.java:1908)
> Caused by: java.lang.SecurityException: member 10.94.40.27:4576 is already part of the group, JOIN request is rejected
>     at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:144)
>     at org.jgroups.protocols.pbcast.ClientGmsImpl.join(ClientGmsImpl.java:39)
>     at org.jgroups.protocols.pbcast.GMS.down(GMS.java:823)
>     at org.jgroups.protocols.FRAG2.down(FRAG2.java:158)
>
> After this, that node is not able to join the group at all. For some
> reason the failed node also doesn't get the updated view, so two nodes
> (A and B) have V3, whereas the failed node (C) has V2.
>
> I went through the GMS code and also read the comments in jira
> (https://jira.jboss.org/jira/browse/JGRP-130), but I'm still not sure why
> node C would never join the group. From what it seems, if
> GMS.reject_join_from_existing_member is true (default behavior),
> ClientGmsImpl.join would fail (throw SecurityException), but
> shouldn't it keep retrying until it joins the group?
>
> Is it safe to set GMS.reject_join_from_existing_member to false?
> Wouldn't that cause other re-incarnation problems? We do have FD_SOCK,
> so I'm not sure if we need to reject the existing member. In this
> particular case we didn't have FLUSH. Would adding FLUSH help, so all
> nodes get the same view?
>
> Also, what does "CloserThread" do? When does it generate the view
> change event?
>
> Here are the last few view change events when the error happened on
> Node A (coordinator) and Node C (where the error happened),
>
> Node A (10.4.11.13)
> ------------------------
> 2008-11-17 13:07:19,964 INFO [Incoming-1,PM-POS,10.4.11.13:4576]
> RpcServiceManager - viewAccepted()-> New View: [10.4.11.13:4576|3]
> [10.4.11.13:4576, 10.6.11.133:4576, 10.6.11.132:4576]
>
> 2008-11-17 13:10:14,031 INFO [CloserThread] RpcServiceManager -
> viewAccepted()-> New View: [10.4.11.13:4576|0] [10.4.11.13:4576]
>
> 2008-11-17 13:10:15,378 INFO [Incoming-1,PM-POS,10.4.11.13:4576]
> RpcServiceManager - viewAccepted()-> New View: [10.4.11.13:4576|1]
> [10.4.11.13:4576, 10.6.11.133:4576]
>
> 2008-11-17 13:10:18,829 INFO [Incoming-1,PM-POS,10.4.11.13:4576]
> RpcServiceManager - viewAccepted()-> New View: [10.4.11.13:4576|2]
> [10.4.11.13:4576, 10.6.11.133:4576, 10.6.11.132:4576]
>
> Node C (10.6.11.133)
> ----------------------------
> 2008-11-17 13:07:20,006 INFO [Incoming-2,PM-POS,10.6.11.133:4576]
> RpcServiceManager - viewAccepted()-> New View: [10.4.11.13:4576|3]
> [10.4.11.13:4576, 10.6.11.133:4576, 10.6.11.132:4576]
>
> ============> Looks like the view got reset (not sure how the view got
> reset), but Node C never got the reset view (Node A above has
> started with view 0 at this time)
>
> 2008-11-17 13:10:15,427 ERROR [CloserThread] JChannel - failure
> reconnecting to channel, retrying
> org.jgroups.ChannelException: connect() failed
>
> Thanks,
> -vivek
>

--
Bela Ban
Lead JGroups / Clustering Team
JBoss - a division of Red Hat
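
The reconnect steps listed at the top of this reply could be sketched roughly as follows. This is only an illustration under stated assumptions, not JGroups code: ClusterChannel is a hypothetical interface standing in for the few channel operations the loop needs, and reconnectAfterShun, isJoinRejected and the maxAttempts bound are made-up names and additions (the steps above describe an unbounded loop). Rethrowing any failure other than the rejected JOIN is also an assumption. In a real application the method would be triggered from channelShunned(), with Channel.AUTO_RECONNECT set to false.

```java
/** Sketch of a manual reconnect loop after being shunned. All names are hypothetical. */
final class ReconnectSketch {

    /** Hypothetical stand-in for the channel operations used by the loop. */
    interface ClusterChannel {
        void open() throws Exception;
        void connect(String clusterName) throws Exception;
        void close();
    }

    /** True if a SecurityException appears anywhere in the cause chain (the rejected JOIN). */
    static boolean isJoinRejected(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SecurityException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Closes the channel, waits, then retries open + connect until it succeeds.
     * Returns the number of attempts that were needed.
     * maxAttempts is an assumption added here so the loop cannot spin forever.
     */
    static int reconnectAfterShun(ClusterChannel ch, String clusterName,
                                  long initialWaitMs, long retrySleepMs,
                                  int maxAttempts) throws Exception {
        ch.close();                       // "Whenever you're shunned, close the channel"
        Thread.sleep(initialWaitMs);      // "Wait for some seconds"
        for (int attempt = 1; ; attempt++) {
            try {
                ch.open();
                ch.connect(clusterName);
                return attempt;           // connected: break out of the loop
            } catch (Exception e) {
                // Only the rejected JOIN is retried; anything else is rethrown (assumption).
                if (!isJoinRejected(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(retrySleepMs); // "sleep a bit (then continue with the loop)"
            }
        }
    }
}
```

An adapter that maps ClusterChannel onto a real JChannel is left out on purpose; how open(), connect() and close() behave after a failed connect should be checked against the JGroups version in use.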