Re: [javagroups-users] Unable to recover from suspect/merge, with auto-reconnect
From: Matt M. <jbo...@ms...> - 2008-02-26 02:01:29
I really am using 2.6.1. I even looked at the source provided with the
2.6.1 release, and on line 261 I see the typo there:

    if(log.isDebugEnabled()) log.debug("Recevied Ack. is invalid (was from: " + hdr.from + "), ");

The FD source says it is version

    * @version $Id: FD.java,v 1.58 2007/07/27 11:00:58 belaban Exp $

This is the code I got from the 2.6.1 downloads. My log files even report
JGroups version 2.6.1 when the application starts up.

Could it be something else? Is the wrong FD version tagged into the 2.6.1
release somehow?

-- m@

> You're using 2.4.x, *not* 2.6.x ! The line
>
> [org.jgroups.protocols.FD] Recevied Ack. is invalid (was from:
> 172.16.172.233:19283)
>
> with the typo shows this.
>
> You indicated you were using 2.6.x below, if you don't use 2.6.x can you
> give it a try ?
>
> Matt Magoffin wrote:
>> Hello, I'm having an issue with a 2-machine cluster using a TCP stack
>> based on the tcp.xml from JGroups 2.6.1. On each machine I have 8
>> separate channels running, on different ports, with 4 groups in 2 JVM
>> instances.
>>
>> After some period of time, one machine will fail to respond to a FD
>> ping, and gets suspected. The machine that failed is not responding in
>> time, it seems, because of high CPU use, and many of the channels will
>> fail FD around the same time. The channels are configured with
>> auto-reconnect. My understanding was that the channel should "heal"
>> itself and eventually re-form into a new view with the same 2 members
>> in the cluster, which should apply to this situation because the
>> machine that failed to respond eventually will respond.
>>
>> However, the group does not always seem to "heal" (sometimes it does,
>> sometimes not). Once it stops healing, it never seems to ever do so
>> again, and I get tons of NAKACK "message X not found in retransmission
>> table" ERROR logs. The only way to get the channel working again is to
>> shut down the channel on both machines and then start them up again.
>>
>> I'm attaching a transcript from a log on one machine which highlights
>> this situation. Can you spot anything that looks wrong, perhaps in the
>> configuration (stock tcp.xml configuration)?
>>
>> Or, if I know my cluster size should always remain constant (2 members),
>> do you have any recommendations on changes to the stock tcp.xml
>> configuration that would work better?
>>
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss - a division of Red Hat
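
A note on the version question raised in this thread: one way to rule out a stray older jar on the classpath is to ask the JVM where it actually loaded the FD class from. The sketch below is only an illustration and uses nothing beyond the standard Java library. The class name `WhichJar` and the helper `locationOf` are hypothetical names introduced here, and running it inside the same JVM and class loader as the application is an assumption; in a container with several class loaders the answer may differ per loader.

```java
import java.security.CodeSource;

public class WhichJar {
    /**
     * Returns "className -> location", where location is the jar or directory
     * the class was loaded from, or a short note if that cannot be determined.
     */
    static String locationOf(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no CodeSource
            if (src == null || src.getLocation() == null) {
                return className + " -> (no code source; bootstrap class loader)";
            }
            return className + " -> " + src.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " -> (not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The class named in the log line quoted in this thread
        System.out.println(locationOf("org.jgroups.protocols.FD"));
    }
}
```

If the printed location is not the 2.6.1 jar that was downloaded, an older JGroups jar earlier on the classpath would be one possible explanation for the mismatch; if it is the 2.6.1 jar, the cause lies elsewhere.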