Thread: [jgroups-dev] Continuous heartbeat messages without View change
|
From: Development i. <jav...@li...> - 2017-07-20 08:08:08
|
I am using JGroups version 4.0.2. I am testing with 4 nodes, where each node broadcasts the requests it receives. I use the Postman tool to send REST requests, and I send a REST request for a CRUD operation to node1. While the CRUD request was being processed, I disconnected node1 from the network. In that case, no view change was triggered in node1, and this in turn affects the application's behavior.

The following messages keep growing on node1:

DEBUG:org.jgroups.protocols.FD_ALL: haven't received a heartbeat from node3 for 4130760 ms, adding it to suspect list
WARN:org.jgroups.protocols.FD_ALL: suspecting [node2, node3, node4]

The following message keeps growing on the nodes other than node1:

WARN:org.jgroups.protocols.UDP: JGRP000032: 126.71: no physical address for node1, dropping message

The view change is only received after node1 is connected to the network again.

What is the maximum time interval / number of heartbeat messages? How can this be resolved or explained?

Thanks in advance :-)

--
View this message in context: http://jgroups.1086181.n5.nabble.com/Continuous-heartbeat-messages-without-View-change-tp11351.html
Sent from the JGroups - Dev mailing list archive at Nabble.com.
|
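As a rough, back-of-the-envelope answer to the timing question: FD_ALL suspects a member once nothing has been heard from it for `timeout` ms, and it checks the timestamps every `timeout_check_interval` ms; VERIFY_SUSPECT (which sits above FD_ALL in the configuration posted later in the thread, with `timeout="1500"`) then waits before passing the suspicion up to GMS. The sketch below assumes the FD_ALL defaults for 4.0.x are `interval=8000`, `timeout=40000` and `timeout_check_interval=2000` (the posted config uses `<FD_ALL/>` without attributes, so defaults would apply; please verify them against the documentation for your version). It ignores the time GMS needs to actually install the view. The class name `FdAllTiming` is hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Back-of-the-envelope FD_ALL timing. The FD_ALL values are ASSUMED defaults, not read from JGroups. */
public class FdAllTiming {
    // Assumed FD_ALL defaults for JGroups 4.0.x (verify for your version)
    public static final long INTERVAL_MS = 8_000;                // how often a heartbeat is multicast
    public static final long TIMEOUT_MS = 40_000;                // silence after which a member is suspected
    public static final long TIMEOUT_CHECK_INTERVAL_MS = 2_000;  // how often timestamps are checked
    // Taken from the posted config: <VERIFY_SUSPECT timeout="1500" />
    public static final long VERIFY_SUSPECT_TIMEOUT_MS = 1_500;

    /** Number of consecutive heartbeats a member can miss before its silence exceeds the timeout. */
    public static long missedHeartbeats() {
        return TIMEOUT_MS / INTERVAL_MS;
    }

    /** Upper bound from the last received heartbeat until VERIFY_SUSPECT passes the suspicion up. */
    public static long worstCaseSuspectDelayMs() {
        return TIMEOUT_MS + TIMEOUT_CHECK_INTERVAL_MS + VERIFY_SUSPECT_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        long logged = 4_130_760L; // silence reported in the FD_ALL log line
        System.out.println("missed heartbeats before suspicion: " + missedHeartbeats());
        System.out.println("worst-case suspect delay: " + worstCaseSuspectDelayMs() + " ms");
        System.out.println("logged silence: " + TimeUnit.MILLISECONDS.toMinutes(logged) + " min");
    }
}
```

Under these assumptions a member is suspected after about 5 missed heartbeats, i.e. within roughly 43.5 s, whereas the log reports more than an hour of silence. That gap suggests the suspicion is raised but never turned into a new view, rather than the timeout being too long.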
|
From: Development i. <jav...@li...> - 2017-07-20 08:19:35
|
Can you post your configuration?

--
Bela Ban | http://www.jgroups.org
|
|
From: Development i. <jav...@li...> - 2017-07-20 08:43:38
|
Protocol Stack:
<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
    <UDP
         mcast_port="${jgroups.udp.mcast_port:45588}"
         ip_ttl="4"
         tos="8"
         ucast_recv_buf_size="5M"
         ucast_send_buf_size="5M"
         mcast_recv_buf_size="5M"
         mcast_send_buf_size="5M"
         max_bundle_size="64K"
         enable_diagnostics="true"
         thread_naming_pattern="cl"
         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="30000"/>
    <PING />
    <MERGE3 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT timeout="1500" />
    <BARRIER />
    <pbcast.NAKACK2 xmit_interval="500"
                    xmit_table_num_rows="100"
                    xmit_table_msgs_per_row="2000"
                    xmit_table_max_compaction_time="30000"
                    use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3 xmit_interval="500"
              xmit_table_num_rows="100"
              xmit_table_msgs_per_row="2000"
              xmit_table_max_compaction_time="60000"
              conn_expiry_timeout="0"/>
    <pbcast.STABLE desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="2000"
                view_bundling="false"
                membership_change_policy="com.membership.CustomMembershipPolicy"
                max_bundling_time="50"/>
    <SEQUENCER />
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K" />
    <FORK />
    <RSVP resend_interval="2000" timeout="10000"/>
    <pbcast.STATE_TRANSFER />
</config>
Thanks in advance :-)
|
|
From: Development i. <jav...@li...> - 2017-07-20 12:24:48
|
The config looks fine to me, but "haven't received a heartbeat from node3
for 4130760 ms, adding it to suspect list" indicates that node1 never
installs the new view.

I noticed that you have a custom membership policy
("com.membership.CustomMembershipPolicy"). Is it correctly forming the
new view? Can you post that code?

How do you disconnect node1? Do you pull the cable (I assume)?
--
Bela Ban | http://www.jgroups.org
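To illustrate the diagnosis that node1 never installs the new view: as far as I understand FD_ALL, a member's last-heard timestamp is only discarded when a view without that member is installed. Until then, every periodic check finds the same stale timestamp, suspects the member again and reports an ever larger silence, which matches the growing "4130760 ms" in the log. The class below is a simplified, hypothetical model of that bookkeeping, not JGroups code; `HeartbeatTable`, `heartbeat`, `check` and `silenceMs` are invented names, and `viewAccepted` only mirrors the name of the JGroups callback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of FD_ALL-style timestamp bookkeeping (hypothetical, not JGroups code). */
public class HeartbeatTable {
    private final long timeoutMs;
    private final Map<String, Long> lastHeard = new HashMap<>();

    public HeartbeatTable(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** A heartbeat from a member refreshes its timestamp. */
    public void heartbeat(String member, long nowMs) {
        lastHeard.put(member, nowMs);
    }

    /** Periodic check: members silent for longer than the timeout, sorted by name. */
    public List<String> check(long nowMs) {
        List<String> suspects = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                suspects.add(e.getKey());
            }
        }
        Collections.sort(suspects);
        return suspects;
    }

    /** The "haven't received a heartbeat ... for N ms" value for a tracked member. */
    public long silenceMs(String member, long nowMs) {
        return nowMs - lastHeard.get(member);
    }

    /** Installing a view forgets members that are no longer part of it. */
    public void viewAccepted(Collection<String> members) {
        lastHeard.keySet().retainAll(members);
    }

    public static void main(String[] args) {
        HeartbeatTable fd = new HeartbeatTable(40_000);
        for (String m : Arrays.asList("node2", "node3", "node4")) {
            fd.heartbeat(m, 0); // last heartbeats before the cable is pulled at t=0
        }
        System.out.println(fd.check(60_000));               // [node2, node3, node4]
        System.out.println(fd.silenceMs("node3", 120_000)); // 120000, and it keeps growing
        fd.viewAccepted(Arrays.asList("node1"));            // a view without the others
        System.out.println(fd.check(180_000));              // []
    }
}
```

In this model the repeated suspicion only stops once `viewAccepted` is called with a membership that excludes the silent members, which is why the missing view change on node1 is the thing to investigate.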
|
|
From: Development i. <jav...@li...> - 2017-07-20 13:31:18
|
The custom membership policy forms the new view correctly; it looks like this:
import java.util.Collection;
import java.util.List;

import org.jgroups.Address;
import org.jgroups.Membership;
import org.jgroups.stack.MembershipChangePolicy;

public class CustomMembershipPolicy implements MembershipChangePolicy {

    // Regular view change: CustomAddress ("beefy") members first, then all others
    @Override
    public List<Address> getNewMembership(final Collection<Address> currentMembers,
                                          final Collection<Address> joiners,
                                          final Collection<Address> leavers,
                                          final Collection<Address> suspects) {
        Membership retval = new Membership();
        // add the beefy nodes from the current membership first
        for (Address addr : currentMembers) {
            if (addr instanceof CustomAddress) {
                retval.add(addr);
            }
        }
        // then from joiners
        for (Address addr : joiners) {
            if (addr instanceof CustomAddress) {
                retval.add(addr);
            }
        }
        // then add all non-beefy current nodes (Membership does not add duplicates)
        retval.add(currentMembers);
        // finally the non-beefy joiners
        retval.add(joiners);
        // drop members that left or are suspected
        retval.remove(leavers);
        retval.remove(suspects);
        return retval.getMembers();
    }

    // Merge: combine all subviews, again putting beefy members first
    @Override
    public List<Address> getNewMembership(final Collection<Collection<Address>> subviews) {
        Membership mbrs = new Membership();
        Membership retval = new Membership();
        for (Collection<Address> subview : subviews) {
            mbrs.add(subview);
        }
        for (Address addr : mbrs.getMembers()) {
            if (addr instanceof CustomAddress) {
                retval.add(addr);
            }
        }
        retval.add(mbrs.getMembers());
        return retval.getMembers();
    }
}
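One way to check that the policy's ordering cannot keep suspected members in the view is to model it without JGroups. The sketch below is a hypothetical, stdlib-only model: strings stand in for addresses, a `LinkedHashSet` stands in for `Membership` (ordered, no duplicates), and a predicate replaces `addr instanceof CustomAddress`. `MembershipOrderModel` and `newMembership` are invented names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Stdlib model of CustomMembershipPolicy's ordering; strings stand in for addresses (hypothetical). */
public class MembershipOrderModel {

    public static List<String> newMembership(Collection<String> current, Collection<String> joiners,
                                             Collection<String> leavers, Collection<String> suspects,
                                             Predicate<String> preferred) {
        // LinkedHashSet keeps insertion order and skips duplicates, like Membership.add()
        Set<String> result = new LinkedHashSet<>();
        current.stream().filter(preferred).forEach(result::add); // preferred current members
        joiners.stream().filter(preferred).forEach(result::add); // preferred joiners
        result.addAll(current);                                  // remaining current members
        result.addAll(joiners);                                  // remaining joiners
        result.removeAll(leavers);
        result.removeAll(suspects);
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for "addr instanceof CustomAddress": names starting upper-case
        Predicate<String> preferred = s -> Character.isUpperCase(s.charAt(0));
        List<String> view = newMembership(
                Arrays.asList("a", "B", "c"), Arrays.asList("D", "e"),
                Collections.emptyList(), Arrays.asList("c"), preferred);
        System.out.println(view); // [B, D, a, e]
    }
}
```

Because suspects are removed last, after all additions, a suspected member can never survive into the new list, regardless of its type.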
Yes, I disconnected node1's cable. [Note: unplugging the cable did not
cause the above issue every time.]
Thanks in advance :-)
|
|
From: Development i. <jav...@li...> - 2017-07-21 07:45:52
|
This looks OK to me. I tested this with your custom membership policy and everything worked fine. Pulling the cable has always worked and is supported by FD_ALL.

The only code I haven't seen yet is CustomAddress; is there any special wizardry in there?

If you try this with Draw (the org.jgroups.demos.Draw demo), using your config and custom membership policy, and things still don't work, then it must be the environment, e.g. firewalls, SELinux, NIC issues, etc. Have you tried this with Draw?

--
Bela Ban | http://www.jgroups.org
|
|
From: Development i. <jav...@li...> - 2017-07-21 10:50:06
|
CustomAddress looks like this:
import java.util.function.Supplier;

import org.jgroups.Address;
import org.jgroups.conf.ClassConfigurator;
import org.jgroups.stack.AddressGenerator;
import org.jgroups.util.NameCache;
import org.jgroups.util.UUID;

public class CustomAddress extends UUID implements AddressGenerator {

    // Register a magic number for this address class with JGroups' ClassConfigurator
    static {
        ClassConfigurator.add((short) 12545, CustomAddress.class);
    }

    public CustomAddress() {
        super();
    }

    public CustomAddress(long mostSigBits, long leastSigBits) {
        super(mostSigBits, leastSigBits);
    }

    protected CustomAddress(byte[] data) {
        super(data);
    }

    // Creates a random address and, if a name is given, registers it as the logical name
    public static CustomAddress randomUUID(String name) {
        CustomAddress retval = new CustomAddress(generateRandomBytes());
        if (name != null)
            NameCache.add(retval, name);
        return retval;
    }

    // Factory used to instantiate this class when an address is unmarshalled
    @Override
    public Supplier<? extends UUID> create() {
        return CustomAddress::new;
    }

    // AddressGenerator callback: creates this member's address, registering "master" as its name
    @Override
    public Address generateAddress() {
        return CustomAddress.randomUUID("master");
    }
}
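The static block and `create()` work together: as I understand it, `ClassConfigurator` maps the magic number 12545 to `CustomAddress`, so an address can be marshalled as a short id rather than a class name, and the receiving side uses the registered class's `create()` supplier to instantiate it before reading its fields. The sketch below models that idea with the standard library only; `MagicRegistry`, `register`, `idOf` and `create` are hypothetical names and make no claim about `ClassConfigurator`'s internals. It shows why the same id has to be registered, for the same class, on every member.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical model of a magic-number registry (not the real ClassConfigurator). */
public class MagicRegistry {
    private final Map<Short, Supplier<?>> factories = new HashMap<>();
    private final Map<Class<?>, Short> ids = new HashMap<>();

    /** Associates a magic number with a class and a factory for empty instances. */
    public <T> void register(short magic, Class<T> type, Supplier<? extends T> factory) {
        if (factories.containsKey(magic))
            throw new IllegalArgumentException("magic number " + magic + " already registered");
        factories.put(magic, factory);
        ids.put(type, magic);
    }

    /** Sender side: the short id written instead of the full class name. */
    public short idOf(Class<?> type) {
        Short id = ids.get(type);
        if (id == null)
            throw new IllegalStateException(type.getName() + " is not registered");
        return id;
    }

    /** Receiver side: create an empty instance whose fields are then read from the stream. */
    public Object create(short magic) {
        Supplier<?> factory = factories.get(magic);
        if (factory == null)
            throw new IllegalStateException("unknown magic number " + magic);
        return factory.get();
    }

    static class Base {}
    static class Custom extends Base {}

    public static void main(String[] args) {
        MagicRegistry reg = new MagicRegistry();
        reg.register((short) 12545, Custom.class, Custom::new);
        short id = reg.idOf(Custom.class);
        System.out.println(reg.create(id).getClass().getSimpleName()); // Custom
    }
}
```

A member that has not registered 12545 would fail in `create()` in this model, so the registration (here, the static initializer) must have run everywhere before such addresses arrive.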
This issue is not always reproducible. My config and custom membership
policy work fine, and there is no problem with view generation.
Thanks in advance :-)
|
|
From: Development i. <jav...@li...> - 2017-07-25 07:07:24
|
This looks OK to me, though I haven't tested it.

Report back once you have a reproducible case, with instructions on how
to reproduce it, a sample program and config, and I'll take a look.

FD_ALL has worked forever, so I assume you may be running into network
issues every now and then...
--
Bela Ban | http://www.jgroups.org
|