[javagroups-users] JGroups 2.2.9.1: problems with UNICAST after network failure
From: Matthias W. <mat...@ba...> - 2006-05-15 11:45:12
Hi,

I have 3 processes running: two of them on machine A (I'll call them A1 and A2) and one on machine B (I'll call it B1). A1 is the coordinator. During a temporary network failure the group is split into two subgroups: group G1 with A1 as coordinator and A2 as a member, and group G2 with B1 as coordinator.

I get into trouble after the network failure is resolved. The two subgroups are merged into one group again with A1 as coordinator. The new view is distributed among the members, and I can send multicast messages to the group without problems. I can also send unicast messages between A1 and B1. But there is no way to send unicast messages between A2 and B1: every message is discarded by UNICAST with "discarding message to XXX as this member left the group" as the log entry.

After a look into the sources, I guess the problem lies here: UNICAST keeps a list of previous group members (UNICAST.previous_members). On a view change with use_gms set to "true" (the default), the leaving members are added to this list. If UNICAST.down() detects that the message receiver is on this list, the message is discarded. A member is only removed from the list when a unicast message is received from it.

So we have a problem: B1 can't send a unicast message to A2 until it gets one from A2, and vice versa. A1 and B1 can exchange unicast messages because coordinators send Event.ENABLE_UNICASTS_TO down the stack after a merge request.

It would be nice if someone could confirm this.
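To make the suspected deadlock easier to check, here is a minimal sketch of the behaviour as I understand it. It is a toy model, not the JGroups source: the class UnicastModel and its method names are made up for illustration, and the logic only mirrors my reading of UNICAST.previous_members described above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the suspected behaviour; NOT the actual JGroups UNICAST code. */
public class UnicastModel {
    private final Set<String> members = new HashSet<>();
    // Models UNICAST.previous_members: members that left in an earlier view.
    private final Set<String> previousMembers = new HashSet<>();

    public UnicastModel(List<String> initialView) {
        members.addAll(initialView);
    }

    /** View change: every member missing from the new view is remembered as "left". */
    public void viewChange(List<String> newView) {
        for (String m : members) {
            if (!newView.contains(m)) {
                previousMembers.add(m);
            }
        }
        members.clear();
        members.addAll(newView);
        // Note: a member that reappears in newView is NOT removed from previousMembers.
    }

    /** Models down(): returns false if the message would be discarded. */
    public boolean send(String dest) {
        return !previousMembers.contains(dest);
    }

    /** Models up(): receiving a unicast from a member removes it from the list. */
    public void receive(String sender) {
        previousMembers.remove(sender);
    }

    /** Models the effect of Event.ENABLE_UNICASTS_TO as assumed in this post. */
    public void enableUnicastsTo(String member) {
        previousMembers.remove(member);
    }

    public static void main(String[] args) {
        List<String> full = Arrays.asList("A1", "A2", "B1");
        UnicastModel a2 = new UnicastModel(full);
        UnicastModel b1 = new UnicastModel(full);

        // Network failure: each side installs its own subgroup view.
        a2.viewChange(Arrays.asList("A1", "A2"));
        b1.viewChange(Arrays.asList("B1"));

        // Merge: both sides install the full view again.
        a2.viewChange(full);
        b1.viewChange(full);

        System.out.println("A2 -> B1 sent: " + a2.send("B1"));
        System.out.println("B1 -> A2 sent: " + b1.send("A2"));
    }
}
```

In this model neither A2 nor B1 ever gets to call receive() for the other side, because the peer's send() is already rejected, so both stay blocked. Only a call like enableUnicastsTo(), which in my reading happens on the coordinators alone, breaks the cycle.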
My protocol stack:

<?xml version="1.0" encoding="ISO-8859-1"?>
<config>
    <UDP mcast_addr="239.0.0.1" mcast_port="10001" ip_ttl="1"
         mcast_send_buf_size="32000" mcast_recv_buf_size="64000"
         ucast_send_buf_size="32000" ucast_recv_buf_size="64000"
         loopback="true"
         use_incoming_packet_handler="true" use_outgoing_packet_handler="false" />
    <PING timeout="3000" num_initial_members="1"
          down_thread="false" up_thread="false" />
    <MERGE2 max_interval="10000" min_interval="5000"
            down_thread="false" up_thread="false" />
    <FD_SOCK down_thread="false" up_thread="false" />
    <FD timeout="10000" max_tries="2" shun="true"
        down_thread="false" up_thread="false" />
    <VERIFY_SUSPECT timeout="1500"
                    down_thread="false" up_thread="false" />
    <pbcast.NAKACK max_xmit_size="8192" gc_lag="50"
                   retransmit_timeout="600,1200,2400,4000"
                   down_thread="false" up_thread="false" />
    <UNICAST timeout="1000,1500,2000"
             down_thread="false" up_thread="false" />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="20000" max_bytes="0"
                   down_thread="false" up_thread="false" />
    <FRAG2 frag_size="8192"
           down_thread="false" up_thread="false" />
    <VIEW_SYNC avg_send_interval="60000"
               down_thread="false" up_thread="false" />
    <pbcast.GMS print_local_addr="true" join_timeout="3000" join_retry_timeout="2000"
                shun="true"
                down_thread="false" up_thread="false" />
</config>

Regards,
Matthias Weber