From: jason <huz...@gm...> - 2013-03-06 07:13:13
Hi Ying,

Another question about your new bcast sync mechanism:

In the design of the new mechanism, n_ptr->bclink.supported = 0 only
prevents receiving bcast messages from the peer, so it has nothing to do
with bcast sending by the local node. Therefore I think it should not
prevent us from calling tipc_bclink_acknowledge(). There are three places
in the TIPC code that call tipc_bclink_acknowledge(); I list them below:

1) when tipc_bclink_recv_pkt() receives a nack.
2) in tipc_recv_msg().
3) in node_lost_contact()

I saw that your patch only removes the n_ptr->bclink.supported check in
node_lost_contact(), so what about the other two cases?

On Mar 6, 2013 2:27 PM, "Ying Xue" <yin...@wi...> wrote:
> On 03/06/2013 11:41 AM, jason wrote:
> > Hi All,
> > Let's say there are two nodes in the cluster, nodeA and nodeB. It is
> > possible that A has opened bcast receiving for B while B hasn't
> > opened bcast receiving for A. In that case B hasn't synced its
> > last_in to what A has sent, so every message sent by B will carry an
> > invalid bcast acked seq number for A. Because A has opened bcast
> > receiving for B, it will process those invalid acks from B in
> > tipc_bclink_acknowledge(). I think this may cause a problem. Please
> > consider this.
> >
> > On Jan 31, 2013 6:06 PM, "Ying Xue" <yin...@wi...
> > <mailto:yin...@wi...>> wrote:
>
> No, it's possible that you don't completely understand the root cause.
> Of course, I admit it is hard to know every detail clearly. At least,
> after a while, I had almost forgotten why it happens and what the
> reason is.
>
> The key reason it appears is that TIPC does not properly cope with the
> sync problem between the unicast link and the multicast link. Even once
> a unicast link has been set up by sending link state messages over the
> unicast channel, the link states on the two endpoints are not
> synchronized immediately, because the environment is distributed. For
> example, suppose there are two nodes: one sender of multicast messages
> and one receiver.
> Suddenly a new node joins the cluster as another multicast receiver.
> Because the link state between the new receiver and the sender is not
> synchronized in time, the sender may still think there is only one
> receiver, even though the new receiver has actually started receiving
> the multicast messages sent by the sender. That means that, while the
> link state is inconsistent, the sender can release a message from its
> outbound queue as soon as it receives an ack from one of the two
> receivers. Normally this is not a big problem. But if one receiver
> finds that a message is missing from a sequence of received packets,
> it sends a retransmission request asking the sender to send the
> missing packet again. The missing packet, however, has already been
> released by the sender, because the sender already received an ack for
> it from the other receiver. Therefore the sender can never send out
> the missing packet, yet the receiver must receive it. So deadlock
> happens.
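
To make the scenario in the quoted explanation concrete, here is a small toy model in Python. It is only a sketch and not TIPC code: the Sender class, its fields, and the run() helper are all hypothetical names invented for illustration, and it assumes the sender releases a packet as soon as every receiver it currently knows about has acked it.

```python
from dataclasses import dataclass, field


@dataclass
class Sender:
    """Toy broadcast sender (hypothetical, not the TIPC implementation)."""
    known_receivers: set                            # receivers the sender believes exist
    outbound: dict = field(default_factory=dict)    # seqno -> payload awaiting acks
    acked_by: dict = field(default_factory=dict)    # seqno -> receivers that acked it
    next_seq: int = 1

    def broadcast(self, payload):
        """Queue a packet for possible retransmission and return its seqno."""
        seq = self.next_seq
        self.next_seq += 1
        self.outbound[seq] = payload
        self.acked_by[seq] = set()
        return seq

    def acknowledge(self, receiver, acked_seq):
        """Cumulative ack: release packets acked by all *known* receivers."""
        for seq in sorted(self.outbound):   # sorted() copies the keys first
            if seq > acked_seq:
                break
            self.acked_by[seq].add(receiver)
            if self.known_receivers <= self.acked_by[seq]:
                del self.outbound[seq]
                del self.acked_by[seq]

    def retransmit(self, seq):
        """Return the payload if still queued, else None (cannot resend)."""
        return self.outbound.get(seq)


def run(known_receivers):
    """R1 acks everything; R2 lost packet 2 and asks for it again."""
    sender = Sender(known_receivers=set(known_receivers))
    for payload in ("a", "b", "c"):
        sender.broadcast(payload)
    sender.acknowledge("R1", 3)
    return sender.retransmit(2)


# Sender state not yet synced: it only knows R1, so packet 2 is already gone.
stale = run({"R1"})
# Sender state synced: packet 2 is kept until R2 acks it as well.
synced = run({"R1", "R2"})
```

With the stale view the retransmission request finds nothing left to resend, which corresponds to the deadlock described above; with the synced view the packet is still in the outbound queue.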