From: jason <huz...@gm...> - 2013-03-07 23:27:28
Hi Jon,

Node A just updates B.acked to its next_out_no when it finds out that
node B is up. But it seems nothing guarantees that the acknowledge
carried by B is lower than that. It may be a random number which is
occasionally greater than it.

On 2013-3-8 at 6:22 AM, "Jon Maloy" <jon...@er...> wrote:
> On 03/06/2013 01:27 AM, Ying Xue wrote:
> > On 03/06/2013 11:41 AM, jason wrote:
> >> Hi All,
> >> Let's say there are two nodes in the cluster, nodeA and nodeB. There
> >> is a possibility that A has opened bcast receiving for B while B
> >> hasn't opened bcast receiving for A. Therefore, B hasn't synced its
> >> last_in to what A has sent, and every message sent by B will carry
> >> an invalid bcast acked seq number for A.
>
> >> Because A has opened bcast receiving for B,
> >> it will process those invalid acks from B in
> >> tipc_bclink_acknowledge().
>
> No it won't.
> The broadcasts themselves don't carry valid acknowledges, since there
> is no single node to acknowledge.
> Unicasts B -> A will carry acknowledges, but those will be ignored by A
> because they will be lower than the lowest acknowledge value A can
> accept from B. A knows that value; it is A's "next_out_no" value at the
> moment it opened for reception. (And sent its own BCAST_SYNC message).
>
> Regards
> ///jon
>
> >> It seems this may cause a problem, I think. Please consider this.
> >>
> >> On Jan 31, 2013 6:06 PM, "Ying Xue" <yin...@wi...
> >> <mailto:yin...@wi...>> wrote:
> >
> > No, it's possible that you don't completely understand the root
> > cause. Of course, I admit it is hard to know every detail clearly.
> > At least, after a while, I had almost forgotten why it happens and
> > what the reason is.
> >
> > The key reason it appears is that TIPC does not properly cope with
> > the sync problem between the unicast link and the multicast link.
> > Even if a unicast link is set up by sending link state messages via
> > the unicast channel, the link states on both endpoints are not synced
> > immediately because of the distributed environment. For example,
> > there are two nodes: one sender of multicast messages and one
> > message receiver. Suddenly a new node joins the cluster as another
> > multicast message receiver. As the link state between the new
> > receiver and the sender is not synced in time, the sender still
> > thinks there is only one receiver, although the new receiver has
> > actually started to receive the multicast messages sent by the
> > sender at that moment. That means that, while the link state is
> > inconsistent, the sender can release a message in its outbound queue
> > as soon as it receives one ack from one of the two receivers.
> > Normally this is not a big problem. But if one receiver finds that a
> > message is missing from a series of sequentially received packets,
> > it sends a retransmission request to ask the sender to send the
> > missing packet again. But the missing packet has already been
> > released by the sender, because the sender already received an ack
> > for it from the other receiver. Therefore the sender can never send
> > the missing packet, yet the receiver must receive it. So deadlock
> > happens.
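
To make the disagreement about the acknowledge filter concrete, here is a minimal sketch of the rule Jon describes: A records its next_out_no as the baseline when B comes up and ignores any acknowledge from B that is not above that baseline. This is not the kernel code. The names bc_peer, bc_peer_up, bc_peer_ack and seq_less_eq are hypothetical, and the 16-bit wrap-around comparison is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel's actual code. */

/* True if a is at or before b, assuming 16-bit wrap-around sequence numbers. */
static bool seq_less_eq(uint16_t a, uint16_t b)
{
    return (uint16_t)(b - a) < 0x8000u;
}

/* What node A remembers about one broadcast peer (node B). */
struct bc_peer {
    uint16_t acked; /* highest broadcast seqno A accepts as acked by the peer */
};

/* Peer came up: the baseline is A's next_out_no at that moment. */
static void bc_peer_up(struct bc_peer *p, uint16_t next_out_no)
{
    p->acked = next_out_no;
}

/* Handle an ack carried in a unicast from the peer.
 * Returns true if the ack was accepted and moved the baseline forward. */
static bool bc_peer_ack(struct bc_peer *p, uint16_t acked)
{
    if (seq_less_eq(acked, p->acked))
        return false; /* at or below the baseline: ignored */
    p->acked = acked;
    return true;
}

/* Walks through both cases discussed in the thread. */
void ack_filter_example(void)
{
    struct bc_peer b;
    bool accepted;

    bc_peer_up(&b, 1000);            /* A's next_out_no is 1000 when B comes up */

    accepted = bc_peer_ack(&b, 990); /* below the baseline */
    assert(!accepted);
    assert(b.acked == 1000);

    accepted = bc_peer_ack(&b, 1005); /* unsynced value that happens to be higher */
    assert(accepted);
    assert(b.acked == 1005);
}
```

In this sketch the filter alone drops the ack of 990 but accepts 1005, even if B never received those broadcasts. Whether that can happen in practice depends on what value B actually places in the field before it has synchronized, and on any further checks the real code applies, neither of which is modelled here.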