From: jason <huz...@gm...> - 2013-03-08 04:16:06
Hi Jon,
My solution to the "random acknowledge number" problem is simply to revert
the following piece of the patch for the new bcast sync mechanism:
@@ -2058,9 +2059,11 @@ static void link_recv_proto_msg(struct tipc_link *l_ptr, struct sk_buff *buf)
 			l_ptr->max_pkt = l_ptr->max_pkt_target;
 	}
 	l_ptr->owner->bclink.supportable = (max_pkt_info != 0);
+	l_ptr->owner->bclink.sync = msg_bclink_sync(msg);
 	/* Synchronize broadcast link info, if not done previously */
-	if (!tipc_node_is_up(l_ptr->owner)) {
+	if (!tipc_node_is_up(l_ptr->owner) &&
+	    !l_ptr->owner->bclink.sync) {
 		l_ptr->owner->bclink.last_sent =
 			l_ptr->owner->bclink.last_in = msg_last_bcast(msg);
That way, last_in is set to a valid value from the peer as early as
possible, so it never stays at an invalid value.
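
To illustrate the window I am worried about, here is a minimal standalone
sketch. It is not the actual TIPC code: the helper names (seq_mod,
seq_less_eq, seq_less, ack_is_acceptable) are made up for illustration, and
it assumes 16-bit sequence numbers compared modulo 2^16 and an acceptance
rule of the form "B.acked < acked <= next_out_no".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: sequence numbers are 16 bits wide, compared modulo 2^16. */
static uint32_t seq_mod(uint32_t x)
{
    return x & 0xffffu;
}

/* True if 'left' is at or before 'right' in modulo-2^16 order. */
static bool seq_less_eq(uint32_t left, uint32_t right)
{
    return seq_mod(right - left) < 32768u;
}

/* True if 'left' is strictly before 'right' in modulo-2^16 order. */
static bool seq_less(uint32_t left, uint32_t right)
{
    return seq_less_eq(left, right) && seq_mod(left) != seq_mod(right);
}

/*
 * Simplified, hypothetical model of the check node A applies to a bcast
 * ack carried by a unicast from node B. The ack is accepted only if it is
 * newer than the last value recorded for B (peer_acked) and not beyond
 * what A has actually sent (next_out_no).
 */
static bool ack_is_acceptable(uint32_t acked, uint32_t peer_acked,
                              uint32_t next_out_no)
{
    return seq_less(peer_acked, acked) && seq_less_eq(acked, next_out_no);
}
```

With B.acked = 100 and next_out_no = 100 at contact time, the window is
empty and every ack from B is rejected. Once A has sent five more
broadcasts (next_out_no = 105), a stale value such as 103 passes the check
even though B never received those packets. That is the case I am
concerned about.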
On 2013-3-8 at 9:03 AM, "jason" <huz...@gm...> wrote:
> Hi Jon,
>
> Just a quick resend of my previous mail (to make my point clear).
>
> Node A just updates B.acked to its next_out_no when it finds out that node
> B is up (node A calls node_established_contact()). But nothing seems to
> prevent B from carrying an acknowledge lower than that. It may be a random
> number that is occasionally greater than that, until B finally initializes
> its last_in when it gets the BCAST_SYNC message from A (B calls
> tipc_bclink_info_recv()).
> On 2013-3-8 at 7:27 AM, "jason" <huz...@gm...> wrote:
>
>> Hi Jon,
>> Node A just updates B.acked to its next_out_no when it finds out that node
>> B is up. But nothing seems to prevent B from carrying an acknowledge
>> lower than that. It may be a random number that is occasionally greater
>> than it.
>> On 2013-3-8 at 6:22 AM, "Jon Maloy" <jon...@er...> wrote:
>>
>>> On 03/06/2013 01:27 AM, Ying Xue wrote:
>>> > On 03/06/2013 11:41 AM, jason wrote:
>>> >> Hi All,
>>> >> Let's say there are two nodes in the cluster, nodeA and nodeB. It is
>>> >> possible that A has opened bcast receiving for B while B hasn't
>>> >> opened bcast receiving for A. Therefore B hasn't synced its last_in to
>>> >> what A has sent, and every message sent by B will carry an invalid
>>> >> bcast acked seq number for A.
>>>
>>> >> Because A has opened bcast receiving for B,
>>> >> it will process those invalid acks from B in
>>> >> tipc_bclink_acknowledge().
>>>
>>> No it won't.
>>> The broadcasts themselves don't carry valid acknowledges, since there is
>>> no single node to acknowledge.
>>> Unicasts B -> A will carry acknowledges, but those will be ignored by A
>>> because they will be lower than the lowest acknowledge value A can accept
>>> from B. A knows that value; it is A's "next_out_no" value at the moment
>>> it opened for reception. (And sent its own BCAST_SYNC message).
>>>
>>> Regards
>>> ///jon
>>>
>>> >> I think this may cause a problem. Please consider this.
>>> >>
>>> >> On Jan 31, 2013 6:06 PM, "Ying Xue" <yin...@wi...
>>> >> <mailto:yin...@wi...>> wrote:
>>> >
>>> > No, it's possible that you don't completely understand the root cause.
>>> > Of course, I admit it is hard to know every detail clearly. At least,
>>> > after a while, I had almost forgotten why it happens and what the
>>> > reason is.
>>> >
>>> > The key reason it appears is that TIPC does not properly cope with
>>> > the sync problem between the unicast link and the multicast link. Even
>>> > after a unicast link is set up by sending link state messages over the
>>> > unicast channel, the link states on the two endpoints are not synced
>>> > immediately, because the environment is distributed. For example, say
>>> > there are two nodes: one sender of multicast messages and one receiver.
>>> > Suddenly a new node joins the cluster as another receiver of multicast
>>> > messages. The link state between the new receiver and the sender is not
>>> > synced in time; for instance, the sender still thinks there is only one
>>> > receiver although the new receiver has actually started to receive the
>>> > multicast messages sent by the sender. That means that, while the link
>>> > state is inconsistent, the sender can release a message from its
>>> > outbound queue as soon as it receives an ack from one of the two
>>> > receivers. Normally this is no big problem. But if one receiver finds
>>> > that a message is missing from a series of sequentially received
>>> > packets, it sends a retransmission request asking the sender to send
>>> > the missed packet again. But the missed packet has already been released
>>> > by the sender, because the sender already received an ack for it from
>>> > the other receiver. Therefore the sender can never send the missed
>>> > packet, yet the receiver must receive it. So deadlock happens.
>>> >
>>> >
>>> >
>>>
>>>