From: Jon M. <jon...@er...> - 2019-11-05 13:27:59
Acked. But you *must* use the monitor functionality for any cluster > 100. Otherwise this is never going to work.

BR
///jon

> -----Original Message-----
> From: Hoang Le <hoa...@de...>
> Sent: 30-Oct-19 02:26
> To: Jon Maloy <jon...@er...>; ma...@do...; tip...@li...;
> yin...@wi...
> Subject: [net-next] tipc: reduce sensitive to retransmit failures
>
> With huge cluster (e.g >200nodes), the amount of that flow:
> gap -> retransmit packet -> acked will take time in case of STATE_MSG
> dropped/delayed because a lot of traffic. This lead to 1.5 sec tolerance
> value criteria made link easy failure around 2nd, 3rd of failed
> retransmission attempts.
>
> Instead of re-introduced criteria of 99 failed retransmissions to fix the
> issue, we increase failure detection timer to ten times tolerance value.
>
> Fixes: 77cf8edbc0e7 ("tipc: simplify stale link failure criteria")
> Signed-off-by: Hoang Le <hoa...@de...>
> ---
>  net/tipc/link.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 7d7a66178607..9f524c325c0d 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -1084,7 +1084,7 @@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r,
>                 return false;
>
>         if (!time_after(jiffies, TIPC_SKB_CB(skb)->retr_stamp +
> -                       msecs_to_jiffies(r->tolerance)))
> +                       msecs_to_jiffies(r->tolerance * 10)))
>                 return false;
>
>         hdr = buf_msg(skb);
> --
> 2.20.1
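
For readers who want to see what the one-line change does to the failure deadline, here is a minimal userspace sketch of the time check. It is not the kernel code: `ticks_t`, `SKETCH_HZ`, `sketch_time_after()`, `sketch_msecs_to_ticks()` and `retransmit_deadline_passed()` are hypothetical stand-ins for `jiffies`, `HZ`, `time_after()`, `msecs_to_jiffies()` and the condition inside `link_retransmit_failure()`, and a tick rate of 1 tick per millisecond is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
typedef unsigned long ticks_t;

#define SKETCH_HZ 1000UL /* assumed tick rate: 1 tick == 1 ms */

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static bool sketch_time_after(ticks_t a, ticks_t b)
{
    return (long)(b - a) < 0;
}

/* Convert milliseconds to ticks at the assumed tick rate. */
static ticks_t sketch_msecs_to_ticks(unsigned long ms)
{
    return ms * SKETCH_HZ / 1000UL;
}

/*
 * True once a packet first retransmitted at retr_stamp has been pending
 * for more than tolerance_ms * factor. factor == 1 models the check
 * before the patch, factor == 10 the check after it.
 */
static bool retransmit_deadline_passed(ticks_t now, ticks_t retr_stamp,
                                       unsigned long tolerance_ms,
                                       unsigned long factor)
{
    assert(factor > 0);
    return sketch_time_after(now, retr_stamp +
                             sketch_msecs_to_ticks(tolerance_ms * factor));
}
```

With the 1.5 sec tolerance from the commit message and a packet that has been pending for 4 seconds, `retransmit_deadline_passed(4000, 0, 1500, 1)` is true (the old check would treat the retransmission as failed), while `retransmit_deadline_passed(4000, 0, 1500, 10)` is false, since the deadline has moved out to 15 seconds.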