From: Jon M. <jm...@re...> - 2020-03-18 14:47:34
On 3/18/20 12:50 AM, Tuong Lien Tong wrote:
> Hi Jon,
>
> Ok, that makes sense (but we should have covered the case where a
> broadcast packet is released too...).
> However, I have another concern about the logic here:
>
>> +	/* Enter fast recovery */
>> +	if (unlikely(retransmitted)) {
>> +		l->ssthresh = max_t(u16, l->window / 2, 300);
>> +		l->window = l->ssthresh;
>> +		return;
>> +	}
>
> What will happen if we get a retransmission while the link is still in
> the slow-start phase? For example:
>
> l->ssthresh = 300
> l->window = 60
> ==> retransmitted = true, then: l->ssthresh = 300; l->window = 300???
>
> This does not look correct. Should it be:
>
>> +	/* Enter fast recovery */
>> +	if (unlikely(retransmitted)) {
>> +		l->ssthresh = max_t(u16, l->window / 2, 300);
>> -		l->window = l->ssthresh;
>> +		l->window = min_t(u16, l->window, l->ssthresh);
>> +		return;
>> +	}
>
> That would fix the issue with the broadcast case as well?

Yes, this would fix both issues, so I think it is a good suggestion.
To my surprise I have realized that this code has not been released yet
(I only find it in 5.6-rc1 and later versions), but maybe that is just
as well ;-)

///jon

> BR/Tuong
>
> -----Original Message-----
> From: Jon Maloy <jm...@re...>
> Sent: Wednesday, March 18, 2020 1:38 AM
> To: Tuong Lien Tong <tuo...@de...>; 'Jon Maloy' <jon...@er...>; 'Jon Maloy' <ma...@do...>
> Cc: tip...@li...; moh...@er...
> Subject: Re: [tipc-discussion] [net-next 3/3] tipc: introduce variable window congestion control
>
> On 3/17/20 6:55 AM, Tuong Lien Tong wrote:
>> Hi Jon,
>>
>> For the "variable window congestion control" patch, if I remember
>> correctly, it is for unicast links only? Why did you apply it to the
>> broadcast link, a mistake or ...?
> I did it so the code would be the same everywhere. Then, by setting both
> min_win and max_win to the same value BC_LINK_WIN_DEFAULT (==50)
> in the broadcast send link, this window should never change.
>
>> It now causes user messages to be disordered on the receiving side,
>> because on the sending side the broadcast link's window is suddenly
>> increased to 300 (i.e. max_t(u16, l->window / 2, 300)) at a packet
>> retransmission, leaving some gaps between the link's 'transmq' &
>> 'backlogq' unexpectedly... Will we fix this by removing it?
> That is clearly a bug that breaks the above stated limitation.
> It should be sufficient to also check that l->ssthresh never exceeds
> l->max_win to remedy this.
>
> ///jon
>
>> @@ -1160,7 +1224,6 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>>  			continue;
>>  		if (more(msg_seqno(hdr), to))
>>  			break;
>> -
>>  		if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
>>  			continue;
>>  		TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
>> @@ -1173,11 +1236,12 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>>  		_skb->priority = TC_PRIO_CONTROL;
>>  		__skb_queue_tail(xmitq, _skb);
>>  		l->stats.retransmitted++;
>> -
>> +		retransmitted++;
>>  		/* Increase actual retrans counter & mark first time */
>>  		if (!TIPC_SKB_CB(skb)->retr_cnt++)
>>  			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>>  	}
>> +	tipc_link_update_cwin(l, 0, retransmitted); // ???
>>  	return 0;
>>  }
>>
>> +static void tipc_link_update_cwin(struct tipc_link *l, int released,
>> +				  bool retransmitted)
>> +{
>> +	int bklog_len = skb_queue_len(&l->backlogq);
>> +	struct sk_buff_head *txq = &l->transmq;
>> +	int txq_len = skb_queue_len(txq);
>> +	u16 cwin = l->window;
>> +
>> +	/* Enter fast recovery */
>> +	if (unlikely(retransmitted)) {
>> +		l->ssthresh = max_t(u16, l->window / 2, 300);
>> +		l->window = l->ssthresh;
>> +		return;
>> +	}
>>
>> BR/Tuong
>>
>> -----Original Message-----
>> From: Jon Maloy <jon...@er...>
>> Sent: Monday, December 2, 2019 7:33 AM
>> To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
>> Cc: moh...@er...; par...@gm...; tun...@de...; hoa...@de...; tuo...@de...; gor...@de...; yin...@wi...; tip...@li...
>> Subject: [net-next 3/3] tipc: introduce variable window congestion control
>>
>> We introduce a simple variable window congestion control for links.
>> The algorithm is inspired by the Reno algorithm, covering both 'slow
>> start', 'congestion avoidance', and 'fast recovery' modes.
>>
>> - We introduce hard lower and upper window limits per link, still
>>   different and configurable per bearer type.
>>
>> - We introduce a 'slow start threshold' variable, initially set to
>>   the maximum window size.
>>
>> - We let a link start at the minimum congestion window, i.e. in slow
>>   start mode, and then let it grow rapidly (+1 per received ACK) until
>>   it reaches the slow start threshold and enters congestion avoidance
>>   mode.
>>
>> - In congestion avoidance mode we increment the congestion window for
>>   each window_size number of acked packets, up to a possible maximum
>>   equal to the configured maximum window.
>>
>> - For each non-duplicate NACK received, we drop back to fast recovery
>>   mode, by setting both the slow start threshold and the congestion
>>   window to (current_congestion_window / 2).
>>
>> - If the timeout handler finds that the transmit queue has not moved
>>   since the previous timeout, it drops the link back to slow start and
>>   forces a probe containing the last sent sequence number to be sent
>>   to the peer.
>>
>> This change does in reality have effect only on unicast ethernet
>> transport, as we have seen that there is no room whatsoever for
>> increasing the window max size for the UDP bearer.
>> For now, we also choose to keep the limits for the broadcast link
>> unchanged and equal.
>>
>> This algorithm seems to give a 50-100% throughput improvement for
>> messages larger than MTU.
>>
>> Suggested-by: Xin Long <luc...@gm...>
>> Acked-by: Xin Long <luc...@gm...>
>> Signed-off-by: Jon Maloy <jon...@er...>
>> ---
>>  net/tipc/bcast.c     |  11 ++--
>>  net/tipc/bearer.c    |  11 ++--
>>  net/tipc/bearer.h    |   6 +-
>>  net/tipc/eth_media.c |   3 +-
>>  net/tipc/ib_media.c  |   5 +-
>>  net/tipc/link.c      | 175 +++++++++++++++++++++++++++++++++++----------------
>>  net/tipc/link.h      |   9 +--
>>  net/tipc/node.c      |  16 ++---
>>  net/tipc/udp_media.c |   3 +-
>>  9 files changed, 160 insertions(+), 79 deletions(-)
>>
>> _______________________________________________
>> tipc-discussion mailing list
>> tip...@li...
>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
From: Jon M. <jm...@re...> - 2020-03-18 14:28:10
On 2/20/20 10:44 AM, Xin Long wrote:
> On Wed, Feb 19, 2020 at 4:34 PM Dmitry Vyukov <dv...@go...> wrote:
>> On Wed, Feb 19, 2020 at 9:29 AM Dmitry Vyukov <dv...@go...> wrote:
>>> On Mon, Aug 12, 2019 at 9:44 AM Ying Xue <yin...@wi...> wrote:
>>>> syzbot found the following issue:
>>>>
>>>> [   81.119772][ T8612] BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor834/8612
>>>> [   81.136212][ T8612] caller is dst_cache_get+0x3d/0xb0
>>>> [   81.141450][ T8612] CPU: 0 PID: 8612 Comm: syz-executor834 Not tainted 5.2.0-rc6+ #48
>>>> [...]
>>>> Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media")
>>>> Reported-by: syz...@sy...
>>>> Signed-off-by: Hillf Danton <hd...@si...>
>>>> Signed-off-by: Ying Xue <yin...@wi...>
>>> Hi,
>>>
>>> Was this ever merged?
>>> The bug is still open, alive and kicking:
>>> https://syzkaller.appspot.com/bug?extid=1a68504d96cd17b33a05
>>>
>>> and one of the top crashers currently.
>>> Along with a few other top crashers, these bugs prevent most of the
>>> other kernel testing from happening.
>> /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
>>
>> +jmaloy new email address

Acked-by: Jon Maloy <jm...@re...>

>>>> ---
>>>>  net/tipc/udp_media.c | 12 +++++++++---
>>>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
>>>> index 287df687..ca3ae2e 100644
>>>> --- a/net/tipc/udp_media.c
>>>> +++ b/net/tipc/udp_media.c
>>>> @@ -224,6 +224,8 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
>>>>  	struct udp_bearer *ub;
>>>>  	int err = 0;
>>>>
>>>> +	local_bh_disable();
>>>> +
>>>>  	if (skb_headroom(skb) < UDP_MIN_HEADROOM) {
>>>>  		err = pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
>>>>  		if (err)
>>>> @@ -237,9 +239,12 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
>>>>  		goto out;
>>>>  	}
>>>>
>>>> -	if (addr->broadcast != TIPC_REPLICAST_SUPPORT)
>>>> -		return tipc_udp_xmit(net, skb, ub, src, dst,
>>>> -				     &ub->rcast.dst_cache);
>>>> +	if (addr->broadcast != TIPC_REPLICAST_SUPPORT) {
>>>> +		err = tipc_udp_xmit(net, skb, ub, src, dst,
>>>> +				    &ub->rcast.dst_cache);
>>>> +		local_bh_enable();
>>>> +		return err;
>>>> +	}
>>>>
>>>>  	/* Replicast, send an skb to each configured IP address */
>>>>  	list_for_each_entry_rcu(rcast, &ub->rcast.list, list) {
>>>> @@ -259,6 +264,7 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb,
>>>>  	err = 0;
>>>>  out:
>>>>  	kfree_skb(skb);
>>>> +	local_bh_enable();
>>>>  	return err;
>>>>  }
>>>>
>>>> --
>>>> 2.7.4
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to syz...@go....
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/1565595162-1383-4-git-send-email-ying.xue%40windriver.com.
From: Tuong L. T. <tuo...@de...> - 2020-03-18 08:13:37
Thanks Jon! Regarding this:

  Commander: Received 16 UP Events for Member Id 2
  *** no report, 0 Mb/s ???***

  Hmm. Even if it is zero, it should be reported. Maybe a bug in the
  test program?

I've realized that this was due to the broadcast link bug we're discussing
in the other mail thread (so a large broadcast packet might be fragmented
& reassembled incorrectly due to wrong sequence, causing the measurements
to be out of control...). Also, because the broadcast link's window
accidentally increased to 300, the previous test results were inaccurate.
TIPC broadcast performance without the patch would be much lower...
I have fixed that bug and rerun the tests:

# time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000
real    8m 27.94s    //before: 5m 21.75s
user    0m 0.55s
sys     0m 2.38s

# time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000
real    16m 26.73s   //before: 9m 49.14s
user    0m 0.50s
sys     0m 1.98s

# /cluster/group_test -m -b
Commander: Received 0 UP Events for Member Id 100
*** TIPC Group Messaging Test Started ****
Commander: Waiting for Scalers
Commander: Received 1 UP Events for Member Id 101
Commander: Discovered 1 Scalers
>> Starting Multicast Test
Commander: Scaling out to 1 Workers with Id 0/1
Commander: Received 1 UP Events for Member Id 0
Commander: Scaling out to 16 Workers with Id 1/1
Commander: Received 16 UP Events for Member Id 1
Commander: Scaling out to 16 Workers with Id 2/2
Commander: Received 16 UP Events for Member Id 2
2222:1@0/0:0450144145@1001002: Sent UC 0, AC 0, MC 36, BC 0, throughput last intv 4 Mb/s
2222:1@0/0:0450144145@1001002: Sent UC 0, AC 0, MC 98, BC 0, throughput last intv 7 Mb/s
2222:1@0/0:0450144145@1001002: Sent UC 0, AC 0, MC 136, BC 0, throughput last intv 4 Mb/s
2222:1@0/0:0450144145@1001002: Sent UC 0, AC 0, MC 170, BC 0, throughput last intv 4 Mb/s
2222:1@0/0:0450144145@1001002: Sent UC 0, AC 0, MC 204, BC 0, throughput last intv 4 Mb/s
Commander: Scaling in to 0 Workers with Cmd Member Id 1
Commander: Scaling in to 0
Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:0450144145@1001002: Sent 206 [0,205] (UC 0, AC 0, MC 206, BC 0) OK Report #1 from 2222:2@0/0:2418472915@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #2 from 2222:1@0/0:1368413030@1001003: Recv 202 [2,203] (UC 0, AC 0, MC 202, BC 0) OK Report #3 from 2222:2@0/0:3842264031@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #4 from 2222:1@0/0:2343289026@1001003: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #5 from 2222:1@0/0:2953801101@1001003: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #6 from 2222:2@0/0:0553907498@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #7 from 2222:2@0/0:1637533791@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #8 from 2222:1@0/0:3319056255@1001004: Recv 202 [2,203] (UC 0, AC 0, MC 202, BC 0) OK Report #9 from 2222:2@0/0:0504346632@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #10 from 2222:2@0/0:2844029094@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #11 from 2222:2@0/0:1786742073@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #12 from 2222:1@0/0:1132450893@1001003: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #13 from 2222:1@0/0:3825566132@1001004: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #14 from 2222:1@0/0:3296357641@1001003: Recv 202 [2,203] (UC 0, AC 0, MC 202, BC 0) OK Report #15 from 2222:2@0/0:4175715297@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #16 from 2222:1@0/0:3389134729@1001003: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #17 from 2222:1@0/0:2790725985@1001004: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #18 from 2222:1@0/0:0355707321@1001004: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #19 from 2222:1@0/0:0248234621@1001004: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #20 from 2222:1@0/0:2972792068@1001004: Recv 204 [0,203] (UC 0, AC 0, MC 204, BC 0) 
OK Report #21 from 2222:1@0/0:2449213849@1001003: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #22 from 2222:1@0/0:1308997961@1001004: Recv 202 [2,203] (UC 0, AC 0, MC 202, BC 0) OK Report #23 from 2222:1@0/0:1067580928@1001004: Recv 202 [2,203] (UC 0, AC 0, MC 202, BC 0) OK Report #24 from 2222:2@0/0:3603797494@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #25 from 2222:2@0/0:1175804621@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #26 from 2222:1@0/0:3367433338@1001003: Recv 203 [1,203] (UC 0, AC 0, MC 203, BC 0) OK Report #27 from 2222:2@0/0:0991893680@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #28 from 2222:2@0/0:1321100062@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #29 from 2222:2@0/0:1711295503@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #30 from 2222:2@0/0:3415801530@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #31 from 2222:2@0/0:2647656161@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #32 from 2222:2@0/0:0628455908@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK >> Multicast Test SUCCESSFUL >> Starting Broadcast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 44, throughput last intv 5 Mb/s 2222:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 88, throughput last intv 5 Mb/s 2222:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 115, throughput last intv 2 Mb/s 2222:1@0/0:0695351808@1001002: Sent UC 0, AC 0, MC 0, BC 149, throughput last intv 4 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 
2222:1@0/0:0695351808@1001002: Sent 183 [0,182] (UC 0, AC 0, MC 0, BC 183) OK Report #1 from 2222:1@0/0:2378798558@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #2 from 2222:2@0/0:4138497570@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #3 from 2222:2@0/0:1578004322@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #4 from 2222:2@0/0:1963550269@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #5 from 2222:1@0/0:0758742723@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #6 from 2222:2@0/0:2249538355@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #7 from 2222:1@0/0:2618664889@1001004: Recv 181 [0,180] (UC 0, AC 0, MC 0, BC 181) OK Report #8 from 2222:2@0/0:0089095375@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #9 from 2222:2@0/0:2509170385@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #10 from 2222:1@0/0:2696073811@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #11 from 2222:1@0/0:3980318671@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #12 from 2222:1@0/0:0106403114@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #13 from 2222:2@0/0:3523511402@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #14 from 2222:2@0/0:1617008759@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #15 from 2222:1@0/0:2862823240@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #16 from 2222:1@0/0:2426779766@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #17 from 2222:2@0/0:1044722113@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #18 from 2222:2@0/0:2464695570@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #19 from 2222:2@0/0:0091087500@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #20 from 2222:2@0/0:3515828561@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #21 from 2222:2@0/0:3633045121@1001004: 
Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #22 from 2222:1@0/0:1262462104@1001004: Recv 181 [0,180] (UC 0, AC 0, MC 0, BC 181) OK Report #23 from 2222:1@0/0:3238164678@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #24 from 2222:2@0/0:3734069867@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #25 from 2222:2@0/0:2038920765@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #26 from 2222:1@0/0:2642249832@1001004: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #27 from 2222:1@0/0:0375382704@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #28 from 2222:1@0/0:0878979536@1001003: Recv 181 [0,180] (UC 0, AC 0, MC 0, BC 181) OK Report #29 from 2222:1@0/0:3519740202@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #30 from 2222:1@0/0:2767769890@1001003: Recv 181 [0,180] (UC 0, AC 0, MC 0, BC 181) OK Report #31 from 2222:1@0/0:3714610817@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK Report #32 from 2222:2@0/0:0482654234@1001003: Recv 180 [1,180] (UC 0, AC 0, MC 0, BC 180) OK >> Broadcast Test SUCCESSFUL *** TIPC Group Messaging Test Finished **** Note: with the patch, the broadcast link bug has been eliminated because the ‘tipc_link_bc_retrans()’ is completely removed. BR/Tuong From: Jon Maloy <jm...@re...> Sent: Wednesday, March 18, 2020 1:12 AM To: Tuong Lien Tong <tuo...@de...>; tip...@li...; ma...@do...; yin...@wi...; lx...@re... Subject: Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support for broadcast link On 3/17/20 4:15 AM, Tuong Lien Tong wrote: Hi Jon, In terms of scalability, yes, the design was indeed focusing on it, the new stuffs are per individual broadcast receiver links and completely independent to each other. Also, the way its acks (e.g. via STATE_MSG) & retransmits is already working as unicast, also it still must comply the other limits (such as: the link window, retransmit timers, etc.)... 
So, I don't see any problems when the number of peer grows up. The unicast retransmission is really for another purpose, but of course one option as well. I have also done some other tests and here are the results: 1) tipc-pipe with large message size: ======================= - With the patch: ======================= # time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000 real 1m 35.50s user 0m 0.63s sys 0m 5.02s # tipc l st sh l broadcast-link Link <broadcast-link> Window:50 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:440000 fragments:440000/10000 bundles:0/0 RX naks:72661 defs:0 dups:0 TX naks:0 acks:0 retrans:23378 Congestion link:890 Send queue max:0 avg:0 ======================= - Without the patch: ======================= # time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000 real 9m 49.14s user 0m 0.41s sys 0m 1.56s # tipc l st sh Link <broadcast-link> Window:50 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:440000 fragments:440000/10000 bundles:0/0 RX naks:0 defs:0 dups:0 TX naks:0 acks:0 retrans:23651 Congestion link:2772 Send queue max:0 avg:0 2) group_test (do you mean this instead of "multicast_blast"?): Not really. I think t multicast_blast might better, but contrary to what I said there is no throughput measurement support. I think I had in mind an intermediate version that later developed into group_test, but that one was never pushed up to sourceforge. So group_test is probably more useful in this context. 
======================= - With the patch: ======================= # /cluster/group_test -b -m Commander: Received 0 UP Events for Member Id 100 *** TIPC Group Messaging Test Started **** Commander: Waiting for Scalers Commander: Received 1 UP Events for Member Id 101 Commander: Discovered 1 Scalers >> Starting Multicast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 665, BC 0, throughput last intv 87 Mb/s 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 1269, BC 0, throughput last intv 77 Mb/s 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 2042, BC 0, throughput last intv 101 Mb/s 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 2797, BC 0, throughput last intv 99 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:3101578979@1001002: Sent 3555 [0,3554] (UC 0, AC 0, MC 3555, BC 0) OK Report #1 from 2222:1@0/0:3423452773@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #2 from 2222:2@0/0:3341501021@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #3 from 2222:1@0/0:3775779560@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #4 from 2222:2@0/0:0283979098@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #5 from 2222:1@0/0:1288577198@1001004: Recv 3555 [0,3554] (UC 0, AC 0, MC 3555, BC 0) OK Report #6 from 2222:1@0/0:3616132138@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #7 from 2222:1@0/0:3992078596@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #8 from 2222:2@0/0:1658002624@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report 
#9 from 2222:1@0/0:1398137940@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #10 from 2222:1@0/0:2790669581@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #11 from 2222:1@0/0:2366726415@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #12 from 2222:2@0/0:1473723325@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #13 from 2222:2@0/0:1136757126@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #14 from 2222:1@0/0:2273798525@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #15 from 2222:2@0/0:3949256039@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #16 from 2222:2@0/0:1822300014@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #17 from 2222:1@0/0:3018695764@1001004: Recv 3555 [0,3554] (UC 0, AC 0, MC 3555, BC 0) OK Report #18 from 2222:1@0/0:2744800964@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #19 from 2222:2@0/0:0749893497@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #20 from 2222:1@0/0:1208963797@1001004: Recv 3553 [2,3554] (UC 0, AC 0, MC 3553, BC 0) OK Report #21 from 2222:2@0/0:1900862087@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #22 from 2222:2@0/0:3890385549@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #23 from 2222:1@0/0:0509529720@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #24 from 2222:2@0/0:0186529672@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #25 from 2222:1@0/0:1387317908@1001003: Recv 3553 [2,3554] (UC 0, AC 0, MC 3553, BC 0) OK Report #26 from 2222:2@0/0:4078423711@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #27 from 2222:2@0/0:1457003499@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #28 from 2222:2@0/0:3250519860@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #29 from 2222:2@0/0:3508775508@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #30 from 2222:2@0/0:1031479895@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, 
BC 0) OK Report #31 from 2222:1@0/0:3837724876@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #32 from 2222:1@0/0:2423154786@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK >> Multicast Test SUCCESSFUL >> Starting Broadcast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 434, throughput last intv 57 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 988, throughput last intv 72 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 1549, throughput last intv 73 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 2078, throughput last intv 69 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 2621, throughput last intv 70 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:2774004831@1001002: Sent 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #1 from 2222:1@0/0:1262350339@1001004: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #2 from 2222:2@0/0:2235335787@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #3 from 2222:2@0/0:2409874140@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #4 from 2222:1@0/0:3059039648@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #5 from 2222:1@0/0:3488269200@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #6 from 2222:2@0/0:4186324421@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #7 from 2222:1@0/0:2760420127@1001003: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #8 from 2222:2@0/0:2056504340@1001004: Recv 
2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #9 from 2222:2@0/0:0998162158@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #10 from 2222:2@0/0:3124321508@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #11 from 2222:2@0/0:1260121658@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #12 from 2222:2@0/0:2938973106@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #13 from 2222:2@0/0:2896700283@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #14 from 2222:1@0/0:2158652877@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #15 from 2222:1@0/0:1398540666@1001004: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #16 from 2222:1@0/0:1864856953@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #17 from 2222:2@0/0:3490882607@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #18 from 2222:1@0/0:2903105322@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #19 from 2222:2@0/0:1583785723@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #20 from 2222:1@0/0:3106247717@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #21 from 2222:1@0/0:2917195823@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #22 from 2222:1@0/0:0509238836@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #23 from 2222:2@0/0:2629682250@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #24 from 2222:2@0/0:1262288107@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #25 from 2222:1@0/0:3130881854@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #26 from 2222:1@0/0:0421078217@1001003: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #27 from 2222:1@0/0:0547555733@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #28 from 2222:2@0/0:1268394531@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK 
Report #29 from 2222:1@0/0:2548830551@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #30 from 2222:1@0/0:4267281725@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #31 from 2222:2@0/0:0247684341@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #32 from 2222:2@0/0:3078989866@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK >> Broadcast Test SUCCESSFUL *** TIPC Group Messaging Test Finished **** # tipc l st sh Link <broadcast-link> Window:50 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:287043 fragments:287040/5980 bundles:1/2 RX naks: 0 defs:0 dups:0 TX naks:0 acks:0 retrans:15284 Congestion link:293 Send queue max:0 avg:0 ======================= - Without the patch: ======================= #/cluster/group_test -b -m Commander: Received 0 UP Events for Member Id 100 *** TIPC Group Messaging Test Started **** Commander: Waiting for Scalers Commander: Received 1 UP Events for Member Id 101 Commander: Discovered 1 Scalers >> Starting Multicast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 *** no report, 0 Mb/s ???*** Hmm. Even if it is zero, it should be reported. Maybe a bug in the test program? 
Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:3270095995@1001002: Sent 34 [0,33] (UC 0, AC 0, MC 34, BC 0) OK Report #1 from 2222:2@0/0:1798962026@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #2 from 2222:2@0/0:1842303271@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #3 from 2222:1@0/0:4030007025@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #4 from 2222:1@0/0:1332810537@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #5 from 2222:1@0/0:4040901007@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #6 from 2222:2@0/0:0672740666@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #7 from 2222:2@0/0:2595641411@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #8 from 2222:2@0/0:2556065900@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #9 from 2222:1@0/0:2327925355@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #10 from 2222:2@0/0:1332584860@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #11 from 2222:1@0/0:3726344362@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #12 from 2222:2@0/0:3889312161@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #13 from 2222:1@0/0:1807365809@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #14 from 2222:1@0/0:2525672860@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #15 from 2222:2@0/0:1931253671@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #16 from 2222:2@0/0:1610105188@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #17 from 2222:1@0/0:0767932663@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #18 from 2222:1@0/0:3290773375@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #19 from 2222:1@0/0:2576347174@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #20 from 2222:1@0/0:2028851345@1001003: 
Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #21 from 2222:2@0/0:0123385799@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #22 from 2222:2@0/0:1395669417@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #23 from 2222:1@0/0:1098882628@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #24 from 2222:1@0/0:3398361863@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #25 from 2222:1@0/0:1085701361@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #26 from 2222:2@0/0:1790727708@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #27 from 2222:2@0/0:3199391066@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #28 from 2222:2@0/0:1232653389@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #29 from 2222:2@0/0:2255150189@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #30 from 2222:1@0/0:2526669233@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #31 from 2222:2@0/0:2479267806@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #32 from 2222:1@0/0:1097666084@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK >> Multicast Test SUCCESSFUL >> Starting Broadcast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:3890883707@1001002: Sent UC 0, AC 0, MC 0, BC 64, throughput last intv 7 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:3890883707@1001002: Sent 91 [0,90] (UC 0, AC 0, MC 0, BC 91) OK Report #1 from 2222:2@0/0:1098659974@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #2 from 2222:1@0/0:3441709654@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #3 
from 2222:2@0/0:0018441197@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #4 from 2222:2@0/0:0584054290@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #5 from 2222:2@0/0:4244201461@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #6 from 2222:2@0/0:1307600351@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #7 from 2222:2@0/0:3941241491@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #8 from 2222:2@0/0:2927828986@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #9 from 2222:1@0/0:1743241608@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #10 from 2222:2@0/0:1115515409@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #11 from 2222:1@0/0:0815130796@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #12 from 2222:1@0/0:2618044568@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #13 from 2222:2@0/0:1424259027@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #14 from 2222:1@0/0:1011077421@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #15 from 2222:1@0/0:3249391177@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #16 from 2222:1@0/0:2774666633@1001003: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #17 from 2222:1@0/0:0860766920@1001004: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #18 from 2222:1@0/0:0196231326@1001003: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #19 from 2222:1@0/0:4278611377@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #20 from 2222:1@0/0:3464416884@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #21 from 2222:2@0/0:1718387937@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #22 from 2222:2@0/0:0267090087@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #23 from 2222:2@0/0:1694243136@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #24 from 2222:2@0/0:0918300899@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #25 
from 2222:2@0/0:0811475995@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #26 from 2222:1@0/0:0388357605@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #27 from 2222:1@0/0:1113395305@1001004: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #28 from 2222:1@0/0:3413026333@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #29 from 2222:2@0/0:2907075331@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #30 from 2222:1@0/0:1393297086@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #31 from 2222:2@0/0:3493179185@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #32 from 2222:1@0/0:3166541927@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK >> Broadcast Test SUCCESSFUL *** TIPC Group Messaging Test Finished **** # tipc l st sh Link <broadcast-link> Window:50 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:5908 fragments:5904/123 bundles:0/0 RX naks:0 defs:0 dups:0 TX naks:0 acks:0 retrans:324 Congestion link:32 Send queue max:0 avg:0 BR/Tuong I am totally convinced. Just give me time to review the patch properly and you'll have my ack. ///jon -----Original Message----- From: Jon Maloy <jm...@re...> Sent: Tuesday, March 17, 2020 2:49 AM To: Tuong Lien Tong <tuo...@de...>; tip...@li...; ma...@do...; yin...@wi...; lx...@re... Subject: Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support for broadcast link On 3/16/20 2:18 PM, Jon Maloy wrote: > > > On 3/16/20 7:23 AM, Tuong Lien Tong wrote: >> [...] >> > The improvement shown here is truly impressive. However, you are only > showing tipc-pipe with small messages. How does this look when you > send full-size 66k messages? How does it scale when the number of > destinations grows up to tens or even hundreds? 
I am particularly > concerned that the use of unicast retransmission may become a > sub-optimization if the number of destinations is large. > > ///jon You should try the "multicast_blast" program under tipc-utils/test. That will give you numbers both on throughput and loss rates as you let the number of nodes grow. ///jon > >> BR/Tuong >> >> *From:* Jon Maloy <jm...@re...> >> *Sent:* Friday, March 13, 2020 10:47 PM >> *To:* Tuong Lien <tuo...@de...>; >> tip...@li...; ma...@do...; >> yin...@wi... >> *Subject:* Re: [PATCH RFC 1/2] tipc: add Gap ACK blocks support for >> broadcast link >> >> On 3/13/20 6:47 AM, Tuong Lien wrote: >> >> As achieved through commit 9195948fbf34 ("tipc: improve TIPC >> throughput >> >> by Gap ACK blocks"), we apply the same mechanism for the >> broadcast link >> >> as well. The 'Gap ACK blocks' data field in a >> 'PROTOCOL/STATE_MSG' will >> >> consist of two parts built for both the broadcast and unicast types: >> >> 31 16 15 0 >> >> +-------------+-------------+-------------+-------------+ >> >> | bgack_cnt | ugack_cnt | len | >> >> +-------------+-------------+-------------+-------------+ - >> >> | gap | ack | | >> >> +-------------+-------------+-------------+-------------+ > bc gacks >> >> : : : | >> >> +-------------+-------------+-------------+-------------+ - >> >> | gap | ack | | >> >> +-------------+-------------+-------------+-------------+ > uc gacks >> >> : : : | >> >> +-------------+-------------+-------------+-------------+ - >> >> which is "automatically" backward-compatible. >> >> We also increase the max number of Gap ACK blocks to 128, >> allowing up to >> >> 64 blocks per type (total buffer size = 516 bytes). 
>> >> Besides, the 'tipc_link_advance_transmq()' function is refactored >> which >> >> is applicable for both the unicast and broadcast cases now, so >> some old >> >> functions can be removed and the code is optimized. >> >> With the patch, TIPC broadcast is more robust regardless of >> packet loss >> >> or disorder, latency, ... in the underlying network. Its >> performance is >> >> boosted up significantly. >> >> For example, an experiment with a 5% packet loss rate results in: >> >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 >> >> real 0m 42.46s >> >> user 0m 1.16s >> >> sys 0m 17.67s >> >> Without the patch: >> >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 >> >> real 5m 28.80s >> >> user 0m 0.85s >> >> sys 0m 3.62s >> >> Can you explain this? To me it seems like the elapsed time is reduced >> with a factor 328.8/42.46=7.7, while we are consuming significantly >> more CPU to achieve this. Doesn't that mean that we have much more >> retransmissions which are consuming CPU? Or is there some other >> explanation? 
>> >> ///jon >> >> >> Signed-off-by: Tuong Lien< <mailto:tuo...@de...> tuo...@de...> >> < <mailto:tuo...@de...> mailto:tuo...@de...> >> >> --- >> >> net/tipc/bcast.c | 9 +- >> >> net/tipc/link.c | 440 >> +++++++++++++++++++++++++++++++++---------------------- >> >> net/tipc/link.h | 7 +- >> >> net/tipc/msg.h | 14 +- >> >> net/tipc/node.c | 10 +- >> >> 5 files changed, 295 insertions(+), 185 deletions(-) >> >> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c >> >> index 4c20be08b9c4..3ce690a96ee9 100644 >> >> --- a/net/tipc/bcast.c >> >> +++ b/net/tipc/bcast.c >> >> @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, >> struct tipc_link *l, >> >> __skb_queue_head_init(&xmitq); >> >> >> tipc_bcast_lock(net); >> >> - tipc_link_bc_ack_rcv(l, acked, &xmitq); >> >> + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); >> >> tipc_bcast_unlock(net); >> >> >> tipc_bcbase_xmit(net, &xmitq); >> >> @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, >> struct tipc_link *l, >> >> struct tipc_msg *hdr) >> >> { >> >> struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; >> >> + struct tipc_gap_ack_blks *ga; >> >> struct sk_buff_head xmitq; >> >> int rc = 0; >> >> >> @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, >> struct tipc_link *l, >> >> if (msg_type(hdr) != STATE_MSG) { >> >> tipc_link_bc_init_rcv(l, hdr); >> >> } else if (!msg_bc_ack_invalid(hdr)) { >> >> - tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq); >> >> - rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq); >> >> + tipc_get_gap_ack_blks(&ga, l, hdr, false); >> >> + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), >> >> + msg_bc_gap(hdr), ga, &xmitq); >> >> + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); >> >> } >> >> tipc_bcast_unlock(net); >> >> >> diff --git a/net/tipc/link.c b/net/tipc/link.c >> >> index 467c53a1fb5c..6198b6d89a69 100644 >> >> --- a/net/tipc/link.c >> >> +++ b/net/tipc/link.c >> >> @@ -188,6 +188,8 @@ struct tipc_link { >> >> /* Broadcast */ >> >> u16 ackers; >> >> u16 
acked; >> >> + u16 last_gap; >> >> + struct tipc_gap_ack_blks *last_ga; >> >> struct tipc_link *bc_rcvlink; >> >> struct tipc_link *bc_sndlink; >> >> u8 nack_state; >> >> @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct >> tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> static void tipc_link_build_bc_init_msg(struct tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); >> >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void >> *data, u16 gap); >> >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 >> acked, u16 gap, >> >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, >> >> + struct tipc_link *l, u8 start_index); >> >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct >> tipc_msg *hdr); >> >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct >> tipc_link *r, >> >> + u16 acked, u16 gap, >> >> struct tipc_gap_ack_blks *ga, >> >> - struct sk_buff_head *xmitq); >> >> + struct sk_buff_head *xmitq, >> >> + bool *retransmitted, int *rc); >> >> static void tipc_link_update_cwin(struct tipc_link *l, int >> released, >> >> bool retransmitted); >> >> /* >> >> @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct >> tipc_link *snd_l, >> >> snd_l->ackers--; >> >> rcv_l->bc_peer_is_up = true; >> >> rcv_l->state = LINK_ESTABLISHED; >> >> - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); >> >> + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); >> >> trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); >> >> tipc_link_reset(rcv_l); >> >> rcv_l->state = LINK_RESET; >> >> @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) >> >> return (l->silent_intv_cnt + 2 > l->abort_limit); >> >> } >> >> >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct >> tipc_link *r, >> >> - u16 from, u16 to, struct sk_buff_head >> *xmitq); >> >> /* tipc_link_timeout - perform periodic task as instructed from >> 
node timeout >> >> */ >> >> int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head >> *xmitq) >> >> @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) >> >> l->snd_nxt_state = 1; >> >> l->rcv_nxt_state = 1; >> >> l->acked = 0; >> >> + l->last_gap = 0; >> >> + kfree(l->last_ga); >> >> + l->last_ga = NULL; >> >> l->silent_intv_cnt = 0; >> >> l->rst_cnt = 0; >> >> l->bc_peer_is_up = false; >> >> @@ -1183,68 +1189,14 @@ static bool >> link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, >> >> >> if (link_is_bc_sndlink(l)) { >> >> r->state = LINK_RESET; >> >> - *rc = TIPC_LINK_DOWN_EVT; >> >> + *rc |= TIPC_LINK_DOWN_EVT; >> >> } else { >> >> - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); >> >> + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); >> >> } >> >> >> return true; >> >> } >> >> >> -/* tipc_link_bc_retrans() - retransmit zero or more packets >> >> - * @l: the link to transmit on >> >> - * @r: the receiving link ordering the retransmit. Same as l if >> unicast >> >> - * @from: retransmit from (inclusive) this sequence number >> >> - * @to: retransmit to (inclusive) this sequence number >> >> - * xmitq: queue for accumulating the retransmitted packets >> >> - */ >> >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct >> tipc_link *r, >> >> - u16 from, u16 to, struct sk_buff_head >> *xmitq) >> >> -{ >> >> - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); >> >> - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; >> >> - u16 ack = l->rcv_nxt - 1; >> >> - int retransmitted = 0; >> >> - struct tipc_msg *hdr; >> >> - int rc = 0; >> >> - >> >> - if (!skb) >> >> - return 0; >> >> - if (less(to, from)) >> >> - return 0; >> >> - >> >> - trace_tipc_link_retrans(r, from, to, &l->transmq); >> >> - >> >> - if (link_retransmit_failure(l, r, &rc)) >> >> - return rc; >> >> - >> >> - skb_queue_walk(&l->transmq, skb) { >> >> - hdr = buf_msg(skb); >> >> - if (less(msg_seqno(hdr), from)) >> >> - continue; >> >> - if (more(msg_seqno(hdr), to)) >> >> 
- break; >> >> - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) >> >> - continue; >> >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; >> >> - _skb = pskb_copy(skb, GFP_ATOMIC); >> >> - if (!_skb) >> >> - return 0; >> >> - hdr = buf_msg(_skb); >> >> - msg_set_ack(hdr, ack); >> >> - msg_set_bcast_ack(hdr, bc_ack); >> >> - _skb->priority = TC_PRIO_CONTROL; >> >> - __skb_queue_tail(xmitq, _skb); >> >> - l->stats.retransmitted++; >> >> - retransmitted++; >> >> - /* Increase actual retrans counter & mark first time */ >> >> - if (!TIPC_SKB_CB(skb)->retr_cnt++) >> >> - TIPC_SKB_CB(skb)->retr_stamp = jiffies; >> >> - } >> >> - tipc_link_update_cwin(l, 0, retransmitted); >> >> - return 0; >> >> -} >> >> - >> >> /* tipc_data_input - deliver data and name distr msgs to upper >> layer >> >> * >> >> * Consumes buffer if message is of right type >> >> @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> return rc; >> >> } >> >> >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) >> >> -{ >> >> - int released = 0; >> >> - struct sk_buff *skb, *tmp; >> >> - >> >> - skb_queue_walk_safe(&l->transmq, skb, tmp) { >> >> - if (more(buf_seqno(skb), acked)) >> >> - break; >> >> - __skb_unlink(skb, &l->transmq); >> >> - kfree_skb(skb); >> >> - released++; >> >> +/** >> >> + * tipc_get_gap_ack_blks - get Gap ACK blocks from >> PROTOCOL/STATE_MSG >> >> + * @ga: returned pointer to the Gap ACK blocks if any >> >> + * @l: the tipc link >> >> + * @hdr: the PROTOCOL/STATE_MSG header >> >> + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or >> broadcast (= 0) >> >> + * >> >> + * Return: the total Gap ACK blocks size >> >> + */ >> >> +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct >> tipc_link *l, >> >> + struct tipc_msg *hdr, bool uc) >> >> +{ >> >> + struct tipc_gap_ack_blks *p; >> >> + u16 sz = 0; >> >> + >> >> + /* Does peer support the Gap ACK blocks feature? 
*/ >> >> + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { >> >> + p = (struct tipc_gap_ack_blks *)msg_data(hdr); >> >> + sz = ntohs(p->len); >> >> + /* Sanity check */ >> >> + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + >> p->bgack_cnt)) { >> >> + /* Good, check if the desired type exists */ >> >> + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) >> >> + goto ok; >> >> + /* Backward compatible: peer might not support bc, but >> uc? */ >> >> + } else if (uc && sz == >> tipc_gap_ack_blks_sz(p->ugack_cnt)) { >> >> + if (p->ugack_cnt) { >> >> + p->bgack_cnt = 0; >> >> + goto ok; >> >> + } >> >> + } >> >> } >> >> - return released; >> >> + /* Other cases: ignore! */ >> >> + p = NULL; >> >> + >> >> +ok: >> >> + *ga = p; >> >> + return sz; >> >> } >> >> >> -/* tipc_build_gap_ack_blks - build Gap ACK blocks >> >> - * @l: tipc link that data have come with gaps in sequence if any >> >> - * @data: data buffer to store the Gap ACK blocks after built >> >> - * >> >> - * returns the actual allocated memory size >> >> - */ >> >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void >> *data, u16 gap) >> >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, >> >> + struct tipc_link *l, u8 start_index) >> >> { >> >> + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; >> >> struct sk_buff *skb = skb_peek(&l->deferdq); >> >> - struct tipc_gap_ack_blks *ga = data; >> >> - u16 len, expect, seqno = 0; >> >> + u16 expect, seqno = 0; >> >> u8 n = 0; >> >> >> - if (!skb || !gap) >> >> - goto exit; >> >> + if (!skb) >> >> + return 0; >> >> >> expect = buf_seqno(skb); >> >> skb_queue_walk(&l->deferdq, skb) { >> >> seqno = buf_seqno(skb); >> >> if (unlikely(more(seqno, expect))) { >> >> - ga->gacks[n].ack = htons(expect - 1); >> >> - ga->gacks[n].gap = htons(seqno - expect); >> >> - if (++n >= MAX_GAP_ACK_BLKS) { >> >> - pr_info_ratelimited("Too few Gap ACK >> blocks!\n"); >> >> - goto exit; >> >> + gacks[n].ack = htons(expect - 1); >> >> + gacks[n].gap = 
htons(seqno - expect); >> >> + if (++n >= MAX_GAP_ACK_BLKS / 2) { >> >> + char buf[TIPC_MAX_LINK_NAME]; >> >> + >> >> + pr_info_ratelimited("Gacks on %s: %d, >> ql: %d!\n", >> >> + tipc_link_name_ext(l, buf), >> >> + n, >> >> + skb_queue_len(&l->deferdq)); >> >> + return n; >> >> } >> >> } else if (unlikely(less(seqno, expect))) { >> >> pr_warn("Unexpected skb in deferdq!\n"); >> >> @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct >> tipc_link *l, void *data, u16 gap) >> >> } >> >> >> /* last block */ >> >> - ga->gacks[n].ack = htons(seqno); >> >> - ga->gacks[n].gap = 0; >> >> + gacks[n].ack = htons(seqno); >> >> + gacks[n].gap = 0; >> >> n++; >> >> + return n; >> >> +} >> >> >> -exit: >> >> - len = tipc_gap_ack_blks_sz(n); >> >> +/* tipc_build_gap_ack_blks - build Gap ACK blocks >> >> + * @l: tipc unicast link >> >> + * @hdr: the tipc message buffer to store the Gap ACK blocks >> after built >> >> + * >> >> + * The function builds Gap ACK blocks for both the unicast & >> broadcast receiver >> >> + * links of a certain peer, the buffer after built has the >> network data format >> >> + * as follows: >> >> + * 31 16 15 0 >> >> + * +-------------+-------------+-------------+-------------+ >> >> + * | bgack_cnt | ugack_cnt | len | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * | gap | ack | | >> >> + * +-------------+-------------+-------------+-------------+ > >> bc gacks >> >> + * : : : | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * | gap | ack | | >> >> + * +-------------+-------------+-------------+-------------+ > >> uc gacks >> >> + * : : : | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * (See struct tipc_gap_ack_blks) >> >> + * >> >> + * returns the actual allocated memory size >> >> + */ >> >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct >> tipc_msg *hdr) >> >> +{ >> >> + struct tipc_link *bcl = l->bc_rcvlink; >> >> + struct 
tipc_gap_ack_blks *ga; >> >> + u16 len; >> >> + >> >> + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); >> >> + >> >> + /* Start with broadcast link first */ >> >> + tipc_bcast_lock(bcl->net); >> >> + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); >> >> + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); >> >> + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); >> >> + tipc_bcast_unlock(bcl->net); >> >> + >> >> + /* Now for unicast link, but an explicit NACK only (???) */ >> >> + ga->ugack_cnt = (msg_seq_gap(hdr)) ? >> >> + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) >> : 0; >> >> + >> >> + /* Total len */ >> >> + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); >> >> ga->len = htons(len); >> >> - ga->gack_cnt = n; >> >> return len; >> >> } >> >> >> @@ -1466,47 +1486,111 @@ static u16 >> tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) >> >> * acked packets, also doing >> retransmissions if >> >> * gaps found >> >> * @l: tipc link with transmq queue to be advanced >> >> + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if >> unicast) >> >> * @acked: seqno of last packet acked by peer without any gaps >> before >> >> * @gap: # of gap packets >> >> * @ga: buffer pointer to Gap ACK blocks from peer >> >> * @xmitq: queue for accumulating the retransmitted packets if any >> >> + * @retransmitted: returned boolean value if a retransmission is >> really issued >> >> + * @rc: returned code e.g. TIPC_LINK_DOWN_EVT if a repeated >> retransmit failures >> >> + * happens (- unlikely case) >> >> * >> >> - * In case of a repeated retransmit failures, the call will >> return shortly >> >> - * with a returned code (e.g. 
TIPC_LINK_DOWN_EVT) >> >> + * Return: the number of packets released from the link transmq >> >> */ >> >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 >> acked, u16 gap, >> >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct >> tipc_link *r, >> >> + u16 acked, u16 gap, >> >> struct tipc_gap_ack_blks *ga, >> >> - struct sk_buff_head *xmitq) >> >> + struct sk_buff_head *xmitq, >> >> + bool *retransmitted, int *rc) >> >> { >> >> + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; >> >> + struct tipc_gap_ack *gacks = NULL; >> >> struct sk_buff *skb, *_skb, *tmp; >> >> struct tipc_msg *hdr; >> >> + u32 qlen = skb_queue_len(&l->transmq); >> >> + u16 nacked = acked, ngap = gap, gack_cnt = 0; >> >> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; >> >> - bool retransmitted = false; >> >> u16 ack = l->rcv_nxt - 1; >> >> - bool passed = false; >> >> - u16 released = 0; >> >> u16 seqno, n = 0; >> >> - int rc = 0; >> >> + u16 end = r->acked, start = end, offset = r->last_gap; >> >> + u16 si = (last_ga) ? 
last_ga->start_index : 0; >> >> + bool is_uc = !link_is_bc_sndlink(l); >> >> + bool bc_has_acked = false; >> >> + >> >> + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); >> >> + >> >> + /* Determine Gap ACK blocks if any for the particular link */ >> >> + if (ga && is_uc) { >> >> + /* Get the Gap ACKs, uc part */ >> >> + gack_cnt = ga->ugack_cnt; >> >> + gacks = &ga->gacks[ga->bgack_cnt]; >> >> + } else if (ga) { >> >> + /* Copy the Gap ACKs, bc part, for later renewal if >> needed */ >> >> + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), >> >> + GFP_ATOMIC); >> >> + if (likely(this_ga)) { >> >> + this_ga->start_index = 0; >> >> + /* Start with the bc Gap ACKs */ >> >> + gack_cnt = this_ga->bgack_cnt; >> >> + gacks = &this_ga->gacks[0]; >> >> + } else { >> >> + /* Hmm, we can get in trouble..., simply ignore >> it */ >> >> + pr_warn_ratelimited("Ignoring bc Gap ACKs, no >> memory\n"); >> >> + } >> >> + } >> >> >> + /* Advance the link transmq */ >> >> skb_queue_walk_safe(&l->transmq, skb, tmp) { >> >> seqno = buf_seqno(skb); >> >> >> next_gap_ack: >> >> - if (less_eq(seqno, acked)) { >> >> + if (less_eq(seqno, nacked)) { >> >> + if (is_uc) >> >> + goto release; >> >> + /* Skip packets peer has already acked */ >> >> + if (!more(seqno, r->acked)) >> >> + continue; >> >> + /* Get the next of last Gap ACK blocks */ >> >> + while (more(seqno, end)) { >> >> + if (!last_ga || si >= last_ga->bgack_cnt) >> >> + break; >> >> + start = end + offset + 1; >> >> + end = ntohs(last_ga->gacks[si].ack); >> >> + offset = ntohs(last_ga->gacks[si].gap); >> >> + si++; >> >> + WARN_ONCE(more(start, end) || >> >> + (!offset && >> >> + si < last_ga->bgack_cnt) || >> >> + si > MAX_GAP_ACK_BLKS, >> >> + "Corrupted Gap ACK: %d %d %d %d >> %d\n", >> >> + start, end, offset, si, >> >> + last_ga->bgack_cnt); >> >> + } >> >> + /* Check against the last Gap ACK block */ >> >> + if (in_range(seqno, start, end)) >> >> + continue; >> >> + /* Update/release the packet 
peer is acking */ >> >> + bc_has_acked = true; >> >> + if (--TIPC_SKB_CB(skb)->ackers) >> >> + continue; >> >> +release: >> >> /* release skb */ >> >> __skb_unlink(skb, &l->transmq); >> >> kfree_skb(skb); >> >> - released++; >> >> - } else if (less_eq(seqno, acked + gap)) { >> >> - /* First, check if repeated retrans failures >> occurs? */ >> >> - if (!passed && link_retransmit_failure(l, l, &rc)) >> >> - return rc; >> >> - passed = true; >> >> - >> >> + } else if (less_eq(seqno, nacked + ngap)) { >> >> + /* First gap: check if repeated retrans >> failures? */ >> >> + if (unlikely(seqno == acked + 1 && >> >> + link_retransmit_failure(l, r, rc))) { >> >> + /* Ignore this bc Gap ACKs if any */ >> >> + kfree(this_ga); >> >> + this_ga = NULL; >> >> + break; >> >> + } >> >> /* retransmit skb if unrestricted*/ >> >> if (time_before(jiffies, >> TIPC_SKB_CB(skb)->nxt_retr)) >> >> �� continue; >> >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; >> >> + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? >> >> + TIPC_UC_RETR_TIME : >> TIPC_BC_RETR_LIM; >> >> _skb = pskb_copy(skb, GFP_ATOMIC); >> >> if (!_skb) >> >> continue; >> >> @@ -1516,25 +1600,50 @@ static int >> tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, >> >> _skb->priority = TC_PRIO_CONTROL; >> >> __skb_queue_tail(xmitq, _skb); >> >> l->stats.retransmitted++; >> >> - retransmitted = true; >> >> + *retransmitted = true; >> >> /* Increase actual retrans counter & mark first >> time */ >> >> if (!TIPC_SKB_CB(skb)->retr_cnt++) >> >> TIPC_SKB_CB(skb)->retr_stamp = jiffies; >> >> } else { >> >> /* retry with Gap ACK blocks if any */ >> >> - if (!ga || n >= ga->gack_cnt) >> >> + if (n >= gack_cnt) >> >> break; >> >> - acked = ntohs(ga->gacks[n].ack); >> >> - gap = ntohs(ga->gacks[n].gap); >> >> + nacked = ntohs(gacks[n].ack); >> >> + ngap = ntohs(gacks[n].gap); >> >> n++; >> >> goto next_gap_ack; >> >> } >> >> } >> >> - if (released || retransmitted) >> >> - tipc_link_update_cwin(l, released, retransmitted); 
>> >> - if (released) >> >> - tipc_link_advance_backlog(l, xmitq); >> >> - return 0; >> >> + >> >> + /* Renew last Gap ACK blocks for bc if needed */ >> >> + if (bc_has_acked) { >> >> + if (this_ga) { >> >> + kfree(last_ga); >> >> + r->last_ga = this_ga; >> >> + r->last_gap = gap; >> >> + } else if (last_ga) { >> >> + if (less(acked, start)) { >> >> + si--; >> >> + offset = start - acked - 1; >> >> + } else if (less(acked, end)) { >> >> + acked = end; >> >> + } >> >> + if (si < last_ga->bgack_cnt) { >> >> + last_ga->start_index = si; >> >> + r->last_gap = offset; >> >> + } else { >> >> + kfree(last_ga); >> >> + r->last_ga = NULL; >> >> + r->last_gap = 0; >> >> + } >> >> + } else { >> >> + r->last_gap = 0; >> >> + } >> >> + r->acked = acked; >> >> + } else { >> >> + kfree(this_ga); >> >> + } >> >> + return skb_queue_len(&l->transmq) - qlen; >> >> } >> >> >> /* tipc_link_build_state_msg: prepare link state message for >> transmission >> >> @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, >> struct sk_buff *skb, >> >> kfree_skb(skb); >> >> break; >> >> } >>... [truncated message content] |
From: Tuong L. T. <tuo...@de...> - 2020-03-18 04:50:23
Hi Jon,

Ok, that makes sense (but we should also have covered the case where a broadcast packet is released...). However, I have another concern about the logic here:

> +	/* Enter fast recovery */
> +	if (unlikely(retransmitted)) {
> +		l->ssthresh = max_t(u16, l->window / 2, 300);
> +		l->window = l->ssthresh;
> +		return;
> +	}

What happens if we get a retransmission while the link is still in the slow-start phase? For example:

l->ssthresh = 300
l->window = 60
==> retransmitted = true, then:
l->ssthresh = 300;
l->window = 300???

That does not look correct. Should it be:

> +	/* Enter fast recovery */
> +	if (unlikely(retransmitted)) {
> +		l->ssthresh = max_t(u16, l->window / 2, 300);
> -		l->window = l->ssthresh;
> +		l->window = min_t(u16, l->window, l->ssthresh);
> +		return;
> +	}

so that it fixes the issue with the broadcast case as well?

BR/Tuong

-----Original Message-----
From: Jon Maloy <jm...@re...>
Sent: Wednesday, March 18, 2020 1:38 AM
To: Tuong Lien Tong <tuo...@de...>; 'Jon Maloy' <jon...@er...>; 'Jon Maloy' <ma...@do...>
Cc: tip...@li...; moh...@er...
Subject: Re: [tipc-discussion] [net-next 3/3] tipc: introduce variable window congestion control

On 3/17/20 6:55 AM, Tuong Lien Tong wrote:
> Hi Jon,
>
> For the "variable window congestion control" patch, if I remember correctly,
> it is for the unicast link only? Why did you apply it to the broadcast link, a
> mistake or ...?
I did it so the code would be the same everywhere. Then, by setting both min_win and max_win to the same value BC_LINK_WIN_DEFAULT (==50) in the broadcast send link, this window should never change.

> It now causes user messages to arrive disordered on the receiving side, because on the
> sending side, the broadcast link's window is suddenly increased to 300 (i.e.
> max_t(u16, l->window / 2, 300)) at a packet retransmission, leaving some
> gaps between the link's 'transmq' & 'backlogq' unexpectedly... Will we fix
> this by removing it?
That is clearly a bug that breaks the above stated limitation. 
It should be sufficient to check that also l->ssthresh never exceeds l->max_win to remedy this. ///jon > > @@ -1160,7 +1224,6 @@ static int tipc_link_bc_retrans(struct tipc_link *l, > struct tipc_link *r, > continue; > if (more(msg_seqno(hdr), to)) > break; > - > if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) > continue; > TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; > @@ -1173,11 +1236,12 @@ static int tipc_link_bc_retrans(struct tipc_link *l, > struct tipc_link *r, > _skb->priority = TC_PRIO_CONTROL; > __skb_queue_tail(xmitq, _skb); > l->stats.retransmitted++; > - > + retransmitted++; > /* Increase actual retrans counter & mark first time */ > if (!TIPC_SKB_CB(skb)->retr_cnt++) > TIPC_SKB_CB(skb)->retr_stamp = jiffies; > } > + tipc_link_update_cwin(l, 0, retransmitted); // ??? > return 0; > } > > +static void tipc_link_update_cwin(struct tipc_link *l, int released, > + bool retransmitted) > +{ > + int bklog_len = skb_queue_len(&l->backlogq); > + struct sk_buff_head *txq = &l->transmq; > + int txq_len = skb_queue_len(txq); > + u16 cwin = l->window; > + > + /* Enter fast recovery */ > + if (unlikely(retransmitted)) { > + l->ssthresh = max_t(u16, l->window / 2, 300); > + l->window = l->ssthresh; > + return; > + } > > BR/Tuong > > -----Original Message----- > From: Jon Maloy <jon...@er...> > Sent: Monday, December 2, 2019 7:33 AM > To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...> > Cc: moh...@er...; > par...@gm...; tun...@de...; > hoa...@de...; tuo...@de...; > gor...@de...; yin...@wi...; > tip...@li... > Subject: [net-next 3/3] tipc: introduce variable window congestion control > > We introduce a simple variable window congestion control for links. > The algorithm is inspired by the Reno algorithm, covering both 'slow > start', 'congestion avoidance', and 'fast recovery' modes. > > - We introduce hard lower and upper window limits per link, still > different and configurable per bearer type. 
> > - We introduce as 'slow start theshold' variable, initially set to > the maximum window size. > > - We let a link start at the minimum congestion window, i.e. in slow > start mode, and then let is grow rapidly (+1 per rceived ACK) until > it reaches the slow start threshold and enters congestion avoidance > mode. > > - In congestion avoidance mode we increment the congestion window for > each window_size number of acked packets, up to a possible maximum > equal to the configured maximum window. > > - For each non-duplicate NACK received, we drop back to fast recovery > mode, by setting the both the slow start threshold to and the > congestion window to (current_congestion_window / 2). > > - If the timeout handler finds that the transmit queue has not moved > timeout, it drops the link back to slow start and forces a probe > containing the last sent sequence number to the sent to the peer. > > This change does in reality have effect only on unicast ethernet > transport, as we have seen that there is no room whatsoever for > increasing the window max size for the UDP bearer. > For now, we also choose to keep the limits for the broadcast link > unchanged and equal. > > This algorithm seems to give a 50-100% throughput improvement for > messages larger than MTU. > > Suggested-by: Xin Long <luc...@gm...> > Acked-by: Xin Long <luc...@gm...> > Signed-off-by: Jon Maloy <jon...@er...> > --- > net/tipc/bcast.c | 11 ++-- > net/tipc/bearer.c | 11 ++-- > net/tipc/bearer.h | 6 +- > net/tipc/eth_media.c | 3 +- > net/tipc/ib_media.c | 5 +- > net/tipc/link.c | 175 > +++++++++++++++++++++++++++++++++++---------------- > net/tipc/link.h | 9 +-- > net/tipc/node.c | 16 ++--- > net/tipc/udp_media.c | 3 +- > 9 files changed, 160 insertions(+), 79 deletions(-) > > > > > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > |
From: Jon M. <jm...@re...> - 2020-03-17 18:38:09
On 3/17/20 6:55 AM, Tuong Lien Tong wrote:
> Hi Jon,
>
> For the "variable window congestion control" patch, if I remember correctly,
> it is for unicast links only? Why did you apply it to the broadcast link, a
> mistake or ...?
I did it so the code would be the same everywhere. Then, by setting both min_win and max_win to the same value BC_LINK_WIN_DEFAULT (==50) in the broadcast send link, this window should never change.
> It now causes user messages to be disordered on the receiving side, because on the
> sending side, the broadcast link's window is suddenly increased to 300 (i.e.
> max_t(u16, l->window / 2, 300)) at a packet retransmission, leaving some
> gaps between the link's 'transmq' & 'backlogq' unexpectedly... Will we fix
> this by removing it?
That is clearly a bug that breaks the above stated limitation.
It should be sufficient to check that l->ssthresh also never exceeds l->max_win to remedy this.

///jon

>
> @@ -1160,7 +1224,6 @@ static int tipc_link_bc_retrans(struct tipc_link *l,
> struct tipc_link *r,
> continue;
> if (more(msg_seqno(hdr), to))
> break;
> -
> if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
> continue;
> TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
> @@ -1173,11 +1236,12 @@ static int tipc_link_bc_retrans(struct tipc_link *l,
> struct tipc_link *r,
> _skb->priority = TC_PRIO_CONTROL;
> __skb_queue_tail(xmitq, _skb);
> l->stats.retransmitted++;
> -
> + retransmitted++;
> /* Increase actual retrans counter & mark first time */
> if (!TIPC_SKB_CB(skb)->retr_cnt++)
> TIPC_SKB_CB(skb)->retr_stamp = jiffies;
> }
> + tipc_link_update_cwin(l, 0, retransmitted); // ???
> return 0;
> }
>
> +static void tipc_link_update_cwin(struct tipc_link *l, int released,
> + bool retransmitted)
> +{
> + int bklog_len = skb_queue_len(&l->backlogq);
> + struct sk_buff_head *txq = &l->transmq;
> + int txq_len = skb_queue_len(txq);
> + u16 cwin = l->window;
> +
> + /* Enter fast recovery */
> + if (unlikely(retransmitted)) {
> + l->ssthresh = max_t(u16, l->window / 2, 300);
> + l->window = l->ssthresh;
> + return;
> + }
>
> BR/Tuong
>
> -----Original Message-----
> From: Jon Maloy <jon...@er...>
> Sent: Monday, December 2, 2019 7:33 AM
> To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
> Cc: moh...@er...; par...@gm...; tun...@de...; hoa...@de...; tuo...@de...; gor...@de...; yin...@wi...; tip...@li...
> Subject: [net-next 3/3] tipc: introduce variable window congestion control
>
> We introduce a simple variable window congestion control for links.
> The algorithm is inspired by the Reno algorithm, covering 'slow
> start', 'congestion avoidance' and 'fast recovery' modes.
>
> - We introduce hard lower and upper window limits per link, still
>   different and configurable per bearer type.
>
> - We introduce a 'slow start threshold' variable, initially set to
>   the maximum window size.
>
> - We let a link start at the minimum congestion window, i.e. in slow
>   start mode, and then let it grow rapidly (+1 per received ACK) until
>   it reaches the slow start threshold and enters congestion avoidance
>   mode.
>
> - In congestion avoidance mode we increment the congestion window for
>   each window_size number of acked packets, up to a possible maximum
>   equal to the configured maximum window.
>
> - For each non-duplicate NACK received, we drop back to fast recovery
>   mode, by setting both the slow start threshold and the congestion
>   window to (current_congestion_window / 2).
>
> - If the timeout handler finds that the transmit queue has not moved
>   since the previous timeout, it drops the link back to slow start and
>   forces a probe containing the last sent sequence number to be sent
>   to the peer.
>
> This change in reality has effect only on unicast Ethernet transport,
> as we have seen that there is no room whatsoever for increasing the
> max window size for the UDP bearer.
> For now, we also choose to keep the limits for the broadcast link
> unchanged and equal.
>
> This algorithm seems to give a 50-100% throughput improvement for
> messages larger than MTU.
>
> Suggested-by: Xin Long <luc...@gm...>
> Acked-by: Xin Long <luc...@gm...>
> Signed-off-by: Jon Maloy <jon...@er...>
> ---
> net/tipc/bcast.c | 11 ++--
> net/tipc/bearer.c | 11 ++--
> net/tipc/bearer.h | 6 +-
> net/tipc/eth_media.c | 3 +-
> net/tipc/ib_media.c | 5 +-
> net/tipc/link.c | 175 +++++++++++++++++++++++++++++++++++----------------
> net/tipc/link.h | 9 +--
> net/tipc/node.c | 16 ++---
> net/tipc/udp_media.c | 3 +-
> 9 files changed, 160 insertions(+), 79 deletions(-)
>
> _______________________________________________
> tipc-discussion mailing list
> tip...@li...
> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
From: Jon M. <jm...@re...> - 2020-03-17 18:12:43
On 3/17/20 4:15 AM, Tuong Lien Tong wrote: > > Hi Jon, > > In terms of scalability, yes, the design was indeed focusing on it, > the new stuffs are per individual broadcast receiver links and > completely independent to each other. Also, the way its acks (e.g. via > STATE_MSG) & retransmits is already working as unicast, also it still > must comply the other limits (such as: the link window, retransmit > timers, etc.)... So, I don't see any problems when the number of peer > grows up. > > The unicast retransmission is really for another purpose, but of > course one option as well. > > I have also done some other tests and here are the results: > > 1) tipc-pipe with large message size: > > ======================= > > - With the patch: > > ======================= > > # time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000 > > real 1m 35.50s > > user 0m 0.63s > > sys 0m 5.02s > > # tipc l st sh l broadcast-link > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:440000 fragments:440000/10000 bundles:0/0 > > RX naks:72661 defs:0 dups:0 > > TX naks:0 acks:0 retrans:23378 > > Congestion link:890 Send queue max:0 avg:0 > > ======================= > > - Without the patch: > > ======================= > > # time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000 > > real 9m 49.14s > > user 0m 0.41s > > sys 0m 1.56s > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:440000 fragments:440000/10000 bundles:0/0 > > RX naks:0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:23651 > > Congestion link:2772 Send queue max:0 avg:0 > > 2) group_test (do you mean this instead of "multicast_blast"?): > Not really. I think t multicast_blast might better, but contrary to what I said there is no throughput measurement support. I think I had in mind an intermediate version that later developed into group_test, but that one was never pushed up to sourceforge. 
So group_test is probably more useful in this context. > ======================= > > - With the patch: > > ======================= > > # /cluster/group_test -b -m > > Commander: Received 0 UP Events for Member Id 100 > > *** TIPC Group Messaging Test Started **** > > Commander: Waiting for Scalers > > Commander: Received 1 UP Events for Member Id 101 > > Commander: Discovered 1 Scalers > > >> Starting Multicast Test > > Commander: Scaling out to 1 Workers with Id 0/1 > > Commander: Received 1 UP Events for Member Id 0 > > Commander: Scaling out to 16 Workers with Id 1/1 > > Commander: Received 16 UP Events for Member Id 1 > > Commander: Scaling out to 16 Workers with Id 2/2 > > Commander: Received 16 UP Events for Member Id 2 > > 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 665, BC 0, > throughput last intv 87 Mb/s > > 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 1269, BC 0, > throughput last intv 77 Mb/s > > 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 2042, BC 0, > throughput last intv 101 Mb/s > > 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 2797, BC 0, > throughput last intv 99 Mb/s > > Commander: Scaling in to 0 Workers with Cmd Member Id 1 > > Commander: Scaling in to 0 Workers with Cmd Member Id 2 > > Commander: Scaling in to 0 Workers with Cmd Member Id 0 > > Report #0 from 2222:1@0/0:3101578979@1001002: Sent 3555 [0,3554] (UC > 0, AC 0, MC 3555, BC 0) OK > > Report #1 from 2222:1@0/0:3423452773@1001004: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #2 from 2222:2@0/0:3341501021@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #3 from 2222:1@0/0:3775779560@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #4 from 2222:2@0/0:0283979098@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #5 from 2222:1@0/0:1288577198@1001004: Recv 3555 [0,3554] (UC > 0, AC 0, MC 3555, BC 0) OK > > Report #6 from 2222:1@0/0:3616132138@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 
0) OK > > Report #7 from 2222:1@0/0:3992078596@1001004: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #8 from 2222:2@0/0:1658002624@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #9 from 2222:1@0/0:1398137940@1001004: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #10 from 2222:1@0/0:2790669581@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #11 from 2222:1@0/0:2366726415@1001004: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #12 from 2222:2@0/0:1473723325@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #13 from 2222:2@0/0:1136757126@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #14 from 2222:1@0/0:2273798525@1001004: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #15 from 2222:2@0/0:3949256039@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #16 from 2222:2@0/0:1822300014@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #17 from 2222:1@0/0:3018695764@1001004: Recv 3555 [0,3554] (UC > 0, AC 0, MC 3555, BC 0) OK > > Report #18 from 2222:1@0/0:2744800964@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #19 from 2222:2@0/0:0749893497@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #20 from 2222:1@0/0:1208963797@1001004: Recv 3553 [2,3554] (UC > 0, AC 0, MC 3553, BC 0) OK > > Report #21 from 2222:2@0/0:1900862087@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #22 from 2222:2@0/0:3890385549@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #23 from 2222:1@0/0:0509529720@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #24 from 2222:2@0/0:0186529672@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #25 from 2222:1@0/0:1387317908@1001003: Recv 3553 [2,3554] (UC > 0, AC 0, MC 3553, BC 0) OK > > Report #26 from 2222:2@0/0:4078423711@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #27 from 
2222:2@0/0:1457003499@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #28 from 2222:2@0/0:3250519860@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #29 from 2222:2@0/0:3508775508@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #30 from 2222:2@0/0:1031479895@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #31 from 2222:1@0/0:3837724876@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > Report #32 from 2222:1@0/0:2423154786@1001003: Recv 3554 [1,3554] (UC > 0, AC 0, MC 3554, BC 0) OK > > >> Multicast Test SUCCESSFUL > > >> Starting Broadcast Test > > Commander: Scaling out to 1 Workers with Id 0/1 > > Commander: Received 1 UP Events for Member Id 0 > > Commander: Scaling out to 16 Workers with Id 1/1 > > Commander: Received 16 UP Events for Member Id 1 > > Commander: Scaling out to 16 Workers with Id 2/2 > > Commander: Received 16 UP Events for Member Id 2 > > 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 434, > throughput last intv 57 Mb/s > > 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 988, > throughput last intv 72 Mb/s > > 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 1549, > throughput last intv 73 Mb/s > > 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 2078, > throughput last intv 69 Mb/s > > 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 2621, > throughput last intv 70 Mb/s > > Commander: Scaling in to 0 Workers with Cmd Member Id 1 > > Commander: Scaling in to 0 Workers with Cmd Member Id 2 > > Commander: Scaling in to 0 Workers with Cmd Member Id 0 > > Report #0 from 2222:1@0/0:2774004831@1001002: Sent 2966 [0,2965] (UC > 0, AC 0, MC 0, BC 2966) OK > > Report #1 from 2222:1@0/0:1262350339@1001004: Recv 2966 [0,2965] (UC > 0, AC 0, MC 0, BC 2966) OK > > Report #2 from 2222:2@0/0:2235335787@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #3 from 2222:2@0/0:2409874140@1001004: Recv 2887 [79,2965] (UC > 0, AC 
0, MC 0, BC 2887) OK > > Report #4 from 2222:1@0/0:3059039648@1001003: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #5 from 2222:1@0/0:3488269200@1001004: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #6 from 2222:2@0/0:4186324421@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #7 from 2222:1@0/0:2760420127@1001003: Recv 2966 [0,2965] (UC > 0, AC 0, MC 0, BC 2966) OK > > Report #8 from 2222:2@0/0:2056504340@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #9 from 2222:2@0/0:0998162158@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #10 from 2222:2@0/0:3124321508@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #11 from 2222:2@0/0:1260121658@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #12 from 2222:2@0/0:2938973106@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #13 from 2222:2@0/0:2896700283@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #14 from 2222:1@0/0:2158652877@1001004: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #15 from 2222:1@0/0:1398540666@1001004: Recv 2966 [0,2965] (UC > 0, AC 0, MC 0, BC 2966) OK > > Report #16 from 2222:1@0/0:1864856953@1001003: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #17 from 2222:2@0/0:3490882607@1001004: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #18 from 2222:1@0/0:2903105322@1001004: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #19 from 2222:2@0/0:1583785723@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #20 from 2222:1@0/0:3106247717@1001004: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #21 from 2222:1@0/0:2917195823@1001004: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #22 from 2222:1@0/0:0509238836@1001004: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #23 
from 2222:2@0/0:2629682250@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #24 from 2222:2@0/0:1262288107@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #25 from 2222:1@0/0:3130881854@1001003: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #26 from 2222:1@0/0:0421078217@1001003: Recv 2966 [0,2965] (UC > 0, AC 0, MC 0, BC 2966) OK > > Report #27 from 2222:1@0/0:0547555733@1001003: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #28 from 2222:2@0/0:1268394531@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #29 from 2222:1@0/0:2548830551@1001003: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #30 from 2222:1@0/0:4267281725@1001003: Recv 2965 [1,2965] (UC > 0, AC 0, MC 0, BC 2965) OK > > Report #31 from 2222:2@0/0:0247684341@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > Report #32 from 2222:2@0/0:3078989866@1001003: Recv 2887 [79,2965] (UC > 0, AC 0, MC 0, BC 2887) OK > > >> Broadcast Test SUCCESSFUL > > *** TIPC Group Messaging Test Finished **** > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:287043 fragments:287040/5980 bundles:1/2 > > RX naks: 0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:15284 > > Congestion link:293 Send queue max:0 avg:0 > > ======================= > > - Without the patch: > > ======================= > > #/cluster/group_test -b -m > > Commander: Received 0 UP Events for Member Id 100 > > *** TIPC Group Messaging Test Started **** > > Commander: Waiting for Scalers > > Commander: Received 1 UP Events for Member Id 101 > > Commander: Discovered 1 Scalers > > >> Starting Multicast Test > > Commander: Scaling out to 1 Workers with Id 0/1 > > Commander: Received 1 UP Events for Member Id 0 > > Commander: Scaling out to 16 Workers with Id 1/1 > > Commander: Received 16 UP Events for Member Id 1 > > Commander: Scaling out to 16 Workers 
with Id 2/2 > > Commander: Received 16 UP Events for Member Id 2 > > *** no report, 0 Mb/s ???*** > Hmm. Even if it is zero, it should be reported. Maybe a bug in the test program? > > Commander: Scaling in to 0 Workers with Cmd Member Id 1 > > Commander: Scaling in to 0 Workers with Cmd Member Id 2 > > Commander: Scaling in to 0 Workers with Cmd Member Id 0 > > Report #0 from 2222:1@0/0:3270095995@1001002: Sent 34 [0,33] (UC 0, > AC 0, MC 34, BC 0) OK > > Report #1 from 2222:2@0/0:1798962026@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #2 from 2222:2@0/0:1842303271@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #3 from 2222:1@0/0:4030007025@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #4 from 2222:1@0/0:1332810537@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #5 from 2222:1@0/0:4040901007@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #6 from 2222:2@0/0:0672740666@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #7 from 2222:2@0/0:2595641411@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #8 from 2222:2@0/0:2556065900@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #9 from 2222:1@0/0:2327925355@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #10 from 2222:2@0/0:1332584860@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #11 from 2222:1@0/0:3726344362@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #12 from 2222:2@0/0:3889312161@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #13 from 2222:1@0/0:1807365809@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #14 from 2222:1@0/0:2525672860@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #15 from 2222:2@0/0:1931253671@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #16 from 2222:2@0/0:1610105188@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #17 from 
2222:1@0/0:0767932663@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #18 from 2222:1@0/0:3290773375@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #19 from 2222:1@0/0:2576347174@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #20 from 2222:1@0/0:2028851345@1001003: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #21 from 2222:2@0/0:0123385799@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #22 from 2222:2@0/0:1395669417@1001003: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #23 from 2222:1@0/0:1098882628@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #24 from 2222:1@0/0:3398361863@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #25 from 2222:1@0/0:1085701361@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #26 from 2222:2@0/0:1790727708@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #27 from 2222:2@0/0:3199391066@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #28 from 2222:2@0/0:1232653389@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #29 from 2222:2@0/0:2255150189@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #30 from 2222:1@0/0:2526669233@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > Report #31 from 2222:2@0/0:2479267806@1001004: Recv 0 [0,0] (UC 0, AC > 0, MC 0, BC 0) OK > > Report #32 from 2222:1@0/0:1097666084@1001004: Recv 22 [0,21] (UC 0, > AC 0, MC 22, BC 0) OK > > >> Multicast Test SUCCESSFUL > > >> Starting Broadcast Test > > Commander: Scaling out to 1 Workers with Id 0/1 > > Commander: Received 1 UP Events for Member Id 0 > > Commander: Scaling out to 16 Workers with Id 1/1 > > Commander: Received 16 UP Events for Member Id 1 > > Commander: Scaling out to 16 Workers with Id 2/2 > > Commander: Received 16 UP Events for Member Id 2 > > 2222:1@0/0:3890883707@1001002: Sent UC 0, AC 0, MC 0, BC 64, > throughput last intv 7 Mb/s > > 
Commander: Scaling in to 0 Workers with Cmd Member Id 1 > > Commander: Scaling in to 0 Workers with Cmd Member Id 2 > > Commander: Scaling in to 0 Workers with Cmd Member Id 0 > > Report #0 from 2222:1@0/0:3890883707@1001002: Sent 91 [0,90] (UC 0, > AC 0, MC 0, BC 91) OK > > Report #1 from 2222:2@0/0:1098659974@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #2 from 2222:1@0/0:3441709654@1001003: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #3 from 2222:2@0/0:0018441197@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #4 from 2222:2@0/0:0584054290@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #5 from 2222:2@0/0:4244201461@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #6 from 2222:2@0/0:1307600351@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #7 from 2222:2@0/0:3941241491@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #8 from 2222:2@0/0:2927828986@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #9 from 2222:1@0/0:1743241608@1001003: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #10 from 2222:2@0/0:1115515409@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #11 from 2222:1@0/0:0815130796@1001004: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #12 from 2222:1@0/0:2618044568@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #13 from 2222:2@0/0:1424259027@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #14 from 2222:1@0/0:1011077421@1001003: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #15 from 2222:1@0/0:3249391177@1001003: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #16 from 2222:1@0/0:2774666633@1001003: Recv 81 [0,80] (UC 0, > AC 0, MC 0, BC 81) OK > > Report #17 from 2222:1@0/0:0860766920@1001004: Recv 81 [0,80] (UC 0, > AC 0, MC 0, BC 81) OK > > Report #18 from 2222:1@0/0:0196231326@1001003: Recv 81 [0,80] (UC 0, > AC 0, MC 0, 
BC 81) OK > > Report #19 from 2222:1@0/0:4278611377@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #20 from 2222:1@0/0:3464416884@1001003: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #21 from 2222:2@0/0:1718387937@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #22 from 2222:2@0/0:0267090087@1001003: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #23 from 2222:2@0/0:1694243136@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #24 from 2222:2@0/0:0918300899@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #25 from 2222:2@0/0:0811475995@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #26 from 2222:1@0/0:0388357605@1001004: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #27 from 2222:1@0/0:1113395305@1001004: Recv 81 [0,80] (UC 0, > AC 0, MC 0, BC 81) OK > > Report #28 from 2222:1@0/0:3413026333@1001004: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #29 from 2222:2@0/0:2907075331@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #30 from 2222:1@0/0:1393297086@1001004: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > Report #31 from 2222:2@0/0:3493179185@1001004: Recv 79 [2,80] (UC 0, > AC 0, MC 0, BC 79) OK > > Report #32 from 2222:1@0/0:3166541927@1001004: Recv 80 [1,80] (UC 0, > AC 0, MC 0, BC 80) OK > > >> Broadcast Test SUCCESSFUL > > *** TIPC Group Messaging Test Finished **** > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:5908 fragments:5904/123 bundles:0/0 > > RX naks:0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:324 > > Congestion link:32 Send queue max:0 avg:0 > > BR/Tuong > I am totally convinced. Just give me time to review the patch properly and you'll have my ack. 
///jon > -----Original Message----- > From: Jon Maloy <jm...@re...> > Sent: Tuesday, March 17, 2020 2:49 AM > To: Tuong Lien Tong <tuo...@de...>; > tip...@li...; ma...@do...; > yin...@wi...; lx...@re... > Subject: Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK > blocks support for broadcast link > > On 3/16/20 2:18 PM, Jon Maloy wrote: > > > > > > > > > On 3/16/20 7:23 AM, Tuong Lien Tong wrote: > > >> > > [...] > > >> > > > The improvement shown here is truly impressive. However, you are only > > > showing tipc-pipe with small messages. How does this look when you > > > send full-size 66k messages? How does it scale when the number of > > > destinations grows up to tens or even hundreds? I am particularly > > > concerned that the use of unicast retransmission may become a > > > sub-optimization if the number of destinations is large. > > > > > > ///jon > > You should try the "multicast_blast" program under tipc-utils/test. That > > will give you > > numbers both on throughput and loss rates as you let the number of nodes > > grow. > > ///jon > > > > > >> BR/Tuong > > >> > > >> *From:* Jon Maloy <jm...@re... <mailto:jm...@re...>> > > >> *Sent:* Friday, March 13, 2020 10:47 PM > > >> *To:* Tuong Lien <tuo...@de... > <mailto:tuo...@de...>>; > > >> tip...@li... > <mailto:tip...@li...>; ma...@do... > <mailto:ma...@do...>; > > >> yin...@wi... <mailto:yin...@wi...> > > >> *Subject:* Re: [PATCH RFC 1/2] tipc: add Gap ACK blocks support for > > >> broadcast link > > >> > > >> On 3/13/20 6:47 AM, Tuong Lien wrote: > > >> > > >> As achieved through commit 9195948fbf34 ("tipc: improve TIPC > > >> throughput > > >> > > >> by Gap ACK blocks"), we apply the same mechanism for the > > >> broadcast link > > >> > > >> as well. 
The 'Gap ACK blocks' data field in a > > >> 'PROTOCOL/STATE_MSG' will > > >> > > >> consist of two parts built for both the broadcast and unicast > types: > > >> > > >> 31 16 15 0 > > >> > > >> +-------------+-------------+-------------+-------------+ > > >> > > >> | bgack_cnt | ugack_cnt | len | > > >> > > >> +-------------+-------------+-------------+-------------+ - > > >> > > >> | gap | ack | | > > >> > > >> +-------------+-------------+-------------+-------------+ > bc gacks > > >> > > >> : : : | > > >> > > >> +-------------+-------------+-------------+-------------+ - > > >> > > >> | gap | ack | | > > >> > > >> +-------------+-------------+-------------+-------------+ > uc gacks > > >> > > >> : : : | > > >> > > >> +-------------+-------------+-------------+-------------+ - > > >> > > >> which is "automatically" backward-compatible. > > >> > > >> We also increase the max number of Gap ACK blocks to 128, > > >> allowing upto > > >> > > >> 64 blocks per type (total buffer size = 516 bytes). > > >> > > >> Besides, the 'tipc_link_advance_transmq()' function is refactored > > >> which > > >> > > >> is applicable for both the unicast and broadcast cases now, so > > >> some old > > >> > > >> functions can be removed and the code is optimized. > > >> > > >> With the patch, TIPC broadcast is more robust regardless of > > >> packet loss > > >> > > >> or disorder, latency, ... in the underlying network. Its > > >> performance is > > >> > > >> boost up significantly. > > >> > > >> For example, experiment with a 5% packet loss rate results: > > >> > > >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > > >> > > >> real 0m 42.46s > > >> > > >> user 0m 1.16s > > >> > > >> sys 0m 17.67s > > >> > > >> Without the patch: > > >> > > >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > > >> > > >> real 5m 28.80s > > >> > > >> user 0m 0.85s > > >> > > >> sys 0m 3.62s > > >> > > >> Can you explain this? 
To me it seems like the elapsed time is reduced > > >> with a factor 328.8/42.46=7.7, while we are consuming significantly > > >> more CPU to achieve this. Doesn't that mean that we have much more > > >> retransmissions which are consuming CPU? Or is there some other > > >> explanation? > > >> > > >> ///jon > > >> > > >> > > >> Signed-off-by: Tuong Lien<tuo...@de... > <mailto:tuo...@de...>> > > >> <mailto:tuo...@de...> > > >> > > >> --- > > >> > > >> net/tipc/bcast.c | 9 +- > > >> > > >> net/tipc/link.c | 440 > > >> +++++++++++++++++++++++++++++++++---------------------- > > >> > > >> net/tipc/link.h | 7 +- > > >> > > >> net/tipc/msg.h | 14 +- > > >> > > >> net/tipc/node.c | 10 +- > > >> > > >> 5 files changed, 295 insertions(+), 185 deletions(-) > > >> > > >> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > > >> > > >> index 4c20be08b9c4..3ce690a96ee9 100644 > > >> > > >> --- a/net/tipc/bcast.c > > >> > > >> +++ b/net/tipc/bcast.c > > >> > > >> @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, > > >> struct tipc_link *l, > > >> > > >> __skb_queue_head_init(&xmitq); > > >> > > >> > > >> tipc_bcast_lock(net); > > >> > > >> - tipc_link_bc_ack_rcv(l, acked, &xmitq); > > >> > > >> + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); > > >> > > >> tipc_bcast_unlock(net); > > >> > > >> > > >> tipc_bcbase_xmit(net, &xmitq); > > >> > > >> @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, > > >> struct tipc_link *l, > > >> > > >> struct tipc_msg *hdr) > > >> > > >> { > > >> > > >> struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; > > >> > > >> + struct tipc_gap_ack_blks *ga; > > >> > > >> struct sk_buff_head xmitq; > > >> > > >> int rc = 0; > > >> > > >> > > >> @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, > > >> struct tipc_link *l, > > >> > > >> if (msg_type(hdr) != STATE_MSG) { > > >> > > >> tipc_link_bc_init_rcv(l, hdr); > > >> > > >> } else if (!msg_bc_ack_invalid(hdr)) { > > >> > > >> - tipc_link_bc_ack_rcv(l, 
msg_bcast_ack(hdr), &xmitq); > > >> > > >> - rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq); > > >> > > >> + tipc_get_gap_ack_blks(&ga, l, hdr, false); > > >> > > >> + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), > > >> > > >> + msg_bc_gap(hdr), ga, &xmitq); > > >> > > >> + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); > > >> > > >> } > > >> > > >> tipc_bcast_unlock(net); > > >> > > >> > > >> diff --git a/net/tipc/link.c b/net/tipc/link.c > > >> > > >> index 467c53a1fb5c..6198b6d89a69 100644 > > >> > > >> --- a/net/tipc/link.c > > >> > > >> +++ b/net/tipc/link.c > > >> > > >> @@ -188,6 +188,8 @@ struct tipc_link { > > >> > > >> /* Broadcast */ > > >> > > >> u16 ackers; > > >> > > >> u16 acked; > > >> > > >> + u16 last_gap; > > >> > > >> + struct tipc_gap_ack_blks *last_ga; > > >> > > >> struct tipc_link *bc_rcvlink; > > >> > > >> struct tipc_link *bc_sndlink; > > >> > > >> u8 nack_state; > > >> > > >> @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct > > >> tipc_link *l, > > >> > > >> struct sk_buff_head *xmitq); > > >> > > >> static void tipc_link_build_bc_init_msg(struct tipc_link *l, > > >> > > >> struct sk_buff_head *xmitq); > > >> > > >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); > > >> > > >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void > > >> *data, u16 gap); > > >> > > >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 > > >> acked, u16 gap, > > >> > > >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, > > >> > > >> + struct tipc_link *l, u8 start_index); > > >> > > >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct > > >> tipc_msg *hdr); > > >> > > >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct > > >> tipc_link *r, > > >> > > >> + u16 acked, u16 gap, > > >> > > >> struct tipc_gap_ack_blks *ga, > > >> > > >> - struct sk_buff_head *xmitq); > > >> > > >> + struct sk_buff_head *xmitq, > > >> > > >> + bool *retransmitted, int *rc); > > >> 
> > >> static void tipc_link_update_cwin(struct tipc_link *l, int > > >> released, > > >> > > >> bool retransmitted); > > >> > > >> /* > > >> > > >> @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct > > >> tipc_link *snd_l, > > >> > > >> snd_l->ackers--; > > >> > > >> rcv_l->bc_peer_is_up = true; > > >> > > >> rcv_l->state = LINK_ESTABLISHED; > > >> > > >> - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); > > >> > > >> + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); > > >> > > >> trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); > > >> > > >> tipc_link_reset(rcv_l); > > >> > > >> rcv_l->state = LINK_RESET; > > >> > > >> @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) > > >> > > >> return (l->silent_intv_cnt + 2 > l->abort_limit); > > >> > > >> } > > >> > > >> > > >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct > > >> tipc_link *r, > > >> > > >> - u16 from, u16 to, struct sk_buff_head > > >> *xmitq); > > >> > > >> /* tipc_link_timeout - perform periodic task as instructed from > > >> node timeout > > >> > > >> */ > > >> > > >> int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head > > >> *xmitq) > > >> > > >> @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) > > >> > > >> l->snd_nxt_state = 1; > > >> > > >> l->rcv_nxt_state = 1; > > >> > > >> l->acked = 0; > > >> > > >> + l->last_gap = 0; > > >> > > >> + kfree(l->last_ga); > > >> > > >> + l->last_ga = NULL; > > >> > > >> l->silent_intv_cnt = 0; > > >> > > >> l->rst_cnt = 0; > > >> > > >> l->bc_peer_is_up = false; > > >> > > >> @@ -1183,68 +1189,14 @@ static bool > > >> link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, > > >> > > >> > > >> if (link_is_bc_sndlink(l)) { > > >> > > >> r->state = LINK_RESET; > > >> > > >> - *rc = TIPC_LINK_DOWN_EVT; > > >> > > >> + *rc |= TIPC_LINK_DOWN_EVT; > > >> > > >> } else { > > >> > > >> - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); > > >> > > >> + *rc |= tipc_link_fsm_evt(l, 
LINK_FAILURE_EVT); > > >> > > >> } > > >> > > >> > > >> return true; > > >> > > >> } > > >> > > >> > > >> -/* tipc_link_bc_retrans() - retransmit zero or more packets > > >> > > >> - * @l: the link to transmit on > > >> > > >> - * @r: the receiving link ordering the retransmit. Same as l if > > >> unicast > > >> > > >> - * @from: retransmit from (inclusive) this sequence number > > >> > > >> - * @to: retransmit to (inclusive) this sequence number > > >> > > >> - * xmitq: queue for accumulating the retransmitted packets > > >> > > >> - */ > > >> > > >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct > > >> tipc_link *r, > > >> > > >> - u16 from, u16 to, struct sk_buff_head > > >> *xmitq) > > >> > > >> -{ > > >> > > >> - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); > > >> > > >> - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > > >> > > >> - u16 ack = l->rcv_nxt - 1; > > >> > > >> - int retransmitted = 0; > > >> > > >> - struct tipc_msg *hdr; > > >> > > >> - int rc = 0; > > >> > > >> - > > >> > > >> - if (!skb) > > >> > > >> - return 0; > > >> > > >> - if (less(to, from)) > > >> > > >> - return 0; > > >> > > >> - > > >> > > >> - trace_tipc_link_retrans(r, from, to, &l->transmq); > > >> > > >> - > > >> > > >> - if (link_retransmit_failure(l, r, &rc)) > > >> > > >> - return rc; > > >> > > >> - > > >> > > >> - skb_queue_walk(&l->transmq, skb) { > > >> > > >> - hdr = buf_msg(skb); > > >> > > >> - if (less(msg_seqno(hdr), from)) > > >> > > >> - continue; > > >> > > >> - if (more(msg_seqno(hdr), to)) > > >> > > >> - break; > > >> > > >> - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) > > >> > > >> - continue; > > >> > > >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; > > >> > > >> - _skb = pskb_copy(skb, GFP_ATOMIC); > > >> > > >> - if (!_skb) > > >> > > >> - return 0; > > >> > > >> - hdr = buf_msg(_skb); > > >> > > >> - msg_set_ack(hdr, ack); > > >> > > >> - msg_set_bcast_ack(hdr, bc_ack); > > >> > > >> - _skb->priority = TC_PRIO_CONTROL; > > >> 
> > >> - __skb_queue_tail(xmitq, _skb); > > >> > > >> - l->stats.retransmitted++; > > >> > > >> - retransmitted++; > > >> > > >> - /* Increase actual retrans counter & mark first time */ > > >> > > >> - if (!TIPC_SKB_CB(skb)->retr_cnt++) > > >> > > >> - TIPC_SKB_CB(skb)->retr_stamp = jiffies; > > >> > > >> - } > > >> > > >> - tipc_link_update_cwin(l, 0, retransmitted); > > >> > > >> - return 0; > > >> > > >> -} > > >> > > >> - > > >> > > >> /* tipc_data_input - deliver data and name distr msgs to upper > > >> layer > > >> > > >> * > > >> > > >> * Consumes buffer if message is of right type > > >> > > >> @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct > > >> tipc_link *l, struct sk_buff *skb, > > >> > > >> return rc; > > >> > > >> } > > >> > > >> > > >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) > > >> > > >> -{ > > >> > > >> - int released = 0; > > >> > > >> - struct sk_buff *skb, *tmp; > > >> > > >> - > > >> > > >> - skb_queue_walk_safe(&l->transmq, skb, tmp) { > > >> > > >> - if (more(buf_seqno(skb), acked)) > > >> > > >> - break; > > >> > > >> - __skb_unlink(skb, &l->transmq); > > >> > > >> - kfree_skb(skb); > > >> > > >> - released++; > > >> > > >> +/** > > >> > > >> + * tipc_get_gap_ack_blks - get Gap ACK blocks from > > >> PROTOCOL/STATE_MSG > > >> > > >> + * @ga: returned pointer to the Gap ACK blocks if any > > >> > > >> + * @l: the tipc link > > >> > > >> + * @hdr: the PROTOCOL/STATE_MSG header > > >> > > >> + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or > > >> broadcast (= 0) > > >> > > >> + * > > >> > > >> + * Return: the total Gap ACK blocks size > > >> > > >> + */ > > >> > > >> +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct > > >> tipc_link *l, > > >> > > >> + struct tipc_msg *hdr, bool uc) > > >> > > >> +{ > > >> > > >> + struct tipc_gap_ack_blks *p; > > >> > > >> + u16 sz = 0; > > >> > > >> + > > >> > > >> + /* Does peer support the Gap ACK blocks feature? 
*/ > > >> > > >> + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { > > >> > > >> + p = (struct tipc_gap_ack_blks *)msg_data(hdr); > > >> > > >> + sz = ntohs(p->len); > > >> > > >> + /* Sanity check */ > > >> > > >> + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + > > >> p->bgack_cnt)) { > > >> > > >> + /* Good, check if the desired type exists */ > > >> > > >> + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) > > >> > > >> + goto ok; > > >> > > >> + /* Backward compatible: peer might not support bc, but > > >> uc? */ > > >> > > >> + } else if (uc && sz == > > >> tipc_gap_ack_blks_sz(p->ugack_cnt)) { > > >> > > >> + if (p->ugack_cnt) { > > >> > > >> + p->bgack_cnt = 0; > > >> > > >> + goto ok; > > >> > > >> + } > > >> > > >> + } > > >> > > >> } > > >> > > >> - return released; > > >> > > >> + /* Other cases: ignore! */ > > >> > > >> + p = NULL; > > >> > > >> + > > >> > > >> +ok: > > >> > > >> + *ga = p; > > >> > > >> + return sz; > > >> > > >> } > > >> > > >> > > >> -/* tipc_build_gap_ack_blks - build Gap ACK blocks > > >> > > >> - * @l: tipc link that data have come with gaps in sequence if any > > >> > > >> - * @data: data buffer to store the Gap ACK blocks after built > > >> > > >> - * > > >> > > >> - * returns the actual allocated memory size > > >> > > >> - */ > > >> > > >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void > > >> *data, u16 gap) > > >> > > >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, > > >> > > >> + struct tipc_link *l, u8 start_index) > > >> > > >> { > > >> > > >> + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; > > >> > > >> struct sk_buff *skb = skb_peek(&l->deferdq); > > >> > > >> - struct tipc_gap_ack_blks *ga = data; > > >> > > >> - u16 len, expect, seqno = 0; > > >> > > >> + u16 expect, seqno = 0; > > >> > > >> u8 n = 0; > > >> > > >> > > >> - if (!skb || !gap) > > >> > > >> - goto exit; > > >> > > >> + if (!skb) > > >> > > >> + return 0; > > >> > > >> > > >> expect = buf_seqno(skb); > > >> > > >> 
skb_queue_walk(&l->deferdq, skb) { > > >> > > >> seqno = buf_seqno(skb); > > >> > > >> if (unlikely(more(seqno, expect))) { > > >> > > >> - ga->gacks[n].ack = htons(expect - 1); > > >> > > >> - ga->gacks[n].gap = htons(seqno - expect); > > >> > > >> - if (++n >= MAX_GAP_ACK_BLKS) { > > >> > > >> - pr_info_ratelimited("Too few Gap ACK > > >> blocks!\n"); > > >> > > >> - goto exit; > > >> > > >> + gacks[n].ack = htons(expect - 1); > > >> > > >> + gacks[n].gap = htons(seqno - expect); > > >> > > >> + if (++n >= MAX_GAP_ACK_BLKS / 2) { > > >> > > >> + char buf[TIPC_MAX_LINK_NAME]; > > >> > > >> + > > >> > > >> + pr_info_ratelimited("Gacks on %s: %d, > > >> ql: %d!\n", > > >> > > >> + tipc_link_name_ext(l, buf), > > >> > > >> + n, > > >> > > >> + skb_queue_len(&l->deferdq)); > > >> > > >> + return n; > > >> > > >> } > > >> > > >> } else if (unlikely(less(seqno, expect))) { > > >> > > >> pr_warn("Unexpected skb in deferdq!\n"); > > >> > > >> @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct > > >> tipc_link *l, void *data, u16 gap) > > >> > > >> } > > >> > > >> > > >> /* last block */ > > >> > > >> - ga->gacks[n].ack = htons(seqno); > > >> > > >> - ga->gacks[n].gap = 0; > > >> > > >> + gacks[n].ack = htons(seqno); > > >> > > >> + gacks[n].gap = 0; > > >> > > >> n++; > > >> > > >> + return n; > > >> > > >> +} > > >> > > >> > > >> -exit: > > >> > > >> - len = tipc_gap_ack_blks_sz(n); > > >> > > >> +/* tipc_build_gap_ack_blks - build Gap ACK blocks > > >> > > >> + * @l: tipc unicast link > > >> > > >> + * @hdr: the tipc message buffer to store the Gap ACK blocks > > >> after built > > >> > > >> + * > > >> > > >> + * The function builds Gap ACK blocks for both the unicast & > > >> broadcast receiver > > >> > > >> + * links of a certain peer, the buffer after built has the > > >> network data format > > >> > > >> + * as follows: > > >> > > >> + * 31 16 15 0 > > >> > > >> + * +-------------+-------------+-------------+-------------+ > > >> > > >> + * | 
bgack_cnt | ugack_cnt | len | > > >> > > >> + * +-------------+-------------+-------------+-------------+ - > > >> > > >> + * | gap | ack | | > > >> > > >> + * +-------------+-------------+-------------+-------------+ > > > >> bc gacks > > >> > > >> + * : : : | > > >> > > >> + * +-------------+-------------+-------------+-------------+ - > > >> > > >> + * | gap | ack | | > > >> > > >> + * +-------------+-------------+-------------+-------------+ > > > >> uc gacks > > >> > > >> + * : : : | > > >> > > >> + * +-------------+-------------+-------------+-------------+ - > > >> > > >> + * (See struct tipc_gap_ack_blks) > > >> > > >> + * > > >> > > >> + * returns the actual allocated memory size > > >> > > >> + */ > > >> > > >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct > > >> tipc_msg *hdr) > > >> > > >> +{ > > >> > > >> + struct tipc_link *bcl = l->bc_rcvlink; > > >> > > >> + struct tipc_gap_ack_blks *ga; > > >> > > >> + u16 len; > > >> > > >> + > > >> > > >> + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); > > >> > > >> + > > >> > > >> + /* Start with broadcast link first */ > > >> > > >> + tipc_bcast_lock(bcl->net); > > >> > > >> + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); > > >> > > >> + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); > > >> > > >> + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); > > >> > > >> + tipc_bcast_unlock(bcl->net); > > >> > > >> + > > >> > > >> + /* Now for unicast link, but an explicit NACK only (???) */ > > >> > > >> + ga->ugack_cnt = (msg_seq_gap(hdr)) ? 
> > >> > > >> + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) > > >> : 0; > > >> > > >> + > > >> > > >> + /* Total len */ > > >> > > >> + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); > > >> > > >> ga->len = htons(len); > > >> > > >> - ga->gack_cnt = n; > > >> > > >> return len; > > >> > > >> } > > >> > > >> > > >> @@ -1466,47 +1486,111 @@ static u16 > > >> tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > > >> > > >> * acked packets, also doing > > >> retransmissions if > > >> > > >> * gaps found > > >> > > >> * @l: tipc link with transmq queue to be advanced > > >> > > >> + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if > > >> unicast) > > >> > > >> * @acked: seqno of last packet acked by peer without any gaps > > >> before > > >> > > >> * @gap: # of gap packets > > >> > > >> * @ga: buffer pointer to Gap ACK blocks from peer > > >> > > >> * @xmitq: queue for accumulating the retransmitted packets > if any > > >> > > >> + * @retransmitted: returned boolean value if a retransmission is > > >> really issued > > >> > > >> + * @rc: returned code e.g. TIPC_LINK_DOWN_EVT if a repeated > > >> retransmit failures > > >> > > >> + * happens (- unlikely case) > > >> > > >> * > > >> > > >> - * In case of a repeated retransmit failures, the call will > > >> return shortly > > >> > > >> - * with a returned code (e.g. 
TIPC_LINK_DOWN_EVT) > > >> > > >> + * Return: the number of packets released from the link transmq > > >> > > >> */ > > >> > > >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 > > >> acked, u16 gap, > > >> > > >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct > > >> tipc_link *r, > > >> > > >> + u16 acked, u16 gap, > > >> > > >> struct tipc_gap_ack_blks *ga, > > >> > > >> - struct sk_buff_head *xmitq) > > >> > > >> + struct sk_buff_head *xmitq, > > >> > > >> + bool *retransmitted, int *rc) > > >> > > >> { > > >> > > >> + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; > > >> > > >> + struct tipc_gap_ack *gacks = NULL; > > >> > > >> struct sk_buff *skb, *_skb, *tmp; > > >> > > >> struct tipc_msg *hdr; > > >> > > >> + u32 qlen = skb_queue_len(&l->transmq); > > >> > > >> + u16 nacked = acked, ngap = gap, gack_cnt = 0; > > >> > > >> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > > >> > > >> - bool retransmitted = false; > > >> > > >> u16 ack = l->rcv_nxt - 1; > > >> > > >> - bool passed = false; > > >> > > >> - u16 released = 0; > > >> > > >> u16 seqno, n = 0; > > >> > > >> - int rc = 0; > > >> > > >> + u16 end = r->acked, start = end, offset = r->last_gap; > > >> > > >> + u16 si = (last_ga) ? 
last_ga->start_index : 0; > > >> > > >> + bool is_uc = !link_is_bc_sndlink(l); > > >> > > >> + bool bc_has_acked = false; > > >> > > >> + > > >> > > >> + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); > > >> > > >> + > > >> > > >> + /* Determine Gap ACK blocks if any for the particular link */ > > >> > > >> + if (ga && is_uc) { > > >> > > >> + /* Get the Gap ACKs, uc part */ > > >> > > >> + gack_cnt = ga->ugack_cnt; > > >> > > >> + gacks = &ga->gacks[ga->bgack_cnt]; > > >> > > >> + } else if (ga) { > > >> > > >> + /* Copy the Gap ACKs, bc part, for later renewal if > > >> needed */ > > >> > > >> + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), > > >> > > >> + GFP_ATOMIC); > > >> > > >> + if (likely(this_ga)) { > > >> > > >> + this_ga->start_index = 0; > > >> > > >> + /* Start with the bc Gap ACKs */ > > >> > > >> + gack_cnt = this_ga->bgack_cnt; > > >> > > >> + gacks = &this_ga->gacks[0]; > > >> > > >> + } else { > > >> > > >> + /* Hmm, we can get in trouble..., simply ignore > > >> it */ > > >> > > >> + pr_warn_ratelimited("Ignoring bc Gap ACKs, no > > >> memory\n"); > > >> > > >> + } > > >> > > >> + } > > >> > > >> > > >> + /* Advance the link transmq */ > > >> > > >> skb_queue_walk_safe(&l->transmq, skb, tmp) { > > >> > > >> seqno = buf_seqno(skb); > > >> > > >> > > >> next_gap_ack: > > >> > > >> - if (less_eq(seqno, acked)) { > > >> > > >> + if (less_eq(seqno, nacked)) { > > >> > > >> + if (is_uc) > > >> > > >> + goto release; > > >> > > >> + /* Skip packets peer has already acked */ > > >> > > >> + if (!more(seqno, r->acked)) > > >> > > >> + continue; > > >> > > >> + /* Get the next of last Gap ACK blocks */ > > >> > > >> + while (more(seqno, end)) { > > >> > > >> + if (!last_ga || si >= last_ga->bgack_cnt) > > >> > > >> + break; > > >> > > >> + start = end + offset + 1; > > >> > > >> + end = ntohs(last_ga->gacks[si].ack); > > >> > > >> + offset = ntohs(last_ga->gacks[si].gap); > > >> > > >> + si++; > > >> > > >> + 
WARN_ONCE(more(start, end) || > > >> > > >> + (!offset && > > >> > > >> + si < last_ga->bgack_cnt) || > > >> > > >> + si > MAX_GAP_ACK_BLKS, > > >> > > >> + "Corrupted Gap ACK: %d %d %d %d > > >> %d\n", > > >> > > >> + start, end, offset, si, > > >> > > >> + last_ga->bgack_cnt); > > >> > > >> + } > > >> > > >> + /* Check against the last Gap ACK block */ > > >> > > >> + if (in_range(seqno, start, end)) > > >> > > >> + continue; > > >> > > >> + /* Update/release the packet peer is acking */ > > >> > > >> + bc_has_acked = true; > > >> > > >> + if (--TIPC_SKB_CB(skb)->ackers) > > >> > > >> + continue; > > >> > > >> +release: > > >> > > >> /* release skb */ > > >> > > >> __skb_unlink(skb, &l->transmq); > > >> > > >> kfree_skb(skb); > > >> > > >> - released++; > > >> > > >> - } else if (less_eq(seqno, acked + gap)) { > > >> > > >> - /* First, check if repeated retrans failures > > >> occurs? */ > > >> > > >> - if (!passed && link_retransmit_failure(l, l, &rc)) > > >> > > >> - return rc; > > >> > > >> - passed = true; > > >> > > >> - > > >> > > >> + } else if (less_eq(seqno, nacked + ngap)) { > > >> > > >> + /* First gap: check if repeated retrans > > >> failures? */ > > >> > > >> + if (unlikely(seqno == acked + 1 && > > >> > > >> + link_retransmit_failure(l, r, rc))) { > > >> > > >> + /* Ignore this bc Gap ACKs if any */ > > >> > > >> + kfree(this_ga); > > >> > > >> + this_ga = NULL; > > >> > > >> + break; > > >> > > >> + } > > >> > > >> /* retransmit skb if unrestricted*/ > > >> > > >> if (time_before(jiffies, > > >> TIPC_SKB_CB(skb)->nxt_retr)) > > >> > > >> continue; > > >> > > >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; > > >> > > >> + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ?
> > >> > > >> + TIPC_UC_RETR_TIME : > > >> TIPC_BC_RETR_LIM; > > >> > > >> _skb = pskb_copy(skb, GFP_ATOMIC); > > >> > > >> if (!_skb) > > >> > > >> continue; > > >> > > >> @@ -1516,25 +1600,50 @@ static int > > >> tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > > >> > > >> _skb->priority = TC_PRIO_CONTROL; > > >> > > >> __skb_queue_tail(xmitq, _skb); > > >> > > >> l->stats.retransmitted++; > > >> > > >> - retransmitted = true; > > >> > > >> + *retransmitted = true; > > >> > > >> /* Increase actual retrans counter & mark first > > >> time */ > > >> > > >> if (!TIPC_SKB_CB(skb)->retr_cnt++) > > >> > > >> TIPC_SKB_CB(skb)->retr_stamp = jiffies; > > >> > > >> } else { > > >> > > >> /* retry with Gap ACK blocks if any */ > > >> > > >> - if (!ga || n >= ga->gack_cnt) > > >> > > >> + if (n >= gack_cnt) > > >> > > >> break; > > >> > > >> - acked = ntohs(ga->gacks[n].ack); > > >> > > >> - gap = ntohs(ga->gacks[n].gap); > > >> > > >> + nacked = ntohs(gacks[n].ack); > > >> > > >> + ngap = ntohs(gacks[n].gap); > > >> > > >> n++; > > >> > > >> goto next_gap_ack; > > >> > > >> } > > >> > > >> } > > >> > > >> - if (released || retransmitted) > > >> > > >> - tipc_link_update_cwin(l, released, retransmitted); > > >> > > >> - if (released) > > >> > > >> - tipc_link_advance_backlog(l, xmitq); > > >> > > >> - return 0; > > >> > > >> + > > >> > > >> + /* Renew last Gap ACK blocks for bc if needed */ > > >> > > >> + if (bc_has_acked) { > > >> > > >> + if (this_ga) { > > >> > > >> + kfree(last_ga); > > >> > > >> + r->last_ga = this_ga; > > >> > > >> + r->last_gap = gap; > > >> > > >> + } else if (last_ga) { > > >> > > >> + if (less(acked, start)) { > > >> > > >> + si--; > > >> > > >> + offset = start - acked - 1; > > >> > > >> + } else if (less(acked, end)) { > > >> > > >> + acked = end; > > >> > > >> + } > > >> > > >> + if (si < last_ga->bgack_cnt) { > > >> > > >> + last_ga->start_index = si; > > >> > > >> + r->last_gap = offset; > > >> > > >> + } else { > > >> > > 
>> + kfree(last_ga); > > >> > > >> + r->last_ga = NULL; > > >> > > >> + r->last_gap = 0; > > >> > > >> + } > > >> > > >> + } else { > > >> > > >> + r->last_gap = 0; > > >> > > >> + } > > >> > > >> + r->acked = acked; > > >> > > >> + } else { > > >> > > >> + kfree(this_ga); > > >> > > >> + } > > >> > > >> + return skb_queue_len(&l->transmq) - qlen; > > >> > > >> } > > >> > > >> > > >> /* tipc_link_build_state_msg: prepare link state message for > > >> transmission > > >> > > >> @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, > > >> struct sk_buff *skb, > > >> > > >> kfree_skb(skb); > > >> > > >> break; > > >> > > >> } > > >> > > >> - released += tipc_link_release_pkts(l, msg_ack(hdr)); > > >> > > >> + released += tipc_link_advance_transmq(l, l, > > >> msg_ack(hdr), 0, > > >> > > >> + NULL, NULL, NULL, > > >> NULL); > > >> > > >> > > >> /* Defer delivery if sequence gap */ > > >> > > >> if (unlikely(seqno != rcv_nxt)) { > > >> > > >> @@ -1739,7 +1849,7 @@ static void > > >> tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, > > >> > > >> msg_set_probe(hdr, probe); > > >> > > >> msg_set_is_keepalive(hdr, probe || probe_reply); > > >> > > >> if (l->peer_caps & TIPC_GAP_ACK_BLOCK) > > >> > > >> - glen = tipc_build_gap_ack_blks(l, data, rcvgap); > > >> > > >> + glen = tipc_build_gap_ack_blks(l, hdr); > > >> > > >> tipc_mon_prep(l->net, data + glen, &dlen, mstate, > > >> l->bearer_id); > > >> > > >> msg_set_size(hdr, INT_H_SIZE + glen + dlen); > > >> > > >> skb_trim(skb, INT_H_SIZE + glen + dlen); > > >> > > >> @@ -2027,20 +2137,19 @@ static int tipc_link_proto_rcv(struct > > >> tipc_link *l, struct sk_buff *skb, > > >> > > >> { > > >> > > >> struct tipc_msg *hdr = buf_msg(skb); > > >> > > >> struct tipc_gap_ack_blks *ga = NULL; > > >> > > >> - u16 rcvgap = 0; > > >> > > >> - u16 ack = msg_ack(hdr); > > >> > > >> - u16 gap = msg_seq_gap(hdr); > > >> > > >> + bool reply = msg_probe(hdr), retransmitted = false; > > >> > > >> + u16 dlen = 
msg_data_sz(hdr), glen = 0; > > >> > > >> u16 peers_snd_nxt = msg_next_sent(hdr); > > >> > > >> u16 peers_tol = msg_link_tolerance(hdr); > > >> > > >> u16 peers_prio = msg_linkprio(hdr); > > >> > > >> + u16 gap = msg_seq_gap(hdr); > > >> > > >> + u16 ack = msg_ack(hdr); > > >> > > >> u16 rcv_nxt = l->rcv_nxt; > > >> > > >> - u16 dlen = msg_data_sz(hdr); > > >> > > >> + u16 rcvgap = 0; > > >> > > >> int mtyp = msg_type(hdr); > > >> > > >> - bool reply = msg_probe(hdr); > > >> > > >> - u16 glen = 0; > > >> > > >> - void *data; > > >> > > >> + int rc = 0, released; > > >> > > >> char *if_name; > > >> > > >> - int rc = 0; > > >> > > >> + void *data; > > >> > > >> > > >> trace_tipc_proto_rcv(skb, false, l->name); > > >> > > >> if (tipc_link_is_blocked(l) || !xmitq) > > >> > > >> @@ -2137,13 +2246,7 @@ static int tipc_link_proto_rcv(struct > > >> tipc_link *l, struct sk_buff *skb, > > >> > > >> } > > >> > > >> > > >> /* Receive Gap ACK blocks from peer if any */ > > >> > > >> - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { > > >> > > >> - ga = (struct tipc_gap_ack_blks *)data; > > >> > > >> - glen = ntohs(ga->len); > > >> > > >> - /* sanity check: if failed, ignore Gap ACK > > >> blocks */ > > >> > > >> - if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt)) > > >> > > >> - ga = NULL; > > >> > > >> - } > > >> > > >> + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); > > >> > > >> > > >> tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, > > >> > > >> &l->mon_state, l->bearer_id); > > >> > > >> @@ -2158,9 +2261,14 @@ static int tipc_link_proto_rcv(struct > > >> tipc_link *l, struct sk_buff *skb, > > >> > > >> tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, > > >> > > >> rcvgap, 0, 0, xmitq); > > >> > > >> > > >> - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); > > >> > > >> + released = tipc_link_advance_transmq(l, l, ack, gap, ga, > > >> xmitq, > > >> > > >> + &retransmitted, &rc); > > >> > > >> if (gap) > > >> > > >> l->stats.recv_nacks++; > > >> > > >> + if 
(released || retransmitted) > > >> > > >> + tipc_link_update_cwin(l, released, retransmitted); > > >> > > >> + if (released) > > >> > > >> + tipc_link_advance_backlog(l, xmitq); > > >> > > >> if (unlikely(!skb_queue_empty(&l->wakeupq))) > > >> > > >> link_prepare_wakeup(l); > > >> > > >> } > > >> > > >> @@ -2246,10 +2354,7 @@ void tipc_link_bc_init_rcv(struct > > >> tipc_link *l, struct tipc_msg *hdr) > > >> > > >> int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg > > >> *hdr, > > >> > > >> struct sk_buff_head *xmitq) > > >> > > >> { > > >> > > >> - struct tipc_link *snd_l = l->bc_sndlink; > > >> > > >> u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); > > >> > > >> - u16 from = msg_bcast_ack(hdr) + 1; > > >> > > >> - u16 to = from + msg_bc_gap(hdr) - 1; > > >> > > >> int rc = 0; > > >> > > >> > > >> if (!link_is_up(l)) > > >> > > >> @@ -2271,8 +2376,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link > > >> *l, struct tipc_msg *hdr, > > >> > > >> if (more(peers_snd_nxt, l->rcv_nxt + l->window)) > > >> > > >> return rc; > > >> > > >> > > >> - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); > > >> > > >> - > > >> > > >> l->snd_nxt = peers_snd_nxt; > > >> > > >> if (link_bc_rcv_gap(l)) > > >> > > >> rc |= TIPC_LINK_SND_STATE; > > >> > > >> @@ -2307,38 +2410,28 @@ int tipc_link_bc_sync_rcv(struct > > >> tipc_link *l, struct tipc_msg *hdr, > > >> > > >> return 0; > > >> > > >> } > > >> > > >> > > >> -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, > > >> > > >> - struct sk_buff_head *xmitq) > > >> > > >> +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, > > >> > > >> + struct tipc_gap_ack_blks *ga, > > >> > > >> + struct sk_buff_head *xmitq) > > >> > > >> { > > >> > > >> - struct sk_buff *skb, *tmp; > > >> > > >> - struct tipc_link *snd_l = l->bc_sndlink; > > >> > > >> + struct tipc_link *l = r->bc_sndlink; > > >> > > >> + bool unused = false; > > >> > > >> + int rc = 0; > > >> > > >> > > >> - if (!link_is_up(l) || !l->bc_peer_is_up) > 
> >> > > >> - return; > > >> > > >> + if (!link_is_up(r) || !r->bc_peer_is_up) > > >> > > >> + return 0; > > >> > > >> > > >> - if (!more(acked, l->acked)) > > >> > > >> - return; > > >> > > >> + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) > > >> > > >> + return 0; > > >> > > >> > > >> - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); > > >> > > >> - /* Skip over packets peer has already acked */ > > >> > > >> - skb_queue_walk(&snd_l->transmq, skb) { > > >> > > >> - if (more(buf_seqno(skb), l->acked)) > > >> > > >> - break; > > >> > > >> - } > > >> > > >> + trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); > > >> > > >> + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, > > >> &rc); > > >> > > >> > > >> - /* Update/release the packets peer is acking now */ > > >> > > >> - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { > > >> > > >> - if (more(buf_seqno(skb), acked)) > > >> > > >> - break; > > >> > > >> - if (!--TIPC_SKB_CB(skb)->ackers) { > > >> > > >> - __skb_unlink(skb, &snd_l->transmq); > > >> > > >> - kfree_skb(skb); > > >> > > >> - } > > >> > > >> - } > > >> > > >> - l->acked = acked; > > >> > > >> - tipc_link_advance_backlog(snd_l, xmitq); > > >> > > >> - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) > > >> > > >> - link_prepare_wakeup(snd_l); > > >> > > >> + tipc_link_advance_backlog(l, xmitq)... [truncated message content] |
From: Tuong L. T. <tuo...@de...> - 2020-03-17 10:55:58
Hi Jon,

For the "variable window congestion control" patch, if I remember correctly, it is for unicast link only? Why did you apply it for broadcast link, a mistake or ...? It now causes user messages disordered on the receiving side, because on the sending side, the broadcast link's window is suddenly increased to 300 (i.e. max_t(u16, l->window / 2, 300)) at a packet retransmission, leaving some gaps between the link's 'transmq' & 'backlogq' unexpectedly... Will we fix this by removing it?

@@ -1160,7 +1224,6 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
 			continue;
 		if (more(msg_seqno(hdr), to))
 			break;
-
 		if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr))
 			continue;
 		TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM;
@@ -1173,11 +1236,12 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
 		_skb->priority = TC_PRIO_CONTROL;
 		__skb_queue_tail(xmitq, _skb);
 		l->stats.retransmitted++;
-
+		retransmitted++;
 		/* Increase actual retrans counter & mark first time */
 		if (!TIPC_SKB_CB(skb)->retr_cnt++)
 			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
 	}
+	tipc_link_update_cwin(l, 0, retransmitted); // ???
 	return 0;
 }

+static void tipc_link_update_cwin(struct tipc_link *l, int released,
+				  bool retransmitted)
+{
+	int bklog_len = skb_queue_len(&l->backlogq);
+	struct sk_buff_head *txq = &l->transmq;
+	int txq_len = skb_queue_len(txq);
+	u16 cwin = l->window;
+
+	/* Enter fast recovery */
+	if (unlikely(retransmitted)) {
+		l->ssthresh = max_t(u16, l->window / 2, 300);
+		l->window = l->ssthresh;
+		return;
+	}

BR/Tuong

-----Original Message-----
From: Jon Maloy <jon...@er...>
Sent: Monday, December 2, 2019 7:33 AM
To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
Cc: moh...@er...; par...@gm...; tun...@de...; hoa...@de...; tuo...@de...; gor...@de...; yin...@wi...; tip...@li...
Subject: [net-next 3/3] tipc: introduce variable window congestion control

We introduce a simple variable window congestion control for links.
The algorithm is inspired by the Reno algorithm, covering the 'slow start', 'congestion avoidance', and 'fast recovery' modes.

- We introduce hard lower and upper window limits per link, still different and configurable per bearer type.
- We introduce a 'slow start threshold' variable, initially set to the maximum window size.
- We let a link start at the minimum congestion window, i.e. in slow start mode, and then let it grow rapidly (+1 per received ACK) until it reaches the slow start threshold and enters congestion avoidance mode.
- In congestion avoidance mode we increment the congestion window once per window_size number of acked packets, up to a possible maximum equal to the configured maximum window.
- For each non-duplicate NACK received, we drop back to fast recovery mode, by setting both the slow start threshold and the congestion window to (current_congestion_window / 2).
- If the timeout handler finds that the transmit queue has not moved since the last timeout, it drops the link back to slow start and forces a probe containing the last sent sequence number to be sent to the peer.

In reality this change has an effect only on the unicast Ethernet transport, as we have seen that there is no room whatsoever for increasing the max window size for the UDP bearer. For now, we also choose to keep the limits for the broadcast link unchanged and equal.

This algorithm seems to give a 50-100% throughput improvement for messages larger than MTU.

Suggested-by: Xin Long <luc...@gm...>
Acked-by: Xin Long <luc...@gm...>
Signed-off-by: Jon Maloy <jon...@er...>
---
 net/tipc/bcast.c     |  11 ++--
 net/tipc/bearer.c    |  11 ++--
 net/tipc/bearer.h    |   6 +-
 net/tipc/eth_media.c |   3 +-
 net/tipc/ib_media.c  |   5 +-
 net/tipc/link.c      | 175 +++++++++++++++++++++++++++++++++++----------------
 net/tipc/link.h      |   9 +--
 net/tipc/node.c      |  16 ++---
 net/tipc/udp_media.c |   3 +-
 9 files changed, 160 insertions(+), 79 deletions(-)
From: Tuong L. T. <tuo...@de...> - 2020-03-17 08:15:40
Hi Jon,

In terms of scalability, yes, the design was indeed focusing on it: the new state is kept per individual broadcast receiver link and is completely independent of the others. Also, the way it acks (e.g. via STATE_MSG) & retransmits already works as for unicast, and it still must comply with the other limits (such as the link window, retransmit timers, etc.)... So, I don't see any problems when the number of peers grows. The unicast retransmission is really for another purpose, but of course it is one option as well.

I have also done some other tests and here are the results:

1) tipc-pipe with large message size:

=======================
- With the patch:
=======================
# time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000
real 1m 35.50s
user 0m 0.63s
sys 0m 5.02s
# tipc l st sh l broadcast-link
Link <broadcast-link>
Window:50 packets
RX packets:0 fragments:0/0 bundles:0/0
TX packets:440000 fragments:440000/10000 bundles:0/0
RX naks:72661 defs:0 dups:0
TX naks:0 acks:0 retrans:23378
Congestion link:890 Send queue max:0 avg:0

=======================
- Without the patch:
=======================
# time tipc-pipe --mc --rdm --data_size 60000 --data_num 10000
real 9m 49.14s
user 0m 0.41s
sys 0m 1.56s
# tipc l st sh
Link <broadcast-link>
Window:50 packets
RX packets:0 fragments:0/0 bundles:0/0
TX packets:440000 fragments:440000/10000 bundles:0/0
RX naks:0 defs:0 dups:0
TX naks:0 acks:0 retrans:23651
Congestion link:2772 Send queue max:0 avg:0

2) group_test (do you mean this instead of "multicast_blast"?):

=======================
- With the patch:
=======================
# /cluster/group_test -b -m
Commander: Received 0 UP Events for Member Id 100
*** TIPC Group Messaging Test Started ****
Commander: Waiting for Scalers
Commander: Received 1 UP Events for Member Id 101
Commander: Discovered 1 Scalers
>> Starting Multicast Test
Commander: Scaling out to 1 Workers with Id 0/1
Commander: Received 1 UP Events for Member Id 0
Commander: Scaling out to 16 Workers
with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 665, BC 0, throughput last intv 87 Mb/s 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 1269, BC 0, throughput last intv 77 Mb/s 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 2042, BC 0, throughput last intv 101 Mb/s 2222:1@0/0:3101578979@1001002: Sent UC 0, AC 0, MC 2797, BC 0, throughput last intv 99 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:3101578979@1001002: Sent 3555 [0,3554] (UC 0, AC 0, MC 3555, BC 0) OK Report #1 from 2222:1@0/0:3423452773@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #2 from 2222:2@0/0:3341501021@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #3 from 2222:1@0/0:3775779560@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #4 from 2222:2@0/0:0283979098@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #5 from 2222:1@0/0:1288577198@1001004: Recv 3555 [0,3554] (UC 0, AC 0, MC 3555, BC 0) OK Report #6 from 2222:1@0/0:3616132138@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #7 from 2222:1@0/0:3992078596@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #8 from 2222:2@0/0:1658002624@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #9 from 2222:1@0/0:1398137940@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #10 from 2222:1@0/0:2790669581@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #11 from 2222:1@0/0:2366726415@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #12 from 2222:2@0/0:1473723325@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #13 from 2222:2@0/0:1136757126@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK 
Report #14 from 2222:1@0/0:2273798525@1001004: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #15 from 2222:2@0/0:3949256039@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #16 from 2222:2@0/0:1822300014@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #17 from 2222:1@0/0:3018695764@1001004: Recv 3555 [0,3554] (UC 0, AC 0, MC 3555, BC 0) OK Report #18 from 2222:1@0/0:2744800964@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #19 from 2222:2@0/0:0749893497@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #20 from 2222:1@0/0:1208963797@1001004: Recv 3553 [2,3554] (UC 0, AC 0, MC 3553, BC 0) OK Report #21 from 2222:2@0/0:1900862087@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #22 from 2222:2@0/0:3890385549@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #23 from 2222:1@0/0:0509529720@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #24 from 2222:2@0/0:0186529672@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #25 from 2222:1@0/0:1387317908@1001003: Recv 3553 [2,3554] (UC 0, AC 0, MC 3553, BC 0) OK Report #26 from 2222:2@0/0:4078423711@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #27 from 2222:2@0/0:1457003499@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #28 from 2222:2@0/0:3250519860@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #29 from 2222:2@0/0:3508775508@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #30 from 2222:2@0/0:1031479895@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #31 from 2222:1@0/0:3837724876@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK Report #32 from 2222:1@0/0:2423154786@1001003: Recv 3554 [1,3554] (UC 0, AC 0, MC 3554, BC 0) OK >> Multicast Test SUCCESSFUL >> Starting Broadcast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 
Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 434, throughput last intv 57 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 988, throughput last intv 72 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 1549, throughput last intv 73 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 2078, throughput last intv 69 Mb/s 2222:1@0/0:2774004831@1001002: Sent UC 0, AC 0, MC 0, BC 2621, throughput last intv 70 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:2774004831@1001002: Sent 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #1 from 2222:1@0/0:1262350339@1001004: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #2 from 2222:2@0/0:2235335787@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #3 from 2222:2@0/0:2409874140@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #4 from 2222:1@0/0:3059039648@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #5 from 2222:1@0/0:3488269200@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #6 from 2222:2@0/0:4186324421@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #7 from 2222:1@0/0:2760420127@1001003: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #8 from 2222:2@0/0:2056504340@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #9 from 2222:2@0/0:0998162158@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #10 from 2222:2@0/0:3124321508@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #11 from 2222:2@0/0:1260121658@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #12 from 2222:2@0/0:2938973106@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #13 from 
2222:2@0/0:2896700283@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #14 from 2222:1@0/0:2158652877@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #15 from 2222:1@0/0:1398540666@1001004: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #16 from 2222:1@0/0:1864856953@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #17 from 2222:2@0/0:3490882607@1001004: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #18 from 2222:1@0/0:2903105322@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #19 from 2222:2@0/0:1583785723@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #20 from 2222:1@0/0:3106247717@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #21 from 2222:1@0/0:2917195823@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #22 from 2222:1@0/0:0509238836@1001004: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #23 from 2222:2@0/0:2629682250@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #24 from 2222:2@0/0:1262288107@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #25 from 2222:1@0/0:3130881854@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #26 from 2222:1@0/0:0421078217@1001003: Recv 2966 [0,2965] (UC 0, AC 0, MC 0, BC 2966) OK Report #27 from 2222:1@0/0:0547555733@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #28 from 2222:2@0/0:1268394531@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #29 from 2222:1@0/0:2548830551@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #30 from 2222:1@0/0:4267281725@1001003: Recv 2965 [1,2965] (UC 0, AC 0, MC 0, BC 2965) OK Report #31 from 2222:2@0/0:0247684341@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK Report #32 from 2222:2@0/0:3078989866@1001003: Recv 2887 [79,2965] (UC 0, AC 0, MC 0, BC 2887) OK >> Broadcast Test SUCCESSFUL *** TIPC Group Messaging Test Finished 
**** # tipc l st sh Link <broadcast-link> Window:50 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:287043 fragments:287040/5980 bundles:1/2 RX naks: 0 defs:0 dups:0 TX naks:0 acks:0 retrans:15284 Congestion link:293 Send queue max:0 avg:0 ======================= - Without the patch: ======================= #/cluster/group_test -b -m Commander: Received 0 UP Events for Member Id 100 *** TIPC Group Messaging Test Started **** Commander: Waiting for Scalers Commander: Received 1 UP Events for Member Id 101 Commander: Discovered 1 Scalers >> Starting Multicast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 *** no report, 0 Mb/s ???*** Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:3270095995@1001002: Sent 34 [0,33] (UC 0, AC 0, MC 34, BC 0) OK Report #1 from 2222:2@0/0:1798962026@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #2 from 2222:2@0/0:1842303271@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #3 from 2222:1@0/0:4030007025@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #4 from 2222:1@0/0:1332810537@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #5 from 2222:1@0/0:4040901007@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #6 from 2222:2@0/0:0672740666@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #7 from 2222:2@0/0:2595641411@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #8 from 2222:2@0/0:2556065900@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #9 from 2222:1@0/0:2327925355@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #10 from 2222:2@0/0:1332584860@1001003: 
Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #11 from 2222:1@0/0:3726344362@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #12 from 2222:2@0/0:3889312161@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #13 from 2222:1@0/0:1807365809@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #14 from 2222:1@0/0:2525672860@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #15 from 2222:2@0/0:1931253671@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #16 from 2222:2@0/0:1610105188@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #17 from 2222:1@0/0:0767932663@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #18 from 2222:1@0/0:3290773375@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #19 from 2222:1@0/0:2576347174@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #20 from 2222:1@0/0:2028851345@1001003: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #21 from 2222:2@0/0:0123385799@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #22 from 2222:2@0/0:1395669417@1001003: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #23 from 2222:1@0/0:1098882628@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #24 from 2222:1@0/0:3398361863@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #25 from 2222:1@0/0:1085701361@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #26 from 2222:2@0/0:1790727708@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #27 from 2222:2@0/0:3199391066@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #28 from 2222:2@0/0:1232653389@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #29 from 2222:2@0/0:2255150189@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #30 from 2222:1@0/0:2526669233@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 22, BC 0) OK Report #31 from 2222:2@0/0:2479267806@1001004: Recv 0 [0,0] (UC 0, AC 0, MC 0, BC 0) OK Report #32 from 2222:1@0/0:1097666084@1001004: Recv 22 [0,21] (UC 0, AC 0, MC 
22, BC 0) OK >> Multicast Test SUCCESSFUL >> Starting Broadcast Test Commander: Scaling out to 1 Workers with Id 0/1 Commander: Received 1 UP Events for Member Id 0 Commander: Scaling out to 16 Workers with Id 1/1 Commander: Received 16 UP Events for Member Id 1 Commander: Scaling out to 16 Workers with Id 2/2 Commander: Received 16 UP Events for Member Id 2 2222:1@0/0:3890883707@1001002: Sent UC 0, AC 0, MC 0, BC 64, throughput last intv 7 Mb/s Commander: Scaling in to 0 Workers with Cmd Member Id 1 Commander: Scaling in to 0 Workers with Cmd Member Id 2 Commander: Scaling in to 0 Workers with Cmd Member Id 0 Report #0 from 2222:1@0/0:3890883707@1001002: Sent 91 [0,90] (UC 0, AC 0, MC 0, BC 91) OK Report #1 from 2222:2@0/0:1098659974@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #2 from 2222:1@0/0:3441709654@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #3 from 2222:2@0/0:0018441197@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #4 from 2222:2@0/0:0584054290@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #5 from 2222:2@0/0:4244201461@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #6 from 2222:2@0/0:1307600351@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #7 from 2222:2@0/0:3941241491@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #8 from 2222:2@0/0:2927828986@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #9 from 2222:1@0/0:1743241608@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #10 from 2222:2@0/0:1115515409@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #11 from 2222:1@0/0:0815130796@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #12 from 2222:1@0/0:2618044568@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #13 from 2222:2@0/0:1424259027@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #14 from 2222:1@0/0:1011077421@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #15 from 
2222:1@0/0:3249391177@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #16 from 2222:1@0/0:2774666633@1001003: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #17 from 2222:1@0/0:0860766920@1001004: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #18 from 2222:1@0/0:0196231326@1001003: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #19 from 2222:1@0/0:4278611377@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #20 from 2222:1@0/0:3464416884@1001003: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #21 from 2222:2@0/0:1718387937@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #22 from 2222:2@0/0:0267090087@1001003: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #23 from 2222:2@0/0:1694243136@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #24 from 2222:2@0/0:0918300899@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #25 from 2222:2@0/0:0811475995@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #26 from 2222:1@0/0:0388357605@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #27 from 2222:1@0/0:1113395305@1001004: Recv 81 [0,80] (UC 0, AC 0, MC 0, BC 81) OK Report #28 from 2222:1@0/0:3413026333@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #29 from 2222:2@0/0:2907075331@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #30 from 2222:1@0/0:1393297086@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK Report #31 from 2222:2@0/0:3493179185@1001004: Recv 79 [2,80] (UC 0, AC 0, MC 0, BC 79) OK Report #32 from 2222:1@0/0:3166541927@1001004: Recv 80 [1,80] (UC 0, AC 0, MC 0, BC 80) OK >> Broadcast Test SUCCESSFUL *** TIPC Group Messaging Test Finished **** # tipc l st sh Link <broadcast-link> Window:50 packets RX packets:0 fragments:0/0 bundles:0/0 TX packets:5908 fragments:5904/123 bundles:0/0 RX naks:0 defs:0 dups:0 TX naks:0 acks:0 retrans:324 Congestion link:32 Send queue max:0 avg:0 BR/Tuong -----Original Message----- From: Jon Maloy <jm...@re...> 
Sent: Tuesday, March 17, 2020 2:49 AM
To: Tuong Lien Tong <tuo...@de...>; tip...@li...; ma...@do...; yin...@wi...; lx...@re...
Subject: Re: [tipc-discussion] [PATCH RFC 1/2] tipc: add Gap ACK blocks support for broadcast link

On 3/16/20 2:18 PM, Jon Maloy wrote:
> On 3/16/20 7:23 AM, Tuong Lien Tong wrote:
>> [...]
> The improvement shown here is truly impressive. However, you are only
> showing tipc-pipe with small messages. How does this look when you
> send full-size 66k messages? How does it scale when the number of
> destinations grows up to tens or even hundreds? I am particularly
> concerned that the use of unicast retransmission may become a
> sub-optimization if the number of destinations is large.
>
> ///jon

You should try the "multicast_blast" program under tipc-utils/test. That will give you numbers both on throughput and loss rates as you let the number of nodes grow.

///jon

>> BR/Tuong
>>
>> *From:* Jon Maloy <jm...@re...>
>> *Sent:* Friday, March 13, 2020 10:47 PM
>> *To:* Tuong Lien <tuo...@de...>; tip...@li...; ma...@do...; yin...@wi...
>> *Subject:* Re: [PATCH RFC 1/2] tipc: add Gap ACK blocks support for broadcast link
>>
>> On 3/13/20 6:47 AM, Tuong Lien wrote:
>> As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput
>> by Gap ACK blocks"), we apply the same mechanism for the broadcast link
>> as well.
The 'Gap ACK blocks' data field in a >> 'PROTOCOL/STATE_MSG' will >> >> consist of two parts built for both the broadcast and unicast types: >> >> 31 16 15 0 >> >> +-------------+-------------+-------------+-------------+ >> >> | bgack_cnt | ugack_cnt | len | >> >> +-------------+-------------+-------------+-------------+ - >> >> | gap | ack | | >> >> +-------------+-------------+-------------+-------------+ > bc gacks >> >> : : : | >> >> +-------------+-------------+-------------+-------------+ - >> >> | gap | ack | | >> >> +-------------+-------------+-------------+-------------+ > uc gacks >> >> : : : | >> >> +-------------+-------------+-------------+-------------+ - >> >> which is "automatically" backward-compatible. >> >> We also increase the max number of Gap ACK blocks to 128, >> allowing upto >> >> 64 blocks per type (total buffer size = 516 bytes). >> >> Besides, the 'tipc_link_advance_transmq()' function is refactored >> which >> >> is applicable for both the unicast and broadcast cases now, so >> some old >> >> functions can be removed and the code is optimized. >> >> With the patch, TIPC broadcast is more robust regardless of >> packet loss >> >> or disorder, latency, ... in the underlying network. Its >> performance is >> >> boost up significantly. >> >> For example, experiment with a 5% packet loss rate results: >> >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 >> >> real 0m 42.46s >> >> user 0m 1.16s >> >> sys 0m 17.67s >> >> Without the patch: >> >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 >> >> real 5m 28.80s >> >> user 0m 0.85s >> >> sys 0m 3.62s >> >> Can you explain this? To me it seems like the elapsed time is reduced >> with a factor 328.8/42.46=7.7, while we are consuming significantly >> more CPU to achieve this. Doesn't that mean that we have much more >> retransmissions which are consuming CPU? Or is there some other >> explanation? 
>> >> ///jon >> >> >> Signed-off-by: Tuong Lien< <mailto:tuo...@de...> tuo...@de...> >> < <mailto:tuo...@de...> mailto:tuo...@de...> >> >> --- >> >> net/tipc/bcast.c | 9 +- >> >> net/tipc/link.c | 440 >> +++++++++++++++++++++++++++++++++---------------------- >> >> net/tipc/link.h | 7 +- >> >> net/tipc/msg.h | 14 +- >> >> net/tipc/node.c | 10 +- >> >> 5 files changed, 295 insertions(+), 185 deletions(-) >> >> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c >> >> index 4c20be08b9c4..3ce690a96ee9 100644 >> >> --- a/net/tipc/bcast.c >> >> +++ b/net/tipc/bcast.c >> >> @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, >> struct tipc_link *l, >> >> __skb_queue_head_init(&xmitq); >> >> >> tipc_bcast_lock(net); >> >> - tipc_link_bc_ack_rcv(l, acked, &xmitq); >> >> + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); >> >> tipc_bcast_unlock(net); >> >> >> tipc_bcbase_xmit(net, &xmitq); >> >> @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, >> struct tipc_link *l, >> >> struct tipc_msg *hdr) >> >> { >> >> struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; >> >> + struct tipc_gap_ack_blks *ga; >> >> struct sk_buff_head xmitq; >> >> int rc = 0; >> >> >> @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, >> struct tipc_link *l, >> >> if (msg_type(hdr) != STATE_MSG) { >> >> tipc_link_bc_init_rcv(l, hdr); >> >> } else if (!msg_bc_ack_invalid(hdr)) { >> >> - tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq); >> >> - rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq); >> >> + tipc_get_gap_ack_blks(&ga, l, hdr, false); >> >> + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), >> >> + msg_bc_gap(hdr), ga, &xmitq); >> >> + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); >> >> } >> >> tipc_bcast_unlock(net); >> >> >> diff --git a/net/tipc/link.c b/net/tipc/link.c >> >> index 467c53a1fb5c..6198b6d89a69 100644 >> >> --- a/net/tipc/link.c >> >> +++ b/net/tipc/link.c >> >> @@ -188,6 +188,8 @@ struct tipc_link { >> >> /* Broadcast */ >> >> u16 ackers; >> >> u16 
acked; >> >> + u16 last_gap; >> >> + struct tipc_gap_ack_blks *last_ga; >> >> struct tipc_link *bc_rcvlink; >> >> struct tipc_link *bc_sndlink; >> >> u8 nack_state; >> >> @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct >> tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> static void tipc_link_build_bc_init_msg(struct tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); >> >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void >> *data, u16 gap); >> >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 >> acked, u16 gap, >> >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, >> >> + struct tipc_link *l, u8 start_index); >> >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct >> tipc_msg *hdr); >> >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct >> tipc_link *r, >> >> + u16 acked, u16 gap, >> >> struct tipc_gap_ack_blks *ga, >> >> - struct sk_buff_head *xmitq); >> >> + struct sk_buff_head *xmitq, >> >> + bool *retransmitted, int *rc); >> >> static void tipc_link_update_cwin(struct tipc_link *l, int >> released, >> >> bool retransmitted); >> >> /* >> >> @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct >> tipc_link *snd_l, >> >> snd_l->ackers--; >> >> rcv_l->bc_peer_is_up = true; >> >> rcv_l->state = LINK_ESTABLISHED; >> >> - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); >> >> + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); >> >> trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); >> >> tipc_link_reset(rcv_l); >> >> rcv_l->state = LINK_RESET; >> >> @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) >> >> return (l->silent_intv_cnt + 2 > l->abort_limit); >> >> } >> >> >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct >> tipc_link *r, >> >> - u16 from, u16 to, struct sk_buff_head >> *xmitq); >> >> /* tipc_link_timeout - perform periodic task as instructed from >> 
node timeout >> >> */ >> >> int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head >> *xmitq) >> >> @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) >> >> l->snd_nxt_state = 1; >> >> l->rcv_nxt_state = 1; >> >> l->acked = 0; >> >> + l->last_gap = 0; >> >> + kfree(l->last_ga); >> >> + l->last_ga = NULL; >> >> l->silent_intv_cnt = 0; >> >> l->rst_cnt = 0; >> >> l->bc_peer_is_up = false; >> >> @@ -1183,68 +1189,14 @@ static bool >> link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, >> >> >> if (link_is_bc_sndlink(l)) { >> >> r->state = LINK_RESET; >> >> - *rc = TIPC_LINK_DOWN_EVT; >> >> + *rc |= TIPC_LINK_DOWN_EVT; >> >> } else { >> >> - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); >> >> + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); >> >> } >> >> >> return true; >> >> } >> >> >> -/* tipc_link_bc_retrans() - retransmit zero or more packets >> >> - * @l: the link to transmit on >> >> - * @r: the receiving link ordering the retransmit. Same as l if >> unicast >> >> - * @from: retransmit from (inclusive) this sequence number >> >> - * @to: retransmit to (inclusive) this sequence number >> >> - * xmitq: queue for accumulating the retransmitted packets >> >> - */ >> >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct >> tipc_link *r, >> >> - u16 from, u16 to, struct sk_buff_head >> *xmitq) >> >> -{ >> >> - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); >> >> - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; >> >> - u16 ack = l->rcv_nxt - 1; >> >> - int retransmitted = 0; >> >> - struct tipc_msg *hdr; >> >> - int rc = 0; >> >> - >> >> - if (!skb) >> >> - return 0; >> >> - if (less(to, from)) >> >> - return 0; >> >> - >> >> - trace_tipc_link_retrans(r, from, to, &l->transmq); >> >> - >> >> - if (link_retransmit_failure(l, r, &rc)) >> >> - return rc; >> >> - >> >> - skb_queue_walk(&l->transmq, skb) { >> >> - hdr = buf_msg(skb); >> >> - if (less(msg_seqno(hdr), from)) >> >> - continue; >> >> - if (more(msg_seqno(hdr), to)) >> >> 
- break; >> >> - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) >> >> - continue; >> >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; >> >> - _skb = pskb_copy(skb, GFP_ATOMIC); >> >> - if (!_skb) >> >> - return 0; >> >> - hdr = buf_msg(_skb); >> >> - msg_set_ack(hdr, ack); >> >> - msg_set_bcast_ack(hdr, bc_ack); >> >> - _skb->priority = TC_PRIO_CONTROL; >> >> - __skb_queue_tail(xmitq, _skb); >> >> - l->stats.retransmitted++; >> >> - retransmitted++; >> >> - /* Increase actual retrans counter & mark first time */ >> >> - if (!TIPC_SKB_CB(skb)->retr_cnt++) >> >> - TIPC_SKB_CB(skb)->retr_stamp = jiffies; >> >> - } >> >> - tipc_link_update_cwin(l, 0, retransmitted); >> >> - return 0; >> >> -} >> >> - >> >> /* tipc_data_input - deliver data and name distr msgs to upper >> layer >> >> * >> >> * Consumes buffer if message is of right type >> >> @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> return rc; >> >> } >> >> >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) >> >> -{ >> >> - int released = 0; >> >> - struct sk_buff *skb, *tmp; >> >> - >> >> - skb_queue_walk_safe(&l->transmq, skb, tmp) { >> >> - if (more(buf_seqno(skb), acked)) >> >> - break; >> >> - __skb_unlink(skb, &l->transmq); >> >> - kfree_skb(skb); >> >> - released++; >> >> +/** >> >> + * tipc_get_gap_ack_blks - get Gap ACK blocks from >> PROTOCOL/STATE_MSG >> >> + * @ga: returned pointer to the Gap ACK blocks if any >> >> + * @l: the tipc link >> >> + * @hdr: the PROTOCOL/STATE_MSG header >> >> + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or >> broadcast (= 0) >> >> + * >> >> + * Return: the total Gap ACK blocks size >> >> + */ >> >> +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct >> tipc_link *l, >> >> + struct tipc_msg *hdr, bool uc) >> >> +{ >> >> + struct tipc_gap_ack_blks *p; >> >> + u16 sz = 0; >> >> + >> >> + /* Does peer support the Gap ACK blocks feature? 
*/ >> >> + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { >> >> + p = (struct tipc_gap_ack_blks *)msg_data(hdr); >> >> + sz = ntohs(p->len); >> >> + /* Sanity check */ >> >> + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + >> p->bgack_cnt)) { >> >> + /* Good, check if the desired type exists */ >> >> + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) >> >> + goto ok; >> >> + /* Backward compatible: peer might not support bc, but >> uc? */ >> >> + } else if (uc && sz == >> tipc_gap_ack_blks_sz(p->ugack_cnt)) { >> >> + if (p->ugack_cnt) { >> >> + p->bgack_cnt = 0; >> >> + goto ok; >> >> + } >> >> + } >> >> } >> >> - return released; >> >> + /* Other cases: ignore! */ >> >> + p = NULL; >> >> + >> >> +ok: >> >> + *ga = p; >> >> + return sz; >> >> } >> >> >> -/* tipc_build_gap_ack_blks - build Gap ACK blocks >> >> - * @l: tipc link that data have come with gaps in sequence if any >> >> - * @data: data buffer to store the Gap ACK blocks after built >> >> - * >> >> - * returns the actual allocated memory size >> >> - */ >> >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void >> *data, u16 gap) >> >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, >> >> + struct tipc_link *l, u8 start_index) >> >> { >> >> + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; >> >> struct sk_buff *skb = skb_peek(&l->deferdq); >> >> - struct tipc_gap_ack_blks *ga = data; >> >> - u16 len, expect, seqno = 0; >> >> + u16 expect, seqno = 0; >> >> u8 n = 0; >> >> >> - if (!skb || !gap) >> >> - goto exit; >> >> + if (!skb) >> >> + return 0; >> >> >> expect = buf_seqno(skb); >> >> skb_queue_walk(&l->deferdq, skb) { >> >> seqno = buf_seqno(skb); >> >> if (unlikely(more(seqno, expect))) { >> >> - ga->gacks[n].ack = htons(expect - 1); >> >> - ga->gacks[n].gap = htons(seqno - expect); >> >> - if (++n >= MAX_GAP_ACK_BLKS) { >> >> - pr_info_ratelimited("Too few Gap ACK >> blocks!\n"); >> >> - goto exit; >> >> + gacks[n].ack = htons(expect - 1); >> >> + gacks[n].gap = 
htons(seqno - expect); >> >> + if (++n >= MAX_GAP_ACK_BLKS / 2) { >> >> + char buf[TIPC_MAX_LINK_NAME]; >> >> + >> >> + pr_info_ratelimited("Gacks on %s: %d, >> ql: %d!\n", >> >> + tipc_link_name_ext(l, buf), >> >> + n, >> >> + skb_queue_len(&l->deferdq)); >> >> + return n; >> >> } >> >> } else if (unlikely(less(seqno, expect))) { >> >> pr_warn("Unexpected skb in deferdq!\n"); >> >> @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct >> tipc_link *l, void *data, u16 gap) >> >> } >> >> >> /* last block */ >> >> - ga->gacks[n].ack = htons(seqno); >> >> - ga->gacks[n].gap = 0; >> >> + gacks[n].ack = htons(seqno); >> >> + gacks[n].gap = 0; >> >> n++; >> >> + return n; >> >> +} >> >> >> -exit: >> >> - len = tipc_gap_ack_blks_sz(n); >> >> +/* tipc_build_gap_ack_blks - build Gap ACK blocks >> >> + * @l: tipc unicast link >> >> + * @hdr: the tipc message buffer to store the Gap ACK blocks >> after built >> >> + * >> >> + * The function builds Gap ACK blocks for both the unicast & >> broadcast receiver >> >> + * links of a certain peer, the buffer after built has the >> network data format >> >> + * as follows: >> >> + * 31 16 15 0 >> >> + * +-------------+-------------+-------------+-------------+ >> >> + * | bgack_cnt | ugack_cnt | len | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * | gap | ack | | >> >> + * +-------------+-------------+-------------+-------------+ > >> bc gacks >> >> + * : : : | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * | gap | ack | | >> >> + * +-------------+-------------+-------------+-------------+ > >> uc gacks >> >> + * : : : | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * (See struct tipc_gap_ack_blks) >> >> + * >> >> + * returns the actual allocated memory size >> >> + */ >> >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct >> tipc_msg *hdr) >> >> +{ >> >> + struct tipc_link *bcl = l->bc_rcvlink; >> >> + struct 
tipc_gap_ack_blks *ga; >> >> + u16 len; >> >> + >> >> + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); >> >> + >> >> + /* Start with broadcast link first */ >> >> + tipc_bcast_lock(bcl->net); >> >> + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); >> >> + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); >> >> + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); >> >> + tipc_bcast_unlock(bcl->net); >> >> + >> >> + /* Now for unicast link, but an explicit NACK only (???) */ >> >> + ga->ugack_cnt = (msg_seq_gap(hdr)) ? >> >> + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) >> : 0; >> >> + >> >> + /* Total len */ >> >> + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); >> >> ga->len = htons(len); >> >> - ga->gack_cnt = n; >> >> return len; >> >> } >> >> >> @@ -1466,47 +1486,111 @@ static u16 >> tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) >> >> * acked packets, also doing >> retransmissions if >> >> * gaps found >> >> * @l: tipc link with transmq queue to be advanced >> >> + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if >> unicast) >> >> * @acked: seqno of last packet acked by peer without any gaps >> before >> >> * @gap: # of gap packets >> >> * @ga: buffer pointer to Gap ACK blocks from peer >> >> * @xmitq: queue for accumulating the retransmitted packets if any >> >> + * @retransmitted: returned boolean value if a retransmission is >> really issued >> >> + * @rc: returned code e.g. TIPC_LINK_DOWN_EVT if a repeated >> retransmit failures >> >> + * happens (- unlikely case) >> >> * >> >> - * In case of a repeated retransmit failures, the call will >> return shortly >> >> - * with a returned code (e.g. 
TIPC_LINK_DOWN_EVT) >> >> + * Return: the number of packets released from the link transmq >> >> */ >> >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 >> acked, u16 gap, >> >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct >> tipc_link *r, >> >> + u16 acked, u16 gap, >> >> struct tipc_gap_ack_blks *ga, >> >> - struct sk_buff_head *xmitq) >> >> + struct sk_buff_head *xmitq, >> >> + bool *retransmitted, int *rc) >> >> { >> >> + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; >> >> + struct tipc_gap_ack *gacks = NULL; >> >> struct sk_buff *skb, *_skb, *tmp; >> >> struct tipc_msg *hdr; >> >> + u32 qlen = skb_queue_len(&l->transmq); >> >> + u16 nacked = acked, ngap = gap, gack_cnt = 0; >> >> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; >> >> - bool retransmitted = false; >> >> u16 ack = l->rcv_nxt - 1; >> >> - bool passed = false; >> >> - u16 released = 0; >> >> u16 seqno, n = 0; >> >> - int rc = 0; >> >> + u16 end = r->acked, start = end, offset = r->last_gap; >> >> + u16 si = (last_ga) ? 
last_ga->start_index : 0; >> >> + bool is_uc = !link_is_bc_sndlink(l); >> >> + bool bc_has_acked = false; >> >> + >> >> + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); >> >> + >> >> + /* Determine Gap ACK blocks if any for the particular link */ >> >> + if (ga && is_uc) { >> >> + /* Get the Gap ACKs, uc part */ >> >> + gack_cnt = ga->ugack_cnt; >> >> + gacks = &ga->gacks[ga->bgack_cnt]; >> >> + } else if (ga) { >> >> + /* Copy the Gap ACKs, bc part, for later renewal if >> needed */ >> >> + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), >> >> + GFP_ATOMIC); >> >> + if (likely(this_ga)) { >> >> + this_ga->start_index = 0; >> >> + /* Start with the bc Gap ACKs */ >> >> + gack_cnt = this_ga->bgack_cnt; >> >> + gacks = &this_ga->gacks[0]; >> >> + } else { >> >> + /* Hmm, we can get in trouble..., simply ignore >> it */ >> >> + pr_warn_ratelimited("Ignoring bc Gap ACKs, no >> memory\n"); >> >> + } >> >> + } >> >> >> + /* Advance the link transmq */ >> >> skb_queue_walk_safe(&l->transmq, skb, tmp) { >> >> seqno = buf_seqno(skb); >> >> >> next_gap_ack: >> >> - if (less_eq(seqno, acked)) { >> >> + if (less_eq(seqno, nacked)) { >> >> + if (is_uc) >> >> + goto release; >> >> + /* Skip packets peer has already acked */ >> >> + if (!more(seqno, r->acked)) >> >> + continue; >> >> + /* Get the next of last Gap ACK blocks */ >> >> + while (more(seqno, end)) { >> >> + if (!last_ga || si >= last_ga->bgack_cnt) >> >> + break; >> >> + start = end + offset + 1; >> >> + end = ntohs(last_ga->gacks[si].ack); >> >> + offset = ntohs(last_ga->gacks[si].gap); >> >> + si++; >> >> + WARN_ONCE(more(start, end) || >> >> + (!offset && >> >> + si < last_ga->bgack_cnt) || >> >> + si > MAX_GAP_ACK_BLKS, >> >> + "Corrupted Gap ACK: %d %d %d %d >> %d\n", >> >> + start, end, offset, si, >> >> + last_ga->bgack_cnt); >> >> + } >> >> + /* Check against the last Gap ACK block */ >> >> + if (in_range(seqno, start, end)) >> >> + continue; >> >> + /* Update/release the packet 
peer is acking */ >> >> + bc_has_acked = true; >> >> + if (--TIPC_SKB_CB(skb)->ackers) >> >> + continue; >> >> +release: >> >> /* release skb */ >> >> __skb_unlink(skb, &l->transmq); >> >> kfree_skb(skb); >> >> - released++; >> >> - } else if (less_eq(seqno, acked + gap)) { >> >> - /* First, check if repeated retrans failures >> occurs? */ >> >> - if (!passed && link_retransmit_failure(l, l, &rc)) >> >> - return rc; >> >> - passed = true; >> >> - >> >> + } else if (less_eq(seqno, nacked + ngap)) { >> >> + /* First gap: check if repeated retrans >> failures? */ >> >> + if (unlikely(seqno == acked + 1 && >> >> + link_retransmit_failure(l, r, rc))) { >> >> + /* Ignore this bc Gap ACKs if any */ >> >> + kfree(this_ga); >> >> + this_ga = NULL; >> >> + break; >> >> + } >> >> /* retransmit skb if unrestricted*/ >> >> if (time_before(jiffies, >> TIPC_SKB_CB(skb)->nxt_retr)) >> >> continue; >> >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; >> >> + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? >> >> + TIPC_UC_RETR_TIME : >> TIPC_BC_RETR_LIM; >> >> _skb = pskb_copy(skb, GFP_ATOMIC); >> >> if (!_skb) >> >> continue; >> >> @@ -1516,25 +1600,50 @@ static int >> tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, >> >> _skb->priority = TC_PRIO_CONTROL; >> >> __skb_queue_tail(xmitq, _skb); >> >> l->stats.retransmitted++; >> >> - retransmitted = true; >> >> + *retransmitted = true; >> >> /* Increase actual retrans counter & mark first >> time */ >> >> if (!TIPC_SKB_CB(skb)->retr_cnt++) >> >> TIPC_SKB_CB(skb)->retr_stamp = jiffies; >> >> } else { >> >> /* retry with Gap ACK blocks if any */ >> >> - if (!ga || n >= ga->gack_cnt) >> >> + if (n >= gack_cnt) >> >> break; >> >> - acked = ntohs(ga->gacks[n].ack); >> >> - gap = ntohs(ga->gacks[n].gap); >> >> + nacked = ntohs(gacks[n].ack); >> >> + ngap = ntohs(gacks[n].gap); >> >> n++; >> >> goto next_gap_ack; >> >> } >> >> } >> >> - if (released || retransmitted) >> >> - tipc_link_update_cwin(l, released, retransmitted); >> 
>> - if (released) >> >> - tipc_link_advance_backlog(l, xmitq); >> >> - return 0; >> >> + >> >> + /* Renew last Gap ACK blocks for bc if needed */ >> >> + if (bc_has_acked) { >> >> + if (this_ga) { >> >> + kfree(last_ga); >> >> + r->last_ga = this_ga; >> >> + r->last_gap = gap; >> >> + } else if (last_ga) { >> >> + if (less(acked, start)) { >> >> + si--; >> >> + offset = start - acked - 1; >> >> + } else if (less(acked, end)) { >> >> + acked = end; >> >> + } >> >> + if (si < last_ga->bgack_cnt) { >> >> + last_ga->start_index = si; >> >> + r->last_gap = offset; >> >> + } else { >> >> + kfree(last_ga); >> >> + r->last_ga = NULL; >> >> + r->last_gap = 0; >> >> + } >> >> + } else { >> >> + r->last_gap = 0; >> >> + } >> >> + r->acked = acked; >> >> + } else { >> >> + kfree(this_ga); >> >> + } >> >> + return skb_queue_len(&l->transmq) - qlen; >> >> } >> >> >> /* tipc_link_build_state_msg: prepare link state message for >> transmission >> >> @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, >> struct sk_buff *skb, >> >> kfree_skb(skb); >> >> break; >> >> } >> >> - released += tipc_link_release_pkts(l, msg_ack(hdr)); >> >> + released += tipc_link_advance_transmq(l, l, >> msg_ack(hdr), 0, >> >> + NULL, NULL, NULL, >> NULL); >> >> >> /* Defer delivery if sequence gap */ >> >> if (unlikely(seqno != rcv_nxt)) { >> >> @@ -1739,7 +1849,7 @@ static void >> tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, >> >> msg_set_probe(hdr, probe); >> >> msg_set_is_keepalive(hdr, probe || probe_reply); >> >> if (l->peer_caps & TIPC_GAP_ACK_BLOCK) >> >> - glen = tipc_build_gap_ack_blks(l, data, rcvgap); >> >> + glen = tipc_build_gap_ack_blks(l, hdr); >> >> tipc_mon_prep(l->net, data + glen, &dlen, mstate, >> l->bearer_id); >> >> msg_set_size(hdr, INT_H_SIZE + glen + dlen); >> >> skb_trim(skb, INT_H_SIZE + glen + dlen); >> >> @@ -2027,20 +2137,19 @@ static int tipc_link_proto_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> { >> >> struct tipc_msg *hdr = 
buf_msg(skb); >> >> struct tipc_gap_ack_blks *ga = NULL; >> >> - u16 rcvgap = 0; >> >> - u16 ack = msg_ack(hdr); >> >> - u16 gap = msg_seq_gap(hdr); >> >> + bool reply = msg_probe(hdr), retransmitted = false; >> >> + u16 dlen = msg_data_sz(hdr), glen = 0; >> >> u16 peers_snd_nxt = msg_next_sent(hdr); >> >> u16 peers_tol = msg_link_tolerance(hdr); >> >> u16 peers_prio = msg_linkprio(hdr); >> >> + u16 gap = msg_seq_gap(hdr); >> >> + u16 ack = msg_ack(hdr); >> >> u16 rcv_nxt = l->rcv_nxt; >> >> - u16 dlen = msg_data_sz(hdr); >> >> + u16 rcvgap = 0; >> >> int mtyp = msg_type(hdr); >> >> - bool reply = msg_probe(hdr); >> >> - u16 glen = 0; >> >> - void *data; >> >> + int rc = 0, released; >> >> char *if_name; >> >> - int rc = 0; >> >> + void *data; >> >> >> trace_tipc_proto_rcv(skb, false, l->name); >> >> if (tipc_link_is_blocked(l) || !xmitq) >> >> @@ -2137,13 +2246,7 @@ static int tipc_link_proto_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> } >> >> >> /* Receive Gap ACK blocks from peer if any */ >> >> - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { >> >> - ga = (struct tipc_gap_ack_blks *)data; >> >> - glen = ntohs(ga->len); >> >> - /* sanity check: if failed, ignore Gap ACK >> blocks */ >> >> - if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt)) >> >> - ga = NULL; >> >> - } >> >> + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); >> >> >> tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, >> >> &l->mon_state, l->bearer_id); >> >> @@ -2158,9 +2261,14 @@ static int tipc_link_proto_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, >> >> rcvgap, 0, 0, xmitq); >> >> >> - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); >> >> + released = tipc_link_advance_transmq(l, l, ack, gap, ga, >> xmitq, >> >> + &retransmitted, &rc); >> >> if (gap) >> >> l->stats.recv_nacks++; >> >> + if (released || retransmitted) >> >> + tipc_link_update_cwin(l, released, retransmitted); >> >> + if (released) >> >> + 
tipc_link_advance_backlog(l, xmitq); >> >> if (unlikely(!skb_queue_empty(&l->wakeupq))) >> >> link_prepare_wakeup(l); >> >> } >> >> @@ -2246,10 +2354,7 @@ void tipc_link_bc_init_rcv(struct >> tipc_link *l, struct tipc_msg *hdr) >> >> int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg >> *hdr, >> >> struct sk_buff_head *xmitq) >> >> { >> >> - struct tipc_link *snd_l = l->bc_sndlink; >> >> u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); >> >> - u16 from = msg_bcast_ack(hdr) + 1; >> >> - u16 to = from + msg_bc_gap(hdr) - 1; >> >> int rc = 0; >> >> >> if (!link_is_up(l)) >> >> @@ -2271,8 +2376,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link >> *l, struct tipc_msg *hdr, >> >> if (more(peers_snd_nxt, l->rcv_nxt + l->window)) >> >> return rc; >> >> >> - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); >> >> - >> >> l->snd_nxt = peers_snd_nxt; >> >> if (link_bc_rcv_gap(l)) >> >> rc |= TIPC_LINK_SND_STATE; >> >> @@ -2307,38 +2410,28 @@ int tipc_link_bc_sync_rcv(struct >> tipc_link *l, struct tipc_msg *hdr, >> >> return 0; >> >> } >> >> >> -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, >> >> - struct sk_buff_head *xmitq) >> >> +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, >> >> + struct tipc_gap_ack_blks *ga, >> >> + struct sk_buff_head *xmitq) >> >> { >> >> - struct sk_buff *skb, *tmp; >> >> - struct tipc_link *snd_l = l->bc_sndlink; >> >> + struct tipc_link *l = r->bc_sndlink; >> >> + bool unused = false; >> >> + int rc = 0; >> >> >> - if (!link_is_up(l) || !l->bc_peer_is_up) >> >> - return; >> >> + if (!link_is_up(r) || !r->bc_peer_is_up) >> >> + return 0; >> >> >> - if (!more(acked, l->acked)) >> >> - return; >> >> + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) >> >> + return 0; >> >> >> - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); >> >> - /* Skip over packets peer has already acked */ >> >> - skb_queue_walk(&snd_l->transmq, skb) { >> >> - if (more(buf_seqno(skb), l->acked)) >> >> - break; 
>> >> - } >> >> + trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); >> >> + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, >> &rc); >> >> >> - /* Update/release the packets peer is acking now */ >> >> - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { >> >> - if (more(buf_seqno(skb), acked)) >> >> - break; >> >> - if (!--TIPC_SKB_CB(skb)->ackers) { >> >> - __skb_unlink(skb, &snd_l->transmq); >> >> - kfree_skb(skb); >> >> - } >> >> - } >> >> - l->acked = acked; >> >> - tipc_link_advance_backlog(snd_l, xmitq); >> >> - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) >> >> - link_prepare_wakeup(snd_l); >> >> + tipc_link_advance_backlog(l, xmitq); >> >> + if (unlikely(!skb_queue_empty(&l->wakeupq))) >> >> + link_prepare_wakeup(l); >> >> + >> >> + return rc; >> >> } >> >> >> /* tipc_link_bc_nack_rcv(): receive broadcast nack message >> >> @@ -2366,8 +2459,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link >> *l, struct sk_buff *skb, >> >> return 0; >> >> >> if (dnode == tipc_own_addr(l->net)) { >> >> - tipc_link_bc_ack_rcv(l, acked, xmitq); >> >> - rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, >> xmitq); >> >> + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, >> xmitq); >> >> l->stats.recv_nacks++; >> >> return rc; >> >> } >> >> diff --git a/net/tipc/link.h b/net/tipc/link.h >> >> index d3c1c3fc1659..0a0fa7350722 100644 >> >> --- a/net/tipc/link.h >> >> +++ b/net/tipc/link.h >> >> @@ -143,8 +143,11 @@ int tipc_link_bc_peers(struct tipc_link *l); >> >> void tipc_link_set_mtu(struct tipc_link *l, int mtu); >> >> int tipc_link_mtu(struct tipc_link *l); >> >> int tipc_link_mss(struct tipc_link *l); >> >> -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, >> >> - struct sk_buff_head *xmitq); >> >> +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct >> tipc_link *l, >> >> + struct tipc_msg *hdr, bool uc); >> >> +int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, >> >> + struct tipc_gap_ack_blks *ga, 
>> >> + struct sk_buff_head *xmitq); >> >> void tipc_link_build_bc_sync_msg(struct tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg >> *hdr); >> >> diff --git a/net/tipc/msg.h b/net/tipc/msg.h >> >> index 6d466ebdb64f..9a38f9c9d6eb 100644 >> >> --- a/net/tipc/msg.h >> >> +++ b/net/tipc/msg.h >> >> @@ -160,20 +160,26 @@ struct tipc_gap_ack { >> >> >> /* struct tipc_gap_ack_blks >> >> * @len: actual length of the record >> >> - * @gack_cnt: number of Gap ACK blocks in the record >> >> + * @bgack_cnt: number of Gap ACK blocks for broadcast in the record >> >> + * @ugack_cnt: number of Gap ACK blocks for unicast (following >> the broadcast >> >> + * ones) >> >> + * @start_index: starting index for "valid" broadcast Gap ACK >> blocks >> >> * @gacks: array of Gap ACK blocks >> >> */ >> >> struct tipc_gap_ack_blks { >> >> __be16 len; >> >> - u8 gack_cnt; >> >> - u8 reserved; >> >> + union { >> >> + u8 ugack_cnt; >> >> + u8 start_index; >> >> + }; >> >> + u8 bgack_cnt; >> >> struct tipc_gap_ack gacks[]; >> >> }; >> >> >> #define tipc_gap_ack_blks_sz(n) (sizeof(struct >> tipc_gap_ack_blks) + \ >> >> sizeof(struct tipc_ga... [truncated message content] |
From: Jon M. <jm...@re...> - 2020-03-16 19:51:08
On 3/16/20 2:18 PM, Jon Maloy wrote: > > > On 3/16/20 7:23 AM, Tuong Lien Tong wrote: >> [...] >> > The improvement shown here is truly impressive. However, you are only > showing tipc-pipe with small messages. How does this look when you > send full-size 66k messages? How does it scale when the number of > destinations grows up to tens or even hundreds? I am particularly > concerned that the use of unicast retransmission may become a > sub-optimization if the number of destinations is large. > > ///jon You should try the "multicast_blast" program under tipc-utils/test. That will give you numbers both on throughput and loss rates as you let the number of nodes grow. ///jon > >> BR/Tuong >> >> *From:* Jon Maloy <jm...@re...> >> *Sent:* Friday, March 13, 2020 10:47 PM >> *To:* Tuong Lien <tuo...@de...>; >> tip...@li...; ma...@do...; >> yin...@wi... >> *Subject:* Re: [PATCH RFC 1/2] tipc: add Gap ACK blocks support for >> broadcast link >> >> On 3/13/20 6:47 AM, Tuong Lien wrote: >> >> As achieved through commit 9195948fbf34 ("tipc: improve TIPC >> throughput >> >> by Gap ACK blocks"), we apply the same mechanism for the >> broadcast link >> >> as well. The 'Gap ACK blocks' data field in a >> 'PROTOCOL/STATE_MSG' will >> >> consist of two parts built for both the broadcast and unicast types: >> >> 31 16 15 0 >> >> +-------------+-------------+-------------+-------------+ >> >> | bgack_cnt | ugack_cnt | len | >> >> +-------------+-------------+-------------+-------------+ - >> >> | gap | ack | | >> >> +-------------+-------------+-------------+-------------+ > bc gacks >> >> : : : | >> >> +-------------+-------------+-------------+-------------+ - >> >> | gap | ack | | >> >> +-------------+-------------+-------------+-------------+ > uc gacks >> >> : : : | >> >> +-------------+-------------+-------------+-------------+ - >> >> which is "automatically" backward-compatible. 
>> >> We also increase the max number of Gap ACK blocks to 128, >> allowing up to >> >> 64 blocks per type (total buffer size = 516 bytes). >> >> Besides, the 'tipc_link_advance_transmq()' function is refactored >> which >> >> is applicable for both the unicast and broadcast cases now, so >> some old >> >> functions can be removed and the code is optimized. >> >> With the patch, TIPC broadcast is more robust regardless of >> packet loss >> >> or disorder, latency, ... in the underlying network. Its >> performance is >> >> boosted significantly. >> >> For example, an experiment with a 5% packet loss rate gives: >> >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 >> >> real 0m 42.46s >> >> user 0m 1.16s >> >> sys 0m 17.67s >> >> Without the patch: >> >> $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 >> >> real 5m 28.80s >> >> user 0m 0.85s >> >> sys 0m 3.62s >> >> Can you explain this? To me it seems like the elapsed time is reduced >> by a factor of 328.8/42.46 = 7.7, while we are consuming significantly >> more CPU to achieve this. Doesn't that mean that we have much more >> retransmissions which are consuming CPU? Or is there some other >> explanation? 
>> >> ///jon >> >> >> Signed-off-by: Tuong Lien<tuo...@de...> >> <mailto:tuo...@de...> >> >> --- >> >> net/tipc/bcast.c | 9 +- >> >> net/tipc/link.c | 440 >> +++++++++++++++++++++++++++++++++---------------------- >> >> net/tipc/link.h | 7 +- >> >> net/tipc/msg.h | 14 +- >> >> net/tipc/node.c | 10 +- >> >> 5 files changed, 295 insertions(+), 185 deletions(-) >> >> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c >> >> index 4c20be08b9c4..3ce690a96ee9 100644 >> >> --- a/net/tipc/bcast.c >> >> +++ b/net/tipc/bcast.c >> >> @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, >> struct tipc_link *l, >> >> __skb_queue_head_init(&xmitq); >> >> >> tipc_bcast_lock(net); >> >> - tipc_link_bc_ack_rcv(l, acked, &xmitq); >> >> + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); >> >> tipc_bcast_unlock(net); >> >> >> tipc_bcbase_xmit(net, &xmitq); >> >> @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, >> struct tipc_link *l, >> >> struct tipc_msg *hdr) >> >> { >> >> struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; >> >> + struct tipc_gap_ack_blks *ga; >> >> struct sk_buff_head xmitq; >> >> int rc = 0; >> >> >> @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, >> struct tipc_link *l, >> >> if (msg_type(hdr) != STATE_MSG) { >> >> tipc_link_bc_init_rcv(l, hdr); >> >> } else if (!msg_bc_ack_invalid(hdr)) { >> >> - tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq); >> >> - rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq); >> >> + tipc_get_gap_ack_blks(&ga, l, hdr, false); >> >> + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), >> >> + msg_bc_gap(hdr), ga, &xmitq); >> >> + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); >> >> } >> >> tipc_bcast_unlock(net); >> >> >> diff --git a/net/tipc/link.c b/net/tipc/link.c >> >> index 467c53a1fb5c..6198b6d89a69 100644 >> >> --- a/net/tipc/link.c >> >> +++ b/net/tipc/link.c >> >> @@ -188,6 +188,8 @@ struct tipc_link { >> >> /* Broadcast */ >> >> u16 ackers; >> >> u16 acked; >> >> + u16 last_gap; >> >> + struct 
tipc_gap_ack_blks *last_ga; >> >> struct tipc_link *bc_rcvlink; >> >> struct tipc_link *bc_sndlink; >> >> u8 nack_state; >> >> @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct >> tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> static void tipc_link_build_bc_init_msg(struct tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); >> >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void >> *data, u16 gap); >> >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 >> acked, u16 gap, >> >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, >> >> + struct tipc_link *l, u8 start_index); >> >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct >> tipc_msg *hdr); >> >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct >> tipc_link *r, >> >> + u16 acked, u16 gap, >> >> struct tipc_gap_ack_blks *ga, >> >> - struct sk_buff_head *xmitq); >> >> + struct sk_buff_head *xmitq, >> >> + bool *retransmitted, int *rc); >> >> static void tipc_link_update_cwin(struct tipc_link *l, int >> released, >> >> bool retransmitted); >> >> /* >> >> @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct >> tipc_link *snd_l, >> >> snd_l->ackers--; >> >> rcv_l->bc_peer_is_up = true; >> >> rcv_l->state = LINK_ESTABLISHED; >> >> - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); >> >> + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); >> >> trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); >> >> tipc_link_reset(rcv_l); >> >> rcv_l->state = LINK_RESET; >> >> @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) >> >> return (l->silent_intv_cnt + 2 > l->abort_limit); >> >> } >> >> >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct >> tipc_link *r, >> >> - u16 from, u16 to, struct sk_buff_head >> *xmitq); >> >> /* tipc_link_timeout - perform periodic task as instructed from >> node timeout >> >> */ >> >> int 
tipc_link_timeout(struct tipc_link *l, struct sk_buff_head >> *xmitq) >> >> @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) >> >> l->snd_nxt_state = 1; >> >> l->rcv_nxt_state = 1; >> >> l->acked = 0; >> >> + l->last_gap = 0; >> >> + kfree(l->last_ga); >> >> + l->last_ga = NULL; >> >> l->silent_intv_cnt = 0; >> >> l->rst_cnt = 0; >> >> l->bc_peer_is_up = false; >> >> @@ -1183,68 +1189,14 @@ static bool >> link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, >> >> >> if (link_is_bc_sndlink(l)) { >> >> r->state = LINK_RESET; >> >> - *rc = TIPC_LINK_DOWN_EVT; >> >> + *rc |= TIPC_LINK_DOWN_EVT; >> >> } else { >> >> - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); >> >> + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); >> >> } >> >> >> return true; >> >> } >> >> >> -/* tipc_link_bc_retrans() - retransmit zero or more packets >> >> - * @l: the link to transmit on >> >> - * @r: the receiving link ordering the retransmit. Same as l if >> unicast >> >> - * @from: retransmit from (inclusive) this sequence number >> >> - * @to: retransmit to (inclusive) this sequence number >> >> - * xmitq: queue for accumulating the retransmitted packets >> >> - */ >> >> -static int tipc_link_bc_retrans(struct tipc_link *l, struct >> tipc_link *r, >> >> - u16 from, u16 to, struct sk_buff_head >> *xmitq) >> >> -{ >> >> - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); >> >> - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; >> >> - u16 ack = l->rcv_nxt - 1; >> >> - int retransmitted = 0; >> >> - struct tipc_msg *hdr; >> >> - int rc = 0; >> >> - >> >> - if (!skb) >> >> - return 0; >> >> - if (less(to, from)) >> >> - return 0; >> >> - >> >> - trace_tipc_link_retrans(r, from, to, &l->transmq); >> >> - >> >> - if (link_retransmit_failure(l, r, &rc)) >> >> - return rc; >> >> - >> >> - skb_queue_walk(&l->transmq, skb) { >> >> - hdr = buf_msg(skb); >> >> - if (less(msg_seqno(hdr), from)) >> >> - continue; >> >> - if (more(msg_seqno(hdr), to)) >> >> - break; >> >> - if 
(time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) >> >> - continue; >> >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; >> >> - _skb = pskb_copy(skb, GFP_ATOMIC); >> >> - if (!_skb) >> >> - return 0; >> >> - hdr = buf_msg(_skb); >> >> - msg_set_ack(hdr, ack); >> >> - msg_set_bcast_ack(hdr, bc_ack); >> >> - _skb->priority = TC_PRIO_CONTROL; >> >> - __skb_queue_tail(xmitq, _skb); >> >> - l->stats.retransmitted++; >> >> - retransmitted++; >> >> - /* Increase actual retrans counter & mark first time */ >> >> - if (!TIPC_SKB_CB(skb)->retr_cnt++) >> >> - TIPC_SKB_CB(skb)->retr_stamp = jiffies; >> >> - } >> >> - tipc_link_update_cwin(l, 0, retransmitted); >> >> - return 0; >> >> -} >> >> - >> >> /* tipc_data_input - deliver data and name distr msgs to upper >> layer >> >> * >> >> * Consumes buffer if message is of right type >> >> @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> return rc; >> >> } >> >> >> -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) >> >> -{ >> >> - int released = 0; >> >> - struct sk_buff *skb, *tmp; >> >> - >> >> - skb_queue_walk_safe(&l->transmq, skb, tmp) { >> >> - if (more(buf_seqno(skb), acked)) >> >> - break; >> >> - __skb_unlink(skb, &l->transmq); >> >> - kfree_skb(skb); >> >> - released++; >> >> +/** >> >> + * tipc_get_gap_ack_blks - get Gap ACK blocks from >> PROTOCOL/STATE_MSG >> >> + * @ga: returned pointer to the Gap ACK blocks if any >> >> + * @l: the tipc link >> >> + * @hdr: the PROTOCOL/STATE_MSG header >> >> + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or >> broadcast (= 0) >> >> + * >> >> + * Return: the total Gap ACK blocks size >> >> + */ >> >> +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct >> tipc_link *l, >> >> + struct tipc_msg *hdr, bool uc) >> >> +{ >> >> + struct tipc_gap_ack_blks *p; >> >> + u16 sz = 0; >> >> + >> >> + /* Does peer support the Gap ACK blocks feature? 
*/ >> >> + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { >> >> + p = (struct tipc_gap_ack_blks *)msg_data(hdr); >> >> + sz = ntohs(p->len); >> >> + /* Sanity check */ >> >> + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + >> p->bgack_cnt)) { >> >> + /* Good, check if the desired type exists */ >> >> + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) >> >> + goto ok; >> >> + /* Backward compatible: peer might not support bc, but >> uc? */ >> >> + } else if (uc && sz == >> tipc_gap_ack_blks_sz(p->ugack_cnt)) { >> >> + if (p->ugack_cnt) { >> >> + p->bgack_cnt = 0; >> >> + goto ok; >> >> + } >> >> + } >> >> } >> >> - return released; >> >> + /* Other cases: ignore! */ >> >> + p = NULL; >> >> + >> >> +ok: >> >> + *ga = p; >> >> + return sz; >> >> } >> >> >> -/* tipc_build_gap_ack_blks - build Gap ACK blocks >> >> - * @l: tipc link that data have come with gaps in sequence if any >> >> - * @data: data buffer to store the Gap ACK blocks after built >> >> - * >> >> - * returns the actual allocated memory size >> >> - */ >> >> -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void >> *data, u16 gap) >> >> +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, >> >> + struct tipc_link *l, u8 start_index) >> >> { >> >> + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; >> >> struct sk_buff *skb = skb_peek(&l->deferdq); >> >> - struct tipc_gap_ack_blks *ga = data; >> >> - u16 len, expect, seqno = 0; >> >> + u16 expect, seqno = 0; >> >> u8 n = 0; >> >> >> - if (!skb || !gap) >> >> - goto exit; >> >> + if (!skb) >> >> + return 0; >> >> >> expect = buf_seqno(skb); >> >> skb_queue_walk(&l->deferdq, skb) { >> >> seqno = buf_seqno(skb); >> >> if (unlikely(more(seqno, expect))) { >> >> - ga->gacks[n].ack = htons(expect - 1); >> >> - ga->gacks[n].gap = htons(seqno - expect); >> >> - if (++n >= MAX_GAP_ACK_BLKS) { >> >> - pr_info_ratelimited("Too few Gap ACK >> blocks!\n"); >> >> - goto exit; >> >> + gacks[n].ack = htons(expect - 1); >> >> + gacks[n].gap = 
htons(seqno - expect); >> >> + if (++n >= MAX_GAP_ACK_BLKS / 2) { >> >> + char buf[TIPC_MAX_LINK_NAME]; >> >> + >> >> + pr_info_ratelimited("Gacks on %s: %d, >> ql: %d!\n", >> >> + tipc_link_name_ext(l, buf), >> >> + n, >> >> + skb_queue_len(&l->deferdq)); >> >> + return n; >> >> } >> >> } else if (unlikely(less(seqno, expect))) { >> >> pr_warn("Unexpected skb in deferdq!\n"); >> >> @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct >> tipc_link *l, void *data, u16 gap) >> >> } >> >> >> /* last block */ >> >> - ga->gacks[n].ack = htons(seqno); >> >> - ga->gacks[n].gap = 0; >> >> + gacks[n].ack = htons(seqno); >> >> + gacks[n].gap = 0; >> >> n++; >> >> + return n; >> >> +} >> >> >> -exit: >> >> - len = tipc_gap_ack_blks_sz(n); >> >> +/* tipc_build_gap_ack_blks - build Gap ACK blocks >> >> + * @l: tipc unicast link >> >> + * @hdr: the tipc message buffer to store the Gap ACK blocks >> after built >> >> + * >> >> + * The function builds Gap ACK blocks for both the unicast & >> broadcast receiver >> >> + * links of a certain peer, the buffer after built has the >> network data format >> >> + * as follows: >> >> + * 31 16 15 0 >> >> + * +-------------+-------------+-------------+-------------+ >> >> + * | bgack_cnt | ugack_cnt | len | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * | gap | ack | | >> >> + * +-------------+-------------+-------------+-------------+ > >> bc gacks >> >> + * : : : | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * | gap | ack | | >> >> + * +-------------+-------------+-------------+-------------+ > >> uc gacks >> >> + * : : : | >> >> + * +-------------+-------------+-------------+-------------+ - >> >> + * (See struct tipc_gap_ack_blks) >> >> + * >> >> + * returns the actual allocated memory size >> >> + */ >> >> +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct >> tipc_msg *hdr) >> >> +{ >> >> + struct tipc_link *bcl = l->bc_rcvlink; >> >> + struct 
tipc_gap_ack_blks *ga; >> >> + u16 len; >> >> + >> >> + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); >> >> + >> >> + /* Start with broadcast link first */ >> >> + tipc_bcast_lock(bcl->net); >> >> + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); >> >> + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); >> >> + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); >> >> + tipc_bcast_unlock(bcl->net); >> >> + >> >> + /* Now for unicast link, but an explicit NACK only (???) */ >> >> + ga->ugack_cnt = (msg_seq_gap(hdr)) ? >> >> + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) >> : 0; >> >> + >> >> + /* Total len */ >> >> + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); >> >> ga->len = htons(len); >> >> - ga->gack_cnt = n; >> >> return len; >> >> } >> >> >> @@ -1466,47 +1486,111 @@ static u16 >> tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) >> >> * acked packets, also doing >> retransmissions if >> >> * gaps found >> >> * @l: tipc link with transmq queue to be advanced >> >> + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if >> unicast) >> >> * @acked: seqno of last packet acked by peer without any gaps >> before >> >> * @gap: # of gap packets >> >> * @ga: buffer pointer to Gap ACK blocks from peer >> >> * @xmitq: queue for accumulating the retransmitted packets if any >> >> + * @retransmitted: returned boolean value if a retransmission is >> really issued >> >> + * @rc: returned code e.g. TIPC_LINK_DOWN_EVT if a repeated >> retransmit failures >> >> + * happens (- unlikely case) >> >> * >> >> - * In case of a repeated retransmit failures, the call will >> return shortly >> >> - * with a returned code (e.g. 
TIPC_LINK_DOWN_EVT) >> >> + * Return: the number of packets released from the link transmq >> >> */ >> >> -static int tipc_link_advance_transmq(struct tipc_link *l, u16 >> acked, u16 gap, >> >> +static int tipc_link_advance_transmq(struct tipc_link *l, struct >> tipc_link *r, >> >> + u16 acked, u16 gap, >> >> struct tipc_gap_ack_blks *ga, >> >> - struct sk_buff_head *xmitq) >> >> + struct sk_buff_head *xmitq, >> >> + bool *retransmitted, int *rc) >> >> { >> >> + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; >> >> + struct tipc_gap_ack *gacks = NULL; >> >> struct sk_buff *skb, *_skb, *tmp; >> >> struct tipc_msg *hdr; >> >> + u32 qlen = skb_queue_len(&l->transmq); >> >> + u16 nacked = acked, ngap = gap, gack_cnt = 0; >> >> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; >> >> - bool retransmitted = false; >> >> u16 ack = l->rcv_nxt - 1; >> >> - bool passed = false; >> >> - u16 released = 0; >> >> u16 seqno, n = 0; >> >> - int rc = 0; >> >> + u16 end = r->acked, start = end, offset = r->last_gap; >> >> + u16 si = (last_ga) ? 
last_ga->start_index : 0; >> >> + bool is_uc = !link_is_bc_sndlink(l); >> >> + bool bc_has_acked = false; >> >> + >> >> + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); >> >> + >> >> + /* Determine Gap ACK blocks if any for the particular link */ >> >> + if (ga && is_uc) { >> >> + /* Get the Gap ACKs, uc part */ >> >> + gack_cnt = ga->ugack_cnt; >> >> + gacks = &ga->gacks[ga->bgack_cnt]; >> >> + } else if (ga) { >> >> + /* Copy the Gap ACKs, bc part, for later renewal if >> needed */ >> >> + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), >> >> + GFP_ATOMIC); >> >> + if (likely(this_ga)) { >> >> + this_ga->start_index = 0; >> >> + /* Start with the bc Gap ACKs */ >> >> + gack_cnt = this_ga->bgack_cnt; >> >> + gacks = &this_ga->gacks[0]; >> >> + } else { >> >> + /* Hmm, we can get in trouble..., simply ignore >> it */ >> >> + pr_warn_ratelimited("Ignoring bc Gap ACKs, no >> memory\n"); >> >> + } >> >> + } >> >> >> + /* Advance the link transmq */ >> >> skb_queue_walk_safe(&l->transmq, skb, tmp) { >> >> seqno = buf_seqno(skb); >> >> >> next_gap_ack: >> >> - if (less_eq(seqno, acked)) { >> >> + if (less_eq(seqno, nacked)) { >> >> + if (is_uc) >> >> + goto release; >> >> + /* Skip packets peer has already acked */ >> >> + if (!more(seqno, r->acked)) >> >> + continue; >> >> + /* Get the next of last Gap ACK blocks */ >> >> + while (more(seqno, end)) { >> >> + if (!last_ga || si >= last_ga->bgack_cnt) >> >> + break; >> >> + start = end + offset + 1; >> >> + end = ntohs(last_ga->gacks[si].ack); >> >> + offset = ntohs(last_ga->gacks[si].gap); >> >> + si++; >> >> + WARN_ONCE(more(start, end) || >> >> + (!offset && >> >> + si < last_ga->bgack_cnt) || >> >> + si > MAX_GAP_ACK_BLKS, >> >> + "Corrupted Gap ACK: %d %d %d %d >> %d\n", >> >> + start, end, offset, si, >> >> + last_ga->bgack_cnt); >> >> + } >> >> + /* Check against the last Gap ACK block */ >> >> + if (in_range(seqno, start, end)) >> >> + continue; >> >> + /* Update/release the packet 
peer is acking */ >> >> + bc_has_acked = true; >> >> + if (--TIPC_SKB_CB(skb)->ackers) >> >> + continue; >> >> +release: >> >> /* release skb */ >> >> __skb_unlink(skb, &l->transmq); >> >> kfree_skb(skb); >> >> - released++; >> >> - } else if (less_eq(seqno, acked + gap)) { >> >> - /* First, check if repeated retrans failures >> occurs? */ >> >> - if (!passed && link_retransmit_failure(l, l, &rc)) >> >> - return rc; >> >> - passed = true; >> >> - >> >> + } else if (less_eq(seqno, nacked + ngap)) { >> >> + /* First gap: check if repeated retrans >> failures? */ >> >> + if (unlikely(seqno == acked + 1 && >> >> + link_retransmit_failure(l, r, rc))) { >> >> + /* Ignore this bc Gap ACKs if any */ >> >> + kfree(this_ga); >> >> + this_ga = NULL; >> >> + break; >> >> + } >> >> /* retransmit skb if unrestricted*/ >> >> if (time_before(jiffies, >> TIPC_SKB_CB(skb)->nxt_retr)) >> >> continue; >> >> - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; >> >> + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? >> >> + TIPC_UC_RETR_TIME : >> TIPC_BC_RETR_LIM; >> >> _skb = pskb_copy(skb, GFP_ATOMIC); >> >> if (!_skb) >> >> continue; >> >> @@ -1516,25 +1600,50 @@ static int >> tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, >> >> _skb->priority = TC_PRIO_CONTROL; >> >> __skb_queue_tail(xmitq, _skb); >> >> l->stats.retransmitted++; >> >> - retransmitted = true; >> >> + *retransmitted = true; >> >> /* Increase actual retrans counter & mark first >> time */ >> >> if (!TIPC_SKB_CB(skb)->retr_cnt++) >> >> TIPC_SKB_CB(skb)->retr_stamp = jiffies; >> >> } else { >> >> /* retry with Gap ACK blocks if any */ >> >> - if (!ga || n >= ga->gack_cnt) >> >> + if (n >= gack_cnt) >> >> break; >> >> - acked = ntohs(ga->gacks[n].ack); >> >> - gap = ntohs(ga->gacks[n].gap); >> >> + nacked = ntohs(gacks[n].ack); >> >> + ngap = ntohs(gacks[n].gap); >> >> n++; >> >> goto next_gap_ack; >> >> } >> >> } >> >> - if (released || retransmitted) >> >> - tipc_link_update_cwin(l, released, retransmitted); >> 
>> - if (released) >> >> - tipc_link_advance_backlog(l, xmitq); >> >> - return 0; >> >> + >> >> + /* Renew last Gap ACK blocks for bc if needed */ >> >> + if (bc_has_acked) { >> >> + if (this_ga) { >> >> + kfree(last_ga); >> >> + r->last_ga = this_ga; >> >> + r->last_gap = gap; >> >> + } else if (last_ga) { >> >> + if (less(acked, start)) { >> >> + si--; >> >> + offset = start - acked - 1; >> >> + } else if (less(acked, end)) { >> >> + acked = end; >> >> + } >> >> + if (si < last_ga->bgack_cnt) { >> >> + last_ga->start_index = si; >> >> + r->last_gap = offset; >> >> + } else { >> >> + kfree(last_ga); >> >> + r->last_ga = NULL; >> >> + r->last_gap = 0; >> >> + } >> >> + } else { >> >> + r->last_gap = 0; >> >> + } >> >> + r->acked = acked; >> >> + } else { >> >> + kfree(this_ga); >> >> + } >> >> + return skb_queue_len(&l->transmq) - qlen; >> >> } >> >> >> /* tipc_link_build_state_msg: prepare link state message for >> transmission >> >> @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, >> struct sk_buff *skb, >> >> kfree_skb(skb); >> >> break; >> >> } >> >> - released += tipc_link_release_pkts(l, msg_ack(hdr)); >> >> + released += tipc_link_advance_transmq(l, l, >> msg_ack(hdr), 0, >> >> + NULL, NULL, NULL, >> NULL); >> >> >> /* Defer delivery if sequence gap */ >> >> if (unlikely(seqno != rcv_nxt)) { >> >> @@ -1739,7 +1849,7 @@ static void >> tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, >> >> msg_set_probe(hdr, probe); >> >> msg_set_is_keepalive(hdr, probe || probe_reply); >> >> if (l->peer_caps & TIPC_GAP_ACK_BLOCK) >> >> - glen = tipc_build_gap_ack_blks(l, data, rcvgap); >> >> + glen = tipc_build_gap_ack_blks(l, hdr); >> >> tipc_mon_prep(l->net, data + glen, &dlen, mstate, >> l->bearer_id); >> >> msg_set_size(hdr, INT_H_SIZE + glen + dlen); >> >> skb_trim(skb, INT_H_SIZE + glen + dlen); >> >> @@ -2027,20 +2137,19 @@ static int tipc_link_proto_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> { >> >> struct tipc_msg *hdr = 
buf_msg(skb); >> >> struct tipc_gap_ack_blks *ga = NULL; >> >> - u16 rcvgap = 0; >> >> - u16 ack = msg_ack(hdr); >> >> - u16 gap = msg_seq_gap(hdr); >> >> + bool reply = msg_probe(hdr), retransmitted = false; >> >> + u16 dlen = msg_data_sz(hdr), glen = 0; >> >> u16 peers_snd_nxt = msg_next_sent(hdr); >> >> u16 peers_tol = msg_link_tolerance(hdr); >> >> u16 peers_prio = msg_linkprio(hdr); >> >> + u16 gap = msg_seq_gap(hdr); >> >> + u16 ack = msg_ack(hdr); >> >> u16 rcv_nxt = l->rcv_nxt; >> >> - u16 dlen = msg_data_sz(hdr); >> >> + u16 rcvgap = 0; >> >> int mtyp = msg_type(hdr); >> >> - bool reply = msg_probe(hdr); >> >> - u16 glen = 0; >> >> - void *data; >> >> + int rc = 0, released; >> >> char *if_name; >> >> - int rc = 0; >> >> + void *data; >> >> >> trace_tipc_proto_rcv(skb, false, l->name); >> >> if (tipc_link_is_blocked(l) || !xmitq) >> >> @@ -2137,13 +2246,7 @@ static int tipc_link_proto_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> } >> >> >> /* Receive Gap ACK blocks from peer if any */ >> >> - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { >> >> - ga = (struct tipc_gap_ack_blks *)data; >> >> - glen = ntohs(ga->len); >> >> - /* sanity check: if failed, ignore Gap ACK >> blocks */ >> >> - if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt)) >> >> - ga = NULL; >> >> - } >> >> + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); >> >> >> tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, >> >> &l->mon_state, l->bearer_id); >> >> @@ -2158,9 +2261,14 @@ static int tipc_link_proto_rcv(struct >> tipc_link *l, struct sk_buff *skb, >> >> tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, >> >> rcvgap, 0, 0, xmitq); >> >> >> - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); >> >> + released = tipc_link_advance_transmq(l, l, ack, gap, ga, >> xmitq, >> >> + &retransmitted, &rc); >> >> if (gap) >> >> l->stats.recv_nacks++; >> >> + if (released || retransmitted) >> >> + tipc_link_update_cwin(l, released, retransmitted); >> >> + if (released) >> >> + 
tipc_link_advance_backlog(l, xmitq); >> >> if (unlikely(!skb_queue_empty(&l->wakeupq))) >> >> link_prepare_wakeup(l); >> >> } >> >> @@ -2246,10 +2354,7 @@ void tipc_link_bc_init_rcv(struct >> tipc_link *l, struct tipc_msg *hdr) >> >> int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg >> *hdr, >> >> struct sk_buff_head *xmitq) >> >> { >> >> - struct tipc_link *snd_l = l->bc_sndlink; >> >> u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); >> >> - u16 from = msg_bcast_ack(hdr) + 1; >> >> - u16 to = from + msg_bc_gap(hdr) - 1; >> >> int rc = 0; >> >> >> if (!link_is_up(l)) >> >> @@ -2271,8 +2376,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link >> *l, struct tipc_msg *hdr, >> >> if (more(peers_snd_nxt, l->rcv_nxt + l->window)) >> >> return rc; >> >> >> - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); >> >> - >> >> l->snd_nxt = peers_snd_nxt; >> >> if (link_bc_rcv_gap(l)) >> >> rc |= TIPC_LINK_SND_STATE; >> >> @@ -2307,38 +2410,28 @@ int tipc_link_bc_sync_rcv(struct >> tipc_link *l, struct tipc_msg *hdr, >> >> return 0; >> >> } >> >> >> -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, >> >> - struct sk_buff_head *xmitq) >> >> +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, >> >> + struct tipc_gap_ack_blks *ga, >> >> + struct sk_buff_head *xmitq) >> >> { >> >> - struct sk_buff *skb, *tmp; >> >> - struct tipc_link *snd_l = l->bc_sndlink; >> >> + struct tipc_link *l = r->bc_sndlink; >> >> + bool unused = false; >> >> + int rc = 0; >> >> >> - if (!link_is_up(l) || !l->bc_peer_is_up) >> >> - return; >> >> + if (!link_is_up(r) || !r->bc_peer_is_up) >> >> + return 0; >> >> >> - if (!more(acked, l->acked)) >> >> - return; >> >> + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) >> >> + return 0; >> >> >> - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); >> >> - /* Skip over packets peer has already acked */ >> >> - skb_queue_walk(&snd_l->transmq, skb) { >> >> - if (more(buf_seqno(skb), l->acked)) >> >> - break; 
>> >> - } >> >> + trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); >> >> + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, >> &rc); >> >> >> - /* Update/release the packets peer is acking now */ >> >> - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { >> >> - if (more(buf_seqno(skb), acked)) >> >> - break; >> >> - if (!--TIPC_SKB_CB(skb)->ackers) { >> >> - __skb_unlink(skb, &snd_l->transmq); >> >> - kfree_skb(skb); >> >> - } >> >> - } >> >> - l->acked = acked; >> >> - tipc_link_advance_backlog(snd_l, xmitq); >> >> - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) >> >> - link_prepare_wakeup(snd_l); >> >> + tipc_link_advance_backlog(l, xmitq); >> >> + if (unlikely(!skb_queue_empty(&l->wakeupq))) >> >> + link_prepare_wakeup(l); >> >> + >> >> + return rc; >> >> } >> >> >> /* tipc_link_bc_nack_rcv(): receive broadcast nack message >> >> @@ -2366,8 +2459,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link >> *l, struct sk_buff *skb, >> >> return 0; >> >> >> if (dnode == tipc_own_addr(l->net)) { >> >> - tipc_link_bc_ack_rcv(l, acked, xmitq); >> >> - rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, >> xmitq); >> >> + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, >> xmitq); >> >> l->stats.recv_nacks++; >> >> return rc; >> >> } >> >> diff --git a/net/tipc/link.h b/net/tipc/link.h >> >> index d3c1c3fc1659..0a0fa7350722 100644 >> >> --- a/net/tipc/link.h >> >> +++ b/net/tipc/link.h >> >> @@ -143,8 +143,11 @@ int tipc_link_bc_peers(struct tipc_link *l); >> >> void tipc_link_set_mtu(struct tipc_link *l, int mtu); >> >> int tipc_link_mtu(struct tipc_link *l); >> >> int tipc_link_mss(struct tipc_link *l); >> >> -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, >> >> - struct sk_buff_head *xmitq); >> >> +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct >> tipc_link *l, >> >> + struct tipc_msg *hdr, bool uc); >> >> +int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, >> >> + struct tipc_gap_ack_blks *ga, 
>> >> + struct sk_buff_head *xmitq); >> >> void tipc_link_build_bc_sync_msg(struct tipc_link *l, >> >> struct sk_buff_head *xmitq); >> >> void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg >> *hdr); >> >> diff --git a/net/tipc/msg.h b/net/tipc/msg.h >> >> index 6d466ebdb64f..9a38f9c9d6eb 100644 >> >> --- a/net/tipc/msg.h >> >> +++ b/net/tipc/msg.h >> >> @@ -160,20 +160,26 @@ struct tipc_gap_ack { >> >> >> /* struct tipc_gap_ack_blks >> >> * @len: actual length of the record >> >> - * @gack_cnt: number of Gap ACK blocks in the record >> >> + * @bgack_cnt: number of Gap ACK blocks for broadcast in the record >> >> + * @ugack_cnt: number of Gap ACK blocks for unicast (following >> the broadcast >> >> + * ones) >> >> + * @start_index: starting index for "valid" broadcast Gap ACK >> blocks >> >> * @gacks: array of Gap ACK blocks >> >> */ >> >> struct tipc_gap_ack_blks { >> >> __be16 len; >> >> - u8 gack_cnt; >> >> - u8 reserved; >> >> + union { >> >> + u8 ugack_cnt; >> >> + u8 start_index; >> >> + }; >> >> + u8 bgack_cnt; >> >> struct tipc_gap_ack gacks[]; >> >> }; >> >> >> #define tipc_gap_ack_blks_sz(n) (sizeof(struct >> tipc_gap_ack_blks) + \ >> >> sizeof(struct tipc_gap_ack) * (n)) >> >> >> -#define MAX_GAP_ACK_BLKS 32 >> >> +#define MAX_GAP_ACK_BLKS 128 >> >> #define MAX_GAP_ACK_BLKS_SZ >> tipc_gap_ack_blks_sz(MAX_GAP_ACK_BLKS) >> >> >> static inline struct tipc_msg *buf_msg(struct sk_buff *skb) >> >> diff --git a/net/tipc/node.c b/net/tipc/node.c >> >> index 0c88778c88b5..eb6b62de81a7 100644 >> >> --- a/net/tipc/node.c >> >> +++ b/net/tipc/node.c >> >> @@ -2069,10 +2069,16 @@ void tipc_rcv(struct net *net, struct >> sk_buff *skb, struct tipc_bearer *b) >> >> le = &n->links[bearer_id]; >> >> >> /* Ensure broadcast reception is in synch with peer's send >> state */ >> >> - if (unlikely(usr == LINK_PROTOCOL)) >> >> + if (unlikely(usr == LINK_PROTOCOL)) { >> >> + if (unlikely(skb_linearize(skb))) { >> >> + tipc_node_put(n); >> >> + goto discard; >> >>
+ } >> >> + hdr = buf_msg(skb); >> >> tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq); >> >> - else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) >> >> + } else if (unlikely(tipc_link_acked(n->bc_entry.link) != >> bc_ack)) { >> >> tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr); >> >> + } >> >> >> /* Receive packet directly if conditions permit */ >> >> tipc_node_read_lock(n); >> >> -- >> > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > |
From: Jon M. <jm...@re...> - 2020-03-16 18:39:08
On 3/16/20 7:23 AM, Tuong Lien Tong wrote: > > Hi Jon, > > I don’t think that is because of retransmissions… also, the ‘time’ > command only measures the program’s execution time, not the whole > system or kernel (nor retransmissions done by the kernel). I have repeated the > experiments and collected some statistics, so we can see things more > clearly: > > With the patch: > > > > Without the patch: > > # time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > > real *0m 52.97s* > > user 0m 1.12s > > sys 0m 17.35s > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:836930 fragments:0/0 bundles:98095/761165 > > RX naks:0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:43874 > > Congestion link:296 Send queue max:0 avg:0 > > > > # time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > > real *5m 21.75s* > > user 0m 0.67s > > sys 0m 2.13s > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:218031 fragments:0/0 bundles:184629/1466598 > > RX naks:0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:13235 > > Congestion link:1923 Send queue max:0 avg:0 > > Yes, we had many more retransmissions in absolute terms, but the ratio was > actually better: 5.24% vs. 6.07%! In fact, we had fewer bundles due to less > congestion… > I see. The much lower occurrence of bundling explains it. The CPU has to drive more than half of the messages individually down through the stack, which of course takes many more cycles. 
> Testing with a large message data size, we can see the difference more > accurately: > > With the patch: > > > > Without the patch: > > time tipc-pipe --mc --rdm --data_size 1000 --data_num 1500000 > > real *1m 6.81s* > > user 0m 3.03s > > sys 0m 37.22s > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:1500000 fragments:0/0 bundles:0/0 > > RX naks:0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:*79505* //~ 5.30% > > Congestion link:249 Send queue max:0 avg:0 > > > > # time tipc-pipe --mc --rdm --data_size 1000 --data_num 1500000 > > ^CCommand terminated by signal 2 //I terminated it... > > real *35m 53.35s* > > user 0m 1.41s > > sys 0m 8.36s > > # tipc l st sh > > Link <broadcast-link> > > Window:50 packets > > RX packets:0 fragments:0/0 bundles:0/0 > > TX packets:1364457 fragments:0/0 bundles:0/0 > > RX naks:0 defs:0 dups:0 > > TX naks:0 acks:0 retrans:*90282* //~6.62% > > Congestion link:11348 Send queue max:0 avg:0 > > Actually, we can explain what you observed as follows: with the patch, the > ‘transmq’ is advanced faster, so sending messages is less likely to > run into link congestion or the send window limit. That means the program gets no > ‘relax’ time; it is directly involved in transmitting messages down to the > link and L2 layers, and that work is counted as ‘sys’ time. In contrast, > without the patch, the program’s ‘send()’ calls return quickly > because of link congestion, and it just has to wait and retry, but that is the > user part! Obviously, ‘time’ doesn’t cover everything here, > especially the kernel work needed to complete the whole process (which I > believe is also better with the patch…), but what matters in the end is still > the real time the user will notice. > The improvement shown here is truly impressive. However, you are only showing tipc-pipe with small messages. How does this look when you send full-size 66k messages? 
How does it scale when the number of destinations grows up to tens or even hundreds? I am particularly concerned that the use of unicast retransmission may become a sub-optimization if the number of destinations is large. ///jon > BR/Tuong > > *From:* Jon Maloy <jm...@re...> > *Sent:* Friday, March 13, 2020 10:47 PM > *To:* Tuong Lien <tuo...@de...>; > tip...@li...; ma...@do...; > yin...@wi... > *Subject:* Re: [PATCH RFC 1/2] tipc: add Gap ACK blocks support for > broadcast link > > On 3/13/20 6:47 AM, Tuong Lien wrote: > > As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput > > by Gap ACK blocks"), we apply the same mechanism for the broadcast link > > as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will > > consist of two parts built for both the broadcast and unicast types: > > 31 16 15 0 > > +-------------+-------------+-------------+-------------+ > > | bgack_cnt | ugack_cnt | len | > > +-------------+-------------+-------------+-------------+ - > > | gap | ack | | > > +-------------+-------------+-------------+-------------+ > bc gacks > > : : : | > > +-------------+-------------+-------------+-------------+ - > > | gap | ack | | > > +-------------+-------------+-------------+-------------+ > uc gacks > > : : : | > > +-------------+-------------+-------------+-------------+ - > > which is "automatically" backward-compatible. > > We also increase the max number of Gap ACK blocks to 128, allowing upto > > 64 blocks per type (total buffer size = 516 bytes). > > Besides, the 'tipc_link_advance_transmq()' function is refactored which > > is applicable for both the unicast and broadcast cases now, so some old > > functions can be removed and the code is optimized. > > With the patch, TIPC broadcast is more robust regardless of packet loss > > or disorder, latency, ... in the underlying network. Its performance is > > boost up significantly. 
> > For example, experiment with a 5% packet loss rate results: > > $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > > real 0m 42.46s > > user 0m 1.16s > > sys 0m 17.67s > > Without the patch: > > $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > > real 5m 28.80s > > user 0m 0.85s > > sys 0m 3.62s > > Can you explain this? To me it seems like the elapsed time is reduced > with a factor 328.8/42.46=7.7, while we are consuming significantly > more CPU to achieve this. Doesn't that mean that we have much more > retransmissions which are consuming CPU? Or is there some other > explanation? > > ///jon > > > Signed-off-by: Tuong Lien<tuo...@de...> <mailto:tuo...@de...> > > --- > > net/tipc/bcast.c | 9 +- > > net/tipc/link.c | 440 +++++++++++++++++++++++++++++++++---------------------- > > net/tipc/link.h | 7 +- > > net/tipc/msg.h | 14 +- > > net/tipc/node.c | 10 +- > > 5 files changed, 295 insertions(+), 185 deletions(-) > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > > index 4c20be08b9c4..3ce690a96ee9 100644 > > --- a/net/tipc/bcast.c > > +++ b/net/tipc/bcast.c > > @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, > > __skb_queue_head_init(&xmitq); > > > > tipc_bcast_lock(net); > > - tipc_link_bc_ack_rcv(l, acked, &xmitq); > > + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); > > tipc_bcast_unlock(net); > > > > tipc_bcbase_xmit(net, &xmitq); > > @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, > > struct tipc_msg *hdr) > > { > > struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; > > + struct tipc_gap_ack_blks *ga; > > struct sk_buff_head xmitq; > > int rc = 0; > > > > @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, > > if (msg_type(hdr) != STATE_MSG) { > > tipc_link_bc_init_rcv(l, hdr); > > } else if (!msg_bc_ack_invalid(hdr)) { > > - tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq); > > - rc = tipc_link_bc_sync_rcv(l, hdr, 
&xmitq); > > + tipc_get_gap_ack_blks(&ga, l, hdr, false); > > + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), > > + msg_bc_gap(hdr), ga, &xmitq); > > + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); > > } > > tipc_bcast_unlock(net); > > > > diff --git a/net/tipc/link.c b/net/tipc/link.c > > index 467c53a1fb5c..6198b6d89a69 100644 > > --- a/net/tipc/link.c > > +++ b/net/tipc/link.c > > @@ -188,6 +188,8 @@ struct tipc_link { > > /* Broadcast */ > > u16 ackers; > > u16 acked; > > + u16 last_gap; > > + struct tipc_gap_ack_blks *last_ga; > > struct tipc_link *bc_rcvlink; > > struct tipc_link *bc_sndlink; > > u8 nack_state; > > @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct tipc_link *l, > > struct sk_buff_head *xmitq); > > static void tipc_link_build_bc_init_msg(struct tipc_link *l, > > struct sk_buff_head *xmitq); > > -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); > > -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap); > > -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > > +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, > > + struct tipc_link *l, u8 start_index); > > +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr); > > +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, > > + u16 acked, u16 gap, > > struct tipc_gap_ack_blks *ga, > > - struct sk_buff_head *xmitq); > > + struct sk_buff_head *xmitq, > > + bool *retransmitted, int *rc); > > static void tipc_link_update_cwin(struct tipc_link *l, int released, > > bool retransmitted); > > /* > > @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l, > > snd_l->ackers--; > > rcv_l->bc_peer_is_up = true; > > rcv_l->state = LINK_ESTABLISHED; > > - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); > > + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); > > trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); > > tipc_link_reset(rcv_l); 
> > rcv_l->state = LINK_RESET; > > @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) > > return (l->silent_intv_cnt + 2 > l->abort_limit); > > } > > > > -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, > > - u16 from, u16 to, struct sk_buff_head *xmitq); > > /* tipc_link_timeout - perform periodic task as instructed from node timeout > > */ > > int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq) > > @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) > > l->snd_nxt_state = 1; > > l->rcv_nxt_state = 1; > > l->acked = 0; > > + l->last_gap = 0; > > + kfree(l->last_ga); > > + l->last_ga = NULL; > > l->silent_intv_cnt = 0; > > l->rst_cnt = 0; > > l->bc_peer_is_up = false; > > @@ -1183,68 +1189,14 @@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, > > > > if (link_is_bc_sndlink(l)) { > > r->state = LINK_RESET; > > - *rc = TIPC_LINK_DOWN_EVT; > > + *rc |= TIPC_LINK_DOWN_EVT; > > } else { > > - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); > > + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); > > } > > > > return true; > > } > > > > -/* tipc_link_bc_retrans() - retransmit zero or more packets > > - * @l: the link to transmit on > > - * @r: the receiving link ordering the retransmit. 
Same as l if unicast > > - * @from: retransmit from (inclusive) this sequence number > > - * @to: retransmit to (inclusive) this sequence number > > - * xmitq: queue for accumulating the retransmitted packets > > - */ > > -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, > > - u16 from, u16 to, struct sk_buff_head *xmitq) > > -{ > > - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); > > - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > > - u16 ack = l->rcv_nxt - 1; > > - int retransmitted = 0; > > - struct tipc_msg *hdr; > > - int rc = 0; > > - > > - if (!skb) > > - return 0; > > - if (less(to, from)) > > - return 0; > > - > > - trace_tipc_link_retrans(r, from, to, &l->transmq); > > - > > - if (link_retransmit_failure(l, r, &rc)) > > - return rc; > > - > > - skb_queue_walk(&l->transmq, skb) { > > - hdr = buf_msg(skb); > > - if (less(msg_seqno(hdr), from)) > > - continue; > > - if (more(msg_seqno(hdr), to)) > > - break; > > - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) > > - continue; > > - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; > > - _skb = pskb_copy(skb, GFP_ATOMIC); > > - if (!_skb) > > - return 0; > > - hdr = buf_msg(_skb); > > - msg_set_ack(hdr, ack); > > - msg_set_bcast_ack(hdr, bc_ack); > > - _skb->priority = TC_PRIO_CONTROL; > > - __skb_queue_tail(xmitq, _skb); > > - l->stats.retransmitted++; > > - retransmitted++; > > - /* Increase actual retrans counter & mark first time */ > > - if (!TIPC_SKB_CB(skb)->retr_cnt++) > > - TIPC_SKB_CB(skb)->retr_stamp = jiffies; > > - } > > - tipc_link_update_cwin(l, 0, retransmitted); > > - return 0; > > -} > > - > > /* tipc_data_input - deliver data and name distr msgs to upper layer > > * > > * Consumes buffer if message is of right type > > @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb, > > return rc; > > } > > > > -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) > > -{ > > - int released = 0; > > - struct sk_buff 
*skb, *tmp; > > - > > - skb_queue_walk_safe(&l->transmq, skb, tmp) { > > - if (more(buf_seqno(skb), acked)) > > - break; > > - __skb_unlink(skb, &l->transmq); > > - kfree_skb(skb); > > - released++; > > +/** > > + * tipc_get_gap_ack_blks - get Gap ACK blocks from PROTOCOL/STATE_MSG > > + * @ga: returned pointer to the Gap ACK blocks if any > > + * @l: the tipc link > > + * @hdr: the PROTOCOL/STATE_MSG header > > + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or broadcast (= 0) > > + * > > + * Return: the total Gap ACK blocks size > > + */ > > +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, > > + struct tipc_msg *hdr, bool uc) > > +{ > > + struct tipc_gap_ack_blks *p; > > + u16 sz = 0; > > + > > + /* Does peer support the Gap ACK blocks feature? */ > > + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { > > + p = (struct tipc_gap_ack_blks *)msg_data(hdr); > > + sz = ntohs(p->len); > > + /* Sanity check */ > > + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + p->bgack_cnt)) { > > + /* Good, check if the desired type exists */ > > + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) > > + goto ok; > > + /* Backward compatible: peer might not support bc, but uc? */ > > + } else if (uc && sz == tipc_gap_ack_blks_sz(p->ugack_cnt)) { > > + if (p->ugack_cnt) { > > + p->bgack_cnt = 0; > > + goto ok; > > + } > > + } > > } > > - return released; > > + /* Other cases: ignore! 
*/ > > + p = NULL; > > + > > +ok: > > + *ga = p; > > + return sz; > > } > > > > -/* tipc_build_gap_ack_blks - build Gap ACK blocks > > - * @l: tipc link that data have come with gaps in sequence if any > > - * @data: data buffer to store the Gap ACK blocks after built > > - * > > - * returns the actual allocated memory size > > - */ > > -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > > +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, > > + struct tipc_link *l, u8 start_index) > > { > > + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; > > struct sk_buff *skb = skb_peek(&l->deferdq); > > - struct tipc_gap_ack_blks *ga = data; > > - u16 len, expect, seqno = 0; > > + u16 expect, seqno = 0; > > u8 n = 0; > > > > - if (!skb || !gap) > > - goto exit; > > + if (!skb) > > + return 0; > > > > expect = buf_seqno(skb); > > skb_queue_walk(&l->deferdq, skb) { > > seqno = buf_seqno(skb); > > if (unlikely(more(seqno, expect))) { > > - ga->gacks[n].ack = htons(expect - 1); > > - ga->gacks[n].gap = htons(seqno - expect); > > - if (++n >= MAX_GAP_ACK_BLKS) { > > - pr_info_ratelimited("Too few Gap ACK blocks!\n"); > > - goto exit; > > + gacks[n].ack = htons(expect - 1); > > + gacks[n].gap = htons(seqno - expect); > > + if (++n >= MAX_GAP_ACK_BLKS / 2) { > > + char buf[TIPC_MAX_LINK_NAME]; > > + > > + pr_info_ratelimited("Gacks on %s: %d, ql: %d!\n", > > + tipc_link_name_ext(l, buf), > > + n, > > + skb_queue_len(&l->deferdq)); > > + return n; > > } > > } else if (unlikely(less(seqno, expect))) { > > pr_warn("Unexpected skb in deferdq!\n"); > > @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > > } > > > > /* last block */ > > - ga->gacks[n].ack = htons(seqno); > > - ga->gacks[n].gap = 0; > > + gacks[n].ack = htons(seqno); > > + gacks[n].gap = 0; > > n++; > > + return n; > > +} > > > > -exit: > > - len = tipc_gap_ack_blks_sz(n); > > +/* tipc_build_gap_ack_blks - build Gap ACK 
blocks > > + * @l: tipc unicast link > > + * @hdr: the tipc message buffer to store the Gap ACK blocks after built > > + * > > + * The function builds Gap ACK blocks for both the unicast & broadcast receiver > > + * links of a certain peer, the buffer after built has the network data format > > + * as follows: > > + * 31 16 15 0 > > + * +-------------+-------------+-------------+-------------+ > > + * | bgack_cnt | ugack_cnt | len | > > + * +-------------+-------------+-------------+-------------+ - > > + * | gap | ack | | > > + * +-------------+-------------+-------------+-------------+ > bc gacks > > + * : : : | > > + * +-------------+-------------+-------------+-------------+ - > > + * | gap | ack | | > > + * +-------------+-------------+-------------+-------------+ > uc gacks > > + * : : : | > > + * +-------------+-------------+-------------+-------------+ - > > + * (See struct tipc_gap_ack_blks) > > + * > > + * returns the actual allocated memory size > > + */ > > +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr) > > +{ > > + struct tipc_link *bcl = l->bc_rcvlink; > > + struct tipc_gap_ack_blks *ga; > > + u16 len; > > + > > + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); > > + > > + /* Start with broadcast link first */ > > + tipc_bcast_lock(bcl->net); > > + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); > > + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); > > + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); > > + tipc_bcast_unlock(bcl->net); > > + > > + /* Now for unicast link, but an explicit NACK only (???) */ > > + ga->ugack_cnt = (msg_seq_gap(hdr)) ? 
> > + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) : 0; > > + > > + /* Total len */ > > + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); > > ga->len = htons(len); > > - ga->gack_cnt = n; > > return len; > > } > > > > @@ -1466,47 +1486,111 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > > * acked packets, also doing retransmissions if > > * gaps found > > * @l: tipc link with transmq queue to be advanced > > + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if unicast) > > * @acked: seqno of last packet acked by peer without any gaps before > > * @gap: # of gap packets > > * @ga: buffer pointer to Gap ACK blocks from peer > > * @xmitq: queue for accumulating the retransmitted packets if any > > + * @retransmitted: returned boolean value if a retransmission is really issued > > + * @rc: returned code e.g. TIPC_LINK_DOWN_EVT if a repeated retransmit failures > > + * happens (- unlikely case) > > * > > - * In case of a repeated retransmit failures, the call will return shortly > > - * with a returned code (e.g. 
TIPC_LINK_DOWN_EVT) > > + * Return: the number of packets released from the link transmq > > */ > > -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > > +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, > > + u16 acked, u16 gap, > > struct tipc_gap_ack_blks *ga, > > - struct sk_buff_head *xmitq) > > + struct sk_buff_head *xmitq, > > + bool *retransmitted, int *rc) > > { > > + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; > > + struct tipc_gap_ack *gacks = NULL; > > struct sk_buff *skb, *_skb, *tmp; > > struct tipc_msg *hdr; > > + u32 qlen = skb_queue_len(&l->transmq); > > + u16 nacked = acked, ngap = gap, gack_cnt = 0; > > u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > > - bool retransmitted = false; > > u16 ack = l->rcv_nxt - 1; > > - bool passed = false; > > - u16 released = 0; > > u16 seqno, n = 0; > > - int rc = 0; > > + u16 end = r->acked, start = end, offset = r->last_gap; > > + u16 si = (last_ga) ? last_ga->start_index : 0; > > + bool is_uc = !link_is_bc_sndlink(l); > > + bool bc_has_acked = false; > > + > > + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); > > + > > + /* Determine Gap ACK blocks if any for the particular link */ > > + if (ga && is_uc) { > > + /* Get the Gap ACKs, uc part */ > > + gack_cnt = ga->ugack_cnt; > > + gacks = &ga->gacks[ga->bgack_cnt]; > > + } else if (ga) { > > + /* Copy the Gap ACKs, bc part, for later renewal if needed */ > > + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), > > + GFP_ATOMIC); > > + if (likely(this_ga)) { > > + this_ga->start_index = 0; > > + /* Start with the bc Gap ACKs */ > > + gack_cnt = this_ga->bgack_cnt; > > + gacks = &this_ga->gacks[0]; > > + } else { > > + /* Hmm, we can get in trouble..., simply ignore it */ > > + pr_warn_ratelimited("Ignoring bc Gap ACKs, no memory\n"); > > + } > > + } > > > > + /* Advance the link transmq */ > > skb_queue_walk_safe(&l->transmq, skb, tmp) { > > seqno = 
buf_seqno(skb); > > > > next_gap_ack: > > - if (less_eq(seqno, acked)) { > > + if (less_eq(seqno, nacked)) { > > + if (is_uc) > > + goto release; > > + /* Skip packets peer has already acked */ > > + if (!more(seqno, r->acked)) > > + continue; > > + /* Get the next of last Gap ACK blocks */ > > + while (more(seqno, end)) { > > + if (!last_ga || si >= last_ga->bgack_cnt) > > + break; > > + start = end + offset + 1; > > + end = ntohs(last_ga->gacks[si].ack); > > + offset = ntohs(last_ga->gacks[si].gap); > > + si++; > > + WARN_ONCE(more(start, end) || > > + (!offset && > > + si < last_ga->bgack_cnt) || > > + si > MAX_GAP_ACK_BLKS, > > + "Corrupted Gap ACK: %d %d %d %d %d\n", > > + start, end, offset, si, > > + last_ga->bgack_cnt); > > + } > > + /* Check against the last Gap ACK block */ > > + if (in_range(seqno, start, end)) > > + continue; > > + /* Update/release the packet peer is acking */ > > + bc_has_acked = true; > > + if (--TIPC_SKB_CB(skb)->ackers) > > + continue; > > +release: > > /* release skb */ > > __skb_unlink(skb, &l->transmq); > > kfree_skb(skb); > > - released++; > > - } else if (less_eq(seqno, acked + gap)) { > > - /* First, check if repeated retrans failures occurs? */ > > - if (!passed && link_retransmit_failure(l, l, &rc)) > > - return rc; > > - passed = true; > > - > > + } else if (less_eq(seqno, nacked + ngap)) { > > + /* First gap: check if repeated retrans failures? */ > > + if (unlikely(seqno == acked + 1 && > > + link_retransmit_failure(l, r, rc))) { > > + /* Ignore this bc Gap ACKs if any */ > > + kfree(this_ga); > > + this_ga = NULL; > > + break; > > + } > > /* retransmit skb if unrestricted*/ > > if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) > > continue; > > - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; > > + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? 
> > + TIPC_UC_RETR_TIME : TIPC_BC_RETR_LIM; > > _skb = pskb_copy(skb, GFP_ATOMIC); > > if (!_skb) > > continue; > > @@ -1516,25 +1600,50 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > > _skb->priority = TC_PRIO_CONTROL; > > __skb_queue_tail(xmitq, _skb); > > l->stats.retransmitted++; > > - retransmitted = true; > > + *retransmitted = true; > > /* Increase actual retrans counter & mark first time */ > > if (!TIPC_SKB_CB(skb)->retr_cnt++) > > TIPC_SKB_CB(skb)->retr_stamp = jiffies; > > } else { > > /* retry with Gap ACK blocks if any */ > > - if (!ga || n >= ga->gack_cnt) > > + if (n >= gack_cnt) > > break; > > - acked = ntohs(ga->gacks[n].ack); > > - gap = ntohs(ga->gacks[n].gap); > > + nacked = ntohs(gacks[n].ack); > > + ngap = ntohs(gacks[n].gap); > > n++; > > goto next_gap_ack; > > } > > } > > - if (released || retransmitted) > > - tipc_link_update_cwin(l, released, retransmitted); > > - if (released) > > - tipc_link_advance_backlog(l, xmitq); > > - return 0; > > + > > + /* Renew last Gap ACK blocks for bc if needed */ > > + if (bc_has_acked) { > > + if (this_ga) { > > + kfree(last_ga); > > + r->last_ga = this_ga; > > + r->last_gap = gap; > > + } else if (last_ga) { > > + if (less(acked, start)) { > > + si--; > > + offset = start - acked - 1; > > + } else if (less(acked, end)) { > > + acked = end; > > + } > > + if (si < last_ga->bgack_cnt) { > > + last_ga->start_index = si; > > + r->last_gap = offset; > > + } else { > > + kfree(last_ga); > > + r->last_ga = NULL; > > + r->last_gap = 0; > > + } > > + } else { > > + r->last_gap = 0; > > + } > > + r->acked = acked; > > + } else { > > + kfree(this_ga); > > + } > > + return skb_queue_len(&l->transmq) - qlen; > > } > > > > /* tipc_link_build_state_msg: prepare link state message for transmission > > @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff *skb, > > kfree_skb(skb); > > break; > > } > > - released += tipc_link_release_pkts(l, msg_ack(hdr)); > > + 
released += tipc_link_advance_transmq(l, l, msg_ack(hdr), 0, > > + NULL, NULL, NULL, NULL); > > > > /* Defer delivery if sequence gap */ > > if (unlikely(seqno != rcv_nxt)) { > > @@ -1739,7 +1849,7 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, > > msg_set_probe(hdr, probe); > > msg_set_is_keepalive(hdr, probe || probe_reply); > > if (l->peer_caps & TIPC_GAP_ACK_BLOCK) > > - glen = tipc_build_gap_ack_blks(l, data, rcvgap); > > + glen = tipc_build_gap_ack_blks(l, hdr); > > tipc_mon_prep(l->net, data + glen, &dlen, mstate, l->bearer_id); > > msg_set_size(hdr, INT_H_SIZE + glen + dlen); > > skb_trim(skb, INT_H_SIZE + glen + dlen); > > @@ -2027,20 +2137,19 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, > > { > > struct tipc_msg *hdr = buf_msg(skb); > > struct tipc_gap_ack_blks *ga = NULL; > > - u16 rcvgap = 0; > > - u16 ack = msg_ack(hdr); > > - u16 gap = msg_seq_gap(hdr); > > + bool reply = msg_probe(hdr), retransmitted = false; > > + u16 dlen = msg_data_sz(hdr), glen = 0; > > u16 peers_snd_nxt = msg_next_sent(hdr); > > u16 peers_tol = msg_link_tolerance(hdr); > > u16 peers_prio = msg_linkprio(hdr); > > + u16 gap = msg_seq_gap(hdr); > > + u16 ack = msg_ack(hdr); > > u16 rcv_nxt = l->rcv_nxt; > > - u16 dlen = msg_data_sz(hdr); > > + u16 rcvgap = 0; > > int mtyp = msg_type(hdr); > > - bool reply = msg_probe(hdr); > > - u16 glen = 0; > > - void *data; > > + int rc = 0, released; > > char *if_name; > > - int rc = 0; > > + void *data; > > > > trace_tipc_proto_rcv(skb, false, l->name); > > if (tipc_link_is_blocked(l) || !xmitq) > > @@ -2137,13 +2246,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, > > } > > > > /* Receive Gap ACK blocks from peer if any */ > > - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { > > - ga = (struct tipc_gap_ack_blks *)data; > > - glen = ntohs(ga->len); > > - /* sanity check: if failed, ignore Gap ACK blocks */ > > - if (glen != 
tipc_gap_ack_blks_sz(ga->gack_cnt)) > > - ga = NULL; > > - } > > + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); > > > > tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, > > &l->mon_state, l->bearer_id); > > @@ -2158,9 +2261,14 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, > > tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, > > rcvgap, 0, 0, xmitq); > > > > - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); > > + released = tipc_link_advance_transmq(l, l, ack, gap, ga, xmitq, > > + &retransmitted, &rc); > > if (gap) > > l->stats.recv_nacks++; > > + if (released || retransmitted) > > + tipc_link_update_cwin(l, released, retransmitted); > > + if (released) > > + tipc_link_advance_backlog(l, xmitq); > > if (unlikely(!skb_queue_empty(&l->wakeupq))) > > link_prepare_wakeup(l); > > } > > @@ -2246,10 +2354,7 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr) > > int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > > struct sk_buff_head *xmitq) > > { > > - struct tipc_link *snd_l = l->bc_sndlink; > > u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); > > - u16 from = msg_bcast_ack(hdr) + 1; > > - u16 to = from + msg_bc_gap(hdr) - 1; > > int rc = 0; > > > > if (!link_is_up(l)) > > @@ -2271,8 +2376,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > > if (more(peers_snd_nxt, l->rcv_nxt + l->window)) > > return rc; > > > > - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); > > - > > l->snd_nxt = peers_snd_nxt; > > if (link_bc_rcv_gap(l)) > > rc |= TIPC_LINK_SND_STATE; > > @@ -2307,38 +2410,28 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > > return 0; > > } > > > > -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, > > - struct sk_buff_head *xmitq) > > +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, > > + struct tipc_gap_ack_blks *ga, > > + struct sk_buff_head *xmitq) > > { > > - struct sk_buff *skb, *tmp; > > - 
struct tipc_link *snd_l = l->bc_sndlink; > > + struct tipc_link *l = r->bc_sndlink; > > + bool unused = false; > > + int rc = 0; > > > > - if (!link_is_up(l) || !l->bc_peer_is_up) > > - return; > > + if (!link_is_up(r) || !r->bc_peer_is_up) > > + return 0; > > > > - if (!more(acked, l->acked)) > > - return; > > + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) > > + return 0; > > > > - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); > > - /* Skip over packets peer has already acked */ > > - skb_queue_walk(&snd_l->transmq, skb) { > > - if (more(buf_seqno(skb), l->acked)) > > - break; > > - } > > + trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); > > + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, &rc); > > > > - /* Update/release the packets peer is acking now */ > > - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { > > - if (more(buf_seqno(skb), acked)) > > - break; > > - if (!--TIPC_SKB_CB(skb)->ackers) { > > - __skb_unlink(skb, &snd_l->transmq); > > - kfree_skb(skb); > > - } > > - } > > - l->acked = acked; > > - tipc_link_advance_backlog(snd_l, xmitq); > > - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) > > - link_prepare_wakeup(snd_l); > > + tipc_link_advance_backlog(l, xmitq); > > + if (unlikely(!skb_queue_empty(&l->wakeupq))) > > + link_prepare_wakeup(l); > > + > > + return rc; > > } > > > > /* tipc_link_bc_nack_rcv(): receive broadcast nack message > > @@ -2366,8 +2459,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, > > return 0; > > > > if (dnode == tipc_own_addr(l->net)) { > > - tipc_link_bc_ack_rcv(l, acked, xmitq); > > - rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, xmitq); > > + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq); > > l->stats.recv_nacks++; > > return rc; > > } > > diff --git a/net/tipc/link.h b/net/tipc/link.h > > index d3c1c3fc1659..0a0fa7350722 100644 > > --- a/net/tipc/link.h > > +++ b/net/tipc/link.h > > @@ -143,8 +143,11 @@ 
int tipc_link_bc_peers(struct tipc_link *l); > > void tipc_link_set_mtu(struct tipc_link *l, int mtu); > > int tipc_link_mtu(struct tipc_link *l); > > int tipc_link_mss(struct tipc_link *l); > > -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, > > - struct sk_buff_head *xmitq); > > +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, > > + struct tipc_msg *hdr, bool uc); > > +int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, > > + struct tipc_gap_ack_blks *ga, > > + struct sk_buff_head *xmitq); > > void tipc_link_build_bc_sync_msg(struct tipc_link *l, > > struct sk_buff_head *xmitq); > > void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr); > > diff --git a/net/tipc/msg.h b/net/tipc/msg.h > > index 6d466ebdb64f..9a38f9c9d6eb 100644 > > --- a/net/tipc/msg.h > > +++ b/net/tipc/msg.h > > @@ -160,20 +160,26 @@ struct tipc_gap_ack { > > > > /* struct tipc_gap_ack_blks > > * @len: actual length of the record > > - * @gack_cnt: number of Gap ACK blocks in the record > > + * @bgack_cnt: number of Gap ACK blocks for broadcast in the record > > + * @ugack_cnt: number of Gap ACK blocks for unicast (following the broadcast > > + * ones) > > + * @start_index: starting index for "valid" broadcast Gap ACK blocks > > * @gacks: array of Gap ACK blocks > > */ > > struct tipc_gap_ack_blks { > > __be16 len; > > - u8 gack_cnt; > > - u8 reserved; > > + union { > > + u8 ugack_cnt; > > + u8 start_index; > > + }; > > + u8 bgack_cnt; > > struct tipc_gap_ack gacks[]; > > }; > > > > #define tipc_gap_ack_blks_sz(n) (sizeof(struct tipc_gap_ack_blks) + \ > > sizeof(struct tipc_gap_ack) * (n)) > > > > -#define MAX_GAP_ACK_BLKS 32 > > +#define MAX_GAP_ACK_BLKS 128 > > #define MAX_GAP_ACK_BLKS_SZ tipc_gap_ack_blks_sz(MAX_GAP_ACK_BLKS) > > > > static inline struct tipc_msg *buf_msg(struct sk_buff *skb) > > diff --git a/net/tipc/node.c b/net/tipc/node.c > > index 0c88778c88b5..eb6b62de81a7 100644 > > --- a/net/tipc/node.c > > +++ 
b/net/tipc/node.c > > @@ -2069,10 +2069,16 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) > > le = &n->links[bearer_id]; > > > > /* Ensure broadcast reception is in synch with peer's send state */ > > - if (unlikely(usr == LINK_PROTOCOL)) > > + if (unlikely(usr == LINK_PROTOCOL)) { > > + if (unlikely(skb_linearize(skb))) { > > + tipc_node_put(n); > > + goto discard; > > + } > > + hdr = buf_msg(skb); > > tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq); > > - else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) > > + } else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) { > > tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr); > > + } > > > > /* Receive packet directly if conditions permit */ > > tipc_node_read_lock(n); > > -- > |
From: Jon M. <jm...@re...> - 2020-03-13 15:47:06
On 3/13/20 6:47 AM, Tuong Lien wrote: > As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput > by Gap ACK blocks"), we apply the same mechanism for the broadcast link > as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will > consist of two parts built for both the broadcast and unicast types: > > 31 16 15 0 > +-------------+-------------+-------------+-------------+ > | bgack_cnt | ugack_cnt | len | > +-------------+-------------+-------------+-------------+ - > | gap | ack | | > +-------------+-------------+-------------+-------------+ > bc gacks > : : : | > +-------------+-------------+-------------+-------------+ - > | gap | ack | | > +-------------+-------------+-------------+-------------+ > uc gacks > : : : | > +-------------+-------------+-------------+-------------+ - > > which is "automatically" backward-compatible. > > We also increase the max number of Gap ACK blocks to 128, allowing up to > 64 blocks per type (total buffer size = 516 bytes). > > Besides, the 'tipc_link_advance_transmq()' function is refactored so it > is applicable to both the unicast and broadcast cases now, so some old > functions can be removed and the code is optimized. > > With the patch, TIPC broadcast is more robust regardless of packet loss > or disorder, latency, ... in the underlying network. Its performance is > boosted significantly. > For example, an experiment with a 5% packet loss rate results in: > > $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > real 0m 42.46s > user 0m 1.16s > sys 0m 17.67s > > Without the patch: > > $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 > real 5m 28.80s > user 0m 0.85s > sys 0m 3.62s Can you explain this? To me it seems like the elapsed time is reduced by a factor of 328.8/42.46 = 7.7, while we are consuming significantly more CPU to achieve this. Doesn't that mean that we have many more retransmissions which are consuming CPU? Or is there some other explanation? 
///jon > > Signed-off-by: Tuong Lien <tuo...@de...> > --- > net/tipc/bcast.c | 9 +- > net/tipc/link.c | 440 +++++++++++++++++++++++++++++++++---------------------- > net/tipc/link.h | 7 +- > net/tipc/msg.h | 14 +- > net/tipc/node.c | 10 +- > 5 files changed, 295 insertions(+), 185 deletions(-) > > diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c > index 4c20be08b9c4..3ce690a96ee9 100644 > --- a/net/tipc/bcast.c > +++ b/net/tipc/bcast.c > @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, > __skb_queue_head_init(&xmitq); > > tipc_bcast_lock(net); > - tipc_link_bc_ack_rcv(l, acked, &xmitq); > + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); > tipc_bcast_unlock(net); > > tipc_bcbase_xmit(net, &xmitq); > @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, > struct tipc_msg *hdr) > { > struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; > + struct tipc_gap_ack_blks *ga; > struct sk_buff_head xmitq; > int rc = 0; > > @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, > if (msg_type(hdr) != STATE_MSG) { > tipc_link_bc_init_rcv(l, hdr); > } else if (!msg_bc_ack_invalid(hdr)) { > - tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq); > - rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq); > + tipc_get_gap_ack_blks(&ga, l, hdr, false); > + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), > + msg_bc_gap(hdr), ga, &xmitq); > + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); > } > tipc_bcast_unlock(net); > > diff --git a/net/tipc/link.c b/net/tipc/link.c > index 467c53a1fb5c..6198b6d89a69 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -188,6 +188,8 @@ struct tipc_link { > /* Broadcast */ > u16 ackers; > u16 acked; > + u16 last_gap; > + struct tipc_gap_ack_blks *last_ga; > struct tipc_link *bc_rcvlink; > struct tipc_link *bc_sndlink; > u8 nack_state; > @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct tipc_link *l, > struct sk_buff_head *xmitq); > static void 
tipc_link_build_bc_init_msg(struct tipc_link *l, > struct sk_buff_head *xmitq); > -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); > -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap); > -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, > + struct tipc_link *l, u8 start_index); > +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr); > +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, > + u16 acked, u16 gap, > struct tipc_gap_ack_blks *ga, > - struct sk_buff_head *xmitq); > + struct sk_buff_head *xmitq, > + bool *retransmitted, int *rc); > static void tipc_link_update_cwin(struct tipc_link *l, int released, > bool retransmitted); > /* > @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l, > snd_l->ackers--; > rcv_l->bc_peer_is_up = true; > rcv_l->state = LINK_ESTABLISHED; > - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); > + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); > trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); > tipc_link_reset(rcv_l); > rcv_l->state = LINK_RESET; > @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) > return (l->silent_intv_cnt + 2 > l->abort_limit); > } > > -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, > - u16 from, u16 to, struct sk_buff_head *xmitq); > /* tipc_link_timeout - perform periodic task as instructed from node timeout > */ > int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq) > @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) > l->snd_nxt_state = 1; > l->rcv_nxt_state = 1; > l->acked = 0; > + l->last_gap = 0; > + kfree(l->last_ga); > + l->last_ga = NULL; > l->silent_intv_cnt = 0; > l->rst_cnt = 0; > l->bc_peer_is_up = false; > @@ -1183,68 +1189,14 @@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link 
*r, > > if (link_is_bc_sndlink(l)) { > r->state = LINK_RESET; > - *rc = TIPC_LINK_DOWN_EVT; > + *rc |= TIPC_LINK_DOWN_EVT; > } else { > - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); > + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); > } > > return true; > } > > -/* tipc_link_bc_retrans() - retransmit zero or more packets > - * @l: the link to transmit on > - * @r: the receiving link ordering the retransmit. Same as l if unicast > - * @from: retransmit from (inclusive) this sequence number > - * @to: retransmit to (inclusive) this sequence number > - * xmitq: queue for accumulating the retransmitted packets > - */ > -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, > - u16 from, u16 to, struct sk_buff_head *xmitq) > -{ > - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); > - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > - u16 ack = l->rcv_nxt - 1; > - int retransmitted = 0; > - struct tipc_msg *hdr; > - int rc = 0; > - > - if (!skb) > - return 0; > - if (less(to, from)) > - return 0; > - > - trace_tipc_link_retrans(r, from, to, &l->transmq); > - > - if (link_retransmit_failure(l, r, &rc)) > - return rc; > - > - skb_queue_walk(&l->transmq, skb) { > - hdr = buf_msg(skb); > - if (less(msg_seqno(hdr), from)) > - continue; > - if (more(msg_seqno(hdr), to)) > - break; > - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) > - continue; > - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; > - _skb = pskb_copy(skb, GFP_ATOMIC); > - if (!_skb) > - return 0; > - hdr = buf_msg(_skb); > - msg_set_ack(hdr, ack); > - msg_set_bcast_ack(hdr, bc_ack); > - _skb->priority = TC_PRIO_CONTROL; > - __skb_queue_tail(xmitq, _skb); > - l->stats.retransmitted++; > - retransmitted++; > - /* Increase actual retrans counter & mark first time */ > - if (!TIPC_SKB_CB(skb)->retr_cnt++) > - TIPC_SKB_CB(skb)->retr_stamp = jiffies; > - } > - tipc_link_update_cwin(l, 0, retransmitted); > - return 0; > -} > - > /* tipc_data_input - deliver data and name distr msgs to upper 
layer > * > * Consumes buffer if message is of right type > @@ -1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb, > return rc; > } > > -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) > -{ > - int released = 0; > - struct sk_buff *skb, *tmp; > - > - skb_queue_walk_safe(&l->transmq, skb, tmp) { > - if (more(buf_seqno(skb), acked)) > - break; > - __skb_unlink(skb, &l->transmq); > - kfree_skb(skb); > - released++; > +/** > + * tipc_get_gap_ack_blks - get Gap ACK blocks from PROTOCOL/STATE_MSG > + * @ga: returned pointer to the Gap ACK blocks if any > + * @l: the tipc link > + * @hdr: the PROTOCOL/STATE_MSG header > + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or broadcast (= 0) > + * > + * Return: the total Gap ACK blocks size > + */ > +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, > + struct tipc_msg *hdr, bool uc) > +{ > + struct tipc_gap_ack_blks *p; > + u16 sz = 0; > + > + /* Does peer support the Gap ACK blocks feature? */ > + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { > + p = (struct tipc_gap_ack_blks *)msg_data(hdr); > + sz = ntohs(p->len); > + /* Sanity check */ > + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + p->bgack_cnt)) { > + /* Good, check if the desired type exists */ > + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) > + goto ok; > + /* Backward compatible: peer might not support bc, but uc? */ > + } else if (uc && sz == tipc_gap_ack_blks_sz(p->ugack_cnt)) { > + if (p->ugack_cnt) { > + p->bgack_cnt = 0; > + goto ok; > + } > + } > } > - return released; > + /* Other cases: ignore! 
*/ > + p = NULL; > + > +ok: > + *ga = p; > + return sz; > } > > -/* tipc_build_gap_ack_blks - build Gap ACK blocks > - * @l: tipc link that data have come with gaps in sequence if any > - * @data: data buffer to store the Gap ACK blocks after built > - * > - * returns the actual allocated memory size > - */ > -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, > + struct tipc_link *l, u8 start_index) > { > + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; > struct sk_buff *skb = skb_peek(&l->deferdq); > - struct tipc_gap_ack_blks *ga = data; > - u16 len, expect, seqno = 0; > + u16 expect, seqno = 0; > u8 n = 0; > > - if (!skb || !gap) > - goto exit; > + if (!skb) > + return 0; > > expect = buf_seqno(skb); > skb_queue_walk(&l->deferdq, skb) { > seqno = buf_seqno(skb); > if (unlikely(more(seqno, expect))) { > - ga->gacks[n].ack = htons(expect - 1); > - ga->gacks[n].gap = htons(seqno - expect); > - if (++n >= MAX_GAP_ACK_BLKS) { > - pr_info_ratelimited("Too few Gap ACK blocks!\n"); > - goto exit; > + gacks[n].ack = htons(expect - 1); > + gacks[n].gap = htons(seqno - expect); > + if (++n >= MAX_GAP_ACK_BLKS / 2) { > + char buf[TIPC_MAX_LINK_NAME]; > + > + pr_info_ratelimited("Gacks on %s: %d, ql: %d!\n", > + tipc_link_name_ext(l, buf), > + n, > + skb_queue_len(&l->deferdq)); > + return n; > } > } else if (unlikely(less(seqno, expect))) { > pr_warn("Unexpected skb in deferdq!\n"); > @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > } > > /* last block */ > - ga->gacks[n].ack = htons(seqno); > - ga->gacks[n].gap = 0; > + gacks[n].ack = htons(seqno); > + gacks[n].gap = 0; > n++; > + return n; > +} > > -exit: > - len = tipc_gap_ack_blks_sz(n); > +/* tipc_build_gap_ack_blks - build Gap ACK blocks > + * @l: tipc unicast link > + * @hdr: the tipc message buffer to store the Gap ACK blocks after built > + * > + * The function 
builds Gap ACK blocks for both the unicast & broadcast receiver > + * links of a certain peer, the buffer after built has the network data format > + * as follows: > + * 31 16 15 0 > + * +-------------+-------------+-------------+-------------+ > + * | bgack_cnt | ugack_cnt | len | > + * +-------------+-------------+-------------+-------------+ - > + * | gap | ack | | > + * +-------------+-------------+-------------+-------------+ > bc gacks > + * : : : | > + * +-------------+-------------+-------------+-------------+ - > + * | gap | ack | | > + * +-------------+-------------+-------------+-------------+ > uc gacks > + * : : : | > + * +-------------+-------------+-------------+-------------+ - > + * (See struct tipc_gap_ack_blks) > + * > + * returns the actual allocated memory size > + */ > +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr) > +{ > + struct tipc_link *bcl = l->bc_rcvlink; > + struct tipc_gap_ack_blks *ga; > + u16 len; > + > + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); > + > + /* Start with broadcast link first */ > + tipc_bcast_lock(bcl->net); > + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); > + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); > + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); > + tipc_bcast_unlock(bcl->net); > + > + /* Now for unicast link, but an explicit NACK only (???) */ > + ga->ugack_cnt = (msg_seq_gap(hdr)) ? > + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) : 0; > + > + /* Total len */ > + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); > ga->len = htons(len); > - ga->gack_cnt = n; > return len; > } > > @@ -1466,47 +1486,111 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) > * acked packets, also doing retransmissions if > * gaps found > * @l: tipc link with transmq queue to be advanced > + * @r: tipc link "receiver" i.e. 
in case of broadcast (= "l" if unicast) > * @acked: seqno of last packet acked by peer without any gaps before > * @gap: # of gap packets > * @ga: buffer pointer to Gap ACK blocks from peer > * @xmitq: queue for accumulating the retransmitted packets if any > + * @retransmitted: returned boolean value if a retransmission is really issued > + * @rc: returned code e.g. TIPC_LINK_DOWN_EVT if a repeated retransmit failures > + * happens (- unlikely case) > * > - * In case of a repeated retransmit failures, the call will return shortly > - * with a returned code (e.g. TIPC_LINK_DOWN_EVT) > + * Return: the number of packets released from the link transmq > */ > -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, > + u16 acked, u16 gap, > struct tipc_gap_ack_blks *ga, > - struct sk_buff_head *xmitq) > + struct sk_buff_head *xmitq, > + bool *retransmitted, int *rc) > { > + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; > + struct tipc_gap_ack *gacks = NULL; > struct sk_buff *skb, *_skb, *tmp; > struct tipc_msg *hdr; > + u32 qlen = skb_queue_len(&l->transmq); > + u16 nacked = acked, ngap = gap, gack_cnt = 0; > u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; > - bool retransmitted = false; > u16 ack = l->rcv_nxt - 1; > - bool passed = false; > - u16 released = 0; > u16 seqno, n = 0; > - int rc = 0; > + u16 end = r->acked, start = end, offset = r->last_gap; > + u16 si = (last_ga) ? 
last_ga->start_index : 0; > + bool is_uc = !link_is_bc_sndlink(l); > + bool bc_has_acked = false; > + > + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); > + > + /* Determine Gap ACK blocks if any for the particular link */ > + if (ga && is_uc) { > + /* Get the Gap ACKs, uc part */ > + gack_cnt = ga->ugack_cnt; > + gacks = &ga->gacks[ga->bgack_cnt]; > + } else if (ga) { > + /* Copy the Gap ACKs, bc part, for later renewal if needed */ > + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), > + GFP_ATOMIC); > + if (likely(this_ga)) { > + this_ga->start_index = 0; > + /* Start with the bc Gap ACKs */ > + gack_cnt = this_ga->bgack_cnt; > + gacks = &this_ga->gacks[0]; > + } else { > + /* Hmm, we can get in trouble..., simply ignore it */ > + pr_warn_ratelimited("Ignoring bc Gap ACKs, no memory\n"); > + } > + } > > + /* Advance the link transmq */ > skb_queue_walk_safe(&l->transmq, skb, tmp) { > seqno = buf_seqno(skb); > > next_gap_ack: > - if (less_eq(seqno, acked)) { > + if (less_eq(seqno, nacked)) { > + if (is_uc) > + goto release; > + /* Skip packets peer has already acked */ > + if (!more(seqno, r->acked)) > + continue; > + /* Get the next of last Gap ACK blocks */ > + while (more(seqno, end)) { > + if (!last_ga || si >= last_ga->bgack_cnt) > + break; > + start = end + offset + 1; > + end = ntohs(last_ga->gacks[si].ack); > + offset = ntohs(last_ga->gacks[si].gap); > + si++; > + WARN_ONCE(more(start, end) || > + (!offset && > + si < last_ga->bgack_cnt) || > + si > MAX_GAP_ACK_BLKS, > + "Corrupted Gap ACK: %d %d %d %d %d\n", > + start, end, offset, si, > + last_ga->bgack_cnt); > + } > + /* Check against the last Gap ACK block */ > + if (in_range(seqno, start, end)) > + continue; > + /* Update/release the packet peer is acking */ > + bc_has_acked = true; > + if (--TIPC_SKB_CB(skb)->ackers) > + continue; > +release: > /* release skb */ > __skb_unlink(skb, &l->transmq); > kfree_skb(skb); > - released++; > - } else if (less_eq(seqno, acked + gap)) 
{ > - /* First, check if repeated retrans failures occurs? */ > - if (!passed && link_retransmit_failure(l, l, &rc)) > - return rc; > - passed = true; > - > + } else if (less_eq(seqno, nacked + ngap)) { > + /* First gap: check if repeated retrans failures? */ > + if (unlikely(seqno == acked + 1 && > + link_retransmit_failure(l, r, rc))) { > + /* Ignore this bc Gap ACKs if any */ > + kfree(this_ga); > + this_ga = NULL; > + break; > + } > /* retransmit skb if unrestricted*/ > if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) > continue; > - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; > + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? > + TIPC_UC_RETR_TIME : TIPC_BC_RETR_LIM; > _skb = pskb_copy(skb, GFP_ATOMIC); > if (!_skb) > continue; > @@ -1516,25 +1600,50 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, > _skb->priority = TC_PRIO_CONTROL; > __skb_queue_tail(xmitq, _skb); > l->stats.retransmitted++; > - retransmitted = true; > + *retransmitted = true; > /* Increase actual retrans counter & mark first time */ > if (!TIPC_SKB_CB(skb)->retr_cnt++) > TIPC_SKB_CB(skb)->retr_stamp = jiffies; > } else { > /* retry with Gap ACK blocks if any */ > - if (!ga || n >= ga->gack_cnt) > + if (n >= gack_cnt) > break; > - acked = ntohs(ga->gacks[n].ack); > - gap = ntohs(ga->gacks[n].gap); > + nacked = ntohs(gacks[n].ack); > + ngap = ntohs(gacks[n].gap); > n++; > goto next_gap_ack; > } > } > - if (released || retransmitted) > - tipc_link_update_cwin(l, released, retransmitted); > - if (released) > - tipc_link_advance_backlog(l, xmitq); > - return 0; > + > + /* Renew last Gap ACK blocks for bc if needed */ > + if (bc_has_acked) { > + if (this_ga) { > + kfree(last_ga); > + r->last_ga = this_ga; > + r->last_gap = gap; > + } else if (last_ga) { > + if (less(acked, start)) { > + si--; > + offset = start - acked - 1; > + } else if (less(acked, end)) { > + acked = end; > + } > + if (si < last_ga->bgack_cnt) { > + last_ga->start_index = si; > + r->last_gap = 
offset; > + } else { > + kfree(last_ga); > + r->last_ga = NULL; > + r->last_gap = 0; > + } > + } else { > + r->last_gap = 0; > + } > + r->acked = acked; > + } else { > + kfree(this_ga); > + } > + return skb_queue_len(&l->transmq) - qlen; > } > > /* tipc_link_build_state_msg: prepare link state message for transmission > @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff *skb, > kfree_skb(skb); > break; > } > - released += tipc_link_release_pkts(l, msg_ack(hdr)); > + released += tipc_link_advance_transmq(l, l, msg_ack(hdr), 0, > + NULL, NULL, NULL, NULL); > > /* Defer delivery if sequence gap */ > if (unlikely(seqno != rcv_nxt)) { > @@ -1739,7 +1849,7 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, > msg_set_probe(hdr, probe); > msg_set_is_keepalive(hdr, probe || probe_reply); > if (l->peer_caps & TIPC_GAP_ACK_BLOCK) > - glen = tipc_build_gap_ack_blks(l, data, rcvgap); > + glen = tipc_build_gap_ack_blks(l, hdr); > tipc_mon_prep(l->net, data + glen, &dlen, mstate, l->bearer_id); > msg_set_size(hdr, INT_H_SIZE + glen + dlen); > skb_trim(skb, INT_H_SIZE + glen + dlen); > @@ -2027,20 +2137,19 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, > { > struct tipc_msg *hdr = buf_msg(skb); > struct tipc_gap_ack_blks *ga = NULL; > - u16 rcvgap = 0; > - u16 ack = msg_ack(hdr); > - u16 gap = msg_seq_gap(hdr); > + bool reply = msg_probe(hdr), retransmitted = false; > + u16 dlen = msg_data_sz(hdr), glen = 0; > u16 peers_snd_nxt = msg_next_sent(hdr); > u16 peers_tol = msg_link_tolerance(hdr); > u16 peers_prio = msg_linkprio(hdr); > + u16 gap = msg_seq_gap(hdr); > + u16 ack = msg_ack(hdr); > u16 rcv_nxt = l->rcv_nxt; > - u16 dlen = msg_data_sz(hdr); > + u16 rcvgap = 0; > int mtyp = msg_type(hdr); > - bool reply = msg_probe(hdr); > - u16 glen = 0; > - void *data; > + int rc = 0, released; > char *if_name; > - int rc = 0; > + void *data; > > trace_tipc_proto_rcv(skb, false, l->name); > if 
(tipc_link_is_blocked(l) || !xmitq) > @@ -2137,13 +2246,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, > } > > /* Receive Gap ACK blocks from peer if any */ > - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { > - ga = (struct tipc_gap_ack_blks *)data; > - glen = ntohs(ga->len); > - /* sanity check: if failed, ignore Gap ACK blocks */ > - if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt)) > - ga = NULL; > - } > + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); > > tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, > &l->mon_state, l->bearer_id); > @@ -2158,9 +2261,14 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, > tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, > rcvgap, 0, 0, xmitq); > > - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); > + released = tipc_link_advance_transmq(l, l, ack, gap, ga, xmitq, > + &retransmitted, &rc); > if (gap) > l->stats.recv_nacks++; > + if (released || retransmitted) > + tipc_link_update_cwin(l, released, retransmitted); > + if (released) > + tipc_link_advance_backlog(l, xmitq); > if (unlikely(!skb_queue_empty(&l->wakeupq))) > link_prepare_wakeup(l); > } > @@ -2246,10 +2354,7 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr) > int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > struct sk_buff_head *xmitq) > { > - struct tipc_link *snd_l = l->bc_sndlink; > u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); > - u16 from = msg_bcast_ack(hdr) + 1; > - u16 to = from + msg_bc_gap(hdr) - 1; > int rc = 0; > > if (!link_is_up(l)) > @@ -2271,8 +2376,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > if (more(peers_snd_nxt, l->rcv_nxt + l->window)) > return rc; > > - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); > - > l->snd_nxt = peers_snd_nxt; > if (link_bc_rcv_gap(l)) > rc |= TIPC_LINK_SND_STATE; > @@ -2307,38 +2410,28 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, > return 0; > } > > 
-void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, > - struct sk_buff_head *xmitq) > +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, > + struct tipc_gap_ack_blks *ga, > + struct sk_buff_head *xmitq) > { > - struct sk_buff *skb, *tmp; > - struct tipc_link *snd_l = l->bc_sndlink; > + struct tipc_link *l = r->bc_sndlink; > + bool unused = false; > + int rc = 0; > > - if (!link_is_up(l) || !l->bc_peer_is_up) > - return; > + if (!link_is_up(r) || !r->bc_peer_is_up) > + return 0; > > - if (!more(acked, l->acked)) > - return; > + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) > + return 0; > > - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); > - /* Skip over packets peer has already acked */ > - skb_queue_walk(&snd_l->transmq, skb) { > - if (more(buf_seqno(skb), l->acked)) > - break; > - } > + trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); > + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, &rc); > > - /* Update/release the packets peer is acking now */ > - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { > - if (more(buf_seqno(skb), acked)) > - break; > - if (!--TIPC_SKB_CB(skb)->ackers) { > - __skb_unlink(skb, &snd_l->transmq); > - kfree_skb(skb); > - } > - } > - l->acked = acked; > - tipc_link_advance_backlog(snd_l, xmitq); > - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) > - link_prepare_wakeup(snd_l); > + tipc_link_advance_backlog(l, xmitq); > + if (unlikely(!skb_queue_empty(&l->wakeupq))) > + link_prepare_wakeup(l); > + > + return rc; > } > > /* tipc_link_bc_nack_rcv(): receive broadcast nack message > @@ -2366,8 +2459,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, > return 0; > > if (dnode == tipc_own_addr(l->net)) { > - tipc_link_bc_ack_rcv(l, acked, xmitq); > - rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, xmitq); > + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq); > l->stats.recv_nacks++; > return rc; > } > diff --git 
a/net/tipc/link.h b/net/tipc/link.h > index d3c1c3fc1659..0a0fa7350722 100644 > --- a/net/tipc/link.h > +++ b/net/tipc/link.h > @@ -143,8 +143,11 @@ int tipc_link_bc_peers(struct tipc_link *l); > void tipc_link_set_mtu(struct tipc_link *l, int mtu); > int tipc_link_mtu(struct tipc_link *l); > int tipc_link_mss(struct tipc_link *l); > -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, > - struct sk_buff_head *xmitq); > +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, > + struct tipc_msg *hdr, bool uc); > +int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, > + struct tipc_gap_ack_blks *ga, > + struct sk_buff_head *xmitq); > void tipc_link_build_bc_sync_msg(struct tipc_link *l, > struct sk_buff_head *xmitq); > void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr); > diff --git a/net/tipc/msg.h b/net/tipc/msg.h > index 6d466ebdb64f..9a38f9c9d6eb 100644 > --- a/net/tipc/msg.h > +++ b/net/tipc/msg.h > @@ -160,20 +160,26 @@ struct tipc_gap_ack { > > /* struct tipc_gap_ack_blks > * @len: actual length of the record > - * @gack_cnt: number of Gap ACK blocks in the record > + * @bgack_cnt: number of Gap ACK blocks for broadcast in the record > + * @ugack_cnt: number of Gap ACK blocks for unicast (following the broadcast > + * ones) > + * @start_index: starting index for "valid" broadcast Gap ACK blocks > * @gacks: array of Gap ACK blocks > */ > struct tipc_gap_ack_blks { > __be16 len; > - u8 gack_cnt; > - u8 reserved; > + union { > + u8 ugack_cnt; > + u8 start_index; > + }; > + u8 bgack_cnt; > struct tipc_gap_ack gacks[]; > }; > > #define tipc_gap_ack_blks_sz(n) (sizeof(struct tipc_gap_ack_blks) + \ > sizeof(struct tipc_gap_ack) * (n)) > > -#define MAX_GAP_ACK_BLKS 32 > +#define MAX_GAP_ACK_BLKS 128 > #define MAX_GAP_ACK_BLKS_SZ tipc_gap_ack_blks_sz(MAX_GAP_ACK_BLKS) > > static inline struct tipc_msg *buf_msg(struct sk_buff *skb) > diff --git a/net/tipc/node.c b/net/tipc/node.c > index 
0c88778c88b5..eb6b62de81a7 100644 > --- a/net/tipc/node.c > +++ b/net/tipc/node.c > @@ -2069,10 +2069,16 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) > le = &n->links[bearer_id]; > > /* Ensure broadcast reception is in synch with peer's send state */ > - if (unlikely(usr == LINK_PROTOCOL)) > + if (unlikely(usr == LINK_PROTOCOL)) { > + if (unlikely(skb_linearize(skb))) { > + tipc_node_put(n); > + goto discard; > + } > + hdr = buf_msg(skb); > tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq); > - else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) > + } else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) { > tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr); > + } > > /* Receive packet directly if conditions permit */ > tipc_node_read_lock(n); -- |
From: Tuong L. <tuo...@de...> - 2020-03-13 10:47:58
Tuong Lien (2):
  tipc: add Gap ACK blocks support for broadcast link
  tipc: enable broadcast retransmission via unicast

 net/tipc/bcast.c  |  16 +-
 net/tipc/bcast.h  |   4 +-
 net/tipc/link.c   | 442 +++++++++++++++++++++++++++++++++---------------------
 net/tipc/link.h   |   8 +-
 net/tipc/msg.h    |  14 +-
 net/tipc/node.c   |  12 +-
 net/tipc/sysctl.c |   9 +-
 7 files changed, 316 insertions(+), 189 deletions(-)

-- 
2.13.7
From: Tuong L. <tuo...@de...> - 2020-03-13 10:47:58
In some environment, broadcast traffic is suppressed at high rate (i.e. a kind of bandwidth limit setting). When it is applied, TIPC broadcast can still run successfully. However, when it comes to a high load, some packets will be dropped first and TIPC tries to retransmit them but the packet retransmission is intentionally broadcast too, so making things worse and not helpful at all. This commit enables the broadcast retransmission via unicast which only retransmits packets to the specific peer that has really reported a gap i.e. not broadcasting to all nodes in the cluster, so will prevent from being suppressed, and also reduce some overheads on the other peers due to duplicates, finally improve the overall TIPC broadcast performance. Note: the functionality can be turned on/off via the sysctl file: echo 1 > /proc/sys/net/tipc/bc_retruni echo 0 > /proc/sys/net/tipc/bc_retruni Default is '0', i.e. the broadcast retransmission still works as usual. Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/bcast.c | 11 ++++++++--- net/tipc/bcast.h | 4 +++- net/tipc/link.c | 10 ++++++---- net/tipc/link.h | 3 ++- net/tipc/node.c | 2 +- net/tipc/sysctl.c | 9 ++++++++- 6 files changed, 28 insertions(+), 11 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 3ce690a96ee9..50a16f8bebd9 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -46,6 +46,7 @@ #define BCLINK_WIN_MIN 32 /* bcast minimum link window size */ const char tipc_bclink_name[] = "broadcast-link"; +unsigned long sysctl_tipc_bc_retruni __read_mostly; /** * struct tipc_bc_base - base structure for keeping broadcast send state @@ -474,7 +475,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, __skb_queue_head_init(&xmitq); tipc_bcast_lock(net); - tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq, NULL); tipc_bcast_unlock(net); tipc_bcbase_xmit(net, &xmitq); @@ -489,7 +490,8 @@ void tipc_bcast_ack_rcv(struct net *net, struct 
tipc_link *l, * RCU is locked, no other locks set */ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, - struct tipc_msg *hdr) + struct tipc_msg *hdr, + struct sk_buff_head *retrq) { struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; struct tipc_gap_ack_blks *ga; @@ -503,8 +505,11 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, tipc_link_bc_init_rcv(l, hdr); } else if (!msg_bc_ack_invalid(hdr)) { tipc_get_gap_ack_blks(&ga, l, hdr, false); + if (!sysctl_tipc_bc_retruni) + retrq = &xmitq; rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), - msg_bc_gap(hdr), ga, &xmitq); + msg_bc_gap(hdr), ga, &xmitq, + retrq); rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); } tipc_bcast_unlock(net); diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h index 9e847d9617d3..97d3cf9d3e4d 100644 --- a/net/tipc/bcast.h +++ b/net/tipc/bcast.h @@ -45,6 +45,7 @@ struct tipc_nl_msg; struct tipc_nlist; struct tipc_nitem; extern const char tipc_bclink_name[]; +extern unsigned long sysctl_tipc_bc_retruni; #define TIPC_METHOD_EXPIRE msecs_to_jiffies(5000) @@ -93,7 +94,8 @@ int tipc_bcast_rcv(struct net *net, struct tipc_link *l, struct sk_buff *skb); void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, struct tipc_msg *hdr); int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, - struct tipc_msg *hdr); + struct tipc_msg *hdr, + struct sk_buff_head *retrq); int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg); int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]); int tipc_bclink_reset_stats(struct net *net); diff --git a/net/tipc/link.c b/net/tipc/link.c index 6198b6d89a69..dabdf08cc9f4 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -375,7 +375,7 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l, snd_l->ackers--; rcv_l->bc_peer_is_up = true; rcv_l->state = LINK_ESTABLISHED; - tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq, NULL); trace_tipc_link_reset(rcv_l, 
TIPC_DUMP_ALL, "bclink removed!"); tipc_link_reset(rcv_l); rcv_l->state = LINK_RESET; @@ -2412,7 +2412,8 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq) + struct sk_buff_head *xmitq, + struct sk_buff_head *retrq) { struct tipc_link *l = r->bc_sndlink; bool unused = false; @@ -2425,7 +2426,7 @@ int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, return 0; trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); - tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, &rc); + tipc_link_advance_transmq(l, r, acked, gap, ga, retrq, &unused, &rc); tipc_link_advance_backlog(l, xmitq); if (unlikely(!skb_queue_empty(&l->wakeupq))) @@ -2459,7 +2460,8 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, return 0; if (dnode == tipc_own_addr(l->net)) { - rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq); + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq, + xmitq); l->stats.recv_nacks++; return rc; } diff --git a/net/tipc/link.h b/net/tipc/link.h index 0a0fa7350722..4d0768cf91d5 100644 --- a/net/tipc/link.h +++ b/net/tipc/link.h @@ -147,7 +147,8 @@ u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, struct tipc_msg *hdr, bool uc); int tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq); + struct sk_buff_head *xmitq, + struct sk_buff_head *retrq); void tipc_link_build_bc_sync_msg(struct tipc_link *l, struct sk_buff_head *xmitq); void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr); diff --git a/net/tipc/node.c b/net/tipc/node.c index eb6b62de81a7..917ad3920fac 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -1771,7 +1771,7 @@ static void tipc_node_bc_sync_rcv(struct tipc_node *n, struct tipc_msg *hdr, struct tipc_link *ucl; int rc; - rc = 
tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr); + rc = tipc_bcast_sync_rcv(n->net, n->bc_entry.link, hdr, xmitq); if (rc & TIPC_LINK_DOWN_EVT) { tipc_node_reset_links(n); diff --git a/net/tipc/sysctl.c b/net/tipc/sysctl.c index 58ab3d6dcdce..97a6264a2993 100644 --- a/net/tipc/sysctl.c +++ b/net/tipc/sysctl.c @@ -36,7 +36,7 @@ #include "core.h" #include "trace.h" #include "crypto.h" - +#include "bcast.h" #include <linux/sysctl.h> static struct ctl_table_header *tipc_ctl_hdr; @@ -75,6 +75,13 @@ static struct ctl_table tipc_table[] = { .extra1 = SYSCTL_ONE, }, #endif + { + .procname = "bc_retruni", + .data = &sysctl_tipc_bc_retruni, + .maxlen = sizeof(sysctl_tipc_bc_retruni), + .mode = 0644, + .proc_handler = proc_doulongvec_minmax, + }, {} }; -- 2.13.7 |
From: Tuong L. <tuo...@de...> - 2020-03-13 10:47:55
As achieved through commit 9195948fbf34 ("tipc: improve TIPC throughput
by Gap ACK blocks"), we apply the same mechanism for the broadcast link
as well. The 'Gap ACK blocks' data field in a 'PROTOCOL/STATE_MSG' will
consist of two parts built for both the broadcast and unicast types:

     31                       16 15                        0
    +-------------+-------------+-------------+-------------+
    |  bgack_cnt  |  ugack_cnt  |            len            |
    +-------------+-------------+-------------+-------------+  -
    |            gap            |            ack            |   |
    +-------------+-------------+-------------+-------------+    > bc gacks
    :                           :                           :   |
    +-------------+-------------+-------------+-------------+  -
    |            gap            |            ack            |   |
    +-------------+-------------+-------------+-------------+    > uc gacks
    :                           :                           :   |
    +-------------+-------------+-------------+-------------+  -

which is "automatically" backward compatible. We also increase the max
number of Gap ACK blocks to 128, allowing up to 64 blocks per type
(total buffer size = 516 bytes).

Besides, the 'tipc_link_advance_transmq()' function is refactored so
that it is now applicable to both the unicast and broadcast cases, which
lets us remove some old functions and optimize the code.

With the patch, TIPC broadcast is more robust regardless of packet loss,
disorder, latency, etc. in the underlying network. Its performance is
boosted significantly.
For example, experiment with a 5% packet loss rate results: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 0m 42.46s user 0m 1.16s sys 0m 17.67s Without the patch: $ time tipc-pipe --mc --rdm --data_size 123 --data_num 1500000 real 5m 28.80s user 0m 0.85s sys 0m 3.62s Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/bcast.c | 9 +- net/tipc/link.c | 440 +++++++++++++++++++++++++++++++++---------------------- net/tipc/link.h | 7 +- net/tipc/msg.h | 14 +- net/tipc/node.c | 10 +- 5 files changed, 295 insertions(+), 185 deletions(-) diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c index 4c20be08b9c4..3ce690a96ee9 100644 --- a/net/tipc/bcast.c +++ b/net/tipc/bcast.c @@ -474,7 +474,7 @@ void tipc_bcast_ack_rcv(struct net *net, struct tipc_link *l, __skb_queue_head_init(&xmitq); tipc_bcast_lock(net); - tipc_link_bc_ack_rcv(l, acked, &xmitq); + tipc_link_bc_ack_rcv(l, acked, 0, NULL, &xmitq); tipc_bcast_unlock(net); tipc_bcbase_xmit(net, &xmitq); @@ -492,6 +492,7 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, struct tipc_msg *hdr) { struct sk_buff_head *inputq = &tipc_bc_base(net)->inputq; + struct tipc_gap_ack_blks *ga; struct sk_buff_head xmitq; int rc = 0; @@ -501,8 +502,10 @@ int tipc_bcast_sync_rcv(struct net *net, struct tipc_link *l, if (msg_type(hdr) != STATE_MSG) { tipc_link_bc_init_rcv(l, hdr); } else if (!msg_bc_ack_invalid(hdr)) { - tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), &xmitq); - rc = tipc_link_bc_sync_rcv(l, hdr, &xmitq); + tipc_get_gap_ack_blks(&ga, l, hdr, false); + rc = tipc_link_bc_ack_rcv(l, msg_bcast_ack(hdr), + msg_bc_gap(hdr), ga, &xmitq); + rc |= tipc_link_bc_sync_rcv(l, hdr, &xmitq); } tipc_bcast_unlock(net); diff --git a/net/tipc/link.c b/net/tipc/link.c index 467c53a1fb5c..6198b6d89a69 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -188,6 +188,8 @@ struct tipc_link { /* Broadcast */ u16 ackers; u16 acked; + u16 last_gap; + struct tipc_gap_ack_blks *last_ga; struct tipc_link *bc_rcvlink; 
struct tipc_link *bc_sndlink; u8 nack_state; @@ -249,11 +251,14 @@ static int tipc_link_build_nack_msg(struct tipc_link *l, struct sk_buff_head *xmitq); static void tipc_link_build_bc_init_msg(struct tipc_link *l, struct sk_buff_head *xmitq); -static int tipc_link_release_pkts(struct tipc_link *l, u16 to); -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap); -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, + struct tipc_link *l, u8 start_index); +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr); +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, + u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq); + struct sk_buff_head *xmitq, + bool *retransmitted, int *rc); static void tipc_link_update_cwin(struct tipc_link *l, int released, bool retransmitted); /* @@ -370,7 +375,7 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l, snd_l->ackers--; rcv_l->bc_peer_is_up = true; rcv_l->state = LINK_ESTABLISHED; - tipc_link_bc_ack_rcv(rcv_l, ack, xmitq); + tipc_link_bc_ack_rcv(rcv_l, ack, 0, NULL, xmitq); trace_tipc_link_reset(rcv_l, TIPC_DUMP_ALL, "bclink removed!"); tipc_link_reset(rcv_l); rcv_l->state = LINK_RESET; @@ -784,8 +789,6 @@ bool tipc_link_too_silent(struct tipc_link *l) return (l->silent_intv_cnt + 2 > l->abort_limit); } -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, - u16 from, u16 to, struct sk_buff_head *xmitq); /* tipc_link_timeout - perform periodic task as instructed from node timeout */ int tipc_link_timeout(struct tipc_link *l, struct sk_buff_head *xmitq) @@ -948,6 +951,9 @@ void tipc_link_reset(struct tipc_link *l) l->snd_nxt_state = 1; l->rcv_nxt_state = 1; l->acked = 0; + l->last_gap = 0; + kfree(l->last_ga); + l->last_ga = NULL; l->silent_intv_cnt = 0; l->rst_cnt = 0; l->bc_peer_is_up = false; @@ -1183,68 +1189,14 
@@ static bool link_retransmit_failure(struct tipc_link *l, struct tipc_link *r, if (link_is_bc_sndlink(l)) { r->state = LINK_RESET; - *rc = TIPC_LINK_DOWN_EVT; + *rc |= TIPC_LINK_DOWN_EVT; } else { - *rc = tipc_link_fsm_evt(l, LINK_FAILURE_EVT); + *rc |= tipc_link_fsm_evt(l, LINK_FAILURE_EVT); } return true; } -/* tipc_link_bc_retrans() - retransmit zero or more packets - * @l: the link to transmit on - * @r: the receiving link ordering the retransmit. Same as l if unicast - * @from: retransmit from (inclusive) this sequence number - * @to: retransmit to (inclusive) this sequence number - * xmitq: queue for accumulating the retransmitted packets - */ -static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r, - u16 from, u16 to, struct sk_buff_head *xmitq) -{ - struct sk_buff *_skb, *skb = skb_peek(&l->transmq); - u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; - u16 ack = l->rcv_nxt - 1; - int retransmitted = 0; - struct tipc_msg *hdr; - int rc = 0; - - if (!skb) - return 0; - if (less(to, from)) - return 0; - - trace_tipc_link_retrans(r, from, to, &l->transmq); - - if (link_retransmit_failure(l, r, &rc)) - return rc; - - skb_queue_walk(&l->transmq, skb) { - hdr = buf_msg(skb); - if (less(msg_seqno(hdr), from)) - continue; - if (more(msg_seqno(hdr), to)) - break; - if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) - continue; - TIPC_SKB_CB(skb)->nxt_retr = TIPC_BC_RETR_LIM; - _skb = pskb_copy(skb, GFP_ATOMIC); - if (!_skb) - return 0; - hdr = buf_msg(_skb); - msg_set_ack(hdr, ack); - msg_set_bcast_ack(hdr, bc_ack); - _skb->priority = TC_PRIO_CONTROL; - __skb_queue_tail(xmitq, _skb); - l->stats.retransmitted++; - retransmitted++; - /* Increase actual retrans counter & mark first time */ - if (!TIPC_SKB_CB(skb)->retr_cnt++) - TIPC_SKB_CB(skb)->retr_stamp = jiffies; - } - tipc_link_update_cwin(l, 0, retransmitted); - return 0; -} - /* tipc_data_input - deliver data and name distr msgs to upper layer * * Consumes buffer if message is of right type @@ 
-1402,46 +1354,71 @@ static int tipc_link_tnl_rcv(struct tipc_link *l, struct sk_buff *skb, return rc; } -static int tipc_link_release_pkts(struct tipc_link *l, u16 acked) -{ - int released = 0; - struct sk_buff *skb, *tmp; - - skb_queue_walk_safe(&l->transmq, skb, tmp) { - if (more(buf_seqno(skb), acked)) - break; - __skb_unlink(skb, &l->transmq); - kfree_skb(skb); - released++; +/** + * tipc_get_gap_ack_blks - get Gap ACK blocks from PROTOCOL/STATE_MSG + * @ga: returned pointer to the Gap ACK blocks if any + * @l: the tipc link + * @hdr: the PROTOCOL/STATE_MSG header + * @uc: desired Gap ACK blocks type, i.e. unicast (= 1) or broadcast (= 0) + * + * Return: the total Gap ACK blocks size + */ +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, + struct tipc_msg *hdr, bool uc) +{ + struct tipc_gap_ack_blks *p; + u16 sz = 0; + + /* Does peer support the Gap ACK blocks feature? */ + if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { + p = (struct tipc_gap_ack_blks *)msg_data(hdr); + sz = ntohs(p->len); + /* Sanity check */ + if (sz == tipc_gap_ack_blks_sz(p->ugack_cnt + p->bgack_cnt)) { + /* Good, check if the desired type exists */ + if ((uc && p->ugack_cnt) || (!uc && p->bgack_cnt)) + goto ok; + /* Backward compatible: peer might not support bc, but uc? */ + } else if (uc && sz == tipc_gap_ack_blks_sz(p->ugack_cnt)) { + if (p->ugack_cnt) { + p->bgack_cnt = 0; + goto ok; + } + } } - return released; + /* Other cases: ignore! 
*/ + p = NULL; + +ok: + *ga = p; + return sz; } -/* tipc_build_gap_ack_blks - build Gap ACK blocks - * @l: tipc link that data have come with gaps in sequence if any - * @data: data buffer to store the Gap ACK blocks after built - * - * returns the actual allocated memory size - */ -static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) +static u8 __tipc_build_gap_ack_blks(struct tipc_gap_ack_blks *ga, + struct tipc_link *l, u8 start_index) { + struct tipc_gap_ack *gacks = &ga->gacks[start_index]; struct sk_buff *skb = skb_peek(&l->deferdq); - struct tipc_gap_ack_blks *ga = data; - u16 len, expect, seqno = 0; + u16 expect, seqno = 0; u8 n = 0; - if (!skb || !gap) - goto exit; + if (!skb) + return 0; expect = buf_seqno(skb); skb_queue_walk(&l->deferdq, skb) { seqno = buf_seqno(skb); if (unlikely(more(seqno, expect))) { - ga->gacks[n].ack = htons(expect - 1); - ga->gacks[n].gap = htons(seqno - expect); - if (++n >= MAX_GAP_ACK_BLKS) { - pr_info_ratelimited("Too few Gap ACK blocks!\n"); - goto exit; + gacks[n].ack = htons(expect - 1); + gacks[n].gap = htons(seqno - expect); + if (++n >= MAX_GAP_ACK_BLKS / 2) { + char buf[TIPC_MAX_LINK_NAME]; + + pr_info_ratelimited("Gacks on %s: %d, ql: %d!\n", + tipc_link_name_ext(l, buf), + n, + skb_queue_len(&l->deferdq)); + return n; } } else if (unlikely(less(seqno, expect))) { pr_warn("Unexpected skb in deferdq!\n"); @@ -1451,14 +1428,57 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) } /* last block */ - ga->gacks[n].ack = htons(seqno); - ga->gacks[n].gap = 0; + gacks[n].ack = htons(seqno); + gacks[n].gap = 0; n++; + return n; +} -exit: - len = tipc_gap_ack_blks_sz(n); +/* tipc_build_gap_ack_blks - build Gap ACK blocks + * @l: tipc unicast link + * @hdr: the tipc message buffer to store the Gap ACK blocks after built + * + * The function builds Gap ACK blocks for both the unicast & broadcast receiver + * links of a certain peer, the buffer after built has the network data 
format + * as follows: + * 31 16 15 0 + * +-------------+-------------+-------------+-------------+ + * | bgack_cnt | ugack_cnt | len | + * +-------------+-------------+-------------+-------------+ - + * | gap | ack | | + * +-------------+-------------+-------------+-------------+ > bc gacks + * : : : | + * +-------------+-------------+-------------+-------------+ - + * | gap | ack | | + * +-------------+-------------+-------------+-------------+ > uc gacks + * : : : | + * +-------------+-------------+-------------+-------------+ - + * (See struct tipc_gap_ack_blks) + * + * returns the actual allocated memory size + */ +static u16 tipc_build_gap_ack_blks(struct tipc_link *l, struct tipc_msg *hdr) +{ + struct tipc_link *bcl = l->bc_rcvlink; + struct tipc_gap_ack_blks *ga; + u16 len; + + ga = (struct tipc_gap_ack_blks *)msg_data(hdr); + + /* Start with broadcast link first */ + tipc_bcast_lock(bcl->net); + msg_set_bcast_ack(hdr, bcl->rcv_nxt - 1); + msg_set_bc_gap(hdr, link_bc_rcv_gap(bcl)); + ga->bgack_cnt = __tipc_build_gap_ack_blks(ga, bcl, 0); + tipc_bcast_unlock(bcl->net); + + /* Now for unicast link, but an explicit NACK only (???) */ + ga->ugack_cnt = (msg_seq_gap(hdr)) ? + __tipc_build_gap_ack_blks(ga, l, ga->bgack_cnt) : 0; + + /* Total len */ + len = tipc_gap_ack_blks_sz(ga->bgack_cnt + ga->ugack_cnt); ga->len = htons(len); - ga->gack_cnt = n; return len; } @@ -1466,47 +1486,111 @@ static u16 tipc_build_gap_ack_blks(struct tipc_link *l, void *data, u16 gap) * acked packets, also doing retransmissions if * gaps found * @l: tipc link with transmq queue to be advanced + * @r: tipc link "receiver" i.e. in case of broadcast (= "l" if unicast) * @acked: seqno of last packet acked by peer without any gaps before * @gap: # of gap packets * @ga: buffer pointer to Gap ACK blocks from peer * @xmitq: queue for accumulating the retransmitted packets if any + * @retransmitted: returned boolean value if a retransmission is really issued + * @rc: returned code e.g. 
TIPC_LINK_DOWN_EVT if a repeated retransmit failures + * happens (- unlikely case) * - * In case of a repeated retransmit failures, the call will return shortly - * with a returned code (e.g. TIPC_LINK_DOWN_EVT) + * Return: the number of packets released from the link transmq */ -static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, +static int tipc_link_advance_transmq(struct tipc_link *l, struct tipc_link *r, + u16 acked, u16 gap, struct tipc_gap_ack_blks *ga, - struct sk_buff_head *xmitq) + struct sk_buff_head *xmitq, + bool *retransmitted, int *rc) { + struct tipc_gap_ack_blks *last_ga = r->last_ga, *this_ga = NULL; + struct tipc_gap_ack *gacks = NULL; struct sk_buff *skb, *_skb, *tmp; struct tipc_msg *hdr; + u32 qlen = skb_queue_len(&l->transmq); + u16 nacked = acked, ngap = gap, gack_cnt = 0; u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1; - bool retransmitted = false; u16 ack = l->rcv_nxt - 1; - bool passed = false; - u16 released = 0; u16 seqno, n = 0; - int rc = 0; + u16 end = r->acked, start = end, offset = r->last_gap; + u16 si = (last_ga) ? 
last_ga->start_index : 0; + bool is_uc = !link_is_bc_sndlink(l); + bool bc_has_acked = false; + + trace_tipc_link_retrans(r, acked + 1, acked + gap, &l->transmq); + + /* Determine Gap ACK blocks if any for the particular link */ + if (ga && is_uc) { + /* Get the Gap ACKs, uc part */ + gack_cnt = ga->ugack_cnt; + gacks = &ga->gacks[ga->bgack_cnt]; + } else if (ga) { + /* Copy the Gap ACKs, bc part, for later renewal if needed */ + this_ga = kmemdup(ga, tipc_gap_ack_blks_sz(ga->bgack_cnt), + GFP_ATOMIC); + if (likely(this_ga)) { + this_ga->start_index = 0; + /* Start with the bc Gap ACKs */ + gack_cnt = this_ga->bgack_cnt; + gacks = &this_ga->gacks[0]; + } else { + /* Hmm, we can get in trouble..., simply ignore it */ + pr_warn_ratelimited("Ignoring bc Gap ACKs, no memory\n"); + } + } + /* Advance the link transmq */ skb_queue_walk_safe(&l->transmq, skb, tmp) { seqno = buf_seqno(skb); next_gap_ack: - if (less_eq(seqno, acked)) { + if (less_eq(seqno, nacked)) { + if (is_uc) + goto release; + /* Skip packets peer has already acked */ + if (!more(seqno, r->acked)) + continue; + /* Get the next of last Gap ACK blocks */ + while (more(seqno, end)) { + if (!last_ga || si >= last_ga->bgack_cnt) + break; + start = end + offset + 1; + end = ntohs(last_ga->gacks[si].ack); + offset = ntohs(last_ga->gacks[si].gap); + si++; + WARN_ONCE(more(start, end) || + (!offset && + si < last_ga->bgack_cnt) || + si > MAX_GAP_ACK_BLKS, + "Corrupted Gap ACK: %d %d %d %d %d\n", + start, end, offset, si, + last_ga->bgack_cnt); + } + /* Check against the last Gap ACK block */ + if (in_range(seqno, start, end)) + continue; + /* Update/release the packet peer is acking */ + bc_has_acked = true; + if (--TIPC_SKB_CB(skb)->ackers) + continue; +release: /* release skb */ __skb_unlink(skb, &l->transmq); kfree_skb(skb); - released++; - } else if (less_eq(seqno, acked + gap)) { - /* First, check if repeated retrans failures occurs? 
*/ - if (!passed && link_retransmit_failure(l, l, &rc)) - return rc; - passed = true; - + } else if (less_eq(seqno, nacked + ngap)) { + /* First gap: check if repeated retrans failures? */ + if (unlikely(seqno == acked + 1 && + link_retransmit_failure(l, r, rc))) { + /* Ignore this bc Gap ACKs if any */ + kfree(this_ga); + this_ga = NULL; + break; + } /* retransmit skb if unrestricted*/ if (time_before(jiffies, TIPC_SKB_CB(skb)->nxt_retr)) continue; - TIPC_SKB_CB(skb)->nxt_retr = TIPC_UC_RETR_TIME; + TIPC_SKB_CB(skb)->nxt_retr = (is_uc) ? + TIPC_UC_RETR_TIME : TIPC_BC_RETR_LIM; _skb = pskb_copy(skb, GFP_ATOMIC); if (!_skb) continue; @@ -1516,25 +1600,50 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap, _skb->priority = TC_PRIO_CONTROL; __skb_queue_tail(xmitq, _skb); l->stats.retransmitted++; - retransmitted = true; + *retransmitted = true; /* Increase actual retrans counter & mark first time */ if (!TIPC_SKB_CB(skb)->retr_cnt++) TIPC_SKB_CB(skb)->retr_stamp = jiffies; } else { /* retry with Gap ACK blocks if any */ - if (!ga || n >= ga->gack_cnt) + if (n >= gack_cnt) break; - acked = ntohs(ga->gacks[n].ack); - gap = ntohs(ga->gacks[n].gap); + nacked = ntohs(gacks[n].ack); + ngap = ntohs(gacks[n].gap); n++; goto next_gap_ack; } } - if (released || retransmitted) - tipc_link_update_cwin(l, released, retransmitted); - if (released) - tipc_link_advance_backlog(l, xmitq); - return 0; + + /* Renew last Gap ACK blocks for bc if needed */ + if (bc_has_acked) { + if (this_ga) { + kfree(last_ga); + r->last_ga = this_ga; + r->last_gap = gap; + } else if (last_ga) { + if (less(acked, start)) { + si--; + offset = start - acked - 1; + } else if (less(acked, end)) { + acked = end; + } + if (si < last_ga->bgack_cnt) { + last_ga->start_index = si; + r->last_gap = offset; + } else { + kfree(last_ga); + r->last_ga = NULL; + r->last_gap = 0; + } + } else { + r->last_gap = 0; + } + r->acked = acked; + } else { + kfree(this_ga); + } + return 
skb_queue_len(&l->transmq) - qlen; } /* tipc_link_build_state_msg: prepare link state message for transmission @@ -1651,7 +1760,8 @@ int tipc_link_rcv(struct tipc_link *l, struct sk_buff *skb, kfree_skb(skb); break; } - released += tipc_link_release_pkts(l, msg_ack(hdr)); + released += tipc_link_advance_transmq(l, l, msg_ack(hdr), 0, + NULL, NULL, NULL, NULL); /* Defer delivery if sequence gap */ if (unlikely(seqno != rcv_nxt)) { @@ -1739,7 +1849,7 @@ static void tipc_link_build_proto_msg(struct tipc_link *l, int mtyp, bool probe, msg_set_probe(hdr, probe); msg_set_is_keepalive(hdr, probe || probe_reply); if (l->peer_caps & TIPC_GAP_ACK_BLOCK) - glen = tipc_build_gap_ack_blks(l, data, rcvgap); + glen = tipc_build_gap_ack_blks(l, hdr); tipc_mon_prep(l->net, data + glen, &dlen, mstate, l->bearer_id); msg_set_size(hdr, INT_H_SIZE + glen + dlen); skb_trim(skb, INT_H_SIZE + glen + dlen); @@ -2027,20 +2137,19 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, { struct tipc_msg *hdr = buf_msg(skb); struct tipc_gap_ack_blks *ga = NULL; - u16 rcvgap = 0; - u16 ack = msg_ack(hdr); - u16 gap = msg_seq_gap(hdr); + bool reply = msg_probe(hdr), retransmitted = false; + u16 dlen = msg_data_sz(hdr), glen = 0; u16 peers_snd_nxt = msg_next_sent(hdr); u16 peers_tol = msg_link_tolerance(hdr); u16 peers_prio = msg_linkprio(hdr); + u16 gap = msg_seq_gap(hdr); + u16 ack = msg_ack(hdr); u16 rcv_nxt = l->rcv_nxt; - u16 dlen = msg_data_sz(hdr); + u16 rcvgap = 0; int mtyp = msg_type(hdr); - bool reply = msg_probe(hdr); - u16 glen = 0; - void *data; + int rc = 0, released; char *if_name; - int rc = 0; + void *data; trace_tipc_proto_rcv(skb, false, l->name); if (tipc_link_is_blocked(l) || !xmitq) @@ -2137,13 +2246,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, } /* Receive Gap ACK blocks from peer if any */ - if (l->peer_caps & TIPC_GAP_ACK_BLOCK) { - ga = (struct tipc_gap_ack_blks *)data; - glen = ntohs(ga->len); - /* sanity check: if 
failed, ignore Gap ACK blocks */ - if (glen != tipc_gap_ack_blks_sz(ga->gack_cnt)) - ga = NULL; - } + glen = tipc_get_gap_ack_blks(&ga, l, hdr, true); tipc_mon_rcv(l->net, data + glen, dlen - glen, l->addr, &l->mon_state, l->bearer_id); @@ -2158,9 +2261,14 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb, tipc_link_build_proto_msg(l, STATE_MSG, 0, reply, rcvgap, 0, 0, xmitq); - rc |= tipc_link_advance_transmq(l, ack, gap, ga, xmitq); + released = tipc_link_advance_transmq(l, l, ack, gap, ga, xmitq, + &retransmitted, &rc); if (gap) l->stats.recv_nacks++; + if (released || retransmitted) + tipc_link_update_cwin(l, released, retransmitted); + if (released) + tipc_link_advance_backlog(l, xmitq); if (unlikely(!skb_queue_empty(&l->wakeupq))) link_prepare_wakeup(l); } @@ -2246,10 +2354,7 @@ void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr) int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, struct sk_buff_head *xmitq) { - struct tipc_link *snd_l = l->bc_sndlink; u16 peers_snd_nxt = msg_bc_snd_nxt(hdr); - u16 from = msg_bcast_ack(hdr) + 1; - u16 to = from + msg_bc_gap(hdr) - 1; int rc = 0; if (!link_is_up(l)) @@ -2271,8 +2376,6 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, if (more(peers_snd_nxt, l->rcv_nxt + l->window)) return rc; - rc = tipc_link_bc_retrans(snd_l, l, from, to, xmitq); - l->snd_nxt = peers_snd_nxt; if (link_bc_rcv_gap(l)) rc |= TIPC_LINK_SND_STATE; @@ -2307,38 +2410,28 @@ int tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr, return 0; } -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, - struct sk_buff_head *xmitq) +int tipc_link_bc_ack_rcv(struct tipc_link *r, u16 acked, u16 gap, + struct tipc_gap_ack_blks *ga, + struct sk_buff_head *xmitq) { - struct sk_buff *skb, *tmp; - struct tipc_link *snd_l = l->bc_sndlink; + struct tipc_link *l = r->bc_sndlink; + bool unused = false; + int rc = 0; - if (!link_is_up(l) || !l->bc_peer_is_up) - return; + 
if (!link_is_up(r) || !r->bc_peer_is_up) + return 0; - if (!more(acked, l->acked)) - return; + if (less(acked, r->acked) || (acked == r->acked && !gap && !ga)) + return 0; - trace_tipc_link_bc_ack(l, l->acked, acked, &snd_l->transmq); - /* Skip over packets peer has already acked */ - skb_queue_walk(&snd_l->transmq, skb) { - if (more(buf_seqno(skb), l->acked)) - break; - } + trace_tipc_link_bc_ack(r, r->acked, acked, &l->transmq); + tipc_link_advance_transmq(l, r, acked, gap, ga, xmitq, &unused, &rc); - /* Update/release the packets peer is acking now */ - skb_queue_walk_from_safe(&snd_l->transmq, skb, tmp) { - if (more(buf_seqno(skb), acked)) - break; - if (!--TIPC_SKB_CB(skb)->ackers) { - __skb_unlink(skb, &snd_l->transmq); - kfree_skb(skb); - } - } - l->acked = acked; - tipc_link_advance_backlog(snd_l, xmitq); - if (unlikely(!skb_queue_empty(&snd_l->wakeupq))) - link_prepare_wakeup(snd_l); + tipc_link_advance_backlog(l, xmitq); + if (unlikely(!skb_queue_empty(&l->wakeupq))) + link_prepare_wakeup(l); + + return rc; } /* tipc_link_bc_nack_rcv(): receive broadcast nack message @@ -2366,8 +2459,7 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb, return 0; if (dnode == tipc_own_addr(l->net)) { - tipc_link_bc_ack_rcv(l, acked, xmitq); - rc = tipc_link_bc_retrans(l->bc_sndlink, l, from, to, xmitq); + rc = tipc_link_bc_ack_rcv(l, acked, to - acked, NULL, xmitq); l->stats.recv_nacks++; return rc; } diff --git a/net/tipc/link.h b/net/tipc/link.h index d3c1c3fc1659..0a0fa7350722 100644 --- a/net/tipc/link.h +++ b/net/tipc/link.h @@ -143,8 +143,11 @@ int tipc_link_bc_peers(struct tipc_link *l); void tipc_link_set_mtu(struct tipc_link *l, int mtu); int tipc_link_mtu(struct tipc_link *l); int tipc_link_mss(struct tipc_link *l); -void tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, - struct sk_buff_head *xmitq); +u16 tipc_get_gap_ack_blks(struct tipc_gap_ack_blks **ga, struct tipc_link *l, + struct tipc_msg *hdr, bool uc); +int 
tipc_link_bc_ack_rcv(struct tipc_link *l, u16 acked, u16 gap, + struct tipc_gap_ack_blks *ga, + struct sk_buff_head *xmitq); void tipc_link_build_bc_sync_msg(struct tipc_link *l, struct sk_buff_head *xmitq); void tipc_link_bc_init_rcv(struct tipc_link *l, struct tipc_msg *hdr); diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 6d466ebdb64f..9a38f9c9d6eb 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -160,20 +160,26 @@ struct tipc_gap_ack { /* struct tipc_gap_ack_blks * @len: actual length of the record - * @gack_cnt: number of Gap ACK blocks in the record + * @bgack_cnt: number of Gap ACK blocks for broadcast in the record + * @ugack_cnt: number of Gap ACK blocks for unicast (following the broadcast + * ones) + * @start_index: starting index for "valid" broadcast Gap ACK blocks * @gacks: array of Gap ACK blocks */ struct tipc_gap_ack_blks { __be16 len; - u8 gack_cnt; - u8 reserved; + union { + u8 ugack_cnt; + u8 start_index; + }; + u8 bgack_cnt; struct tipc_gap_ack gacks[]; }; #define tipc_gap_ack_blks_sz(n) (sizeof(struct tipc_gap_ack_blks) + \ sizeof(struct tipc_gap_ack) * (n)) -#define MAX_GAP_ACK_BLKS 32 +#define MAX_GAP_ACK_BLKS 128 #define MAX_GAP_ACK_BLKS_SZ tipc_gap_ack_blks_sz(MAX_GAP_ACK_BLKS) static inline struct tipc_msg *buf_msg(struct sk_buff *skb) diff --git a/net/tipc/node.c b/net/tipc/node.c index 0c88778c88b5..eb6b62de81a7 100644 --- a/net/tipc/node.c +++ b/net/tipc/node.c @@ -2069,10 +2069,16 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b) le = &n->links[bearer_id]; /* Ensure broadcast reception is in synch with peer's send state */ - if (unlikely(usr == LINK_PROTOCOL)) + if (unlikely(usr == LINK_PROTOCOL)) { + if (unlikely(skb_linearize(skb))) { + tipc_node_put(n); + goto discard; + } + hdr = buf_msg(skb); tipc_node_bc_sync_rcv(n, hdr, bearer_id, &xmitq); - else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) + } else if (unlikely(tipc_link_acked(n->bc_entry.link) != bc_ack)) { 
tipc_bcast_ack_rcv(net, n->bc_entry.link, hdr); + } /* Receive packet directly if conditions permit */ tipc_node_read_lock(n); -- 2.13.7 |
From: Jon M. <jm...@re...> - 2020-03-12 13:32:57
On 3/12/20 2:38 AM, hoa...@de... wrote: > From: Hoang Le <hoa...@de...> > > Calling: > tipc_node_link_down()-> > - tipc_node_write_unlock()->tipc_mon_peer_down() > - tipc_mon_peer_down() > just after disabling bearer could be caused kernel oops. > > Fix this by adding a sanity check to make sure valid memory > access. > > Signed-off-by: Hoang Le <hoa...@de...> > --- > net/tipc/monitor.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c > index 58708b4c7719..6dce2abf436e 100644 > --- a/net/tipc/monitor.c > +++ b/net/tipc/monitor.c > @@ -322,9 +322,13 @@ static void mon_assign_roles(struct tipc_monitor *mon, struct tipc_peer *head) > void tipc_mon_remove_peer(struct net *net, u32 addr, int bearer_id) > { > struct tipc_monitor *mon = tipc_monitor(net, bearer_id); > - struct tipc_peer *self = get_self(net, bearer_id); > + struct tipc_peer *self; > struct tipc_peer *peer, *prev, *head; > > + if (!mon) > + return; > + > + self = get_self(net, bearer_id); > write_lock_bh(&mon->lock); > peer = get_peer(mon, addr); > if (!peer) > @@ -407,11 +411,15 @@ void tipc_mon_peer_up(struct net *net, u32 addr, int bearer_id) > void tipc_mon_peer_down(struct net *net, u32 addr, int bearer_id) > { > struct tipc_monitor *mon = tipc_monitor(net, bearer_id); > - struct tipc_peer *self = get_self(net, bearer_id); > + struct tipc_peer *self; > struct tipc_peer *peer, *head; > struct tipc_mon_domain *dom; > int applied; > > + if (!mon) > + return; > + > + self = get_self(net, bearer_id); > write_lock_bh(&mon->lock); > peer = get_peer(mon, addr); > if (!peer) { Acked-by: Jon Maloy <jm...@re...> -- |
From: <hoa...@de...> - 2020-03-12 06:38:50
From: Hoang Le <hoa...@de...> Calling: tipc_node_link_down()-> - tipc_node_write_unlock()->tipc_mon_peer_down() - tipc_mon_peer_down() just after disabling bearer could be caused kernel oops. Fix this by adding a sanity check to make sure valid memory access. Signed-off-by: Hoang Le <hoa...@de...> --- net/tipc/monitor.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c index 58708b4c7719..6dce2abf436e 100644 --- a/net/tipc/monitor.c +++ b/net/tipc/monitor.c @@ -322,9 +322,13 @@ static void mon_assign_roles(struct tipc_monitor *mon, struct tipc_peer *head) void tipc_mon_remove_peer(struct net *net, u32 addr, int bearer_id) { struct tipc_monitor *mon = tipc_monitor(net, bearer_id); - struct tipc_peer *self = get_self(net, bearer_id); + struct tipc_peer *self; struct tipc_peer *peer, *prev, *head; + if (!mon) + return; + + self = get_self(net, bearer_id); write_lock_bh(&mon->lock); peer = get_peer(mon, addr); if (!peer) @@ -407,11 +411,15 @@ void tipc_mon_peer_up(struct net *net, u32 addr, int bearer_id) void tipc_mon_peer_down(struct net *net, u32 addr, int bearer_id) { struct tipc_monitor *mon = tipc_monitor(net, bearer_id); - struct tipc_peer *self = get_self(net, bearer_id); + struct tipc_peer *self; struct tipc_peer *peer, *head; struct tipc_mon_domain *dom; int applied; + if (!mon) + return; + + self = get_self(net, bearer_id); write_lock_bh(&mon->lock); peer = get_peer(mon, addr); if (!peer) { -- 2.20.1 |
From: Jon M. <jm...@re...> - 2020-03-11 14:39:46
Looks good. Not sure if this is needed for utils, but still: Acked-by: Jon Maloy <jm...@re...> ///jon On 3/11/20 6:34 AM, Tuong Lien Tong wrote: > Resend this... It seemed to be dropped somehow... > > BR/Tuong > > -----Original Message----- > From: Tuong Lien <tuo...@de...> > Sent: Wednesday, February 19, 2020 2:42 PM > To: tip...@li...; jm...@re... > Cc: tip...@de...; Tuong Lien <tuo...@de...> > Subject: [PATCH] ptts: fix tipcTS failure in case of latency > > The 'ptts' test keeps failed when testing under high traffic with some > network latency. This is because the 'tipcTS' server side doesn't wait > long enough at its 'select()' call, just 1s+ and gets timeout. The > time variable is also not re-initiated after the 1st timeout, so the > next attempts just return shortly i.e. timeout = 0: > > ./tipcTS -v > ... > Received on 0 sockets in subtest 6, expected 2 > Received on 0 sockets in subtest 6, expected 2 > Received on 0 sockets in subtest 6, expected 2 > ===>Finished SubTest 7: received 0 msgs of sz -1 at 2 sockets (40 per > socket) > TEST FAILED Received wrong number of multicast messages > > The commit fixes the issue by increasing the timeout value to 3s and also > re-initiating it correctly. 
> > Signed-off-by: Tuong Lien <tuo...@de...> > --- > test/ptts/tipc_ts_server.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/test/ptts/tipc_ts_server.c b/test/ptts/tipc_ts_server.c > index 3a2f96f..e102c94 100644 > --- a/test/ptts/tipc_ts_server.c > +++ b/test/ptts/tipc_ts_server.c > @@ -610,7 +610,7 @@ void server_mcast > rcvbuf = malloc(66000); > buf = rcvbuf; > recvSyncTIPC (TS_SYNC_ID_3); /* wait for client to tell us to > start */ > - timeout.tv_sec = 1; > + timeout.tv_sec = 3; > timeout.tv_usec = 0; > dbg1("===>Starting SubTest %d\n", st); > > @@ -625,12 +625,12 @@ void server_mcast > while (sk_cnt < exp_sks ) { > fds = *readfds; > num_ready = select(FD_SETSIZE, &fds, NULL, NULL, > &timeout); > + timeout.tv_sec = 3; > if (!num_ready) { > printf("Received on %u sockets in subtest > %u, expected %u\n", > sk_cnt, st, exp_num[numSubTest]); > break; > } > - timeout.tv_sec = 1; > for (i = 0; i < TIPC_MCAST_SOCKETS; i++) { > > if(!FD_ISSET(sd[i], &fds)) -- |
From: Tuong L. T. <tuo...@de...> - 2020-03-11 10:34:14
Resend this... It seemed to be dropped somehow... BR/Tuong -----Original Message----- From: Tuong Lien <tuo...@de...> Sent: Wednesday, February 19, 2020 2:42 PM To: tip...@li...; jm...@re... Cc: tip...@de...; Tuong Lien <tuo...@de...> Subject: [PATCH] ptts: fix tipcTS failure in case of latency The 'ptts' test keeps failed when testing under high traffic with some network latency. This is because the 'tipcTS' server side doesn't wait long enough at its 'select()' call, just 1s+ and gets timeout. The time variable is also not re-initiated after the 1st timeout, so the next attempts just return shortly i.e. timeout = 0: ./tipcTS -v ... Received on 0 sockets in subtest 6, expected 2 Received on 0 sockets in subtest 6, expected 2 Received on 0 sockets in subtest 6, expected 2 ===>Finished SubTest 7: received 0 msgs of sz -1 at 2 sockets (40 per socket) TEST FAILED Received wrong number of multicast messages The commit fixes the issue by increasing the timeout value to 3s and also re-initiating it correctly. Signed-off-by: Tuong Lien <tuo...@de...> --- test/ptts/tipc_ts_server.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/test/ptts/tipc_ts_server.c b/test/ptts/tipc_ts_server.c index 3a2f96f..e102c94 100644 --- a/test/ptts/tipc_ts_server.c +++ b/test/ptts/tipc_ts_server.c @@ -610,7 +610,7 @@ void server_mcast rcvbuf = malloc(66000); buf = rcvbuf; recvSyncTIPC (TS_SYNC_ID_3); /* wait for client to tell us to start */ - timeout.tv_sec = 1; + timeout.tv_sec = 3; timeout.tv_usec = 0; dbg1("===>Starting SubTest %d\n", st); @@ -625,12 +625,12 @@ void server_mcast while (sk_cnt < exp_sks ) { fds = *readfds; num_ready = select(FD_SETSIZE, &fds, NULL, NULL, &timeout); + timeout.tv_sec = 3; if (!num_ready) { printf("Received on %u sockets in subtest %u, expected %u\n", sk_cnt, st, exp_num[numSubTest]); break; } - timeout.tv_sec = 1; for (i = 0; i < TIPC_MCAST_SOCKETS; i++) { if(!FD_ISSET(sd[i], &fds)) -- 2.1.4 |
From: Jon M. <jm...@re...> - 2020-03-05 19:01:22
Acked-by: Jon Maloy <jm...@re...> On 3/5/20 7:52 AM, Xue, Ying wrote: > Acked-by: Ying Xue <yin...@wi...> > > -----Original Message----- > From: Hoang Le [mailto:hoa...@de...] > Sent: Friday, February 21, 2020 12:49 PM > To: jm...@re...; ma...@do...; tip...@li...; Xue, Ying > Subject: [net-next] tipc: simplify trivial boolean return > > Checking and returning 'true' boolean is useless as it will be > returning at end of function > > Signed-off-by: Hoang Le <hoa...@de...> > --- > net/tipc/msg.c | 3 --- > 1 file changed, 3 deletions(-) > > diff --git a/net/tipc/msg.c b/net/tipc/msg.c > index 0d515d20b056..4d0e0bdd997b 100644 > --- a/net/tipc/msg.c > +++ b/net/tipc/msg.c > @@ -736,9 +736,6 @@ bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err) > msg_set_destport(msg, dport); > *err = TIPC_OK; > > - if (!skb_cloned(skb)) > - return true; > - > return true; > } > -- |
From: Xue, Y. <Yin...@wi...> - 2020-03-05 13:26:53
Acked-by: Ying Xue <yin...@wi...> -----Original Message----- From: Hoang Le [mailto:hoa...@de...] Sent: Friday, February 21, 2020 12:49 PM To: jm...@re...; ma...@do...; tip...@li...; Xue, Ying Subject: [net-next] tipc: simplify trivial boolean return Checking and returning 'true' boolean is useless as it will be returning at end of function Signed-off-by: Hoang Le <hoa...@de...> --- net/tipc/msg.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 0d515d20b056..4d0e0bdd997b 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -736,9 +736,6 @@ bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err) msg_set_destport(msg, dport); *err = TIPC_OK; - if (!skb_cloned(skb)) - return true; - return true; } -- 2.20.1 |
From: Hoang Le <hoa...@de...> - 2020-02-21 04:49:23
Checking and returning 'true' boolean is useless as it will be returning at end of function Signed-off-by: Hoang Le <hoa...@de...> --- net/tipc/msg.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 0d515d20b056..4d0e0bdd997b 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -736,9 +736,6 @@ bool tipc_msg_lookup_dest(struct net *net, struct sk_buff *skb, int *err) msg_set_destport(msg, dport); *err = TIPC_OK; - if (!skb_cloned(skb)) - return true; - return true; } -- 2.20.1 |
From: Xin L. <luc...@gm...> - 2020-02-20 20:11:04
On Wed, Feb 19, 2020 at 4:34 PM Dmitry Vyukov <dv...@go...> wrote: > > On Wed, Feb 19, 2020 at 9:29 AM Dmitry Vyukov <dv...@go...> wrote: > > > > On Mon, Aug 12, 2019 at 9:44 AM Ying Xue <yin...@wi...> wrote: > > > > > > syzbot found the following issue: > > > > > > [ 81.119772][ T8612] BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor834/8612 > > > [ 81.136212][ T8612] caller is dst_cache_get+0x3d/0xb0 > > > [ 81.141450][ T8612] CPU: 0 PID: 8612 Comm: syz-executor834 Not tainted 5.2.0-rc6+ #48 > > > [ 81.149435][ T8612] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 > > > [ 81.159480][ T8612] Call Trace: > > > [ 81.162789][ T8612] dump_stack+0x172/0x1f0 > > > [ 81.167123][ T8612] debug_smp_processor_id+0x251/0x280 > > > [ 81.172479][ T8612] dst_cache_get+0x3d/0xb0 > > > [ 81.176928][ T8612] tipc_udp_xmit.isra.0+0xc4/0xb80 > > > [ 81.182046][ T8612] ? kasan_kmalloc+0x9/0x10 > > > [ 81.186531][ T8612] ? tipc_udp_addr2str+0x170/0x170 > > > [ 81.191641][ T8612] ? __copy_skb_header+0x2e8/0x560 > > > [ 81.196750][ T8612] ? __skb_checksum_complete+0x3f0/0x3f0 > > > [ 81.202364][ T8612] ? netdev_alloc_frag+0x1b0/0x1b0 > > > [ 81.207452][ T8612] ? skb_copy_header+0x21/0x2b0 > > > [ 81.212282][ T8612] ? __pskb_copy_fclone+0x516/0xc90 > > > [ 81.217470][ T8612] tipc_udp_send_msg+0x29a/0x4b0 In tipc_bearer_xmit_skb(), b->media->send_msg()/tipc_udp_send_msg() is called under rcu_read_lock(), which is already ensure it's a non-preemptible context. What I saw here is imbalance rcu_read_(un)lock() call somewhere. > > > [ 81.222400][ T8612] tipc_bearer_xmit_skb+0x16c/0x360 > > > [ 81.227585][ T8612] tipc_enable_bearer+0xabe/0xd20 > > > [ 81.232606][ T8612] ? __nla_validate_parse+0x2d0/0x1ee0 > > > [ 81.238048][ T8612] ? tipc_bearer_xmit_skb+0x360/0x360 > > > [ 81.243401][ T8612] ? nla_memcpy+0xb0/0xb0 > > > [ 81.247710][ T8612] ? nla_memcpy+0xb0/0xb0 > > > [ 81.252020][ T8612] ? 
__nla_parse+0x43/0x60 > > > [ 81.256417][ T8612] __tipc_nl_bearer_enable+0x2de/0x3a0 > > > [ 81.261856][ T8612] ? __tipc_nl_bearer_enable+0x2de/0x3a0 > > > [ 81.267467][ T8612] ? tipc_nl_bearer_disable+0x40/0x40 > > > [ 81.272848][ T8612] ? unwind_get_return_address+0x58/0xa0 > > > [ 81.278501][ T8612] ? lock_acquire+0x16f/0x3f0 > > > [ 81.283190][ T8612] tipc_nl_bearer_enable+0x23/0x40 > > > [ 81.288300][ T8612] genl_family_rcv_msg+0x74b/0xf90 > > > [ 81.293404][ T8612] ? genl_unregister_family+0x790/0x790 > > > [ 81.298935][ T8612] ? __lock_acquire+0x54f/0x5490 > > > [ 81.303852][ T8612] ? __netlink_lookup+0x3fa/0x7b0 > > > [ 81.308865][ T8612] genl_rcv_msg+0xca/0x16c > > > [ 81.313266][ T8612] netlink_rcv_skb+0x177/0x450 > > > [ 81.318043][ T8612] ? genl_family_rcv_msg+0xf90/0xf90 > > > [ 81.323311][ T8612] ? netlink_ack+0xb50/0xb50 > > > [ 81.327906][ T8612] ? lock_acquire+0x16f/0x3f0 > > > [ 81.332589][ T8612] ? kasan_check_write+0x14/0x20 > > > [ 81.337511][ T8612] genl_rcv+0x29/0x40 > > > [ 81.341485][ T8612] netlink_unicast+0x531/0x710 > > > [ 81.346268][ T8612] ? netlink_attachskb+0x770/0x770 > > > [ 81.351374][ T8612] ? _copy_from_iter_full+0x25d/0x8c0 > > > [ 81.356765][ T8612] ? __sanitizer_cov_trace_cmp8+0x18/0x20 > > > [ 81.362479][ T8612] ? __check_object_size+0x3d/0x42f > > > [ 81.367667][ T8612] netlink_sendmsg+0x8ae/0xd70 > > > [ 81.372415][ T8612] ? netlink_unicast+0x710/0x710 > > > [ 81.377520][ T8612] ? aa_sock_msg_perm.isra.0+0xba/0x170 > > > [ 81.383051][ T8612] ? apparmor_socket_sendmsg+0x2a/0x30 > > > [ 81.388530][ T8612] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20 > > > [ 81.394775][ T8612] ? security_socket_sendmsg+0x8d/0xc0 > > > [ 81.400240][ T8612] ? netlink_unicast+0x710/0x710 > > > [ 81.405161][ T8612] sock_sendmsg+0xd7/0x130 > > > [ 81.409561][ T8612] ___sys_sendmsg+0x803/0x920 > > > [ 81.414220][ T8612] ? copy_msghdr_from_user+0x430/0x430 > > > [ 81.419667][ T8612] ? 
_raw_spin_unlock_irqrestore+0x6b/0xe0 > > > [ 81.425461][ T8612] ? debug_object_active_state+0x25d/0x380 > > > [ 81.431255][ T8612] ? __lock_acquire+0x54f/0x5490 > > > [ 81.436174][ T8612] ? kasan_check_read+0x11/0x20 > > > [ 81.441208][ T8612] ? _raw_spin_unlock_irqrestore+0xa4/0xe0 > > > [ 81.447008][ T8612] ? mark_held_locks+0xf0/0xf0 > > > [ 81.451768][ T8612] ? __call_rcu.constprop.0+0x28b/0x720 > > > [ 81.457298][ T8612] ? call_rcu+0xb/0x10 > > > [ 81.461353][ T8612] ? __sanitizer_cov_trace_const_cmp4+0x16/0x20 > > > [ 81.467589][ T8612] ? __fget_light+0x1a9/0x230 > > > [ 81.472249][ T8612] ? __fdget+0x1b/0x20 > > > [ 81.476301][ T8612] ? __sanitizer_cov_trace_const_cmp8+0x18/0x20 > > > [ 81.482545][ T8612] __sys_sendmsg+0x105/0x1d0 > > > [ 81.487115][ T8612] ? __ia32_sys_shutdown+0x80/0x80 > > > [ 81.492208][ T8612] ? blkcg_maybe_throttle_current+0x5e2/0xfb0 > > > [ 81.498272][ T8612] ? trace_hardirqs_on_thunk+0x1a/0x1c > > > [ 81.503726][ T8612] ? do_syscall_64+0x26/0x680 > > > [ 81.508385][ T8612] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > [ 81.514444][ T8612] ? 
do_syscall_64+0x26/0x680 > > > [ 81.519110][ T8612] __x64_sys_sendmsg+0x78/0xb0 > > > [ 81.523862][ T8612] do_syscall_64+0xfd/0x680 > > > [ 81.528352][ T8612] entry_SYSCALL_64_after_hwframe+0x49/0xbe > > > [ 81.534234][ T8612] RIP: 0033:0x444679 > > > [ 81.538114][ T8612] Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 > > > [ 81.557709][ T8612] RSP: 002b:00007fff0201a8b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e > > > [ 81.566147][ T8612] RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444679 > > > [ 81.574108][ T8612] RDX: 0000000000000000 RSI: 0000000020000580 RDI: 0000000000000003 > > > [ 81.582152][ T8612] RBP: 00000000006cf018 R08: 0000000000000001 R09: 00000000004002e0 > > > [ 81.590113][ T8612] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000402320 > > > [ 81.598089][ T8612] R13: 00000000004023b0 R14: 0000000000000000 R15: 0000000000 > > > > > > In commit e9c1a793210f ("tipc: add dst_cache support for udp media") > > > dst_cache_get() was introduced to be called in tipc_udp_xmit(). But > > > smp_processor_id() called by dst_cache_get() cannot be invoked in > > > preemptible context, as a result, the complaint above was reported. > > > > > > Fixes: e9c1a793210f ("tipc: add dst_cache support for udp media") > > > Reported-by: syz...@sy... > > > Signed-off-by: Hillf Danton <hd...@si...> > > > Signed-off-by: Ying Xue <yin...@wi...> > > > > Hi, > > > > Was this ever merged? > > The bug is still open, alive and kicking: > > https://syzkaller.appspot.com/bug?extid=1a68504d96cd17b33a05 > > > > and one of the top crashers currently. > > Along with few other top crashers, these bugs prevent most of the > > other kernel testing from happening. 
> > /\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\ > > +jmaloy new email address > > > > --- > > > net/tipc/udp_media.c | 12 +++++++++--- > > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > > > diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c > > > index 287df687..ca3ae2e 100644 > > > --- a/net/tipc/udp_media.c > > > +++ b/net/tipc/udp_media.c > > > @@ -224,6 +224,8 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb, > > > struct udp_bearer *ub; > > > int err = 0; > > > > > > + local_bh_disable(); > > > + > > > if (skb_headroom(skb) < UDP_MIN_HEADROOM) { > > > err = pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC); > > > if (err) > > > @@ -237,9 +239,12 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb, > > > goto out; > > > } > > > > > > - if (addr->broadcast != TIPC_REPLICAST_SUPPORT) > > > - return tipc_udp_xmit(net, skb, ub, src, dst, > > > - &ub->rcast.dst_cache); > > > + if (addr->broadcast != TIPC_REPLICAST_SUPPORT) { > > > + err = tipc_udp_xmit(net, skb, ub, src, dst, > > > + &ub->rcast.dst_cache); > > > + local_bh_enable(); > > > + return err; > > > + } > > > > > > /* Replicast, send an skb to each configured IP address */ > > > list_for_each_entry_rcu(rcast, &ub->rcast.list, list) { > > > @@ -259,6 +264,7 @@ static int tipc_udp_send_msg(struct net *net, struct sk_buff *skb, > > > err = 0; > > > out: > > > kfree_skb(skb); > > > + local_bh_enable(); > > > return err; > > > } > > > > > > -- > > > 2.7.4 > > > > > > -- > > > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group. > > > To unsubscribe from this group and stop receiving emails from it, send an email to syz...@go.... > > > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/1565595162-1383-4-git-send-email-ying.xue%40windriver.com. |
From: David M. <da...@da...> - 2020-02-10 09:25:35
From: Chen Wandun <che...@hu...> Date: Mon, 10 Feb 2020 16:11:09 +0800 > Fix the following sparse warning: > > net/tipc/node.c:281:6: warning: symbol 'tipc_node_free' was not declared. Should it be static? > net/tipc/node.c:2801:5: warning: symbol '__tipc_nl_node_set_key' was not declared. Should it be static? > net/tipc/node.c:2878:5: warning: symbol '__tipc_nl_node_flush_key' was not declared. Should it be static? > > Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication") > Fixes: e1f32190cf7d ("tipc: add support for AEAD key setting via netlink") > > Signed-off-by: Chen Wandun <che...@hu...> Applied. |
From: David M. <da...@da...> - 2020-02-10 09:25:32
From: Tuong Lien <tuo...@de...>
Date: Mon, 10 Feb 2020 15:35:44 +0700

> In commit 9546a0b7ce00 ("tipc: fix wrong connect() return code"), we
> fixed the issue with the 'connect()' that returns zero even though the
> connecting has failed by waiting for the connection to be 'ESTABLISHED'
> really. However, the approach has one drawback in conjunction with our
> 'lightweight' connection setup mechanism that the following scenario
> can happen:
 ...
> Upon the receipt of the server 'ACK', the client becomes 'ESTABLISHED'
> and the 'wait_for_conn()' process is woken up but not run. Meanwhile,
> the server starts to send a number of data following by a 'close()'
> shortly without waiting any response from the client, which then forces
> the client socket to be 'DISCONNECTING' immediately. When the wait
> process is switched to be running, it continues to wait until the timer
> expires because of the unexpected socket state. The client 'connect()'
> will finally get '-ETIMEDOUT' and force to release the socket whereas
> there remains the messages in its receive queue.
>
> Obviously the issue would not happen if the server had some delay prior
> to its 'close()' (or the number of 'DATA' messages is large enough),
> but any kind of delay would make the connection setup/shutdown "heavy".
> We solve this by simply allowing the 'connect()' returns zero in this
> particular case. The socket is already 'DISCONNECTING', so any further
> write will get '-EPIPE' but the socket is still able to read the
> messages existing in its receive queue.
>
> Note: This solution doesn't break the previous one as it deals with a
> different situation that the socket state is 'DISCONNECTING' but has no
> error (i.e. sk->sk_err = 0).
>
> Fixes: 9546a0b7ce00 ("tipc: fix wrong connect() return code")
> Acked-by: Ying Xue <yin...@wi...>
> Acked-by: Jon Maloy <jon...@er...>
> Signed-off-by: Tuong Lien <tuo...@de...>

Applied.
From: Tuong L. <tuo...@de...> - 2020-02-10 08:36:02
In commit 9546a0b7ce00 ("tipc: fix wrong connect() return code"), we
fixed the issue with the 'connect()' that returns zero even though the
connecting has failed by waiting for the connection to be 'ESTABLISHED'
really. However, the approach has one drawback in conjunction with our
'lightweight' connection setup mechanism that the following scenario
can happen:

          (server)                        (client)

 +- accept()|                 |  wait_for_conn()
 |          |                 |connect() -------+
 |          |<-------[SYN]----|                 |  > sleeping
 |          *CONNECTING       |                 |
 |--------->*ESTABLISHED      |                 |
            |--------[ACK]--->*ESTABLISHED      |  > wakeup()
     send()|--------[DATA]--->|\                |  > wakeup()
     send()|--------[DATA]--->| |               |  > wakeup()
       .   .         .        . |-> recvq       .
       .   .         .        . |               .
     send()|--------[DATA]--->|/                |  > wakeup()
    close()|--------[FIN]---->*DISCONNECTING    |
            *DISCONNECTING    |                 |
                              |  ~~~~~~~~~~~~~~~~~~> schedule()
                                                |  wait again
                                                .
                                                .
                                                |  ETIMEDOUT

Upon the receipt of the server 'ACK', the client becomes 'ESTABLISHED'
and the 'wait_for_conn()' process is woken up but not run. Meanwhile,
the server starts to send a number of data following by a 'close()'
shortly without waiting any response from the client, which then forces
the client socket to be 'DISCONNECTING' immediately. When the wait
process is switched to be running, it continues to wait until the timer
expires because of the unexpected socket state. The client 'connect()'
will finally get '-ETIMEDOUT' and force to release the socket whereas
there remains the messages in its receive queue.

Obviously the issue would not happen if the server had some delay prior
to its 'close()' (or the number of 'DATA' messages is large enough),
but any kind of delay would make the connection setup/shutdown "heavy".
We solve this by simply allowing the 'connect()' returns zero in this
particular case. The socket is already 'DISCONNECTING', so any further
write will get '-EPIPE' but the socket is still able to read the
messages existing in its receive queue.
Note: This solution doesn't break the previous one as it deals with a
different situation that the socket state is 'DISCONNECTING' but has no
error (i.e. sk->sk_err = 0).

Fixes: 9546a0b7ce00 ("tipc: fix wrong connect() return code")
Acked-by: Ying Xue <yin...@wi...>
Acked-by: Jon Maloy <jon...@er...>
Signed-off-by: Tuong Lien <tuo...@de...>
---
 net/tipc/socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index f9b4fb92c0b1..693e8902161e 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2441,6 +2441,8 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p)
 			return -ETIMEDOUT;
 		if (signal_pending(current))
 			return sock_intr_errno(*timeo_p);
+		if (sk->sk_state == TIPC_DISCONNECTING)
+			break;
 		add_wait_queue(sk_sleep(sk), &wait);
 		done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk),
--
2.13.7
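The two added lines change the wait loop's exit condition: a socket that has reached 'DISCONNECTING' with no pending error now ends the wait successfully instead of sleeping until the timer expires. The decision logic can be sketched in a small userspace C model (the enum values and the `wait_result` helper are illustrative, not the kernel code, which also handles signals and requeuing on the wait queue):

```c
#include <assert.h>

enum sock_state { CONNECTING, ESTABLISHED, DISCONNECTING };

/* Sketch of tipc_wait_for_connect() after the fix: an established
 * socket, or one that is disconnecting without an error, ends the
 * wait with success; a pending socket error is returned; otherwise
 * the caller keeps waiting until the timeout budget runs out. */
static int wait_result(enum sock_state state, int sk_err, int timeo)
{
	for (;;) {
		if (state == ESTABLISHED)
			return 0;
		if (sk_err)
			return -sk_err;          /* previous fix's path */
		if (state == DISCONNECTING)
			return 0;                /* the added check: data may
						  * still sit in the recvq */
		if (timeo-- <= 0)
			return -110;             /* -ETIMEDOUT, as before */
	}
}
```

This mirrors the commit's note: the earlier fix (error on a failed connect) still applies because `sk_err` is checked first; only the error-free 'DISCONNECTING' case is newly treated as a completed connect.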