From: Jon M. <jon...@er...> - 2019-08-24 14:44:40
|
Hi Tuong, While experimenting with byte-oriented flow control I realized that this is a very real problem that has to be fixed. I first tried your suggestion with putting the congestion test at the end of tipc_link_xmit(), but realized that we need access to the original message header when we are scheduling a user to the wakeup queue. But this header is already gone if original the message was bundled and deleted! Also, there is no more space in the CB area for storing the per-level counter in the bundle packets, as was my first suggestion. So, this was the simplest solution I could come up with. It seems to work well, but seems to give a slight performance degradation. I am afraid we will have to accept that for now. Please give feedback. ///jon > -----Original Message----- > From: Jon Maloy <jon...@er...> > Sent: 24-Aug-19 10:19 > To: Jon Maloy <jon...@er...>; Jon Maloy > <ma...@do...> > Cc: Mohan Krishna Ghanta Krishnamurthy > <moh...@er...>; > par...@gm...; Tung Quang Nguyen > <tun...@de...>; Hoang Huu Le > <hoa...@de...>; Tuong Tong Lien > <tuo...@de...>; Gordan Mihaljevic > <gor...@de...>; yin...@wi...; tipc- > dis...@li... > Subject: [net-next 1/1] tipc: reduce risk of wakeup queue starvation > > We have identified a problem with the "oversubscription" policy in the link > transmission code. > > When small messages are transmitted, and the sending link has reached the > transmit window limit, those messages will be bundled and put into the link > backlog queue. However, bundles of data messages are counted at the > 'CRITICAL' level, so that the counter for that level, instead of the counter for > the real, bundled message's level is the one being increased. > Subsequent, to-be-bundled data messagea at non-CRITICAL levels continue to > be tested against the unchanged counter for their own level, while > contributing to an unrestrained increase at the CRITICAL backlog level. > > This leaves a gap in congestion control algorithm for small messages, and may > eventually lead to buffer exhaustion and link reset. > > We fix this by adding a test for congestion at the CRITICAL level, as well as the > existing testing for the message's own level, whenever a message is > transmitted. We also refuse to notify any waiting users as long as congestion at > the CRITICAL level exists. > > Reported-by: Tuong Lien <tuo...@de...> > Signed-off-by: Jon Maloy <jon...@er...> > --- > net/tipc/link.c | 18 +++++++++++++----- > 1 file changed, 13 insertions(+), 5 deletions(-) > > diff --git a/net/tipc/link.c b/net/tipc/link.c index 6cc75ff..25a6acb 100644 > --- a/net/tipc/link.c > +++ b/net/tipc/link.c > @@ -77,6 +77,11 @@ struct tipc_stats { > u32 msg_length_profile[7]; /* used for msg. length profiling */ }; > > +struct tipc_backlog { > + u16 len; > + u16 limit; > +}; > + > /** > * struct tipc_link - TIPC link data structure > * @addr: network address of link's peer node @@ -157,10 +162,7 @@ > struct tipc_link { > /* Sending */ > struct sk_buff_head transmq; > struct sk_buff_head backlogq; > - struct { > - u16 len; > - u16 limit; > - } backlog[5]; > + struct tipc_backlog backlog[5]; > u16 snd_nxt; > u16 window; > > @@ -862,6 +864,9 @@ static void link_prepare_wakeup(struct tipc_link *l) > for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++) > avail[imp] = l->backlog[imp].limit - l->backlog[imp].len; > > + if (avail[TIPC_CRITICAL_IMPORTANCE] <= 0) > + return; > + > skb_queue_walk_safe(wakeupq, skb, tmp) { > imp = TIPC_SKB_CB(skb)->chain_imp; > if (avail[imp] <= 0) > @@ -949,6 +954,7 @@ int tipc_link_xmit(struct tipc_link *l, struct > sk_buff_head *list, > struct sk_buff_head *backlogq = &l->backlogq; > struct sk_buff *skb, *_skb, *bskb; > int pkt_cnt = skb_queue_len(list); > + struct tipc_backlog *bklog = l->backlog; > int rc = 0; > > if (unlikely(msg_size(hdr) > mtu)) { > @@ -960,7 +966,9 @@ int tipc_link_xmit(struct tipc_link *l, struct > sk_buff_head *list, > } > > /* Allow oversubscription of one data msg per source at congestion */ > - if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) { > + if (bklog[TIPC_CRITICAL_IMPORTANCE].len >= > + bklog[TIPC_CRITICAL_IMPORTANCE].limit || > + bklog[imp].len >= bklog[imp].limit) { > if (imp == TIPC_SYSTEM_IMPORTANCE) { > pr_warn("%s<%s>, link overflow", link_rst_msg, l->name); > return -ENOBUFS; > -- > 2.1.4 |