From: Tuong L. T. <tuo...@de...> - 2019-08-26 11:44:17
Hi Jon,

Yes, you are right, my previous patch was not complete (sorry, I had not verified it but just wanted to give a general idea...). Actually, we could still preserve the necessary data/header of the original message for building the wakeup message later as needed (in fact, just the message 'dport' is enough). However, I don't really like this approach because there is still an issue with it (see below).

Your patch can fix the bug I mentioned earlier (i.e. unlimited bundles of small messages...), but it looks like it has a side effect. We may again encounter the starvation issue that we tried to overcome with the previous patches, that is, a socket user sending messages at a certain importance level can starve the others, in this case at the 'CRITICAL' level. With the additional condition in tipc_link_xmit(), we will limit users at the other levels (i.e. report link congestion and cause them to wait...) just because of congestion at the 'CRITICAL' level of the backlog queue (i.e. not at their own levels). Even a "true" CRITICAL user that wants to send a message will face the issue, because the bundles of small messages at lower levels occupy all the 'CRITICAL' slots...

Really, I don't understand why we set the importance level of a bundle to 'CRITICAL', which even gives more slots to the "less important" users with small messages... Isn't it by dividing and increasing the backlog level limits that we want to give higher-level users more chances to send messages? I think we should instead improve the bundling algorithm a little bit to reflect the backlog level usage accurately. I will send you another patch...

BR/Tuong

-----Original Message-----
From: Jon Maloy <jon...@er...>
Sent: Saturday, August 24, 2019 9:29 PM
To: Jon Maloy <ma...@do...>
Cc: Mohan Krishna Ghanta Krishnamurthy <moh...@er...>; par...@gm...; Tung Quang Nguyen <tun...@de...>; Hoang Huu Le <hoa...@de...>; Tuong Tong Lien <tuo...@de...>; Gordan Mihaljevic <gor...@de...>; yin...@wi...; tip...@li...
Subject: RE: [net-next 1/1] tipc: reduce risk of wakeup queue starvation

Hi Tuong,

While experimenting with byte-oriented flow control I realized that this is a very real problem that has to be fixed. I first tried your suggestion of putting the congestion test at the end of tipc_link_xmit(), but realized that we need access to the original message header when we are scheduling a user to the wakeup queue. But this header is already gone if the original message was bundled and deleted! Also, there is no more space in the CB area for storing the per-level counter in the bundle packets, as was my first suggestion.

So, this was the simplest solution I could come up with. It seems to work well, but appears to give a slight performance degradation. I am afraid we will have to accept that for now. Please give feedback.

///jon

> -----Original Message-----
> From: Jon Maloy <jon...@er...>
> Sent: 24-Aug-19 10:19
> To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
> Cc: Mohan Krishna Ghanta Krishnamurthy <moh...@er...>; par...@gm...; Tung Quang Nguyen <tun...@de...>; Hoang Huu Le <hoa...@de...>; Tuong Tong Lien <tuo...@de...>; Gordan Mihaljevic <gor...@de...>; yin...@wi...; tipc-dis...@li...
> Subject: [net-next 1/1] tipc: reduce risk of wakeup queue starvation
>
> We have identified a problem with the "oversubscription" policy in the link transmission code.
>
> When small messages are transmitted and the sending link has reached the transmit window limit, those messages will be bundled and put into the link backlog queue. However, bundles of data messages are counted at the 'CRITICAL' level, so that the counter for that level, instead of the counter for the real, bundled message's level, is the one being increased. Subsequent to-be-bundled data messages at non-CRITICAL levels continue to be tested against the unchanged counter for their own level, while contributing to an unrestrained increase at the CRITICAL backlog level.
>
> This leaves a gap in the congestion control algorithm for small messages, and may eventually lead to buffer exhaustion and link reset.
>
> We fix this by adding a test for congestion at the CRITICAL level, as well as the existing test for the message's own level, whenever a message is transmitted. We also refuse to notify any waiting users as long as congestion at the CRITICAL level exists.
>
> Reported-by: Tuong Lien <tuo...@de...>
> Signed-off-by: Jon Maloy <jon...@er...>
> ---
>  net/tipc/link.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index 6cc75ff..25a6acb 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -77,6 +77,11 @@ struct tipc_stats {
>  	u32 msg_length_profile[7]; /* used for msg. length profiling */
>  };
>
> +struct tipc_backlog {
> +	u16 len;
> +	u16 limit;
> +};
> +
>  /**
>   * struct tipc_link - TIPC link data structure
>   * @addr: network address of link's peer node
> @@ -157,10 +162,7 @@ struct tipc_link {
>  	/* Sending */
>  	struct sk_buff_head transmq;
>  	struct sk_buff_head backlogq;
> -	struct {
> -		u16 len;
> -		u16 limit;
> -	} backlog[5];
> +	struct tipc_backlog backlog[5];
>  	u16 snd_nxt;
>  	u16 window;
>
> @@ -862,6 +864,9 @@ static void link_prepare_wakeup(struct tipc_link *l)
>  	for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++)
>  		avail[imp] = l->backlog[imp].limit - l->backlog[imp].len;
>
> +	if (avail[TIPC_CRITICAL_IMPORTANCE] <= 0)
> +		return;
> +
>  	skb_queue_walk_safe(wakeupq, skb, tmp) {
>  		imp = TIPC_SKB_CB(skb)->chain_imp;
>  		if (avail[imp] <= 0)
> @@ -949,6 +954,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>  	struct sk_buff_head *backlogq = &l->backlogq;
>  	struct sk_buff *skb, *_skb, *bskb;
>  	int pkt_cnt = skb_queue_len(list);
> +	struct tipc_backlog *bklog = l->backlog;
>  	int rc = 0;
>
>  	if (unlikely(msg_size(hdr) > mtu)) {
> @@ -960,7 +966,9 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>  	}
>
>  	/* Allow oversubscription of one data msg per source at congestion */
> -	if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) {
> +	if (bklog[TIPC_CRITICAL_IMPORTANCE].len >=
> +	    bklog[TIPC_CRITICAL_IMPORTANCE].limit ||
> +	    bklog[imp].len >= bklog[imp].limit) {
>  		if (imp == TIPC_SYSTEM_IMPORTANCE) {
>  			pr_warn("%s<%s>, link overflow", link_rst_msg, l->name);
>  			return -ENOBUFS;
> --
> 2.1.4