From: Jon M. <jon...@er...> - 2019-08-26 14:49:44
|
> -----Original Message----- > From: Tuong Lien Tong <tuo...@de...> > Sent: 26-Aug-19 07:44 > To: Jon Maloy <jon...@er...>; 'Jon Maloy' > <ma...@do...> > Cc: Mohan Krishna Ghanta Krishnamurthy > <moh...@er...>; > par...@gm...; Tung Quang Nguyen > <tun...@de...>; Hoang Huu Le > <hoa...@de...>; Gordan Mihaljevic > <gor...@de...>; yin...@wi...; tipc- > dis...@li... > Subject: RE: [net-next 1/1] tipc: reduce risk of wakeup queue starvation > > Hi Jon, > > Yes, you are right, my previous patch was not complete (sorry, I have not > verified it but just wanted to give a general idea...). Actually, we could still > preserve the necessary data/header of the original message for building the > wakeup message later as needed (look, just the message 'dport' > is enough). dport *and* the original importance. I was considering this myself, but I didn't like it. However, I don't really like this approach because there is still an > issue there (see below). > > Your patch can fix the bug I mentioned earlier (i.e. unlimited bundles of small > messages...), but looks like it has a side-effect. We may again encounter the > starvation issue that we have tried to overcome by the previous patches, that > is, a socket user with a certain importance level messages can make the others > starved, in this case it's the 'CRITICAL' > level? That is a side effect I was aware of, but I don't see it as a showstopper. Normally, only a small percentage of the sockets are using the CRITICAL level, and the backlog queue limit is very high for this level. So, if we ever run into this limit we can be pretty sure it is due to bundling, in which case I think it is only fair to throttle the lower levels too. A much worse side effect is that my patch would block even SYSTEM level messages, which of course is unacceptable. So my patch has to be improved in that respect. > With the additional condition at the link_xmit(), we will limit the other > level users (i.e. report link congestion & cause them to wait...) just due to the > congestion at the 'CRITICAL' level (i.e. not their own levels) of the backlog > queue. Even, a "true" CRITICAL user that wants to send a message will face the > issue because the bundles of small messages at lower levels occupy all the > 'CRITICAL' slots... > > Really, I don't understand the purpose we set the importance level of a bundle > to 'CRITICAL', which even gives more slots for the "less important" > users with small messages... We have to give it *some* importance, and since it very often contains CRITICAL messages it looks to me that is the correct level. This also means that we can bundle a lot more messages, even of lower importance, before having to put users to sleep. An alternative might be to give the bundle message the importance of the highest-importance message inside the bundle. That would be better than putting all at CRITICAL level, but doesn't really solve our problem. > Isn't it by dividing & increasing the backlog level > limits, we want to give more chances for higher level users in message > sending? I think we should improve the bundle algorithm a little bit to reflect > the backlog level usages accurately instead. I will send you another patch... I have received your patch, and will study it. ///jon > > BR/Tuong > > -----Original Message----- > From: Jon Maloy <jon...@er...> > Sent: Saturday, August 24, 2019 9:29 PM > To: Jon Maloy <ma...@do...> > Cc: Mohan Krishna Ghanta Krishnamurthy > <moh...@er...>; > par...@gm...; Tung Quang Nguyen > <tun...@de...>; Hoang Huu Le > <hoa...@de...>; Tuong Tong Lien > <tuo...@de...>; Gordan Mihaljevic > <gor...@de...>; yin...@wi...; tipc- > dis...@li... > Subject: RE: [net-next 1/1] tipc: reduce risk of wakeup queue starvation > > Hi Tuong, > While experimenting with byte-oriented flow control I realized that this is a > very real problem that has to be fixed. > I first tried your suggestion with putting the congestion test at the end of > tipc_link_xmit(), but realized that we need access to the original message > header when we are scheduling a user to the wakeup queue. But this header is > already gone if original the message was bundled and deleted! > Also, there is no more space in the CB area for storing the per-level counter in > the bundle packets, as was my first suggestion. > > So, this was the simplest solution I could come up with. It seems to work well, > but seems to give a slight performance degradation. I am afraid we will have > to accept that for now. > > Please give feedback. > > ///jon > > > > > -----Original Message----- > > From: Jon Maloy <jon...@er...> > > Sent: 24-Aug-19 10:19 > > To: Jon Maloy <jon...@er...>; Jon Maloy > <ma...@do...> > > Cc: Mohan Krishna Ghanta Krishnamurthy > > <moh...@er...>; > > par...@gm...; Tung Quang Nguyen > > <tun...@de...>; Hoang Huu Le > > <hoa...@de...>; Tuong Tong Lien > > <tuo...@de...>; Gordan Mihaljevic > > <gor...@de...>; yin...@wi...; tipc- > > dis...@li... > > Subject: [net-next 1/1] tipc: reduce risk of wakeup queue starvation > > > > We have identified a problem with the "oversubscription" policy in the > link > > transmission code. > > > > When small messages are transmitted, and the sending link has reached > > the transmit window limit, those messages will be bundled and put into > > the > link > > backlog queue. However, bundles of data messages are counted at the > > 'CRITICAL' level, so that the counter for that level, instead of the > counter for > > the real, bundled message's level is the one being increased. > > Subsequent, to-be-bundled data messagea at non-CRITICAL levels > > continue to be tested against the unchanged counter for their own > > level, while contributing to an unrestrained increase at the CRITICAL backlog > level. > > > > This leaves a gap in congestion control algorithm for small messages, > > and > may > > eventually lead to buffer exhaustion and link reset. > > > > We fix this by adding a test for congestion at the CRITICAL level, as > > well > as the > > existing testing for the message's own level, whenever a message is > > transmitted. We also refuse to notify any waiting users as long as > congestion at > > the CRITICAL level exists. > > > > Reported-by: Tuong Lien <tuo...@de...> > > Signed-off-by: Jon Maloy <jon...@er...> > > --- > > net/tipc/link.c | 18 +++++++++++++----- > > 1 file changed, 13 insertions(+), 5 deletions(-) > > > > diff --git a/net/tipc/link.c b/net/tipc/link.c index 6cc75ff..25a6acb > 100644 > > --- a/net/tipc/link.c > > +++ b/net/tipc/link.c > > @@ -77,6 +77,11 @@ struct tipc_stats { > > u32 msg_length_profile[7]; /* used for msg. length profiling */ }; > > > > +struct tipc_backlog { > > + u16 len; > > + u16 limit; > > +}; > > + > > /** > > * struct tipc_link - TIPC link data structure > > * @addr: network address of link's peer node @@ -157,10 +162,7 @@ > > struct tipc_link { > > /* Sending */ > > struct sk_buff_head transmq; > > struct sk_buff_head backlogq; > > - struct { > > - u16 len; > > - u16 limit; > > - } backlog[5]; > > + struct tipc_backlog backlog[5]; > > u16 snd_nxt; > > u16 window; > > > > @@ -862,6 +864,9 @@ static void link_prepare_wakeup(struct tipc_link *l) > > for (; imp <= TIPC_SYSTEM_IMPORTANCE; imp++) > > avail[imp] = l->backlog[imp].limit - l->backlog[imp].len; > > > > + if (avail[TIPC_CRITICAL_IMPORTANCE] <= 0) > > + return; > > + > > skb_queue_walk_safe(wakeupq, skb, tmp) { > > imp = TIPC_SKB_CB(skb)->chain_imp; > > if (avail[imp] <= 0) > > @@ -949,6 +954,7 @@ int tipc_link_xmit(struct tipc_link *l, struct > > sk_buff_head *list, > > struct sk_buff_head *backlogq = &l->backlogq; > > struct sk_buff *skb, *_skb, *bskb; > > int pkt_cnt = skb_queue_len(list); > > + struct tipc_backlog *bklog = l->backlog; > > int rc = 0; > > > > if (unlikely(msg_size(hdr) > mtu)) { @@ -960,7 +966,9 @@ int > > tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list, > > } > > > > /* Allow oversubscription of one data msg per source at congestion > */ > > - if (unlikely(l->backlog[imp].len >= l->backlog[imp].limit)) { > > + if (bklog[TIPC_CRITICAL_IMPORTANCE].len >= > > + bklog[TIPC_CRITICAL_IMPORTANCE].limit || > > + bklog[imp].len >= bklog[imp].limit) { > > if (imp == TIPC_SYSTEM_IMPORTANCE) { > > pr_warn("%s<%s>, link overflow", link_rst_msg, > l->name); > > return -ENOBUFS; > > -- > > 2.1.4 > |