From: Jon M. <jon...@er...> - 2019-12-02 03:42:47

Just so there is no misunderstanding: this is a completely new version of the
variable congestion window algorithm, comprising slow start, congestion
avoidance and fast recovery. Perhaps I should have added that v5 is similar
to it. Performance is consistently better than in the previous versions.

///jon

> -----Original Message-----
> From: Jon Maloy <jon...@er...>
> Sent: 1-Dec-19 19:32
> To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
> Cc: Mohan Krishna Ghanta Krishnamurthy <moh...@er...>;
> par...@gm...; Tung Quang Nguyen <tun...@de...>; Hoang
> Huu Le <hoa...@de...>; Tuong Tong Lien <tuo...@de...>; Gordan
> Mihaljevic <gor...@de...>; yin...@wi...; tipc-
> dis...@li...
> Subject: [net-next 0/3] tipc: introduce variable window congestion control
>
> We improve throughput greatly by introducing a variant of the Reno
> congestion control algorithm at the link level.
>
> This algorithm is similar to one Xin developed last summer, but is
> re-adapted to the latest changes at the link level and leverages the
> special features of the TIPC link protocol, such as the explicit
> nacks sent when a packet loss is detected.
>
> Jon Maloy (3):
>   tipc: eliminate gap indicator from ACK messages
>   tipc: eliminate more unnecessary nacks and retransmissions
>   tipc: introduce variable window congestion control
>
>  net/tipc/bcast.c     |  11 +--
>  net/tipc/bearer.c    |  11 +--
>  net/tipc/bearer.h    |   6 +-
>  net/tipc/eth_media.c |   3 +-
>  net/tipc/ib_media.c  |   5 +-
>  net/tipc/link.c      | 191 ++++++++++++++++++++++++++++++++++++---------------
>  net/tipc/link.h      |   9 +--
>  net/tipc/node.c      |  16 +++--
>  net/tipc/udp_media.c |   3 +-
>  9 files changed, 172 insertions(+), 83 deletions(-)
>
> --
> 2.1.4
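The window rules described in this thread (additive increase per 32 non-duplicate ACKs, halving back toward the floor on a NACK) can be sketched in plain user-space C. This is a minimal illustration only: all names here are invented for the example, and the real TIPC code keeps this state in struct tipc_link alongside the slow-start and fast-recovery logic mentioned above.

```c
#include <assert.h>

/* Illustrative model of the variable-window rules (invented names). */
struct link_win {
	unsigned short window;  /* current congestion window            */
	unsigned short min_win; /* hard lower limit, per bearer type    */
	unsigned short max_win; /* hard upper limit, per bearer type    */
	unsigned short acks;    /* non-duplicate ACKs counted so far    */
};

/* Additive increase: grow by one packet on every 32nd ACK, capped. */
static void link_on_ack(struct link_win *l)
{
	if (l->acks++ % 32 == 0 && l->window < l->max_win)
		l->window++;
}

/* Decrease on NACK: shrink halfway back toward, never below, min_win. */
static void link_on_nack(struct link_win *l)
{
	l->window = l->min_win + (l->window - l->min_win) / 2;
}
```

With a floor of 50 and a ceiling of 500, 64 ACKs grow the window by two packets, while each NACK halves the distance to the floor.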
From: Tuong L. <tuo...@de...> - 2019-11-30 10:19:12

When a user message is sent, TIPC first checks whether the socket has
encountered congestion at the link layer. If so, it sleeps, waiting for the
congestion to abate. Since the socket is released during this sleep, other
users (e.g. other threads) can take over the socket in the meantime. Also,
in the connectionless case (e.g. SOCK_RDM), a user is free to send messages
to various destinations (e.g. via 'sendto()'), so the socket's preformatted
header has to be updated accordingly before the actual payload message is
built.

Unfortunately, the header update is currently done before the congestion
wait, which creates a race condition: the destination recorded in the header
for a given message can be modified in the middle, so the message is built
with the wrong destination. Consequently, when the message reaches the link
layer, it gets stuck there forever, because the peer node simply rejects it.
After a number of retransmission attempts, the link is eventually taken down
and the retransmission failure is reported.

This commit fixes the problem by rearranging the order of actions so the
race cannot occur: the message building is 'atomic', and its header cannot
be modified by anyone in between.

Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Signed-off-by: Tuong Lien <tuo...@de...>
---
 net/tipc/socket.c | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a1c8d722ca20..9b0280a562a4 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1361,8 +1361,8 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
 	struct tipc_msg *hdr = &tsk->phdr;
 	struct tipc_name_seq *seq;
 	struct sk_buff_head pkts;
-	u32 dport, dnode = 0;
-	u32 type, inst;
+	u32 dport = 0, dnode = 0;
+	u32 type = 0, inst = 0;
 	int mtu, rc;
 
 	if (unlikely(dlen > TIPC_MAX_USER_MSG_SIZE))
@@ -1415,23 +1415,11 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
 		type = dest->addr.name.name.type;
 		inst = dest->addr.name.name.instance;
 		dnode = dest->addr.name.domain;
-		msg_set_type(hdr, TIPC_NAMED_MSG);
-		msg_set_hdr_sz(hdr, NAMED_H_SIZE);
-		msg_set_nametype(hdr, type);
-		msg_set_nameinst(hdr, inst);
-		msg_set_lookup_scope(hdr, tipc_node2scope(dnode));
 		dport = tipc_nametbl_translate(net, type, inst, &dnode);
-		msg_set_destnode(hdr, dnode);
-		msg_set_destport(hdr, dport);
 		if (unlikely(!dport && !dnode))
 			return -EHOSTUNREACH;
 	} else if (dest->addrtype == TIPC_ADDR_ID) {
 		dnode = dest->addr.id.node;
-		msg_set_type(hdr, TIPC_DIRECT_MSG);
-		msg_set_lookup_scope(hdr, 0);
-		msg_set_destnode(hdr, dnode);
-		msg_set_destport(hdr, dest->addr.id.ref);
-		msg_set_hdr_sz(hdr, BASIC_H_SIZE);
 	} else {
 		return -EINVAL;
 	}
@@ -1442,6 +1430,22 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
 	if (unlikely(rc))
 		return rc;
 
+	if (dest->addrtype == TIPC_ADDR_NAME) {
+		msg_set_type(hdr, TIPC_NAMED_MSG);
+		msg_set_hdr_sz(hdr, NAMED_H_SIZE);
+		msg_set_nametype(hdr, type);
+		msg_set_nameinst(hdr, inst);
+		msg_set_lookup_scope(hdr, tipc_node2scope(dnode));
+		msg_set_destnode(hdr, dnode);
+		msg_set_destport(hdr, dport);
+	} else { /* TIPC_ADDR_ID */
+		msg_set_type(hdr, TIPC_DIRECT_MSG);
+		msg_set_lookup_scope(hdr, 0);
+		msg_set_destnode(hdr, dnode);
+		msg_set_destport(hdr, dest->addr.id.ref);
+		msg_set_hdr_sz(hdr, BASIC_H_SIZE);
+	}
+
 	__skb_queue_head_init(&pkts);
 	mtu = tipc_node_get_mtu(net, dnode, tsk->portid, false);
 	rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
-- 
2.13.7
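The ordering bug fixed above can be condensed into a tiny single-threaded model. This is purely illustrative (the names are made up, and the real race involves a second thread grabbing the released socket during the congestion sleep), but it shows why writing the shared header before the wait is unsafe while writing it after is not.

```c
#include <assert.h>

/* A shared, preformatted per-socket header that a concurrent sender
 * may rewrite while the first sender sleeps on link congestion.
 * 'interloper' simulates what that other sender writes. */
struct sock_model { int hdr_dest; };

/* Old order: write header, then wait (socket released), then build. */
static int send_old_order(struct sock_model *s, int dest, int interloper)
{
	s->hdr_dest = dest;       /* 1. update preformatted header        */
	s->hdr_dest = interloper; /* 2. congestion sleep: another user    */
	                          /*    takes the socket and rewrites it  */
	return s->hdr_dest;       /* 3. message built from header: wrong! */
}

/* Fixed order: wait first, then update header and build atomically. */
static int send_new_order(struct sock_model *s, int dest, int interloper)
{
	s->hdr_dest = interloper; /* 1. congestion sleep; interference is */
	                          /*    harmless at this point            */
	s->hdr_dest = dest;       /* 2. header update and message build   */
	return s->hdr_dest;       /*    now happen back to back           */
}
```

In the old order the built message carries the interloper's destination; in the new order it always carries the caller's.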
From: David M. <da...@da...> - 2019-11-29 07:09:47

From: Tung Nguyen <tun...@de...>
Date: Thu, 28 Nov 2019 10:10:04 +0700

> This series fixes some bugs at socket layer.

Series applied, thanks.
From: Tung N. <tun...@de...> - 2019-11-28 03:10:25

Scenario:

1. A client socket initiates a SYN message to a listening socket.
2. The send link is congested: the SYN message is put in the send link and a
   wakeup message is put in the wakeup queue.
3. The congestion abates and the wakeup message is pulled out of the wakeup
   queue. Function tipc_sk_push_backlog() is called to send out messages
   delayed by Nagle. However, the client socket is still in CONNECTING state,
   so it sends the SYN message in the socket write queue to the listening
   socket again.
4. The listening socket receives the first SYN message and creates the first
   server socket. The client socket receives ACK- and establishes a
   connection to the first server socket. The client socket then closes its
   connection with the first server socket.
5. The listening socket receives the second SYN message and creates a second
   server socket. The second server socket sends ACK- to the client socket,
   but that connection has already been closed. This results in a connection
   reset error when reading from the server socket in user space.

Solution: return from function tipc_sk_push_backlog() immediately if there
is a pending SYN message in the socket write queue.

Fixes: c0bceb97db9e ("tipc: add smart nagle feature")
Signed-off-by: Tung Nguyen <tun...@de...>
Acked-by: Jon Maloy <jon...@er...>
---
 net/tipc/socket.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index da5fb84852a6..41688da233ab 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -540,12 +540,10 @@ static void __tipc_shutdown(struct socket *sock, int error)
 	tipc_wait_for_cond(sock, &timeout, (!tsk->cong_link_cnt &&
 					    !tsk_conn_cong(tsk)));
 
-	/* Push out unsent messages or remove if pending SYN */
-	skb = skb_peek(&sk->sk_write_queue);
-	if (skb && !msg_is_syn(buf_msg(skb)))
-		tipc_sk_push_backlog(tsk);
-	else
-		__skb_queue_purge(&sk->sk_write_queue);
+	/* Push out delayed messages if in Nagle mode */
+	tipc_sk_push_backlog(tsk);
+	/* Remove pending SYN */
+	__skb_queue_purge(&sk->sk_write_queue);
 
 	/* Reject all unreceived messages, except on an active connection
 	 * (which disconnects locally & sends a 'FIN+' to peer).
@@ -1248,9 +1246,14 @@ static void tipc_sk_push_backlog(struct tipc_sock *tsk)
 	struct sk_buff_head *txq = &tsk->sk.sk_write_queue;
 	struct net *net = sock_net(&tsk->sk);
 	u32 dnode = tsk_peer_node(tsk);
+	struct sk_buff *skb = skb_peek(txq);
 	int rc;
 
-	if (skb_queue_empty(txq) || tsk->cong_link_cnt)
+	if (!skb || tsk->cong_link_cnt)
+		return;
+
+	/* Do not send SYN again after congestion */
+	if (msg_is_syn(buf_msg(skb)))
 		return;
 
 	tsk->snt_unacked += tsk->snd_backlog;
-- 
2.17.1
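The guard added above can be modeled in a few lines of user-space C. The types and names below are invented for the illustration; the point is only that a Nagle push must be a no-op while the head of the write queue is still an unacknowledged SYN, so a connecting socket never transmits its SYN twice after congestion.

```c
#include <assert.h>

enum kind { MSG_SYN, MSG_DATA };

/* Toy model of a socket write queue (invented types). */
struct txq_model {
	enum kind head; /* kind of the first queued message */
	int       len;  /* number of queued messages        */
	int       sent; /* messages pushed to the link      */
};

static void push_backlog(struct txq_model *q, int cong_link_cnt)
{
	if (q->len == 0 || cong_link_cnt)
		return;
	if (q->head == MSG_SYN)  /* do not send the SYN a second time */
		return;
	q->sent += q->len;       /* push out delayed (Nagle) messages */
	q->len = 0;
}
```

A queue headed by a SYN stays untouched, while ordinary data is pushed as before once congestion has cleared.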
From: Tung N. <tun...@de...> - 2019-11-28 03:10:25

When initiating a connection message to a server side, the connection
message is cloned and added to the socket write queue. However, if the
cloning fails, only the socket write queue is purged. This causes a memory
leak, because the original connection message is never freed. This commit
fixes it by purging the packet list when the message cannot be cloned.

Fixes: 6787927475e5 ("tipc: buffer overflow handling in listener socket")
Reported-by: Hoang Le <hoa...@de...>
Signed-off-by: Tung Nguyen <tun...@de...>
Acked-by: Ying Xue <yin...@wi...>
Acked-by: Jon Maloy <jon...@er...>
---
 net/tipc/socket.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a1c8d722ca20..7baed2c2c93d 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1447,8 +1447,10 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
 	rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
 	if (unlikely(rc != dlen))
 		return rc;
-	if (unlikely(syn && !tipc_msg_skb_clone(&pkts, &sk->sk_write_queue)))
+	if (unlikely(syn && !tipc_msg_skb_clone(&pkts, &sk->sk_write_queue))) {
+		__skb_queue_purge(&pkts);
 		return -ENOMEM;
+	}
 
 	trace_tipc_sk_sendmsg(sk, skb_peek(&pkts), TIPC_DUMP_SK_SNDQ, " ");
 	rc = tipc_node_xmit(net, &pkts, dnode, tsk->portid);
-- 
2.17.1
From: Tung N. <tun...@de...> - 2019-11-28 03:10:25

When tipc_sk_timeout() is executed while user space holds ownership of the
socket, the function rearms itself and returns. However, the socket
reference counter is not reduced on that path, which can cause unexpected
behavior. This commit fixes it by calling sock_put() before
tipc_sk_timeout() returns in the above-mentioned case.

Fixes: afe8792fec69 ("tipc: refactor function tipc_sk_timeout()")
Signed-off-by: Tung Nguyen <tun...@de...>
Acked-by: Ying Xue <yin...@wi...>
Acked-by: Jon Maloy <jon...@er...>
---
 net/tipc/socket.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 7baed2c2c93d..fb5595081a05 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2759,6 +2759,7 @@ static void tipc_sk_timeout(struct timer_list *t)
 	if (sock_owned_by_user(sk)) {
 		sk_reset_timer(sk, &sk->sk_timer, jiffies + HZ / 20);
 		bh_unlock_sock(sk);
+		sock_put(sk);
 		return;
 	}
-- 
2.17.1
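The rule behind this fix is a classic reference-count invariant: the timer callback runs with one socket reference held, so every exit path must drop it, including the early "owned by user, rearm and retry" path. A minimal sketch with invented names (the real code uses sock_hold()/sock_put() on struct sock):

```c
#include <assert.h>

struct ref_sim { int refcnt; int rearmed; };

/* Toy timer callback holding one reference for its whole run. */
static void timer_cb(struct ref_sim *s, int owned_by_user)
{
	s->refcnt++;         /* reference held while the timer runs      */
	if (owned_by_user) {
		s->rearmed = 1;  /* sk_reset_timer(): retry in a moment  */
		s->refcnt--;     /* the fix: sock_put() on this path too */
		return;
	}
	/* ... normal connection supervision work would go here ... */
	s->refcnt--;         /* sock_put() on the normal path            */
}
```

Without the decrement on the early-return path, each deferred run would leave the counter one too high and the socket could never be freed.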
From: Tung N. <tun...@de...> - 2019-11-28 03:10:24

This series fixes some bugs at the socket layer.

Tung Nguyen (4):
  tipc: fix potential memory leak in __tipc_sendmsg()
  tipc: fix wrong socket reference counter after tipc_sk_timeout() returns
  tipc: fix wrong timeout input for tipc_wait_for_cond()
  tipc: fix duplicate SYN messages under link congestion

 net/tipc/socket.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

-- 
2.17.1
From: Tung N. <tun...@de...> - 2019-11-28 03:10:24

In function __tipc_shutdown(), the timeout value passed to
tipc_wait_for_cond() is not in jiffies. This commit fixes it by converting
the value from milliseconds to jiffies.

Fixes: 365ad353c256 ("tipc: reduce risk of user starvation during link congestion")
Signed-off-by: Tung Nguyen <tun...@de...>
Acked-by: Jon Maloy <jon...@er...>
---
 net/tipc/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index fb5595081a05..da5fb84852a6 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -532,7 +532,7 @@ static void __tipc_shutdown(struct socket *sock, int error)
 	struct sock *sk = sock->sk;
 	struct tipc_sock *tsk = tipc_sk(sk);
 	struct net *net = sock_net(sk);
-	long timeout = CONN_TIMEOUT_DEFAULT;
+	long timeout = msecs_to_jiffies(CONN_TIMEOUT_DEFAULT);
 	u32 dnode = tsk_peer_node(tsk);
 	struct sk_buff *skb;
-- 
2.17.1
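The conversion itself is just scaling by the tick rate. A minimal user-space sketch follows; SIM_HZ is an assumed tick rate for the illustration (real kernels use CONFIG_HZ, and the kernel's msecs_to_jiffies() also handles overflow and special timeout values, which this sketch does not):

```c
#include <assert.h>

/* Assumed tick rate: 250 ticks per second, i.e. 4 ms per tick. */
#define SIM_HZ 250UL

/* Convert milliseconds to ticks, rounding up so that a non-zero
 * timeout can never truncate to zero ticks. */
static unsigned long sim_msecs_to_jiffies(unsigned long ms)
{
	return (ms * SIM_HZ + 1000UL - 1UL) / 1000UL;
}
```

With SIM_HZ = 250, a 30000 ms connection timeout becomes 7500 ticks; passing the raw millisecond constant where jiffies are expected would make the wait 4x too short here, which is exactly the class of bug the patch above fixes.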
From: David M. <da...@da...> - 2019-11-26 18:04:09

From: joh...@de...
Date: Tue, 26 Nov 2019 13:52:55 +1100

> From: John Rutherford <joh...@de...>
>
> In commit 4f07b80c9733 ("tipc: check msg->req data len in
> tipc_nl_compat_bearer_disable") the same patch code was copied into
> the routines tipc_nl_compat_bearer_disable(),
> tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats().
> The two link routine occurrences should have been modified to check
> the maximum link name length, not the bearer name length.
>
> Fixes: 4f07b80c9733 ("tipc: check msg->req data len in tipc_nl_compat_bearer_disable")
> Signed-off-by: John Rutherford <joh...@de...>
> Acked-by: Jon Maloy <jon...@er...>

Applied.
From: Tuong L. <tuo...@de...> - 2019-11-26 04:07:14

In commit c55c8edafa91 ("tipc: smooth change between replicast and
broadcast"), we allow instant switching between replicast and broadcast by
sending a dummy 'SYN' packet on the last used link, to synchronize packets
on the links. The 'SYN' message is itself subject to link congestion, so if
that happens, a 'SOCK_WAKEUP' will be scheduled to be sent back to the
socket.

However, in that commit we simply use the same socket 'cong_link_cnt'
counter for both the 'SYN' and the normal payload message sending.
Therefore, if both the replicast and broadcast links are congested, the
counter is not accumulated correctly but overwritten by the later
congestion. When the 'SOCK_WAKEUP' messages are subsequently processed, the
counter is reduced one by one and eventually underflows. Consequently,
further activities on the socket wait for a false congestion signal that
never disappears.

Because sending the 'SYN' message is vital for the mechanism, it should be
done anyway. This commit fixes the issue by marking the message with an
error code, e.g. 'TIPC_ERR_NO_PORT', so that sending it cannot face link
congestion and there is no need to touch the socket 'cong_link_cnt' at all.
In addition, in the event of any error (e.g. -ENOBUFS), we now purge the
entire payload message queue and return immediately.

Fixes: c55c8edafa91 ("tipc: smooth change between replicast and broadcast")
Signed-off-by: Tuong Lien <tuo...@de...>
---
 net/tipc/bcast.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 55aeba681cf4..656ebc79c64e 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -305,17 +305,17 @@ static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *pkts,
  * @skb: socket buffer to copy
  * @method: send method to be used
  * @dests: destination nodes for message.
- * @cong_link_cnt: returns number of encountered congested destination links
  * Returns 0 if success, otherwise errno
  */
 static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
 				struct tipc_mc_method *method,
-				struct tipc_nlist *dests,
-				u16 *cong_link_cnt)
+				struct tipc_nlist *dests)
 {
 	struct tipc_msg *hdr, *_hdr;
 	struct sk_buff_head tmpq;
 	struct sk_buff *_skb;
+	u16 cong_link_cnt;
+	int rc = 0;
 
 	/* Is a cluster supporting with new capabilities ? */
 	if (!(tipc_net(net)->capabilities & TIPC_MCAST_RBCTL))
@@ -343,18 +343,19 @@ static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
 	_hdr = buf_msg(_skb);
 	msg_set_size(_hdr, MCAST_H_SIZE);
 	msg_set_is_rcast(_hdr, !msg_is_rcast(hdr));
+	msg_set_errcode(_hdr, TIPC_ERR_NO_PORT);
 
 	__skb_queue_head_init(&tmpq);
 	__skb_queue_tail(&tmpq, _skb);
 	if (method->rcast)
-		tipc_bcast_xmit(net, &tmpq, cong_link_cnt);
+		rc = tipc_bcast_xmit(net, &tmpq, &cong_link_cnt);
 	else
-		tipc_rcast_xmit(net, &tmpq, dests, cong_link_cnt);
+		rc = tipc_rcast_xmit(net, &tmpq, dests, &cong_link_cnt);
 
 	/* This queue should normally be empty by now */
 	__skb_queue_purge(&tmpq);
 
-	return 0;
+	return rc;
 }
 
 /* tipc_mcast_xmit - deliver message to indicated destination nodes
@@ -396,9 +397,14 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
 	msg_set_is_rcast(hdr, method->rcast);
 
 	/* Switch method ? */
-	if (rcast != method->rcast)
-		tipc_mcast_send_sync(net, skb, method,
-				     dests, cong_link_cnt);
+	if (rcast != method->rcast) {
+		rc = tipc_mcast_send_sync(net, skb, method, dests);
+		if (unlikely(rc)) {
+			pr_err("Unable to send SYN: method %d, rc %d\n",
+			       rcast, rc);
+			goto exit;
+		}
+	}
 
 	if (method->rcast)
 		rc = tipc_rcast_xmit(net, pkts, dests, cong_link_cnt);
-- 
2.13.7
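The underflow mechanism described in this commit message can be demonstrated with a few lines of C. This is only a model of the counting bug (the actual fix avoids touching the counter for the 'SYN' at all): two congested sends that *assign* to the shared counter instead of accumulating leave it at 1, so the two later wakeups decrement it past zero and the 16-bit counter wraps to a huge value.

```c
#include <assert.h>
#include <stdint.h>

/* Model: both the 'SYN' send and the payload send hit congestion,
 * then one SOCK_WAKEUP arrives per congested link. */
static uint16_t both_links_congested(int accumulate)
{
	uint16_t cong_link_cnt = 0;

	if (accumulate) {
		cong_link_cnt += 1; /* 'SYN' send reports congestion      */
		cong_link_cnt += 1; /* payload send reports congestion    */
	} else {
		cong_link_cnt = 1;  /* buggy: the second report simply..  */
		cong_link_cnt = 1;  /* ..overwrites the first one         */
	}
	cong_link_cnt--;            /* SOCK_WAKEUP for the first link     */
	cong_link_cnt--;            /* SOCK_WAKEUP for the second link    */
	return cong_link_cnt;       /* 0 if counted, wrapped if not       */
}
```

With proper accumulation the counter returns to zero; with the overwriting variant it wraps to 65535, which the socket then treats as a congestion signal that can never be cleared.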
From: <joh...@de...> - 2019-11-26 02:53:33

From: John Rutherford <joh...@de...>

In commit 4f07b80c9733 ("tipc: check msg->req data len in
tipc_nl_compat_bearer_disable") the same patch code was copied into the
routines tipc_nl_compat_bearer_disable(), tipc_nl_compat_link_stat_dump()
and tipc_nl_compat_link_reset_stats(). The two link routine occurrences
should have been modified to check the maximum link name length, not the
bearer name length.

Fixes: 4f07b80c9733 ("tipc: check msg->req data len in tipc_nl_compat_bearer_disable")
Signed-off-by: John Rutherford <joh...@de...>
Acked-by: Jon Maloy <jon...@er...>
---
 net/tipc/netlink_compat.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c
index e135d4e11231..d4d2928424e2 100644
--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -550,7 +550,7 @@ static int tipc_nl_compat_link_stat_dump(struct tipc_nl_compat_msg *msg,
 	if (len <= 0)
 		return -EINVAL;
 
-	len = min_t(int, len, TIPC_MAX_BEARER_NAME);
+	len = min_t(int, len, TIPC_MAX_LINK_NAME);
 	if (!string_is_valid(name, len))
 		return -EINVAL;
 
@@ -822,7 +822,7 @@ static int tipc_nl_compat_link_reset_stats(struct tipc_nl_compat_cmd_doit *cmd,
 	if (len <= 0)
 		return -EINVAL;
 
-	len = min_t(int, len, TIPC_MAX_BEARER_NAME);
+	len = min_t(int, len, TIPC_MAX_LINK_NAME);
 	if (!string_is_valid(name, len))
 		return -EINVAL;
 
-- 
2.11.0
From: Jon M. <jon...@er...> - 2019-11-25 22:16:53

> -----Original Message-----
> From: Tuong Lien Tong <tuo...@de...>
> Sent: 18-Nov-19 20:57
> To: Jon Maloy <jon...@er...>; 'Jon Maloy' <ma...@do...>
> Cc: Mohan Krishna Ghanta Krishnamurthy <moh...@er...>;
> par...@gm...; Tung Quang Nguyen <tun...@de...>; Hoang
> Huu Le <hoa...@de...>; Gordan Mihaljevic <gor...@de...>;
> yin...@wi...; tip...@li...
> Subject: RE: [net-next v3 1/1] tipc: introduce variable window congestion control
>
> Hi Jon,
>
> @@ -1045,7 +1056,13 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
> 	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
> 	u32 imp;
>
> -	while (skb_queue_len(&l->transmq) < l->window) {
> +	qlen = skb_queue_len(txq);
> +	if (qlen >= cwin && (l->snd_nxt - buf_seqno(skb_peek(txq)) == qlen)) {
>
> [Tuong]: now it should be: 'qlen + backlogq_len >= cwin' instead, since we
> might have released some packets from the transmq just before calling this
> function...?

I guess you are right, although it won't make any difference for max
throughput. When we are sending long messages at full speed, the transmit
queue is almost always > window limit anyway. But the change will have the
effect that the window won't grow during low traffic, which I think is a
correct behavior. I will make this change.

///jon

> +	add = l->cong_acks++ % 32 ? 0 : 1;
> +	cwin = min_t(u16, cwin + add, l->max_win);
> +	l->window = cwin;
> +	}
> +	while (skb_queue_len(txq) < cwin) {
>
> BR/Tuong
>
> -----Original Message-----
> From: Jon Maloy <jon...@er...>
> Sent: Tuesday, November 19, 2019 6:33 AM
> To: Jon Maloy <jon...@er...>; Jon Maloy <ma...@do...>
> Cc: moh...@er...; par...@gm...; tun...@de...;
> hoa...@de...; tuo...@de...; gor...@de...; yin...@wi...;
> tip...@li...
> Subject: [net-next v3 1/1] tipc: introduce variable window congestion control
>
> We introduce a simple variable window congestion control for links.
> The algorithm is inspired by the Reno algorithm, and can best be
> described as working in permanent "congestion avoidance" mode, within
> strict limits.
>
> - We introduce hard lower and upper window limits per link, still
>   different and configurable per bearer type.
>
> - Next, we let a link start at the minimum window, and then slowly
>   increment it for each 32 received non-duplicate ACKs. This goes on
>   until it either reaches the upper limit, or until it receives a
>   NACK message.
>
> - For each non-duplicate NACK received, we let the window decrease by
>   intervals of 1/2 of the current window, but not below the minimum
>   window.
>
> The change does in reality have effect only on unicast ethernet
> transport, as we have seen that there is no room whatsoever for
> increasing the window max size for the UDP bearer.
> For now, we also choose to keep the limits for the broadcast link
> unchanged and equal.
>
> This algorithm seems to give a ~25% throughput improvement for large
> messages, while it has no effect on throughput for small messages.
>
> Suggested-by: Xin Long <luc...@gm...>
> Acked-by: Xin Long <luc...@gm...>
> Signed-off-by: Jon Maloy <jon...@er...>
>
> ---
> v2: - Moved window increment in tipc_advance_backlogq() to before
>       the transfer loop, as suggested by Tuong.
>     - Introduced logic for incrementing the window even for the
>       broadcast send link, also suggested by Tuong.
> v3: - Rebased to latest net-next
> ---
>  net/tipc/bcast.c     | 11 ++++----
>  net/tipc/bearer.c    | 11 ++++----
>  net/tipc/bearer.h    |  6 +++--
>  net/tipc/eth_media.c |  6 ++++-
>  net/tipc/ib_media.c  |  5 +++-
>  net/tipc/link.c      | 76 ++++++++++++++++++++++++++++++++++------------------
>  net/tipc/link.h      |  9 ++++---
>  net/tipc/node.c      | 13 +++++----
>  net/tipc/udp_media.c |  3 ++-
>  9 files changed, 90 insertions(+), 50 deletions(-)
>
> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> index f41096a..84da317 100644
> --- a/net/tipc/bcast.c
> +++ b/net/tipc/bcast.c
> @@ -562,18 +562,18 @@ int tipc_bclink_reset_stats(struct net *net)
>  	return 0;
>  }
>
> -static int tipc_bc_link_set_queue_limits(struct net *net, u32 limit)
> +static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win)
>  {
>  	struct tipc_link *l = tipc_bc_sndlink(net);
>
>  	if (!l)
>  		return -ENOPROTOOPT;
> -	if (limit < BCLINK_WIN_MIN)
> -		limit = BCLINK_WIN_MIN;
> -	if (limit > TIPC_MAX_LINK_WIN)
> +	if (max_win < BCLINK_WIN_MIN)
> +		max_win = BCLINK_WIN_MIN;
> +	if (max_win > TIPC_MAX_LINK_WIN)
>  		return -EINVAL;
>  	tipc_bcast_lock(net);
> -	tipc_link_set_queue_limits(l, limit);
> +	tipc_link_set_queue_limits(l, BCLINK_WIN_MIN, max_win);
>  	tipc_bcast_unlock(net);
>  	return 0;
>  }
> @@ -683,6 +683,7 @@ int tipc_bcast_init(struct net *net)
>  	if (!tipc_link_bc_create(net, 0, 0,
>  				 FB_MTU,
>  				 BCLINK_WIN_DEFAULT,
> +				 BCLINK_WIN_DEFAULT,
>  				 0,
>  				 &bb->inputq,
>  				 NULL,
> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> index d7ec26b..34ca7b7 100644
> --- a/net/tipc/bearer.c
> +++ b/net/tipc/bearer.c
> @@ -311,7 +311,8 @@ static int tipc_enable_bearer(struct net *net, const char *name,
>
>  	b->identity = bearer_id;
>  	b->tolerance = m->tolerance;
> -	b->window = m->window;
> +	b->min_win = m->min_win;
> +	b->max_win = m->max_win;
>  	b->domain = disc_domain;
>  	b->net_plane = bearer_id + 'A';
>  	b->priority = prio;
> @@ -796,7 +797,7 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
>  		goto prop_msg_full;
>  	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, bearer->tolerance))
>  		goto prop_msg_full;
> -	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->window))
> +	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->max_win))
>  		goto prop_msg_full;
>  	if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
>  		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
> @@ -1088,7 +1089,7 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
>  	if (props[TIPC_NLA_PROP_PRIO])
>  		b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
>  	if (props[TIPC_NLA_PROP_WIN])
> -		b->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
> +		b->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>  	if (props[TIPC_NLA_PROP_MTU]) {
>  		if (b->media->type_id != TIPC_MEDIA_TYPE_UDP)
>  			return -EINVAL;
> @@ -1142,7 +1143,7 @@ static int __tipc_nl_add_media(struct tipc_nl_msg *msg,
>  		goto prop_msg_full;
>  	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, media->tolerance))
>  		goto prop_msg_full;
> -	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->window))
> +	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->max_win))
>  		goto prop_msg_full;
>  	if (media->type_id == TIPC_MEDIA_TYPE_UDP)
>  		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
> @@ -1275,7 +1276,7 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
>  	if (props[TIPC_NLA_PROP_PRIO])
>  		m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
>  	if (props[TIPC_NLA_PROP_WIN])
> -		m->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
> +		m->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>  	if (props[TIPC_NLA_PROP_MTU]) {
>  		if (m->type_id != TIPC_MEDIA_TYPE_UDP)
>  			return -EINVAL;
> diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
> index d0c79cc..bc00231 100644
> --- a/net/tipc/bearer.h
> +++ b/net/tipc/bearer.h
> @@ -119,7 +119,8 @@ struct tipc_media {
>  			   char *raw);
>  	u32 priority;
>  	u32 tolerance;
> -	u32 window;
> +	u32 min_win;
> +	u32 max_win;
>  	u32 mtu;
>  	u32 type_id;
>  	u32 hwaddr_len;
> @@ -158,7 +159,8 @@ struct tipc_bearer {
>  	struct packet_type pt;
>  	struct rcu_head rcu;
>  	u32 priority;
> -	u32 window;
> +	u32 min_win;
> +	u32 max_win;
>  	u32 tolerance;
>  	u32 domain;
>  	u32 identity;
> diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
> index f69a2fd..38cdcab 100644
> --- a/net/tipc/eth_media.c
> +++ b/net/tipc/eth_media.c
> @@ -37,6 +37,9 @@
>  #include "core.h"
>  #include "bearer.h"
>
> +#define TIPC_MIN_ETH_LINK_WIN 50
> +#define TIPC_MAX_ETH_LINK_WIN 500
> +
>  /* Convert Ethernet address (media address format) to string */
>  static int tipc_eth_addr2str(struct tipc_media_addr *addr,
>  			     char *strbuf, int bufsz)
> @@ -92,7 +95,8 @@ struct tipc_media eth_media_info = {
>  	.raw2addr = tipc_eth_raw2addr,
>  	.priority = TIPC_DEF_LINK_PRI,
>  	.tolerance = TIPC_DEF_LINK_TOL,
> -	.window = TIPC_DEF_LINK_WIN,
> +	.min_win = TIPC_MIN_ETH_LINK_WIN,
> +	.max_win = TIPC_MAX_ETH_LINK_WIN,
>  	.type_id = TIPC_MEDIA_TYPE_ETH,
>  	.hwaddr_len = ETH_ALEN,
>  	.name = "eth"
> diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c
> index e8c1671..7aa9ff8 100644
> --- a/net/tipc/ib_media.c
> +++ b/net/tipc/ib_media.c
> @@ -42,6 +42,8 @@
>  #include "core.h"
>  #include "bearer.h"
>
> +#define TIPC_MAX_IB_LINK_WIN 500
> +
>  /* convert InfiniBand address (media address format) media address to string */
>  static int tipc_ib_addr2str(struct tipc_media_addr *a, char *str_buf,
>  			    int str_size)
> @@ -94,7 +96,8 @@ struct tipc_media ib_media_info = {
>  	.raw2addr = tipc_ib_raw2addr,
>  	.priority = TIPC_DEF_LINK_PRI,
>  	.tolerance = TIPC_DEF_LINK_TOL,
> -	.window = TIPC_DEF_LINK_WIN,
> +	.min_win = TIPC_DEF_LINK_WIN,
> +	.max_win = TIPC_MAX_IB_LINK_WIN,
>  	.type_id = TIPC_MEDIA_TYPE_IB,
>  	.hwaddr_len = INFINIBAND_ALEN,
>  	.name = "ib"
> diff --git a/net/tipc/link.c b/net/tipc/link.c
> index fb72031..a88950b 100644
> --- a/net/tipc/link.c
> +++ b/net/tipc/link.c
> @@ -164,7 +164,6 @@ struct tipc_link {
>  		struct sk_buff *target_bskb;
>  	} backlog[5];
>  	u16 snd_nxt;
> -	u16 window;
>
>  	/* Reception */
>  	u16 rcv_nxt;
> @@ -175,6 +174,10 @@ struct tipc_link {
>
>  	/* Congestion handling */
>  	struct sk_buff_head wakeupq;
> +	u16 window;
> +	u16 min_win;
> +	u16 max_win;
> +	u16 cong_acks;
>
>  	/* Fragmentation/reassembly */
>  	struct sk_buff *reasm_buf;
> @@ -308,9 +311,14 @@ u32 tipc_link_id(struct tipc_link *l)
>  	return l->peer_bearer_id << 16 | l->bearer_id;
>  }
>
> -int tipc_link_window(struct tipc_link *l)
> +int tipc_link_min_win(struct tipc_link *l)
> +{
> +	return l->min_win;
> +}
> +
> +int tipc_link_max_win(struct tipc_link *l)
>  {
> -	return l->window;
> +	return l->max_win;
>  }
>
>  int tipc_link_prio(struct tipc_link *l)
> @@ -436,7 +444,8 @@ u32 tipc_link_state(struct tipc_link *l)
>   * @net_plane: network plane (A,B,c..) this link belongs to
>   * @mtu: mtu to be advertised by link
>   * @priority: priority to be used by link
> - * @window: send window to be used by link
> + * @min_win: minimal send window to be used by link
> + * @max_win: maximal send window to be used by link
>   * @session: session to be used by link
>   * @ownnode: identity of own node
>   * @peer: node id of peer node
> @@ -451,7 +460,7 @@ u32 tipc_link_state(struct tipc_link *l)
>   */
>  bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>  		      int tolerance, char net_plane, u32 mtu, int priority,
> -		      int window, u32 session, u32 self,
> +		      u32 min_win, u32 max_win, u32 session, u32 self,
>  		      u32 peer, u8 *peer_id, u16 peer_caps,
>  		      struct tipc_link *bc_sndlink,
>  		      struct tipc_link *bc_rcvlink,
> @@ -495,7 +504,7 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>  	l->advertised_mtu = mtu;
>  	l->mtu = mtu;
>  	l->priority = priority;
> -	tipc_link_set_queue_limits(l, window);
> +	tipc_link_set_queue_limits(l, min_win, max_win);
>  	l->ackers = 1;
>  	l->bc_sndlink = bc_sndlink;
>  	l->bc_rcvlink = bc_rcvlink;
> @@ -523,7 +532,7 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>   * Returns true if link was created, otherwise false
>   */
>  bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer,
> -			 int mtu, int window, u16 peer_caps,
> +			 int mtu, u32 min_win, u32 max_win, u16 peer_caps,
>  			 struct sk_buff_head *inputq,
>  			 struct sk_buff_head *namedq,
>  			 struct tipc_link *bc_sndlink,
> @@ -531,9 +540,9 @@ bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer,
>  {
>  	struct tipc_link *l;
>
> -	if (!tipc_link_create(net, "", MAX_BEARERS, 0, 'Z', mtu, 0, window,
> -			      0, ownnode, peer, NULL, peer_caps, bc_sndlink,
> -			      NULL, inputq, namedq, link))
> +	if (!tipc_link_create(net, "", MAX_BEARERS, 0, 'Z', mtu, 0, min_win,
> +			      max_win, 0, ownnode, peer, NULL, peer_caps,
> +			      bc_sndlink, NULL, inputq, namedq, link))
>  		return false;
>
>  	l = *link;
> @@ -959,7 +968,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>  	int pkt_cnt = skb_queue_len(list);
>  	int imp = msg_importance(hdr);
>  	unsigned int mss = tipc_link_mss(l);
> -	unsigned int maxwin = l->window;
> +	unsigned int cwin = l->window;
>  	unsigned int mtu = l->mtu;
>  	bool new_bundle;
>  	int rc = 0;
> @@ -988,7 +997,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>
>  	/* Prepare each packet for sending, and add to relevant queue: */
>  	while ((skb = __skb_dequeue(list))) {
> -		if (likely(skb_queue_len(transmq) < maxwin)) {
> +		if (likely(skb_queue_len(transmq) < cwin)) {
>  			hdr = buf_msg(skb);
>  			msg_set_seqno(hdr, seqno);
>  			msg_set_ack(hdr, ack);
> @@ -1038,6 +1047,8 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>  static void tipc_link_advance_backlog(struct tipc_link *l,
>  				      struct sk_buff_head *xmitq)
>  {
> +	struct sk_buff_head *txq = &l->transmq;
> +	u16 qlen, add, cwin = l->window;
>  	struct sk_buff *skb, *_skb;
>  	struct tipc_msg *hdr;
>  	u16 seqno = l->snd_nxt;
> @@ -1045,7 +1056,13 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
>  	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>  	u32 imp;
>
> -	while (skb_queue_len(&l->transmq) < l->window) {
> +	qlen = skb_queue_len(txq);
> +	if (qlen >= cwin && (l->snd_nxt - buf_seqno(skb_peek(txq)) == qlen)) {
> +		add = l->cong_acks++ % 32 ? 0 : 1;
> +		cwin = min_t(u16, cwin + add, l->max_win);
> +		l->window = cwin;
> +	}
> +	while (skb_queue_len(txq) < cwin) {
>  		skb = skb_peek(&l->backlogq);
>  		if (!skb)
>  			break;
> @@ -1140,6 +1157,7 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  {
>  	struct sk_buff *_skb, *skb = skb_peek(&l->transmq);
>  	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
> +	bool retransmitted = false;
>  	u16 ack = l->rcv_nxt - 1;
>  	struct tipc_msg *hdr;
>  	int rc = 0;
> @@ -1173,11 +1191,13 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>  		_skb->priority = TC_PRIO_CONTROL;
>  		__skb_queue_tail(xmitq, _skb);
>  		l->stats.retransmitted++;
> -
> +		retransmitted = true;
>  		/* Increase actual retrans counter & mark first time */
>  		if (!TIPC_SKB_CB(skb)->retr_cnt++)
>  			TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>  	}
> +	if (retransmitted)
> +		l->window = l->min_win + (l->window - l->min_win) / 2;
>  	return 0;
>  }
>
> @@ -1417,6 +1437,7 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  	struct sk_buff *skb, *_skb, *tmp;
>  	struct tipc_msg *hdr;
>  	u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
> +	bool retransmitted = false;
>  	u16 ack = l->rcv_nxt - 1;
>  	bool passed = false;
>  	u16 seqno, n = 0;
> @@ -1449,7 +1470,7 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  			_skb->priority = TC_PRIO_CONTROL;
>  			__skb_queue_tail(xmitq, _skb);
>  			l->stats.retransmitted++;
> -
> +			retransmitted = true;
>  			/* Increase actual retrans counter & mark first time */
>  			if (!TIPC_SKB_CB(skb)->retr_cnt++)
>  				TIPC_SKB_CB(skb)->retr_stamp = jiffies;
> @@ -1463,7 +1484,8 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>  			goto next_gap_ack;
>  		}
>  	}
> -
> +	if (retransmitted)
> +		l->window = l->min_win + (l->window - l->min_win) / 2;
>  	return 0;
>  }
>
> @@
-2305,15 +2327,17 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, > struct sk_buff *skb, > return 0; > } > > -void tipc_link_set_queue_limits(struct tipc_link *l, u32 win) > +void tipc_link_set_queue_limits(struct tipc_link *l, u32 min_win, u32 > max_win) > { > int max_bulk = TIPC_MAX_PUBL / (l->mtu / ITEM_SIZE); > > - l->window = win; > - l->backlog[TIPC_LOW_IMPORTANCE].limit = max_t(u16, 50, win); > - l->backlog[TIPC_MEDIUM_IMPORTANCE].limit = max_t(u16, 100, win * > 2); > - l->backlog[TIPC_HIGH_IMPORTANCE].limit = max_t(u16, 150, win * > 3); > - l->backlog[TIPC_CRITICAL_IMPORTANCE].limit = max_t(u16, 200, win * > 4); > + l->window = min_win; > + l->min_win = min_win; > + l->max_win = max_win; > + l->backlog[TIPC_LOW_IMPORTANCE].limit = min_win * 2; > + l->backlog[TIPC_MEDIUM_IMPORTANCE].limit = min_win * 4; > + l->backlog[TIPC_HIGH_IMPORTANCE].limit = min_win * 6; > + l->backlog[TIPC_CRITICAL_IMPORTANCE].limit = min_win * 8; > l->backlog[TIPC_SYSTEM_IMPORTANCE].limit = max_bulk; > } > > @@ -2366,10 +2390,10 @@ int tipc_nl_parse_link_prop(struct nlattr *prop, > struct nlattr *props[]) > } > > if (props[TIPC_NLA_PROP_WIN]) { > - u32 win; > + u32 max_win; > > - win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); > - if ((win < TIPC_MIN_LINK_WIN) || (win > TIPC_MAX_LINK_WIN)) > + max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); > + if (max_win < TIPC_MIN_LINK_WIN || max_win > > TIPC_MAX_LINK_WIN) > return -EINVAL; > } > > @@ -2605,7 +2629,7 @@ int tipc_nl_add_bc_link(struct net *net, struct > tipc_nl_msg *msg) > prop = nla_nest_start_noflag(msg->skb, TIPC_NLA_LINK_PROP); > if (!prop) > goto attr_msg_full; > - if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bcl->window)) > + if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bcl->max_win)) > goto prop_msg_full; > if (nla_put_u32(msg->skb, TIPC_NLA_PROP_BROADCAST, bc_mode)) > goto prop_msg_full; > diff --git a/net/tipc/link.h b/net/tipc/link.h > index c09e9d4..d3c1c3f 100644 > --- a/net/tipc/link.h > +++ b/net/tipc/link.h > @@ 
-73,7 +73,7 @@ enum { > > bool tipc_link_create(struct net *net, char *if_name, int bearer_id, > int tolerance, char net_plane, u32 mtu, int priority, > - int window, u32 session, u32 ownnode, > + u32 min_win, u32 max_win, u32 session, u32 ownnode, > u32 peer, u8 *peer_id, u16 peer_caps, > struct tipc_link *bc_sndlink, > struct tipc_link *bc_rcvlink, > @@ -81,7 +81,7 @@ bool tipc_link_create(struct net *net, char *if_name, int > bearer_id, > struct sk_buff_head *namedq, > struct tipc_link **link); > bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer, > - int mtu, int window, u16 peer_caps, > + int mtu, u32 min_win, u32 max_win, u16 peer_caps, > struct sk_buff_head *inputq, > struct sk_buff_head *namedq, > struct tipc_link *bc_sndlink, > @@ -115,7 +115,8 @@ char *tipc_link_name_ext(struct tipc_link *l, char > *buf); > u32 tipc_link_state(struct tipc_link *l); > char tipc_link_plane(struct tipc_link *l); > int tipc_link_prio(struct tipc_link *l); > -int tipc_link_window(struct tipc_link *l); > +int tipc_link_min_win(struct tipc_link *l); > +int tipc_link_max_win(struct tipc_link *l); > void tipc_link_update_caps(struct tipc_link *l, u16 capabilities); > bool tipc_link_validate_msg(struct tipc_link *l, struct tipc_msg *hdr); > unsigned long tipc_link_tolerance(struct tipc_link *l); > @@ -124,7 +125,7 @@ void tipc_link_set_tolerance(struct tipc_link *l, u32 > tol, > void tipc_link_set_prio(struct tipc_link *l, u32 prio, > struct sk_buff_head *xmitq); > void tipc_link_set_abort_limit(struct tipc_link *l, u32 limit); > -void tipc_link_set_queue_limits(struct tipc_link *l, u32 window); > +void tipc_link_set_queue_limits(struct tipc_link *l, u32 min_win, u32 > max_win); > int __tipc_nl_add_link(struct net *net, struct tipc_nl_msg *msg, > struct tipc_link *link, int nlflags); > int tipc_nl_parse_link_prop(struct nlattr *prop, struct nlattr *props[]); > diff --git a/net/tipc/node.c b/net/tipc/node.c > index aaf5956..970c780 100644 > --- a/net/tipc/node.c > +++ 
b/net/tipc/node.c > @@ -1134,7 +1134,8 @@ void tipc_node_check_dest(struct net *net, u32 addr, > snd_l = tipc_bc_sndlink(net); > if (!tipc_link_bc_create(net, tipc_own_addr(net), > addr, U16_MAX, > - tipc_link_window(snd_l), > + tipc_link_min_win(snd_l), > + tipc_link_max_win(snd_l), > n->capabilities, > &n->bc_entry.inputq1, > &n->bc_entry.namedq, snd_l, > @@ -1228,7 +1229,7 @@ void tipc_node_check_dest(struct net *net, u32 addr, > get_random_bytes(&session, sizeof(u16)); > if (!tipc_link_create(net, if_name, b->identity, > b->tolerance, > b->net_plane, b->mtu, b->priority, > - b->window, session, > + b->min_win, b->max_win, session, > tipc_own_addr(net), addr, peer_id, > n->capabilities, > tipc_bc_sndlink(n->net), > n->bc_entry.link, > @@ -2374,10 +2375,12 @@ int tipc_nl_node_set_link(struct sk_buff *skb, > struct genl_info *info) > tipc_link_set_prio(link, prio, &xmitq); > } > if (props[TIPC_NLA_PROP_WIN]) { > - u32 win; > + u32 max_win; > > - win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); > - tipc_link_set_queue_limits(link, win); > + max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]); > + tipc_link_set_queue_limits(link, > + tipc_link_min_win(link), > + max_win); > } > } > > diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c > index 86aaa4d..7d9dcbb 100644 > --- a/net/tipc/udp_media.c > +++ b/net/tipc/udp_media.c > @@ -825,7 +825,8 @@ struct tipc_media udp_media_info = { > .msg2addr = tipc_udp_msg2addr, > .priority = TIPC_DEF_LINK_PRI, > .tolerance = TIPC_DEF_LINK_TOL, > - .window = TIPC_DEF_LINK_WIN, > + .min_win = TIPC_DEF_LINK_WIN, > + .max_win = TIPC_DEF_LINK_WIN, > .mtu = TIPC_DEF_LINK_UDP_MTU, > .type_id = TIPC_MEDIA_TYPE_UDP, > .hwaddr_len = 0, > -- > 2.1.4 > |
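The increase/decrease rules in the patch above can be modeled as a small stand-alone sketch. The struct and function names below are illustrative stand-ins, not the kernel code; only the arithmetic mirrors the patch: grow the window by 1 for every 32nd ACK received while the transmit queue is full, cap at max_win, and on any retransmission halve the distance back down toward min_win.

```c
#include <assert.h>

/* Hypothetical user-space model of the tipc_link window fields.
 * Field names echo the patch but this is not the kernel struct. */
struct link_win {
	unsigned short window;    /* current send window (cwin) */
	unsigned short min_win;   /* hard lower limit */
	unsigned short max_win;   /* hard upper limit */
	unsigned short cong_acks; /* acks counted while congested */
};

/* Ack received while the transmit queue is full:
 * add 1 to the window on every 32nd such ack, capped at max_win
 * (mirrors the new block in tipc_link_advance_backlog()). */
static void on_congestion_ack(struct link_win *l)
{
	unsigned short add = (l->cong_acks++ % 32) ? 0 : 1;
	unsigned short cwin = l->window + add;

	l->window = cwin < l->max_win ? cwin : l->max_win;
}

/* Any retransmission: halve the distance down toward min_win
 * (mirrors the 'if (retransmitted)' blocks in the retransmit paths). */
static void on_retransmit(struct link_win *l)
{
	l->window = l->min_win + (l->window - l->min_win) / 2;
}
```

Starting from the Ethernet defaults in the patch (min 50, max 500), it takes on the order of 32 * 450 congestion ACKs to reach the ceiling, which matches the deliberately conservative growth discussed in the thread.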
From: Jon M. <jon...@er...> - 2019-11-25 14:33:55
|
Acked-by: Jon > -----Original Message----- > From: joh...@de... <joh...@de...> > Sent: 25-Nov-19 01:35 > To: tip...@li... > Subject: [tipc-discussion] [net] tipc: fix link name length check > > From: John Rutherford <joh...@de...> > > In commit 4f07b80c9733 ("tipc: check msg->req data len in > tipc_nl_compat_bearer_disable") the same patch code was copied into > routines: tipc_nl_compat_bearer_disable(), > tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats(). > The two link routine occurrences should have been modified to check > the maximum link name length and not bearer name length. > > Fixes: 4f07b80c9733 ("tipc: check msg->reg data len in tipc_nl_compat_bearer_disable") > Signed-off-by: John Rutherford <joh...@de...> > --- > net/tipc/netlink_compat.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c > index e135d4e11231..d4d2928424e2 100644 > --- a/net/tipc/netlink_compat.c > +++ b/net/tipc/netlink_compat.c > @@ -550,7 +550,7 @@ static int tipc_nl_compat_link_stat_dump(struct tipc_nl_compat_msg *msg, > if (len <= 0) > return -EINVAL; > > - len = min_t(int, len, TIPC_MAX_BEARER_NAME); > + len = min_t(int, len, TIPC_MAX_LINK_NAME); > if (!string_is_valid(name, len)) > return -EINVAL; > > @@ -822,7 +822,7 @@ static int tipc_nl_compat_link_reset_stats(struct tipc_nl_compat_cmd_doit > *cmd, > if (len <= 0) > return -EINVAL; > > - len = min_t(int, len, TIPC_MAX_BEARER_NAME); > + len = min_t(int, len, TIPC_MAX_LINK_NAME); > if (!string_is_valid(name, len)) > return -EINVAL; > > -- > 2.11.0 > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Tuong L. T. <tuo...@de...> - 2019-11-25 08:27:19
|
Hi Tung, Okie, got it. I lost track when your patch was sent in August... ☹. Anyway, I think you can consider my commit message which highlights this as a memory leak issue seriously. BR/Tuong -----Original Message----- From: tung quang nguyen <tun...@de...> Sent: Monday, November 25, 2019 3:23 PM To: 'Tuong Lien' <tuo...@de...>; tip...@li...; jon...@er...; ma...@do...; yin...@wi... Subject: RE: [tipc-discussion] [net] tipc: fix memory leak in socket streaming Hi Tuong, I fixed it in this commit "[tipc-discussion] [net v1 2/3] tipc: fix wrong socket reference counter after tipc_sk_timeout() returns" but I have not sent to David yet. I intended to send it in a series of patch for fixing bugs at socket layer. Thanks. Best regards, Tung Nguyen -----Original Message----- From: Tuong Lien <tuo...@de...> Sent: Monday, November 25, 2019 3:13 PM To: tip...@li...; jon...@er...; ma...@do...; yin...@wi... Subject: [tipc-discussion] [net] tipc: fix memory leak in socket streaming In case of stream sockets, a per-socket timer is set up for either the SYN attempt or connection supervision mechanism. When the socket timer expires, an appropriate action (i.e. sending SYN or PROBE message) would be taken just in the case the socket is not being owned by user (e.g. via the 'lock_sock()'). In the latter case, there is nothing but the timer is re-scheduled for a short period of time (~ 50ms) to try again. The function just makes a 'return' immediately without decreasing the socket 'refcnt' which was held in advance for the timer callback itself! The same happens if at the next time, the socket is still busy... As a result, the socket 'refcnt' is increased without decreasing, so the sock object cannot be freed at all (due to 'refcnt' != 0) even when the connection is closed and user releases all related resources. 
The memory leak is hard to detect because the probe interval is set to 1 hour since the connection is established, but in the case of a SYN attempt, that can be much more likely. The commit fixes the bug by calling the 'sock_put()' in the case mentioned above, then the socket 'refcnt' will be increased & decreased correctly and the sock object can be released later. Fixes: 0d5fcebf3c37 ("tipc: refactor tipc_sk_timeout() function") Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index a1c8d722ca20..d67c3747d2c3 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2757,7 +2757,7 @@ static void tipc_sk_timeout(struct timer_list *t) if (sock_owned_by_user(sk)) { sk_reset_timer(sk, &sk->sk_timer, jiffies + HZ / 20); bh_unlock_sock(sk); - return; + goto exit; } if (sk->sk_state == TIPC_ESTABLISHED) @@ -2775,6 +2775,8 @@ static void tipc_sk_timeout(struct timer_list *t) tipc_dest_push(&tsk->cong_links, pnode, 0); tsk->cong_link_cnt = 1; } + +exit: sock_put(sk); } -- 2.13.7 _______________________________________________ tipc-discussion mailing list tip...@li... https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: tung q. n. <tun...@de...> - 2019-11-25 08:23:12
|
Hi Tuong, I fixed it in this commit "[tipc-discussion] [net v1 2/3] tipc: fix wrong socket reference counter after tipc_sk_timeout() returns" but I have not sent to David yet. I intended to send it in a series of patch for fixing bugs at socket layer. Thanks. Best regards, Tung Nguyen -----Original Message----- From: Tuong Lien <tuo...@de...> Sent: Monday, November 25, 2019 3:13 PM To: tip...@li...; jon...@er...; ma...@do...; yin...@wi... Subject: [tipc-discussion] [net] tipc: fix memory leak in socket streaming In case of stream sockets, a per-socket timer is set up for either the SYN attempt or connection supervision mechanism. When the socket timer expires, an appropriate action (i.e. sending SYN or PROBE message) would be taken just in the case the socket is not being owned by user (e.g. via the 'lock_sock()'). In the latter case, there is nothing but the timer is re-scheduled for a short period of time (~ 50ms) to try again. The function just makes a 'return' immediately without decreasing the socket 'refcnt' which was held in advance for the timer callback itself! The same happens if at the next time, the socket is still busy... As a result, the socket 'refcnt' is increased without decreasing, so the sock object cannot be freed at all (due to 'refcnt' != 0) even when the connection is closed and user releases all related resources. The memory leak is hard to detect because the probe interval is set to 1 hour since the connection is established, but in the case of a SYN attempt, that can be much more likely. The commit fixes the bug by calling the 'sock_put()' in the case mentioned above, then the socket 'refcnt' will be increased & decreased correctly and the sock object can be released later. 
Fixes: 0d5fcebf3c37 ("tipc: refactor tipc_sk_timeout() function") Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index a1c8d722ca20..d67c3747d2c3 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2757,7 +2757,7 @@ static void tipc_sk_timeout(struct timer_list *t) if (sock_owned_by_user(sk)) { sk_reset_timer(sk, &sk->sk_timer, jiffies + HZ / 20); bh_unlock_sock(sk); - return; + goto exit; } if (sk->sk_state == TIPC_ESTABLISHED) @@ -2775,6 +2775,8 @@ static void tipc_sk_timeout(struct timer_list *t) tipc_dest_push(&tsk->cong_links, pnode, 0); tsk->cong_link_cnt = 1; } + +exit: sock_put(sk); } -- 2.13.7 _______________________________________________ tipc-discussion mailing list tip...@li... https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Tuong L. <tuo...@de...> - 2019-11-25 08:12:47
|
In the case of stream sockets, a per-socket timer is set up for either the SYN attempt or the connection supervision mechanism. When the socket timer expires, the appropriate action (i.e. sending a SYN or PROBE message) is taken only if the socket is not currently owned by the user (e.g. via 'lock_sock()'). Otherwise, nothing is done except that the timer is re-scheduled for a short period (~ 50ms) to try again. The function just makes an immediate 'return' without decreasing the socket 'refcnt' which was held in advance for the timer callback itself! The same happens each subsequent time the socket is found busy... As a result, the socket 'refcnt' is increased without ever being decreased, so the sock object cannot be freed at all (due to 'refcnt' != 0) even when the connection is closed and the user has released all related resources. The memory leak is hard to detect because the probe interval is set to 1 hour once the connection is established, but in the case of a SYN attempt it can occur much sooner. This commit fixes the bug by calling 'sock_put()' in the case mentioned above, so that the socket 'refcnt' is increased and decreased symmetrically and the sock object can be released later. Fixes: 0d5fcebf3c37 ("tipc: refactor tipc_sk_timeout() function") Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index a1c8d722ca20..d67c3747d2c3 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2757,7 +2757,7 @@ static void tipc_sk_timeout(struct timer_list *t) if (sock_owned_by_user(sk)) { sk_reset_timer(sk, &sk->sk_timer, jiffies + HZ / 20); bh_unlock_sock(sk); - return; + goto exit; } if (sk->sk_state == TIPC_ESTABLISHED) @@ -2775,6 +2775,8 @@ static void tipc_sk_timeout(struct timer_list *t) tipc_dest_push(&tsk->cong_links, pnode, 0); tsk->cong_link_cnt = 1; } + +exit: sock_put(sk); } -- 2.13.7 |
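The control-flow shape of this bug and its fix can be reduced to a small user-space sketch. The struct and helper names are invented for illustration (they are not the kernel's sock API); only the early-return versus goto-exit structure mirrors tipc_sk_timeout(): the callback enters holding one reference, so every exit path must drop exactly one.

```c
#include <assert.h>

/* Mock of a refcounted socket; models only what the sketch needs. */
struct mock_sock {
	int refcnt;        /* reference held for the pending timer */
	int owned_by_user; /* models sock_owned_by_user() */
};

static void mock_sock_put(struct mock_sock *sk)
{
	sk->refcnt--;
}

/* Buggy shape: the early return skips the put, leaking one
 * reference every time the timer fires while the socket is busy. */
static void timeout_buggy(struct mock_sock *sk)
{
	if (sk->owned_by_user)
		return;           /* leak: reference never dropped */
	/* probe/abort work would run here */
	mock_sock_put(sk);
}

/* Fixed shape: all paths converge on a single put, as in the
 * 'goto exit' change above. */
static void timeout_fixed(struct mock_sock *sk)
{
	if (sk->owned_by_user) {
		/* re-arm the timer here in the real code */
		goto exit;
	}
	/* probe/abort work would run here */
exit:
	mock_sock_put(sk);
}
```

With the buggy shape, a busy socket leaves the reference count unchanged after the callback returns, so the count only ever grows across timer firings; with the fixed shape it stays balanced on both paths.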
From: <joh...@de...> - 2019-11-25 06:35:40
|
From: John Rutherford <joh...@de...> In commit 4f07b80c9733 ("tipc: check msg->req data len in tipc_nl_compat_bearer_disable") the same patch code was copied into the routines tipc_nl_compat_bearer_disable(), tipc_nl_compat_link_stat_dump() and tipc_nl_compat_link_reset_stats(). The two link routine occurrences should have been modified to check the maximum link name length, not the bearer name length. Fixes: 4f07b80c9733 ("tipc: check msg->req data len in tipc_nl_compat_bearer_disable") Signed-off-by: John Rutherford <joh...@de...> --- net/tipc/netlink_compat.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/tipc/netlink_compat.c b/net/tipc/netlink_compat.c index e135d4e11231..d4d2928424e2 100644 --- a/net/tipc/netlink_compat.c +++ b/net/tipc/netlink_compat.c @@ -550,7 +550,7 @@ static int tipc_nl_compat_link_stat_dump(struct tipc_nl_compat_msg *msg, if (len <= 0) return -EINVAL; - len = min_t(int, len, TIPC_MAX_BEARER_NAME); + len = min_t(int, len, TIPC_MAX_LINK_NAME); if (!string_is_valid(name, len)) return -EINVAL; @@ -822,7 +822,7 @@ static int tipc_nl_compat_link_reset_stats(struct tipc_nl_compat_cmd_doit *cmd, if (len <= 0) return -EINVAL; - len = min_t(int, len, TIPC_MAX_BEARER_NAME); + len = min_t(int, len, TIPC_MAX_LINK_NAME); if (!string_is_valid(name, len)) return -EINVAL; -- 2.11.0 |
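Why the clamp maximum matters can be shown with a simplified model of the bounded name check. The constants and the validity helper below are stand-ins for the kernel definitions (TIPC_MAX_BEARER_NAME / TIPC_MAX_LINK_NAME and string_is_valid()), kept only to demonstrate the failure mode fixed above.

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-ins; link names embed two interface names plus
 * a peer identity, so their maximum is larger than a bearer name's. */
#define MAX_BEARER_NAME 32
#define MAX_LINK_NAME   68

/* Valid iff a NUL terminator occurs within the first len bytes. */
static int string_is_valid_mock(const char *s, int len)
{
	return memchr(s, '\0', (size_t)len) != NULL;
}

/* Mirrors the pattern in the patch:
 * len = min_t(int, len, MAX); then validate the bounded string. */
static int check_name(const char *name, int len, int max)
{
	len = len < max ? len : max;
	return string_is_valid_mock(name, len);
}
```

A 40-character link name with a user-supplied length of 60 has its NUL terminator beyond the bearer-name clamp, so it is wrongly rejected when clamped to MAX_BEARER_NAME but accepted with MAX_LINK_NAME, which is the behavior the two min_t() changes restore.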
From: Ying X. <yin...@wi...> - 2019-11-22 16:31:03
|
>> I don't know you ever remembered many years ago I ever implemented a prototype to introduce slow- >> start and traffic congestion avoidance algorithms on link layer :) >> >> In my experiment, there were a few of meaningful findings: >> - It was crucial for the throughput performance about how to set initial congestion window and upper >> congestion window size. >> - It was found that different Ethernet speeds, different CPU capabilities and different message sizes all have >> a big impact on throughput performance. I did lots of experiments, as a result, sometimes performance >> improvement was very obvious, but sometimes, performance improvement was minimal even when I >> measured throughput in a similar network environment. Particularly, if I slightly changed some test >> conditions, throughput improvement results were also quite different. >> >> At that moment, I ever doubted whether we needed to do the following changes: >> 1. Should we introduce a timer to measure RTT and identify whether network congestion happens or not? > > Actually, measuring RTT is one of the stated requirements from the Ericsson product line, and I think we should do it. > Our plan was to do this by adding an adequately scaled time stamp to probe/probe-reply messages. This in turn could serve as base to calculate the bandwidth product and window limits. > That's great. >> 2. Should we change message delivery mode from message-oriented to byte-oriented? > > In my original series, sent out Nov. 4th, I did that in an additional patch. However, the solution was intrusive and ugly, and even gave slightly poorer performance than the current packet base one. So I decided to abandon this approach, at least for now. > Okay. >> >> Of course, in my experiment I didn't make so big changes. >> >> So I want to know: >> - How did you select the minimum window size and maximum window size? > > This was done empirically only. Lower than 50 never made any sense, as throughput always would be lower. 
> Throughput seemed to improve up to 500, but thereafter there was no improvement. Regarding my previous validation, the initial window size and maximum parameters are very crucial to performance. Moreover, currently when receiver's unacked messages reach 16 (TIPC_MIN_LINK_WIN), receiver would send one acknowledgment through one protocol state message to sender. In fact TIPC_MIN_LINK_WIN is also very important for performance. If the value is bigger, it means we can save network bandwidth to send more data. But if its value is bigger, it means detecting network congestion or message loss will be less sensitive. In my opinion, we also need to consider to adjust according test result. We can discuss more later. > > >> - Did you measure TIPC throughput performance on different Ethernets? Including large message size test >> and small message size test. > > I measured it only on virtio. I know this is a weakness, and we should of course try on other drivers as well. > >> - Did you meet similar phenomena to me when we slightly changed test condition? > > I observed differences, but the overall picture was that this modest change always gave some improvement. > >> >> In this proposal, during slow-start stage, the window increase is pretty slow: >> >> + if (qlen >= cwin && (l->snd_nxt - buf_seqno(skb_peek(txq)) == qlen)) { >> + add = l->cong_acks++ % 32 ? 0 : 1; >> + cwin = min_t(u16, cwin + add, l->max_win); >> + l->window = cwin; >> + } >> >> But in TCP slow-start algorithm, during slow-start stage congestion window increase is much more >> aggressive than above. As long as congestion window exceeds slow-start threshold, it enters congestion >> avoidance stage in which congestion window increases slowly. >> >> I am curious why the congestion window increase is pretty conservative compared to TCP slow-start >> algorithm. What factors did you consider when you selected the algorithm? > > I did it only because it is simple and gives an obvious improvement. 
I am fully aware that the growth is very slow, and that introducing a threshold where we go from a faster growth to a slower one probably would increase average window size and throughput. > I see this as the next step, that we all could experiment with. > > However, my approach now is that this very simple change is a low hanging fruit that gives an obvious improvement, so we lose nothing by releasing it as is. Then we can make it subject to further improvements later. > I see. It looks like your current algorithm is better than mine. In my solution, the performance improvement is not very stable particularly under different network test conditions. Acked-by: Ying Xue <yin...@wi...> > BR > ///jon > >> >> Thanks, >> Ying >> >> -----Original Message----- >> From: Jon Maloy [mailto:jon...@er...] >> Sent: Tuesday, November 19, 2019 7:33 AM >> To: Jon Maloy; Jon Maloy >> Cc: moh...@er...; par...@gm...; >> tun...@de...; hoa...@de...; tuo...@de...; >> gor...@de...; Xue, Ying; tip...@li... >> Subject: [net-next v3 1/1] tipc: introduce variable window congestion control >> >> We introduce a simple variable window congestion control for links. >> The algorithm is inspired by the Reno algorithm, and can best be >> descibed as working in permanent "congestion avoidance" mode, within >> strict limits. >> >> - We introduce hard lower and upper window limits per link, still >> different and configurable per bearer type. >> >> - Next, we let a link start at the minimum window, and then slowly >> increment it for each 32 received non-duplicate ACK. This goes on >> until it either reaches the upper limit, or until it receives a >> NACK message. >> >> - For each non-duplicate NACK received, we let the window decrease by >> intervals of 1/2 of the current window, but not below the minimum >> window. 
>> >> The change does in reality have effect only on unicast ethernet >> transport, as we have seen that there is no room whatsoever for >> increasing the window max size for the UDP bearer. >> For now, we also choose to keep the limits for the broadcast link >> unchanged and equal. >> >> This algorithm seems to give a ~25% throughput improvement for large >> messages, while it has no effect on throughput for small messages. >> >> Suggested-by: Xin Long <luc...@gm...> >> Acked-by: Xin Long <luc...@gm...> >> Signed-off-by: Jon Maloy <jon...@er...> >> >> --- >> v2: - Moved window increment in tipc_advance_backlogq() to before >> the transfer loop, as suggested Tuong. >> - Introduced logic for incrementing the window even for the >> broadcast send link, also suggested by Tuong. >> v3: - Rebased to latest net-next >> --- >> net/tipc/bcast.c | 11 ++++---- >> net/tipc/bearer.c | 11 ++++---- >> net/tipc/bearer.h | 6 +++-- >> net/tipc/eth_media.c | 6 ++++- >> net/tipc/ib_media.c | 5 +++- >> net/tipc/link.c | 76 ++++++++++++++++++++++++++++++++++------------------ >> net/tipc/link.h | 9 ++++--- >> net/tipc/node.c | 13 +++++---- >> net/tipc/udp_media.c | 3 ++- >> 9 files changed, 90 insertions(+), 50 deletions(-) >> >> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c >> index f41096a..84da317 100644 >> --- a/net/tipc/bcast.c >> +++ b/net/tipc/bcast.c >> @@ -562,18 +562,18 @@ int tipc_bclink_reset_stats(struct net *net) >> return 0; >> } >> >> -static int tipc_bc_link_set_queue_limits(struct net *net, u32 limit) >> +static int tipc_bc_link_set_queue_limits(struct net *net, u32 max_win) >> { >> struct tipc_link *l = tipc_bc_sndlink(net); >> >> if (!l) >> return -ENOPROTOOPT; >> - if (limit < BCLINK_WIN_MIN) >> - limit = BCLINK_WIN_MIN; >> - if (limit > TIPC_MAX_LINK_WIN) >> + if (max_win < BCLINK_WIN_MIN) >> + max_win = BCLINK_WIN_MIN; >> + if (max_win > TIPC_MAX_LINK_WIN) >> return -EINVAL; >> tipc_bcast_lock(net); >> - tipc_link_set_queue_limits(l, limit); >> + 
>> tipc_link_set_queue_limits(l, BCLINK_WIN_MIN, max_win);
>> tipc_bcast_unlock(net);
>> return 0;
>> }
>> @@ -683,6 +683,7 @@ int tipc_bcast_init(struct net *net)
>> if (!tipc_link_bc_create(net, 0, 0,
>> FB_MTU,
>> BCLINK_WIN_DEFAULT,
>> + BCLINK_WIN_DEFAULT,
>> 0,
>> &bb->inputq,
>> NULL,
>> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
>> index d7ec26b..34ca7b7 100644
>> --- a/net/tipc/bearer.c
>> +++ b/net/tipc/bearer.c
>> @@ -311,7 +311,8 @@ static int tipc_enable_bearer(struct net *net, const char *name,
>>
>> b->identity = bearer_id;
>> b->tolerance = m->tolerance;
>> - b->window = m->window;
>> + b->min_win = m->min_win;
>> + b->max_win = m->max_win;
>> b->domain = disc_domain;
>> b->net_plane = bearer_id + 'A';
>> b->priority = prio;
>> @@ -796,7 +797,7 @@ static int __tipc_nl_add_bearer(struct tipc_nl_msg *msg,
>> goto prop_msg_full;
>> if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, bearer->tolerance))
>> goto prop_msg_full;
>> - if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->window))
>> + if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bearer->max_win))
>> goto prop_msg_full;
>> if (bearer->media->type_id == TIPC_MEDIA_TYPE_UDP)
>> if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, bearer->mtu))
>> @@ -1088,7 +1089,7 @@ int __tipc_nl_bearer_set(struct sk_buff *skb, struct genl_info *info)
>> if (props[TIPC_NLA_PROP_PRIO])
>> b->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
>> if (props[TIPC_NLA_PROP_WIN])
>> - b->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> + b->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> if (props[TIPC_NLA_PROP_MTU]) {
>> if (b->media->type_id != TIPC_MEDIA_TYPE_UDP)
>> return -EINVAL;
>> @@ -1142,7 +1143,7 @@ static int __tipc_nl_add_media(struct tipc_nl_msg *msg,
>> goto prop_msg_full;
>> if (nla_put_u32(msg->skb, TIPC_NLA_PROP_TOL, media->tolerance))
>> goto prop_msg_full;
>> - if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->window))
>> + if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, media->max_win))
>> goto prop_msg_full;
>> if (media->type_id == TIPC_MEDIA_TYPE_UDP)
>> if (nla_put_u32(msg->skb, TIPC_NLA_PROP_MTU, media->mtu))
>> @@ -1275,7 +1276,7 @@ int __tipc_nl_media_set(struct sk_buff *skb, struct genl_info *info)
>> if (props[TIPC_NLA_PROP_PRIO])
>> m->priority = nla_get_u32(props[TIPC_NLA_PROP_PRIO]);
>> if (props[TIPC_NLA_PROP_WIN])
>> - m->window = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> + m->max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> if (props[TIPC_NLA_PROP_MTU]) {
>> if (m->type_id != TIPC_MEDIA_TYPE_UDP)
>> return -EINVAL;
>> diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
>> index d0c79cc..bc00231 100644
>> --- a/net/tipc/bearer.h
>> +++ b/net/tipc/bearer.h
>> @@ -119,7 +119,8 @@ struct tipc_media {
>> char *raw);
>> u32 priority;
>> u32 tolerance;
>> - u32 window;
>> + u32 min_win;
>> + u32 max_win;
>> u32 mtu;
>> u32 type_id;
>> u32 hwaddr_len;
>> @@ -158,7 +159,8 @@ struct tipc_bearer {
>> struct packet_type pt;
>> struct rcu_head rcu;
>> u32 priority;
>> - u32 window;
>> + u32 min_win;
>> + u32 max_win;
>> u32 tolerance;
>> u32 domain;
>> u32 identity;
>> diff --git a/net/tipc/eth_media.c b/net/tipc/eth_media.c
>> index f69a2fd..38cdcab 100644
>> --- a/net/tipc/eth_media.c
>> +++ b/net/tipc/eth_media.c
>> @@ -37,6 +37,9 @@
>> #include "core.h"
>> #include "bearer.h"
>>
>> +#define TIPC_MIN_ETH_LINK_WIN 50
>> +#define TIPC_MAX_ETH_LINK_WIN 500
>> +
>> /* Convert Ethernet address (media address format) to string */
>> static int tipc_eth_addr2str(struct tipc_media_addr *addr,
>> char *strbuf, int bufsz)
>> @@ -92,7 +95,8 @@ struct tipc_media eth_media_info = {
>> .raw2addr = tipc_eth_raw2addr,
>> .priority = TIPC_DEF_LINK_PRI,
>> .tolerance = TIPC_DEF_LINK_TOL,
>> - .window = TIPC_DEF_LINK_WIN,
>> + .min_win = TIPC_MIN_ETH_LINK_WIN,
>> + .max_win = TIPC_MAX_ETH_LINK_WIN,
>> .type_id = TIPC_MEDIA_TYPE_ETH,
>> .hwaddr_len = ETH_ALEN,
>> .name = "eth"
>> diff --git a/net/tipc/ib_media.c b/net/tipc/ib_media.c
>> index e8c1671..7aa9ff8 100644
>> --- a/net/tipc/ib_media.c
>> +++ b/net/tipc/ib_media.c
>> @@ -42,6 +42,8 @@
>> #include "core.h"
>> #include "bearer.h"
>>
>> +#define TIPC_MAX_IB_LINK_WIN 500
>> +
>> /* convert InfiniBand address (media address format) media address to string */
>> static int tipc_ib_addr2str(struct tipc_media_addr *a, char *str_buf,
>> int str_size)
>> @@ -94,7 +96,8 @@ struct tipc_media ib_media_info = {
>> .raw2addr = tipc_ib_raw2addr,
>> .priority = TIPC_DEF_LINK_PRI,
>> .tolerance = TIPC_DEF_LINK_TOL,
>> - .window = TIPC_DEF_LINK_WIN,
>> + .min_win = TIPC_DEF_LINK_WIN,
>> + .max_win = TIPC_MAX_IB_LINK_WIN,
>> .type_id = TIPC_MEDIA_TYPE_IB,
>> .hwaddr_len = INFINIBAND_ALEN,
>> .name = "ib"
>> diff --git a/net/tipc/link.c b/net/tipc/link.c
>> index fb72031..a88950b 100644
>> --- a/net/tipc/link.c
>> +++ b/net/tipc/link.c
>> @@ -164,7 +164,6 @@ struct tipc_link {
>> struct sk_buff *target_bskb;
>> } backlog[5];
>> u16 snd_nxt;
>> - u16 window;
>>
>> /* Reception */
>> u16 rcv_nxt;
>> @@ -175,6 +174,10 @@ struct tipc_link {
>>
>> /* Congestion handling */
>> struct sk_buff_head wakeupq;
>> + u16 window;
>> + u16 min_win;
>> + u16 max_win;
>> + u16 cong_acks;
>>
>> /* Fragmentation/reassembly */
>> struct sk_buff *reasm_buf;
>> @@ -308,9 +311,14 @@ u32 tipc_link_id(struct tipc_link *l)
>> return l->peer_bearer_id << 16 | l->bearer_id;
>> }
>>
>> -int tipc_link_window(struct tipc_link *l)
>> +int tipc_link_min_win(struct tipc_link *l)
>> +{
>> + return l->min_win;
>> +}
>> +
>> +int tipc_link_max_win(struct tipc_link *l)
>> {
>> - return l->window;
>> + return l->max_win;
>> }
>>
>> int tipc_link_prio(struct tipc_link *l)
>> @@ -436,7 +444,8 @@ u32 tipc_link_state(struct tipc_link *l)
>> * @net_plane: network plane (A,B,c..) this link belongs to
>> * @mtu: mtu to be advertised by link
>> * @priority: priority to be used by link
>> - * @window: send window to be used by link
>> + * @min_win: minimal send window to be used by link
>> + * @max_win: maximal send window to be used by link
>> * @session: session to be used by link
>> * @ownnode: identity of own node
>> * @peer: node id of peer node
>> @@ -451,7 +460,7 @@ u32 tipc_link_state(struct tipc_link *l)
>> */
>> bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>> int tolerance, char net_plane, u32 mtu, int priority,
>> - int window, u32 session, u32 self,
>> + u32 min_win, u32 max_win, u32 session, u32 self,
>> u32 peer, u8 *peer_id, u16 peer_caps,
>> struct tipc_link *bc_sndlink,
>> struct tipc_link *bc_rcvlink,
>> @@ -495,7 +504,7 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>> l->advertised_mtu = mtu;
>> l->mtu = mtu;
>> l->priority = priority;
>> - tipc_link_set_queue_limits(l, window);
>> + tipc_link_set_queue_limits(l, min_win, max_win);
>> l->ackers = 1;
>> l->bc_sndlink = bc_sndlink;
>> l->bc_rcvlink = bc_rcvlink;
>> @@ -523,7 +532,7 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>> * Returns true if link was created, otherwise false
>> */
>> bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer,
>> - int mtu, int window, u16 peer_caps,
>> + int mtu, u32 min_win, u32 max_win, u16 peer_caps,
>> struct sk_buff_head *inputq,
>> struct sk_buff_head *namedq,
>> struct tipc_link *bc_sndlink,
>> @@ -531,9 +540,9 @@ bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer,
>> {
>> struct tipc_link *l;
>>
>> - if (!tipc_link_create(net, "", MAX_BEARERS, 0, 'Z', mtu, 0, window,
>> - 0, ownnode, peer, NULL, peer_caps, bc_sndlink,
>> - NULL, inputq, namedq, link))
>> + if (!tipc_link_create(net, "", MAX_BEARERS, 0, 'Z', mtu, 0, min_win,
>> + max_win, 0, ownnode, peer, NULL, peer_caps,
>> + bc_sndlink, NULL, inputq, namedq, link))
>> return false;
>>
>> l = *link;
>> @@ -959,7 +968,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>> int pkt_cnt = skb_queue_len(list);
>> int imp = msg_importance(hdr);
>> unsigned int mss = tipc_link_mss(l);
>> - unsigned int maxwin = l->window;
>> + unsigned int cwin = l->window;
>> unsigned int mtu = l->mtu;
>> bool new_bundle;
>> int rc = 0;
>> @@ -988,7 +997,7 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>>
>> /* Prepare each packet for sending, and add to relevant queue: */
>> while ((skb = __skb_dequeue(list))) {
>> - if (likely(skb_queue_len(transmq) < maxwin)) {
>> + if (likely(skb_queue_len(transmq) < cwin)) {
>> hdr = buf_msg(skb);
>> msg_set_seqno(hdr, seqno);
>> msg_set_ack(hdr, ack);
>> @@ -1038,6 +1047,8 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list,
>> static void tipc_link_advance_backlog(struct tipc_link *l,
>> struct sk_buff_head *xmitq)
>> {
>> + struct sk_buff_head *txq = &l->transmq;
>> + u16 qlen, add, cwin = l->window;
>> struct sk_buff *skb, *_skb;
>> struct tipc_msg *hdr;
>> u16 seqno = l->snd_nxt;
>> @@ -1045,7 +1056,13 @@ static void tipc_link_advance_backlog(struct tipc_link *l,
>> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>> u32 imp;
>>
>> - while (skb_queue_len(&l->transmq) < l->window) {
>> + qlen = skb_queue_len(txq);
>> + if (qlen >= cwin && (l->snd_nxt - buf_seqno(skb_peek(txq)) == qlen)) {
>> + add = l->cong_acks++ % 32 ? 0 : 1;
>> + cwin = min_t(u16, cwin + add, l->max_win);
>> + l->window = cwin;
>> + }
>> + while (skb_queue_len(txq) < cwin) {
>> skb = skb_peek(&l->backlogq);
>> if (!skb)
>> break;
>> @@ -1140,6 +1157,7 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>> {
>> struct sk_buff *_skb, *skb = skb_peek(&l->transmq);
>> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>> + bool retransmitted = false;
>> u16 ack = l->rcv_nxt - 1;
>> struct tipc_msg *hdr;
>> int rc = 0;
>> @@ -1173,11 +1191,13 @@ static int tipc_link_bc_retrans(struct tipc_link *l, struct tipc_link *r,
>> _skb->priority = TC_PRIO_CONTROL;
>> __skb_queue_tail(xmitq, _skb);
>> l->stats.retransmitted++;
>> -
>> + retransmitted = true;
>> /* Increase actual retrans counter & mark first time */
>> if (!TIPC_SKB_CB(skb)->retr_cnt++)
>> TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>> }
>> + if (retransmitted)
>> + l->window = l->min_win + (l->window - l->min_win) / 2;
>> return 0;
>> }
>>
>> @@ -1417,6 +1437,7 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>> struct sk_buff *skb, *_skb, *tmp;
>> struct tipc_msg *hdr;
>> u16 bc_ack = l->bc_rcvlink->rcv_nxt - 1;
>> + bool retransmitted = false;
>> u16 ack = l->rcv_nxt - 1;
>> bool passed = false;
>> u16 seqno, n = 0;
>> @@ -1449,7 +1470,7 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>> _skb->priority = TC_PRIO_CONTROL;
>> __skb_queue_tail(xmitq, _skb);
>> l->stats.retransmitted++;
>> -
>> + retransmitted = true;
>> /* Increase actual retrans counter & mark first time */
>> if (!TIPC_SKB_CB(skb)->retr_cnt++)
>> TIPC_SKB_CB(skb)->retr_stamp = jiffies;
>> @@ -1463,7 +1484,8 @@ static int tipc_link_advance_transmq(struct tipc_link *l, u16 acked, u16 gap,
>> goto next_gap_ack;
>> }
>> }
>> -
>> + if (retransmitted)
>> + l->window = l->min_win + (l->window - l->min_win) / 2;
>> return 0;
>> }
>>
>> @@ -2305,15 +2327,17 @@ int tipc_link_bc_nack_rcv(struct tipc_link *l, struct sk_buff *skb,
>> return 0;
>> }
>>
>> -void tipc_link_set_queue_limits(struct tipc_link *l, u32 win)
>> +void tipc_link_set_queue_limits(struct tipc_link *l, u32 min_win, u32 max_win)
>> {
>> int max_bulk = TIPC_MAX_PUBL / (l->mtu / ITEM_SIZE);
>>
>> - l->window = win;
>> - l->backlog[TIPC_LOW_IMPORTANCE].limit = max_t(u16, 50, win);
>> - l->backlog[TIPC_MEDIUM_IMPORTANCE].limit = max_t(u16, 100, win * 2);
>> - l->backlog[TIPC_HIGH_IMPORTANCE].limit = max_t(u16, 150, win * 3);
>> - l->backlog[TIPC_CRITICAL_IMPORTANCE].limit = max_t(u16, 200, win * 4);
>> + l->window = min_win;
>> + l->min_win = min_win;
>> + l->max_win = max_win;
>> + l->backlog[TIPC_LOW_IMPORTANCE].limit = min_win * 2;
>> + l->backlog[TIPC_MEDIUM_IMPORTANCE].limit = min_win * 4;
>> + l->backlog[TIPC_HIGH_IMPORTANCE].limit = min_win * 6;
>> + l->backlog[TIPC_CRITICAL_IMPORTANCE].limit = min_win * 8;
>> l->backlog[TIPC_SYSTEM_IMPORTANCE].limit = max_bulk;
>> }
>>
>> @@ -2366,10 +2390,10 @@ int tipc_nl_parse_link_prop(struct nlattr *prop, struct nlattr *props[])
>> }
>>
>> if (props[TIPC_NLA_PROP_WIN]) {
>> - u32 win;
>> + u32 max_win;
>>
>> - win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> - if ((win < TIPC_MIN_LINK_WIN) || (win > TIPC_MAX_LINK_WIN))
>> + max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> + if (max_win < TIPC_MIN_LINK_WIN || max_win > TIPC_MAX_LINK_WIN)
>> return -EINVAL;
>> }
>>
>> @@ -2605,7 +2629,7 @@ int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg)
>> prop = nla_nest_start_noflag(msg->skb, TIPC_NLA_LINK_PROP);
>> if (!prop)
>> goto attr_msg_full;
>> - if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bcl->window))
>> + if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bcl->max_win))
>> goto prop_msg_full;
>> if (nla_put_u32(msg->skb, TIPC_NLA_PROP_BROADCAST, bc_mode))
>> goto prop_msg_full;
>> diff --git a/net/tipc/link.h b/net/tipc/link.h
>> index c09e9d4..d3c1c3f 100644
>> --- a/net/tipc/link.h
>> +++ b/net/tipc/link.h
>> @@ -73,7 +73,7 @@ enum {
>>
>> bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>> int tolerance, char net_plane, u32 mtu, int priority,
>> - int window, u32 session, u32 ownnode,
>> + u32 min_win, u32 max_win, u32 session, u32 ownnode,
>> u32 peer, u8 *peer_id, u16 peer_caps,
>> struct tipc_link *bc_sndlink,
>> struct tipc_link *bc_rcvlink,
>> @@ -81,7 +81,7 @@ bool tipc_link_create(struct net *net, char *if_name, int bearer_id,
>> struct sk_buff_head *namedq,
>> struct tipc_link **link);
>> bool tipc_link_bc_create(struct net *net, u32 ownnode, u32 peer,
>> - int mtu, int window, u16 peer_caps,
>> + int mtu, u32 min_win, u32 max_win, u16 peer_caps,
>> struct sk_buff_head *inputq,
>> struct sk_buff_head *namedq,
>> struct tipc_link *bc_sndlink,
>> @@ -115,7 +115,8 @@ char *tipc_link_name_ext(struct tipc_link *l, char *buf);
>> u32 tipc_link_state(struct tipc_link *l);
>> char tipc_link_plane(struct tipc_link *l);
>> int tipc_link_prio(struct tipc_link *l);
>> -int tipc_link_window(struct tipc_link *l);
>> +int tipc_link_min_win(struct tipc_link *l);
>> +int tipc_link_max_win(struct tipc_link *l);
>> void tipc_link_update_caps(struct tipc_link *l, u16 capabilities);
>> bool tipc_link_validate_msg(struct tipc_link *l, struct tipc_msg *hdr);
>> unsigned long tipc_link_tolerance(struct tipc_link *l);
>> @@ -124,7 +125,7 @@ void tipc_link_set_tolerance(struct tipc_link *l, u32 tol,
>> struct sk_buff_head *xmitq);
>> void tipc_link_set_prio(struct tipc_link *l, u32 prio,
>> struct sk_buff_head *xmitq);
>> void tipc_link_set_abort_limit(struct tipc_link *l, u32 limit);
>> -void tipc_link_set_queue_limits(struct tipc_link *l, u32 window);
>> +void tipc_link_set_queue_limits(struct tipc_link *l, u32 min_win, u32 max_win);
>> int __tipc_nl_add_link(struct net *net, struct tipc_nl_msg *msg,
>> struct tipc_link *link, int nlflags);
>> int tipc_nl_parse_link_prop(struct nlattr *prop, struct nlattr *props[]);
>> diff --git a/net/tipc/node.c b/net/tipc/node.c
>> index aaf5956..970c780 100644
>> --- a/net/tipc/node.c
>> +++ b/net/tipc/node.c
>> @@ -1134,7 +1134,8 @@ void tipc_node_check_dest(struct net *net, u32 addr,
>> snd_l = tipc_bc_sndlink(net);
>> if (!tipc_link_bc_create(net, tipc_own_addr(net),
>> addr, U16_MAX,
>> - tipc_link_window(snd_l),
>> + tipc_link_min_win(snd_l),
>> + tipc_link_max_win(snd_l),
>> n->capabilities,
>> &n->bc_entry.inputq1,
>> &n->bc_entry.namedq, snd_l,
>> @@ -1228,7 +1229,7 @@ void tipc_node_check_dest(struct net *net, u32 addr,
>> get_random_bytes(&session, sizeof(u16));
>> if (!tipc_link_create(net, if_name, b->identity, b->tolerance,
>> b->net_plane, b->mtu, b->priority,
>> - b->window, session,
>> + b->min_win, b->max_win, session,
>> tipc_own_addr(net), addr, peer_id,
>> n->capabilities,
>> tipc_bc_sndlink(n->net), n->bc_entry.link,
>> @@ -2374,10 +2375,12 @@ int tipc_nl_node_set_link(struct sk_buff *skb, struct genl_info *info)
>> tipc_link_set_prio(link, prio, &xmitq);
>> }
>> if (props[TIPC_NLA_PROP_WIN]) {
>> - u32 win;
>> + u32 max_win;
>>
>> - win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> - tipc_link_set_queue_limits(link, win);
>> + max_win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
>> + tipc_link_set_queue_limits(link,
>> + tipc_link_min_win(link),
>> + max_win);
>> }
>> }
>>
>> diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c
>> index 86aaa4d..7d9dcbb 100644
>> --- a/net/tipc/udp_media.c
>> +++ b/net/tipc/udp_media.c
>> @@ -825,7 +825,8 @@ struct tipc_media udp_media_info = {
>> .msg2addr = tipc_udp_msg2addr,
>> .priority = TIPC_DEF_LINK_PRI,
>> .tolerance = TIPC_DEF_LINK_TOL,
>> - .window = TIPC_DEF_LINK_WIN,
>> + .min_win = TIPC_DEF_LINK_WIN,
>> + .max_win = TIPC_DEF_LINK_WIN,
>> .mtu = TIPC_DEF_LINK_UDP_MTU,
>> .type_id = TIPC_MEDIA_TYPE_UDP,
>> .hwaddr_len = 0,
>> --
>> 2.1.4
From: David M. <da...@da...> - 2019-11-21 23:05:08
From: Tuong Lien <tuo...@de...>
Date: Thu, 21 Nov 2019 15:34:58 +0700

> It is observed that TIPC service binding order will not be kept in the
> publication event report to user if the service is subscribed after the
> bindings.
>
> For example, services are bound by application in the following order:
>
> Server: bound port A to {18888,66,66} scope 2
> Server: bound port A to {18888,33,33} scope 2
>
> Now, if a client subscribes to the service range (e.g. {18888, 0-100}),
> it will get the 'TIPC_PUBLISHED' events in that binding order only when
> the subscription is started before the bindings.
> Otherwise, if started after the bindings, the events will arrive in the
> opposite order:
>
> Client: received event for published {18888,33,33}
> Client: received event for published {18888,66,66}
>
> For the latter case, it is clear that the bindings have existed in the
> name table already, so when reported, the events' order will follow the
> order of the rbtree binding nodes (- a node with lesser 'lower'/'upper'
> range value will be first).
>
> This is correct as we provide the tracking on a specific service status
> (available or not), not the relationship between multiple services.
> However, some users expect to see the same order of arriving events
> irrespective of when the subscription is issued. This turns out to be
> easy to fix. We now add functionality to ensure that publication events
> always are issued in the same temporal order as the corresponding
> bindings were performed.
>
> v2: replace the unnecessary macro - 'publication_after()' with inline
> function.
> v3: reuse 'time_after32()' instead of reinventing the same exact code.
>
> Acked-by: Jon Maloy <jon...@er...>
> Signed-off-by: Tuong Lien <tuo...@de...>

Applied, thanks.
From: David M. <da...@da...> - 2019-11-21 23:03:15
From: Hoang Le <hoa...@de...>
Date: Thu, 21 Nov 2019 10:01:09 +0700

> When setting up a cluster with non-replicast/replicast capability
> supported. This capability will be disabled for broadcast send link
> in order to be backwards compatible.
>
> However, when these non-support nodes left and be removed out the cluster.
> We don't update this capability on broadcast send link. Then, some of
> features that based on this capability will also disabling as unexpected.
>
> In this commit, we make sure the broadcast send link capabilities will
> be re-calculated as soon as a node removed/rejoined a cluster.
>
> Acked-by: Jon Maloy <jon...@er...>
> Signed-off-by: Hoang Le <hoa...@de...>

Applied, thank you.
From: Tuong L. <tuo...@de...> - 2019-11-21 08:35:16
It is observed that TIPC service binding order will not be kept in the
publication event report to user if the service is subscribed after the
bindings.

For example, services are bound by application in the following order:

Server: bound port A to {18888,66,66} scope 2
Server: bound port A to {18888,33,33} scope 2

Now, if a client subscribes to the service range (e.g. {18888, 0-100}),
it will get the 'TIPC_PUBLISHED' events in that binding order only when
the subscription is started before the bindings.
Otherwise, if started after the bindings, the events will arrive in the
opposite order:

Client: received event for published {18888,33,33}
Client: received event for published {18888,66,66}

For the latter case, it is clear that the bindings have existed in the
name table already, so when reported, the events' order will follow the
order of the rbtree binding nodes (- a node with lesser 'lower'/'upper'
range value will be first).

This is correct as we provide the tracking on a specific service status
(available or not), not the relationship between multiple services.
However, some users expect to see the same order of arriving events
irrespective of when the subscription is issued. This turns out to be
easy to fix. We now add functionality to ensure that publication events
always are issued in the same temporal order as the corresponding
bindings were performed.

v2: replace the unnecessary macro - 'publication_after()' with inline
function.
v3: reuse 'time_after32()' instead of reinventing the same exact code.

Acked-by: Jon Maloy <jon...@er...>
Signed-off-by: Tuong Lien <tuo...@de...>
---
 net/tipc/name_table.c | 51 +++++++++++++++++++++++++++++++++++++++++++--------
 net/tipc/name_table.h |  4 ++++
 2 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 66a65c2cdb23..92d04dc2a44b 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -35,6 +35,7 @@
  */

 #include <net/sock.h>
+#include <linux/list_sort.h>
 #include "core.h"
 #include "netlink.h"
 #include "name_table.h"
@@ -66,6 +67,7 @@ struct service_range {
 /**
  * struct tipc_service - container for all published instances of a service type
  * @type: 32 bit 'type' value for service
+ * @publ_cnt: increasing counter for publications in this service
  * @ranges: rb tree containing all service ranges for this service
  * @service_list: links to adjacent name ranges in hash chain
  * @subscriptions: list of subscriptions for this service type
@@ -74,6 +76,7 @@ struct service_range {
  */
 struct tipc_service {
 	u32 type;
+	u32 publ_cnt;
 	struct rb_root ranges;
 	struct hlist_node service_list;
 	struct list_head subscriptions;
@@ -109,6 +112,7 @@ static struct publication *tipc_publ_create(u32 type, u32 lower, u32 upper,
 	INIT_LIST_HEAD(&publ->binding_node);
 	INIT_LIST_HEAD(&publ->local_publ);
 	INIT_LIST_HEAD(&publ->all_publ);
+	INIT_LIST_HEAD(&publ->list);
 	return publ;
 }
@@ -244,6 +248,8 @@ static struct publication *tipc_service_insert_publ(struct net *net,
 	p = tipc_publ_create(type, lower, upper, scope, node, port, key);
 	if (!p)
 		goto err;
+	/* Suppose there shouldn't be a huge gap btw publs i.e. >INT_MAX */
+	p->id = sc->publ_cnt++;
 	if (in_own_node(net, node))
 		list_add(&p->local_publ, &sr->local_publ);
 	list_add(&p->all_publ, &sr->all_publ);
@@ -278,6 +284,20 @@ static struct publication *tipc_service_remove_publ(struct service_range *sr,
 }

 /**
+ * Code reused: time_after32() for the same purpose
+ */
+#define publication_after(pa, pb) time_after32((pa)->id, (pb)->id)
+static int tipc_publ_sort(void *priv, struct list_head *a,
+			  struct list_head *b)
+{
+	struct publication *pa, *pb;
+
+	pa = container_of(a, struct publication, list);
+	pb = container_of(b, struct publication, list);
+	return publication_after(pa, pb);
+}
+
+/**
  * tipc_service_subscribe - attach a subscription, and optionally
  * issue the prescribed number of events if there is any service
  * range overlapping with the requested range
@@ -286,36 +306,51 @@ static void tipc_service_subscribe(struct tipc_service *service,
 				   struct tipc_subscription *sub)
 {
 	struct tipc_subscr *sb = &sub->evt.s;
+	struct publication *p, *first, *tmp;
+	struct list_head publ_list;
 	struct service_range *sr;
 	struct tipc_name_seq ns;
-	struct publication *p;
 	struct rb_node *n;
-	bool first;
+	u32 filter;

 	ns.type = tipc_sub_read(sb, seq.type);
 	ns.lower = tipc_sub_read(sb, seq.lower);
 	ns.upper = tipc_sub_read(sb, seq.upper);
+	filter = tipc_sub_read(sb, filter);

 	tipc_sub_get(sub);
 	list_add(&sub->service_list, &service->subscriptions);

-	if (tipc_sub_read(sb, filter) & TIPC_SUB_NO_STATUS)
+	if (filter & TIPC_SUB_NO_STATUS)
 		return;

+	INIT_LIST_HEAD(&publ_list);
 	for (n = rb_first(&service->ranges); n; n = rb_next(n)) {
 		sr = container_of(n, struct service_range, tree_node);
 		if (sr->lower > ns.upper)
 			break;
 		if (!tipc_sub_check_overlap(&ns, sr->lower, sr->upper))
 			continue;
-		first = true;

+		first = NULL;
 		list_for_each_entry(p, &sr->all_publ, all_publ) {
-			tipc_sub_report_overlap(sub, sr->lower, sr->upper,
-						TIPC_PUBLISHED, p->port,
-						p->node, p->scope, first);
-			first = false;
+			if (filter & TIPC_SUB_PORTS)
+				list_add_tail(&p->list, &publ_list);
+			else if (!first || publication_after(first, p))
+				/* Pick this range's *first* publication */
+				first = p;
 		}
+		if (first)
+			list_add_tail(&first->list, &publ_list);
+	}
+
+	/* Sort the publications before reporting */
+	list_sort(NULL, &publ_list, tipc_publ_sort);
+	list_for_each_entry_safe(p, tmp, &publ_list, list) {
+		tipc_sub_report_overlap(sub, p->lower, p->upper,
+					TIPC_PUBLISHED, p->port, p->node,
+					p->scope, true);
+		list_del_init(&p->list);
 	}
 }

diff --git a/net/tipc/name_table.h b/net/tipc/name_table.h
index f79066334cc8..728bc7016c38 100644
--- a/net/tipc/name_table.h
+++ b/net/tipc/name_table.h
@@ -58,6 +58,7 @@ struct tipc_group;
  * @node: network address of publishing socket's node
  * @port: publishing port
  * @key: publication key, unique across the cluster
+ * @id: publication id
  * @binding_node: all publications from the same node which bound this one
  * - Remote publications: in node->publ_list
  * Used by node/name distr to withdraw publications when node is lost
@@ -69,6 +70,7 @@ struct tipc_group;
  * Used by closest_first and multicast receive lookup algorithms
  * @all_publ: all publications identical to this one, whatever node and scope
  * Used by round-robin lookup algorithm
+ * @list: to form a list of publications in temporal order
  * @rcu: RCU callback head used for deferred freeing
  */
 struct publication {
@@ -79,10 +81,12 @@ struct publication {
 	u32 node;
 	u32 port;
 	u32 key;
+	u32 id;
 	struct list_head binding_node;
 	struct list_head binding_sock;
 	struct list_head local_publ;
 	struct list_head all_publ;
+	struct list_head list;
 	struct rcu_head rcu;
 };
--
2.13.7
From: David M. <da...@da...> - 2019-11-21 08:07:52
From: "Tuong Lien Tong" <tuo...@de...>
Date: Thu, 21 Nov 2019 14:01:17 +0700

> Hi David,
>
> The fact is we still want to keep it with that explicit meaning, so make the
> code easy to understand. Yes, the 'time_after32()' or another macro can give
> the same result but makes no sense in this particular scenario. Otherwise,
> do you like something such as:
>
> #define publication_after(...) time_after32(...)

Yes, that's fine.
From: Tuong L. T. <tuo...@de...> - 2019-11-21 07:01:31
Hi David,

The fact is we still want to keep it with that explicit meaning, so make the
code easy to understand. Yes, the 'time_after32()' or another macro can give
the same result but makes no sense in this particular scenario. Otherwise,
do you like something such as:

#define publication_after(...) time_after32(...)

BR/Tuong

-----Original Message-----
From: David Miller <da...@da...>
Sent: Thursday, November 21, 2019 1:14 PM
To: tuo...@de...
Cc: jon...@er...; ma...@do...; yin...@wi...; ne...@vg...; tip...@li...
Subject: Re: [net-next v2] tipc: support in-order name publication events

From: Tuong Lien <tuo...@de...>
Date: Thu, 21 Nov 2019 09:53:25 +0700

> +static inline int publication_after(struct publication *pa,
> +				    struct publication *pb)
> +{
> +	return ((int)(pb->id - pa->id) < 0);
> +}

Just use time_after32() et al. instead of reinventing the same exact code
please.
From: David M. <da...@da...> - 2019-11-21 06:14:11
From: Tuong Lien <tuo...@de...>
Date: Thu, 21 Nov 2019 09:53:25 +0700

> +static inline int publication_after(struct publication *pa,
> +				    struct publication *pb)
> +{
> +	return ((int)(pb->id - pa->id) < 0);
> +}

Just use time_after32() et al. instead of reinventing the same exact code
please.
From: Tuong L. <tuo...@de...> - 2019-11-21 03:47:10
Two new commands are added as part of 'tipc node' command:

$tipc node set key KEY [algname ALGNAME] [nodeid NODEID]
$tipc node flush key

which enable user to set and remove AEAD keys in kernel TIPC (requires
the kernel option - 'TIPC_CRYPTO').

For the 'set key' command, the given 'nodeid' parameter decides the mode
to be applied to the key, particularly:

- If NODEID is empty, the key is a 'cluster' key which will be used for
  all message encryption/decryption from/to the node (i.e. both TX &
  RX). The same key will be set in the other nodes.

- If NODEID is own node, the key is used for message encryption (TX)
  from the node. Whereas, if NODEID is a peer node, the key is for
  message decryption (RX) from that peer node. This is the
  'per-node-key' mode that each nodes in the cluster has its specific
  (TX) key.

Acked-by: Ying Xue <yin...@wi...>
Acked-by: Jon Maloy <jon...@er...>
Signed-off-by: Tuong Lien <tuo...@de...>
---
 include/uapi/linux/tipc.h         |  21 ++++++
 include/uapi/linux/tipc_netlink.h |   4 ++
 tipc/misc.c                       |  38 +++++++++++
 tipc/misc.h                       |   1 +
 tipc/node.c                       | 133 +++++++++++++++++++++++++++++++++++++-
 5 files changed, 195 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/tipc.h b/include/uapi/linux/tipc.h
index 0f6f28b2..bd34151f 100644
--- a/include/uapi/linux/tipc.h
+++ b/include/uapi/linux/tipc.h
@@ -233,6 +233,27 @@ struct tipc_sioc_nodeid_req {
 	char node_id[TIPC_NODEID_LEN];
 };

+/*
+ * TIPC Crypto, AEAD
+ */
+#define TIPC_AEAD_ALG_NAME (32)
+
+struct tipc_aead_key {
+	char alg_name[TIPC_AEAD_ALG_NAME];
+	unsigned int keylen;	/* in bytes */
+	char key[];
+};
+
+#define TIPC_AEAD_KEYLEN_MIN (16 + 4)
+#define TIPC_AEAD_KEYLEN_MAX (32 + 4)
+#define TIPC_AEAD_KEY_SIZE_MAX (sizeof(struct tipc_aead_key) + \
+				TIPC_AEAD_KEYLEN_MAX)
+
+static inline int tipc_aead_key_size(struct tipc_aead_key *key)
+{
+	return sizeof(*key) + key->keylen;
+}
+
 /* The macros and functions below are deprecated: */

diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
index efb958fd..6c2194ab 100644
--- a/include/uapi/linux/tipc_netlink.h
+++ b/include/uapi/linux/tipc_netlink.h
@@ -63,6 +63,8 @@ enum {
 	TIPC_NL_PEER_REMOVE,
 	TIPC_NL_BEARER_ADD,
 	TIPC_NL_UDP_GET_REMOTEIP,
+	TIPC_NL_KEY_SET,
+	TIPC_NL_KEY_FLUSH,

 	__TIPC_NL_CMD_MAX,
 	TIPC_NL_CMD_MAX = __TIPC_NL_CMD_MAX - 1
@@ -160,6 +162,8 @@ enum {
 	TIPC_NLA_NODE_UNSPEC,
 	TIPC_NLA_NODE_ADDR,	/* u32 */
 	TIPC_NLA_NODE_UP,	/* flag */
+	TIPC_NLA_NODE_ID,	/* data */
+	TIPC_NLA_NODE_KEY,	/* data */

 	__TIPC_NLA_NODE_MAX,
 	TIPC_NLA_NODE_MAX = __TIPC_NLA_NODE_MAX - 1

diff --git a/tipc/misc.c b/tipc/misc.c
index e4b1cd0c..1daf3072 100644
--- a/tipc/misc.c
+++ b/tipc/misc.c
@@ -98,6 +98,44 @@ int str2nodeid(char *str, uint8_t *id)
 	return 0;
 }

+int str2key(char *str, struct tipc_aead_key *key)
+{
+	int len = strlen(str);
+	int ishex = 0;
+	int i;
+
+	/* Check if the input is a hex string (i.e. 0x...) */
+	if (len > 2 && strncmp(str, "0x", 2) == 0) {
+		ishex = is_hex(str + 2, len - 2 - 1);
+		if (ishex) {
+			len -= 2;
+			str += 2;
+		}
+	}
+
+	/* Obtain key: */
+	if (!ishex) {
+		key->keylen = len;
+		memcpy(key->key, str, len);
+	} else {
+		/* Convert hex string to key */
+		key->keylen = (len + 1) / 2;
+		for (i = 0; i < key->keylen; i++) {
+			if (i == 0 && len % 2 != 0) {
+				if (sscanf(str, "%1hhx", &key->key[0]) != 1)
+					return -1;
+				str += 1;
+				continue;
+			}
+			if (sscanf(str, "%2hhx", &key->key[i]) != 1)
+				return -1;
+			str += 2;
+		}
+	}
+
+	return 0;
+}
+
 void nodeid2str(uint8_t *id, char *str)
 {
 	int i;

diff --git a/tipc/misc.h b/tipc/misc.h
index ff2f31f1..59309f68 100644
--- a/tipc/misc.h
+++ b/tipc/misc.h
@@ -18,5 +18,6 @@ uint32_t str2addr(char *str);
 int str2nodeid(char *str, uint8_t *id);
 void nodeid2str(uint8_t *id, char *str);
 void hash2nodestr(uint32_t hash, char *str);
+int str2key(char *str, struct tipc_aead_key *key);

 #endif

diff --git a/tipc/node.c b/tipc/node.c
index 2fec6753..6c796bfb 100644
--- a/tipc/node.c
+++ b/tipc/node.c
@@ -157,6 +157,111 @@ static int cmd_node_set_nodeid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, NULL, NULL);
 }

+static void cmd_node_set_key_help(struct cmdl *cmdl)
+{
+	fprintf(stderr,
+		"Usage: %s node set key KEY [algname ALGNAME] [nodeid NODEID]\n\n"
+		"PROPERTIES\n"
+		" KEY - Symmetric KEY & SALT as a normal or hex string\n"
+		"       that consists of two parts:\n"
+		"       [KEY: 16, 24 or 32 octets][SALT: 4 octets]\n\n"
+		" algname ALGNAME - Default: \"gcm(aes)\"\n\n"
+		" nodeid NODEID - Own or peer node identity to which the key will\n"
+		"                 be attached. If not present, the key is a cluster\n"
+		"                 key!\n\n"
+		"EXAMPLES\n"
+		" %s node set key this_is_a_key16_salt algname \"gcm(aes)\" nodeid node1\n"
+		" %s node set key 0x746869735F69735F615F6B657931365F73616C74 nodeid node2\n\n",
+		cmdl->argv[0], cmdl->argv[0], cmdl->argv[0]);
+}
+
+static int cmd_node_set_key(struct nlmsghdr *nlh, const struct cmd *cmd,
+			    struct cmdl *cmdl, void *data)
+{
+	struct {
+		struct tipc_aead_key key;
+		char mem[TIPC_AEAD_KEYLEN_MAX + 1];
+	} input = {};
+	struct opt opts[] = {
+		{ "algname",	OPT_KEYVAL,	NULL },
+		{ "nodeid",	OPT_KEYVAL,	NULL },
+		{ NULL }
+	};
+	struct nlattr *nest;
+	struct opt *opt_algname, *opt_nodeid;
+	char buf[MNL_SOCKET_BUFFER_SIZE];
+	uint8_t id[TIPC_NODEID_LEN] = {0,};
+	int keysize;
+	char *str;
+
+	if (help_flag) {
+		(cmd->help)(cmdl);
+		return -EINVAL;
+	}
+
+	if (cmdl->optind >= cmdl->argc) {
+		fprintf(stderr, "error, missing key\n");
+		return -EINVAL;
+	}
+
+	/* Get user key */
+	str = shift_cmdl(cmdl);
+	if (str2key(str, &input.key)) {
+		fprintf(stderr, "error, invalid key input\n");
+		return -EINVAL;
+	}
+
+	if (parse_opts(opts, cmdl) < 0)
+		return -EINVAL;
+
+	/* Get algorithm name, default: "gcm(aes)" */
+	opt_algname = get_opt(opts, "algname");
+	if (!opt_algname)
+		strcpy(input.key.alg_name, "gcm(aes)");
+	else
+		strcpy(input.key.alg_name, opt_algname->val);
+
+	/* Get node identity */
+	opt_nodeid = get_opt(opts, "nodeid");
+	if (opt_nodeid && str2nodeid(opt_nodeid->val, id)) {
+		fprintf(stderr, "error, invalid node identity\n");
+		return -EINVAL;
+	}
+
+	/* Init & do the command */
+	nlh = msg_init(buf, TIPC_NL_KEY_SET);
+	if (!nlh) {
+		fprintf(stderr, "error, message initialisation failed\n");
+		return -1;
+	}
+	nest = mnl_attr_nest_start(nlh, TIPC_NLA_NODE);
+	keysize = tipc_aead_key_size(&input.key);
+	mnl_attr_put(nlh, TIPC_NLA_NODE_KEY, keysize, &input.key);
+	if (opt_nodeid)
+		mnl_attr_put(nlh, TIPC_NLA_NODE_ID, TIPC_NODEID_LEN, id);
+	mnl_attr_nest_end(nlh, nest);
+
+	return msg_doit(nlh, NULL, NULL);
+}
+
+static int cmd_node_flush_key(struct nlmsghdr *nlh, const struct cmd *cmd,
+			      struct cmdl *cmdl, void *data)
+{
+	char buf[MNL_SOCKET_BUFFER_SIZE];
+
+	if (help_flag) {
+		(cmd->help)(cmdl);
+		return -EINVAL;
+	}
+
+	/* Init & do the command */
+	nlh = msg_init(buf, TIPC_NL_KEY_FLUSH);
+	if (!nlh) {
+		fprintf(stderr, "error, message initialisation failed\n");
+		return -1;
+	}
+
+	return msg_doit(nlh, NULL, NULL);
+}
+
 static int nodeid_get_cb(const struct nlmsghdr *nlh, void *data)
 {
 	struct nlattr *info[TIPC_NLA_MAX + 1] = {};
@@ -270,13 +375,34 @@ static int cmd_node_set_netid(struct nlmsghdr *nlh, const struct cmd *cmd,
 	return msg_doit(nlh, NULL, NULL);
 }

+static void cmd_node_flush_help(struct cmdl *cmdl)
+{
+	fprintf(stderr,
+		"Usage: %s node flush PROPERTY\n\n"
+		"PROPERTIES\n"
+		" key - Flush all symmetric-keys\n",
+		cmdl->argv[0]);
+}
+
+static int cmd_node_flush(struct nlmsghdr *nlh, const struct cmd *cmd,
+			  struct cmdl *cmdl, void *data)
+{
+	const struct cmd cmds[] = {
+		{ "key",	cmd_node_flush_key,	NULL },
+		{ NULL }
+	};
+
+	return run_cmd(nlh, cmd, cmds, cmdl, NULL);
+}
+
 static void cmd_node_set_help(struct cmdl *cmdl)
 {
 	fprintf(stderr,
 		"Usage: %s node set PROPERTY\n\n"
 		"PROPERTIES\n"
 		" identity NODEID - Set node identity\n"
-		" clusterid CLUSTERID - Set local cluster id\n",
+		" clusterid CLUSTERID - Set local cluster id\n"
+		" key PROPERTY - Set symmetric-key\n",
 		cmdl->argv[0]);
 }

@@ -288,6 +414,7 @@ static int cmd_node_set(struct nlmsghdr *nlh, const struct cmd *cmd,
 		{ "identity",	cmd_node_set_nodeid,	NULL },
 		{ "netid",	cmd_node_set_netid,	NULL },
 		{ "clusterid",	cmd_node_set_netid,	NULL },
+		{ "key",	cmd_node_set_key,	cmd_node_set_key_help },
 		{ NULL }
 	};

@@ -325,7 +452,8 @@ void cmd_node_help(struct cmdl *cmdl)
 		"COMMANDS\n"
 		" list - List remote nodes\n"
 		" get - Get local node parameters\n"
-		" set - Set local node parameters\n",
+		" set - Set local node parameters\n"
+		" flush - Flush local node parameters\n",
 		cmdl->argv[0]);
 }

@@ -336,6 +464,7 @@ int cmd_node(struct nlmsghdr *nlh, const struct cmd *cmd, struct cmdl *cmdl,
 		{ "list",	cmd_node_list,	NULL },
 		{ "get",	cmd_node_get,	cmd_node_get_help },
 		{ "set",	cmd_node_set,	cmd_node_set_help },
+		{ "flush",	cmd_node_flush,	cmd_node_flush_help},
 		{ NULL }
 	};
--
2.13.7