From: Tung N. <tun...@de...> - 2019-02-19 07:03:31
|
Fix issue of hung sendto() in user space. Tung Nguyen (1): tipc: fix race condition causing hung sendto net/tipc/socket.c | 4 ++++ 1 file changed, 4 insertions(+) mode change 100644 => 100755 net/tipc/socket.c -- 2.17.1 |
From: Tung N. <tun...@de...> - 2019-02-19 07:03:32
|
When sending multicast messages via blocking socket, if sending link is congested (tsk->cong_link_cnt is set to 1), the sending thread will be put into sleeping state. However, tipc_sk_filter_rcv() is called under socket spin lock but tipc_wait_for_cond() is not. So, there is no guarantee that the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in CPU-1 will be perceived by CPU-0. If that is the case, the sending thread in CPU-0 after being waken up, will continue to see tsk->cong_link_cnt as 1 and put the sending thread into sleeping state again. The sending thread will sleep forever. CPU-0 | CPU-1 tipc_wait_for_cond() | { | // condition_ = !tsk->cong_link_cnt | while ((rc_ = !(condition_))) { | ... | release_sock(sk_); | wait_woken(); | | if (!sock_owned_by_user(sk)) | tipc_sk_filter_rcv() | { | ... | tipc_sk_proto_rcv() | { | ... | tsk->cong_link_cnt--; | ... | sk->sk_write_space(sk); | ... | } | ... | } sched_annotate_sleep(); | lock_sock(sk_); | remove_wait_queue(); | } | } | This commit fixes it by adding memory barrier to tipc_sk_proto_rcv() and tipc_wait_for_cond(). Acked-by: Jon Maloy <jon...@er...> Signed-off-by: Tung Nguyen <tun...@de...> --- net/tipc/socket.c | 4 ++++ 1 file changed, 4 insertions(+) mode change 100644 => 100755 net/tipc/socket.c diff --git a/net/tipc/socket.c b/net/tipc/socket.c old mode 100644 new mode 100755 index 1217c90a363b..d8f054d45941 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -383,6 +383,8 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout) int rc_; \ \ while ((rc_ = !(condition_))) { \ + /* coupled with smp_wmb() in tipc_sk_proto_rcv() */ \ + smp_rmb(); \ DEFINE_WAIT_FUNC(wait_, woken_wake_function); \ sk_ = (sock_)->sk; \ rc_ = tipc_sk_sock_err((sock_), timeo_); \ @@ -1982,6 +1984,8 @@ static void tipc_sk_proto_rcv(struct sock *sk, return; case SOCK_WAKEUP: tipc_dest_del(&tsk->cong_links, msg_orignode(hdr), 0); + /* coupled with smp_rmb() in tipc_wait_for_cond() */ + smp_wmb(); tsk->cong_link_cnt--; wakeup = true; break; -- 2.17.1 |
From: David M. <da...@da...> - 2019-02-19 21:21:42
|
From: Tung Nguyen <tun...@de...> Date: Tue, 19 Feb 2019 14:03:10 +0700 > When sending multicast messages via blocking socket, > if sending link is congested (tsk->cong_link_cnt is set to 1), > the sending thread will be put into sleeping state. However, > tipc_sk_filter_rcv() is called under socket spin lock but > tipc_wait_for_cond() is not. So, there is no guarantee that > the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in > CPU-1 will be perceived by CPU-0. If that is the case, the sending > thread in CPU-0 after being waken up, will continue to see > tsk->cong_link_cnt as 1 and put the sending thread into sleeping > state again. The sending thread will sleep forever. ... > This commit fixes it by adding memory barrier to tipc_sk_proto_rcv() > and tipc_wait_for_cond(). > > Acked-by: Jon Maloy <jon...@er...> > Signed-off-by: Tung Nguyen <tun...@de...> You really need to build test this stuff properly: net/tipc/socket.c: In function ‘__tipc_shutdown’: ./include/linux/wait.h:1119:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] struct wait_queue_entry name = { \ ^~~~~~ You can't put the smp_rmb(); before the DEFINE_WAIT_FUNC() in that basic block. |
From: Tung N. <tun...@de...> - 2019-02-21 03:31:38
|
When sending multicast messages via blocking socket, if sending link is congested (tsk->cong_link_cnt is set to 1), the sending thread will be put into sleeping state. However, tipc_sk_filter_rcv() is called under socket spin lock but tipc_wait_for_cond() is not. So, there is no guarantee that the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in CPU-1 will be perceived by CPU-0. If that is the case, the sending thread in CPU-0 after being waken up, will continue to see tsk->cong_link_cnt as 1 and put the sending thread into sleeping state again. The sending thread will sleep forever. CPU-0 | CPU-1 tipc_wait_for_cond() | { | // condition_ = !tsk->cong_link_cnt | while ((rc_ = !(condition_))) { | ... | release_sock(sk_); | wait_woken(); | | if (!sock_owned_by_user(sk)) | tipc_sk_filter_rcv() | { | ... | tipc_sk_proto_rcv() | { | ... | tsk->cong_link_cnt--; | ... | sk->sk_write_space(sk); | ... | } | ... | } sched_annotate_sleep(); | lock_sock(sk_); | remove_wait_queue(); | } | } | This commit fixes it by adding memory barrier to tipc_sk_proto_rcv() and tipc_wait_for_cond(). Acked-by: Jon Maloy <jon...@er...> Signed-off-by: Tung Nguyen <tun...@de...> --- net/tipc/socket.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 1217c90a363b..7cfa0c8ccc2a 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -379,16 +379,18 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout) #define tipc_wait_for_cond(sock_, timeo_, condition_) \ ({ \ + DEFINE_WAIT_FUNC(wait_, woken_wake_function); \ struct sock *sk_; \ int rc_; \ \ while ((rc_ = !(condition_))) { \ - DEFINE_WAIT_FUNC(wait_, woken_wake_function); \ + /* coupled with smp_wmb() in tipc_sk_proto_rcv() */ \ + smp_rmb(); \ sk_ = (sock_)->sk; \ rc_ = tipc_sk_sock_err((sock_), timeo_); \ if (rc_) \ break; \ - prepare_to_wait(sk_sleep(sk_), &wait_, TASK_INTERRUPTIBLE); \ + add_wait_queue(sk_sleep(sk_), &wait_); \ release_sock(sk_); \ *(timeo_) = wait_woken(&wait_, TASK_INTERRUPTIBLE, *(timeo_)); \ sched_annotate_sleep(); \ @@ -1982,6 +1984,8 @@ static void tipc_sk_proto_rcv(struct sock *sk, return; case SOCK_WAKEUP: tipc_dest_del(&tsk->cong_links, msg_orignode(hdr), 0); + /* coupled with smp_rmb() in tipc_wait_for_cond() */ + smp_wmb(); tsk->cong_link_cnt--; wakeup = true; break; -- 2.17.1 |
From: David M. <da...@da...> - 2019-02-21 03:51:12
|
So if the previous version didn't even compile, I have to ask. How did you test this? |
From: tung q. n. <tun...@de...> - 2019-02-21 04:05:01
|
Hi David, I compiled previous version and tested it. But I forgot to observe kernel warning. For the way to reproduce the issue: calling sendto() and printf() right after that to print out a log to see if sendto() was successful. On some occasions, the log was never printed and stack trace showed sendto() was not returned. By applying the patch, the issue disappeared. Thanks. Best regards, Tung Nguyen -----Original Message----- From: David Miller <da...@da...> Sent: Thursday, February 21, 2019 10:51 AM To: tun...@de... Cc: ne...@vg...; tip...@li... Subject: Re: [tipc-discussion][net v2 1/1] tipc: fix race condition causing hung sendto So if the previous version didn't even compile, I have to ask. How did you test this? |
From: David M. <da...@da...> - 2019-02-22 23:25:07
|
From: Tung Nguyen <tun...@de...> Date: Thu, 21 Feb 2019 10:31:21 +0700 > When sending multicast messages via blocking socket, This patch does not apply cleanly to net, please respin: [davem@localhost net]$ git am --signoff tipc-discussion-net-v2-1-1-tipc-fix-race-condition-causing-hung-sendto.patch Applying: tipc: fix race condition causing hung sendto error: patch failed: net/tipc/socket.c:379 error: net/tipc/socket.c: patch does not apply Patch failed at 0001 tipc: fix race condition causing hung sendto hint: Use 'git am --show-current-patch' to see the failed patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". |
From: Tung N. <tun...@de...> - 2019-02-25 03:57:38
|
When sending multicast messages via blocking socket, if sending link is congested (tsk->cong_link_cnt is set to 1), the sending thread will be put into sleeping state. However, tipc_sk_filter_rcv() is called under socket spin lock but tipc_wait_for_cond() is not. So, there is no guarantee that the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in CPU-1 will be perceived by CPU-0. If that is the case, the sending thread in CPU-0 after being waken up, will continue to see tsk->cong_link_cnt as 1 and put the sending thread into sleeping state again. The sending thread will sleep forever. CPU-0 | CPU-1 tipc_wait_for_cond() | { | // condition_ = !tsk->cong_link_cnt | while ((rc_ = !(condition_))) { | ... | release_sock(sk_); | wait_woken(); | | if (!sock_owned_by_user(sk)) | tipc_sk_filter_rcv() | { | ... | tipc_sk_proto_rcv() | { | ... | tsk->cong_link_cnt--; | ... | sk->sk_write_space(sk); | ... | } | ... | } sched_annotate_sleep(); | lock_sock(sk_); | remove_wait_queue(); | } | } | This commit fixes it by adding memory barrier to tipc_sk_proto_rcv() and tipc_wait_for_cond(). Acked-by: Jon Maloy <jon...@er...> Signed-off-by: Tung Nguyen <tun...@de...> --- net/tipc/socket.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 684f2125fc6b..70343ac448b1 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -379,11 +379,13 @@ static int tipc_sk_sock_err(struct socket *sock, long *timeout) #define tipc_wait_for_cond(sock_, timeo_, condition_) \ ({ \ + DEFINE_WAIT_FUNC(wait_, woken_wake_function); \ struct sock *sk_; \ int rc_; \ \ while ((rc_ = !(condition_))) { \ - DEFINE_WAIT_FUNC(wait_, woken_wake_function); \ + /* coupled with smp_wmb() in tipc_sk_proto_rcv() */ \ + smp_rmb(); \ sk_ = (sock_)->sk; \ rc_ = tipc_sk_sock_err((sock_), timeo_); \ if (rc_) \ @@ -1983,6 +1985,8 @@ static void tipc_sk_proto_rcv(struct sock *sk, return; case SOCK_WAKEUP: tipc_dest_del(&tsk->cong_links, msg_orignode(hdr), 0); + /* coupled with smp_rmb() in tipc_wait_for_cond() */ + smp_wmb(); tsk->cong_link_cnt--; wakeup = true; break; -- 2.17.1 |
From: David M. <da...@da...> - 2019-02-26 22:52:29
|
From: Tung Nguyen <tun...@de...> Date: Mon, 25 Feb 2019 10:57:20 +0700 > When sending multicast messages via blocking socket, > if sending link is congested (tsk->cong_link_cnt is set to 1), > the sending thread will be put into sleeping state. However, > tipc_sk_filter_rcv() is called under socket spin lock but > tipc_wait_for_cond() is not. So, there is no guarantee that > the setting of tsk->cong_link_cnt to 0 in tipc_sk_proto_rcv() in > CPU-1 will be perceived by CPU-0. If that is the case, the sending > thread in CPU-0 after being waken up, will continue to see > tsk->cong_link_cnt as 1 and put the sending thread into sleeping > state again. The sending thread will sleep forever. ... > This commit fixes it by adding memory barrier to tipc_sk_proto_rcv() > and tipc_wait_for_cond(). > > Acked-by: Jon Maloy <jon...@er...> > Signed-off-by: Tung Nguyen <tun...@de...> Applied and queued up for -stable. |