From: Tuong L. <tuo...@de...> - 2019-12-24 08:06:12
|
The current 'tipc_wait_for_connect()' function makes a loop and waits for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the connecting has done. However, when the condition is met, it always returns '0' even in the case the connecting was actually failed (e.g. refused because the server socket has closed...) and the socket state was set to 'TIPC_DISCONNECTING'. This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and goes ahead with more actions e.g. building & sending a message but then finally gets an unexpected result (e.g. '-EPIPE'). This commit fixes the issue by returning the corresponding error code if any when the wait process is waken up. Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 8b1daf3634b0..2e5faf89ef80 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2428,7 +2428,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) { DEFINE_WAIT_FUNC(wait, woken_wake_function); struct sock *sk = sock->sk; - int done; + int done = 0; do { int err = sock_error(sk); @@ -2438,12 +2438,14 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return -ETIMEDOUT; if (signal_pending(current)) return sock_intr_errno(*timeo_p); + if (done) + return 0; add_wait_queue(sk_sleep(sk), &wait); done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (1); return 0; } -- 2.13.7 |
From: Tuong L. <tuo...@de...> - 2020-01-08 02:19:19
|
The current 'tipc_wait_for_connect()' function does a wait-loop for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the socket connecting has done. However, when the condition is met, it returns '0' even in the case the connecting is actually failed, the socket state is set to 'TIPC_DISCONNECTING' (e.g. when the server socket has closed..). This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and go ahead with building, sending a message, etc. but finally failed e.g. '-EPIPE'. This commit fixes the issue by changing the wait condition to the 'tipc_sk_connected(sk)', so the function will return '0' only when the connection is really established. Otherwise, either the socket 'sk_err' if any or '-ETIMEDOUT'/'-EINTR' will be returned correspondingly. Acked-by: Ying Xue <yin...@wi...> Acked-by: Jon Maloy <jon...@er...> Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6ebd809ef207..f9b4fb92c0b1 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2443,8 +2443,8 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return sock_intr_errno(*timeo_p); add_wait_queue(sk_sleep(sk), &wait); - done = sk_wait_event(sk, timeo_p, - sk->sk_state != TIPC_CONNECTING, &wait); + done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk), + &wait); remove_wait_queue(sk_sleep(sk), &wait); } while (!done); return 0; -- 2.13.7 |
From: Tuong L. <tuo...@de...> - 2019-12-30 08:56:11
|
The current 'tipc_wait_for_connect()' function makes a loop and waits for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the connecting has done. However, when the condition is met, it always returns '0' even in the case the connecting was actually failed (e.g. refused because the server socket has closed...) and the socket state was set to 'TIPC_DISCONNECTING'. This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and goes ahead with more actions e.g. building & sending a message but then finally gets an unexpected result (e.g. '-EPIPE'). This commit fixes the issue by instead setting the wait condition to 'tipc_sk_connected(sk)', so that the function will return '0' only when the connection is really established. Otherwise, either the socket error code if any or '-ETIMEDOUT'/'-EINTR' will be returned correspondingly. --------- v2: changed after discussing with Ying --------- Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6552f986774c..2f5679f84060 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return sock_intr_errno(*timeo_p); add_wait_queue(sk_sleep(sk), &wait); - done = sk_wait_event(sk, timeo_p, - sk->sk_state != TIPC_CONNECTING, &wait); + done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk), + &wait); remove_wait_queue(sk_sleep(sk), &wait); } while (!done); return 0; -- 2.13.7 |
From: Xue, Y. <Yin...@wi...> - 2020-01-02 09:10:39
|
Great! Acked-by: Ying Xue <yin...@wi...> -----Original Message----- From: Tuong Lien [mailto:tuo...@de...] Sent: Monday, December 30, 2019 4:56 PM To: tip...@li...; jm...@re...; ma...@do...; Xue, Ying Subject: [net v2] tipc: fix wrong connect() return code The current 'tipc_wait_for_connect()' function makes a loop and waits for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the connecting has done. However, when the condition is met, it always returns '0' even in the case the connecting was actually failed (e.g. refused because the server socket has closed...) and the socket state was set to 'TIPC_DISCONNECTING'. This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and goes ahead with more actions e.g. building & sending a message but then finally gets an unexpected result (e.g. '-EPIPE'). This commit fixes the issue by instead setting the wait condition to 'tipc_sk_connected(sk)', so that the function will return '0' only when the connection is really established. Otherwise, either the socket error code if any or '-ETIMEDOUT'/'-EINTR' will be returned correspondingly. --------- v2: changed after discussing with Ying --------- Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6552f986774c..2f5679f84060 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return sock_intr_errno(*timeo_p); add_wait_queue(sk_sleep(sk), &wait); - done = sk_wait_event(sk, timeo_p, - sk->sk_state != TIPC_CONNECTING, &wait); + done = sk_wait_event(sk, timeo_p, tipc_sk_connected(sk), + &wait); remove_wait_queue(sk_sleep(sk), &wait); } while (!done); return 0; -- 2.13.7 |
From: Jon M. <jm...@re...> - 2020-01-21 12:11:32
|
Hi Ying, Do you have any ideas regarding this? My best proposal is to att a time stamp to the new server socket at creation, and then delay a 'close' if the socket is < 5ms old. But I am not quite happy with this solution, so any better idea would be welcome. ///jon > > > ----- Forwarded Message ----- From: Tuong Lien Tong > <tuo...@de...>To: 'Ying Xue' <yin...@wi...>; > "jm...@re..." <jm...@re...>; "ma...@do..." > <ma...@do...>Cc: "tip...@de..." > <tip...@de...>Sent: Monday, January 13, 2020, 04:57:58 AM > GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code > > Hi Ying, Jon, > > > > The last change has exposed a different issue with our “lightweight” > connection setup mechanism, > > Please see the scenario below: > > > > > > server process client process > > ----------------------- > ------------------------------ > > create()| | > > listen()*tskL | > > bind()| |create() > > +- accept()* tskC* > > | | |connect() ~~~~~~~~~~~~~+ > > wait_for_rcv()| |<---------[SYN]-----------| | > > | | |//tskC: CONNECTING > | | | *wait_for_conn() > > +--------->*tskS | > |->add to wait queue > > |//tskS: ESTABLISHED | | > > |----------[ACK]---------->|//tskC: ESTABLISHED > | ->wakeup(): SLEEPING -> run queue > > send()|----------[DATA]--------->|\ > ->wakeup(): already woken > > send()|----------[DATA]--------->| | > ->wakeup(): already woken > > . . . . |-> receive queue . > > . . . . | . > > send()|----------[DATA]--------->|/ > ->wakeup(): already woken > > close()|----------[FIN]---------->|//tskC: DISCONNECTING | > > | | | > > | ~~~~~~~~~~~~~~~~~~~~~~~~~~>schedule(): > | run queue -> RUNNING > > |->add > |back > |to > |wait > |queue > |(state > |!= > |ESTABLISHED) > > . > > . > > | > > *timed > out > > > > > > When the server side receives ‘SYN’ from client, the ‘accept()’ call ends up > with the new socket in the ‘ESTABLISHED’ state. Shortly thereafter, the > server starts sending a number of messages before calling the socket > ‘close()’ without waiting for any response from the client! > > In this scenario, it may happen on the other side that the client process > waiting for the connection to be ‘ESTABLISHED’ cannot be awakened in time. > Although the ‘wakeup()’ call is still done when the ‘ACK’ from the server > arrives, but it just moves the process into the run queue (i.e. runnable) to > be scheduled to run later. The kernel (i.e. RX process) is not preemptable > when it processes the messages from the server, including the ‘ACK ‘, > ‘DATA’, … and finally the ‘FIN’ which will terminate the connection. When > the wait process is really run back, the client socket state is now > ‘DISCONNECTING’ (i.e. != ‘ESTABLISHED’), so it continues to wait until the > timer expires. The client will finally get ‘-ETIMEDOUT’ and forced to > release the socket whereas there remains the messages from the server in the > socket’s receive queue. > > > > Notes: > > - The issue is found by the tipcutils/ptts tool. > - This is a timing issue, sometimes it happens sometimes not, depending on > the small enough number of the data messages (e.g. < 100) from the server > before it sends the ‘FIN’, as well as the network traffic, RX process, > scheduler, etc. on the client side. > - Without the previous change, the issue is _not detected_ since the > ‘connect()’ call will return 0 when the wait process is woken & running > back but the socket state is ‘DISCONNECTING’, there is no socket error > (sk->sk_err == 0) in this case… However, this is the original issue and > unexpected because if the client tries to send something after that, it > will get ‘-EPIPE’ immediately. > > > > We need to find a solution to this problem, > > - It seems to me that in this mechanism the server is “too hasty” to set > the new socket to ‘ESTABLISHED’ at the ‘accept()’ when it has just > received a ‘SYN’ from client. That allows it to go ahead with sending the > messages & closing the socket in one step… Should we consider to include > something like the TCP ‘3-way handshake’ here, so the server needs to get > a ‘ACK’ from the client before the connection is fully established? I > guess ‘no’ since it should be kept lightweight…? > - An alternative is we can introduce a bit delay e.g. 2 or 5ms to the > ‘accept()’ or ‘close()’ before completing it, which will give a chance > for its client to be really established first? This solution looks simple > and should have less impact to user space… > - Or …? > > > > What is your opinion, please share? > > Many thanks! > > > > BR/Tuong > > > > -----Original Message----- > From: Xue, Ying <Yin...@wi...> > Sent: Thursday, January 2, 2020 4:10 PM > To: Tuong Lien <tuo...@de...>; > tip...@li...; jm...@re...; ma...@do... > Subject: RE: [net v2] tipc: fix wrong connect() return code > > > > Great! > > > > Acked-by: Ying Xue <yin...@wi...> > > > > -----Original Message----- > > From: Tuong Lien [mailto:tuo...@de...] > > Sent: Monday, December 30, 2019 4:56 PM > > To: tip...@li...; jm...@re...; > ma...@do...; Xue, Ying > > Subject: [net v2] tipc: fix wrong connect() return code > > > > The current 'tipc_wait_for_connect()' function makes a loop and waits > > for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the > > connecting has done. However, when the condition is met, it always > > returns '0' even in the case the connecting was actually failed (e.g. > > refused because the server socket has closed...) and the socket state > > was set to 'TIPC_DISCONNECTING'. > > This results in a wrong return code for the 'connect()' call from user, > > making it believe that the connection is established and goes ahead > > with more actions e.g. building & sending a message but then finally > > gets an unexpected result (e.g. '-EPIPE'). > > > > This commit fixes the issue by instead setting the wait condition to > > 'tipc_sk_connected(sk)', so that the function will return '0' only when > > the connection is really established. Otherwise, either the socket > > error code if any or '-ETIMEDOUT'/'-EINTR' will be returned > > correspondingly. > > > > --------- > > v2: changed after discussing with Ying > > --------- > > > > Signed-off-by: Tuong Lien <tuo...@de...> > > --- > > net/tipc/socket.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > > index 6552f986774c..2f5679f84060 100644 > > --- a/net/tipc/socket.c > > +++ b/net/tipc/socket.c > > @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, > long *timeo_p) > > return > sock_intr_errno(*timeo_p); > > > > add_wait_queue(sk_sleep(sk), &wait); > > - done = sk_wait_event(sk, timeo_p, > > - > sk->sk_state > != TIPC_CONNECTING, &wait); > > + done = sk_wait_event(sk, timeo_p, > tipc_sk_connected(sk), > > + &wait); > > remove_wait_queue(sk_sleep(sk), &wait); > > } while (!done); > > return 0; > > -- > > 2.13.7 > > > |
From: Tuong L. T. <tuo...@de...> - 2020-01-23 07:54:29
|
An alternative is to allow the 'connect()' to return '0' in this particular case. The socket state is already 'DISCONNECTING', so any further write will get '-EPIPE' but the socket is still able to read the messages existing in its receive queue. This may seem a bit contrary to the previous patch but they are not mutually exclusive, the previous one would only deal with the case the socket state is 'DISCONNECTING' but the 'sk->err' != 0. How does this look correct from a functional standpoint? BR/Tuong -----Original Message----- From: Jon Maloy <jm...@re...> Sent: Tuesday, January 21, 2020 7:11 PM To: yin...@wi... Cc: tip...@li... Subject: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code Hi Ying, Do you have any ideas regarding this? My best proposal is to att a time stamp to the new server socket at creation, and then delay a 'close' if the socket is < 5ms old. But I am not quite happy with this solution, so any better idea would be welcome. ///jon > > > ----- Forwarded Message ----- From: Tuong Lien Tong > <tuo...@de...>To: 'Ying Xue' <yin...@wi...>; > "jm...@re..." <jm...@re...>; "ma...@do..." > <ma...@do...>Cc: "tip...@de..." > <tip...@de...>Sent: Monday, January 13, 2020, 04:57:58 AM > GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code > > Hi Ying, Jon, > > > > The last change has exposed a different issue with our “lightweight” > connection setup mechanism, > > Please see the scenario below: > > > > > > server process client process > > ----------------------- > ------------------------------ > > create()| | > > listen()*tskL | > > bind()| |create() > > +- accept()* tskC* > > | | |connect() ~~~~~~~~~~~~~+ > > wait_for_rcv()| |<---------[SYN]-----------| | > > | | |//tskC: CONNECTING > | | | *wait_for_conn() > > +--------->*tskS | > |->add to wait queue > > |//tskS: ESTABLISHED | | > > |----------[ACK]---------->|//tskC: ESTABLISHED > | ->wakeup(): SLEEPING -> run queue > > send()|----------[DATA]--------->|\ > ->wakeup(): already woken > > send()|----------[DATA]--------->| | > ->wakeup(): already woken > > . . . . |-> receive queue . > > . . . . | . > > send()|----------[DATA]--------->|/ > ->wakeup(): already woken > > close()|----------[FIN]---------->|//tskC: DISCONNECTING | > > | | | > > | ~~~~~~~~~~~~~~~~~~~~~~~~~~>schedule(): > | run queue -> RUNNING > > |->add > |back > |to > |wait > |queue > |(state > |!= > |ESTABLISHED) > > . > > . > > | > > *timed > out > > > > > > When the server side receives ‘SYN’ from client, the ‘accept()’ call ends up > with the new socket in the ‘ESTABLISHED’ state. Shortly thereafter, the > server starts sending a number of messages before calling the socket > ‘close()’ without waiting for any response from the client! > > In this scenario, it may happen on the other side that the client process > waiting for the connection to be ‘ESTABLISHED’ cannot be awakened in time. > Although the ‘wakeup()’ call is still done when the ‘ACK’ from the server > arrives, but it just moves the process into the run queue (i.e. runnable) to > be scheduled to run later. The kernel (i.e. RX process) is not preemptable > when it processes the messages from the server, including the ‘ACK ‘, > ‘DATA’, … and finally the ‘FIN’ which will terminate the connection. When > the wait process is really run back, the client socket state is now > ‘DISCONNECTING’ (i.e. != ‘ESTABLISHED’), so it continues to wait until the > timer expires. The client will finally get ‘-ETIMEDOUT’ and forced to > release the socket whereas there remains the messages from the server in the > socket’s receive queue. > > > > Notes: > > - The issue is found by the tipcutils/ptts tool. > - This is a timing issue, sometimes it happens sometimes not, depending on > the small enough number of the data messages (e.g. < 100) from the server > before it sends the ‘FIN’, as well as the network traffic, RX process, > scheduler, etc. on the client side. > - Without the previous change, the issue is _not detected_ since the > ‘connect()’ call will return 0 when the wait process is woken & running > back but the socket state is ‘DISCONNECTING’, there is no socket error > (sk->sk_err == 0) in this case… However, this is the original issue and > unexpected because if the client tries to send something after that, it > will get ‘-EPIPE’ immediately. > > > > We need to find a solution to this problem, > > - It seems to me that in this mechanism the server is “too hasty” to set > the new socket to ‘ESTABLISHED’ at the ‘accept()’ when it has just > received a ‘SYN’ from client. That allows it to go ahead with sending the > messages & closing the socket in one step… Should we consider to include > something like the TCP ‘3-way handshake’ here, so the server needs to get > a ‘ACK’ from the client before the connection is fully established? I > guess ‘no’ since it should be kept lightweight…? > - An alternative is we can introduce a bit delay e.g. 2 or 5ms to the > ‘accept()’ or ‘close()’ before completing it, which will give a chance > for its client to be really established first? This solution looks simple > and should have less impact to user space… > - Or …? > > > > What is your opinion, please share? > > Many thanks! > > > > BR/Tuong > > > > -----Original Message----- > From: Xue, Ying <Yin...@wi...> > Sent: Thursday, January 2, 2020 4:10 PM > To: Tuong Lien <tuo...@de...>; > tip...@li...; jm...@re...; ma...@do... > Subject: RE: [net v2] tipc: fix wrong connect() return code > > > > Great! > > > > Acked-by: Ying Xue <yin...@wi...> > > > > -----Original Message----- > > From: Tuong Lien [mailto:tuo...@de...] > > Sent: Monday, December 30, 2019 4:56 PM > > To: tip...@li...; jm...@re...; > ma...@do...; Xue, Ying > > Subject: [net v2] tipc: fix wrong connect() return code > > > > The current 'tipc_wait_for_connect()' function makes a loop and waits > > for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the > > connecting has done. However, when the condition is met, it always > > returns '0' even in the case the connecting was actually failed (e.g. > > refused because the server socket has closed...) and the socket state > > was set to 'TIPC_DISCONNECTING'. > > This results in a wrong return code for the 'connect()' call from user, > > making it believe that the connection is established and goes ahead > > with more actions e.g. building & sending a message but then finally > > gets an unexpected result (e.g. '-EPIPE'). > > > > This commit fixes the issue by instead setting the wait condition to > > 'tipc_sk_connected(sk)', so that the function will return '0' only when > > the connection is really established. Otherwise, either the socket > > error code if any or '-ETIMEDOUT'/'-EINTR' will be returned > > correspondingly. > > > > --------- > > v2: changed after discussing with Ying > > --------- > > > > Signed-off-by: Tuong Lien <tuo...@de...> > > --- > > net/tipc/socket.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > > index 6552f986774c..2f5679f84060 100644 > > --- a/net/tipc/socket.c > > +++ b/net/tipc/socket.c > > @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, > long *timeo_p) > > return > sock_intr_errno(*timeo_p); > > > > add_wait_queue(sk_sleep(sk), &wait); > > - done = sk_wait_event(sk, timeo_p, > > - > sk->sk_state > != TIPC_CONNECTING, &wait); > > + done = sk_wait_event(sk, timeo_p, > tipc_sk_connected(sk), > > + &wait); > > remove_wait_queue(sk_sleep(sk), &wait); > > } while (!done); > > return 0; > > -- > > 2.13.7 > > > _______________________________________________ tipc-discussion mailing list tip...@li... https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Jon M. <jm...@re...> - 2020-01-23 14:10:23
|
I think that is a better solution. The client will not be able to do any more writes, but that is a correct behavior when the peer has closed the connection. Basically, you modify tipc_connect(), and return 0 if the socket is in state DISCONNECTING, right? ///jon On 1/23/20 2:54 AM, Tuong Lien Tong wrote: > An alternative is to allow the 'connect()' to return '0' in this particular case. The socket state is already 'DISCONNECTING', so any further write will get '-EPIPE' but the socket is still able to read the messages existing in its receive queue. This may seem a bit contrary to the previous patch but they are not mutually exclusive, the previous one would only deal with the case the socket state is 'DISCONNECTING' but the 'sk->err' != 0. > How does this look correct from a functional standpoint? > > BR/Tuong > > -----Original Message----- > From: Jon Maloy <jm...@re...> > Sent: Tuesday, January 21, 2020 7:11 PM > To: yin...@wi... > Cc: tip...@li... > Subject: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code > > Hi Ying, > Do you have any ideas regarding this? My best proposal is to att a time stamp to the new server socket at creation, > and then delay a 'close' if the socket is < 5ms old. > But I am not quite happy with this solution, so any better idea would be welcome. > > ///jon > > >> >> ----- Forwarded Message ----- From: Tuong Lien Tong >> <tuo...@de...>To: 'Ying Xue' <yin...@wi...>; >> "jm...@re..." <jm...@re...>; "ma...@do..." >> <ma...@do...>Cc: "tip...@de..." >> <tip...@de...>Sent: Monday, January 13, 2020, 04:57:58 AM >> GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code >> >> Hi Ying, Jon, >> >> >> >> The last change has exposed a different issue with our “lightweight” >> connection setup mechanism, >> >> Please see the scenario below: >> >> >> >> >> >> server process client process >> >> ----------------------- >> ------------------------------ >> >> create()| | >> >> listen()*tskL | >> >> bind()| |create() >> >> +- accept()* tskC* >> >> | | |connect() ~~~~~~~~~~~~~+ >> >> wait_for_rcv()| |<---------[SYN]-----------| | >> >> | | |//tskC: CONNECTING >> | | | *wait_for_conn() >> >> +--------->*tskS | >> |->add to wait queue >> >> |//tskS: ESTABLISHED | | >> >> |----------[ACK]---------->|//tskC: ESTABLISHED >> | ->wakeup(): SLEEPING -> run queue >> >> send()|----------[DATA]--------->|\ >> ->wakeup(): already woken >> >> send()|----------[DATA]--------->| | >> ->wakeup(): already woken >> >> . . . . |-> receive queue . >> >> . . . . | . >> >> send()|----------[DATA]--------->|/ >> ->wakeup(): already woken >> >> close()|----------[FIN]---------->|//tskC: DISCONNECTING | >> >> | | | >> >> | ~~~~~~~~~~~~~~~~~~~~~~~~~~>schedule(): >> | run queue -> RUNNING >> >> |->add >> |back >> |to >> |wait >> |queue >> |(state >> |!= >> |ESTABLISHED) >> >> . >> >> . >> >> | >> >> *timed >> out >> >> >> >> >> >> When the server side receives ‘SYN’ from client, the ‘accept()’ call ends up >> with the new socket in the ‘ESTABLISHED’ state. Shortly thereafter, the >> server starts sending a number of messages before calling the socket >> ‘close()’ without waiting for any response from the client! >> >> In this scenario, it may happen on the other side that the client process >> waiting for the connection to be ‘ESTABLISHED’ cannot be awakened in time. >> Although the ‘wakeup()’ call is still done when the ‘ACK’ from the server >> arrives, but it just moves the process into the run queue (i.e. runnable) to >> be scheduled to run later. The kernel (i.e. RX process) is not preemptable >> when it processes the messages from the server, including the ‘ACK ‘, >> ‘DATA’, … and finally the ‘FIN’ which will terminate the connection. When >> the wait process is really run back, the client socket state is now >> ‘DISCONNECTING’ (i.e. != ‘ESTABLISHED’), so it continues to wait until the >> timer expires. The client will finally get ‘-ETIMEDOUT’ and forced to >> release the socket whereas there remains the messages from the server in the >> socket’s receive queue. >> >> >> >> Notes: >> >> - The issue is found by the tipcutils/ptts tool. >> - This is a timing issue, sometimes it happens sometimes not, depending on >> the small enough number of the data messages (e.g. < 100) from the server >> before it sends the ‘FIN’, as well as the network traffic, RX process, >> scheduler, etc. on the client side. >> - Without the previous change, the issue is _not detected_ since the >> ‘connect()’ call will return 0 when the wait process is woken & running >> back but the socket state is ‘DISCONNECTING’, there is no socket error >> (sk->sk_err == 0) in this case… However, this is the original issue and >> unexpected because if the client tries to send something after that, it >> will get ‘-EPIPE’ immediately. >> >> >> >> We need to find a solution to this problem, >> >> - It seems to me that in this mechanism the server is “too hasty” to set >> the new socket to ‘ESTABLISHED’ at the ‘accept()’ when it has just >> received a ‘SYN’ from client. That allows it to go ahead with sending the >> messages & closing the socket in one step… Should we consider to include >> something like the TCP ‘3-way handshake’ here, so the server needs to get >> a ‘ACK’ from the client before the connection is fully established? I >> guess ‘no’ since it should be kept lightweight…? >> - An alternative is we can introduce a bit delay e.g. 2 or 5ms to the >> ‘accept()’ or ‘close()’ before completing it, which will give a chance >> for its client to be really established first? This solution looks simple >> and should have less impact to user space… >> - Or …? >> >> >> >> What is your opinion, please share? >> >> Many thanks! >> >> >> >> BR/Tuong >> >> >> >> -----Original Message----- >> From: Xue, Ying <Yin...@wi...> >> Sent: Thursday, January 2, 2020 4:10 PM >> To: Tuong Lien <tuo...@de...>; >> tip...@li...; jm...@re...; ma...@do... >> Subject: RE: [net v2] tipc: fix wrong connect() return code >> >> >> >> Great! >> >> >> >> Acked-by: Ying Xue <yin...@wi...> >> >> >> >> -----Original Message----- >> >> From: Tuong Lien [mailto:tuo...@de...] >> >> Sent: Monday, December 30, 2019 4:56 PM >> >> To: tip...@li...; jm...@re...; >> ma...@do...; Xue, Ying >> >> Subject: [net v2] tipc: fix wrong connect() return code >> >> >> >> The current 'tipc_wait_for_connect()' function makes a loop and waits >> >> for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the >> >> connecting has done. However, when the condition is met, it always >> >> returns '0' even in the case the connecting was actually failed (e.g. >> >> refused because the server socket has closed...) and the socket state >> >> was set to 'TIPC_DISCONNECTING'. >> >> This results in a wrong return code for the 'connect()' call from user, >> >> making it believe that the connection is established and goes ahead >> >> with more actions e.g. building & sending a message but then finally >> >> gets an unexpected result (e.g. '-EPIPE'). >> >> >> >> This commit fixes the issue by instead setting the wait condition to >> >> 'tipc_sk_connected(sk)', so that the function will return '0' only when >> >> the connection is really established. Otherwise, either the socket >> >> error code if any or '-ETIMEDOUT'/'-EINTR' will be returned >> >> correspondingly. >> >> >> >> --------- >> >> v2: changed after discussing with Ying >> >> --------- >> >> >> >> Signed-off-by: Tuong Lien <tuo...@de...> >> >> --- >> >> net/tipc/socket.c | 4 ++-- >> >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/net/tipc/socket.c b/net/tipc/socket.c >> >> index 6552f986774c..2f5679f84060 100644 >> >> --- a/net/tipc/socket.c >> >> +++ b/net/tipc/socket.c >> >> @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, >> long *timeo_p) >> >> return >> sock_intr_errno(*timeo_p); >> >> >> >> add_wait_queue(sk_sleep(sk), &wait); >> >> - done = sk_wait_event(sk, timeo_p, >> >> - >> sk->sk_state >> != TIPC_CONNECTING, &wait); >> >> + done = sk_wait_event(sk, timeo_p, >> tipc_sk_connected(sk), >> >> + &wait); >> >> remove_wait_queue(sk_sleep(sk), &wait); >> >> } while (!done); >> >> return 0; >> >> -- >> >> 2.13.7 >> >> >> > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > -- |
From: Jon M. <jm...@re...> - 2020-02-05 14:47:10
|
On 2/5/20 8:26 AM, Xue, Ying wrote: > > Sorry, I have a different opinion about this behavior. In the below > scenario, it’s a bit weird if connect() returns 0. Although the > connection was ever successful, it had been broken when connect() > returned. So, I think connect() should return an error code at that > moment. Whenever user receives an error from connect(), the user will > call close() to destroy the socket. > But the connect() was successful! So why shouldn't we return the value that confirms that? The fact that the connection has already been closed by the peer doesn't change that fact. > On the contrary, if connect() returns 0, we will allow user to fetch > messages from socket receive queue, which sounds reasonable. But at > that moment, when user tries to call write(), it will return an error, > which is quite strange for user, and it’s hard to be understandable. > When the user then tries to do write() he will encounter the situation that the peer has closed the connection, which is also a completely normal behavior that may happens at any time. > > If connect() returns 0, the biggest benefit is that we give user a > chance to get data from receive queue, but I am seriously doubt that > the data from server side is meaningless for client. So we don’t need > to tell client so complex truth, instead, we can hide the complexity > internally and simply tell client connect() is failed. > To me this is a completely normal scenario. The client does connect(), maybe sends one or more messages, the server accepts the connection, reads the messages, responds with one or more messages, and closes the connection. The response to connect() that the client sees is semantically correct. What we are doing is just (re-)separating two different events that occur so close in time that they mess up our FSM. The patch fixes this so that they are agian reported as separate events. BR ///jon > Thanks, > > Ying > > *From:*Jon Maloy [mailto:jm...@re...] > *Sent:* Thursday, January 23, 2020 10:10 PM > *To:* Tuong Lien Tong; Xue, Ying > *Cc:* tip...@li... > *Subject:* Re: [tipc-discussion] Fwd: [net v2] tipc: fix wrong > connect() return code > > I think that is a better solution. The client will not be able to do > any more writes, but that is a correct behavior when the peer has > closed the connection. > > Basically, you modify tipc_connect(), and return 0 if the socket is in > state DISCONNECTING, right? > > ///jon > > On 1/23/20 2:54 AM, Tuong Lien Tong wrote: > > An alternative is to allow the 'connect()' to return '0' in this particular case. The socket state is already 'DISCONNECTING', so any further write will get '-EPIPE' but the socket is still able to read the messages existing in its receive queue. This may seem a bit contrary to the previous patch but they are not mutually exclusive, the previous one would only deal with the case the socket state is 'DISCONNECTING' but the 'sk->err' != 0. > > How does this look correct from a functional standpoint? > > BR/Tuong > > -----Original Message----- > > From: Jon Maloy<jm...@re...> <mailto:jm...@re...> > > Sent: Tuesday, January 21, 2020 7:11 PM > > To:yin...@wi... <mailto:yin...@wi...> > > Cc:tip...@li... <mailto:tip...@li...> > > Subject: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code > > Hi Ying, > > Do you have any ideas regarding this? My best proposal is to att a time stamp to the new server socket at creation, > > and then delay a 'close' if the socket is < 5ms old. > > But I am not quite happy with this solution, so any better idea would be welcome. > > ///jon > > ----- Forwarded Message ----- From: Tuong Lien Tong > > <tuo...@de...> <mailto:tuo...@de...>To: 'Ying Xue'<yin...@wi...> <mailto:yin...@wi...>; > > "jm...@re..." <mailto:jm...@re...> <jm...@re...> <mailto:jm...@re...>;"ma...@do..." <mailto:ma...@do...> > > <ma...@do...> <mailto:ma...@do...>Cc:"tip...@de..." <mailto:tip...@de...> > > <tip...@de...> <mailto:tip...@de...>Sent: Monday, January 13, 2020, 04:57:58 AM > > GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code > > > > Hi Ying, Jon, > > > > The last change has exposed a different issue with our “lightweight” > > connection setup mechanism, > > Please see the scenario below: > > > > > > server process client process > > ----------------------- > > ------------------------------ > > create()| | > > listen()*tskL | > > bind()| |create() > > +- accept()* tskC* > > | | |connect() ~~~~~~~~~~~~~+ > > wait_for_rcv()| |<---------[SYN]-----------| | > > | | |//tskC: CONNECTING > > | | | *wait_for_conn() > > +--------->*tskS | > > |->add to wait queue > > |//tskS: ESTABLISHED | | > > |----------[ACK]---------->|//tskC: ESTABLISHED > > | ->wakeup(): SLEEPING -> run queue > > send()|----------[DATA]--------->|\ > > ->wakeup(): already woken > > send()|----------[DATA]--------->| | > > ->wakeup(): already woken > > . . . . |-> receive queue . > > . . . . | . > > send()|----------[DATA]--------->|/ > > ->wakeup(): already woken > > close()|----------[FIN]---------->|//tskC: DISCONNECTING | > > | | | > > | ~~~~~~~~~~~~~~~~~~~~~~~~~~>schedule(): > > | run queue -> RUNNING > > |->add > > |back > > |to > > |wait > > |queue > > |(state > > |!= > > |ESTABLISHED) > > . > > . > > | > > *timed > > out > > > > > > When the server side receives ‘SYN’ from client, the ‘accept()’ call ends up > > with the new socket in the ‘ESTABLISHED’ state. Shortly thereafter, the > > server starts sending a number of messages before calling the socket > > ‘close()’ without waiting for any response from the client! > > In this scenario, it may happen on the other side that the client process > > waiting for the connection to be ‘ESTABLISHED’ cannot be awakened in time. > > Although the ‘wakeup()’ call is still done when the ‘ACK’ from the server > > arrives, but it just moves the process into the run queue (i.e. runnable) to > > be scheduled to run later. The kernel (i.e. RX process) is not preemptable > > when it processes the messages from the server, including the ‘ACK ‘, > > ‘DATA’, … and finally the ‘FIN’ which will terminate the connection. When > > the wait process is really run back, the client socket state is now > > ‘DISCONNECTING’ (i.e. != ‘ESTABLISHED’), so it continues to wait until the > > timer expires. The client will finally get ‘-ETIMEDOUT’ and forced to > > release the socket whereas there remains the messages from the server in the > > socket’s receive queue. > > > > Notes: > > > > - The issue is found by the tipcutils/ptts tool. > > - This is a timing issue, sometimes it happens sometimes not, depending on > > the small enough number of the data messages (e.g. < 100) from the server > > before it sends the ‘FIN’, as well as the network traffic, RX process, > > scheduler, etc. on the client side. > > - Without the previous change, the issue is _not detected_ since the > > ‘connect()’ call will return 0 when the wait process is woken & running > > back but the socket state is ‘DISCONNECTING’, there is no socket error > > (sk->sk_err == 0) in this case… However, this is the original issue and > > unexpected because if the client tries to send something after that, it > > will get ‘-EPIPE’ immediately. > > > > We need to find a solution to this problem, > > > > - It seems to me that in this mechanism the server is “too hasty” to set > > the new socket to ‘ESTABLISHED’ at the ‘accept()’ when it has just > > received a ‘SYN’ from client. That allows it to go ahead with sending the > > messages & closing the socket in one step… Should we consider to include > > something like the TCP ‘3-way handshake’ here, so the server needs to get > > a ‘ACK’ from the client before the connection is fully established? I > > guess ‘no’ since it should be kept lightweight…? > > - An alternative is we can introduce a bit delay e.g. 2 or 5ms to the > > ‘accept()’ or ‘close()’ before completing it, which will give a chance > > for its client to be really established first? This solution looks simple > > and should have less impact to user space… > > - Or …? > > > > What is your opinion, please share? > > Many thanks! > > > > BR/Tuong > > > > -----Original Message----- > > From: Xue, Ying<Yin...@wi...> <mailto:Yin...@wi...> > > Sent: Thursday, January 2, 2020 4:10 PM > > To: Tuong Lien<tuo...@de...> <mailto:tuo...@de...>; > > tip...@li... <mailto:tip...@li...>;jm...@re... <mailto:jm...@re...>;ma...@do... <mailto:ma...@do...> > > Subject: RE: [net v2] tipc: fix wrong connect() return code > > > > Great! > > > > Acked-by: Ying Xue<yin...@wi...> <mailto:yin...@wi...> > > > > -----Original Message----- > > From: Tuong Lien [mailto:tuo...@de...] > > Sent: Monday, December 30, 2019 4:56 PM > > To:tip...@li... <mailto:tip...@li...>;jm...@re... <mailto:jm...@re...>; > > ma...@do... <mailto:ma...@do...>; Xue, Ying > > Subject: [net v2] tipc: fix wrong connect() return code > > > > The current 'tipc_wait_for_connect()' function makes a loop and waits > > for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the > > connecting has done. However, when the condition is met, it always > > returns '0' even in the case the connecting was actually failed (e.g. > > refused because the server socket has closed...) and the socket state > > was set to 'TIPC_DISCONNECTING'. > > This results in a wrong return code for the 'connect()' call from user, > > making it believe that the connection is established and goes ahead > > with more actions e.g. building & sending a message but then finally > > gets an unexpected result (e.g. '-EPIPE'). > > > > This commit fixes the issue by instead setting the wait condition to > > 'tipc_sk_connected(sk)', so that the function will return '0' only when > > the connection is really established. Otherwise, either the socket > > error code if any or '-ETIMEDOUT'/'-EINTR' will be returned > > correspondingly. > > > > --------- > > v2: changed after discussing with Ying > > --------- > > > > Signed-off-by: Tuong Lien<tuo...@de...> <mailto:tuo...@de...> > > --- > > net/tipc/socket.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > > index 6552f986774c..2f5679f84060 100644 > > --- a/net/tipc/socket.c > > +++ b/net/tipc/socket.c > > @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, > > long *timeo_p) > > return > > sock_intr_errno(*timeo_p); > > > > add_wait_queue(sk_sleep(sk), &wait); > > - done = sk_wait_event(sk, timeo_p, > > - > > sk->sk_state > > != TIPC_CONNECTING, &wait); > > + done = sk_wait_event(sk, timeo_p, > > tipc_sk_connected(sk), > > + &wait); > > remove_wait_queue(sk_sleep(sk), &wait); > > } while (!done); > > return 0; > > -- > > 2.13.7 > > > > _______________________________________________ > > tipc-discussion mailing list > > tip...@li... <mailto:tip...@li...> > > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > -- > -- |
From: David M. <da...@da...> - 2020-01-08 23:58:12
|
From: Tuong Lien <tuo...@de...> Date: Wed, 8 Jan 2020 09:19:00 +0700 > The current 'tipc_wait_for_connect()' function does a wait-loop for the > condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the socket > connecting has done. However, when the condition is met, it returns '0' > even in the case the connecting is actually failed, the socket state is > set to 'TIPC_DISCONNECTING' (e.g. when the server socket has closed..). > This results in a wrong return code for the 'connect()' call from user, > making it believe that the connection is established and go ahead with > building, sending a message, etc. but finally failed e.g. '-EPIPE'. > > This commit fixes the issue by changing the wait condition to the > 'tipc_sk_connected(sk)', so the function will return '0' only when the > connection is really established. Otherwise, either the socket 'sk_err' > if any or '-ETIMEDOUT'/'-EINTR' will be returned correspondingly. > > Acked-by: Ying Xue <yin...@wi...> > Acked-by: Jon Maloy <jon...@er...> > Signed-off-by: Tuong Lien <tuo...@de...> Applied. |
From: Jon M. <jon...@er...> - 2019-12-24 13:20:15
|
Acked-by: Jon Maloy <jon...@er...> > -----Original Message----- > From: Tuong Lien <tuo...@de...> > Sent: 24-Dec-19 03:06 > To: tip...@li...; Jon Maloy <jon...@er...>; ma...@do...; > yin...@wi... > Subject: [net] tipc: fix wrong connect() return code > > The current 'tipc_wait_for_connect()' function makes a loop and waits > for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the > connecting has done. However, when the condition is met, it always > returns '0' even in the case the connecting was actually failed (e.g. > refused because the server socket has closed...) and the socket state > was set to 'TIPC_DISCONNECTING'. > This results in a wrong return code for the 'connect()' call from user, > making it believe that the connection is established and goes ahead > with more actions e.g. building & sending a message but then finally > gets an unexpected result (e.g. '-EPIPE'). > > This commit fixes the issue by returning the corresponding error code > if any when the wait process is waken up. > > Signed-off-by: Tuong Lien <tuo...@de...> > --- > net/tipc/socket.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > index 8b1daf3634b0..2e5faf89ef80 100644 > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -2428,7 +2428,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) > { > DEFINE_WAIT_FUNC(wait, woken_wake_function); > struct sock *sk = sock->sk; > - int done; > + int done = 0; > > do { > int err = sock_error(sk); > @@ -2438,12 +2438,14 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) > return -ETIMEDOUT; > if (signal_pending(current)) > return sock_intr_errno(*timeo_p); > + if (done) > + return 0; > > add_wait_queue(sk_sleep(sk), &wait); > done = sk_wait_event(sk, timeo_p, > sk->sk_state != TIPC_CONNECTING, &wait); > remove_wait_queue(sk_sleep(sk), &wait); > - } while (!done); > + } while (1); > return 0; > } > > -- > 2.13.7 |
From: Xue, Y. <Yin...@wi...> - 2019-12-25 13:48:56
|
Probably below change is more easily understandable: diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6552f98..358cc55 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (!done || sk->sk_err); return 0; } -----Original Message----- From: Tuong Lien [mailto:tuo...@de...] Sent: Tuesday, December 24, 2019 4:06 PM To: tip...@li...; jon...@er...; ma...@do...; Xue, Ying Subject: [net] tipc: fix wrong connect() return code The current 'tipc_wait_for_connect()' function makes a loop and waits for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the connecting has done. However, when the condition is met, it always returns '0' even in the case the connecting was actually failed (e.g. refused because the server socket has closed...) and the socket state was set to 'TIPC_DISCONNECTING'. This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and goes ahead with more actions e.g. building & sending a message but then finally gets an unexpected result (e.g. '-EPIPE'). This commit fixes the issue by returning the corresponding error code if any when the wait process is waken up. Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 8b1daf3634b0..2e5faf89ef80 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2428,7 +2428,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) { DEFINE_WAIT_FUNC(wait, woken_wake_function); struct sock *sk = sock->sk; - int done; + int done = 0; do { int err = sock_error(sk); @@ -2438,12 +2438,14 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return -ETIMEDOUT; if (signal_pending(current)) return sock_intr_errno(*timeo_p); + if (done) + return 0; add_wait_queue(sk_sleep(sk), &wait); done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (1); return 0; } -- 2.13.7 |
From: Tuong L. T. <tuo...@de...> - 2019-12-26 01:04:48
|
Hi Ying, Actually, the 'done' flag has been set in the particular case (since the 'sk->sk_state' changed to 'TIPC_DISCONNECTING' after receiving a rejection of its SYN from server...) and what we want to achieve is the error code from the 'sock_error(sk)' to be returned to user correctly. For your code, there is no difference, the function would still return '0' for the said case. I considered an alternative that: done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); } while (!done); - return 0; + return sock_err(sk); but this could get more concerns or we should check the socket state at the return e.g. return (sk->sk_state != TIPC_ESTABLISHED) ? sock_err(sk) : 0; and the fact is that we have this code already in the while statement, so I have decided to go with the code below. BR/Tuong -----Original Message----- From: Xue, Ying <Yin...@wi...> Sent: Wednesday, December 25, 2019 8:48 PM To: Tuong Lien <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: RE: [net] tipc: fix wrong connect() return code Probably below change is more easily understandable: diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6552f98..358cc55 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (!done || sk->sk_err); return 0; } -----Original Message----- From: Tuong Lien [mailto:tuo...@de...] Sent: Tuesday, December 24, 2019 4:06 PM To: tip...@li...; jon...@er...; ma...@do...; Xue, Ying Subject: [net] tipc: fix wrong connect() return code The current 'tipc_wait_for_connect()' function makes a loop and waits for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the connecting has done. However, when the condition is met, it always returns '0' even in the case the connecting was actually failed (e.g. refused because the server socket has closed...) and the socket state was set to 'TIPC_DISCONNECTING'. This results in a wrong return code for the 'connect()' call from user, making it believe that the connection is established and goes ahead with more actions e.g. building & sending a message but then finally gets an unexpected result (e.g. '-EPIPE'). This commit fixes the issue by returning the corresponding error code if any when the wait process is waken up. Signed-off-by: Tuong Lien <tuo...@de...> --- net/tipc/socket.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 8b1daf3634b0..2e5faf89ef80 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2428,7 +2428,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) { DEFINE_WAIT_FUNC(wait, woken_wake_function); struct sock *sk = sock->sk; - int done; + int done = 0; do { int err = sock_error(sk); @@ -2438,12 +2438,14 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) return -ETIMEDOUT; if (signal_pending(current)) return sock_intr_errno(*timeo_p); + if (done) + return 0; add_wait_queue(sk_sleep(sk), &wait); done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (1); return 0; } -- 2.13.7 |
From: Ying X. <yin...@wi...> - 2019-12-26 06:31:22
|
On 12/26/19 9:04 AM, Tuong Lien Tong wrote: > Hi Ying, > > Actually, the 'done' flag has been set in the particular case (since the > 'sk->sk_state' changed to 'TIPC_DISCONNECTING' after receiving a rejection > of its SYN from server...) and what we want to achieve is the error code > from the 'sock_error(sk)' to be returned to user correctly. > For your code, there is no difference, the function would still return '0' > for the said case. --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (!done || sk->sk_err); return 0; } Sorry, in my understanding, if this case you mentioned above occurs, "done" will be really set to 1, which means "while loop" will exit and then 0 will be returned to 0. This is our current problem. But, if we add "sk->sk_err" as another while statement condition, the while loop will not exit because "sk->sk_err" is not 0. As a consequence, in next loop, sock error code will be returned to user because sock_error() is not 0. > I considered an alternative that: > > done = sk_wait_event(sk, timeo_p, > sk->sk_state != TIPC_CONNECTING, > &wait); > remove_wait_queue(sk_sleep(sk), &wait); > } while (!done); > - return 0; > + return sock_err(sk); > > but this could get more concerns or we should check the socket state at the > return e.g. > return (sk->sk_state != TIPC_ESTABLISHED) ? sock_err(sk) : 0; > > and the fact is that we have this code already in the while statement, so I > have decided to go with the code below. |
From: Tuong L. T. <tuo...@de...> - 2019-12-26 07:16:48
|
Hi Ying, Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"... But, this is really confusing and one might ask why we continue the loop while the socket has encountered an error (sk-> sk_err! = 0)??? Moreover, it's not necessarily the "sk_err" will be the only case, For example: sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a pending/interrupt signal then we should return '-EINTR'? BR/Tuong -----Original Message----- From: Ying Xue <yin...@wi...> Sent: Thursday, December 26, 2019 1:17 PM To: Tuong Lien Tong <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [net] tipc: fix wrong connect() return code On 12/26/19 9:04 AM, Tuong Lien Tong wrote: > Hi Ying, > > Actually, the 'done' flag has been set in the particular case (since the > 'sk->sk_state' changed to 'TIPC_DISCONNECTING' after receiving a rejection > of its SYN from server...) and what we want to achieve is the error code > from the 'sock_error(sk)' to be returned to user correctly. > For your code, there is no difference, the function would still return '0' > for the said case. --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2435,7 +2435,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (!done || sk->sk_err); return 0; } Sorry, in my understanding, if this case you mentioned above occurs, "done" will be really set to 1, which means "while loop" will exit and then 0 will be returned to 0. This is our current problem. But, if we add "sk->sk_err" as another while statement condition, the while loop will not exit because "sk->sk_err" is not 0. As a consequence, in next loop, sock error code will be returned to user because sock_error() is not 0. > I considered an alternative that: > > done = sk_wait_event(sk, timeo_p, > sk->sk_state != TIPC_CONNECTING, > &wait); > remove_wait_queue(sk_sleep(sk), &wait); > } while (!done); > - return 0; > + return sock_err(sk); > > but this could get more concerns or we should check the socket state at the > return e.g. > return (sk->sk_state != TIPC_ESTABLISHED) ? sock_err(sk) : 0; > > and the fact is that we have this code already in the while statement, so I > have decided to go with the code below. |
From: Ying X. <yin...@wi...> - 2019-12-26 08:08:20
|
On 12/26/19 3:16 PM, Tuong Lien Tong wrote: > Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"... > But, this is really confusing and one might ask why we continue the loop while the socket has encountered an error (sk-> sk_err! = 0)??? This is totally reasonable because it can make code simpler. > Moreover, it's not necessarily the "sk_err" will be the only case, > For example: > sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a pending/interrupt signal If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED, sk->sk_err should be kept 0, but if sk->sk_state is changed to other states, sk->sk_err must be set with a proper error code in socket receive path, otherwise, it's a bug. However, there might be one below race condition: sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set with an error code, the process who calls connect() is woken up by one single. Even so, the process is still blocked when it tries to hold sock lock with lock_sock() in sk_wait_event() because the socket lock is taken in socket receive path. After socket lock is released, the process will continue being run. But at that moment, sk->sk_err has been set with an error code. So, this is no problem for us. > > then we should return '-EINTR'? |
From: Tuong L. T. <tuo...@de...> - 2019-12-26 10:39:04
|
What about this? diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6ebd809ef207..04f035bcc272 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (!tipc_sk_connected(sk)); return 0; } BR/Tuong -----Original Message----- From: Ying Xue <yin...@wi...> Sent: Thursday, December 26, 2019 2:55 PM To: Tuong Lien Tong <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [net] tipc: fix wrong connect() return code On 12/26/19 3:16 PM, Tuong Lien Tong wrote: > Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"... > But, this is really confusing and one might ask why we continue the loop while the socket has encountered an error (sk-> sk_err! = 0)??? This is totally reasonable because it can make code simpler. [Tuong]: but a performance hit since it almost looks like the sk_err has to be read twice? > Moreover, it's not necessarily the "sk_err" will be the only case, > For example: > sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a pending/interrupt signal If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED, sk->sk_err should be kept 0, but if sk->sk_state is changed to other states, sk->sk_err must be set with a proper error code in socket receive path, otherwise, it's a bug. However, there might be one below race condition: sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set with an error code, the process who calls connect() is woken up by one single. Even so, the process is still blocked when it tries to hold sock lock with lock_sock() in sk_wait_event() because the socket lock is taken in socket receive path. After socket lock is released, the process will continue being run. But at that moment, sk->sk_err has been set with an error code. So, this is no problem for us. > > then we should return '-EINTR'? |
From: Tuong L. T. <tuo...@de...> - 2019-12-27 02:22:40
|
Hi Ying, Thinking more about this... I suppose we can even remove the function and utilize the generic macro 'tipc_wait_for_cond()' but with the new condition, that is much simpler and also saves some footprint. The 'tipc_sk_connected()' condition is really what we should expect in this case instead. So, in the case of 'TIPC_DISCONNECTING', regardless of whether we have a 'sk->sk_err', it will return that error or '-ETIMEOUT' but never '0'. I will send the patch v2 for your review. BR/Tuong -----Original Message----- From: Tuong Lien Tong <tuo...@de...> Sent: Thursday, December 26, 2019 5:39 PM To: 'Ying Xue' <yin...@wi...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code What about this? diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 6ebd809ef207..04f035bcc272 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock, long *timeo_p) done = sk_wait_event(sk, timeo_p, sk->sk_state != TIPC_CONNECTING, &wait); remove_wait_queue(sk_sleep(sk), &wait); - } while (!done); + } while (!tipc_sk_connected(sk)); return 0; } BR/Tuong -----Original Message----- From: Ying Xue <yin...@wi...> Sent: Thursday, December 26, 2019 2:55 PM To: Tuong Lien Tong <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [net] tipc: fix wrong connect() return code On 12/26/19 3:16 PM, Tuong Lien Tong wrote: > Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"... > But, this is really confusing and one might ask why we continue the loop while the socket has encountered an error (sk-> sk_err! = 0)??? This is totally reasonable because it can make code simpler. [Tuong]: but a performance hit since it almost looks like the sk_err has to be read twice? > Moreover, it's not necessarily the "sk_err" will be the only case, > For example: > sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a pending/interrupt signal If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED, sk->sk_err should be kept 0, but if sk->sk_state is changed to other states, sk->sk_err must be set with a proper error code in socket receive path, otherwise, it's a bug. However, there might be one below race condition: sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set with an error code, the process who calls connect() is woken up by one single. Even so, the process is still blocked when it tries to hold sock lock with lock_sock() in sk_wait_event() because the socket lock is taken in socket receive path. After socket lock is released, the process will continue being run. But at that moment, sk->sk_err has been set with an error code. So, this is no problem for us. > > then we should return '-EINTR'? _______________________________________________ tipc-discussion mailing list tip...@li... https://lists.sourceforge.net/lists/listinfo/tipc-discussion |
From: Ying X. <yin...@wi...> - 2019-12-27 07:18:40
|
On 12/27/19 10:22 AM, Tuong Lien Tong wrote: > Hi Ying, > > Thinking more about this... > I suppose we can even remove the function and utilize the generic macro > 'tipc_wait_for_cond()' but with the new condition, that is much simpler and > also saves some footprint. Agreed. > The 'tipc_sk_connected()' condition is really what we should expect in this > case instead. Yes. > So, in the case of 'TIPC_DISCONNECTING', regardless of whether we have a > 'sk->sk_err', it will return that error or '-ETIMEOUT' but never '0'. But I think it's better to return the error attached on 'sk->sk_err' to user because it can really reflect why connect() is failed. > I will send the patch v2 for your review. > > BR/Tuong > > -----Original Message----- > From: Tuong Lien Tong <tuo...@de...> > Sent: Thursday, December 26, 2019 5:39 PM > To: 'Ying Xue' <yin...@wi...>; > tip...@li...; jon...@er...; > ma...@do... > Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code > > What about this? > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > index 6ebd809ef207..04f035bcc272 100644 > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock, > long *timeo_p) > done = sk_wait_event(sk, timeo_p, > sk->sk_state != TIPC_CONNECTING, > &wait); > remove_wait_queue(sk_sleep(sk), &wait); > - } while (!done); > + } while (!tipc_sk_connected(sk)); > return 0; > } > > BR/Tuong > > -----Original Message----- > From: Ying Xue <yin...@wi...> > Sent: Thursday, December 26, 2019 2:55 PM > To: Tuong Lien Tong <tuo...@de...>; > tip...@li...; jon...@er...; > ma...@do... > Subject: Re: [net] tipc: fix wrong connect() return code > > On 12/26/19 3:16 PM, Tuong Lien Tong wrote: >> Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"... >> But, this is really confusing and one might ask why we continue the loop > while the socket has encountered an error (sk-> sk_err! = 0)??? > > This is totally reasonable because it can make code simpler. > [Tuong]: but a performance hit since it almost looks like the sk_err has to > be read twice? > >> Moreover, it's not necessarily the "sk_err" will be the only case, >> For example: >> sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a > pending/interrupt signal > > If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED, > sk->sk_err should be kept 0, but if sk->sk_state is changed to other > states, sk->sk_err must be set with a proper error code in socket > receive path, otherwise, it's a bug. > > However, there might be one below race condition: > > sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set > with an error code, the process who calls connect() is woken up by one > single. Even so, the process is still blocked when it tries to hold sock > lock with lock_sock() in sk_wait_event() because the socket lock is > taken in socket receive path. After socket lock is released, the process > will continue being run. But at that moment, sk->sk_err has been set > with an error code. So, this is no problem for us. > >> >> then we should return '-EINTR'? > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > |
From: Tuong L. T. <tuo...@de...> - 2019-12-30 09:11:47
|
Hi Ying, Unfortunately, the 'tipc_wait_for_cond()' macro is not applicable here... It's for the sendmsg() cases only which expects the socket in the 'ESTABLISHED' state first, otherwise it will return '- ENOTCONN' (see the 'tipc_sk_sock_err()'). Of course, we could adjust it for the case but looks not so good..., so I have decided to keep the function and just change the wait condition to 'tipc_sk_connected()'. Please see the patch v2 I just sent out. BR/Tuong -----Original Message----- From: Ying Xue <yin...@wi...> Sent: Friday, December 27, 2019 2:04 PM To: Tuong Lien Tong <tuo...@de...>; tip...@li...; jon...@er...; ma...@do... Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code On 12/27/19 10:22 AM, Tuong Lien Tong wrote: > Hi Ying, > > Thinking more about this... > I suppose we can even remove the function and utilize the generic macro > 'tipc_wait_for_cond()' but with the new condition, that is much simpler and > also saves some footprint. Agreed. > The 'tipc_sk_connected()' condition is really what we should expect in this > case instead. Yes. > So, in the case of 'TIPC_DISCONNECTING', regardless of whether we have a > 'sk->sk_err', it will return that error or '-ETIMEOUT' but never '0'. But I think it's better to return the error attached on 'sk->sk_err' to user because it can really reflect why connect() is failed. > I will send the patch v2 for your review. > > BR/Tuong > > -----Original Message----- > From: Tuong Lien Tong <tuo...@de...> > Sent: Thursday, December 26, 2019 5:39 PM > To: 'Ying Xue' <yin...@wi...>; > tip...@li...; jon...@er...; > ma...@do... > Subject: Re: [tipc-discussion] [net] tipc: fix wrong connect() return code > > What about this? > > diff --git a/net/tipc/socket.c b/net/tipc/socket.c > index 6ebd809ef207..04f035bcc272 100644 > --- a/net/tipc/socket.c > +++ b/net/tipc/socket.c > @@ -2446,7 +2446,7 @@ static int tipc_wait_for_connect(struct socket *sock, > long *timeo_p) > done = sk_wait_event(sk, timeo_p, > sk->sk_state != TIPC_CONNECTING, > &wait); > remove_wait_queue(sk_sleep(sk), &wait); > - } while (!done); > + } while (!tipc_sk_connected(sk)); > return 0; > } > > BR/Tuong > > -----Original Message----- > From: Ying Xue <yin...@wi...> > Sent: Thursday, December 26, 2019 2:55 PM > To: Tuong Lien Tong <tuo...@de...>; > tip...@li...; jon...@er...; > ma...@do... > Subject: Re: [net] tipc: fix wrong connect() return code > > On 12/26/19 3:16 PM, Tuong Lien Tong wrote: >> Sorry, my mistake! I thought it was "while (!done || !sk->sk_err)"... >> But, this is really confusing and one might ask why we continue the loop > while the socket has encountered an error (sk-> sk_err! = 0)??? > > This is totally reasonable because it can make code simpler. > [Tuong]: but a performance hit since it almost looks like the sk_err has to > be read twice? > >> Moreover, it's not necessarily the "sk_err" will be the only case, >> For example: >> sk->sk_state != TIPC_CONNECTING but sk->sk_err = 0 (somehow) and a > pending/interrupt signal > > If sk->sk_state is changed from TIPC_CONNECTING to TIPC_ESTABLISHED, > sk->sk_err should be kept 0, but if sk->sk_state is changed to other > states, sk->sk_err must be set with a proper error code in socket > receive path, otherwise, it's a bug. > > However, there might be one below race condition: > > sk->sk_state is set to TIPC_DISCONNECTING, but before sk->sk_err is set > with an error code, the process who calls connect() is woken up by one > single. Even so, the process is still blocked when it tries to hold sock > lock with lock_sock() in sk_wait_event() because the socket lock is > taken in socket receive path. After socket lock is released, the process > will continue being run. But at that moment, sk->sk_err has been set > with an error code. So, this is no problem for us. > >> >> then we should return '-EINTR'? > > > > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > > |