From: Jon M. <jm...@re...> - 2020-01-23 14:10:23
|
I think that is a better solution. The client will not be able to do any more writes, but that is a correct behavior when the peer has closed the connection. Basically, you modify tipc_connect(), and return 0 if the socket is in state DISCONNECTING, right? ///jon On 1/23/20 2:54 AM, Tuong Lien Tong wrote: > An alternative is to allow the 'connect()' to return '0' in this particular case. The socket state is already 'DISCONNECTING', so any further write will get '-EPIPE' but the socket is still able to read the messages existing in its receive queue. This may seem a bit contrary to the previous patch but they are not mutually exclusive, the previous one would only deal with the case the socket state is 'DISCONNECTING' but the 'sk->err' != 0. > How does this look correct from a functional standpoint? > > BR/Tuong > > -----Original Message----- > From: Jon Maloy <jm...@re...> > Sent: Tuesday, January 21, 2020 7:11 PM > To: yin...@wi... > Cc: tip...@li... > Subject: [tipc-discussion] Fwd: [net v2] tipc: fix wrong connect() return code > > Hi Ying, > Do you have any ideas regarding this? My best proposal is to att a time stamp to the new server socket at creation, > and then delay a 'close' if the socket is < 5ms old. > But I am not quite happy with this solution, so any better idea would be welcome. > > ///jon > > >> >> ----- Forwarded Message ----- From: Tuong Lien Tong >> <tuo...@de...>To: 'Ying Xue' <yin...@wi...>; >> "jm...@re..." <jm...@re...>; "ma...@do..." >> <ma...@do...>Cc: "tip...@de..." >> <tip...@de...>Sent: Monday, January 13, 2020, 04:57:58 AM >> GMT-5Subject: FW: [net v2] tipc: fix wrong connect() return code >> >> Hi Ying, Jon, >> >> >> >> The last change has exposed a different issue with our “lightweight” >> connection setup mechanism, >> >> Please see the scenario below: >> >> >> >> >> >> server process client process >> >> ----------------------- >> ------------------------------ >> >> create()| | >> >> listen()*tskL | >> >> bind()| |create() >> >> +- accept()* tskC* >> >> | | |connect() ~~~~~~~~~~~~~+ >> >> wait_for_rcv()| |<---------[SYN]-----------| | >> >> | | |//tskC: CONNECTING >> | | | *wait_for_conn() >> >> +--------->*tskS | >> |->add to wait queue >> >> |//tskS: ESTABLISHED | | >> >> |----------[ACK]---------->|//tskC: ESTABLISHED >> | ->wakeup(): SLEEPING -> run queue >> >> send()|----------[DATA]--------->|\ >> ->wakeup(): already woken >> >> send()|----------[DATA]--------->| | >> ->wakeup(): already woken >> >> . . . . |-> receive queue . >> >> . . . . | . >> >> send()|----------[DATA]--------->|/ >> ->wakeup(): already woken >> >> close()|----------[FIN]---------->|//tskC: DISCONNECTING | >> >> | | | >> >> | ~~~~~~~~~~~~~~~~~~~~~~~~~~>schedule(): >> | run queue -> RUNNING >> >> |->add >> |back >> |to >> |wait >> |queue >> |(state >> |!= >> |ESTABLISHED) >> >> . >> >> . >> >> | >> >> *timed >> out >> >> >> >> >> >> When the server side receives ‘SYN’ from client, the ‘accept()’ call ends up >> with the new socket in the ‘ESTABLISHED’ state. Shortly thereafter, the >> server starts sending a number of messages before calling the socket >> ‘close()’ without waiting for any response from the client! >> >> In this scenario, it may happen on the other side that the client process >> waiting for the connection to be ‘ESTABLISHED’ cannot be awakened in time. >> Although the ‘wakeup()’ call is still done when the ‘ACK’ from the server >> arrives, but it just moves the process into the run queue (i.e. runnable) to >> be scheduled to run later. The kernel (i.e. RX process) is not preemptable >> when it processes the messages from the server, including the ‘ACK ‘, >> ‘DATA’, … and finally the ‘FIN’ which will terminate the connection. When >> the wait process is really run back, the client socket state is now >> ‘DISCONNECTING’ (i.e. != ‘ESTABLISHED’), so it continues to wait until the >> timer expires. The client will finally get ‘-ETIMEDOUT’ and forced to >> release the socket whereas there remains the messages from the server in the >> socket’s receive queue. >> >> >> >> Notes: >> >> - The issue is found by the tipcutils/ptts tool. >> - This is a timing issue, sometimes it happens sometimes not, depending on >> the small enough number of the data messages (e.g. < 100) from the server >> before it sends the ‘FIN’, as well as the network traffic, RX process, >> scheduler, etc. on the client side. >> - Without the previous change, the issue is _not detected_ since the >> ‘connect()’ call will return 0 when the wait process is woken & running >> back but the socket state is ‘DISCONNECTING’, there is no socket error >> (sk->sk_err == 0) in this case… However, this is the original issue and >> unexpected because if the client tries to send something after that, it >> will get ‘-EPIPE’ immediately. >> >> >> >> We need to find a solution to this problem, >> >> - It seems to me that in this mechanism the server is “too hasty” to set >> the new socket to ‘ESTABLISHED’ at the ‘accept()’ when it has just >> received a ‘SYN’ from client. That allows it to go ahead with sending the >> messages & closing the socket in one step… Should we consider to include >> something like the TCP ‘3-way handshake’ here, so the server needs to get >> a ‘ACK’ from the client before the connection is fully established? I >> guess ‘no’ since it should be kept lightweight…? >> - An alternative is we can introduce a bit delay e.g. 2 or 5ms to the >> ‘accept()’ or ‘close()’ before completing it, which will give a chance >> for its client to be really established first? This solution looks simple >> and should have less impact to user space… >> - Or …? >> >> >> >> What is your opinion, please share? >> >> Many thanks! >> >> >> >> BR/Tuong >> >> >> >> -----Original Message----- >> From: Xue, Ying <Yin...@wi...> >> Sent: Thursday, January 2, 2020 4:10 PM >> To: Tuong Lien <tuo...@de...>; >> tip...@li...; jm...@re...; ma...@do... >> Subject: RE: [net v2] tipc: fix wrong connect() return code >> >> >> >> Great! >> >> >> >> Acked-by: Ying Xue <yin...@wi...> >> >> >> >> -----Original Message----- >> >> From: Tuong Lien [mailto:tuo...@de...] >> >> Sent: Monday, December 30, 2019 4:56 PM >> >> To: tip...@li...; jm...@re...; >> ma...@do...; Xue, Ying >> >> Subject: [net v2] tipc: fix wrong connect() return code >> >> >> >> The current 'tipc_wait_for_connect()' function makes a loop and waits >> >> for the condition 'sk->sk_state != TIPC_CONNECTING' to conclude if the >> >> connecting has done. However, when the condition is met, it always >> >> returns '0' even in the case the connecting was actually failed (e.g. >> >> refused because the server socket has closed...) and the socket state >> >> was set to 'TIPC_DISCONNECTING'. >> >> This results in a wrong return code for the 'connect()' call from user, >> >> making it believe that the connection is established and goes ahead >> >> with more actions e.g. building & sending a message but then finally >> >> gets an unexpected result (e.g. '-EPIPE'). >> >> >> >> This commit fixes the issue by instead setting the wait condition to >> >> 'tipc_sk_connected(sk)', so that the function will return '0' only when >> >> the connection is really established. Otherwise, either the socket >> >> error code if any or '-ETIMEDOUT'/'-EINTR' will be returned >> >> correspondingly. >> >> >> >> --------- >> >> v2: changed after discussing with Ying >> >> --------- >> >> >> >> Signed-off-by: Tuong Lien <tuo...@de...> >> >> --- >> >> net/tipc/socket.c | 4 ++-- >> >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/net/tipc/socket.c b/net/tipc/socket.c >> >> index 6552f986774c..2f5679f84060 100644 >> >> --- a/net/tipc/socket.c >> >> +++ b/net/tipc/socket.c >> >> @@ -2432,8 +2432,8 @@ static int tipc_wait_for_connect(struct socket *sock, >> long *timeo_p) >> >> return >> sock_intr_errno(*timeo_p); >> >> >> >> add_wait_queue(sk_sleep(sk), &wait); >> >> - done = sk_wait_event(sk, timeo_p, >> >> - >> sk->sk_state >> != TIPC_CONNECTING, &wait); >> >> + done = sk_wait_event(sk, timeo_p, >> tipc_sk_connected(sk), >> >> + &wait); >> >> remove_wait_queue(sk_sleep(sk), &wait); >> >> } while (!done); >> >> return 0; >> >> -- >> >> 2.13.7 >> >> >> > _______________________________________________ > tipc-discussion mailing list > tip...@li... > https://lists.sourceforge.net/lists/listinfo/tipc-discussion > -- |