Thread: [Cppcms-users] Simple long polling application has memory leaks (sockets left in CLOSE_WAIT state in
From: Cody S. <cod...@gu...> - 2016-01-14 23:20:13
Hi all,

I'm just starting to get my hands dirty with cppcms and long-polling, and I'm having some troubles. I've followed the example provided in the "chat" application of the examples directory and made some tweaks (code will be provided below).

The problem arises in an ajax function called "listen()", which makes an ajax GET request to the server and waits until the server has responded or 10 seconds have passed, at which point it will fire up another "long-polling" request. However, each of these requests remains open in the case that they time out, and then the lighttpd server refuses any more connections after too many requests have been issued. I'm not sure what I'm doing wrong with my long-polling approach, but for some reason, these sockets are not getting closed and I've been unable to determine why after 2 days of investigating.

Here is the client code (using jQuery):

    function listen(){
        $.ajax({
            url: "/chat/listen",
            timeout: 10000, // 10 seconds
            type: "GET",
            dataType: "json",
            success: function(r, st, xhr){
                console.log(r);
                listen();
            },
            error: function(xhr, st, e){
                if(st !== "timeout"){
                    return alert("error in listen(): " + st);
                }
                listen(); // If we timed out, fire another long poll
            }
        });
    }

    $(function(){
        listen();
    })

And here is the server code:

    std::set<booster::shared_ptr<cppcms::http::context> > waiters_;

    ...

    void remove_context(booster::shared_ptr<cppcms::http::context> context)
    {
        waiters_.erase(context);
    }

    void listen()
    {
        booster::shared_ptr<cppcms::http::context> context = release_context();
        waiters_.insert(context);
        context->async_on_peer_reset(
            std::bind(
                &chat::remove_context,
                this,
                context));
    }

After running the application for a few minutes, I run `sudo lsof -n | grep lighttpd` and the printout indicates that each request that has timed out has a corresponding socket that is hung in the CLOSE_WAIT state and remains that way until I bring the lighttpd server down and back up again.

Any insight you can offer regarding this issue would be greatly appreciated. Please let me know if I should provide any more information; I have the lighttpd and cppcms configuration files on hand, as well as the rest of the client and server code, I just didn't want to have too much noise in this post.
From: redred77 <red...@gm...> - 2016-01-15 00:01:06
Maybe you can reuse socket connections that are in the CLOSE_WAIT state by changing the Linux system configuration. I don't remember the actual command, but you will easily find it on Google.

I'm not sure whether this is the right solution, but I experienced a problem where the Linux system waited too long on CLOSE_WAIT, and that blocked new socket connections. I hope it helps.
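For context, CLOSE_WAIT is the TCP state a socket enters once the remote peer has closed its side while the local process still holds the descriptor open, so it normally clears when the owning process calls close() and not through a kernel timer. The sketch below assumes a POSIX system and uses hypothetical helper names (`peer_has_closed`, `close_if_peer_gone`); it only illustrates the general mechanism by which a process can notice that its peer has gone away, and is not code taken from lighttpd or CppCMS.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns true when the peer of a connected stream
// socket has shut down its side, detected with a non-blocking peek.
bool peer_has_closed(int fd) {
    assert(fd >= 0);
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0) return true;   // end-of-stream: the peer closed its side
    if (n > 0) return false;   // unread data is pending, nothing decided yet
    // n < 0: these errno values mean "still open, nothing to read right now";
    // any other error is treated as a dead connection.
    return !(errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR);
}

// Hypothetical helper: close our end once the peer is gone, which is what
// moves a socket out of CLOSE_WAIT. Returns true if the descriptor was closed.
bool close_if_peer_gone(int fd) {
    if (!peer_has_closed(fd)) return false;
    close(fd);
    return true;
}
```

A process that never performs a check like this on an idle connection, or never acts on it, keeps the descriptor open, which matches the symptom of sockets accumulating in CLOSE_WAIT.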
From: Cody S. <cod...@gu...> - 2016-01-15 15:57:47
>>Cody Sheffler <cody.sheffler@...> writes:
>>
>> Hi all,
>>
>> I'm just starting to get my hands dirty with cppcms and long-polling,
>> and I'm having some troubles. I've followed the example provided in the
>> "chat" application of the examples directory and made some tweaks (code
>> will be provided below).
>>
>> The problem arises in an ajax function called "listen()", which makes an
>> ajax GET request to the server and waits until the server has responded
>> or 10 seconds have passed, at which point it will fire up another "long-
>> polling" request. However, each of these requests remains open in the
>> case that they time out, and then the lighttpd server refuses any more
>> connections after too many requests have been issued. I'm not sure what
>> I'm doing wrong with my long-polling approach, but for some reason,
>> these sockets are not getting closed and I've been unable to determine
>> why after 2 days of investigating.
>>
>> Here is the client code (using jQuery):
>>
>> function listen(){
>>     $.ajax({
>>         url: "/chat/listen",
>>         timeout: 10000, // 10 seconds
>>         type: "GET",
>>         dataType: "json",
>>         success: function(r, st, xhr){
>>             console.log(r);
>>             listen();
>>         },
>>         error: function(xhr, st, e){
>>             if(st !== "timeout"){
>>                 return alert("error in listen(): " + st);
>>             }
>>             listen(); // If we timed out, fire another long poll
>>         }
>>     });
>> }
>>
>> $(function(){
>>     listen();
>> })
>>
>> And here is the server code:
>>
>> std::set<booster::shared_ptr<cppcms::http::context> > waiters_;
>>
>> ...
>>
>> void remove_context(booster::shared_ptr<cppcms::http::context> context)
>> {
>>     waiters_.erase(context);
>> }
>>
>> void listen()
>> {
>>     booster::shared_ptr<cppcms::http::context> context =
>>         release_context();
>>     waiters_.insert(context);
>>     context->async_on_peer_reset(
>>         std::bind(
>>             &chat::remove_context,
>>             this,
>>             context));
>> }
>>
>> After running the application for a few minutes, I run `sudo lsof -n |
>> grep lighttpd` and the printout indicates that each request that has
>> timed out has a corresponding socket that is hung in the CLOSE_WAIT
>> state and remains that way until I bring the lighttpd server down and
>> back up again.
>>
>> Any insight you can offer regarding this issue would be greatly
>> appreciated. Please let me know if I should provide any more
>> information; I have the lighttpd and cppcms configuration files on hand,
>> as well as the rest of the client and server code, I just didn't want to
>> have too much noise in this post.

redred77 <redred77@...> writes:
>
> Maybe you can reuse socket connections that are in the CLOSE_WAIT state
> by changing the Linux system configuration. I don't remember the actual
> command, but you will easily find it on Google.
>
> I'm not sure whether this is the right solution, but I experienced a
> problem where the Linux system waited too long on CLOSE_WAIT, and that
> blocked new socket connections. I hope it helps.

Thanks for the reply, redred!

I don't want to poke around in the Linux system config unless someone can tell me with confidence that that's where the issue lies. I feel like I should be able to handle this from either the client or the server application.

Is there some cleanup I'm forgetting to perform in the server or client code I've posted? Or is cppcms not built to support long-polling?

I apologize for my lack of expertise on this matter; I'm new to the nuances of the network stack.
From: Artyom B. <art...@gm...> - 2016-01-15 16:36:35
The problem is that detecting that the connection isn't alive isn't reliable, especially when working via a web server and FastCGI/SCGI.

You always need to have timeouts on connections, especially long-polling ones.

For example, if you know that the client resubmits the request every 10 seconds (I'd also suggest increasing that), put your own 10-second timeout on the server as well: you know that the client will disconnect, so why keep the connection alive?

Always manage timeouts on asynchronous requests.

Artyom

>
> The problem arises in an ajax function called "listen()", which makes an
> ajax GET request to the server and waits until the server has responded
> or 10 seconds have passed, at which point it will fire up another "long-
> polling" request. However, each of these requests remains open in the
> case that they time out, and then the lighttpd server refuses any more
> connections after too many requests have been issued. I'm not sure what
> I'm doing wrong with my long-polling approach, but for some reason,
> these sockets are not getting closed and I've been unable to determine
> why after 2 days of investigating.
>
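The advice to keep a server-side timeout on every long-polling request can be sketched as a small registry that stores a deadline next to each waiter. This is a standard-library-only illustration, not CppCMS code: `waiter_registry`, `waiter_id` and the `finish` callback are hypothetical names. In a real CppCMS application the waiter would presumably be the released `http::context`, `finish` would complete the response (for example via `async_complete_response()`), and `expire()` would be driven by a periodic timer on the service's event loop such as a `booster::aio::deadline_timer`; all of that wiring is an assumption here.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using waiter_id = int;                          // hypothetical handle for a pending request
using clock_type = std::chrono::steady_clock;

class waiter_registry {
public:
    explicit waiter_registry(std::chrono::seconds timeout) : timeout_(timeout) {
        assert(timeout.count() > 0);
    }

    // Register a waiter and the action that finishes its response.
    void add(waiter_id id, std::function<void()> finish, clock_type::time_point now) {
        waiters_[id] = entry{now + timeout_, std::move(finish)};
    }

    // Drop a waiter early (peer reset detected, or a real message was sent).
    // Calling it again for the same id is a harmless no-op.
    bool remove(waiter_id id) { return waiters_.erase(id) > 0; }

    // Finish and drop every waiter whose deadline has passed; returns how many.
    std::size_t expire(clock_type::time_point now) {
        std::vector<std::function<void()>> expired;
        for (auto it = waiters_.begin(); it != waiters_.end();) {
            if (it->second.deadline <= now) {
                expired.push_back(std::move(it->second.finish));
                it = waiters_.erase(it);
            } else {
                ++it;
            }
        }
        // Run the callbacks after the sweep so they may safely touch the registry.
        for (auto &finish : expired) {
            if (finish) finish();
        }
        return expired.size();
    }

    std::size_t size() const { return waiters_.size(); }

private:
    struct entry {
        clock_type::time_point deadline;
        std::function<void()> finish;
    };
    std::chrono::seconds timeout_;
    std::map<waiter_id, entry> waiters_;
};
```

With this shape, a waiter leaves the set either because a peer reset was reported or because its deadline passed, whichever happens first, so a missed disconnect notification no longer leaves the request open indefinitely.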
From: Cody S. <cod...@gu...> - 2016-01-15 16:54:57
Artyom Beilis <artyom.beilis@...> writes:
>
> The problem is that detecting that the connection isn't alive isn't
> reliable, especially when working via a web server and FastCGI/SCGI.
>
> You always need to have timeouts on connections, especially
> long-polling ones.
>
> For example, if you know that the client resubmits the request every
> 10 seconds (I'd also suggest increasing that), put your own 10-second
> timeout on the server as well: you know that the client will
> disconnect, so why keep the connection alive?
>
> Always manage timeouts on asynchronous requests.
>
> Artyom
>
> >
> > The problem arises in an ajax function called "listen()", which makes an
> > ajax GET request to the server and waits until the server has responded
> > or 10 seconds have passed, at which point it will fire up another "long-
> > polling" request. However, each of these requests remains open in the
> > case that they time out, and then the lighttpd server refuses any more
> > connections after too many requests have been issued. I'm not sure what
> > I'm doing wrong with my long-polling approach, but for some reason,
> > these sockets are not getting closed and I've been unable to determine
> > why after 2 days of investigating.
> >

Thanks, that seems to really address the heart of the issue.

Just for clarification: async_on_peer_reset isn't automagically tracking the client timeout and performing some cleanup? I think that was the fundamental assumption that I had that went wrong. I expected that in this code:

    void listen()
    {
        booster::shared_ptr<cppcms::http::context> context = release_context();
        waiters_.insert(context);
        context->async_on_peer_reset(
            std::bind(
                &chat::remove_context,
                this,
                context));
    }

the callback passed into async_on_peer_reset would get called on client timeout. The documentation here:

http://cppcms.com/cppcms_ref/latest/classcppcms_1_1http_1_1context.html#a0f6fe53deabd90fee0ced81a8df6404e

seems to indicate that that is a valid expectation.

P.S. Sorry for blowing up the thread above with repeat submissions; I thought gmane may have dropped my post. I'll be more patient this time ;)

Also, thanks for the framework and all your hard work; I'm having a great time jumping into web dev. and C++ with your tool!
From: Artyom B. <art...@gm...> - 2016-01-15 17:43:18
>
> Thanks, that seems to really address the heart of the issue.
>
> Just for clarification: async_on_peer_reset isn't automagically tracking
> the client timeout and performing some cleanup? I think that was the
> fundamental assumption that I had that went wrong. I expected that in
> this code:
>
> void listen()
> {
>     booster::shared_ptr<cppcms::http::context> context =
>         release_context();
>     waiters_.insert(context);
>     context->async_on_peer_reset(
>         std::bind(
>             &chat::remove_context,
>             this,
>             context));
> }
>
> the callback passed into async_on_peer_reset would get called on client
> timeout. The documentation here:
> http://cppcms.com/cppcms_ref/latest/classcppcms_1_1http_1_1context.html#a0f6fe53deabd90fee0ced81a8df6404e
> seems to indicate that that is a valid expectation.
>

Yes and no: if the client closes the connection, this is reported over TCP/IP to the server, so the server can detect it and fire async_on_peer_reset.

There are two problems:

(a) If the client does not cooperate and is simply gone, the SO_KEEPALIVE timeout settings of TCP/IP need to be quite tight to detect the issue.

(b) The web server has to report the disconnect to the FastCGI/SCGI application, and this is actually the major problem. Most of the servers I tested didn't do it reliably, or did it for FastCGI but not SCGI, or the other way around.

Bottom line: it shouldn't be considered reliable. It works well with a cooperative client and CppCMS's own web server, or with web servers that handle connections properly.

Artyom

P.S.: I was sure that I had published a blog article about this, but I was mistaken. So I probably need to write one on how to detect a client that is gone.
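Point (a) refers to TCP keepalive probing. As a rough illustration of what "tight" settings look like, the sketch below enables keepalive on a socket and shortens the probe schedule. It assumes Linux, where `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are available as per-socket options; `enable_tight_keepalive` is a hypothetical helper name. It only applies to a process that owns the client-facing socket, which in the setup discussed in this thread is the web server and not the FastCGI/SCGI application.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (Linux-specific options): turn on keepalive probes
// for a TCP socket and shorten the probe schedule.
//   idle_s     - seconds of silence before the first probe is sent
//   interval_s - seconds between unanswered probes
//   probes     - unanswered probes after which the connection is dropped
bool enable_tight_keepalive(int fd, int idle_s, int interval_s, int probes) {
    assert(idle_s > 0 && interval_s > 0 && probes > 0);
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0) return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof(idle_s)) != 0) return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof(interval_s)) != 0) return false;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) != 0) return false;
    return true;
}
```

With illustrative values of 10, 5 and 3, a silently vanished peer would be noticed after roughly 10 + 5 * 3 = 25 seconds of silence. Even then, problem (b) still applies: the web server has to pass that information on to the application.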
From: Cody S. <cod...@gu...> - 2016-01-15 17:55:36
Artyom Beilis <artyom.beilis@...> writes:
>
> Bottom line: it shouldn't be considered reliable. It works well with a
> cooperative client and CppCMS's own web server, or with web servers
> that handle connections properly.
>
> Artyom
>
> P.S.: I was sure that I had published a blog article about this, but
> I was mistaken. So I probably need to write one on how to detect a
> client that is gone.
>

Awesome, thanks again for the clarification! If you get that blog post written and you remember, you should follow up on this thread with a link; I would love to read it and find out more about the issue.

Cheers!