Thread: [asio-users] deadlock when handler destructor causes close()
From: Arvid N. <c9...@cs...> - 2006-01-24 13:45:20
I'm now experiencing a deadlock with the CVS version of asio (from a few days ago). It occurs when I have one connection receiving data continuously and another connection that is closed. The connection being closed causes the deadlock because the last shared_ptr to the connection object (which in turn holds the tcp::socket) is stored in the handler. When the handler is destroyed, the socket is destroyed and closed, and close will lock the demuxer service, and deadlock.

I've made sure that none of the other threads can be waiting for a lock inside a callback, which might prevent asio from acquiring its own locks. The main thread is blocking in a select() call on stdin, waiting for user input. The third thread is a worker thread that is waiting for jobs (blocking on its own mutex). The fourth thread is asio's hostname lookup thread, and I assume it's not interfering with the lock used by the second thread (the demuxer's thread); it's blocking in the kqueue call anyway.

Unfortunately I've not been able to reproduce this in a smaller test. I've put the call stacks for threads 2 and 4 (the asio threads) here: http://www.cs.umu.se/~c99ang/deadlock_stack.txt

Any help/suggestions would be appreciated. Thanks.

--
Arvid Norberg
From: Christopher K. <ch...@ko...> - 2006-01-25 06:54:27
Hi Arvid,

--- Arvid Norberg <c9...@cs...> wrote:
> I'm now experiencing a deadlock with the cvs version of asio
> (from a few days ago). It occurs when I have one connection
> receiving data continously and another connection that is
> closed. The connection that is being closed will cause the
> deadlock because the last shared_ptr to the connection object
> (which in turn is holding the tcp::socket) is stored in the
> handler. When the handler is destructed the socket will be
> destructed and closed, and close will lock the demuxer
> service, and deadlock.

I think I know what the problem is, and I have just committed a change to CVS to try to fix it. Can you please give it a go and see what happens? It should show up in the public SourceForge CVS in about 6 hours.

I'm planning to rewrite the reactor stuff soon anyway to eliminate memory allocations, reduce locking time, and generally improve performance. The kqueue_reactor will be the first to be rewritten, as I'm using Mac OS X as my main dev platform.

Cheers,
Chris
From: Arvid N. <c9...@cs...> - 2006-01-25 07:02:12
On Jan 25, 2006, at 07:54, Christopher Kohlhoff wrote:
> Hi Arvid,
>
> --- Arvid Norberg <c9...@cs...> wrote:
>> I'm now experiencing a deadlock with the cvs version of asio
>> (from a few days ago). It occurs when I have one connection
>> receiving data continously and another connection that is
>> closed. The connection that is being closed will cause the
>> deadlock because the last shared_ptr to the connection object
>> (which in turn is holding the tcp::socket) is stored in the
>> handler. When the handler is destructed the socket will be
>> destructed and closed, and close will lock the demuxer
>> service, and deadlock.
>
> I think I know what the problem is, and I have just committed a
> change to CVS to try to fix it. Can you please give it a go and
> see what happens. It should show up in the public sourceforge
> CVS in about 6 hours.
>
> I'm planning to rewrite the reactor stuff soon anyway to
> eliminate memory allocations, reduce locking time, and generally
> improve performance. The kqueue_reactor will be the first to be
> rewritten, as I'm using Mac OS X as my main dev platform.

Ok, great! I will try this as soon as possible. I'll be out of town for the next 3 weeks, so no sooner than that.

--
Arvid Norberg
From: Arvid N. <c9...@cs...> - 2006-02-18 18:08:53
On Jan 25, 2006, at 07:54, Christopher Kohlhoff wrote:
> Hi Arvid,
>
> --- Arvid Norberg <c9...@cs...> wrote:
>> I'm now experiencing a deadlock with the cvs version of asio
>> (from a few days ago). It occurs when I have one connection
>> receiving data continously and another connection that is
>> closed. The connection that is being closed will cause the
>> deadlock because the last shared_ptr to the connection object
>> (which in turn is holding the tcp::socket) is stored in the
>> handler. When the handler is destructed the socket will be
>> destructed and closed, and close will lock the demuxer
>> service, and deadlock.
>
> I think I know what the problem is, and I have just committed a
> change to CVS to try to fix it. Can you please give it a go and
> see what happens. It should show up in the public sourceforge
> CVS in about 6 hours.
>
> I'm planning to rewrite the reactor stuff soon anyway to
> eliminate memory allocations, reduce locking time, and generally
> improve performance. The kqueue_reactor will be the first to be
> rewritten, as I'm using Mac OS X as my main dev platform.

I don't experience any problems anymore. Nice work.

--
Arvid Norberg
From: Arvid N. <c9...@cs...> - 2006-02-20 01:57:23
On Feb 18, 2006, at 19:08, Arvid Norberg wrote:
> I don't experience any problems anymore. Nice work.

I was a little too quick with that conclusion.

The deadlock is gone, but instead async_by_name (formerly known as async_get_host_by_name) always results in an "operation aborted". The synchronous version works, though.

This worked fine in the CVS version from about 3 weeks ago, but not in today's version. (This is still Mac OS X.)

Thanks.

--
Arvid Norberg
From: Christopher K. <ch...@ko...> - 2006-02-20 03:50:46
Hi Arvid,

--- Arvid Norberg <c9...@cs...> wrote:
> On Feb 18, 2006, at 19:08, Arvid Norberg wrote:
> > I don't experience any problems anymore. Nice work.
>
> I was a little bit too quick on that conclusion.
>
> The deadlock is gone. But instead the async_by_name (formerly known
> as: async_get_host_by_name)

Expect even more changes when I finish adding IPv6 support ;)

> always results in an "operation aborted".
> The synchronous version works though.
>
> This worked fine in the cvs version from about 3 weeks ago, but not
> in today's version. (This is still MacOS X)

I see what the problem is - my fault for not testing some changes I did thoroughly :) Sorry.

Please try the following change to asio/ipv4/detail/host_resolver_service.hpp:

--- host_resolver_service.hpp	2 Feb 2006 07:02:03 -0000	1.19
+++ host_resolver_service.hpp	20 Feb 2006 03:48:56 -0000
@@ -71,6 +71,7 @@
   // as a cancellation token to indicate to the background thread that the
   // operation has been cancelled.
   typedef boost::shared_ptr<void> implementation_type;
+  struct noop_deleter { void operator()(void*) {} };

   // Constructor.
   host_resolver_service(IO_Service& d)
@@ -105,6 +106,7 @@
   // Construct a new host resolver implementation.
   void construct(implementation_type& impl)
   {
+    impl.reset(static_cast<void*>(0), noop_deleter());
   }

   // Destroy a host resolver implementation.
@@ -115,7 +117,7 @@
   /// Cancel pending asynchronous operations.
   void cancel(implementation_type& impl)
   {
-    impl.reset(0);
+    impl.reset(static_cast<void*>(0), noop_deleter());
   }

Cheers,
Chris
From: Arvid N. <c9...@cs...> - 2006-02-20 10:31:24
On Feb 20, 2006, at 04:50, Christopher Kohlhoff wrote:
> [...]
> I see what the problem is - my fault for not testing some changes I did
> thoroughly :) Sorry.
>
> Please try the following change to
> asio/ipv4/detail/host_resolver_service.hpp:
> [...]

I presume these are the changes made in CVS as well, and with those it works perfectly. Thanks!

Regards,
--
Arvid Norberg