Re: [Quickfix-developers] i'm seeing deadlocks...
From: Mark T. K. <mke...@di...> - 2008-04-24 20:33:42
the transition from a waiting-to-be-sent message queue length of zero to a length of one triggers the writing of a notification byte to the Acceptor's select loop. this will eventually cause the Acceptor's select() call to return and setup a "can-do-a-write-without-blocking" callback so that the Acceptor thread can drain the message queue as TCP buffer space becomes available.

what is perhaps not obvious is that a transition from a queue length of one to a queue length of one can also happen during a send operation by the main thread. and this transition also writes a notification byte. these 1=>1 transitions can happen repeatedly, eventually filling the notification buffer causing the main thread to block (thus triggering deadlock).

this is because signal() writes its byte when the queue size == 1, not when there is a transition from an old size of 0 to a new size of 1.

consider the following sequence:

1) main thread queues a message to send (queue length goes from 0 to 1)
2) kernel buffer is full, so no attempt to send the message directly on the main thread is made to avoid blocking the main thread.
3) the main thread writes a signal byte to trigger the asynchronous delivery of the queued message later by the Acceptor thread.
4) main thread queues another message (queue length goes from 1 to 2).
5) kernel buffer space is now available (because the message receiver has read some messages), so the main thread can write without blocking. it dequeues message one (the old queued message) and sends all of it, leaving the new message (which was message 2) as the new message 1.
6) since the queue size is again just 1, a signal byte is written by the main thread.

when a large number of messages is sent from the main thread all at once, it is possible for this scenario to happen over and over again once a single message has been queued if the message receiver suddenly opens up a lot of buffer space.

/mark

or...@qu... wrote:
> Interesting.
> Didn't think about the signal socket blocking into a
> deadlock. I'll look into that.
>
> -------- Original Message --------
> Subject: [Quickfix-developers] i'm seeing deadlocks...
> From: "Mark T. Kennedy" <mke...@di...>
> Date: Wed, April 23, 2008 3:43 pm
> To: quickfix developers <qui...@li...urceforge.net>
>
> QuickFIX Documentation: http://www.quickfixengine.org/quickfix/doc/html/index.html
> QuickFIX Support: http://www.quickfixengine.org/services.html
>
> ------------------------------------------------------------------------
>
> ... while writing to the 'signal' socket (pipe) used to implement
> non-blocking sends.
>
> i have a test where i send 6,000+ orders in a batch and receive 6,000
> acks and 6,000 fills in response. in the middle of it, i shut down
> and restart a proxy that sits between the sender and the exchange
> simulator. every now and then, this triggers a deadlock in the
> exchange simulator (see the attached stack trace).
>
> since a write to the 'signal' socket can block, sendToTarget can
> still block, and that restores the oft-discussed deadlock scenario
> that the non-blocking send implementation sought to avoid.
>
> thoughts/comments? i'm using the trunk for my tests, not 12.4.
>
> /mark
>
>
> This communication and any attachments may contain
> confidential/proprietary information and is intended for information
> purposes only. It is not an invitation or offer to purchase
> interests from Diamondback. Any representation to the contrary is
> unintentional. This communication is intended only for the
> person(s) to whom it is addressed. If you are not the intended
> recipient you are hereby notified that you have received this
> document in error and that any review, dissemination, distribution,
> or copying of this message or any attachments is not permitted.
> If you have received this in error, please notify the sender
> immediately by e-mail and delete this message. All e-mails sent to
> or received from this address will be received by Diamondback's
> company e-mail system and is subject to archival and possible review
> by someone other than the recipient. This notice is automatically
> appended to each e-mail message leaving Diamondback.
> ------------------------------------------------------------------------
> Thread 3 (Thread 1084229952 (LWP 24498)):
> #0 0x0000003a3ffc5882 in __select_nocancel () from /lib64/libc.so.6
> #1 0x00002aaaaf78315f in FIX::SocketMonitor::block ()
> #2 0x00002aaaaf770f74 in FIX::SocketServer::block ()
> #3 0x00002aaaaf7d60ac in FIX::HttpServer::onStart ()
> #4 0x00002aaaaf7d612f in FIX::HttpServer::startThread ()
> #5 0x0000003a40e06337 in start_thread () from /lib64/libpthread.so.0
> #6 0x0000003a3ffcc38d in clone () from /lib64/libc.so.6
> #7 0x0000000000000000 in ?? ()
> Thread 2 (Thread 1094719808 (LWP 24499)):
> #0 0x0000003a40e0bb58 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
> #1 0x0000003a40e0839e in _L_mutex_lock_65 () from /lib64/libpthread.so.0
> #2 0x0000003a40e0813b in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00002aaaaf59aac2 in FIX::Mutex::lock ()
> #4 0x00002aaaaf59ab11 in FIX::Locker::Locker ()
> #5 0x00002aaaaf78678a in FIX::SocketConnection::processQueue ()
> #6 0x00002aaaaf77abca in FIX::SocketAcceptor::onWrite ()
> #7 0x00002aaaaf771406 in FIX::ServerWrapper::onWrite ()
> #8 0x00002aaaaf782b25 in FIX::SocketMonitor::processWriteSet ()
> #9 0x00002aaaaf7831c5 in FIX::SocketMonitor::block ()
> #10 0x00002aaaaf770f74 in FIX::SocketServer::block ()
> #11 0x00002aaaaf77b019 in FIX::SocketAcceptor::onStart ()
> #12 0x00002aaaaf773e1c in FIX::Acceptor::startThread ()
> #13 0x0000003a40e06337 in start_thread () from /lib64/libpthread.so.0
> #14 0x0000003a3ffcc38d in clone () from /lib64/libc.so.6
> #15 0x0000000000000000 in ?? ()
> Thread 1 (Thread 46912498585648 (LWP 24489)):
> #0 0x0000003a3ffcd021 in send () from /lib64/libc.so.6
> #1 0x00002aaaaf7d75c3 in FIX::socket_send ()
> #2 0x00002aaaaf782bbc in FIX::SocketMonitor::signal ()
> #3 0x00002aaaaf788bcf in FIX::SocketConnection::signal ()
> #4 0x00002aaaaf786a71 in FIX::SocketConnection::send ()
> #5 0x00002aaaaf743375 in FIX::Session::send ()
> #6 0x00002aaaaf744978 in FIX::Session::sendRaw ()
> #7 0x00002aaaaf74a7ce in FIX::Session::send ()
> #8 0x00002aaaaf74a91e in FIX::Session::sendToTarget ()
> #9 0x00002aaaaf598698 in quickfix_wrapper::send ()
> #10 0x00002aaaaf59885f in stp_quickfix_send ()
> #11 0x00002aaaaf488b39 in XS_STP__QuickFIX_stp_quickfix_send ()
> #12 0x00002aaaaab30f3a in Perl_pp_entersub ()
> #13 0x00002aaaaab2f6ea in Perl_runops_standard ()
> #14 0x00002aaaaaadfd5d in Perl_call_sv ()
> #15 0x00002aaaae2e8090 in pe_event_invoke ()
> #16 0x00002aaaae2e8210 in pe_empty_queue ()
> #17 0x00002aaaae2e8df8 in one_event ()
> #18 0x00002aaaae2e900d in XS_Event__loop ()
> #19 0x00002aaaaab30f3a in Perl_pp_entersub ()
> #20 0x00002aaaaab2f6ea in Perl_runops_standard ()
> #21 0x00002aaaaaae05ec in perl_run ()
> #22 0x000000000040165c in main ()
> ------------------------------------------------------------------------
> _______________________________________________
> Quickfix-developers mailing list
> Quickfix-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/quickfix-developers
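
The six-step sequence in the reply at the top of this thread can be sketched as a toy model that only counts notification bytes. This is not QuickFIX code: SendModel, SignalPolicy and burstSignalCount are hypothetical names made up for illustration, the Acceptor thread is assumed never to get a chance to drain the queue or the notification socket during the burst, and the OnEmptyToNonEmpty policy only illustrates the 0=>1 distinction drawn in the message; it is not a confirmed fix.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical toy model (not QuickFIX code): it counts how many
// notification bytes the main thread would write under two policies.
enum class SignalPolicy {
    WhenSizeIsOne,      // signal whenever the queue size is 1 after a send
    OnEmptyToNonEmpty   // signal only on a real 0 => 1 transition
};

struct SendModel {
    SignalPolicy policy;
    std::deque<std::string> queue;  // messages waiting to be sent
    std::size_t signalBytes;        // bytes written to the notification socket

    explicit SendModel(SignalPolicy p) : policy(p), signalBytes(0) {}

    // Models one send from the main thread. canWrite says whether the
    // kernel buffer currently has room for the oldest queued message.
    void send(const std::string& msg, bool canWrite) {
        const std::size_t oldSize = queue.size();
        queue.push_back(msg);
        if (canWrite) {
            assert(!queue.empty());
            queue.pop_front();      // oldest message goes out directly
        }
        const std::size_t newSize = queue.size();

        const bool shouldSignal =
            (policy == SignalPolicy::WhenSizeIsOne)
                ? (newSize == 1)
                : (oldSize == 0 && newSize == 1);
        if (shouldSignal) {
            ++signalBytes;
        }
    }
};

// Steps 1-3: one message is queued while the kernel buffer is full.
// Steps 4-6, repeated `burst` times: buffer space is available again.
std::size_t burstSignalCount(SignalPolicy policy, std::size_t burst) {
    SendModel model(policy);
    model.send("first", false);
    for (std::size_t i = 0; i < burst; ++i) {
        model.send("next", true);
    }
    return model.signalBytes;
}
```

In this model, burstSignalCount(SignalPolicy::WhenSizeIsOne, 6000) returns 6001, one byte per send, while burstSignalCount(SignalPolicy::OnEmptyToNonEmpty, 6000) returns 1. Whether the real notification socket actually fills depends on its buffer size and on how quickly the Acceptor thread reads from it, which the model does not cover.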