Thread: Re: [Quickfix-developers] i'm seeing deadlocks...
From: <or...@qu...> - 2008-04-23 21:29:37
<html><body>Interesting. I didn't think about the signal socket blocking and causing a deadlock. I'll look into that.<BR><BR> <BLOCKQUOTE style="PADDING-LEFT: 8px; MARGIN-LEFT: 8px; BORDER-LEFT: blue 2px solid" webmail="1">-------- Original Message --------<BR>Subject: [Quickfix-developers] i'm seeing deadlocks...<BR>From: "Mark T. Kennedy" &lt;mke...@di...&gt;<BR>Date: Wed, April 23, 2008 3:43 pm<BR>To: quickfix developers &lt;<a href="mailto:qui...@li...urceforge">qui...@li...urceforge</a>.net&gt;<BR><BR>QuickFIX Documentation: <A href="http://www.quickfixengine.org/quickfix/doc/html/index.html" target=_blank>http://www.quickfixengine.org/quickfix/doc/html/index.html</A><BR>QuickFIX Support: <A href="http://www.quickfixengine.org/services.html" target=_blank>http://www.quickfixengine.org/services.html</A><BR><BR> <HR> <BR>... while writing to the 'signal' socket (pipe) used to implement<BR>non-blocking sends.<BR><BR>i have a test where i send 6,000+ orders in a batch and receive 6,000<BR>acks and 6,000 fills in response. in the middle of it, i shut down<BR>and restart a proxy that sits between the sender and the exchange<BR>simulator. every now and then, this triggers a deadlock in the<BR>exchange simulator (see the attached stack trace).<BR><BR>since a write to the 'signal' socket can block, sendToTarget can<BR>still block, and that restores the oft-discussed deadlock scenario<BR>that the non-blocking send implementation sought to avoid.<BR><BR>thoughts/comments? i'm using the trunk for my tests, not 12.4.<BR><BR>/mark<BR><BR><BR>This communication and any attachments may contain confidential/proprietary information and is intended for information purposes only. It is not an invitation or offer to purchase interests from Diamondback. Any representation to the contrary is unintentional. 
This communication is intended only for the person(s) to whom it is addressed. If you are not the intended recipient you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message or any attachments is not permitted. If you have received this in error, please notify the sender immediately by e-mail and delete this message. All e-mails sent to or received from this address will be received by Diamondback's company e-mail system and is subject to archival and possible review by someone other than the recipient. This notice is automatically appended to each e-mail message leaving Diamondback. <HR> Thread 3 (Thread 1084229952 (LWP 24498)):<BR>#0 0x0000003a3ffc5882 in __select_nocancel () from /lib64/libc.so.6<BR>#1 0x00002aaaaf78315f in FIX::SocketMonitor::block ()<BR>#2 0x00002aaaaf770f74 in FIX::SocketServer::block ()<BR>#3 0x00002aaaaf7d60ac in FIX::HttpServer::onStart ()<BR>#4 0x00002aaaaf7d612f in FIX::HttpServer::startThread ()<BR>#5 0x0000003a40e06337 in start_thread () from /lib64/libpthread.so.0<BR>#6 0x0000003a3ffcc38d in clone () from /lib64/libc.so.6<BR>#7 0x0000000000000000 in ?? 
()<BR>Thread 2 (Thread 1094719808 (LWP 24499)):<BR>#0 0x0000003a40e0bb58 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0<BR>#1 0x0000003a40e0839e in _L_mutex_lock_65 () from /lib64/libpthread.so.0<BR>#2 0x0000003a40e0813b in pthread_mutex_lock () from /lib64/libpthread.so.0<BR>#3 0x00002aaaaf59aac2 in FIX::Mutex::lock ()<BR>#4 0x00002aaaaf59ab11 in FIX::Locker::Locker ()<BR>#5 0x00002aaaaf78678a in FIX::SocketConnection::processQueue ()<BR>#6 0x00002aaaaf77abca in FIX::SocketAcceptor::onWrite ()<BR>#7 0x00002aaaaf771406 in FIX::ServerWrapper::onWrite ()<BR>#8 0x00002aaaaf782b25 in FIX::SocketMonitor::processWriteSet ()<BR>#9 0x00002aaaaf7831c5 in FIX::SocketMonitor::block ()<BR>#10 0x00002aaaaf770f74 in FIX::SocketServer::block ()<BR>#11 0x00002aaaaf77b019 in FIX::SocketAcceptor::onStart ()<BR>#12 0x00002aaaaf773e1c in FIX::Acceptor::startThread ()<BR>#13 0x0000003a40e06337 in start_thread () from /lib64/libpthread.so.0<BR>#14 0x0000003a3ffcc38d in clone () from /lib64/libc.so.6<BR>#15 0x0000000000000000 in ?? 
()<BR>Thread 1 (Thread 46912498585648 (LWP 24489)):<BR>#0 0x0000003a3ffcd021 in send () from /lib64/libc.so.6<BR>#1 0x00002aaaaf7d75c3 in FIX::socket_send ()<BR>#2 0x00002aaaaf782bbc in FIX::SocketMonitor::signal ()<BR>#3 0x00002aaaaf788bcf in FIX::SocketConnection::signal ()<BR>#4 0x00002aaaaf786a71 in FIX::SocketConnection::send ()<BR>#5 0x00002aaaaf743375 in FIX::Session::send ()<BR>#6 0x00002aaaaf744978 in FIX::Session::sendRaw ()<BR>#7 0x00002aaaaf74a7ce in FIX::Session::send ()<BR>#8 0x00002aaaaf74a91e in FIX::Session::sendToTarget ()<BR>#9 0x00002aaaaf598698 in quickfix_wrapper::send ()<BR>#10 0x00002aaaaf59885f in stp_quickfix_send ()<BR>#11 0x00002aaaaf488b39 in XS_STP__QuickFIX_stp_quickfix_send ()<BR>#12 0x00002aaaaab30f3a in Perl_pp_entersub ()<BR>#13 0x00002aaaaab2f6ea in Perl_runops_standard ()<BR>#14 0x00002aaaaaadfd5d in Perl_call_sv ()<BR>#15 0x00002aaaae2e8090 in pe_event_invoke ()<BR>#16 0x00002aaaae2e8210 in pe_empty_queue ()<BR>#17 0x00002aaaae2e8df8 in one_event ()<BR>#18 0x00002aaaae2e900d in XS_Event__loop ()<BR>#19 0x00002aaaaab30f3a in Perl_pp_entersub ()<BR>#20 0x00002aaaaab2f6ea in Perl_runops_standard ()<BR>#21 0x00002aaaaaae05ec in perl_run ()<BR>#22 0x000000000040165c in main ()<BR> <HR> _______________________________________________<BR>Quickfix-developers mailing list<BR>Quickfix-developers@lists.sourceforge.net<BR><A href="https://lists.sourceforge.net/lists/listinfo/quickfix-developers" target=_blank>https://lists.sourceforge.net/lists/listinfo/quickfix-developers</A> </BLOCKQUOTE></body></html>
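Mark's point that a write to the 'signal' socket can block follows from that socket having a finite kernel buffer: if nobody reads the signal bytes, the writer eventually stalls. The sketch below is a minimal POSIX illustration of that effect, not QuickFIX code. It assumes the signal channel behaves like a Unix-domain `socketpair()` (the real QuickFIX helper may create it differently), and `fill_signal_buffer` is a hypothetical helper. `O_NONBLOCK` is set only so the demo stops with `EAGAIN` at the point where a blocking `send()` would have hung.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of QuickFIX): write one-byte "signals" into a
// Unix-domain socketpair whose reader never drains it, and count how many fit.
// Returns -1 if the socketpair cannot be set up.
long fill_signal_buffer(int* last_errno) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;

  // O_NONBLOCK is set only so the demo can stop when the buffer is full;
  // on a blocking descriptor the next send() would stall the caller here.
  int flags = fcntl(fds[0], F_GETFL, 0);
  if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  long written = 0;
  const char signal_byte = 0;
  for (;;) {
    ssize_t n = send(fds[0], &signal_byte, 1, 0);
    if (n == 1) {
      ++written;  // the reading end (fds[1]) is never drained
      continue;
    }
    *last_errno = errno;  // EAGAIN/EWOULDBLOCK once the buffer is full
    break;
  }

  close(fds[0]);
  close(fds[1]);
  return written;
}
```

Calling `fill_signal_buffer(&err)` should return a positive count with `err` set to `EAGAIN`/`EWOULDBLOCK`; the exact count depends on the platform's socket buffer settings, so it is not a fixed number.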
From: Mark T. K. <mke...@di...> - 2008-04-24 20:33:42
the transition from a waiting-to-be-sent message queue length of zero
to a length of one triggers the writing of a notification byte to the
Acceptor's select loop. this eventually causes the Acceptor's select()
call to return and set up a "can-do-a-write-without-blocking" callback
so that the Acceptor thread can drain the message queue as TCP buffer
space becomes available.

what is perhaps not obvious is that a transition from a queue length
of one to a queue length of one can also happen during a send
operation by the main thread, and this transition also writes a
notification byte. these 1=>1 transitions can happen repeatedly,
eventually filling the notification socket's buffer and causing the
main thread to block. that is the deadlock: the main thread is stuck
in signal() while still holding the connection mutex, and the Acceptor
thread, whose select loop reads the notification socket, is itself
waiting for that same mutex in processQueue() (thread 2 in the stack
trace).

this is because signal() writes its byte whenever the queue size == 1,
not only when there is a transition from an old size of 0 to a new
size of 1.

consider the following sequence:

1) the main thread queues a message to send (queue length goes from
   0 to 1).

2) the kernel buffer is full, so the main thread makes no attempt to
   send the message directly, to avoid blocking.

3) the main thread writes a signal byte to trigger later asynchronous
   delivery of the queued message by the Acceptor thread.

4) the main thread queues another message (queue length goes from
   1 to 2).

5) kernel buffer space is now available (because the message receiver
   has read some messages), so the main thread can write without
   blocking. it dequeues message one (the old queued message) and
   sends all of it, leaving the new message (which was message 2) as
   the new message 1.

6) since the queue size is again just 1, the main thread writes
   another signal byte.

when a large number of messages is sent from the main thread all at
once, this scenario can repeat over and over once a single message has
been queued, if the message receiver suddenly opens up a lot of buffer
space.

/mark
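Steps 1-6 above can be replayed in a small toy model to count how many signal bytes pile up. This is a hypothetical sketch, not QuickFIX's `SocketConnection`: `OriginalSendPath` and `signal_bytes_for_burst` are made-up names, kernel buffer space is modelled as a count of whole messages that can be written without blocking, and the Acceptor thread is assumed never to run (as when it is stuck on the mutex), so no signal byte is ever drained. The signalling rule is the "signal when the queue size == 1" behaviour described in the message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Toy model of the ORIGINAL send path as described above (not QuickFIX code).
// Kernel buffer space is a count of whole messages that can be written without
// blocking; the Acceptor thread never runs, so signal bytes are never drained.
struct OriginalSendPath {
  std::deque<std::string> queue;
  int kernel_slots = 0;   // messages the TCP socket could accept right now
  int signal_bytes = 0;   // bytes written to the signal socket so far

  // Stand-in for processQueue(): write queued messages while there is room.
  void process_queue() {
    while (!queue.empty() && kernel_slots > 0) {
      queue.pop_front();
      --kernel_slots;
    }
  }

  // Queue, try to write, then signal whenever exactly one message is left.
  void send(const std::string& msg) {
    queue.push_back(msg);
    process_queue();
    if (queue.size() == 1) ++signal_bytes;
  }
};

// Replays steps 1-6 for a burst of `messages` sends.
int signal_bytes_for_burst(int messages) {
  OriginalSendPath path;
  path.send("order 0");  // steps 1-3: queue 0 -> 1, nothing written, signal
  for (int i = 1; i < messages; ++i) {
    path.kernel_slots = 1;  // the receiver frees room for one message
    // steps 4-6: queue 1 -> 2, message one is written, the queue is back
    // to 1, and another signal byte is written
    path.send("order " + std::to_string(i));
  }
  return path.signal_bytes;
}
```

In this model a burst of 6,000 sends writes 6,000 signal bytes, one per send. Whether that many bytes actually fills the real signal socket depends on its buffer size, which the model does not try to capture.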
From: Mark T. K. <mke...@di...> - 2008-04-24 21:28:48
this patch seems to work around the problem and passes all of the unit and app tests:

Index: SocketConnection.cpp
===================================================================
--- SocketConnection.cpp (revision 1944)
+++ SocketConnection.cpp (working copy)
@@ -65,9 +65,11 @@
 
   Locker l( m_mutex );
 
+  int old_queue_size = m_sendQueue.size();
   m_sendQueue.push_back( msg );
   processQueue();
-  signal();
+  if ( old_queue_size == 0 && m_sendQueue.size() == 1 )
+    signal();
   return true;
 
   QF_STACK_POP

/mark
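For comparison, here is the patched send path restated as a self-contained class. It is a sketch under the same toy assumptions as before, not the real `SocketConnection`: `std::lock_guard` stands in for QuickFIX's `Locker`, the slot counter, `drainQueue()` and the byte-counting `signal()` are hypothetical, and the old size is held in a `std::size_t` rather than the patch's `int` to avoid a signed/unsigned comparison. Only an empty-to-non-empty transition writes a byte, so a long burst produces a single signal, and a new one is written after the queue has been drained and refilled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

// Sketch of the patched send() logic (not the real SocketConnection class).
// Kernel space is modelled as a slot count; signal() only counts bytes.
class PatchedSendPath {
 public:
  bool send(const std::string& msg) {
    std::lock_guard<std::mutex> lock(mutex_);  // plays the role of Locker l( m_mutex )
    const std::size_t old_queue_size = queue_.size();
    queue_.push_back(msg);
    processQueue();
    // Signal only on an empty -> non-empty transition, as in the patch.
    if (old_queue_size == 0 && queue_.size() == 1)
      signal();
    return true;
  }

  // Hypothetical hooks for driving the toy scenario.
  void setKernelSlots(int slots) {
    std::lock_guard<std::mutex> lock(mutex_);
    kernel_slots_ = slots;
  }
  void drainQueue() {  // as if the Acceptor thread had flushed the queue
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.clear();
  }
  int signalBytes() {
    std::lock_guard<std::mutex> lock(mutex_);
    return signal_bytes_;
  }

 private:
  // Stand-in for processQueue(): write while the kernel has room.
  void processQueue() {
    while (!queue_.empty() && kernel_slots_ > 0) {
      queue_.pop_front();
      --kernel_slots_;
    }
  }
  void signal() { ++signal_bytes_; }

  std::mutex mutex_;
  std::deque<std::string> queue_;
  int kernel_slots_ = 0;
  int signal_bytes_ = 0;
};
```

Driving it with the same burst as steps 1-6 (one queued message, then one free slot before each later send) leaves `signalBytes()` at 1; after `drainQueue()` the next send signals again.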
From: <or...@qu...> - 2008-04-24 22:36:08
<html><body><div>I tried just making the signal and interrupt sockets non blocking. Can you see if this also works for you?</div> <div> </div> <div>--- SocketMonitor.cpp (revision 1956)<BR>+++ SocketMonitor.cpp (working copy)<BR>@@ -41,6 +41,8 @@<BR> std::pair<int, int> sockets = socket_createpair();<BR> m_signal = sockets.first;<BR> m_interrupt = sockets.second;<BR>+ socket_setnonblock( m_signal );<BR>+ socket_setnonblock( m_interrupt );<BR> m_readSockets.insert( m_interrupt );</div> <div> m_timeval.tv_sec = 0;<BR></div> <BLOCKQUOTE style="PADDING-LEFT: 8px; MARGIN-LEFT: 8px; BORDER-LEFT: blue 2px solid" webmail="1">-------- Original Message --------<BR>Subject: Re: [Quickfix-developers] i'm seeing deadlocks...<BR>From: "Mark T. Kennedy" <mke...@di...><BR>Date: Thu, April 24, 2008 4:25 pm<BR>To: quickfix developers <<a href="mailto:qui...@li...urceforge">qui...@li...urceforge</a>.net><BR><BR>QuickFIX Documentation: <A href="http://www.quickfixengine.org/quickfix/doc/html/index.html" target=_blank><a href="http://www.quickfixengine.org/quickfix/doc/html/index.html">http://www.quickfixengine.org/quickfix/doc/html/index.html</a></A><BR>QuickFIX Support: <A href="http://www.quickfixengine.org/services.html" target=_blank><a href="http://www.quickfixengine.org/services.html">http://www.quickfixengine.org/services.html</a></A><BR><BR><BR>this patch seems to work around the problem and passes all of the unit and app tests:<BR><BR>Index: SocketConnection.cpp<BR>===================================================================<BR>--- SocketConnection.cpp (revision 1944)<BR>+++ SocketConnection.cpp (working copy)<BR>@@ -65,9 +65,11 @@<BR><BR>Locker l( m_mutex );<BR><BR>+ int old_queue_size = m_sendQueue.size();<BR>m_sendQueue.push_back( msg );<BR>processQueue();<BR>- signal();<BR>+ if ( old_queue_size == 0 && m_sendQueue.size() == 1 )<BR>+ signal();<BR>return true;<BR><BR>QF_STACK_POP<BR><BR>/mark<BR><BR>Mark T. 
Kennedy wrote:<BR>> QuickFIX Documentation: <A href="http://www.quickfixengine.org/quickfix/doc/html/index.html" target=_blank><a href="http://www.quickfixengine.org/quickfix/doc/html/index.html">http://www.quickfixengine.org/quickfix/doc/html/index.html</a></A><BR>> QuickFIX Support: <A href="http://www.quickfixengine.org/services.html" target=_blank><a href="http://www.quickfixengine.org/services.html">http://www.quickfixengine.org/services.html</a></A><BR>> <BR>> <BR>> the transition from a waiting-to-be-sent message queue length of zero<BR>> to a length of one triggers the writing of a notification byte to the<BR>> Acceptor's select loop. this will eventually cause the Acceptor's<BR>> select() call to return and setup a "can-do-a-write-without-blocking"<BR>> callback so that the Acceptor thread can drain the message queue as<BR>> TCP buffer space becomes available.<BR>> <BR>> what is perhaps not obvious is that a transition from a queue length<BR>> of one to a queue length of one can also happen during a send<BR>> operation by the main thread. and this transition also writes a<BR>> notification byte. 
these 1=>1 transitions can happen repeatedly,<BR>> eventually filling the notification buffer and causing the main thread<BR>> to block (thus triggering deadlock).<BR>> <BR>> this is because signal() writes its byte when the queue size == 1, not<BR>> when there is a transition from an old size of 0 to a new size of 1.<BR>> <BR>> consider the following sequence:<BR>> <BR>> 1) main thread queues a message to send (queue length goes from 0 to 1)<BR>> <BR>> 2) kernel buffer is full, so no attempt to send the message<BR>> directly on the main thread is made to avoid blocking the main<BR>> thread.<BR>> <BR>> 3) the main thread writes a signal byte to trigger the asynchronous<BR>> delivery of the queued message later by the Acceptor thread.<BR>> <BR>> 4) main thread queues another message (queue length goes from 1 to 2).<BR>> <BR>> 5) kernel buffer space is now available (because the message receiver<BR>> has read some messages), so the main thread can write without<BR>> blocking. it dequeues message one (the old queued message) and<BR>> sends all of it, leaving the new message (which was message 2) as<BR>> the new message 1.<BR>> <BR>> 6) since the queue size is again just 1, a signal byte is<BR>> written by the main thread.<BR>> <BR>> when a large number of messages is sent from the main thread all at<BR>> once, it is possible for this scenario to happen over and over again<BR>> once a single message has been queued if the message receiver suddenly<BR>> opens up a lot of buffer space.<BR>> <BR>> /mark<BR></BLOCKQUOTE></body></html> |
From: Mark T. K. <mke...@di...> - 2008-04-25 12:13:28
|
it wouldn't have any effect on the read side of the pipe (the
m_interrupt socket) since m_interrupt is only consulted after select
has said there is data to be read. on the write side (the side that
can block), it isn't necessary if you go with the patch i sent
earlier. not sure what would happen if you did turn it on since there
is no message queueing scaffolding for the signals and you could
potentially lose one if it simply ignored the EWOULDBLOCK.

the underlying problem is that the signal should be sent *only* when
the queue length transitions from zero to one and the current
implementation can send it for transitions from a length of one to
one.

i've deployed the patch i sent and tested it a dozen times using my
6,000+ order test with no failures. my patch also passes all of the
unit and app tests that are distributed with the source.

/mark

or...@qu... wrote:
> I tried just making the signal and interrupt sockets non blocking. Can
> you see if this also works for you?
>
> --- SocketMonitor.cpp (revision 1956)
> +++ SocketMonitor.cpp (working copy)
> @@ -41,6 +41,8 @@
> std::pair<int, int> sockets = socket_createpair();
> m_signal = sockets.first;
> m_interrupt = sockets.second;
> + socket_setnonblock( m_signal );
> + socket_setnonblock( m_interrupt );
> m_readSockets.insert( m_interrupt );
> m_timeval.tv_sec = 0;
>
> -------- Original Message --------
> Subject: Re: [Quickfix-developers] i'm seeing deadlocks...
> From: "Mark T. Kennedy" <mke...@di...>
> Date: Thu, April 24, 2008 4:25 pm
> To: quickfix developers <qui...@li...urceforge.net>
>
> this patch seems to work around the problem and passes all of the
> unit and app tests:
>
> Index: SocketConnection.cpp
> ===================================================================
> --- SocketConnection.cpp (revision 1944)
> +++ SocketConnection.cpp (working copy)
> @@ -65,9 +65,11 @@
>
> Locker l( m_mutex );
>
> + int old_queue_size = m_sendQueue.size();
> m_sendQueue.push_back( msg );
> processQueue();
> - signal();
> + if ( old_queue_size == 0 && m_sendQueue.size() == 1 )
> + signal();
> return true;
>
> QF_STACK_POP
>
> /mark
|