Thread: [Quickfix-developers] possible deadlocking/freeze with routing implementation ?
From: quickfixer <li...@ch...> - 2008-05-06 23:32:17
|
Hi,

I've been using quickfix for about three months in a routing kind of situation (similar to this poster: http://www.nabble.com/ThreadedSocketAcceptor---Message-resend-tc17067577.html) where I have both an initiator and an acceptor. I am using ThreadedSocketAcceptor, ThreadedSocketInitiator, and SynchronizedApplication. There are n sessions coming in through the acceptor and 1 going out via the initiator, and it is a comparatively low message throughput application.

Today we hit a situation where quickfix tried to send a message to its outgoing connection while processing a message received from one of the incoming connections, and somewhere in Session::sendToTarget my quickfix server got "stuck". That is, I know I called ::sendToTarget, but I never got to the part where the message was actually sent: the offending message never appeared in the file store (I am using the FileStoreFactory). I am inclined to believe this is a deadlock, but I am not sure which resource I am deadlocking on or which two (or more!) threads are contending for it.

During this period (it was about 5 minutes before someone noticed; as I said, "low volume"!), several clients timed out because they did not receive heartbeat responses, and began the reconnect process. For each of these clients I see the incoming heartbeat in that client's individual message store, and then later the following in the global log:

"Accepted connection from x.x.x.x on port yyyy"

but there is no corresponding response from my server to either the heartbeat or the new connection, presumably because those messages are queued up waiting for the thread to finish processing the original sendToTarget, which is "stuck". I cannot, however, figure out why the original sendToTarget is "stuck" or, if it is deadlocked, what it is waiting on.

The quickfix server was bounced, everybody recovered, and the offending message was resent, so everything was fine in the end. Still, I was wondering if someone could help me figure out what I am doing in the code that caused this issue. Any thoughts or pointers as to where to look would be _much_ appreciated.

Regards,
Liz

--
View this message in context: http://www.nabble.com/possible-deadlocking-freeze-with-routing-implementation---tp17093661p17093661.html
|
From: Mark T. K. <mke...@di...> - 2008-05-07 15:18:09
|
A per-thread stack trace of the deadlocked instance would be interesting to see.

/mark
|
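For readers following the thread: nothing above establishes which locks were involved, or even that a lock-order problem was the cause. As a purely illustrative sketch of how a hang like the one described can arise when two threads take the same two locks in opposite order, the C++ below uses two hypothetical `std::timed_mutex` objects. The names `Locks`, `app`, `session`, `holdThenTry`, `runInvertedOrder`, and `runConsistentOrder` are invented for this example and are not part of QuickFIX. The second lock is attempted with a timeout so the demonstration terminates instead of freezing.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for two locks a router might take.
// These are assumptions for illustration, NOT QuickFIX internals.
struct Locks {
    std::timed_mutex app;      // assumed: serializes application callbacks
    std::timed_mutex session;  // assumed: guards the outgoing session
};

// Holds `first`, waits until both threads hold their first lock, then
// tries `second` with a timeout instead of blocking forever.
// Returns true only if this thread ended up holding both locks.
bool holdThenTry(std::timed_mutex& first, std::timed_mutex& second,
                 std::atomic<int>& holding, std::atomic<int>& finished) {
    std::lock_guard<std::timed_mutex> g1(first);
    ++holding;
    while (holding.load() < 2) std::this_thread::yield();

    std::unique_lock<std::timed_mutex> g2(second, std::chrono::milliseconds(100));
    const bool gotBoth = g2.owns_lock();

    // Keep `first` until the peer's attempt has also ended, so the
    // outcome does not depend on which timeout expires first.
    ++finished;
    while (finished.load() < 2) std::this_thread::yield();
    return gotBoth;
}

// Two threads acquire the locks in opposite order.
// Returns how many threads managed to hold both locks.
int runInvertedOrder() {
    Locks l;
    std::atomic<int> holding{0}, finished{0};
    bool a = false, b = false;
    std::thread t1([&] { a = holdThenTry(l.app, l.session, holding, finished); });
    std::thread t2([&] { b = holdThenTry(l.session, l.app, holding, finished); });
    t1.join();
    t2.join();
    return (a ? 1 : 0) + (b ? 1 : 0);
}

// Both threads acquire the locks in the same order, so plain blocking
// locks cannot deadlock. Returns how many threads completed.
int runConsistentOrder() {
    Locks l;
    std::atomic<int> completed{0};
    auto work = [&] {
        std::lock_guard<std::timed_mutex> g1(l.app);
        std::lock_guard<std::timed_mutex> g2(l.session);
        ++completed;
    };
    std::thread t1(work), t2(work);
    t1.join();
    t2.join();
    return completed.load();
}
```

In this sketch `runInvertedOrder()` reports that neither thread obtained both locks, which with ordinary blocking locks would be a permanent freeze, while `runConsistentOrder()` lets both threads finish. A per-thread stack trace, as requested above, is what would show whether the real process was in a comparable state.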