Thread: [Quickfix-developers] possible deadlocking/freeze with routing implementation ?

Hi, 

  I've been using quickfix for about three months in a routing setup
(similar to this poster:
http://www.nabble.com/ThreadedSocketAcceptor---Message-resend-tc17067577.html)
where I have both an initiator and an acceptor.  I am using
ThreadedSocketAcceptor, ThreadedSocketInitiator, and SynchronizedApplication.
There are n sessions coming in through the acceptor and 1 going out via the
initiator, and it is a comparatively low-throughput application.
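As far as I understand it, SynchronizedApplication wraps each Application callback (fromApp, toApp, fromAdmin, and so on) in one shared mutex, so callbacks from all n acceptor sessions and the initiator session are serialized. The sketch below models that idea with plain C++ threads. SerializedCallbacks and maxConcurrentCallbacks are hypothetical names, not QuickFIX code. It shows that at most one callback runs at a time, which also means a single callback that never returns holds up every other session's callbacks.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a "synchronized" application wrapper: every
// callback runs while holding one application-wide lock. Model only.
class SerializedCallbacks {
public:
    void run(const std::function<void()>& callback) {
        std::lock_guard<std::recursive_mutex> guard(mutex_);
        callback();
    }

private:
    // Recursive so a callback that re-enters run() on the same thread
    // does not block on itself.
    std::recursive_mutex mutex_;
};

// Simulate `sessions` session threads, each delivering `messagesPerSession`
// messages, and return the largest number of callbacks seen running at once.
int maxConcurrentCallbacks(int sessions, int messagesPerSession) {
    SerializedCallbacks app;
    std::atomic<int> active(0);
    std::atomic<int> peak(0);
    std::vector<std::thread> threads;
    for (int s = 0; s < sessions; ++s) {
        threads.emplace_back([&app, &active, &peak, messagesPerSession] {
            for (int m = 0; m < messagesPerSession; ++m) {
                app.run([&active, &peak] {
                    int now = ++active;
                    // Raise peak to `now` if it is higher; compare_exchange_weak
                    // reloads `prev` on failure, so the loop retries correctly.
                    int prev = peak.load();
                    while (now > prev && !peak.compare_exchange_weak(prev, now)) {
                    }
                    std::this_thread::sleep_for(std::chrono::microseconds(50));
                    --active;
                });
            }
        });
    }
    for (std::thread& t : threads) t.join();
    return peak.load();
}
```

Calling maxConcurrentCallbacks(4, 20) always returns 1: no matter how many session threads there are, callbacks never overlap.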

  We experienced a situation today where quickfix tried to send a message on
its outgoing connection while processing a message received from one of the
incoming connections, and my quickfix server got "stuck" somewhere in
Session::sendToTarget (i.e. I know I called ::sendToTarget, but it never got
to the point where the message was actually sent, since I did not see the
offending message in the file store; I am using the FileStoreFactory).  I am
inclined to believe this is a deadlock, but I am not sure which resource is
involved or which two (or more!) threads are contending for it.  During this
period (it was about 5 minutes before someone noticed; as I said, "low
volume"!), several clients timed out because they did not receive heartbeat
responses, and began the reconnect process.  For each of these clients I see
the incoming heartbeat in that client's message store, and later the
following in the global log:
"Accepted connection from x.x.x.x on port yyyy"
but there is no corresponding response from my server to either the
heartbeat or the new connection, presumably because those messages are
queued up waiting for the thread to finish processing the original
sendToTarget, which is "stuck".  I cannot, however, figure out why the
original sendToTarget is stuck, nor, if it is deadlocked, what it is waiting
on.
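One hypothesis worth checking, though nothing in the logs above confirms it, is a classic two-lock inversion. Suppose the acceptor thread is inside a SynchronizedApplication callback (so it holds the application-wide lock) when it calls sendToTarget, and that sending needs a lock belonging to the initiator session. Suppose also that the initiator's own thread holds that session lock while calling back into the application, for example when delivering a heartbeat to fromAdmin. Each thread then waits forever for the lock the other holds. Because every other session thread also needs the application-wide lock for its callbacks, the whole server would appear frozen. The sketch models this with two std::timed_mutex objects. The names appLock, sessionLock, Rendezvous and simulateOppositeLockOrder are hypothetical, and a timed try-lock replaces the blocking lock so the demo reports the cycle instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Simple two-party rendezvous (C++11/14/17 have no std::barrier).
class Rendezvous {
public:
    explicit Rendezvous(int parties) : parties_(parties) {}
    void arriveAndWait() {
        std::unique_lock<std::mutex> lock(m_);
        int gen = generation_;
        if (++arrived_ == parties_) {
            arrived_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int parties_;
    int arrived_ = 0;
    int generation_ = 0;
};

struct InversionResult {
    bool acceptorGotSessionLock;
    bool initiatorGotAppLock;
};

// Hypothetical model: the acceptor thread takes appLock then sessionLock,
// the initiator thread takes sessionLock then appLock (opposite order).
InversionResult simulateOppositeLockOrder(std::chrono::milliseconds wait) {
    std::timed_mutex appLock;
    std::timed_mutex sessionLock;
    Rendezvous bothHoldFirstLock(2);
    Rendezvous bothFinishedTrying(2);
    InversionResult result{false, false};

    std::thread acceptor([&] {
        std::unique_lock<std::timed_mutex> app(appLock);
        bothHoldFirstLock.arriveAndWait();
        // Timed attempt (try_lock_for) instead of blocking forever.
        std::unique_lock<std::timed_mutex> session(sessionLock, wait);
        result.acceptorGotSessionLock = session.owns_lock();
        bothFinishedTrying.arriveAndWait();  // keep appLock until both tried
    });
    std::thread initiator([&] {
        std::unique_lock<std::timed_mutex> session(sessionLock);
        bothHoldFirstLock.arriveAndWait();
        std::unique_lock<std::timed_mutex> app(appLock, wait);
        result.initiatorGotAppLock = app.owns_lock();
        bothFinishedTrying.arriveAndWait();  // keep sessionLock until both tried
    });
    acceptor.join();
    initiator.join();
    return result;
}
```

Both attempts fail, which is the cycle. If both threads took the two locks in one fixed order, for example with std::lock(appLock, sessionLock), the cycle could not form. On the real server, attaching gdb while it is frozen and running `thread apply all bt` would show which lock each thread is actually blocked on.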

  The quickfix server was bounced, everybody recovered, and the offending
message was resent, so everything was fine in the end.  Still, I would like
to figure out what I am doing in my code that caused this.  Any thoughts, or
pointers as to where to look for the problem, would be _much_ appreciated.
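One design sometimes used in routers, offered here as an alternative pattern rather than a diagnosis, avoids sending on the outbound session from inside an inbound callback at all. The callback only enqueues the message and returns, and a single dedicated thread drains the queue and performs the send, so the send no longer happens while the inbound callback's lock is held. The sketch uses a hypothetical OutboundQueue class and routeThroughQueue helper, with std::string standing in for a FIX message. In a real router the drain loop is where Session::sendToTarget would be called.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hand-off queue between inbound callbacks and one sender thread.
class OutboundQueue {
public:
    // Called from an inbound callback: enqueue and return immediately.
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }
    // Signal that no more messages will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until a message is available; returns false once the queue is
    // closed and fully drained.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
    bool closed_ = false;
};

// Push `inbound` through the queue (the caller stands in for fromApp) and
// return what the sender thread "sent", in order.
std::vector<std::string> routeThroughQueue(const std::vector<std::string>& inbound) {
    OutboundQueue queue;
    std::vector<std::string> sent;
    std::thread sender([&queue, &sent] {
        std::string msg;
        while (queue.pop(msg)) {
            sent.push_back(msg);  // a real router would send on the initiator here
        }
    });
    for (const std::string& m : inbound) queue.push(m);
    queue.close();
    sender.join();
    return sent;
}
```

A single sender thread also preserves the order in which callbacks enqueued messages. The trade-off is that the inbound callback no longer learns synchronously whether the outbound send succeeded.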

Regards,
Liz
-- 
View this message in context: http://www.nabble.com/possible-deadlocking-freeze-with-routing-implementation---tp17093661p17093661.html
Sent from the QuickFIX - Dev mailing list archive at Nabble.com.
