1.13.1 mutex deadlock
Hello,
I am having trouble with mutex locking. The situation is as follows:
- The FIX engine processes incoming and outgoing messages simultaneously. After a number of messages have been processed, the engine freezes, always in the SessionState class, while locking the mutex required to obtain the next sequence number (getNextSenderMsgSeqNum()). It looks as if the incoming and outgoing message processing are waiting for each other.
Once the freeze is detected (by another process that checks sequence number activity), I have to restart the engine to get it back on track.
The FIX engine is running on Solaris 10 (Unix) with the QuickFIX library, version 1.13.1.
Any help is welcome :-)
Many thanks
Regards
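For readers who want to reproduce the external check mentioned above (a separate process watching sequence number activity), here is a minimal sketch of how such a stall check could look. It is not the poster's actual monitor and not part of QuickFIX: the class name SeqNumStallDetector, the threshold, and the way sequence numbers reach it are all assumptions, and it assumes a C++11 compiler.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical watchdog helper (not part of QuickFIX): reports a stall when
// neither sequence number has changed for longer than a given threshold.
class SeqNumStallDetector {
public:
    using Clock = std::chrono::steady_clock;

    SeqNumStallDetector(std::chrono::seconds threshold, Clock::time_point start)
        : threshold_(threshold), lastChange_(start) {
        assert(threshold.count() > 0);
    }

    // Record the sequence numbers observed at time `now`.
    // Returns true if both numbers have been unchanged for more than the threshold.
    bool observe(int senderSeq, int targetSeq, Clock::time_point now) {
        if (senderSeq != lastSender_ || targetSeq != lastTarget_) {
            lastSender_ = senderSeq;
            lastTarget_ = targetSeq;
            lastChange_ = now;   // activity seen, restart the timer
            return false;
        }
        return (now - lastChange_) > threshold_;
    }

private:
    std::chrono::seconds threshold_;
    Clock::time_point lastChange_;
    int lastSender_ = -1;   // -1 means "nothing observed yet"
    int lastTarget_ = -1;
};
```

Passing the time point in explicitly keeps the check easy to test. The threshold would have to be longer than the session's heartbeat interval, otherwise a quiet but healthy session could be flagged.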
What are you doing in your callbacks? This might be a result of the interaction between the engine and your code.
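To illustrate the kind of interaction meant here: if one thread holds an engine-side lock while it needs an application-side lock, and another thread holds them in the opposite order, both can wait forever. Whether that is what happens in this case is not established in this thread. The sketch below uses two hypothetical standard-library mutexes (not QuickFIX's own Mutex class) and shows one common way to make opposite-order acquisition safe, assuming a C++11 compiler.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins: one lock guarding engine/session state,
// one guarding application state. Neither is a real QuickFIX member.
struct TwoLocks {
    std::mutex engine;
    std::mutex app;
    long counter = 0;
};

// Incoming path: conceptually "engine lock first, then application lock".
void engineThenApp(TwoLocks& s) {
    std::lock(s.engine, s.app);   // acquires both without risk of deadlock
    std::lock_guard<std::mutex> a(s.engine, std::adopt_lock);
    std::lock_guard<std::mutex> b(s.app, std::adopt_lock);
    ++s.counter;
}

// Outgoing path: conceptually the opposite order.
void appThenEngine(TwoLocks& s) {
    std::lock(s.app, s.engine);
    std::lock_guard<std::mutex> a(s.app, std::adopt_lock);
    std::lock_guard<std::mutex> b(s.engine, std::adopt_lock);
    ++s.counter;
}

// Runs both paths concurrently and returns the number of completed updates.
long runBothPaths(int iterations) {
    assert(iterations >= 0);
    TwoLocks s;
    std::thread in([&] { for (int i = 0; i < iterations; ++i) engineThenApp(s); });
    std::thread out([&] { for (int i = 0; i < iterations; ++i) appThenEngine(s); });
    in.join();
    out.join();
    return s.counter;
}
```

If the two functions locked the mutexes one after the other with plain lock() calls instead of std::lock, the same test could hang intermittently, much like the irregular freezes described in this thread.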
Hello,
This is what I do:
In Application::fromApp I log the message content, crack it, and then get the sequence numbers to send them to a monitoring system (an internal application listening on SunMQ). The last case I had involved an Allocation message, and no simultaneous incoming and outgoing messages were being processed, only an Allocation response from a broker. This happens at completely irregular intervals.
Here is the code (I removed confidential data).
void Application::fromApp( const FIX::Message& message, const FIX::SessionID& sessionID ) throw(FIX::FieldNotFound, FIX::UnsupportedMessageType, FIX::IncorrectTagValue)
{
    logger::cout << "\n[Application FromApp] " << "IN: " << message.toString() << logger::endl;
    crack(message, sessionID);
    adminMessageQ->sendSeqNumbers(); // adminMessageQ is a pointer to a SunMQ object
}
The crack call then invokes:
void Application::onMessage ( const FIX42::Allocation& message, const FIX::SessionID& sessionID)
{
    ...
    // Many calls like this one retrieve the field values we need to acknowledge the response
    if(message.isSetField(exAllocTransType))
    {
        message.get(exAllocTransType);
        mapMessageFIX["AllocTransType"] = exAllocTransType.getString(); // MapStringString storing values under custom keys for further processing
    }
    ...
}
Finally, sendSeqNumbers() is called after the crack invocation.
void AdminMessage::sendSeqNumbers()
{
    std::stringstream ss;
    ss << "(SEND " << session->getExpectedSenderNum() << ")(TARGET " << session->getExpectedTargetNum() << ")";
    std::string message(ss.str());
    logger::cout << "[AdminMessage sendSeqNumbers] " << message << logger::endl;
    this->sendMessage(message, SEQ_NUMS);
}
So the freeze happens when session->getExpectedSenderNum() is called (session is a FIX::Session*). I added state markers to the library to see where it stops, and this is where it hangs:
bool Session::sendRaw( Message& message, int num )
{
    QF_STACK_PUSH(Session::sendRaw)
    fixLogger::cout << "[Session sendRaw] state = 10" << fixLogger::endl;
    Locker l( m_mutex );
    fixLogger::cout << "[Session sendRaw] state = 11" << fixLogger::endl;
    ...
}
And here is the resulting log:
...
[04/08/2010-21:19:55.224][Application FromApp] IN: 8=FIX.4.29=39635=J34=5861 ...29=110=179
[04/08/2010-21:19:55.225] Inside Allocation Message
...
[04/08/2010-21:19:55.232] [AdminMessage sendSeqNumbers] (SEND 1955)(TARGET 5861)
[04/08/2010-21:20:01.324] [Session sendRaw] state = 10
[04/08/2010-21:45:10.826] NOTIFY END
[04/08/2010-21:45:10.826] [AminMessage notifyStop] - FIX STOPPED- Notifying Admin
[04/08/2010-21:45:10.828] KILL
As you can see, the engine froze for 25 minutes, then was killed and restarted.
Many thanks for your help.
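As a possible next diagnostic step, the silent wait between the "state = 10" and "state = 11" markers could be turned into a logged event by acquiring the lock with a timeout. The sketch below is only an illustration with std::timed_mutex from the C++11 standard library. It does not use QuickFIX's Locker or m_mutex, and lockOrReport and contendedAcquireSucceeds are hypothetical names introduced here.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

// Hypothetical helper: tries to take `m` within `timeout`.
// On success the caller owns the mutex and must call m.unlock().
// On timeout it logs where the wait happened and returns false.
bool lockOrReport(std::timed_mutex& m, std::chrono::milliseconds timeout,
                  const char* where) {
    assert(where != nullptr);
    if (m.try_lock_for(timeout)) {
        return true;
    }
    std::cerr << "[" << where << "] could not acquire mutex within "
              << timeout.count() << " ms" << std::endl;
    return false;
}

// Demonstration: the calling thread holds the mutex while a second thread
// tries to take it. Returns whether the second thread got the lock.
bool contendedAcquireSucceeds(std::chrono::milliseconds timeout) {
    std::timed_mutex m;
    bool acquired = true;
    std::lock_guard<std::timed_mutex> hold(m);   // simulate the stuck owner
    std::thread t([&] {
        acquired = lockOrReport(m, timeout, "Session sendRaw");
        if (acquired) {
            m.unlock();
        }
    });
    t.join();
    return acquired;
}
```

A timeout only reveals that a thread is waiting. Finding the owner still requires a thread dump of the hung process (for example with pstack on Solaris) to see which thread holds the lock and what it is blocked on.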