#52 1.13.1 mutex deadlock

Status: open
Owner: nobody
Labels: None
Priority: 7
Updated: 2010-08-17
Created: 2010-08-17
Creator: Eccy
Private: No

Hello,

I am having trouble with mutex locking. The situation is as follows:

- The FIX engine processes incoming and outgoing messages simultaneously. After some number of messages, the engine always freezes in the SessionState class while locking the mutex required to obtain the next sequence number (getNextSenderMsgSeqNum()). It looks as if the incoming and outgoing message paths are waiting for each other.
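The "waiting for each other" pattern described above is typical of a lock-order inversion, where two threads take the same two locks in opposite order. The following is a minimal sketch of that pattern and one standard way to avoid it, not a diagnosis of QuickFIX itself. The names inboundMutex, outboundMutex, handleInbound, handleOutbound and runBothPaths are hypothetical and do not come from the library.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-ins for two locks used on the inbound and outbound paths.
std::mutex inboundMutex;
std::mutex outboundMutex;
int processed = 0; // only modified while both mutexes are held

// Inbound path: names the inbound lock first, then the outbound lock.
void handleInbound(int n) {
    for (int i = 0; i < n; ++i) {
        std::unique_lock<std::mutex> a(inboundMutex, std::defer_lock);
        std::unique_lock<std::mutex> b(outboundMutex, std::defer_lock);
        std::lock(a, b); // acquires both without deadlock, whatever the order
        ++processed;
    }
}

// Outbound path: names the same locks in the opposite order.
// With plain lock() calls in this order, the two threads could block forever.
void handleOutbound(int n) {
    for (int i = 0; i < n; ++i) {
        std::unique_lock<std::mutex> b(outboundMutex, std::defer_lock);
        std::unique_lock<std::mutex> a(inboundMutex, std::defer_lock);
        std::lock(b, a);
        ++processed;
    }
}

// Runs both paths concurrently and returns the total number of iterations.
int runBothPaths(int n) {
    assert(n >= 0);
    processed = 0;
    std::thread in(handleInbound, n);
    std::thread out(handleOutbound, n);
    in.join();
    out.join();
    return processed;
}
```

Calling runBothPaths(10000) should return once both threads finish. If each path instead locked its two mutexes one after the other in its own order, the same run could hang intermittently, which would match the irregular freezes reported here.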

Once the freeze is detected (by another process that monitors sequence number activity), I have to restart the engine to get it back on track.

The FIX engine runs on Solaris 10 with QuickFIX library version 1.13.1.

Any help is welcome :-)

Many thanks

Regards

Discussion

  • Eccy

    Eccy - 2010-08-17
    • priority: 5 --> 7
     
  • Oren Miller

    Oren Miller - 2010-09-10

    What are you doing in your callbacks? This might be a result of the interaction between the engine and your code.
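One common way to limit that interaction is to keep callbacks short and hand any slow or blocking work to a separate thread. Below is a minimal sketch of that idea, assuming the callback only needs to pass along a ready-made string. HandOffQueue and publishViaWorker are hypothetical names, not part of QuickFIX, and the push_back in the worker stands in for whatever slow I/O the application performs.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: a small blocking queue that decouples a callback
// from slow work such as publishing to a message queue.
class HandOffQueue {
public:
    // Called from the callback: holds only the queue's own short-lived lock.
    void push(const std::string& item) {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_items.push(item);
        }
        m_ready.notify_one();
    }

    // Tells the worker to stop once the queue has been drained.
    void close() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_closed = true;
        }
        m_ready.notify_all();
    }

    // Blocks until an item is available; returns false when closed and empty.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_ready.wait(lock, [this] { return m_closed || !m_items.empty(); });
        if (m_items.empty()) return false;
        out = m_items.front();
        m_items.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_ready;
    std::queue<std::string> m_items;
    bool m_closed = false;
};

// Simulates the pattern: the "callback" pushes, the worker "publishes".
std::vector<std::string> publishViaWorker(const std::vector<std::string>& snapshots) {
    HandOffQueue queue;
    std::vector<std::string> published;
    std::thread worker([&] {
        std::string item;
        while (queue.pop(item)) published.push_back(item); // slow I/O would go here
    });
    for (const std::string& s : snapshots) queue.push(s); // what the callback would do
    queue.close();
    worker.join();
    return published;
}
```

With a single worker the items come out in the order they were pushed, so the callback returns quickly while the publishing order is preserved.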

     
  • Eccy

    Eccy - 2010-09-10

    Hello,

    This is what I do:

    In Application::fromApp I log the message content, crack it, and then read the sequence numbers to send them to a monitoring system (an internal application listening on SunMQ). The last case I had involved an Allocation message, and no simultaneous messages (incoming and outgoing) were being processed, only an Allocation response from a broker. This happens on a totally irregular basis.

    Here is the code (I removed confidential data):

    void Application::fromApp( const FIX::Message& message, const FIX::SessionID& sessionID )
        throw(FIX::FieldNotFound, FIX::UnsupportedMessageType, FIX::IncorrectTagValue)
    {
        logger::cout << "\n[Application FromApp] " << "IN: " << message.toString() << logger::endl;
        crack(message, sessionID);
        adminMessageQ->sendSeqNumbers(); // this is a pointer to a SunMQ object
    }

    The crack call dispatches to:

    void Application::onMessage ( const FIX42::Allocation& message, const FIX::SessionID& sessionID)
    {
        ...
        // Many calls like this one, to retrieve the field values we need to acknowledge the response
        if(message.isSetField(exAllocTransType))
        {
            message.get(exAllocTransType);
            mapMessageFIX["AllocTransType"] = exAllocTransType.getString(); // MapStringString storing values and custom keys for further processing
        }
        ...
    }

    Finally, the sendSeqNumbers call made after the crack invocation:

    void AdminMessage::sendSeqNumbers()
    {
        std::stringstream ss;
        ss << "(SEND " << session->getExpectedSenderNum() << ")(TARGET " << session->getExpectedTargetNum() << ")";
        std::string message(ss.str());
        logger::cout << "[AdminMessage sendSeqNumbers] " << message << logger::endl;
        this->sendMessage(message, SEQ_NUMS);
    }

    So when session->getExpectedSenderNum() is called (session is a FIX::Session*), it freezes. I added state markers to the library to see where it stops, and this is where it hangs:

    bool Session::sendRaw( Message& message, int num )
    {
        QF_STACK_PUSH(Session::sendRaw)
        fixLogger::cout << "[Session sendRaw] state = 10" << fixLogger::endl;
        Locker l( m_mutex );
        fixLogger::cout << "[Session sendRaw] state = 11" << fixLogger::endl;
        ...
    }

    And here is the result in the log:

    ...
    [04/08/2010-21:19:55.224][Application FromApp] IN: 8=FIX.4.29=39635=J34=5861 ...29=110=179
    [04/08/2010-21:19:55.225] Inside Allocation Message
    ...
    [04/08/2010-21:19:55.232] [AdminMessage sendSeqNumbers] (SEND 1955)(TARGET 5861)
    [04/08/2010-21:20:01.324] [Session sendRaw] state = 10
    [04/08/2010-21:45:10.826] NOTIFY END
    [04/08/2010-21:45:10.826] [AminMessage notifyStop] - FIX STOPPED- Notifying Admin
    [04/08/2010-21:45:10.828] KILL

    As you can see, the engine froze for 25 minutes, then was killed and restarted.
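When chasing a hang like this one, it can help to replace a blocking lock with a timed attempt while debugging, so that a stuck lock shows up as a failed attempt that can be logged instead of a silent 25-minute freeze. The sketch below assumes standard C++11 primitives as a stand-in, since the library's own Mutex and Locker classes are not shown in this thread. canAcquireWithin and probeWhileHeld are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Tries to take `m` for at most `timeout` and reports whether it succeeded.
// The lock is released again before returning; this only probes for contention.
bool canAcquireWithin(std::timed_mutex& m, std::chrono::milliseconds timeout) {
    std::unique_lock<std::timed_mutex> lock(m, timeout); // uses try_lock_for
    return lock.owns_lock();
}

// Demonstration: another thread holds the mutex for `holdFor` while we probe it.
bool probeWhileHeld(std::chrono::milliseconds holdFor,
                    std::chrono::milliseconds timeout) {
    std::timed_mutex m;
    std::promise<void> locked;
    std::thread holder([&] {
        std::lock_guard<std::timed_mutex> guard(m);
        locked.set_value();                    // signal that the lock is taken
        std::this_thread::sleep_for(holdFor);  // simulate a long critical section
    });
    locked.get_future().wait();                // probe only once the lock is held
    bool acquired = canAcquireWithin(m, timeout);
    holder.join();
    return acquired;
}
```

A false result from the timed attempt is the point where a diagnostic build could log which thread was waiting, which is more informative than the last "state = 10" line followed by silence.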

    Many thanks for your help.

     
