[Quickfix-users] Crash scenarios in QuickFix
From: Stancescu C. <Con...@sw...> - 2002-09-04 10:12:27
Hi,

I am Constantin Stancescu from SWX (Swiss Exchange). I am currently implementing FIX support for our trading system, based on QuickFix, the Windows C++ version. I have a couple of interesting subjects; this is the first of them.

As I understand it, the sender algorithm (Session::sendRaw in Session.cpp) is:

  1. send()
  2. m_pStore->set()                     // includes disk flush
  3. m_pStore->incrNextSenderMsgSeqNum() // includes disk flush

If the application crashes after 1 (send) but before 3 (incrNextSenderMsgSeqNum), we may have the following situation (I forced and tested it):

- The receiver receives the message, so it will now expect n+1 as the next message.
- The sender, when restarting, will come with n < n+1; the logon attempt is rejected and we have to completely reset the session.

A couple of observations about coping with this situation:

- In this situation the send call from the user layer will never return. Our application is written in such a way that in this case the business message will be sent again with the PossResend flag set, even if a new FIX session is created with the same party. Is it reasonable to assume that all (or most) party applications will behave this way?

- A slight change may reduce the probability of trouble. The order of actions in Session::sendRaw should be:

  1. m_pStore->incrNextSenderMsgSeqNum() // includes disk flush
  2. m_pStore->set()                     // includes disk flush
  3. send()

  - If the application crashes after 2 but before 3, at restart the sender will present sequence number n+1 when the receiver expects n. The receiver will ask for a resend, the sender has all it needs, and we are OK.
  - If the application crashes after 1 but before 2, at restart the sender will present sequence number n+1 when the receiver expects n. The receiver will ask for a resend, the sender does not have the message, and we are in trouble.
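The crash cases above can be checked with a small simulation. This is only a sketch under stated assumptions: ToyStore, ToyPeer, sendRawWithCrash, classifyRestart and simulate are hypothetical names invented for illustration, not QuickFix classes, and the "crash" is modelled simply by stopping after a given number of steps.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the sender's persistent store.
struct ToyStore {
    int nextSenderSeq = 1;
    std::map<int, std::string> messages;
};

// Hypothetical stand-in for the counterparty: tracks the next number it expects.
struct ToyPeer {
    int nextExpectedSeq = 1;
    void receive(int seq) {
        if (seq == nextExpectedSeq) ++nextExpectedSeq;
    }
};

enum class Order { SendFirst, StoreFirst };
enum class Outcome { InSync, SeqTooLow, ResendPossible, ResendImpossible };

// Runs the three steps in the chosen order and "crashes" (returns early)
// once stepsBeforeCrash steps have completed.
void sendRawWithCrash(Order order, int stepsBeforeCrash, ToyStore& store,
                      ToyPeer& peer, const std::string& msg) {
    assert(stepsBeforeCrash >= 0 && stepsBeforeCrash <= 3);
    const int seq = store.nextSenderSeq;
    const std::function<void()> doSend = [&] { peer.receive(seq); };
    const std::function<void()> doSet = [&] { store.messages[seq] = msg; };
    const std::function<void()> doIncr = [&] { store.nextSenderSeq = seq + 1; };
    const std::vector<std::function<void()>> steps =
        (order == Order::SendFirst)
            ? std::vector<std::function<void()>>{doSend, doSet, doIncr}
            : std::vector<std::function<void()>>{doIncr, doSet, doSend};
    for (int i = 0; i < stepsBeforeCrash; ++i) steps[i]();
}

// Compares the two sides after a restart, following the cases in the mail.
Outcome classifyRestart(const ToyStore& store, const ToyPeer& peer) {
    if (store.nextSenderSeq < peer.nextExpectedSeq) return Outcome::SeqTooLow;
    if (store.nextSenderSeq == peer.nextExpectedSeq) return Outcome::InSync;
    // Sender is ahead: the peer would ask for the missing range to be resent.
    for (int s = peer.nextExpectedSeq; s < store.nextSenderSeq; ++s) {
        if (store.messages.count(s) == 0) return Outcome::ResendImpossible;
    }
    return Outcome::ResendPossible;
}

// Fresh store and peer, one message, crash after the given number of steps.
Outcome simulate(Order order, int stepsBeforeCrash) {
    ToyStore store;
    ToyPeer peer;
    sendRawWithCrash(order, stepsBeforeCrash, store, peer, "NewOrderSingle");
    return classifyRestart(store, peer);
}
```

In this toy model, simulate(Order::SendFirst, 1) gives SeqTooLow (the rejected logon), simulate(Order::StoreFirst, 2) gives ResendPossible, and simulate(Order::StoreFirst, 1) gives ResendImpossible, which is the remaining window the reordering does not close.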

In our application we provide our own message store implementation using an MSSQL database instead of FileStore. Steps 1 (incrNextSenderMsgSeqNum) and 2 (set) are part of the same transaction, so we cannot crash between 1 and 2. To achieve this we have to wait for the second call before committing; it would be nicer to have just one MessageStore call, say incrAndSet!

What do you think about my proposal? Any other opinions or tips?

Regards,
Constantin
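
To make the incrAndSet idea concrete, here is a minimal sketch of what a combined call could look like. Everything in it is an assumption for illustration: ToyTransactionalStore is a hypothetical in-memory class, not the QuickFix MessageStore interface, the failBeforeCommit parameter exists only to simulate a crash, and the assignment to committed_ merely stands in for a database COMMIT.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical store with a single combined call; not a QuickFix class.
class ToyTransactionalStore {
public:
    // Stores msg under the current sequence number and advances the counter
    // as one unit. Returns the sequence number that was used.
    int incrAndSet(const std::string& msg, bool failBeforeCommit = false) {
        State staged = committed_;  // work on a copy, like an open transaction
        const int seq = staged.nextSenderSeq;
        staged.messages[seq] = msg;
        staged.nextSenderSeq = seq + 1;
        if (failBeforeCommit) {
            // Simulated crash: staged changes are dropped, nothing is visible.
            throw std::runtime_error("simulated crash before commit");
        }
        committed_ = std::move(staged);  // stands in for COMMIT
        return seq;
    }

    int nextSenderSeq() const { return committed_.nextSenderSeq; }
    bool has(int seq) const { return committed_.messages.count(seq) != 0; }

private:
    struct State {
        int nextSenderSeq = 1;
        std::map<int, std::string> messages;
    };
    State committed_;
};
```

With a single call the store either holds both the message and the incremented number or neither, so the "number advanced but message missing" state cannot be observed after a restart. A real database-backed store would rely on the database's own transaction handling rather than on copying state as this sketch does.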