RE: [Quickfix-users] Session resync problem
From: Ajay K. <Aja...@tr...> - 2006-03-08 11:52:41
Coalescing multiple overlapping resend requests into a single resend
should definitely help in Sean's case, where there were only 125
messages to resend.

However, note that with a single thread serving all FIX sessions, a
massive resend request on one FIX session can still starve the other
FIX sessions. I have seen cases in production in which a catastrophic
sequence number problem on one side of a FIX connection results in a
resend request for thousands of messages. If the FIX engine on the
other end services that massive resend request in a tight loop, it
still takes just one misbehaving counterparty to affect the other FIX
sessions.

Obviously this kind of problem doesn't occur very often, and it would
not be a serious problem for installations with a light to moderate
volume of FIX messages.

Regards,

- Ajay

-----Original Message-----
From: Oren Miller [mailto:or...@qu...]
Sent: Tuesday, March 07, 2006 9:05 PM
To: Sean Kirkpatrick
Cc: Ajay Kamdar; qui...@li...
Subject: Re: [Quickfix-users] Session resync problem

Although Ajay's analysis is correct, and under other circumstances
moving to a threaded model might be appropriate, it is actually a red
herring in this case. I know you guys are running an older version of
the engine, and it is the resend logic in there where the fault really
lies. Older versions of QuickFIX did not handle this sort of resend
scenario very gracefully. The old implementation wasn't technically
incorrect, but it wasn't especially smart either. Newer versions of
the engine can detect these sorts of recursive resend request
scenarios and avoid them, so you would only send 125 instead of 125!
messages.

The relevant code (in newer versions) that protects against this
scenario is implemented with a resendRange in the session class.

--oren

Sean Kirkpatrick wrote:
> Thanks Ajay, I appreciate the response.
>
> We had considered that as an option, but I believe the ThreadedSocket
> classes spawn a thread per session. Having hundreds of threads wasn't
> a desirable approach for us. A thread pool would probably work, but I
> don't think that is implemented...perhaps I am mistaken?
>
> --Sean
>
> -----Original Message-----
> From: Ajay Kamdar [mailto:Aja...@tr...]
> Sent: Tuesday, March 07, 2006 11:29 AM
> To: Sean Kirkpatrick; qui...@li...
> Subject: RE: [Quickfix-users] Session resync problem
>
> You might want to consider using the
> ThreadedSocketAcceptor/ThreadedSocketInitiator classes, which will
> place each Session on its own thread. I expect that should prevent
> the other sessions from getting starved while the engine is busy
> servicing the resend requests in a tight loop. Obviously your
> application would need to be thread safe to go this route. Caveat
> emptor: not having actually used these classes myself yet, this
> suggestion is based upon theoretical analysis. YMMV.
>
> - Ajay
>
> -----Original Message-----
> From: Sean Kirkpatrick [mailto:sea...@pi...]
> Sent: Tuesday, March 07, 2006 9:05 AM
> To: qui...@li...
> Subject: [Quickfix-users] Session resync problem
>
> Hello All,
>
> We had an issue in our production environment that boiled down
> to the following:
>
> 1. Client has hard disconnect
> 2. We send some messages prior to detecting the session is down
> 3. Client logs back in with higher than expected seq num and
>    immediately starts sending some messages
> 4. We send resend reqs for each message we receive until they
>    are handled, which the client does by sending us seq reset
>    messages.
> 5. The client heartbeats.
> 6. At this point, we do some message resending.
>    -- this is where the trouble began --
> 7. Since the client did not sync its seq nums after the logon,
>    when we start sending these messages they have higher than
>    expected seq nums.
> 8. Client sends a resend request for each of the messages we
>    sent (125).
>
> When processing the resend requests, the engine sits in a
> tight loop processing its queue. The trouble here is that the
> resend requests took approx. 5 minutes to get through and all
> other connections were starved.
>
> Has anyone come across this problem, or have a suggestion for
> dealing with it gracefully?
>
> Regards,
>
> Sean Kirkpatrick
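
For readers following the thread, the two ideas discussed above (coalescing overlapping resend requests, and not servicing one session's resend in a tight loop) can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not QuickFIX's resendRange code: SeqRange, coalesce, countMessages, ResendJob and serviceFairly are hypothetical names, each request is assumed to carry a concrete inclusive (BeginSeqNo, EndSeqNo) pair (an EndSeqNo of 0, meaning "everything after BeginSeqNo" in FIX 4.2 and later, is assumed to have been resolved already), and the batch size is an arbitrary tuning knob.

```cpp
#include <algorithm>
#include <deque>
#include <utility>
#include <vector>

// Inclusive (BeginSeqNo, EndSeqNo) range asked for by one ResendRequest.
// Hypothetical helper type, not a QuickFIX class.
using SeqRange = std::pair<int, int>;

// Merge overlapping or adjacent ranges so that every sequence number
// is resent at most once.
std::vector<SeqRange> coalesce(std::vector<SeqRange> requests) {
    std::sort(requests.begin(), requests.end());
    std::vector<SeqRange> merged;
    for (const SeqRange& r : requests) {
        if (!merged.empty() && r.first <= merged.back().second + 1) {
            // Overlaps or touches the previous range: extend it.
            merged.back().second = std::max(merged.back().second, r.second);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}

// Number of messages that servicing every range separately would send.
long countMessages(const std::vector<SeqRange>& ranges) {
    long total = 0;
    for (const SeqRange& r : ranges) {
        total += r.second - r.first + 1;
    }
    return total;
}

// Outstanding resend work for one session (hypothetical type).
struct ResendJob {
    int sessionId;
    int next;  // next sequence number to resend
    int last;  // last sequence number to resend (inclusive)
};

// Single-threaded round-robin: send at most `batch` messages for a
// session, then move on to the next one, so a huge resend on one
// session cannot monopolise the loop. Returns the (sessionId, seqNum)
// pairs in the order they would be sent.
std::vector<std::pair<int, int>> serviceFairly(std::deque<ResendJob> jobs,
                                               int batch) {
    if (batch < 1) {
        batch = 1;  // guard against a loop that never makes progress
    }
    std::vector<std::pair<int, int>> sent;
    while (!jobs.empty()) {
        ResendJob job = jobs.front();
        jobs.pop_front();
        for (int i = 0; i < batch && job.next <= job.last; ++i) {
            sent.emplace_back(job.sessionId, job.next++);
        }
        if (job.next <= job.last) {
            jobs.push_back(job);  // unfinished: requeue behind the others
        }
    }
    return sent;
}
```

As a purely hypothetical illustration (the shape of the requests is assumed, not taken from Sean's logs): if 125 requests each asked for everything from their own sequence number up to 125, countMessages would report 7875 messages before coalescing and 125 after. The round-robin part trades some latency on the resending session for responsiveness on the others, which is one alternative to a thread per session when hundreds of threads are undesirable.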