Re: [Quickfix-developers] Weird system freeze
From: Djalma R. d. S. F. <drs...@gm...> - 2008-04-18 01:15:23
Hi John,

I guess that your QFRouter has hung because the ThreadedSocketAcceptor is
implemented using a blocking socket. The connection is not broken, but the
modal dialog in your client is preventing the TCP ack. QFRouter will stay
frozen until QFClient sends some kind of receive confirmation, which in
your case will happen only when someone closes the modal dialog.

If you have QF 1.12.4 you can use the SocketAcceptor to try to solve this
problem. Because of performance issues, I preferred to write my own
non-blocking implementation, which I have already submitted.

Djalma

On Thu, Apr 17, 2008 at 12:45 PM, John Haldi <jr...@ya...> wrote:
> I had an interesting scenario happen here and was wondering if somebody
> could sanity check whether what I think happened actually happened, and
> if so what I might be able to do to handle this scenario if it pops up
> again. Here are the details:
>
> I have an application (QFrouter) which goes out and connects to 6 source
> systems (brokers/exchanges) using the threadedSocketInitiator. It also
> allows connections from 20+ client apps (QFclient) using the
> threadedSocketAcceptor. When a message comes in from an exchange/broker,
> I handle the callback, check certain fields in the message, and determine
> which of my client apps should get a copy. I then send a copy of this
> incoming message to each of my clients by "rewiring" the sendercompid and
> the targetcompid, and fan copies out to each client using the
> SendToTarget method.
>
> So far, all of this has worked fine (up until today). [The above process
> may not be the best way to do this, and I'd love to hear from somebody if
> this is a dumb way to do this, but that isn't my question right now.]
>
> Today a message came in from an exchange connection (ARCA1) and my code
> proceeded to send copies to each of my client apps. It got through the
> first 3 client apps and then tried to send to the 4th client app on my
> list of clients. At this point the client app in question ran out of disk
> space on the client workstation, resulting in a modal pop-up dialog box
> appearing on the client workstation saying "out of disk space". (One
> important factoid is that I didn't know the client workstation was stuck
> with a modal dialog box being displayed at this point. Might be relevant
> later in the story.)
>
> Once this happened, the thread in my QFrouter seems to have hung. The
> connection to ARCA1 eventually dropped when I didn't respond to a
> heartbeat, but my QFrouter never recovered, and eventually I had to kill
> the process and bounce QFrouter. Once I bounced QFrouter, I reconnected to
> ARCA1 and messages started flowing. As soon as I received a message that
> needed to be copied to client #4, my QFrouter hung again. At this point we
> discovered that client #4 had a modal dialog freezing the app. As soon as
> we deleted some files (freeing up disk space) from client workstation #4
> and pressed OK on the modal dialog, messages started flowing to the client
> app and my QFrouter got "unstuck".
>
> In digging into my logs, it appears that the modal dialog was created by a
> generic exception handler in my client app (QFclient), and was triggered
> within the onMessage callback when either the QF engine or my app (unclear
> which) attempted to log the incoming message. With hindsight, I will of
> course handle this scenario in QFclient in a more graceful manner.
>
> But what I'm really curious about is whether QFrouter should have handled
> the fact that client #4 wasn't responding.
> I would have thought that in this scenario the QF engine in QFrouter
> should have recognized that client #4 wasn't responding and dropped that
> connection within the threadedSocketAcceptor and moved on -- or at least
> thrown an exception of some sort. But it didn't - it simply seems to have
> frozen on that thread. Meanwhile, messages continued to be processed
> coming in from the other exchanges/brokers on the threadedSocketInitiator
> and passing those messages on to clients.
>
> Am I correct in thinking that my QFrouter should have handled this
> scenario more gracefully? If so, should I have coded differently to expect
> this scenario? Should an exception of some sort have been thrown by the QF
> engine?
>
> For reference, both QFrouter and QFclient are VB .NET 2005 apps, and both
> have the generic binary .dll that ships with QF, runtime version
> 2.0.50727, version 1.0.2447.42056. Both apps have been running happily for
> months, but this is the first time we had a disk space error.
>
> If anybody has any thoughts on this I'd greatly appreciate it.
>
> Many thanks,
>
> John
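
A note on the blocking-socket explanation at the top of this message: the stall can be reproduced outside QuickFIX with plain sockets. The sketch below is Python, not QuickFIX code, and every name in it (fan_out_until_stall, demo_stuck_client, the sample message) is made up for illustration. It assumes the usual TCP-style flow control: once the peer application stops reading, its receive buffer fills, the sender's buffer fills behind it, and a blocking send waits. Giving the send a deadline turns that indefinite wait into an error the sender can act on, for example by dropping that one client.

```python
import socket


def fan_out_until_stall(sock, message, timeout_s, max_messages):
    """Send `message` repeatedly on `sock` with a per-call deadline.

    Returns (messages_sent, stalled). `stalled` is True when a send could
    not complete within `timeout_s`, i.e. the peer stopped draining data.
    """
    sock.settimeout(timeout_s)  # without this, sendall() may block forever
    sent = 0
    for _ in range(max_messages):
        try:
            sock.sendall(message)
        except socket.timeout:
            # Buffers are full and the peer is not reading: give up here
            # instead of freezing the sending thread.
            return sent, True
        sent += 1
    return sent, False


def demo_stuck_client():
    """Simulate a client that is alive but never reads from its socket."""
    router_side, client_side = socket.socketpair()
    try:
        # Hypothetical payload; about 1 KB of filler per message.
        message = b"35=8|" + b"x" * 1019
        # client_side never calls recv(), like an app stuck in a modal dialog.
        return fan_out_until_stall(router_side, message, 0.2, 1_000_000)
    finally:
        router_side.close()
        client_side.close()


if __name__ == "__main__":
    count, stalled = demo_stuck_client()
    print("messages accepted before stall:", count, "stalled:", stalled)
```

The first messages are accepted because they only land in kernel buffers, which matches the observation that QFrouter ran for a while after each restart before freezing again. How many get through depends on the platform's buffer sizes, so no particular count should be expected.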