Re: [Quickfix-developers] Weird system freeze
From: <or...@qu...> - 2008-04-18 16:08:34
<html><body><div>Well, first, I wonder what release version of QF you are using. I can see this happening with some of the old code which used blocking sockets, but we have since switched to non-blocking sockets so I wouldn't expect to see this with more recent releases. Before discussing possible changes to your software, do you know from which release you got the DLLs?</div> <div> </div> <div>--oren</div> <BLOCKQUOTE style="PADDING-LEFT: 8px; MARGIN-LEFT: 8px; BORDER-LEFT: blue 2px solid" webmail="1">-------- Original Message --------<BR>Subject: [Quickfix-developers] Weird system freeze<BR>From: John Haldi <jr...@ya...><BR>Date: Thu, April 17, 2008 10:45 am<BR>To: <a href="mailto:qui...@li...">qui...@li...</a><BR><BR>QuickFIX Documentation: <A href="http://www.quickfixengine.org/quickfix/doc/html/index.html" target=_blank><a href="http://www.quickfixengine.org/quickfix/doc/html/index.html">http://www.quickfixengine.org/quickfix/doc/html/index.html</a></A><BR>QuickFIX Support: <A href="http://www.quickfixengine.org/services.html" target=_blank><a href="http://www.quickfixengine.org/services.html">http://www.quickfixengine.org/services.html</a></A><BR><BR> <HR> <STYLE type=text/css> #wmMessage DIV {margin:0px;} </STYLE> <DIV style="FONT-SIZE: 12pt; FONT-FAMILY: times new roman, new york, times, serif"> <DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>I had an interesting scenario happen here and was wondering if somebody could sanity check whether what I think happened actually happened, and if so what I might be able to do to handle this scenario if it pops up again. Here's the details:</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>I have an application (QFrouter) which goes out and connects to 6 source systems (brokers/exchanges) using the threadedSocketInitiator. It also allows connections from 20+ client apps (QFclient) using the threadedSocketAcceptor. 
When a message comes in from an exchange/broker, I handle the callback and check certain fields in the message and determine which of my client apps should get a copy. I then send a copy of this incoming message to each of my clients by "rewiring" the sendercompid and the targetcompid and fan copies out to each client using the SendToTarget method.</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>So far, all of this has worked fine (up until today). [The above process may not be the best way to do this, and I'd love to hear from somebody if this is a dumb way to do this, but that isn't my question right now.]</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>Today a message came in from an exchange connection (ARCA1) and my code proceeded to send copies to each of my client apps. It got through the first 3 client apps and then tried to send to the 4th client app on my list of clients. At this point the client app in question ran out of disk space on the client workstation, resulting in a modal pop-up dialog box appearing on the client workstations saying "out of disk space". (One important factoid is that I didn't know the client workstation was stuck with a modal dialog box being displayed at this point. Might be relevant later in the story.)</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>Once this happened, the thread in my QFrouter seems to have hung. The connection to ARCA1 eventually dropped when I didn't respond to a heartbeat, but my QFrouter never recovered, and eventually I had to kill the process and bounce QFrouter. Once I bounced QFrouter, I reconnected to ARCA1 and messages started flowing. As soon as I received a message that needed to be copied to client #4, my QFrouter hung again. 
At this point we discovered that client #4 had a modal dialog freezing the app. As soon as we deleted some files (freeing up disk space) from client workstation #4 and pressed OK on the modal dialog, messages started flowing to the client app and my QFrouter got "unstuck".</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>In digging into my logs, it appears that the modal dialog was created by a generic exception handler in my client app (QFclient), and was triggered within the onMessage callback when either the QF engine or my app (unclear which) attempted to log the incoming message. With hindsight, I will of course handle this scenario in QFclient in a more graceful manner.</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>But what I'm really curious about is whether QFrouter should have handled the fact that client #4 wasn't responding. I would have thought that in this scenario the QF engine in QFrouter should have recognized that client #4 wasn't responding and dropped that connection within the threadedSocketAcceptor and moved on -- or at least thrown an exception of some sort. But it didn't - it simply seems to have frozen on that thread. Meanwhile, messages coming in from the other exchanges/brokers on the threadedSocketInitiator continued to be processed and passed on to clients.</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>Am I correct in thinking that my QFrouter should have handled this scenario more gracefully? If so, should I have coded differently to expect this scenario? 
Should an exception of some sort have been thrown by the QF engine?</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>For reference, both QFrouter and QFclient are VB .NET 2005 apps, and both have the generic binary .dll that ships with QF, runtime version 2.0.50727, version 1.0.2447.42056. Both apps have been running happily for months, but this is the first time we had a disk space error.</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>If anybody has any thoughts on this I'd greatly appreciate it.</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2>Many thanks,</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN> </DIV> <DIV><SPAN class=406321315-17042008><FONT face=Arial size=2> John</FONT></SPAN></DIV> <DIV><SPAN class=406321315-17042008></SPAN><BR></DIV></DIV></DIV>
<HR> _______________________________________________<BR>Quickfix-developers mailing list<BR><A onclick="Popup.composeWindow('pcompose.php?sendto=Quickfix-developers%40lists.sourceforge.net'); return false;" href="#Compose">Quickfix-developers<B></B>@lists.sourceforge.net</A><BR><A href="https://lists.sourceforge.net/lists/listinfo/quickfix-developers" target=_blank><a href="https://lists.sourceforge.net/lists/listinfo/quickfix-developers">https://lists.sourceforge.net/lists/listinfo/quickfix-developers</a></A> </BLOCKQUOTE></body></html>
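On John's question of whether the router could have been coded differently: one general pattern, independent of QuickFIX itself, is to keep the incoming-message callback from ever calling a potentially blocking send directly. The sketch below is only an illustration in Python and not taken from the thread or from the QuickFIX API. `ClientFanout`, `send`, and `max_pending` are hypothetical names, and `send` stands in for whatever call actually delivers a message to one client. Each client gets its own bounded queue and worker thread, so a client that stops reading can stall only its own worker.

```python
import queue
import threading


class ClientFanout:
    """Hypothetical fan-out helper; not part of QuickFIX.

    Each client has a bounded queue drained by its own worker thread,
    so a stalled client cannot block the thread that calls publish().
    """

    def __init__(self, send, max_pending=1000):
        self._send = send                # assumed: callable(client_id, message), may block
        self._max_pending = max_pending  # assumed cap on queued messages per client
        self._queues = {}
        self._lock = threading.Lock()

    def add_client(self, client_id):
        """Register a client and start the worker that drains its queue."""
        q = queue.Queue(maxsize=self._max_pending)
        with self._lock:
            self._queues[client_id] = q
        worker = threading.Thread(
            target=self._drain, args=(client_id, q), daemon=True
        )
        worker.start()

    def _drain(self, client_id, q):
        # Only this worker blocks if the client stops accepting data.
        while True:
            message = q.get()
            self._send(client_id, message)

    def publish(self, client_ids, message):
        """Queue message for each client without blocking.

        Returns the ids of clients that were skipped because they are
        unknown or their queue is full (a sign the client is stalled).
        """
        skipped = []
        for client_id in client_ids:
            with self._lock:
                q = self._queues.get(client_id)
            if q is None:
                skipped.append(client_id)
                continue
            try:
                q.put_nowait(message)
            except queue.Full:
                skipped.append(client_id)
        return skipped
```

With something along these lines, the callback handling an exchange message would call `publish(...)` and return immediately. The list it returns gives the application a place to decide what to do about a client that has fallen behind, for example logging it or disconnecting that session. Whether dropping messages for a slow client is acceptable depends on the application; the bounded queue here is a design choice made for the sketch, not a recommendation from the thread.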