RE: [Quickfix-developers] RE: Intermittent disconnect problem
From: Bishop, B. <Bar...@gs...> - 2005-02-02 10:03:31
Hi Oren,

Thanks again for your response.

I am definitely of the opinion that nothing should be thrown away, especially useful information about a runtime event or error. So yes, it's a great idea to trap ERRNO around every socket call.

Propagating this up to the session would mean starting at a low level. I was surprised to find the call to recv() (and also socket_fionread) in Parser, but that's where it would have to start.

Regards,
barry

-----Original Message-----
From: Oren Miller [mailto:or...@qu...]
Sent: Tuesday, February 01, 2005 5:25 PM
To: Bishop, Barry; qui...@li...
Cc: 'Caleb Epstein'; 'Perez, John'
Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem

Well, not necessarily. "Dropped Connection" right now just means that the connection was dropped outside of a logout sequence. If QuickFIX knows the reason for this, the message will be preceded in the log by the reason, such as "Timed out waiting for heartbeat." The only time I can think of where QF would not know the exact reason for a disconnect is if the socket is either broken somehow, or closed by the counterparty. You can see in the SocketInitiator and SocketAcceptor onDisconnect methods that Session::disconnect is being called. This is the only place where there is not an additional message that provides a disconnect reason.

What we can do is start logging the error codes of the socket calls to get a more detailed analysis of what is happening with the socket. For instance, calling close can set the global error code to one of the following.

EBADF          The s argument is not an active descriptor.
ECONNABORTED   The connection was aborted by the remote endpoint.
ECONNREFUSED   The remote endpoint refused to continue the connection.
ECONNRESET     The remote endpoint reset the connection request.
EDESTUNREACH   Remote destination is now unreachable.
EHOSTUNREACH   Remote host is now unreachable.
ENETDOWN       Local network interface is down.
ETIMEDOUT      The connection timed out.
I think that would get you the information you need to figure out the source of the disconnect.

--oren

----- Original Message -----
From: "Bishop, Barry" <Bar...@gs...>
To: "'Oren Miller'" <or...@qu...>; <qui...@li...>
Cc: "'Caleb Epstein'" <cal...@gm...>; "'Perez, John'" <jp...@Cr...>
Sent: Tuesday, February 01, 2005 10:30 AM
Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem

> Hello Oren,
>
> I'm afraid there is no consistency as to when this happens. Sometimes
> it doesn't happen for a week, whereas it could be 8 times a week
> anywhere from 7:00AM to 9:00 PM.
>
> The amount of traffic is very low, maybe 1 or 2 messages per second at
> most.
>
> The outage doesn't last long and it's not a very big deal, but it
> would be nice to get it fixed.
>
> Can you confirm that if quickfix logs the message 'Dropped Connection'
> then it was quickfix that disconnected? I believe this to be the case,
> but it seems that the code only tests to see if a logout has been sent.
>
> Should quickfix have logged another message if the above is true?
>
> In the meantime, I will take John Perez's advice and have a good look
> through the last received message just in case this is related.
> However, the disconnect often occurs many seconds after the last
> message is sent or received.
>
> Thanks again,
> barry
>
> -----Original Message-----
> From: Oren Miller [mailto:or...@qu...]
> Sent: Tuesday, February 01, 2005 4:12 PM
> To: Bishop, Barry; qui...@li...
> Cc: 'Caleb Epstein'
> Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem
>
> Barry,
>
> Is there anything common about the times in which these disconnects
> occur? Is this a high frequency line? Is it possible you are
> overloading the socket buffer?
>
> --oren
>
> ----- Original Message -----
> From: "Bishop, Barry" <Bar...@gs...>
> To: <qui...@li...>
> Cc: "Oren Miller" <or...@qu...>; "'Caleb Epstein'" <cal...@gm...>
> Sent: Tuesday, February 01, 2005 9:37 AM
> Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem
>
>> QuickFIX Documentation:
>> http://www.quickfixengine.org/quickfix/doc/html/index.html
>> QuickFIX FAQ:
>> http://www.quickfixengine.org/wikifix/index.php?QuickFixFAQ
>> QuickFIX Support: http://www.quickfixengine.org/services.html
>>
>> Hello all,
>>
>> This is a follow up on a problem that I was having last year:
>>
>> quickfix seemingly disconnects from its peer without indicating why.
>>
>> We've upgraded our system to quickfix 1.9.4 in the hope of getting
>> more useful messages, but we don't appear to. I've had a look through
>> the code and I can't see quite how this could happen. However it
>> does.
>>
>> To reiterate, at some random time quickfix disconnects the TCP
>> session from its peer and logs this message in the event log:
>>
>> 20050201-13:33:27 : Dropped Connection
>>
>> This message indicates that quickfix initiated the disconnect, but it
>> does not say why. The inbound and outbound messages all look fine and
>> usually there have been a few seconds since the last message was sent
>> anyway.
>>
>> What happens next is the usual reconnect, logon and resend request.
>> Everything continues after this. Incidentally, since upgrading from
>> quickfix 1.4.0 to 1.9.4 this reconnect/resync is a whole order of
>> magnitude better behaved.
>>
>> However, the mysterious disconnect still occurs.
>>
>> Has anyone else seen anything like this?
>> Can anyone give me any suggestions as to how to track down the
>> problem?
>>
>> We are running quickfix 1.9.4 on solaris 5.8
>> quickfix was built with GCC 3.2.2
>> We connect using SocketInitiator
>>
>> Thanks in advance,
>> barry
>>
>> Here are some excerpts from our logs:
>>
>> EVENT LOG
>> =========
>> 20050201-13:33:27 : Dropped Connection
>> 20050201-13:33:29 : Connecting to XXX.XXX.XXX.XXX on port YYYY
>> 20050201-13:33:29 : Connection succeeded
>> 20050201-13:33:29 : Initiated logon request
>> 20050201-13:33:31 : Received logon response
>>
>> INCOMING
>> ========
>> The last message before disconnecting
>> 8=FIX.4.2|9=0183|35=R|115=2126|34=8349|49=CCCCCC|56=BBBBBB|52=20050201-13:32:49|122=20050201-13:33:23|116=10101010101010101|144=ZZZZZZZ|131=200502017287|146=1|55=BBBBBB|48=773670|22=108|38=100|10=146|
>>
>> The logon response
>> 8=FIX.4.2|9=0067|35=A|34=8351|49=CCCCCC|56=BBBBBB|52=20050201-13:32:54|98=0|108=30|10=004|
>>
>> OUTGOING
>> ========
>> The last message before disconnecting
>> 8=FIX.4.2|9=290|35=S|34=8232|49=BBBBBB|52=20050201-13:33:23.582|56=CCCCCC|128=2126|129=10101010101010101|145=ZZZZZZZ|22=108|48=773670|55=GSAMFFT|107=description|117=id|131=txn|132=1|133=2|134=50000|135=50000|167=OPT|200=200101|201=1|202=1.1|205=20|206=L|231=0.01|10=181|
>>
>> The logon after the disconnect
>> 8=FIX.4.2|9=71|35=A|34=8233|49=GSAMFFT|52=20050201-13:33:29.210|56=CATSOS|98=0|108=30|10=098|
>>
>> APPLICATION LOG
>> ===============
>> Tue Feb 1 13:33:23:528 GMT+00:00 2005|Received: quickfix.fix42.QuoteRequest
>> Tue Feb 1 13:33:23:528 GMT+00:00 2005|toApp, SessionID=FIX.4.2:BBBBBB->CCCCCC, Message=quickfix.fix42.Quote
>> Tue Feb 1 13:33:27:117 GMT+00:00 2005|onLogout, SessionID=FIX.4.2:BBBBBB->CCCCCC
>>
>> -----Original Message-----
>> From: Bishop, Barry
>> Sent: Tuesday, November 30, 2004 08:11 AM
>> To: or...@qu... [mailto:or...@qu...]
>> Cc: 'qui...@li...'
>> Subject: RE: [Quickfix-developers] Intermittent disconnect problem
>>
>> Hello Oren,
>>
>> Thanks for the reply.
>>
>> Sounds to me like I should try version 1.9.2 or later in our
>> production environment. I have been unable to reproduce the
>> mysterious disconnect in our QA system to the same client, but this
>> is not surprising as it is so infrequent. I have been simulating it
>> by breaking something else in the chain (which would appear as a
>> client disconnect) so this would explain the lack of an explanation
>> from quickfix.
>>
>> I will try this over the next few days and report back.
>>
>> Thanks again,
>> barry
>>
>> -----Original Message-----
>> From: or...@qu... [mailto:or...@qu...]
>> Sent: Monday, November 29, 2004 7:56 PM
>> To: Bishop, Barry
>> Cc: 'qui...@li...'
>> Subject: RE: [Quickfix-developers] Intermittent disconnect problem
>>
>> Barry,
>>
>> For every disconnect that QuickFIX initiates, there should be a
>> reason provided (not with 1.4.0, but with the new releases). With
>> 1.9.4 (available now), QuickFIX also displays a "Dropped Connection"
>> message if the disconnect is initiated by the peer (1.9.2 does not
>> differentiate). That should help you to verify if it is QuickFIX
>> that is initiating the disconnect. I don't think there are any more
>> cases where QuickFIX initiates a disconnect without providing a
>> reason. If the counterparty drops the connection, then unless they
>> provide information in the form of a reject or logoff text, there is
>> little QuickFIX can do to determine the cause. The best that we can
>> probably do is report whether the socket was dropped gracefully, and
>> therefore intentionally, or if it was an abnormal disconnect of some
>> sort.
>>
>> Is there anything significantly different about this new client?
>> Do their logs reveal anything about the nature of the disconnect?
>>
>> --oren
>>
>>> 1) Anyone have any idea what's going on?
>>> 2) Is there a way to increase the amount of detail in log messages,
>>> especially those to do with disconnection events?
>>> 3) What sort of thing would cause quickfix to disconnect without
>>> saying why?
>>>
>>> Thanks in advance,
>>> barry
>>
>> _______________________________________________
>> Quickfix-developers mailing list
>> Qui...@li...
>> https://lists.sourceforge.net/lists/listinfo/quickfix-developers
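
The idea discussed in this thread, capturing the socket error code at the point of the call and carrying it up to the "Dropped Connection" log line, could look roughly like the sketch below. It is only an illustration under stated assumptions: it wraps the POSIX recv() call, since that is the call mentioned as living in Parser, and the names RecvResult, recvWithErrno and describeDisconnect are hypothetical and are not part of QuickFIX. It also distinguishes a graceful close by the peer (recv() returning 0) from an abnormal failure (recv() returning -1 with errno set).

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical result type (not a QuickFIX type): the recv() return value
// together with the errno value captured right after the call.
struct RecvResult {
    ssize_t bytes;  // > 0: bytes read, 0: peer closed gracefully, -1: error
    int error;      // errno if bytes < 0, otherwise 0
};

// Call recv() and save errno immediately, before any later library call
// (logging, cleanup, close) has a chance to overwrite it.
inline RecvResult recvWithErrno(int fd, char* buffer, std::size_t length) {
    errno = 0;
    ssize_t n = ::recv(fd, buffer, length, 0);
    RecvResult result;
    result.bytes = n;
    result.error = (n < 0) ? errno : 0;
    return result;
}

// Build the text that could accompany a "Dropped Connection" event.
// Returns an empty string when data was read and there is nothing to report.
inline std::string describeDisconnect(const RecvResult& result) {
    if (result.bytes == 0) {
        return "Dropped Connection: peer closed the socket";
    }
    if (result.bytes < 0) {
        return std::string("Dropped Connection: ") + std::strerror(result.error);
    }
    return std::string();
}
```

A caller at the session level would log describeDisconnect(result) instead of the bare "Dropped Connection" string. The text after the colon comes from strerror(), so its exact wording depends on the platform.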