Thread: RE: [Quickfix-developers] RE: Intermittent disconnect problem
From: Bishop, B. <Bar...@gs...> - 2004-12-01 13:51:59
|
Hi Caleb,

Thanks for the advice. I might try switching to this, but at the moment I am using gcc 2.95.2 (old, I know). I'm also using STLport 4.6.2 and libxml 2.5.10. This is all on top of SunOS 5.8.

At the moment I'm having another go at using the configure script. It all starts going wrong from here:

    configure --with-stlport=/home/bishoba/qf/STLport-4.6.2

For example, the configure script builds Makefiles with incorrect include directories:

    /home/bishoba/qf/STLport-4.6.2/include/stlport

but there is no 'include' subdirectory under STLport-4.6.2 (the headers live in STLport-4.6.2/stlport). I can edit configure to make this right and re-run it.

I'm grappling with this now:

    /bin/bash ../../../libtool --mode=compile g++ -DHAVE_CONFIG_H -I. -I. -I../../.. -I.. -g -O2 -I/home/bishoba/qf/STLport-4.6.2/stlport -I/home/bishoba/libxmlbin/GPlxml/reloc/libxml/include/libxml2 -I/opt/JDK-1.3/j2se/include -I/opt/JDK-1.3/j2se/include/solaris -O0 -g -c -o FieldBaseTestCase.lo `test -f 'FieldBaseTestCase.cpp' || echo './'`FieldBaseTestCase.cpp
    g++ -DHAVE_CONFIG_H -I. -I. -I../../.. -I.. -g -O2 -I/home/bishoba/qf/STLport-4.6.2/stlport -I/home/bishoba/libxmlbin/GPlxml/reloc/libxml/include/libxml2 -I/opt/JDK-1.3/j2se/include -I/opt/JDK-1.3/j2se/include/solaris -O0 -g -c FieldBaseTestCase.cpp -Wp,-MD,.deps/FieldBaseTestCase.TPlo -fPIC -DPIC -o FieldBaseTestCase.lo
    In file included from /home/bishoba/qf/STLport-4.6.2/stlport/stl/_threads.h:233,
                     from /home/bishoba/qf/STLport-4.6.2/stlport/stl/_alloc.h:64,
                     from /home/bishoba/qf/STLport-4.6.2/stlport/stdexcept:45,
                     from ../../../CPPTest/Exception.h:4,
                     from ../../../CPPTest/Test.h:4,
                     from ../../../CPPTest/TestCase.h:4,
                     from FieldBaseTestCase.h:25,
                     from FieldBaseTestCase.cpp:27:
    /usr/include/synch.h:78: type specifier omitted for parameter
    /usr/include/synch.h:78: parse error before `*'

The offending line is this:

    /usr/include/synch.h:78: int _lwp_cond_timedwait(lwp_cond_t *, lwp_mutex_t *, timestruc_t *);

I spent three weeks on a string of problems like this last year and never got it to work. If you don't mind me asking, did you have difficulties like this? Maybe I am just incompetent.

Thanks anyway,
barry

-----Original Message-----
From: Caleb Epstein [mailto:cal...@gm...]
Sent: Wednesday, December 01, 2004 1:24 PM
To: Oren Miller
Cc: Bishop, Barry; qui...@li...
Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem

On Wed, 1 Dec 2004 05:41:45 -0600, Oren Miller <or...@qu...> wrote:
> There are a significant number of Solaris users. You don't mention if
> you are building with gcc or SunPRO? There were a lot of
> contributions recently concerning getting QF to build with the SunPRO
> compiler. What sort of problems do you run into?

Barry, if you use the SunPRO C++ compiler, you NEED to follow the directions here:

http://tinyurl.com/5sls4

--
Caleb Epstein
caleb dot epstein at gmail dot com
|
From: Bishop, B. <Bar...@gs...> - 2004-12-01 14:14:47
|
Sounds like you've got me sussed! I think I'll take your advice, as I've tried building without STLport and it won't build. All the includes are wrong (e.g. I only have <limits.h>, not <limits>), so I'll need to seek out a later gcc. This is not easy in my organisation.

Thank you very much indeed for responding.

barry

-----Original Message-----
From: Caleb Epstein [mailto:cal...@gm...]
Sent: Wednesday, December 01, 2004 2:06 PM
To: Bishop, Barry
Cc: Oren Miller; qui...@li...
Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem

On Wed, 1 Dec 2004 13:51:41 -0000, Bishop, Barry <bar...@gs...> wrote:
> If you don't mind me asking, did you have difficulties like this?
> Maybe I am just incompetent.

No, just a glutton for punishment :)

Seriously though, gcc 2.95 is just too old if you want to compile standards-conforming C++ code. You should try the latest 3.3 or 3.4 version (we use 3.3.2 here with success on Linux and Solaris). You can drop STLport with these newer versions as well. Your life becomes much simpler, at the expense of needing to recompile all your C++ code because name mangling changed.

--
Caleb Epstein
caleb dot epstein at gmail dot com
|
From: Bishop, B. <Bar...@gs...> - 2004-12-03 16:36:41
|
Hi Caleb,

Thanks again for your advice. I've been distracted by other issues, but still managed to build quickfix 1.9.3 and 1.9.4 using the configure-generated makefiles and gcc 3.2.2.

At the end of the day, I think it is safe to say that this does not work with gcc 2.95.2. You get a warning when running configure, but the warning doesn't do it justice! There are so many errors downstream that it might be better not to even attempt to support gcc 2.95.x.

Oren,

The 1.9.4 build is looking good and we have tested it in our QA environment. It reconnects very well (better than 1.4.0), but it will be some days before we run it in our production environment. It would be nice if the strange disconnect problem goes away with this version, but it would be equally nice to see the better logging for these kinds of events and learn something about what is happening. Again, I will report back next week.

Thanks again for everyone's input and thanks for a really good product. I know I've had problems building it, but once past that stage the software has performed very well indeed. I would concur with one of your previous postings about the uptake of quickfix within the finance sector. There are a lot of people using it, which is most encouraging.

Regards,
barry
|
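Barry's conclusion that gcc 2.95.x is not worth supporting could be enforced at compile time rather than through a configure warning that is easy to miss. A minimal sketch, assuming a hypothetical header (compiler_check.h) that every source file includes first: the #error stops the build at the first translation unit instead of letting it fail later with errors deep in system or STLport headers. The gcc 3 cut-off follows Caleb's advice and the 3.2.2 build reported above; it is not a documented QuickFIX requirement.

```cpp
// compiler_check.h (hypothetical name): fail fast on compilers that are
// too old to build standards-conforming C++ such as QuickFIX.
#ifndef COMPILER_CHECK_H
#define COMPILER_CHECK_H

// gcc 2.95.x predates the 3.x C++ front end and ABI change; stop here with
// one clear message rather than hundreds of downstream errors.
#if defined(__GNUC__) && (__GNUC__ < 3)
#error "gcc 2.95.x is not supported; use gcc 3.x or later (3.2.2 and 3.3.2 are reported to work)"
#endif

// Encodes the compiler version as MMmmpp (e.g. 3.2.2 -> 30202) so it can be
// written to a startup log. Returns 0 when not built with a GNU-compatible
// compiler. __GNUC_PATCHLEVEL__ only exists from gcc 3.0 on, which the
// #error above already guarantees.
inline int gcc_version_number() {
#if defined(__GNUC__)
    return __GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__;
#else
    return 0;
#endif
}

#endif  // COMPILER_CHECK_H
```

Putting the guard in one shared header keeps the policy in a single place; clang also defines __GNUC__ (as 4 or higher), so it passes the check.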
From: Bishop, B. <Bar...@gs...> - 2005-02-01 15:38:10
|
Hello all,

This is a follow-up on a problem that I was having last year: quickfix seemingly disconnects from its peer without indicating why.

We've upgraded our system to quickfix 1.9.4 in the hope of getting more useful messages, but we don't appear to get them. I've had a look through the code and I can't quite see how this could happen. However, it does.

To reiterate, at some random time quickfix disconnects the TCP session from its peer and logs this message in the event log:

20050201-13:33:27 : Dropped Connection

This message indicates that quickfix initiated the disconnect, but it does not say why. The inbound and outbound messages all look fine, and there have usually been a few seconds since the last message was sent anyway.

What happens next is the usual reconnect, logon and resend request. Everything continues after this. Incidentally, since upgrading from quickfix 1.4.0 to 1.9.4, this reconnect/resync is an order of magnitude better behaved.

However, the mysterious disconnect still occurs.

Has anyone else seen anything like this? Can anyone give me any suggestions as to how to track down the problem?
We are running quickfix 1.9.4 on Solaris 5.8.
quickfix was built with GCC 3.2.2.
We connect using SocketInitiator.

Thanks in advance,
barry

Here are some excerpts from our logs:

EVENT LOG
=========
20050201-13:33:27 : Dropped Connection
20050201-13:33:29 : Connecting to XXX.XXX.XXX.XXX on port YYYY
20050201-13:33:29 : Connection succeeded
20050201-13:33:29 : Initiated logon request
20050201-13:33:31 : Received logon response

INCOMING
========
The last message before disconnecting
8=FIX.4.2|9=0183|35=R|115=2126|34=8349|49=CCCCCC|56=BBBBBB|52=20050201-13:32:49|122=20050201-13:33:23|116=10101010101010101|144=ZZZZZZZ|131=200502017287|146=1|55=BBBBBB|48=773670|22=108|38=100|10=146|

The logon response
8=FIX.4.2|9=0067|35=A|34=8351|49=CCCCCC|56=BBBBBB|52=20050201-13:32:54|98=0|108=30|10=004|

OUTGOING
========
The last message before disconnecting
8=FIX.4.2|9=290|35=S|34=8232|49=BBBBBB|52=20050201-13:33:23.582|56=CCCCCC|128=2126|129=10101010101010101|145=ZZZZZZZ|22=108|48=773670|55=GSAMFFT|107=description|117=id|131=txn|132=1|133=2|134=50000|135=50000|167=OPT|200=200101|201=1|202=1.1|205=20|206=L|231=0.01|10=181|

The logon after the disconnect
8=FIX.4.2|9=71|35=A|34=8233|49=GSAMFFT|52=20050201-13:33:29.210|56=CATSOS|98=0|108=30|10=098|

APPLICATION LOG
===============
Tue Feb 1 13:33:23:528 GMT+00:00 2005|Received: quickfix.fix42.QuoteRequest
Tue Feb 1 13:33:23:528 GMT+00:00 2005|toApp, SessionID=FIX.4.2:BBBBBB->CCCCCC, Message=quickfix.fix42.Quote
Tue Feb 1 13:33:27:117 GMT+00:00 2005|onLogout, SessionID=FIX.4.2:BBBBBB->CCCCCC

-----Original Message-----
From: Bishop, Barry
Sent: Tuesday, November 30, 2004 08:11 AM
To: or...@qu... [mailto:or...@qu...]
Cc: 'qui...@li...'
Subject: RE: [Quickfix-developers] Intermittent disconnect problem

Hello Oren,

Thanks for the reply.

Sounds to me like I should try version 1.9.2 or later in our production environment.
I have been unable to reproduce the mysterious disconnect to the same client in our QA system, but this is not surprising as it is so infrequent. I have been simulating it by breaking something else in the chain (which would appear as a client disconnect), so this would explain the lack of an explanation from quickfix.

I will try this over the next few days and report back.

Thanks again,
barry

-----Original Message-----
From: or...@qu... [mailto:or...@qu...]
Sent: Monday, November 29, 2004 7:56 PM
To: Bishop, Barry
Cc: 'qui...@li...'
Subject: RE: [Quickfix-developers] Intermittent disconnect problem

Barry,

For every disconnect that QuickFIX initiates, there should be a reason provided (not with 1.4.0, but with the new releases). With 1.9.4 (available now), QuickFIX also displays a "Dropped Connection" message if the disconnect is initiated by the peer (1.9.2 does not differentiate). That should help you to verify whether it is QuickFIX that is initiating the disconnect. I don't think there are any more cases where QuickFIX initiates a disconnect without providing a reason.

If the counterparty drops the connection, then unless they provide information in the form of a reject or logoff text, there is little QuickFIX can do to determine the cause. The best that we can probably do is report whether the socket was dropped gracefully, and therefore intentionally, or if it was an abnormal disconnect of some sort.

Is there anything significantly different about this new client? Do their logs reveal anything about the nature of the disconnect?

--oren

> 1) Anyone have any idea what's going on?
> 2) Is there a way to increase the amount of detail in log messages,
> especially those to do with disconnection events?
> 3) What sort of thing would cause quickfix to disconnect without
> saying why?
>
> Thanks in advance,
> barry

_______________________________________________
Quickfix-developers mailing list
Qui...@li...
https://lists.sourceforge.net/lists/listinfo/quickfix-developers
|
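Oren's point that the best QuickFIX can do is report whether the socket was dropped gracefully or abnormally maps onto the return value of the POSIX recv() call: 0 means the peer performed an orderly shutdown, while -1 with an errno such as ECONNRESET signals an abnormal drop. The following is a sketch under that assumption, not QuickFIX's actual socket code; read_and_classify, ReadOutcome and ReadResult are hypothetical names.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical classification of how a single socket read ended.
enum class ReadOutcome { Data, GracefulClose, WouldBlock, Abnormal };

struct ReadResult {
    ReadOutcome outcome;
    int error;           // errno value when outcome == Abnormal, else 0
    std::string detail;  // human-readable text suitable for an event log
};

// Reads once from fd into buf and classifies the result:
//   recv() >  0 : data arrived
//   recv() == 0 : the peer performed an orderly shutdown (FIN received)
//   recv() <  0 : an error; e.g. ECONNRESET means the connection was reset
ReadResult read_and_classify(int fd, char* buf, std::size_t len) {
    ssize_t n;
    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);  // retry if interrupted by a signal

    if (n > 0)
        return {ReadOutcome::Data, 0, "read " + std::to_string(n) + " bytes"};
    if (n == 0)
        return {ReadOutcome::GracefulClose, 0, "peer closed the connection gracefully"};

    int err = errno;  // save immediately, before anything else can overwrite it
    if (err == EAGAIN || err == EWOULDBLOCK)
        return {ReadOutcome::WouldBlock, 0, "no data available yet (non-blocking socket)"};
    return {ReadOutcome::Abnormal, err,
            std::string("abnormal disconnect: ") + std::strerror(err)};
}
```

To exercise it, a socketpair() whose writing end is shut down with shutdown(fd, SHUT_WR) makes the reading end report GracefulClose once any pending data has been consumed.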
From: Oren M. <or...@qu...> - 2005-02-01 16:11:39
|
Barry,

Is there anything common about the times at which these disconnects occur? Is this a high-frequency line? Is it possible you are overloading the socket buffer?

--oren

----- Original Message -----
From: "Bishop, Barry" <Bar...@gs...>
To: <qui...@li...>
Cc: "Oren Miller" <or...@qu...>; "'Caleb Epstein'" <cal...@gm...>
Sent: Tuesday, February 01, 2005 9:37 AM
Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem

> QuickFIX Documentation:
> http://www.quickfixengine.org/quickfix/doc/html/index.html
> QuickFIX FAQ: http://www.quickfixengine.org/wikifix/index.php?QuickFixFAQ
> QuickFIX Support: http://www.quickfixengine.org/services.html
|
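One way to look into Oren's question about overloading the socket buffer is to ask the kernel how large the send and receive buffers actually are, using getsockopt() with SO_SNDBUF and SO_RCVBUF. A minimal sketch, assuming you can reach the connected socket's file descriptor; SocketBuffers, query_socket_buffers and request_send_buffer are hypothetical names. The reported values are kernel-dependent: Linux, for instance, doubles a value set with setsockopt() and reports the doubled figure.

```cpp
#include <sys/types.h>
#include <sys/socket.h>

// Buffer sizes reported by the kernel for one socket; -1 if a query failed.
struct SocketBuffers {
    int send_bytes;
    int recv_bytes;
};

// Queries SO_SNDBUF and SO_RCVBUF with getsockopt().
SocketBuffers query_socket_buffers(int fd) {
    SocketBuffers b{-1, -1};
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &value, &len) == 0)
        b.send_bytes = value;
    len = sizeof value;  // reset: getsockopt() may modify len
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &value, &len) == 0)
        b.recv_bytes = value;
    return b;
}

// Requests a larger send buffer. The kernel may round, cap or (on Linux)
// double the value, so re-query afterwards to see what was actually applied.
bool request_send_buffer(int fd, int bytes) {
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) == 0;
}
```

At the 1 or 2 messages per second Barry reports, buffer exhaustion seems unlikely, but logging these values once at connect time costs little and rules it in or out.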
From: Bishop, B. <Bar...@gs...> - 2005-02-01 16:31:10
|
Hello Oren,

I'm afraid there is no consistency as to when this happens. Sometimes it doesn't happen for a week, whereas other weeks it can happen 8 times, anywhere from 7:00 AM to 9:00 PM.

The amount of traffic is very low, maybe 1 or 2 messages per second at most.

The outage doesn't last long and it's not a very big deal, but it would be nice to get it fixed.

Can you confirm that if quickfix logs the message 'Dropped Connection' then it was quickfix that disconnected? I believe this to be the case, but it seems that the code only tests to see whether a logout has been sent.

If the above is true, should quickfix have logged another message?

In the meantime, I will take John Perez's advice and have a good look through the last received message, just in case this is related. However, the disconnect often occurs many seconds after the last message is sent or received.

Thanks again,
barry
|
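Since John Perez's suggestion is to look closely at the last received message, one mechanical check is to recompute its BodyLength (tag 9) and CheckSum (tag 10). In FIX, the checksum is the sum of all bytes up to and including the SOH delimiter that precedes the 10= field, modulo 256, written as three digits; the body length counts the bytes after the SOH that ends the 9= field, up to and including that same SOH. The sketch below assumes the excerpts use '|' in place of SOH, as the logs in this thread do; to_soh, fix_checksum and fix_body_length are hypothetical helper names. Because those excerpts are anonymized and were line-wrapped by mail, their recomputed values should not be expected to match the logged 9= and 10= fields.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Replaces the '|' used in log excerpts with the real SOH delimiter (0x01).
std::string to_soh(std::string msg) {
    for (char& c : msg)
        if (c == '|') c = '\x01';
    return msg;
}

// CheckSum (tag 10): sum of every byte before "10=", including the SOH that
// ends the previous field, modulo 256, as three digits. "" if no 10= field.
std::string fix_checksum(const std::string& soh_msg) {
    std::size_t pos = soh_msg.find("\x01" "10=");  // split literal: "\x0110" would be one escape
    if (pos == std::string::npos) return "";
    unsigned sum = 0;
    for (std::size_t i = 0; i <= pos; ++i)  // <= pos includes that SOH
        sum += static_cast<unsigned char>(soh_msg[i]);
    char out[4];
    std::snprintf(out, sizeof out, "%03u", sum % 256);
    return out;
}

// BodyLength (tag 9): bytes after the SOH that ends the 9= field, up to and
// including the SOH before 10=. Returns -1 if the message is malformed.
long fix_body_length(const std::string& soh_msg) {
    std::size_t nine = soh_msg.find("\x01" "9=");
    if (nine == std::string::npos) return -1;
    std::size_t body_start = soh_msg.find('\x01', nine + 1);  // SOH ending 9=
    std::size_t ten = soh_msg.find("\x01" "10=");
    if (body_start == std::string::npos || ten == std::string::npos || ten < body_start)
        return -1;
    return static_cast<long>(ten - body_start);
}
```

For example, for the minimal heartbeat 8=FIX.4.2|9=5|35=0|10=161| (after to_soh), fix_checksum returns "161" and fix_body_length returns 5, matching the values in the message.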
From: Oren M. <or...@qu...> - 2005-02-01 17:25:29
|
Well, not necessarilly. "Dropped Connection" right now just means that the connection was droppen outside of a logout sequence. If QuickFIX knows the reason for this, it will be preceeded in the log for the reason, such as "Timed out waiting for heartbeat." The only time I can think of where QF would not know the exact reason for a disconnect is if the socket is either broken somehow, or closed by the counterparty. You can see in the SocketInitiator and SocketAcceptor onDisconnect methods, that Session::disconnect is being called. This is the only place where there is not an additional message that provides a disconnect reason. What we can do is start logging the error codes of the socket calls to get a more detailed analysis on what is hapenning with the socket. For instance, calling close can set the global error code to one of the following. EBADF The s argument is not an active descriptor. ECONNABORTED The connection was aborted by the remote endpoint. ECONNREFUSED The remote endpoint refused to continue the connection. ECONNRESET The remote endpoint reset the connection request. EDESTUNREACH Remote destination is now unreachable. EHOSTUNREACH Remote host is now unreachable. ENETDOWN Local network interface is down. ETIMEDOUT The connection timed out. I think that would get you the information you need to figure out the source of the disconnect. --oren ----- Original Message ----- From: "Bishop, Barry" <Bar...@gs...> To: "'Oren Miller'" <or...@qu...>; <qui...@li...> Cc: "'Caleb Epstein'" <cal...@gm...>; "'Perez, John'" <jp...@Cr...> Sent: Tuesday, February 01, 2005 10:30 AM Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem > Hello Oren, > > I'm afraid there is no consistency as to when this happens. Sometimes it > doesn't happen for a week, whereas it could be 8 times a week anywhere > from > 7:00AM to 9:00 PM. > > The amount of traffic is very low, maybe 1 or 2 messages per second at > most. 
> > The outage doesn't last long and it's not a very big deal, but it would be > nice to get it fixed. > > Can you confirm that if quickfix logs the message 'Dropped Connection' > then > it was quickfix that disconnected? I believe this to be the case, but it > seems that the code only tests to see if a logout has been sent. > > Should quickfix have logged another message if the above is true? > > In the meantime, I will take John Perez's advice and have a good look > through the last received message just in case this is related. However, > the > disconnect often occurs many seconds after the last message is sent or > received. > > Thanks again, > barry > > > > -----Original Message----- > From: Oren Miller [mailto:or...@qu...] > Sent: Tuesday, February 01, 2005 4:12 PM > To: Bishop, Barry; qui...@li... > Cc: 'Caleb Epstein' > Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem > > > Barry, > > Is there anything common about the times in which these disconnects occur? > Is this a high frequency line? Is it possible you are overloading the > socket buffer? > > --oren > > ----- Original Message ----- > From: "Bishop, Barry" <Bar...@gs...> > To: <qui...@li...> > Cc: "Oren Miller" <or...@qu...>; "'Caleb Epstein'" > <cal...@gm...> > Sent: Tuesday, February 01, 2005 9:37 AM > Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem > > >> QuickFIX Documentation: >> http://www.quickfixengine.org/quickfix/doc/html/index.html >> QuickFIX FAQ: http://www.quickfixengine.org/wikifix/index.php?QuickFixFAQ >> QuickFIX Support: http://www.quickfixengine.org/services.html >> >> Hello all, >> >> This is a follow up on a problem that I was having last year: >> >> quickfix seemingly disconnects from its peer without indicating why. >> >> We've upgraded our system to quickfix 1.9.4 in the hope of getting >> more useful messages, but we don't appear to. I've had a look through >> the code and I can't see quite how this could happen. However it does. 
>> >> To reiterate, at some random time quickfix disconnects the TCP session >> from >> its peer and logs this message in the event log: >> >> 20050201-13:33:27 : Dropped Connection >> >> This message indicates that quickfix initiated the disconnect, but it >> does not say why. The inbound and outbound messages all look fine and >> usually there have been a few seconds since the last message was sent >> anyway. >> >> What happens next is the usual reconnect, logon and resend request. >> Everything continues after this. Incidentally, since upgrading from >> quickfix 1.4.0 to 1.9.4 this reconnect/resync is a whole order of >> magnitude better behaved. >> >> However, the mysterious disconnect still occurs. >> >> Has anyone else seen anything like this? >> Can anyone give me any suggestions as to how to track down the >> problem? >> >> We are running quickfix 1.9.4 on solaris 5.8 >> quickfix was built with GCC 3.2.2 >> We connect using SocketInitiator >> >> Thanks in advance, >> barry >> >> >> >> Here are some excerpts from our logs: >> >> EVENT LOG >> ========= >> 20050201-13:33:27 : Dropped Connection >> 20050201-13:33:29 : Connecting to XXX.XXX.XXX.XXX on port YYYY >> 20050201-13:33:29 : Connection succeeded 20050201-13:33:29 : Initiated >> logon request 20050201-13:33:31 : Received logon response >> >> >> INCOMING >> ======== >> The last message before disconnecting >> 8=FIX.4.2|9=0183|35=R|115=2126|34=8349|49=CCCCCC|56=BBBBBB|52=20050201 >> -13:32 >> > :49|122=20050201-13:33:23|116=10101010101010101|144=ZZZZZZZ|131=200502017287 >> |146=1|55=BBBBBB|48=773670|22=108|38=100|10=146| >> >> The logon response >> 8=FIX.4.2|9=0067|35=A|34=8351|49=CCCCCC|56=BBBBBB|52=20050201-13:32:54 >> |98=0| >> 108=30|10=004| >> >> >> OUTGOING >> ======== >> The last message before disconnecting >> 8=FIX.4.2|9=290|35=S|34=8232|49=BBBBBB|52=20050201-13:33:23.582|56=CCC >> CCC|12 >> > 8=2126|129=10101010101010101|145=ZZZZZZZ|22=108|48=773670|55=GSAMFFT|107=description|117=id|131=txn|132=1|133=2|134=50000| >> >> The logon after the disconnect >> 135=50000|167=OPT|200=200101|201=1|202=1.1|205=20|206=L|231=0.01|10=18 >> 1| >> > 8=FIX.4.2|9=71|35=A|34=8233|49=GSAMFFT|52=20050201-13:33:29.210|56=CATSOS|98 >> =0|108=30|10=098| >> >> >> APPLICATION LOG >> =============== >> Tue Feb 1 13:33:23:528 GMT+00:00 2005|Received: >> quickfix.fix42.QuoteRequest >> Tue Feb 1 13:33:23:528 GMT+00:00 2005|toApp, >> SessionID=FIX.4.2:BBBBBB->CCCCCC, Message=quickfix.fix42.Quote >> Tue Feb 1 13:33:27:117 GMT+00:00 2005|onLogout, >> SessionID=FIX.4.2:BBBBBB->CCCCCC >> >> >> >> -----Original Message----- >> From: Bishop, Barry >> Sent: Tuesday, November 30, 2004 08:11 AM >> To: or...@qu... [mailto:or...@qu...] >> Cc: 'qui...@li...' >> Subject: RE: [Quickfix-developers] Intermittent disconnect problem >> >> Hello Oren, >> >> Thanks for the reply. >> >> Sounds to me like I should try version 1.9.2 or later in our >> production environment. I have been unable to reproduce the mysterious >> disconnect in our QA system to the same client, but this is not >> surprising as it is so infrequent. I have been simulating it by >> breaking something else in the chain (which would appear as a client >> disconnect) so this would explain the lack of an explanation from >> quickfix. >> >> I will try this over the next few days and report back. >> >> Thanks again, >> barry >> >> >> -----Original Message----- >> From: or...@qu... [mailto:or...@qu...] >> Sent: Monday, November 29, 2004 7:56 PM >> To: Bishop, Barry >> Cc: 'qui...@li...' >> Subject: RE: [Quickfix-developers] Intermittent disconnect problem >> >> >> Barry, >> >> For every disconnect that QuickFIX initiates, there should be a reason >> provided (not with 1.4.0, but with the new releases). With 1.9.4 >> (available now), QuickFIX also displays a "Dropped Connection" message >> if the disconnect is initiated by the peer (1.9.2, does not >> differentiate).
That should help you to verify if it is QuickFIX that >> is initiating the disconnect. I don't think there are any more cases >> where QuickFIX initiates >> a disconnect without providing a reason. If the counterparty drops the >> connection, then unless they provide information in the form of a reject >> or >> logoff text, there is little QuickFIX can do to determine the cause. The >> best that we can probably do is report whether the socket was dropped >> gracefully, and therefore intentionally, or if it was an abnormal >> disconnect >> of some sort. >> >> Is there anything significantly different about this new client? Do >> their >> logs reveal anything about the nature of the disconnect? >> >> --oren >> >>> 1) Anyone have any idea what's going on? >>> 2) Is there a way to increase the amount of detail in log messages, >>> especially those to do with disconnection events? >>> 3) What sort of thing would cause quickfix to disconnect without >>> saying why? >>> >>> Thanks in advance, >>> barry >> >> >> >> ------------------------------------------------------- >> SF email is sponsored by - The IT Product Guide >> Read honest & candid reviews on hundreds of IT Products from real >> users. Discover which products truly live up to the hype. Start >> reading now. http://productguide.itmanagersjournal.com/ >> _______________________________________________ >> Quickfix-developers mailing list >> Qui...@li... >> https://lists.sourceforge.net/lists/listinfo/quickfix-developers >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by: IntelliVIEW -- Interactive >> Reporting Tool for open source databases. Create drag-&-drop reports. >> Save time by over 75%! Publish reports on the web. Export to DOC, XLS, >> RTF, etc. Download a FREE copy at >> http://www.intelliview.com/go/osdn_nl >> _______________________________________________ >> Quickfix-developers mailing list >> Qui...@li...
>> https://lists.sourceforge.net/lists/listinfo/quickfix-developers >> > |
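Oren distinguishes between a peer that closes the socket gracefully and one that drops it abnormally. That distinction can be made at the point where the engine reads from the socket. The sketch below is a minimal C++ example that assumes a plain POSIX socket. `ReadOutcome` and `readAndClassify` are hypothetical names, not QuickFIX API.

- If recv() returns 0, the peer performed an orderly shutdown.
- If it returns -1, an error occurred, and the errno value can be written to the event log.

On most POSIX systems, connection-level codes such as ECONNRESET or ETIMEDOUT are reported by recv()/send() rather than by close(). For that reason the sketch captures errno immediately after the read.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Result of one read on a session socket (hypothetical type, not QuickFIX API).
struct ReadOutcome {
  enum Kind { GOT_DATA, PEER_CLOSED, READ_ERROR };
  Kind kind;
  int err;             // errno value when kind == READ_ERROR, otherwise 0
  std::string reason;  // text suitable for the session's event log
};

// Reads once from fd and classifies the result so that a disconnect can be
// logged with a reason instead of a bare "Dropped Connection".
ReadOutcome readAndClassify(int fd, char* buf, size_t len) {
  ReadOutcome out;
  out.err = 0;
  ssize_t n = recv(fd, buf, len, 0);
  if (n > 0) {
    out.kind = ReadOutcome::GOT_DATA;
  } else if (n == 0) {
    // Orderly shutdown: the peer closed its end, so the drop was intentional.
    out.kind = ReadOutcome::PEER_CLOSED;
    out.reason = "Peer closed the connection";
  } else {
    // Abnormal drop: capture errno before any other call can overwrite it.
    out.err = errno;
    out.kind = ReadOutcome::READ_ERROR;
    out.reason = std::string("Socket error: ") + std::strerror(out.err);
  }
  return out;
}
```

A caller would write `reason` to the session's event log just before disconnecting. "Dropped Connection" would then be preceded by a line starting "Socket error:". The rest of that text comes from strerror(), so its exact wording varies by platform. On Solaris 8 the sketch typically also needs -lsocket at link time.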
From: Yihu F. <Yih...@re...> - 2005-02-01 17:59:23
|
Hi,

I understand that this discussion is about the initiator being disconnected. On a separate issue regarding the QuickFIX acceptor: the acceptor silently drops the connection, without any error message, in at least the following scenarios (see ThreadedSocketConnection.cpp::setSession()):

(1) The incoming message does not have the correct session header.
(2) The acceptor has already established a session, and a second connection from the same client tries to connect to the same port.

It also silently drops the connection if the incoming connection falls outside the session window (startTime/endTime etc.).

It would be very helpful if QuickFIX could provide appropriate error messages for these cases too.

Thanks.
-Yihu

----------------------------------------------------------------- Visit our Internet site at http://www.reuters.com Get closer to the financial markets with Reuters Messaging - for more information and to register, visit http://www.reuters.com/messaging Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd. |
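Each of the acceptor cases Yihu lists could produce a log line instead of a silent drop. The sketch below is a minimal C++ example that assumes the acceptor can see the incoming header fields and knows its session window as minutes after midnight. `AcceptorState`, `acceptorSessionId`, `inSessionWindow` and `rejectReason` are hypothetical names, not QuickFIX's ThreadedSocketConnection code.

The session ID uses the `BeginString:Sender->Target` form seen in the application log above. The incoming SenderCompID (49) and TargetCompID (56) are swapped because the acceptor names sessions from its own side.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified view of what an acceptor knows when the first
// message arrives on a new socket. This is not the QuickFIX API.
struct AcceptorState {
  std::set<std::string> configuredSessions;  // e.g. "FIX.4.2:ACCEPTOR->CLIENT"
  std::set<std::string> loggedOnSessions;    // sessions that already have a connection
  int startMinute;  // session window start, minutes after midnight
  int endMinute;    // session window end, minutes after midnight
};

// Builds the acceptor-side session ID from an incoming header: the
// counterparty's SenderCompID (49) is our TargetCompID, and vice versa.
std::string acceptorSessionId(const std::string& beginString,
                              const std::string& senderCompId,
                              const std::string& targetCompId) {
  return beginString + ":" + targetCompId + "->" + senderCompId;
}

// True if minuteOfDay lies in [start, end]; handles windows that wrap midnight.
bool inSessionWindow(int minuteOfDay, int start, int end) {
  if (start <= end) return minuteOfDay >= start && minuteOfDay <= end;
  return minuteOfDay >= start || minuteOfDay <= end;
}

// Returns an empty string if the connection may proceed; otherwise the reason
// to log before the socket is dropped.
std::string rejectReason(const AcceptorState& s, const std::string& sessionId,
                         int minuteOfDay) {
  if (s.configuredSessions.count(sessionId) == 0)
    return "No session configured for header " + sessionId;
  if (s.loggedOnSessions.count(sessionId) != 0)
    return "Session " + sessionId + " already connected; refusing second connection";
  if (!inSessionWindow(minuteOfDay, s.startMinute, s.endMinute))
    return "Connection for " + sessionId + " is outside the session window";
  return "";
}
```

The checks run in the order of Yihu's list. Each check returns as soon as it fails, so the logged reason names the first problem found.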
From: Oren M. <or...@qu...> - 2005-02-01 18:31:24
|
Yeah, for the first scenario we will be implementing a global logger. Right now, as you know, all loggers are associated with a specific session, so if something of interest cannot be associated with a session, there is no place to report it.

The second scenario will be easy to implement, since we can write the duplicate logon attempt to the original session's log. Start/end time logging will also be easy to implement.

--oren |
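Oren's plan amounts to a routing rule:

- Events that cannot be tied to a session go to a global log.
- A duplicate logon attempt goes to the log of the session it collides with.

The sketch below is a minimal C++ illustration using an in-memory log. `MemoryLog` and `LogRouter` are hypothetical names. They do not reflect QuickFIX's actual Log or LogFactory interfaces.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory log used for illustration only.
struct MemoryLog {
  std::vector<std::string> events;
  void onEvent(const std::string& text) { events.push_back(text); }
};

// Sends an event to its session's log when the session is known, and to a
// single global log otherwise (e.g. a header that matches no session).
class LogRouter {
 public:
  void addSession(const std::string& sessionId) {
    sessionLogs_[sessionId];  // creates an empty log for this session
  }
  void onEvent(const std::string& sessionId, const std::string& text) {
    std::map<std::string, MemoryLog>::iterator it = sessionLogs_.find(sessionId);
    if (it != sessionLogs_.end()) {
      it->second.onEvent(text);
    } else {
      // Tag global entries so the source of the event is still visible.
      std::string tag = sessionId.empty() ? "no session" : sessionId;
      global_.onEvent("[" + tag + "] " + text);
    }
  }
  const MemoryLog& global() const { return global_; }
  const MemoryLog& session(const std::string& id) const { return sessionLogs_.at(id); }

 private:
  std::map<std::string, MemoryLog> sessionLogs_;
  MemoryLog global_;
};
```

A duplicate logon would be reported with the existing session's ID, so it lands in that session's log. A message whose header matches nothing would be reported with an empty or unknown ID, so it lands in the global log.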
From: Yihu F. <Yih...@re...> - 2005-02-01 18:39:23
|
Sounds good.

Another alternative is to provide a callback (e.g. onError()) in the Application interface, so that the application has better control over, and a better view of, what is really going on in the FIX engine.

-Yihu

-----Original Message----- From: Oren Miller [mailto:or...@qu...] Sent: Tuesday, February 01, 2005 1:31 PM To: Yihu Fang; Bishop, Barry; qui...@li... Cc: Caleb Epstein; Perez, John Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem Yeah, for the first scenario we will be implementing a global logger. Right now as you know all loggers are associated with a specific session, so if there is something of interest that cannot be associated with a session, it doesn't have a place to report it. The second scenario will be easy to implement since we can place the duplicate logon attempt into the original session's log. Start/end time logging will also be easy to implement. --oren ----- Original Message ----- From: "Yihu Fang" <Yih...@re...> To: "Oren Miller" <or...@qu...>; "Bishop, Barry" <Bar...@gs...>; <qui...@li...> Cc: "Caleb Epstein" <cal...@gm...>; "Perez, John" <jp...@Cr...> Sent: Tuesday, February 01, 2005 11:50 AM Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem Hi, I understand that this discussion is about the initiator being disconnected. However, on a separate issue regarding the QuickFIX acceptor, the acceptor silently drops the connection without any error message in at least the following scenarios (see ThreadedSocketConnection.cpp::setSession()): (1) If the incoming message does not have the correct session header. (2) If the acceptor has already established a session and a second connection from the same client tries to connect to the same port. It also silently drops the connection if the incoming connection is outside the session window (startTime/endTime etc). It would be very helpful if QuickFIX could provide appropriate error messages for these cases too. Thanks.
-Yihu -----Original Message----- From: qui...@li... [mailto:qui...@li...] On Behalf Of Oren Miller Sent: Tuesday, February 01, 2005 12:25 PM To: Bishop, Barry; qui...@li... Cc: 'Caleb Epstein'; 'Perez, John' Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem QuickFIX Documentation: http://www.quickfixengine.org/quickfix/doc/html/index.html QuickFIX FAQ: http://www.quickfixengine.org/wikifix/index.php?QuickFixFAQ QuickFIX Support: http://www.quickfixengine.org/services.html Well, not necessarily. "Dropped Connection" right now just means that the connection was dropped outside of a logout sequence. If QuickFIX knows the reason for this, it will be preceded in the log by the reason, such as "Timed out waiting for heartbeat." The only time I can think of where QF would not know the exact reason for a disconnect is if the socket is either broken somehow, or closed by the counterparty. You can see in the SocketInitiator and SocketAcceptor onDisconnect methods that Session::disconnect is being called. This is the only place where there is not an additional message that provides a disconnect reason. What we can do is start logging the error codes of the socket calls to get a more detailed analysis of what is happening with the socket. For instance, calling close can set the global error code to one of the following. EBADF The s argument is not an active descriptor. ECONNABORTED The connection was aborted by the remote endpoint. ECONNREFUSED The remote endpoint refused to continue the connection. ECONNRESET The remote endpoint reset the connection request. EDESTUNREACH Remote destination is now unreachable. EHOSTUNREACH Remote host is now unreachable. ENETDOWN Local network interface is down. ETIMEDOUT The connection timed out. I think that would get you the information you need to figure out the source of the disconnect.
--oren ----- Original Message ----- From: "Bishop, Barry" <Bar...@gs...> To: "'Oren Miller'" <or...@qu...>; <qui...@li...> Cc: "'Caleb Epstein'" <cal...@gm...>; "'Perez, John'" <jp...@Cr...> Sent: Tuesday, February 01, 2005 10:30 AM Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem > Hello Oren, > > I'm afraid there is no consistency as to when this happens. Sometimes it > doesn't happen for a week, whereas it could be 8 times a week anywhere > from 7:00 AM to 9:00 PM. > > The amount of traffic is very low, maybe 1 or 2 messages per second at > most. > > The outage doesn't last long and it's not a very big deal, but it would be > nice to get it fixed. > > Can you confirm that if quickfix logs the message 'Dropped Connection' > then it was quickfix that disconnected? I believe this to be the case, but it > seems that the code only tests to see if a logout has been sent. > > Should quickfix have logged another message if the above is true? > > In the meantime, I will take John Perez's advice and have a good look > through the last received message just in case this is related. However, the > disconnect often occurs many seconds after the last message is sent or > received. > > Thanks again, > barry > > > > -----Original Message----- > From: Oren Miller [mailto:or...@qu...] > Sent: Tuesday, February 01, 2005 4:12 PM > To: Bishop, Barry; qui...@li... > Cc: 'Caleb Epstein' > Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem > > > Barry, > > Is there anything common about the times in which these disconnects occur? > Is this a high-frequency line? Is it possible you are overloading the > socket buffer?
> > --oren > > ----- Original Message ----- > From: "Bishop, Barry" <Bar...@gs...> > To: <qui...@li...> > Cc: "Oren Miller" <or...@qu...>; "'Caleb Epstein'" > <cal...@gm...> > Sent: Tuesday, February 01, 2005 9:37 AM > Subject: RE: [Quickfix-developers] RE: Intermittent disconnect problem > > >> Hello all, >> >> This is a follow-up on a problem that I was having last year: >> >> quickfix seemingly disconnects from its peer without indicating why. >> >> We've upgraded our system to quickfix 1.9.4 in the hope of getting >> more useful messages, but we don't appear to be getting them. I've had a look through >> the code and I can't see quite how this could happen. However, it does. >> >> To reiterate, at some random time quickfix disconnects the TCP session >> from its peer and logs this message in the event log: >> >> 20050201-13:33:27 : Dropped Connection >> >> This message indicates that quickfix initiated the disconnect, but it >> does not say why. The inbound and outbound messages all look fine and >> usually there have been a few seconds since the last message was sent >> anyway. >> >> What happens next is the usual reconnect, logon and resend request. >> Everything continues after this. Incidentally, since upgrading from >> quickfix 1.4.0 to 1.9.4 this reconnect/resync is a whole order of >> magnitude better behaved. >> >> However, the mysterious disconnect still occurs. >> >> Has anyone else seen anything like this? >> Can anyone give me any suggestions as to how to track down the >> problem?
>> >> We are running quickfix 1.9.4 on Solaris 5.8 >> quickfix was built with GCC 3.2.2 >> We connect using SocketInitiator >> >> Thanks in advance, >> barry >> >> >> >> Here are some excerpts from our logs: >> >> EVENT LOG >> ========= >> 20050201-13:33:27 : Dropped Connection >> 20050201-13:33:29 : Connecting to XXX.XXX.XXX.XXX on port YYYY >> 20050201-13:33:29 : Connection succeeded >> 20050201-13:33:29 : Initiated logon request >> 20050201-13:33:31 : Received logon response >> >> >> INCOMING >> ======== >> The last message before disconnecting >> 8=FIX.4.2|9=0183|35=R|115=2126|34=8349|49=CCCCCC|56=BBBBBB|52=20050201-13:32:49|122=20050201-13:33:23|116=10101010101010101|144=ZZZZZZZ|131=200502017287|146=1|55=BBBBBB|48=773670|22=108|38=100|10=146| >> >> The logon response >> 8=FIX.4.2|9=0067|35=A|34=8351|49=CCCCCC|56=BBBBBB|52=20050201-13:32:54|98=0|108=30|10=004| >> >> >> OUTGOING >> ======== >> The last message before disconnecting >> 8=FIX.4.2|9=290|35=S|34=8232|49=BBBBBB|52=20050201-13:33:23.582|56=CCCCCC|128=2126|129=10101010101010101|145=ZZZZZZZ|22=108|48=773670|55=GSAMFFT|107=description|117=id|131=txn|132=1|133=2|134=50000|135=50000|167=OPT|200=200101|201=1|202=1.1|205=20|206=L|231=0.01|10=181| >> >> The logon after the disconnect >> 8=FIX.4.2|9=71|35=A|34=8233|49=GSAMFFT|52=20050201-13:33:29.210|56=CATSOS|98=0|108=30|10=098| >> >> >> APPLICATION LOG >> =============== >> Tue Feb 1 13:33:23:528 GMT+00:00 2005|Received: quickfix.fix42.QuoteRequest >> Tue Feb 1 13:33:23:528 GMT+00:00 2005|toApp, SessionID=FIX.4.2:BBBBBB->CCCCCC, Message=quickfix.fix42.Quote >> Tue Feb 1 13:33:27:117 GMT+00:00 2005|onLogout, SessionID=FIX.4.2:BBBBBB->CCCCCC >> >> >> >> -----Original Message----- >> From: Bishop, Barry >> Sent: Tuesday, November 30, 2004 08:11 AM >> To: or...@qu... [mailto:or...@qu...] >> Cc: 'qui...@li...' >> Subject: RE: [Quickfix-developers] Intermittent disconnect problem >> >> Hello Oren, >> >> Thanks for the reply. 
>> >> Sounds to me like I should try version 1.9.2 or later in our >> production environment. I have been unable to reproduce the mysterious >> disconnect in our QA system to the same client, but this is not >> surprising as it is so infrequent. I have been simulating it by >> breaking something else in the chain (which would appear as a client >> disconnect), so this would explain the lack of an explanation from >> quickfix. >> >> I will try this over the next few days and report back. >> >> Thanks again, >> barry >> >> >> -----Original Message----- >> From: or...@qu... [mailto:or...@qu...] >> Sent: Monday, November 29, 2004 7:56 PM >> To: Bishop, Barry >> Cc: 'qui...@li...' >> Subject: RE: [Quickfix-developers] Intermittent disconnect problem >> >> >> Barry, >> >> For every disconnect that QuickFIX initiates, there should be a reason >> provided (not with 1.4.0, but with the new releases). With 1.9.4 >> (available now), QuickFIX also displays a "Dropped Connection" message >> if the disconnect is initiated by the peer (1.9.2 does not >> differentiate). That should help you to verify whether it is QuickFIX that >> is initiating the disconnect. I don't think there are any more cases >> where QuickFIX initiates a disconnect without providing a reason. If the counterparty drops the >> connection, then unless they provide information in the form of a reject >> or logoff text, there is little QuickFIX can do to determine the cause. The >> best that we can probably do is report whether the socket was dropped >> gracefully, and therefore intentionally, or if it was an abnormal >> disconnect of some sort. >> >> Is there anything significantly different about this new client?
Do their >> logs reveal anything about the nature of the disconnect? >> >> --oren >> >>> 1) Anyone have any idea what's going on? >>> 2) Is there a way to increase the amount of detail in log messages, >>> especially those to do with disconnection events? >>> 3) What sort of thing would cause quickfix to disconnect without >>> saying why? >>> >>> Thanks in advance, >>> barry >> >> >> >> _______________________________________________ >> Quickfix-developers mailing list >> Qui...@li... >> https://lists.sourceforge.net/lists/listinfo/quickfix-developers >> > ----------------------------------------------------------------- Visit our Internet site at http://www.reuters.com Get closer to the financial markets with Reuters Messaging - for more information and to register, visit http://www.reuters.com/messaging Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd. |
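Yihu's onError() idea, combined with reporting a reason instead of silently dropping an incoming connection, might look roughly like the sketch below. It is a hypothetical illustration, not QuickFIX code: `RejectReason`, `ErrorListener`, `ConsoleListener` and `checkIncoming` are invented names (the thread only proposes an onError() callback), and the session window is simplified to seconds since midnight.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical reasons an acceptor might refuse a connection, mirroring the
// three scenarios listed in the thread.
enum class RejectReason { None, BadHeader, DuplicateSession, OutsideSessionTime };

// Hypothetical callback interface (the proposed onError() hook).
class ErrorListener {
 public:
  virtual ~ErrorListener() = default;
  virtual void onError(RejectReason reason, const std::string& text) = 0;
};

// Simplest possible listener: write the reason to stderr.
class ConsoleListener : public ErrorListener {
 public:
  void onError(RejectReason, const std::string& text) override {
    std::cerr << "acceptor rejected connection: " << text << '\n';
  }
};

// Decides whether an incoming connection should be accepted. Instead of
// dropping the socket silently, it reports the reason before the caller
// closes the connection. Times are seconds since midnight.
RejectReason checkIncoming(bool headerValid, bool alreadyLoggedOn,
                           int nowSec, int startSec, int endSec,
                           ErrorListener& listener) {
  assert(startSec <= endSec);  // this sketch ignores windows spanning midnight
  RejectReason reason = RejectReason::None;
  std::string text;
  if (!headerValid) {
    reason = RejectReason::BadHeader;
    text = "incoming message has an incorrect session header";
  } else if (alreadyLoggedOn) {
    reason = RejectReason::DuplicateSession;
    text = "session already established by another connection";
  } else if (nowSec < startSec || nowSec > endSec) {
    reason = RejectReason::OutsideSessionTime;
    text = "connection attempted outside the session window";
  }
  if (reason != RejectReason::None) {
    listener.onError(reason, text);
  }
  return reason;
}
```

Keeping the listener independent of any session means a reason can be reported even before a connection has been matched to a session, which is the gap a global logger is meant to fill.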
From: Bishop, B. <Bar...@gs...> - 2005-02-02 10:03:31
|
Hi Oren, Thanks again for your response. I am definitely of the opinion that nothing should be thrown away, especially useful information about a runtime event or error. So yes, it's a great idea to trap errno around every socket call. To propagate this up to the session would mean starting at a low level. I was surprised to find the call to recv() (and also socket_fionread) in Parser, but that's where it would have to start. Regards, barry -----Original Message----- From: Oren Miller [mailto:or...@qu...] Sent: Tuesday, February 01, 2005 5:25 PM To: Bishop, Barry; qui...@li... Cc: 'Caleb Epstein'; 'Perez, John' Subject: Re: [Quickfix-developers] RE: Intermittent disconnect problem Well, not necessarily. "Dropped Connection" right now just means that the connection was dropped outside of a logout sequence. If QuickFIX knows the reason for this, it will be preceded in the log by the reason, such as "Timed out waiting for heartbeat." The only time I can think of where QF would not know the exact reason for a disconnect is if the socket is either broken somehow, or closed by the counterparty. You can see in the SocketInitiator and SocketAcceptor onDisconnect methods that Session::disconnect is being called. This is the only place where there is not an additional message that provides a disconnect reason. What we can do is start logging the error codes of the socket calls to get a more detailed analysis of what is happening with the socket. For instance, calling close can set the global error code (errno) to one of the following. EBADF The s argument is not an active descriptor. ECONNABORTED The connection was aborted by the remote endpoint. ECONNREFUSED The remote endpoint refused to continue the connection. ECONNRESET The remote endpoint reset the connection request. EDESTUNREACH Remote destination is now unreachable. EHOSTUNREACH Remote host is now unreachable. ENETDOWN Local network interface is down. ETIMEDOUT The connection timed out.
|
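Barry's point about trapping errno right at the recv() call site can be sketched as a thin wrapper. This is an illustration under assumptions, not QuickFIX's Parser: `ReadResult` and `readSome` are hypothetical names. It relies on the POSIX behaviour that recv() returns 0 when the peer has performed an orderly shutdown and -1 with errno set on failure, which is also one way to separate the "dropped gracefully" case from an abnormal disconnect.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Result of one recv() call, with errno captured so it can travel upward.
struct ReadResult {
  ssize_t bytes;  // > 0: data read; 0: orderly shutdown by peer; -1: error
  int error;      // errno captured right after recv() failed, otherwise 0

  std::string describe() const {
    if (bytes > 0) return "read " + std::to_string(bytes) + " bytes";
    if (bytes == 0) return "peer closed the connection gracefully";
    return std::string("recv failed: ") + std::strerror(error);
  }
};

// Calls recv() and snapshots errno before any other call can overwrite it.
// Retrying on EINTR/EAGAIN is left to the caller in this sketch.
ReadResult readSome(int fd, char* buf, std::size_t len) {
  assert(buf != nullptr);
  ReadResult r;
  r.bytes = ::recv(fd, buf, len, 0);
  r.error = (r.bytes < 0) ? errno : 0;
  return r;
}
```

Returning the captured code in a value (rather than reading errno later, higher up) matters because any intervening library call may overwrite errno.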
From: Oren M. <or...@qu...> - 2005-02-02 15:19:37
|
Yeah, that's not a problem. The Parser throws a RecvFailed exception, so we would just need to carry the error number in it. We can create a base SocketException whose default constructor picks up the error name and number as well as the human-readable text. RecvFailed can then inherit from it, and the base object will automatically be populated with this information if available. --oren On Feb 2, 2005, at 4:03 AM, Bishop, Barry wrote: > So yes, it's a great idea to trap errno around every socket call. To > propagate this up to the session would mean starting at a low level. I > was > surprised to find the call to recv() (and also socket_fionread) in > Parser, > but that's where it would have to start. |
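The exception hierarchy Oren proposes could be sketched as follows. The class names SocketException and RecvFailed come from the message above, but the bodies are assumptions rather than QuickFIX's actual code. In particular, `errnoName` is a hypothetical helper covering only a few codes from the earlier list, because standard C++ provides strerror() for the text but no portable way to get an errno's symbolic name.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <string>

// Maps a few errno values mentioned in the thread to their symbolic names.
inline const char* errnoName(int err) {
  switch (err) {
    case EBADF: return "EBADF";
    case ECONNABORTED: return "ECONNABORTED";
    case ECONNREFUSED: return "ECONNREFUSED";
    case ECONNRESET: return "ECONNRESET";
    case EHOSTUNREACH: return "EHOSTUNREACH";
    case ENETDOWN: return "ENETDOWN";
    case ETIMEDOUT: return "ETIMEDOUT";
    default: return "UNKNOWN";
  }
}

// Base class: the default constructor snapshots errno, so the exception must
// be constructed immediately after the failing socket call.
class SocketException : public std::runtime_error {
 public:
  SocketException() : SocketException(errno) {}
  explicit SocketException(int err)
      : std::runtime_error(format(err)), error_(err) {}
  int error() const { return error_; }

 private:
  // Builds e.g. "<NAME> (errno <n>): <strerror text>".
  static std::string format(int err) {
    return std::string(errnoName(err)) + " (errno " + std::to_string(err) +
           "): " + std::strerror(err);
  }
  int error_;
};

// Thrown when recv() fails; inherits the errno capture from the base.
class RecvFailed : public SocketException {
 public:
  RecvFailed() : SocketException() {}
  explicit RecvFailed(int err) : SocketException(err) {}
};
```

Catching `const SocketException&` higher up (for example where the session logs "Dropped Connection") would then give access to both the number and the readable text.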
From: <or...@qu...> - 2005-02-04 16:54:26
|
Hi Michael, Thanks. What version of QuickFIX are you using? I ask because in 1.9.0 we introduced a similar fix for this. Instead of calling onTimeout after the read call, the session calls next() after it processes each message, which should have a similar effect. Was your fix applied to a 1.9.x version or an earlier one? --oren > -------- Original Message -------- > Subject: [Quickfix-developers] Re: Intermittent disconnect problem > From: "Michael Holm" <mh...@li...> > Date: Fri, February 04, 2005 4:42 am > To: qui...@li... > > I have seen a similar problem when connecting to SFE. They require that > I follow the heartbeat procedure defined in the FIX spec, with one > exception. At initial start-up time or at certain times throughout the > day they may blast a huge amount of data to me which needs to be > processed. The QuickFix engine gives priority to these messages and > starves the heartbeat time slice. So if I am processing these messages > for more than the agreed-upon heartbeat period, SFE will disconnect the > session on their side, and then QuickFix re-establishes the session and > resyncs. And then they have to retransmit messages and, as you can see, I > will encounter a never-ending loop of shit. So I modified the following > code to stop QuickFix from starving the heartbeats. > > In the following method - void SocketInitiator::onData( SocketConnector& > connector, int s ) > > The current code: > > while( pSocketConnection->read( connector ) ) > {} > > My modified version: > > while( pSocketConnection->read( connector ) ) > { > // Modified 11-26-03 by M. Holm. Because heartbeats are being starved! > i->second->onTimeout(); > } > > This will ensure that heartbeats will be sent when necessary according > to the SFE spec. > > You might be having a similar problem to the one I encountered. > > Hope this helps. > > Michael Holm > > Liquid Capital Markets Ltd > 11 Old Jewry > London EC2R 8DU > Tel:020 7726 3028 |
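The idea shared by Michael's onTimeout() patch and the 1.9.0 change Oren describes (calling next() after each message) is that the heartbeat timer is checked between messages rather than only after a whole burst has been drained. A minimal sketch, assuming hypothetical names (`HeartbeatTimer`, `drainInbound`) and an injected clock so the behaviour can be tested; it is not the QuickFIX Session code.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <functional>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical stand-in for the session's heartbeat bookkeeping.
class HeartbeatTimer {
 public:
  explicit HeartbeatTimer(std::chrono::seconds interval) : interval_(interval) {
    assert(interval.count() > 0);
  }
  // True once a full heartbeat interval has passed since we last sent.
  bool heartbeatDue(Clock::time_point now) const {
    return now - lastSent_ >= interval_;
  }
  void markSent(Clock::time_point now) { lastSent_ = now; }

 private:
  std::chrono::seconds interval_;
  Clock::time_point lastSent_{};
};

// Processes a burst of inbound messages and returns how many heartbeats
// were due while doing so.
int drainInbound(std::deque<std::string>& inbound, HeartbeatTimer& timer,
                 const std::function<void(const std::string&)>& process,
                 const std::function<Clock::time_point()>& now) {
  int heartbeats = 0;
  while (!inbound.empty()) {
    process(inbound.front());
    inbound.pop_front();
    // Check the timer after every message, not after the whole burst.
    const Clock::time_point t = now();
    if (timer.heartbeatDue(t)) {
      // A real engine would send a FIX Heartbeat (35=0) here.
      timer.markSent(t);
      ++heartbeats;
    }
  }
  return heartbeats;
}
```

Injecting the clock as a function keeps the timing deterministic in tests; a real engine would simply call Clock::now().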
From: Bishop, B. <Bar...@gs...> - 2005-02-04 16:55:17
|
Hi Michael, Thanks for your suggestions, although I think you have a different problem from mine. We never have sudden bursts of traffic (inbound or outbound), and sometimes we disconnect after 10 seconds of complete inactivity. In your case, though, wouldn't it be better to pass the inbound messages on to a separate thread? This would free the FIX thread to carry on doing what it does best, leaving you as much time as you need to process all the messages. It would also save patching the quickfix code for this special case. Best of luck, barry -----Original Message----- From: qui...@li... [mailto:qui...@li...] On Behalf Of Michael Holm Sent: Friday, February 04, 2005 10:43 AM To: qui...@li... Subject: [Quickfix-developers] Re: Intermittent disconnect problem I have seen a similar problem when connecting to SFE. They require that I follow the heartbeat procedure defined in the FIX spec, with one exception. At initial start-up time or at certain times throughout the day they may blast a huge amount of data to me which needs to be processed. The QuickFix engine gives priority to these messages and starves the heartbeat time slice. So if I am processing these messages for more than the agreed-upon heartbeat period, SFE will disconnect the session on their side, and then QuickFix re-establishes the session and resyncs. And then they have to retransmit messages and, as you can see, I will encounter a never-ending loop of shit. So I modified the following code to stop QuickFix from starving the heartbeats. In the following method - void SocketInitiator::onData( SocketConnector& connector, int s ) The current code: while( pSocketConnection->read( connector ) ) {} My modified version: while( pSocketConnection->read( connector ) ) { // Modified 11-26-03 by M. Holm. Because heartbeats are being starved! i->second->onTimeout(); } This will ensure that heartbeats will be sent when necessary according to the SFE spec. You might be having a similar problem to the one I encountered.
Hope this helps. Michael Holm Liquid Capital Markets Ltd 11 Old Jewry London EC2R 8DU Tel:020 7726 3028 |
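Barry's alternative, handing inbound messages to a separate thread so the FIX thread is never held up by application processing, is commonly built on a blocking queue. The sketch below uses only the C++ standard library; `BlockingQueue` is a hypothetical name, and the assumption is that in a QuickFIX application the push would happen in the fromApp() callback with a copy of the message while a worker thread pops and processes.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Minimal blocking queue. The FIX callback thread pushes and returns at once;
// a worker thread does the slow processing.
template <typename T>
class BlockingQueue {
 public:
  void push(T item) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      assert(!closed_ && "push after close");
      items_.push(std::move(item));
    }
    cv_.notify_one();
  }

  // Blocks until an item is available or the queue is closed.
  // Returns false only once the queue is closed and fully drained.
  bool pop(T& out) {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return closed_ || !items_.empty(); });
    if (items_.empty()) return false;
    out = std::move(items_.front());
    items_.pop();
    return true;
  }

  // Wakes all waiting consumers so they can exit after draining.
  void close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<T> items_;
  bool closed_ = false;
};
```

Because the engine thread only takes a short lock to enqueue, heartbeats and other session-level work keep running on schedule no matter how long the worker spends on each message.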
From: Caleb E. <cal...@gm...> - 2004-12-01 14:06:31
|
On Wed, 1 Dec 2004 13:51:41 -0000, Bishop, Barry <bar...@gs...> wrote: > If you don't mind me asking, did you have difficulties like this? > Maybe I am just incompetent. No, just a glutton for punishment :) Seriously though, gcc 2.95 is just too old if you want to compile standards-conforming C++ code. You should try the latest 3.3 or 3.4 version (we use 3.3.2 here with success on Linux and Solaris). You can drop STLport with these newer versions as well. Your life becomes much simpler, at the expense of needing to recompile all your C++ code, because the name mangling (C++ ABI) changed. -- Caleb Epstein caleb dot epstein at gmail dot com |