Re: [Quickfix-developers] heartbeat gaps are longer than my HeartBeatInt
Brought to you by:
orenmnero
From: Rick L. <ric...@gm...> - 2009-10-15 21:54:07
|
Thanks again for everyones' input. I feel that the easiest way to change my TestRequest behavior was to modify the QuickFix code -- I feel that I can do this with two simple modifications in the SessionState.h file. I am hoping someone who is familiar with the source will tell me if they see anything horribly wrong with what I'm proposing -- the last thing I want to do is muck around in the QF code, but this might save me a lot of headaches trying to do it in my code callbacks. I'm modifying two functions timedOut() and needTestRequest(). Basically, I'm trying to get QF to issue a TestRequest after no responses for 1.2*HeartBtInt, and then issue *another *TestRequest after 1.4*HeartBtInt -- in other words, much more rapidly than it does by default (and then issue a disconnect after 2 TestRequests). bool needTestRequest() const { UtcTimeStamp now; // testRequest() will be 0 for the first occurrence, and 1 for the 2nd occurrence. if(testRequest() < 2){ return (now - lastReceivedTime() >= (1 + (0.2*(testRequest()+1))) * (double)heartBtInt()); } return false; } Basically I'm saying "if I haven't sent 2 TestRequests already then if it's been at least 1.2 times the HeartBtInt I should send a TestRequest (or 1.4 * HeartBtInt if this is the 2nd TestRequest). And then the other function I modify (to log off after 2 unanswered TestRequests): bool timedOut() const { UtcTimeStamp now; return ( now - lastReceivedTime() ) >= (1.6 * (double)heartBtInt()); } I will return true if at least 1.6 * HeartBtInt seconds have transpired. Does anyone see anything bad with this approach? Thanks, Rick On Thu, Oct 15, 2009 at 3:11 PM, John Vatianou <jva...@co...>wrote: > > > On Thu, Oct 15, 2009 at 2:53 PM, Rick Lane <ric...@gm...> wrote: > >> John, >> Thanks for the response. >> >> So you're suggesting I manually send the TestRequest messages? I was >> under the assumption the out-of-the-box QuickFix build would handle this >> situation for me. Is there a setting to denote how long to wait before >> sending the second TestRequest? >> >> You mentioned the CME would send a TestRequest if I wait the HeartBeatInt >> -- why would they? I thought they only sent it when I waited *longer* than >> the HeartBeatInt? >> >> Thanks again for your input. >> >> Rick >> >> On Thu, Oct 15, 2009 at 11:26 AM, John Vatianou <jva...@co...>wrote: >> >>> >>> I'm actually on Autocert+ as we speak. Here's a Heartbeat example from a >>> couple minutes ago... >>> >>> 20091015-16:17:54.473 : >>> 8=FIX.4.2|9=106|35=0|34=701|49=9G19Z1N|50=apama_user|52=20091015-16:17:54.469|56=CME|57=G|142=CHICAGO|112=ID1255623474437|10=134| >>> 20091015-16:18:24.941 : >>> 8=FIX.4.2|9=86|35=0|34=702|49=9G19Z1N|50=apama_user|52=20091015-16:18:24.941|56=CME|57=G|142=CHICAGO|10=083| >>> 20091015-16:18:25.238 : >>> 8=FIX.4.2|9=82|35=0|34=742|369=702|52=20091015-16:18:25.221|49=CME|50=G|56=9G19Z1N|57=apama_user|10=001| >>> 20091015-16:18:54.950 : >>> 8=FIX.4.2|9=86|35=0|34=703|49=9G19Z1N|50=apama_user|52=20091015-16:18:54.950|56=CME|57=G|142=CHICAGO|10=087| >>> 20091015-16:18:55.297 : >>> 8=FIX.4.2|9=82|35=0|34=743|369=703|52=20091015-16:18:55.280|49=CME|50=G|56=9G19Z1N|57=apama_user|10=011| >>> 20091015-16:19:24.953 : >>> 8=FIX.4.2|9=86|35=0|34=704|49=9G19Z1N|50=apama_user|52=20091015-16:19:24.953|56=CME|57=G|142=CHICAGO|10=089| >>> >>> So no, it's not exactly 30 seconds, but the CME is not complaining. >>> >>> We had some problems getting by this test too. According to CSET, you do >>> not have to wait a Heartbeat interval to send your two Test Requests and >>> Logout messages. The CME-side is going to be totally unresponsive during >>> this test, so your system has to drive the messaging to get past this test >>> suite. >>> >>> If you wait for the Heartbeat interval, you'll get the behavior that is >>> happening to you, where the CME sends a Test Request because the CME thinks >>> you're not there anymore. >>> >>> >>> On Thu, Oct 15, 2009 at 11:20 AM, Rick Lane <ric...@gm...> wrote: >>> >>>> John, >>>> I'm failing the *Verify Test Request Procedure/Methodology* test >>>> because the CME is sending me a TestRequest (because my heartbeat interval >>>> is 30, but there's a gap between my individual heartbeats that is 30.060 (so >>>> 30 seconds plus 60 milliseconds). The CME then sends me a TestRequest >>>> because 30 seconds have gone by and they didn't get a heartbeat. >>>> >>>> You're saying in your log files if you just let your connection >>>> heartbeat, the gap (in tag 52) between your heartbeat messages is exactly >>>> HeartBeatInt seconds? >>>> >>>> Rick >>>> >>>> >>>> On Thu, Oct 15, 2009 at 11:06 AM, John Vatianou < >>>> jva...@co...> wrote: >>>> >>>>> >>>>> Connamara just went through the Drop Copy certification and we're >>>>> currently going through one for an order-sending adapter. We're using >>>>> QuickFIX C++ and have not had any such issues. I'm curious, which Autocert+ >>>>> test are you having a problem with? Many of the tests require Heartbeats at >>>>> the end, but I don't know of a specific Heartbeat test. >>>>> >>>>> John Vatianou >>>>> Connamara Systems, LLC >>>>> 200 West Jackson Boulevard, Suite 1340 >>>>> Chicago, Illinois 60606 >>>>> Direct: 312-235-6772 >>>>> Email: jva...@co... >>>>> www.connamara.com >>>>> >>>>> >>>>> On Thu, Oct 15, 2009 at 10:46 AM, Rick Lane <ric...@gm...>wrote: >>>>> >>>>>> QuickFIX Documentation: >>>>>> http://www.quickfixengine.org/quickfix/doc/html/index.html >>>>>> QuickFIX Support: http://www.quickfixengine.org/services.html >>>>>> >>>>>> >>>>>> >>>>>> Greetings, >>>>>> I'm still having issues with my HeartBeatInt (and it's also affecting >>>>>> my TestRequest sending). The body of my toAdmin() function is below. >>>>>> >>>>>> So I thought that on my development laptop the 3 conditionals and the >>>>>> three calls to setField() might be adding some extra time, but 40 >>>>>> milliseconds??? So I tested this out on my production environment -- 8-core >>>>>> (nehalem) Dell R610, and sure enough, I'm still seeing these gaps. >>>>>> >>>>>> Unfortunately, there's no way I can test the speed if I *don't* make >>>>>> those 3 calls to setField() because the CME will reject the message w/out >>>>>> these fields. I sincerely hope that making these three calls isn't adding >>>>>> this much time. Does anyone have any thoughts? Does anyone see similar >>>>>> behavior (I would think that the gap between heartbeats should be >>>>>> HeartBeatInt plus *maybe *1 or 2 ms). Is this simply the result of >>>>>> .NET managed to unmanaged calls within QuickFix (please say no)? >>>>>> >>>>>> public void toAdmin(QuickFix.Message __p1, SessionID __p2) >>>>>> { >>>>>> if (__p1 is Logon) { >>>>>> ILinkSession session = null; >>>>>> if (m_SessionMap.TryGetValue(__p2.getSenderCompID(), out >>>>>> session)) { >>>>>> __p1.setField(new IntField(95, >>>>>> session.sessionPassword.Length)); >>>>>> __p1.setField(new StringField(96, >>>>>> session.sessionPassword)); >>>>>> } >>>>>> } else if (__p1 is ResendRequest) { >>>>>> ResendRequest request = (ResendRequest)__p1; >>>>>> Session session = Session.lookupSession(__p2); >>>>>> if(session != null){ >>>>>> int expectedNext = session.getExpectedTargetNum(); >>>>>> >>>>>> } >>>>>> MainFrame.AddStatusMessage("requesting iLink sequence numbers >>>>>> " + request.getBeginSeqNo().getValue() + " through " + >>>>>> request.getEndSeqNo().getValue()); >>>>>> m_LastResendRequestSeqNum = __p1.getHeader().getInt(34); >>>>>> if (__p1.isSetField(369)) { >>>>>> __p1.removeField(369); // in using the CME's "basic >>>>>> resend" functionality, tag 369 cannot be present. >>>>>> } >>>>>> } else if (__p1 is Logout) { >>>>>> if (!__p1.isSetField(789)) { >>>>>> Session userSession = Session.lookupSession(__p2); >>>>>> if (userSession != null) { >>>>>> __p1.setInt(789, userSession.getExpectedTargetNum()); >>>>>> } >>>>>> } >>>>>> } >>>>>> if (!__p1.getHeader().isSetField(50)) { >>>>>> __p1.getHeader().setField(new StringField(50, m_ServerID)); >>>>>> } >>>>>> __p1.getHeader().setField(new CharField(57, 'G')); >>>>>> __p1.getHeader().setField(new StringField(142, m_ServerID)); >>>>>> } >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11295 | 369=871 | >>>>>> 52=20091015-15:10:31.355 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=124 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=872 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:10:50.771* | 56=CME | 57=G | 142=JBM152 | 10=229 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11296 | 369=872 | >>>>>> 52=20091015-15:10:51.394 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=131 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=873 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:11:10.812* | 56=CME | 57=G | 142=JBM152 | 10=223 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11297 | 369=873 | >>>>>> 52=20091015-15:11:11.435 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=126 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=874 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:11:30.852* | 56=CME | 57=G | 142=JBM152 | 10=230 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11298 | 369=874 | >>>>>> 52=20091015-15:11:31.476 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=135 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=875 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:11:50.892* | 56=CME | 57=G | 142=JBM152 | 10=237 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11299 | 369=875 | >>>>>> 52=20091015-15:11:51.517 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=135 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=876 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:12:10.934* | 56=CME | 57=G | 142=JBM152 | 10=232 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11300 | 369=876 | >>>>>> 52=20091015-15:12:11.560 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=114 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=877 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:12:30.978* | 56=CME | 57=G | 142=JBM152 | 10=243 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11301 | 369=877 | >>>>>> 52=20091015-15:12:31.600 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=113 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=878 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:12:50.018* | 56=CME | 57=G | 142=JBM152 | 10=231 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11302 | 369=878 | >>>>>> 52=20091015-15:12:51.641 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=122 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=879 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:13:10.060* | 56=CME | 57=G | 142=JBM152 | 10=226 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11303 | 369=879 | >>>>>> 52=20091015-15:13:11.683 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=127 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=880 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:13:30.102* | 56=CME | 57=G | 142=JBM152 | 10=217 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11304 | 369=880 | >>>>>> 52=20091015-15:13:31.723 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=117 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=881 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:13:50.142* | 56=CME | 57=G | 142=JBM152 | 10=224 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11305 | 369=881 | >>>>>> 52=20091015-15:13:51.764 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=126 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=882 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:14:10.184* | 56=CME | 57=G | 142=JBM152 | 10=228 | >>>>>> 8=FIX.4.2 | 9=80 | 35=0 | 34=11306 | 369=882 | >>>>>> 52=20091015-15:14:11.806 | 49=CME | 50=G | 56=XXXXXXN | 57=JBM152 | 10=122 >>>>>> | >>>>>> 8=FIX.4.2 | 9=81 | 35=0 | 34=883 | 49=XXXXXXN | 50=JBM152 | >>>>>> 52=20091015-*15:14:30.226* | 56=CME | 57=G | 142=JBM152 | 10=228 | >>>>>> >>>>>> On Wed, Oct 14, 2009 at 10:06 AM, Rick Lane <ric...@gm...>wrote: >>>>>> >>>>>>> Greetings, >>>>>>> I'm attempting to do some certification on the CME (I like to >>>>>>> re-certify every year or so to ensure no new changes break any parts of our >>>>>>> platform), and they've introduced a new certification environment that is >>>>>>> more sensitive than prior systems. There is an issue I'm seeing that seems >>>>>>> to be causing a particular test to fail, and the issue is that the gap >>>>>>> between my heartbeats is slightly over my pre-determined heartbeat interval, >>>>>>> sometimes by 2 or 3 ms, but other times by 50-100 ms. What could be causing >>>>>>> this delay? >>>>>>> >>>>>>> At first I thought maybe it was my file logging of messages, so I >>>>>>> turned that off and the CME was still seeing this gap. It can't be a >>>>>>> connection thing because the timestamps on my messages (when I have logging >>>>>>> turned on) show gaps wider than 30 seconds, so it's something that's >>>>>>> happening *before *the message leaves my box. >>>>>>> >>>>>>> Any thoughts? >>>>>>> >>>>>>> Thanks, >>>>>>> Rick >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------------------ >>>>>> Come build with us! The BlackBerry(R) Developer Conference in SF, CA >>>>>> is the only developer event you need to attend this year. Jumpstart >>>>>> your >>>>>> developing skills, take BlackBerry mobile applications to market and >>>>>> stay >>>>>> ahead of the curve. Join us from November 9 - 12, 2009. Register now! >>>>>> http://p.sf.net/sfu/devconference >>>>>> _______________________________________________ >>>>>> Quickfix-developers mailing list >>>>>> Qui...@li... >>>>>> https://lists.sourceforge.net/lists/listinfo/quickfix-developers >>>>>> >>>>> >>>>> >>>> >>> >> > |