RE: [Quickfix-developers] QFJ: Test message => Heartbeat response => Disconnect
Brought to you by:
orenmnero
|
From: Steve B. <sb...@sm...> - 2006-02-02 20:42:30
|
Hi Toby, =20 This definitely seems strange. I don't have any good theories about what's happening. I wrote a unit test to verify the heartbeat message parses and validates with the FIX 4.4 data dictionary. The timing would only cause a problem if there was an offset greater than or equal to +/- two minutes (by default). Even if the message were rejected for a bad sending time, there should a reject message and a log record of the incoming heartbeat and outgoing reject. =20 There's a acceptance test case (6_SendTestRequest.def) that tests this scenario and although it tests the functionality using an acceptor rather than an initiator, the same session logic and message parsing code is used for both of them. The <time> placeholder in the AT can specify a relative offset in seconds so I simulated sending a heartbeat with a sending time 10 seconds before the test request. The test passed with no problems.=20 =20 However, just in case there's a bug there, are you using the latency check settings (the check flag and/or the maximum latency setting)? =20 I know you probably weren't showing the whole log, but there's a gap in the sequence numbers between the last incoming message before the heartbeat and the heartbeat itself. Could that be significant? I'm just brainstorming here. =20 Steve Bate Smart Trade Technologies Phone: +33 4 42 90 03 97 http://www.smart-trade.net/ ________________________________ From: qui...@li... [mailto:qui...@li...] On Behalf Of Shepheard, Toby (London) Sent: Thursday, February 02, 2006 8:13 PM To: qui...@li... Subject: [Quickfix-developers] QFJ: Test message =3D> Heartbeat response =3D> Disconnect =09 =09 I'm seeing some unexpected behaviour after sending a Test message and getting a Heartbeat response back. This is with the QFJ, taken from CVS head ~1 week ago. =20 Here's what I see in the logs at my end (XXX to replace 3rd party) [2006-02-02 18:34:19,445][INFO ] msg.incoming: FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: 8=3DFIX.4.4=019=3D121=0135=3DH=0134=3D1107=0149=3DXXXXFIXTEST1=0150=3DTob= y_Shepheard@MLRX=0152=3D2 0060202-18:34:19.336=0156=3DML_RTFI_DEV1=0111=3D5.1.152297=0154=3D1=0155=3D= [N/A]=0110=3D047 ...... protocol.XXXXXXXQFApplication: Sending FIX admin message (1)8=3DFIX.4.4=019=3D78=0135=3D1=0134=3D954=0149=3DML_RTFI_DEV1=0152=3D20= 060202-18:35:05.383=015 6=3DXXXXFIXTEST1=01112=3DTEST=0110=3D147=01 [2006-02-02 18:35:05,383][INFO ] msg.outgoing: FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: 8=3DFIX.4.4=019=3D78=0135=3D1=0134=3D954=0149=3DML_RTFI_DEV1=0152=3D20060= 202-18:35:05.383=0156=3DX XXXFIXTEST1=01112=3DTEST=0110=3D147=01 [2006-02-02 18:35:05,430][INFO ] quickfixj.event: FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: Sent test request TEST [2006-02-02 18:35:32,399][INFO ] quickfixj.event: FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: Timed out waiting for heartbeat [2006-02-02 18:35:32,399][INFO ] quickfixj.event: FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: Disconnecting =20 From the logs, no great shakes - we sent a Test request, got no response, timed out due to receiving no messages and disconnected. However..... a network snoop reveals the following: =20 =20 =09 8=3DFIX.4.4.9=3D78.35=3D1.34=3D954.49=3DML_RTFI_DEV1.52=3D20060202-18:35:= 05.383.56=3DR EUTFIXTEST1.112=3DTEST.10=3D147. =3D>>>>> this is my Test message =20 =09 8=3DFIX.4.4.9=3D79.35=3D0.34=3D1115.49=3DREUTFIXTEST1.52=3D20060202-18:35= :05.374.56=3D ML_RTFI_DEV1.112=3DTEST.10=3D185. <<<<<<=3D This is a heartbeat = message response =20 =20 The heartbeat message was received ~20ms after the Test message, but due to some slight difference in clock times, the Sending Time (52) makes it look like it came a fraction earlier. I'm not sure if this is perhaps part of the problem? There doesn't seem to be any sign of this heartbeat being received on my side, which leads me to believe there's something fishy here, but I'm not sure if its related to the TEST message or the SendingTime being prior to my SystemTime when I'm processing it, or something else altogether. =20 The time between the last recorded message I received (the resend request at the top, which I ignored) and the disconnect is ~73 seconds, matching the 2.4*heartbeat interval in the code. So everything points to he heartbeat message from the 3rd party, send almost immediately after the Test message request, being totally ignored.... any ideas why? =20 MTIA, Toby =20 ________________________________ If you are not an intended recipient of this e-mail, please notify the sender, delete it and do not read, act upon, print, disclose, copy, retain or redistribute it. Click here <http://www.ml.com/email_terms/> for important additional terms relating to this e-mail. http://www.ml.com/email_terms/ ________________________________ |