RE: [Quickfix-developers] QFJ: Test message => Heartbeat response => Disconnect
Brought to you by:
orenmnero
|
From: Shepheard, T. (London) <Tob...@ml...> - 2006-02-03 09:43:35
|
CheckLatency=3DN and I've no other latency settings configured.
=20
Wrt sequence numbers, the sequence numbers over the network were fine.
I'm not sure if any were "missed" by QFJ or if that was just me
extracting limited snippets though and I stupidly lost the log file when
I went home last night and closed down eclipse. Certainly the network
trace shows consecutive numbers, though there was a preceding
sequenceReset which may be relevant, it looks to have acted as expected.
=20
If there's nothing obvious I'd suggest we leave it for now and I'll keep
it in the back of my mind. If it happens again I will take care to
capture complete logs on both the network and application side. Once the
problem started happening the sequence was fully repeatable (logon,
sequence of messages =3D> Test message =3D> Disconnect =3D> back to =
Logon
again etc) so hopefully I can get a bit more info if it does recur.
=20
Regards
Toby
=20
-----Original Message-----
From: qui...@li...
[mailto:qui...@li...] On Behalf Of
Steve Bate
Sent: 02 February 2006 20:36
To: qui...@li...
Subject: RE: [Quickfix-developers] QFJ: Test message =3D>
Heartbeat response =3D> Disconnect
=09
=09
Hi Toby,
=20
This definitely seems strange. I don't have any good theories
about what's happening. I wrote a unit test to verify the heartbeat
message parses and validates with the FIX 4.4 data dictionary. The
timing would only cause a problem if there was an offset greater than or
equal to +/- two minutes (by default). Even if the message were rejected
for a bad sending time, there should a reject message and a log record
of the incoming heartbeat and outgoing reject.
=20
There's a acceptance test case (6_SendTestRequest.def) that
tests this scenario and although it tests the functionality using an
acceptor rather than an initiator, the same session logic and message
parsing code is used for both of them. The <time> placeholder in the AT
can specify a relative offset in seconds so I simulated sending a
heartbeat with a sending time 10 seconds before the test request. The
test passed with no problems.=20
=20
However, just in case there's a bug there, are you using the
latency check settings (the check flag and/or the maximum latency
setting)?
=20
I know you probably weren't showing the whole log, but there's a
gap in the sequence numbers between the last incoming message before the
heartbeat and the heartbeat itself. Could that be significant? I'm just
brainstorming here.
=20
Steve Bate
Smart Trade Technologies
Phone: +33 4 42 90 03 97
http://www.smart-trade.net/
_____ =20
From: qui...@li...
[mailto:qui...@li...] On Behalf Of
Shepheard, Toby (London)
Sent: Thursday, February 02, 2006 8:13 PM
To: qui...@li...
Subject: [Quickfix-developers] QFJ: Test message =3D>
Heartbeat response =3D> Disconnect
=09
=09
I'm seeing some unexpected behaviour after sending a
Test message and getting a Heartbeat response back. This is with the
QFJ, taken from CVS head ~1 week ago.
=20
Here's what I see in the logs at my end (XXX to replace
3rd party)
[2006-02-02 18:34:19,445][INFO ] msg.incoming:
FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1:
8=3DFIX.4.4=019=3D121=0135=3DH=0134=3D1107=0149=3DXXXXFIXTEST1=0150=3DTob=
y_Shepheard@MLRX=0152=3D2
0060202-18:34:19.336=0156=3DML_RTFI_DEV1=0111=3D5.1.152297=0154=3D1=0155=3D=
[N/A]=0110=3D047
......
protocol.XXXXXXXQFApplication: Sending FIX admin message
(1)8=3DFIX.4.4=019=3D78=0135=3D1=0134=3D954=0149=3DML_RTFI_DEV1=0152=3D20=
060202-18:35:05.383=015
6=3DXXXXFIXTEST1=01112=3DTEST=0110=3D147=01
[2006-02-02 18:35:05,383][INFO ] msg.outgoing:
FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1:
8=3DFIX.4.4=019=3D78=0135=3D1=0134=3D954=0149=3DML_RTFI_DEV1=0152=3D20060=
202-18:35:05.383=0156=3DX
XXXFIXTEST1=01112=3DTEST=0110=3D147=01
[2006-02-02 18:35:05,430][INFO ] quickfixj.event:
FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: Sent test request TEST
[2006-02-02 18:35:32,399][INFO ] quickfixj.event:
FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: Timed out waiting for heartbeat
[2006-02-02 18:35:32,399][INFO ] quickfixj.event:
FIX.4.4:ML_RTFI_DEV1->XXXXFIXTEST1: Disconnecting
=20
From the logs, no great shakes - we sent a Test request,
got no response, timed out due to receiving no messages and
disconnected. However..... a network snoop reveals the following:
=20
=20
=09
8=3DFIX.4.4.9=3D78.35=3D1.34=3D954.49=3DML_RTFI_DEV1.52=3D20060202-18:35:=
05.383.56=3DR
EUTFIXTEST1.112=3DTEST.10=3D147. =3D>>>>> this is my Test message
=20
=09
8=3DFIX.4.4.9=3D79.35=3D0.34=3D1115.49=3DREUTFIXTEST1.52=3D20060202-18:35=
:05.374.56=3D
ML_RTFI_DEV1.112=3DTEST.10=3D185. <<<<<<=3D This is a heartbeat =
message
response
=20
=20
The heartbeat message was received ~20ms after the Test
message, but due to some slight difference in clock times, the Sending
Time (52) makes it look like it came a fraction earlier. I'm not sure if
this is perhaps part of the problem? There doesn't seem to be any sign
of this heartbeat being received on my side, which leads me to believe
there's something fishy here, but I'm not sure if its related to the
TEST message or the SendingTime being prior to my SystemTime when I'm
processing it, or something else altogether.
=20
The time between the last recorded message I received
(the resend request at the top, which I ignored) and the disconnect is
~73 seconds, matching the 2.4*heartbeat interval in the code. So
everything points to he heartbeat message from the 3rd party, send
almost immediately after the Test message request, being totally
ignored.... any ideas why?
=20
MTIA,
Toby
=20
_____ =20
If you are not an intended recipient of this e-mail,
please notify the sender, delete it and do not read, act upon, print,
disclose, copy, retain or redistribute it. Click here
<http://www.ml.com/email_terms/> for important additional terms relating
to this e-mail. http://www.ml.com/email_terms/
_____ =20
|