Thread: [Quickfix-developers] Already sent ResendRequest ... Not sending another.
Brought to you by:
orenmnero
|
From: Scott H. <sco...@fo...> - 2006-01-02 15:39:36
|
Has anyone encountered problems with the code that was added between 1.9.4 and 1.10.2 that emits the "Not sending another" message? Oren's CVS comment was "fix for stack over flow with recursive next calls. Do not resend if request already exists with same range. Submitted by Caleb Epstein." Note that I'm using QuickFIX/J, but the C++ code is the same. My session started out normally, but (possibly due to an unrelated bug in QFJ socket input) somehow loses a message (#216250) and should have tried to get the counterparty to resend it. But due to an earlier (successful) ResendRequest, the "Not sending another" code kicks in repeatedly (a very bad failure mode -- app gets no new messages yet session appears functional). I see two fixes: (1) modify nextSequenceReset() to call state.setResendRange(0,0) (2) modify the range check in doTargetTooHigh() Any advice? See my events below. The message length error was emitted by quickfix.netty.FIXMessageData.read(). 20051230-13:19:50: Created session 20051230-13:19:50: connection established: net.gleamynode.netty2.Session@191f801 20051230-13:19:54: Initiated logon request 20051230-13:19:54: Received logon response 20051230-13:19:54: MsgSeqNum too high, expecting 197648 but received 197650 20051230-13:19:55: Sent ResendRequest FROM: 197648 TO: 0 20051230-13:19:55: Received SequenceReset FROM: 197648 TO: 197651 2005-12-30 16:07:09,050 Error in message length 20051230-16:07:09: MsgSeqNum too high, expecting 216250 but received 216251 20051230-16:07:09: Already sent ResendRequest FROM: 197648 TO: 0. Not sending another. 20051230-16:07:10: MsgSeqNum too high, expecting 216250 but received 216252 20051230-16:07:10: Already sent ResendRequest FROM: 197648 TO: 0. Not sending another. 20051230-16:07:11: MsgSeqNum too high, expecting 216250 but received 216253 20051230-16:07:11: Already sent ResendRequest FROM: 197648 TO: 0. Not sending another. 20051230-16:07:11: MsgSeqNum too high, expecting 216250 but received 216254 20051230-16:07:11: Already sent ResendRequest FROM: 197648 TO: 0. Not sending another. |
|
From: Caleb E. <cal...@gm...> - 2006-01-02 21:27:40
|
On 1/2/06, Scott Harrington <sco...@fo...> wrote: > Has anyone encountered problems with the code that was added between 1.9.= 4 > and 1.10.2 that emits the "Not sending another" message? Oren's CVS > comment was "fix for stack over flow with recursive next calls. Do not > resend if request already exists with same range. Submitted by Caleb > Epstein." Note that I'm using QuickFIX/J, but the C++ code is the same. > > My session started out normally, but (possibly due to an unrelated bug in > QFJ socket input) somehow loses a message (#216250) and should have tried > to get the counterparty to resend it. But due to an earlier (successful) > ResendRequest, the "Not sending another" code kicks in repeatedly (a very > bad failure mode -- app gets no new messages yet session appears > functional). > > I see two fixes: > (1) modify nextSequenceReset() to call state.setResendRange(0,0) This method is for handling incoming SeqneceReset messages, so this would not be the right place to fix this. (2) modify the range check in doTargetTooHigh() The range check is correct, its just that the data in m_state.resendRange i= s not. The right fix I believe is to modify the code in Session::generateResendRequest to track the actual upper bound of the range it has requested, instead of the number 0 or 9999999 depending on FIX version. Its a one-line change that I've just committed. Any advice? See my events below. The message length error was emitted by > quickfix.netty.FIXMessageData.read(). > > 20051230-13:19:50: Created session > 20051230-13:19:50: connection established: > net.gleamynode.netty2.Session@191f801 > 20051230-13:19:54: Initiated logon request > 20051230-13:19:54: Received logon response > 20051230-13:19:54: MsgSeqNum too high, expecting 197648 but received > 197650 > 20051230-13:19:55: Sent ResendRequest FROM: 197648 TO: 0 > 20051230-13:19:55: Received SequenceReset FROM: 197648 TO: 197651 > 2005-12-30 16:07:09,050 Error in message length > 20051230-16:07:09: MsgSeqNum too high, expecting 216250 but received > 216251 > 20051230-16:07:09: Already sent ResendRequest FROM: 197648 TO: 0. Not > sending another. This is definitely a problem. The way the "Already sent ResendRequest ... not sending another" logic is supposed to work is that when a "MsgSeqNum to= o high" condition is detected, QuickFIX sends one and only one ResendRequest to satisfy the range of messages that were missed. The message that triggered this ResendRequest is placed on an internal queue of messages for later processing. Once a message is successfully processed from this internal queue, the original ResendRequest is considered to have been satisifed and the "Already sent ResendRequest" flag should be cleared. You should see a message in the log like "Processed QUEUED message: #" when thi= s happens, but that didn't happen here, which is something I've not encountered before. It suggests that there is a way to bypass the "nextQueued" logic in the Session which should probably not be happening. -- Caleb Epstein caleb dot epstein at gmail dot com |
|
From: Oren M. <or...@qu...> - 2006-01-03 04:02:13
|
I'm looking at SessionState::resendRequested, and see the following logic...
bool resendRequested() const
{ return !(m_resendRange.first == 0 && m_resendRange.second == 0); }
Wouldn't this erroneously return false in the case where a resend
request to INFINITY (defined as 0 in FIX.4.2. and later) is being
processed? Shouldn't this just be { return m_resendRange.first != 0; } ?
--oren
Caleb Epstein wrote:
> On 1/2/06, *Scott Harrington* <sco...@fo...
> <mailto:sco...@fo...>> wrote:
>
> Has anyone encountered problems with the code that was added
> between 1.9.4
> and 1.10.2 that emits the "Not sending another" message? Oren's CVS
> comment was "fix for stack over flow with recursive next calls. Do
> not
> resend if request already exists with same range. Submitted by Caleb
> Epstein." Note that I'm using QuickFIX/J, but the C++ code is the
> same.
>
> My session started out normally, but (possibly due to an unrelated
> bug in
> QFJ socket input) somehow loses a message (#216250) and should
> have tried
> to get the counterparty to resend it. But due to an earlier
> (successful)
> ResendRequest, the "Not sending another" code kicks in repeatedly
> (a very
> bad failure mode -- app gets no new messages yet session appears
> functional).
>
> I see two fixes:
> (1) modify nextSequenceReset() to call state.setResendRange(0,0)
>
>
> This method is for handling incoming SeqneceReset messages, so this
> would not be the right place to fix this.
>
> (2) modify the range check in doTargetTooHigh()
>
>
> The range check is correct, its just that the data in
> m_state.resendRange is not. The right fix I believe is to modify the
> code in Session::generateResendRequest to track the actual upper bound
> of the range it has requested, instead of the number 0 or 9999999
> depending on FIX version. Its a one-line change that I've just
> committed.
>
> Any advice? See my events below. The message length error was
> emitted by
> quickfix.netty.FIXMessageData.read().
>
> 20051230-13:19:50: Created session
> 20051230-13:19:50: connection established:
> net.gleamynode.netty2.Session@191f801
> 20051230-13:19:54: Initiated logon request
> 20051230-13:19:54: Received logon response
> 20051230-13:19:54: MsgSeqNum too high, expecting 197648 but
> received 197650
> 20051230-13:19:55: Sent ResendRequest FROM: 197648 TO: 0
> 20051230-13:19:55: Received SequenceReset FROM: 197648 TO: 197651
> 2005-12-30 16:07:09,050 Error in message length
> 20051230-16:07:09: MsgSeqNum too high, expecting 216250 but
> received 216251
> 20051230-16:07:09: Already sent ResendRequest FROM: 197648 TO:
> 0. Not sending another.
>
>
> This is definitely a problem. The way the "Already sent ResendRequest
> ... not sending another" logic is supposed to work is that when a
> "MsgSeqNum too high" condition is detected, QuickFIX sends one and
> only one ResendRequest to satisfy the range of messages that were
> missed. The message that triggered this ResendRequest is placed on an
> internal queue of messages for later processing. Once a message is
> successfully processed from this internal queue, the original
> ResendRequest is considered to have been satisifed and the "Already
> sent ResendRequest" flag should be cleared. You should see a message
> in the log like "Processed QUEUED message: #" when this happens, but
> that didn't happen here, which is something I've not encountered
> before. It suggests that there is a way to bypass the "nextQueued"
> logic in the Session which should probably not be happening.
>
> --
> Caleb Epstein
> caleb dot epstein at gmail dot com
|
|
From: Caleb E. <cal...@gm...> - 2006-01-03 11:58:00
|
On 1/2/06, Oren Miller <or...@qu...> wrote:
>
> I'm looking at SessionState::resendRequested, and see the following
> logic...
>
> bool resendRequested() const
> { return !(m_resendRange.first =3D=3D 0 && m_resendRange.second =3D=3D =
0); }
>
> Wouldn't this erroneously return false in the case where a resend
> request to INFINITY (defined as 0 in FIX.4.2. and later) is being
> processed? Shouldn't this just be { return m_resendRange.first !=3D 0; }=
?
The change I just made to generateResendRequest will ensure that the .secon=
d
member of the range is always initialized to a real sequence number, so the
code there is fine. What worries me though is the log Scott attached not
showing a message like "Processed QUEUED message: #". That is where the
Session code resets the resend range to (0, 0). So if the "Processing
QUEUED message" never shows up in the log, the range would still have
been (197648,
0) so changing SessionState::resendRequested wouldn't have fixed the bug.
I think the simplest and most reliable change is the one I made to
Session.cpp where the resendRange is initialized with a range of real
sequence numbers, and it not "open-ended" at the top (e.g. 0 or 9999999).
I am still slightly concerned however that there is a code-path by which
::nextQueued isn't being called before ::next. This says to me that the
logic for resetting the resend range belongs elsewhere.
--
Caleb Epstein
caleb dot epstein at gmail dot com
|
|
From: Oren M. <or...@qu...> - 2006-01-03 12:10:44
|
Well, the initiator sent a resend request from 197648 to 0 after logging in, and the acceptor responded with a sequence reset from 197648 to 197651. No messages would have been queued, so no queued messages would have been processed. So yeah, I would think that it is not being reset in the correct place. --oren > The change I just made to generateResendRequest will ensure that the > .second member of the range is always initialized to a real sequence > number, so the code there is fine. What worries me though is the log > Scott attached not showing a message like "Processed QUEUED message: > #". That is where the Session code resets the resend range to (0, > 0). So if the "Processing QUEUED message" never shows up in the log, > the range would still have been ( 197648, 0) so changing > SessionState::resendRequested wouldn't have fixed the bug. > > I think the simplest and most reliable change is the one I made to > Session.cpp where the resendRange is initialized with a range of real > sequence numbers, and it not "open-ended" at the top ( e.g. 0 or 9999999). > > I am still slightly concerned however that there is a code-path by > which ::nextQueued isn't being called before ::next. This says to me > that the logic for resetting the resend range belongs elsewhere. > > > > -- > Caleb Epstein > caleb dot epstein at gmail dot com |
|
From: Caleb E. <cal...@gm...> - 2006-01-03 20:39:52
|
On 1/3/06, Oren Miller <or...@qu...> wrote: > > Well, the initiator sent a resend request from 197648 to 0 after logging > in, and the acceptor responded with a sequence reset from 197648 to > 197651. No messages would have been queued, so no queued messages would > have been processed. So yeah, I would think that it is not being reset > in the correct place. Ah that explains it. I didn't read the log closely enough and missed this reset. It looks like the right place for the "have we satisfied an outstanding resend request" logic is in Session::verify. I've just committed a change that moves this logic from Session::nextQueued to here and it passes all user and acceptance tests. There will be a new event logged when this happens: "ResendRequest for messages FROM: # TO: # has been satisfied.". Shifting gears slightly, this raises (again) the queueing of messages from the counterparty. In Scott's case I'm pretty sure there was at least one message sitting in the SessionState's m_queue that would never be emptied out (#197650) until the session disconnected. I've posted before about the potential problems associated with queueing messages, specifically Admin-type messages (see http://sourceforge.net/mailarchive/forum.php?thread_id=3D7090286&forum_id= =3D103 ), and I'm wondering if it might not make sense to remove the queueing logi= c from the Session entirely. Its a potential performance optimization that has the potential to cause a memory leak (e.g. queued messages that are skipped by a SequenceReset) and worse, incorrect behavior (see email thread linked previously). Removing the queue would also simplify the code in the Session class, which is always a good thing IMHO. Failing that, Admin-type messages should probably not be enqueued, based on the reasons I outlined in the email thread from April. Oren, what do you think of this? -- Caleb Epstein caleb dot epstein at gmail dot com |
|
From: Oren M. <or...@qu...> - 2006-01-03 21:21:33
|
Yeah, I'm not sure about removing the queue completely. It comes off of this recommendation from the FIX protocol. · It is also recommended that an engine should store out of sequence messages in a temporary queue and process them in order when the gap is closed. This prevents generating resend requests for n->m, n->m+1, n->m+2, ... which can result in many resent PossDupFlag=Y messages. I *think* it might be ok for us to do away with the queue, because I believe we are blocking any sends while a resend request was being processed (making it impossible to create the above scenario). Otherwise removing the queue would cause some serious thrashing on a high frequency system that would be unrecoverable. If it can be demonstrated that a high frequency system can gracefully handle resend request scenarios, I would be ok removing the queue. --oren Caleb Epstein wrote: > On 1/3/06, *Oren Miller* <or...@qu... > <mailto:or...@qu...>> wrote: > > Well, the initiator sent a resend request from 197648 to 0 after > logging > in, and the acceptor responded with a sequence reset from 197648 to > 197651. No messages would have been queued, so no queued messages > would > have been processed. So yeah, I would think that it is not being > reset > in the correct place. > > > Ah that explains it. I didn't read the log closely enough and missed > this reset. > > It looks like the right place for the "have we satisfied an > outstanding resend request" logic is in Session::verify. I've just > committed a change that moves this logic from Session::nextQueued to > here and it passes all user and acceptance tests. There will be a new > event logged when this happens: "ResendRequest for messages FROM: # > TO: # has been satisfied.". > > Shifting gears slightly, this raises (again) the queueing of messages > from the counterparty. In Scott's case I'm pretty sure there was at > least one message sitting in the SessionState's m_queue that would > never be emptied out (#197650) until the session disconnected. > > I've posted before about the potential problems associated with > queueing messages, specifically Admin-type messages (see > http://sourceforge.net/mailarchive/forum.php?thread_id=7090286&forum_id=103 > <http://sourceforge.net/mailarchive/forum.php?thread_id=7090286&forum_id=103>), > and I'm wondering if it might not make sense to remove the queueing > logic from the Session entirely. Its a potential performance > optimization that has the potential to cause a memory leak (e.g. > queued messages that are skipped by a SequenceReset) and worse, > incorrect behavior (see email thread linked previously). > > Removing the queue would also simplify the code in the Session class, > which is always a good thing IMHO. Failing that, Admin-type messages > should probably not be enqueued, based on the reasons I outlined in > the email thread from April. Oren, what do you think of this? > > -- > Caleb Epstein > caleb dot epstein at gmail dot com |
|
From: Joerg T. <Joe...@ma...> - 2006-01-06 08:08:41
|
Oren Miller wrote: > QuickFIX Documentation: > http://www.quickfixengine.org/quickfix/doc/html/index.html > QuickFIX Support: http://www.quickfixengine.org/services.html > > Yeah, I'm not sure about removing the queue completely. [...] Hi Oren, hi Caleb, we recently added a configuration option to QuickFIX/J, which controls whether QFJ should use ResendRequest to INFINITY or fixed-range ResendRequest messages to re-request missed messages. I know the FPL recommends to use INFINITY in any case, and you changed the QF code quite a while ago. But this is a requirement from EMX London, which we are connecting to. Maybe other counterparty also have this restriction... To keep QF and QFJ compatible, it would be good to "backport" this change to QF. But I think that fixed-range ResendRequests need the queue. On the other hand, getting rid of unneeded code makes things easier and less error-prone. Can you both think of a way to accomplish this? Happy New Year and cheers, Jörg -- Joerg Thoennes http://macd.com Tel.: +49 (0)241 44597-24 Macdonald Associates GmbH Fax : +49 (0)241 44597-10 Lothringer Str. 52, D-52070 Aachen |
|
From: Scott H. <sco...@fo...> - 2006-01-09 16:20:30
Attachments:
qfj-resend-range.patch
|
Dear Caleb: Thanks for patching the Session.cpp code to fix my problem. I just today caught up to things and have ported your fix to my local copy of quickfixj/src/quickfix/Session.java. The patch is attached; could we ask Steve to review it and check it in to quickfixj? My underlying message length error (in netty2?) that caused my ResendRequest in the first place was intermittent and I haven't rigged a testbed to force it to happen on command. But once it does happen (I saw it every other day or so before, and once several times in one day), I will know that the new code is working when I see the new "ResendRequest has been satisfied" message. I guess then I will have to tackle the underlying problem before I can switch to QuickFIX/J in production.... Scott On Tue, 3 Jan 2006, Caleb Epstein wrote: > On 1/3/06, Oren Miller <or...@qu...> wrote: >> >> Well, the initiator sent a resend request from 197648 to 0 after logging >> in, and the acceptor responded with a sequence reset from 197648 to >> 197651. No messages would have been queued, so no queued messages would >> have been processed. So yeah, I would think that it is not being reset >> in the correct place. > > > Ah that explains it. I didn't read the log closely enough and missed this > reset. > > It looks like the right place for the "have we satisfied an outstanding > resend request" logic is in Session::verify. I've just committed a change > that moves this logic from Session::nextQueued to here and it passes all > user and acceptance tests. There will be a new event logged when this > happens: "ResendRequest for messages FROM: # TO: # has been satisfied.". > > Shifting gears slightly, this raises (again) the queueing of messages from > the counterparty. In Scott's case I'm pretty sure there was at least one > message sitting in the SessionState's m_queue that would never be emptied > out (#197650) until the session disconnected. > > I've posted before about the potential problems associated with queueing > messages, specifically Admin-type messages (see > http://sourceforge.net/mailarchive/forum.php?thread_id=7090286&forum_id=103 > ), and I'm wondering if it might not make sense to remove the queueing logic > from the Session entirely. Its a potential performance optimization that > has the potential to cause a memory leak (e.g. queued messages that are > skipped by a SequenceReset) and worse, incorrect behavior (see email thread > linked previously). > > Removing the queue would also simplify the code in the Session class, which > is always a good thing IMHO. Failing that, Admin-type messages should > probably not be enqueued, based on the reasons I outlined in the email > thread from April. Oren, what do you think of this? > > -- > Caleb Epstein > caleb dot epstein at gmail dot com |
|
From: Caleb E. <cal...@gm...> - 2006-01-09 16:40:57
|
On 1/9/06, Scott Harrington <sco...@fo...> wrote:
>
> Thanks for patching the Session.cpp code to fix my problem. I just today
> caught up to things and have ported your fix to my local copy of
> quickfixj/src/quickfix/Session.java. The patch is attached; could we ask
> Steve to review it and check it in to quickfixj?
This looks like half of the fix, but you're missing the logic to reset the
SessionState's resendRange back to (0,0) in Session::verify. Here's the
relevant section of the patch, which includes another fix Oren recently mad=
e
for handling Logon messages containing ResetSeqNumFlag=3DY
--- Session.cpp 2 Jan 2006 21:26:37 -0000 1.85
+++ Session.cpp 9 Jan 2006 16:04:56 -0000 1.87
@@ -235,7 +235,7 @@
MsgSeqNum msgSeqNum;
logon.getHeader().getField( msgSeqNum );
- if ( isTargetTooHigh( msgSeqNum ) )
+ if ( isTargetTooHigh( msgSeqNum ) && !resetSeqNumFlag )
{
doTargetTooHigh( logon );
}
@@ -964,6 +964,20 @@
doTargetTooLow( msg );
return false;
}
+
+ if ( (checkTooHigh || checkTooLow) && m_state.resendRequested() )
+ {
+ SessionState::ResendRange range =3D m_state.resendRange();
+
+ if ( msgSeqNum >=3D range.second )
+ {
+ m_state.onEvent ("ResendRequest for messages FROM: " +
+ IntConvertor::convert (range.first) + " TO: " +
+ IntConvertor::convert (range.second) +
+ " has been satisfied.");
+ m_state.resendRange (0, 0);
+ }
+ }
}
catch ( std::exception& e )
{
--
Caleb Epstein
caleb dot epstein at gmail dot com
|
|
From: Scott H. <sco...@fo...> - 2006-01-09 17:12:57
|
I'm pretty sure the state.setResendRange(0,0) is in the patch I attached
previously. I was working from a diff of revisions 1.84 to 1.86. Oddly,
I cannot access revision 1.87 of Session.cpp in CVS at the moment (nor in
the web-based ViewCVS), so I did miss Oren's latest patch.
On Mon, 9 Jan 2006, Caleb Epstein wrote:
> On 1/9/06, Scott Harrington <sco...@fo...> wrote:
>>
>> Thanks for patching the Session.cpp code to fix my problem. I just today
>> caught up to things and have ported your fix to my local copy of
>> quickfixj/src/quickfix/Session.java. The patch is attached; could we ask
>> Steve to review it and check it in to quickfixj?
>
>
> This looks like half of the fix, but you're missing the logic to reset the
> SessionState's resendRange back to (0,0) in Session::verify. Here's the
> relevant section of the patch, which includes another fix Oren recently made
> for handling Logon messages containing ResetSeqNumFlag=Y
>
> --- Session.cpp 2 Jan 2006 21:26:37 -0000 1.85
> +++ Session.cpp 9 Jan 2006 16:04:56 -0000 1.87
> @@ -235,7 +235,7 @@
>
> MsgSeqNum msgSeqNum;
> logon.getHeader().getField( msgSeqNum );
> - if ( isTargetTooHigh( msgSeqNum ) )
> + if ( isTargetTooHigh( msgSeqNum ) && !resetSeqNumFlag )
> {
> doTargetTooHigh( logon );
> }
> @@ -964,6 +964,20 @@
> doTargetTooLow( msg );
> return false;
> }
> +
> + if ( (checkTooHigh || checkTooLow) && m_state.resendRequested() )
> + {
> + SessionState::ResendRange range = m_state.resendRange();
> +
> + if ( msgSeqNum >= range.second )
> + {
> + m_state.onEvent ("ResendRequest for messages FROM: " +
> + IntConvertor::convert (range.first) + " TO: " +
> + IntConvertor::convert (range.second) +
> + " has been satisfied.");
> + m_state.resendRange (0, 0);
> + }
> + }
> }
> catch ( std::exception& e )
> {
>
>
> --
> Caleb Epstein
> caleb dot epstein at gmail dot com
>
|
|
From: Caleb E. <cal...@gm...> - 2006-01-09 17:34:36
|
On 1/9/06, Scott Harrington <sco...@fo...> wrote: > > I'm pretty sure the state.setResendRange(0,0) is in the patch I attached > previously. I was working from a diff of revisions 1.84 to 1.86. Oddly, Yes, you're of course right. I missed it in my first reading. I cannot access revision 1.87 of Session.cpp in CVS at the moment (nor in > the web-based ViewCVS), so I did miss Oren's latest patch. I suspect its the lag between developer CVS and anonymous CVS thats at work here. -- Caleb Epstein caleb dot epstein at gmail dot com |