Thread: [Quickfix-developers] Logon Ack seqNo
From: Shamanth <sha...@in...> - 2004-10-19 12:08:03
Hi,

I am using QuickFIX 1.8.

While testing, we were disconnected from the "acceptor" because of network problems. In the meantime, our "initiator" tried reconnecting to the "acceptor" every 30 seconds. It tried 8 times before it got an ack for its Logon message.

Problem 1: Our initiator sent 8 Logon messages, and only the 9th Logon message was acknowledged by the acceptor. In the meantime our initiator had incremented its MsgSeqNo, so when the initiator and acceptor finally connected there was a SeqNo mismatch, and the acceptor sent a ResendRequest to the initiator.

Question: Is there a way to prevent the QuickFIX initiator from incrementing its SeqNo if it did not receive an ack for its Logon message?

NOTE: Only the SeqNo of the messages sent was incremented; the SeqNo of the messages received was correct.

Problem 2: After reconnecting, the acceptor sent a ResendRequest FROM: 0 TO: 2147483647. Our initiator had not sent that many messages, so it treats this as an error condition and stops responding to the acceptor. Is "2147483647" the maximum value in a ResendRequest per the FIX protocol, or should "0" (infinity) be used as the maximum value?

Thanks,
R Shamanth

From: Oren M. <or...@qu...> - 2004-10-19 14:59:37
Answer 1:

No. This is in fact normal behavior. Whenever a message is sent, the sequence number has to be incremented. Just because we did not receive an ack does not necessarily mean the counterparty did not receive the Logon. If the sequence number were not incremented, and they had actually received the message without acknowledging it, you would at some point be disconnected because your sequence number was too low. That is a much worse position to be in, as it cannot be resolved automatically.

Having a sequence number that is too high isn't much of a problem, since the two engines can resolve this on their own. And since in this case we are talking about Logon messages, all that is required is a single gap fill message to put everything in order.

Answer 2:

Depends on the version. For FIX.4.2 and higher, the value should be 0. For versions 4.1 and earlier, a special value of 999999 is used. I'm a bit curious as to what is going on here. Are both the initiator and the acceptor QuickFIX? It seems strange, because since QuickFIX 1.6 the EndSeqNo is always sent as either 0 or 999999, never another value. Based on this I'm guessing the acceptor in this scenario is not QuickFIX. Is this correct?

As to the effect of the value 2147483647, I suspect your application has stopped responding because the message store is now trying to look up a hell of a lot of messages in a tight loop. I suspect we can have QuickFIX handle this more gracefully if we treat such a request as equivalent to an infinite request, like so:

    if ( beginString >= FIX::BeginString_FIX42 && endSeqNo == 0 ||
         beginString <= FIX::BeginString_FIX42 && endSeqNo == 999999 ||
         endSeqNo >= getExpectedSeqNum() ) // new condition to handle bizarrely large numbers
    { endSeqNo = getExpectedSenderNum() - 1; }

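To illustrate the reasoning in Answer 1, here is a minimal sketch of how a receiving engine might classify an inbound sequence number. The names `SeqCheck` and `classifyInbound` are hypothetical and are not part of the QuickFIX API; the sketch only encodes the general FIX session rule that a too-high number can be recovered with a ResendRequest, while a too-low number (without PossDupFlag) normally ends the session.

```cpp
#include <cassert>

// Hypothetical names for illustration only; this is not QuickFIX code.
enum class SeqCheck { InOrder, TooHigh, TooLow };

// Compare an inbound MsgSeqNum with the number the receiver expects next.
inline SeqCheck classifyInbound(int expectedSeqNum, int receivedSeqNum)
{
  assert(expectedSeqNum >= 1); // FIX sequence numbers start at 1

  if (receivedSeqNum == expectedSeqNum)
    return SeqCheck::InOrder;  // process normally
  if (receivedSeqNum > expectedSeqNum)
    return SeqCheck::TooHigh;  // recoverable: receiver issues a ResendRequest
  return SeqCheck::TooLow;     // not recoverable automatically: session is dropped
}
```

In the scenario from this thread, an acceptor expecting sequence number N that finally sees the 9th Logon with N + 8 lands in the `TooHigh` branch, which is the case the two engines can sort out on their own.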
From: Oren M. <or...@qu...> - 2004-10-19 15:06:11
Correction:

That condition should read 'endSeqNo >= getExpectedSenderNum()'.

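With the correction applied, the proposed check can be sketched as a standalone function. This is an illustration under stated assumptions, not the QuickFIX source: `effectiveEndSeqNo` and its parameters are hypothetical, `nextSenderSeqNum` stands in for `getExpectedSenderNum()`, and the BeginString comparison is assumed to be a plain lexicographic string comparison.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper for illustration; not part of QuickFIX.
// Returns the last sequence number that should actually be resent.
inline int effectiveEndSeqNo(const std::string& beginString,
                             int endSeqNo,
                             int nextSenderSeqNum)
{
  assert(nextSenderSeqNum >= 1); // next number to be sent, so at least 1

  // "Infinity" markers as described in the thread.
  const bool infinite42 = beginString >= "FIX.4.2" && endSeqNo == 0;
  const bool infinite41 = beginString <= "FIX.4.2" && endSeqNo == 999999;
  // Corrected new condition: the request reaches past what was ever sent.
  const bool beyondSent = endSeqNo >= nextSenderSeqNum;

  if (infinite42 || infinite41 || beyondSent)
    return nextSenderSeqNum - 1; // clamp to the last message actually sent
  return endSeqNo;
}
```

For example, with ten as the next outgoing number, a request ending at 2147483647 is clamped to 9, so the message store is only asked for messages that exist.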
From: Caleb E. <cal...@gm...> - 2004-10-19 17:32:16
On a slightly related note: I've noticed that QuickFIX can be a little over-aggressive in terms of issuing ResendRequest messages, and I think it might make sense to implement some level of throttling on these.

For example, if a counterparty has a large-ish number of messages in flight when a gap is detected, each will cause QF to issue a ResendRequest for the same range of missed messages. The remote side will honor all of these, resulting in several times the necessary message traffic to fill the gap.

It might make sense to keep track of the range and time of the most recent ResendRequest in the SessionState, and not send a new ResendRequest if the outstanding one would satisfy the same range of messages and no more than <X> seconds have elapsed.

--
Caleb Epstein
cal...@gm...
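The throttling idea above could be sketched roughly as follows. `ResendThrottle` is a hypothetical helper, not an existing QuickFIX class or SessionState member, and the sketch assumes the requested range uses real sequence numbers rather than the 0 / 999999 "infinity" markers.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper for illustration; not part of QuickFIX.
class ResendThrottle
{
public:
  using Clock = std::chrono::steady_clock;

  explicit ResendThrottle(std::chrono::seconds window) : m_window(window)
  {
    assert(window.count() >= 0);
  }

  // Returns true if a ResendRequest for [begin, end] should be sent at `now`.
  // A request is suppressed only when the outstanding one already covers the
  // range and no more than the configured window has elapsed since it was sent.
  bool shouldSend(int begin, int end, Clock::time_point now)
  {
    const bool covered = m_active && begin >= m_begin && end <= m_end;
    const bool fresh = m_active && (now - m_sentAt) <= m_window;
    if (covered && fresh)
      return false;

    // Remember this request as the outstanding one.
    m_active = true;
    m_begin = begin;
    m_end = end;
    m_sentAt = now;
    return true;
  }

  // Call once the gap has been filled.
  void clear() { m_active = false; }

private:
  std::chrono::seconds m_window;
  bool m_active = false;
  int m_begin = 0;
  int m_end = 0;
  Clock::time_point m_sentAt{};
};
```

Passing the time in as a parameter keeps the check easy to test. A session would call `shouldSend` each time an out-of-sequence message arrives and `clear` when the expected sequence number catches up.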