|
From: Brad H. <Bra...@gb...> - 2006-09-22 22:25:10
|
Hi Steve,

I think quickfix avoided an infinite resend loop nicely and adhered to the spec, but from a practical point of view the connection was unusable.

I'd like to be able to recover gracefully from application errors involving checksum calculation (potentially other garbled message conditions?). I'm not concerned if the message causing the problem can't be processed, but I still want to receive subsequent messages coming in on the connection.

I wouldn't mind if manual intervention was required - how about a way of telling the engine (without a logout) that a particular message number is never going to come? For example, would a method on session to bump the expected sequence number up and forget about past resend requests sent work?

Thanks for your help.

Regards,
Brad.

-----Original Message-----
From: qui...@li... [mailto:qui...@li...] On Behalf Of Steve Bate
Sent: Friday, 22 September 2006 7:23 PM
To: qui...@li...
Subject: Re: [Quickfixj-users] Handling of checksum errors

QuickFIX/J Documentation: http://www.quickfixj.org/documentation/
QuickFIX/J Support: http://www.quickfixj.org/support/

Hi Brad,

According to the FIX 4.4 spec...

"Note: The receiving application should disregard any message that is garbled, cannot be parsed or fails a data integrity check. Processing of the next valid FIX message will cause detection of a sequence gap and a Resend Request will be generated. Logic should be included in the FIX engine to recognize the possible infinite resend loop, which may be encountered in this situation."

This paragraph seems to be assuming a /network error/ that would cause garbled messages rather than an application error. If it were the network, you'd expect that the resent message might be correct. This isn't necessarily true for an application error.

Your specific problem seems to be caused partially by the QFJ checksum calculation.
There are several little issues here, most of which are related to Java's lack of unsigned types and the fact that QF JNI uses Strings to represent raw message data (rather than a byte array or a ByteBuffer). The use of Strings means that the data is stored as a Unicode character array rather than as a byte array, and this can cause problems in these cases. On the other hand, it's a potential benefit for the Chinese users. :-)

My challenge has been how to support the String-based messages while maintaining the best possible performance. Java Strings and performance are almost an oxymoron, especially for networking protocols.

I'm going to do some more investigation, but assuming your counterparty is calculating the checksum correctly for the garbled data, I'd prefer to improve the checksum calculation in QFJ. I'm open to other suggestions, but I'd prefer to not support configurable disabling of message integrity checks.

Steve

> -----Original Message-----
> From: qui...@li... [mailto:qui...@li...] On Behalf Of Brad Harvey
> Sent: Friday, September 22, 2006 2:36 AM
> To: qui...@li...
> Subject: Re: [Quickfixj-users] Handling of checksum errors
>
> Hi Toli,
>
> Thanks. We do know the cause of the checksum errors - some extra data was being added into a string field (from an uninitialised buffer apparently so the characters could be anything). The two engines aren't calculating the checksum the same way for these non printable characters, but I haven't looked into exactly what they were or what the correct checksum should have been yet. I'm leaving further investigation of this for later - at the moment I just want to make sure a checksum error doesn't stop all subsequent incoming messages from being processed.
>
> Cheers,
> Brad.
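To illustrate Steve's point above about unsigned types and String-based message data: the FIX CheckSum (tag 10) is the sum of the message's bytes, up to but not including the CheckSum field itself, modulo 256. The sketch below is not QuickFIX/J code, and the class and method names are hypothetical. It only shows two ways a Java implementation could drift from a counterparty's byte-based sum: forgetting that `byte` is signed, and summing the `char` values of a String that holds characters outside the single-byte range.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from QuickFIX/J.
public class FixChecksumSketch {

    /** Sums the unsigned value of every byte, modulo 256. */
    public static int checksumOfBytes(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            // Java bytes are signed (-128..127); the mask recovers 0..255.
            sum += b & 0xFF;
        }
        return sum % 256;
    }

    /** Sums UTF-16 char values of a String, modulo 256. */
    public static int checksumOfChars(String data) {
        int sum = 0;
        for (int i = 0; i < data.length(); i++) {
            // A char can be as large as 0xFFFF, so this only matches the
            // byte-based sum when every char maps to exactly one byte
            // with the same numeric value.
            sum += data.charAt(i);
        }
        return sum % 256;
    }

    /** Formats a checksum as the three-digit value carried in tag 10. */
    public static String format(int checksum) {
        return String.format("%03d", checksum);
    }

    public static void main(String[] args) {
        // Plain ASCII fragment: both approaches agree.
        String ascii = "35=D\u0001";
        byte[] asciiBytes = ascii.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ASCII bytes: " + format(checksumOfBytes(asciiBytes)));
        System.out.println("ASCII chars: " + format(checksumOfChars(ascii)));

        // One raw byte 0x80 on the wire versus a String in which some
        // decoder turned that byte into U+20AC: the sums now differ.
        byte[] raw = {(byte) 0x80};
        String decoded = "\u20AC";
        System.out.println("raw byte:     " + format(checksumOfBytes(raw)));
        System.out.println("decoded char: " + format(checksumOfChars(decoded)));
    }
}
```

If the engine and the counterparty disagree only for non-printable or non-ASCII data, comparing a byte-level sum like `checksumOfBytes` against the value the engine reports is one way to narrow down which side is off.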
>
> -----Original Message-----
> From: qui...@li... [mailto:qui...@li...] On Behalf Of Toli Kuznets
> Sent: Friday, 22 September 2006 10:09 AM
> To: qui...@li...
> Subject: Re: [Quickfixj-users] Handling of checksum errors
>
> Brad,
>
> Not sure if this helps - but I came across checksum problems when my exchange simulator (modified exchange code) was inserting fields unknown to quickfixj when these vars were set:
> DataDictionary=FIX42.xml
> UseDataDictionary=Y
>
> Are you sure you are not inserting "unknown" fields?
>
> Hope this helps. If I remember the exact sequence of what caused checksum failures for me I'll post that.
>
> On 9/21/06, Brad Harvey <Bra...@gb...> wrote:
> > Hi All,
> >
> > I recently encountered a problem where the checksum validation of a particular incoming message always failed. This has the side effect of stopping fromApp being called for any future message because the engine waits forever for the message to be resent.
> >
> > Ignoring the problem of why the checksum is failing for the moment (non printable/possibly embedded null characters in a string field - possibly the subject of a future mail), what would be a better way to handle this?
> >
> > As a short term workaround I may disable checksum validation - we're connecting to counterparty over a LAN and the risk of checksum being incorrect on a message is higher than the risk of the message being received incorrectly.
> >
> > Longer term I was thinking of doing something to make the engine go to the next target sequence number if there are 2 checksum failures for the same MsgSeqNum.
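Brad's longer-term idea above can be sketched roughly as follows. This is a self-contained model, not QuickFIX/J code: the class `ChecksumSkipPolicy` and its methods are hypothetical, and it assumes the MsgSeqNum of a message that failed its checksum can still be read and trusted, which is itself questionable for garbled data.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "skip a sequence number after N checksum failures".
public class ChecksumSkipPolicy {
    private final int maxFailures;
    private final Map<Integer, Integer> failuresBySeqNum = new HashMap<>();
    private int expectedSeqNum;

    public ChecksumSkipPolicy(int firstExpectedSeqNum, int maxFailures) {
        this.expectedSeqNum = firstExpectedSeqNum;
        this.maxFailures = maxFailures;
    }

    /**
     * Records a checksum failure for a message claiming the given sequence
     * number. Returns true if the failure limit was reached and the expected
     * sequence number was advanced past it.
     */
    public boolean onChecksumFailure(int seqNum) {
        if (seqNum != expectedSeqNum) {
            return false; // only the message currently being waited on counts
        }
        int count = failuresBySeqNum.merge(seqNum, 1, Integer::sum);
        if (count >= maxFailures) {
            failuresBySeqNum.remove(seqNum);
            expectedSeqNum++; // give up on this message and move on
            return true;
        }
        return false;
    }

    /** Records a valid in-sequence message and advances the expected number. */
    public void onValidMessage(int seqNum) {
        if (seqNum == expectedSeqNum) {
            failuresBySeqNum.remove(seqNum);
            expectedSeqNum++;
        }
    }

    public int getExpectedSeqNum() {
        return expectedSeqNum;
    }

    public static void main(String[] args) {
        // Scenario: message 4 fails its checksum, is resent, and fails again.
        ChecksumSkipPolicy policy = new ChecksumSkipPolicy(4, 2);
        policy.onChecksumFailure(4);                   // original message
        boolean skipped = policy.onChecksumFailure(4); // resent copy
        System.out.println("skipped=" + skipped
                + ", expecting " + policy.getExpectedSeqNum());
        policy.onValidMessage(5);                      // later traffic flows again
        System.out.println("expecting " + policy.getExpectedSeqNum());
    }
}
```

A real engine would also have to decide what to do with the outstanding resend request and whether to report the skipped message to the application; the model leaves both out.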
> >
> > Following is a log from Banzai showing the problem. It was connecting to Executor which I modified to always make the checksum of execution reports for security "CHK" incorrect. This test was done with quickfixj trunk but problem was initially noticed with 1.0.2.
> >
> > Any suggestions are appreciated.
> > Thanks,
> > Brad.
> >
> > The test was:
> > * submitted two orders that received execution reports ok.
> > * submitted order for CHK.
> > * executor mangled checksum on execution report for CHK, banzai detected checksum failure (fromApp not called)
> > * submit another order
> > * banzai detects out of sequence message, sends resend request
> > * executor resent execution report with mangled checksum again.
> > * banzai detects checksum failure
> > * For any further incoming messages, banzai detects sequence too high and does nothing because of pending resend request.
> >
> > <20060921-12:25:25, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=2^A49=BANZAI^A52=20060921-12:25:25.937^A56=EXEC^A11=1158841525860^A21=1^A38=5^A40=2^A44=10^A54=1^A55=ABC^A59=0^A60=20060921-12:25:25^A10=250^A)
> > <20060921-12:25:26, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=2^A49=EXEC^A52=20060921-12:25:26.062^A56=BANZAI^A6=10^A11=1158841525860^A14=5^A17=1^A20=0^A31=10^A32=5^A37=1^A38=5^A39=2^A54=1^A55=ABC^A150=2^A151=0^A10=057^A)
> > <20060921-12:25:46, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=3^A49=BANZAI^A52=20060921-12:25:46.546^A56=EXEC^A11=1158841546548^A21=1^A38=5^A40=2^A44=10^A54=1^A55=CBA^A59=0^A60=20060921-12:25:46^A10=003^A)
> > <20060921-12:25:46, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=3^A49=EXEC^A52=20060921-12:25:46.546^A56=BANZAI^A6=10^A11=1158841546548^A14=5^A17=2^A20=0^A31=10^A32=5^A37=2^A38=5^A39=2^A54=1^A55=CBA^A150=2^A151=0^A10=075^A)
> > <20060921-12:25:56, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=4^A49=BANZAI^A52=20060921-12:25:56.312^A56=EXEC^A11=1158841556315^A21=1^A38=5^A40=2^A44=10^A54=1^A55=CHK^A59=0^A60=20060921-12:25:56^A10=006^A)
> > <20060921-12:25:56, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=4^A49=EXEC^A52=20060921-12:25:56.312^A56=BANZAI^A6=10^A11=1158841556315^A14=5^A17=3^A20=0^A31=10^A32=5^A37=3^A38=5^A39=2^A54=1^A55=CHK^A150=2^A151=0^A10=179^A)
> > 21/09/2006 22:25:56 quickfix.mina.AbstractIoHandler messageReceived
> > SEVERE: Invalid message: Expected CheckSum=79, Received CheckSum=179
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=5^A49=BANZAI^A52=20060921-12:26:26.468^A56=EXEC^A11=1158841586472^A21=1^A38=5^A40=2^A44=10^A54=1^A55=AAA^A59=0^A60=20060921-12:26:26^A10=003^A)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=5^A49=EXEC^A52=20060921-12:26:26.468^A56=BANZAI^A6=10^A11=1158841586472^A14=5^A17=4^A20=0^A31=10^A32=5^A37=4^A38=5^A39=2^A54=1^A55=AAA^A150=2^A151=0^A10=080^A)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, event>
> > (MsgSeqNum too high, expecting 4 but received 5)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=62^A35=2^A34=6^A49=BANZAI^A52=20060921-12:26:26.468^A56=EXEC^A7=4^A16=0^A10=058^A)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, event>
> > (Sent ResendRequest FROM: 4 TO: 0)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=167^A35=8^A34=4^A43=Y^A49=EXEC^A52=20060921-12:26:26.484^A56=BANZAI^A122=20060921-12:25:56^A6=10^A11=1158841556315^A14=5^A17=3^A20=0^A31=10^A32=5^A37=3^A38=5^A39=2^A54=1^A55=CHK^A150=2^A151=0^A10=255^A)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=167^A35=8^A34=5^A43=Y^A49=EXEC^A52=20060921-12:26:26.500^A56=BANZAI^A122=20060921-12:26:26^A6=10^A11=1158841586472^A14=5^A17=4^A20=0^A31=10^A32=5^A37=4^A38=5^A39=2^A54=1^A55=AAA^A150=2^A151=0^A10=133^A)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, event>
> > (MsgSeqNum too high, expecting 4 but received 5)
> > <20060921-12:26:26, FIX.4.2:BANZAI->EXEC, event>
> > (Already sent ResendRequest FROM: 4 TO: 4. Not sending another.)
> > 21/09/2006 22:26:26 quickfix.mina.AbstractIoHandler messageReceived
> > SEVERE: Invalid message: Expected CheckSum=155, Received CheckSum=255
> > <20060921-12:26:34, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=7^A49=BANZAI^A52=20060921-12:26:34.421^A56=EXEC^A11=1158841594426^A21=1^A38=5^A40=2^A44=10^A54=1^A55=BBB^A59=0^A60=20060921-12:26:34^A10=249^A)
> > <20060921-12:26:34, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=6^A49=EXEC^A52=20060921-12:26:34.437^A56=BANZAI^A6=10^A11=1158841594426^A14=5^A17=5^A20=0^A31=10^A32=5^A37=5^A38=5^A39=2^A54=1^A55=BBB^A150=2^A151=0^A10=079^A)
> > <20060921-12:26:34, FIX.4.2:BANZAI->EXEC, event>
> > (MsgSeqNum too high, expecting 4 but received 6)
> > <20060921-12:26:34, FIX.4.2:BANZAI->EXEC, event>
> > (Already sent ResendRequest FROM: 4 TO: 4. Not sending another.)
> > <20060921-12:26:43, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=8^A49=BANZAI^A52=20060921-12:26:43.609^A56=EXEC^A11=1158841603615^A21=1^A38=5^A40=2^A44=10^A54=1^A55=CCC^A59=0^A60=20060921-12:26:43^A10=252^A)
> > <20060921-12:26:43, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=7^A49=EXEC^A52=20060921-12:26:43.625^A56=BANZAI^A6=10^A11=1158841603615^A14=5^A17=6^A20=0^A31=10^A32=5^A37=6^A38=5^A39=2^A54=1^A55=CCC^A150=2^A151=0^A10=075^A)
> > <20060921-12:26:43, FIX.4.2:BANZAI->EXEC, event>
> > (MsgSeqNum too high, expecting 4 but received 7)
> > <20060921-12:26:43, FIX.4.2:BANZAI->EXEC, event>
> > (Already sent ResendRequest FROM: 4 TO: 4. Not sending another.)
> > <20060921-12:26:54, FIX.4.2:BANZAI->EXEC, outgoing>
> > (8=FIX.4.2^A9=129^A35=D^A34=9^A49=BANZAI^A52=20060921-12:26:54.390^A56=EXEC^A11=1158841614397^A21=1^A38=5^A40=2^A44=10^A54=1^A55=DDD^A59=0^A60=20060921-12:26:54^A10=010^A)
> > <20060921-12:26:54, FIX.4.2:BANZAI->EXEC, incoming>
> > (8=FIX.4.2^A9=140^A35=8^A34=8^A49=EXEC^A52=20060921-12:26:54.406^A56=BANZAI^A6=10^A11=1158841614397^A14=5^A17=7^A20=0^A31=10^A32=5^A37=7^A38=5^A39=2^A54=1^A55=DDD^A150=2^A151=0^A10=089^A)
> > <20060921-12:26:54, FIX.4.2:BANZAI->EXEC, event>
> > (MsgSeqNum too high, expecting 4 but received 8)
> > <20060921-12:26:54, FIX.4.2:BANZAI->EXEC, event>
> > (Already sent ResendRequest FROM: 4 TO: 4. Not sending another.)
> >
> > _______________________________________________
> > Quickfixj-users mailing list
> > Qui...@li...
> > https://lists.sourceforge.net/lists/listinfo/quickfixj-users
>
> --
> Toli Kuznets
> http://www.marketcetera.com: Open-Source Trading Platform download.run.trade.
|