From: Christoph J. <chr...@ma...> - 2020-08-27 13:33:38
Hi

1. Do you only have Acceptors running or also Initiators?
2. The output of either netstat (see my last mail) or a stack dump at the time of the problem would be very helpful, but I guess it is hard to know when the problem appears.

Cheers
Chris.

On 27 August 2020 14:24:35 CEST, Vipin Chaudhary <vip...@gm...> wrote:
>Hi,
>
>I have some detailed updates on this.
>
>When this issue occurs, the logon hangs during the first client LOGON
>(I don't know why).
>The event log then shows the following state:
>Accepting session FIX.4.4: *****Server->****client from /clientIP: 60868
>Acceptor heartbeat set to 60 seconds
>
>After this the logon process hangs somewhere and no LOGON message is
>sent to the client.
>
>When the client does not get a response to its LOGON message, it
>disconnects and retries.
>=> This disconnect is not detected on our side, and when the new LOGON
>message comes in, then at *AcceptorIoHandler.java* line no 69
>*qfSession.hasResponder() => returns true*
>*But* when I checked *qfSession.getRemoteAddress()* it returned *null*.
>I also tried qfSession.logout(), which does not do anything.
>
>In case of the normal LOGON flow the event log shows the following state:
>Accepting session FIX.4.4: *****Server->****client from /clientIP: 60868
>Acceptor heartbeat set to 60 seconds
>Logon contains ResetSeqNumFlag=Y,resetting sequence numbers to 1
>Received logon
>Responding to Logon request
>
>and then I can see the logon message in our outgoing log.
>
>I am not able to replicate this issue on my local machine, so I don't
>know where it hangs during the first logon :(
>
>Does anyone have a solution to this?
>
>
>Thanks
>Vipin
>
>On Thu, Aug 6, 2020 at 9:41 PM Christoph John <chr...@ma...> wrote:
>
>> Hi,
>>
>> when all clients are affected at the same time it could still be a network
>> issue, right? :)
>> As said, QFJ does not handle the TCP connection stuff by itself but uses
>> MINA which is a stable and mature library. Of course, it still could have
>> bugs.
>> But your description rather sounds like all clients' connections get
>> closed and go into TIME_WAIT or CLOSE_WAIT state.
>>
>> https://superuser.com/questions/173535/what-are-close-wait-and-time-wait-states
>>
>> Did you check "netstat" or a similar tool when the connection problem occurs?
>>
>> If you were to implement the logic that you proposed, that would mean that
>> someone could do a simple attack against your server which would not only
>> end the new malicious session but also the one that is rightfully
>> connected. Are you sure you want to follow that way? ;)
>>
>> Cheers,
>> Chris.
>>
>> On 06.08.20 13:27, Vipin Chaudhary wrote:
>>
>> Hi guys.
>>
>> I am still facing this issue and now the frequency is very high. I am also
>> not able to detect any networking issue.
>>
>> One more point: it is happening with all clients at the same time. This
>> does not resolve without a restart of the application.
>> One more strange thing: if I don't restart, then a client which was
>> disconnected at the time of this issue also faces this issue when it tries
>> to connect at its session time.
>>
>> So now I highly suspect this is a quickfix bug.
>>
>> In my opinion quickfixj should close the old session if it detects that the
>> old session is still running when it receives a new logon. To achieve this
>> I am thinking of editing *AcceptorIoHandler.java.*
>>
>> Class : AcceptorIoHandler.java
>> Line no 69
>> if (qfSession.hasResponder()) {
>>     // Session is already bound to another connection
>>     sessionLog.onErrorEvent("Multiple logons/connections for this session are not allowed");
>>     protocolSession.closeNow();
>>     *//TODO Close old session here*
>>     return;
>> }
>>
>> There are many methods to close the session in the Session class, as follows:
>>
>> 1. qfSession.close();
>> 2. qfSession.disconnect("Closing Old session", true);
>> 3. qfSession.logout("Closing old session to accommodate new session");
>> 4. qfSession.generateLogout();
>> 5. qfSession.reset();
>>
>> Which of the above methods should I opt for to serve my purpose?
>>
>> Thanks
>> Vipin
>>
>>
>> On Mon, May 4, 2020 at 3:47 PM Christoph John <chr...@ma...> wrote:
>>
>>> Hi,
>>>
>>> if VPN rekeying is not the problem then maybe differing MTU sizes or
>>> asymmetric routing. Or of course it could also be another problem. I just
>>> said what the problems mostly were in our case.
>>> But I think if you describe the problem to your and your counterparty's
>>> network team (since you only seem to have this problem with some of your
>>> sessions) they should be able to debug it. Maybe your team needs to do some
>>> tcp dumps around the time of the problem.
>>>
>>> It would be nice if you could reply to the user group (not me privately)
>>> if you found something out. This could help other users as well.
>>>
>>> Cheers,
>>> Chris.
>>>
>>>
>>> On 04.05.20 08:28, Vipin Chaudhary wrote:
>>>
>>> Hi Christoph,
>>>
>>> I double checked with the network team and found that no initiator uses a
>>> VPN to connect to us.
>>>
>>> In that case, can you help me with what I should ask the network team to
>>> do to fix the network connections?
>>>
>>> Thanks
>>> Vipin Chaudhary
>>>
>>> On Thu, Apr 30, 2020 at 3:49 PM Christoph John <chr...@ma...> wrote:
>>>
>>>> The only solution is to fix the network connection. Everything else is
>>>> only a workaround.
>>>> You could try to increase socket timeouts on both sides of the
>>>> connection. Maybe it helps (depends on the cause of the problem) but as
>>>> said this will only work around the problem.
>>>>
>>>> Cheers,
>>>> Chris.
>>>>
>>>> On 30.04.20 12:11, Vipin Chaudhary wrote:
>>>>
>>>> Hi Christoph,
>>>>
>>>> Thanks for your input,
>>>>
>>>> We have multiple sessions and this problem does not happen with all
>>>> sessions simultaneously. Mostly we see this problem with one session and
>>>> rarely with other sessions.
>>>>
>>>> As far as I know the initiator connects directly to us over the internet
>>>> (with SSL). I will double check this with the network team.
>>>>
>>>> Meanwhile, can you think of any solution to this problem?
>>>>
>>>> Thanks
>>>> Vipin Chaudhary
>>>>
>>>> On Thu, Apr 30, 2020 at 3:31 PM Christoph John <chr...@ma...> wrote:
>>>>
>>>>> Addition: if VPN rekeying is the problem you will probably see this
>>>>> message every hour (or whatever the rekey interval is)
>>>>>
>>>>> Chris.
>>>>>
>>>>> On 30.04.20 12:00, Christoph John wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> did you change QFJ version or why do you think it is QFJ related? Do
>>>>> you only have one FIX session? If not, do all sessions show this behaviour?
>>>>> (apart from that, the TCP/IP stuff is done via the MINA framework and
>>>>> not within QFJ itself)
>>>>>
>>>>> This message appears when the initiator side of the connection
>>>>> considers the connection broken and tries to reconnect. But the acceptor
>>>>> still considers it a vital connection (probably until the connection
>>>>> timeout kicks in).
>>>>>
>>>>> From my experience this happens mostly on VPN connections via internet.
>>>>> Reasons for this were one of:
>>>>> - different MTU sizes on both sides of the connection or on a
>>>>>   router/firewall in between
>>>>> - asymmetric routing
>>>>> - differing VPN parameters leading to different rekeying behaviour on
>>>>>   both ends of the connection.
>>>>>
>>>>> Hope that helps,
>>>>> Chris.
>>>>>
>>>>>
>>>>> On 30.04.20 05:51, Vipin Chaudhary wrote:
>>>>>
>>>>> QuickFIX/J Documentation: http://www.quickfixj.org/documentation/
>>>>> QuickFIX/J Support: http://www.quickfixj.org/support/
>>>>>
>>>>>
>>>>> Hi Team,
>>>>>
>>>>> We are facing a strange issue with quickfixj.
>>>>>
>>>>> We are the session acceptor. Sometimes when an initiator disconnects,
>>>>> quickfixj is not able to recognize the disconnection.
>>>>> So when the client logs on the next time it says
>>>>> "Multiple logons/connections for this session are not allowed",
>>>>> although in reality the client is disconnected.
>>>>> Earlier it was very rare and happened once in a while, but nowadays it
>>>>> happens about once a week.
>>>>> Quickfixj is not able to recover from this and we need to restart our
>>>>> application.
>>>>>
>>>>> *Has anyone seen/fixed this?*
>>>>>
>>>>> Thanks
>>>>> Vipin Chaudhary
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Quickfixj-users mailing lis...@li...://lists.sourceforge.net/lists/listinfo/quickfixj-users
>>>>>
>>>>>
>>>>> --
>>>>> Christoph John
>>>>> Software Engineering
>>>>> T +49 241 557...@ma...
>>>>>
>>>>> MACD GmbH
>>>>> Oppenhoffallee 103
>>>>> 52066 Aachen, Germany
>>>>> www.macd.com
>>>>>
>>>>> Amtsgericht Aachen: HRB 8151
>>>>> VAT ID: DE 813021663
>>>>> Managing Director: George Macdonald
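
Regarding the stack dump asked for at the top of this thread: since it is hard to know in advance when the hang appears, one option might be to let the application write the dump itself once it notices the "Multiple logons/connections" condition. The following is only a minimal sketch using the JDK's ThreadMXBean. The class name ThreadDumpOnHang is hypothetical, and where you call it from (for example your own monitoring code) is an assumption; nothing here is part of QuickFIX/J or MINA.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of QuickFIX/J: builds a plain-text dump of all
// live threads so it can be written to a log when the hang is suspected.
public class ThreadDumpOnHang {

    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details, which
        // are optional features that not every JVM supports.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // In a real acceptor this would go to a log file instead of stdout.
        System.out.println(dumpAllThreads());
    }
}
```

Running `jstack <pid>` from the JDK against the acceptor process gives a comparable dump from outside the application, if someone is at hand while the problem is occurring.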