|
From: Florian L. <fl...@un...> - 2020-12-07 18:07:18
|
I assumed as much as well in the beginning, but I ruled this out since it doesn't happen with the same message volume but without a proxy in between, and it also doesn't happen with the proxy but the heartbeat interval configured as 100'000s, and lastly it also immediately stops when sending out the heartbeat (and not after some timeout because the heartbeat arrives too late at the other side and they close the connection because of it). There was never a time during testing where any incoming message was processed after such a sent heartbeat and only then the disconnect happened, the heartbeat message was always the last thing logged. The really weird part about for me is that there is absolutely no kind of error thrown or anything. So it seems Quickfix/j still believes everything is fine even though the connection is broken... On 07.12.2020 15:37, Colin DuPlantis wrote: > QuickFIX/J Documentation: http://www.quickfixj.org/documentation/ > QuickFIX/J Support: http://www.quickfixj.org/support/ > > > > Apologies if this has been asked and answered, but, is it possible the > issue is related to your Application becoming overwhelmed with > incoming messages and not sending out the heartbeat because it takes > too long to respond to the incoming messages? > > I ask this because you say that it breaks "where the is lots of > incoming traffic". > > One way to test this would be to have your receiver either not process > incoming messages or stick them in an adjacent thread to process, thus > freeing up the QFJ Application thread. > > On 12/7/20 6:34 AM, Florian Leu wrote: >> QuickFIX/J Documentation:http://www.quickfixj.org/documentation/ >> QuickFIX/J Support:http://www.quickfixj.org/support/ >> >> >> >> Hi Chris >> >> Sorry for the late reply and thanks for taking an interest in this issue. >> >> The connection doesn't break in quiet times, but in times where there >> is lots of incoming traffic, while at the same time I try to send out >> a heartbeat message after the configured interval. Since I'm only a >> message receiver, I never send out my own application messages and >> therefore Quickfix/j produces the heartbeat messages regularly. But >> somehow sending out a (heartbeat) messages via proxy while >> simultaneously receiving incoming (application) messages seems to >> break the connection. I verified with the counter party that they >> never receive my heartbeat in such a case and I stop receiving their >> messages as well. This behavior doesn't happen when we connect >> directly without proxy, but it happens with the proxy in my >> development environment as well as with the proxy at the client's >> production environment (where a direct connection is not possible). >> >> The end of the log basically looks like this: >> >> 2020-11-25 17:12:24.448 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=31235=634=819143=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.209142=MADRID15=EUR22=423=Tag23-720127=2500000028=N44=97.78248=ES037015200354=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103235=MATURITY854=0199=1104=E2529=12530=32531=-126.22532=310=136 >> 2020-11-25 17:12:24.448 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=32735=634=819243=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.210142=MADRID15=EUR22=423=Tag23-720227=220000028=N44=98.45248=ES036179601654=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103235=MATURITY236=0.058488653854=0199=1104=E2529=12530=32531=-130.32532=310=116 >> 2020-11-25 17:12:24.448 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=32735=634=819343=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.211142=MADRID15=EUR22=423=Tag23-720327=500000028=N44=100.1548=ES030524800954=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103235=MATURITY236=0.124030151854=0199=1104=E2529=12530=32531=-124.82532=110=067 >> 2020-11-25 17:12:24.456 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=32735=634=819443=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.211142=MADRID15=EUR22=423=Tag23-720427=550000028=N44=99.75248=ES037015102154=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103235=MATURITY236=0.013007278854=0199=1104=E2529=12530=32531=-126.52532=310=100 >> 2020-11-25 17:12:24.456 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=32635=634=819543=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.213142=MADRID15=EUR22=423=Tag23-720527=60000028=N44=96.99748=ES037798400254=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103235=MATURITY236=0.103649457854=0199=1104=E2529=12530=32531=-126.82532=310=093 >> 2020-11-25 17:12:24.456 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=31235=634=819643=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.213142=MADRID15=EUR22=423=Tag23-720627=2500000028=N44=99.68148=ES037798800354=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103235=MATURITY854=0199=1104=E2529=12530=32531=-124.12532=310=162 >> 2020-11-25 17:12:24.456 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=35835=634=819743=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.213142=MADRID15=EUR22=423=Tag23-720727=200000028=N44=98.91448=XS105765983854=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103218=0.31235=MATURITY236=3.895172789699=XS1057659838761=4854=0199=1104=E2529=12530=32531=332.92532=110=221 >> 2020-11-25 17:12:24.456 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=35735=634=819843=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.213142=MADRID15=EUR22=423=Tag23-720827=870000028=N44=104.81748=ES031497023954=255=ALRSRU >> 0 >> 01/11/202060=20201125-07:00:07.103218=93.0235=MATURITY236=0.04773846699=DE0001141687761=4854=0199=1104=E2529=12530=32531=20.32532=310=064 >> 2020-11-25 17:12:24.456 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : fromApp: >> 8=FIXT.1.19=34635=634=819943=Y49=<REPLACED>52=20201125-16:12:14.78156=<REPLACED>115=<REPLACED>122=20201125-07:00:10.213142=MADRID15=EUR22=423=Tag23-720927=118526428=N44=103.59348=EU000A1U982954=255=ALUFP >> 6.45 >> 03/15/202960=20201125-07:00:07.103218=30.48235=MATURITY699=DE0001141703761=4854=0199=1104=E2529=12530=32531=-42.32532=310=065 >> 2020-11-25 17:12:24.480 INFO 19036 --- [ssage Processor] >> c.u.fix_adapter.fix.FixApp : toAdmin: >> 8=FIXT.1.19=6635=034=5049=<REPLACED>52=20201125-16:12:24.48056=<REPLACED>10=191 >> 2020-11-25 17:12:24.622 INFO 19036 --- [ scheduling-1] >> c.unitek.fix_adapter.ScheduledTasks : Saved [400] IOI messages. >> >> I get 1000s of incoming messages that I printed out for debugging >> purposes as "fromApp" log statements....and then there is the first >> heartbeat that I send out, logged as "toAdmin", after the configured >> interval (e.g. 30s after logging on). After that the incoming >> messages immediately stop, as if the outgoing heartbeat message >> caused a disconnect of the connection, but in a way that Quickfix/j >> doesn't even realize that something is wrong because no "recover >> attempt" is done which I usually see when the counter party is e.g. >> really offline for a time, then Quickfix/j usually tries to reconnect >> until the connection can be re-established. But in case of this >> proxy-disconnect, the application is then in a state that Quickfix/j >> seems not to be able to get out of again. But if I send out a >> heartbeat message in a time where there are no incoming messages >> simultaneously, then the heartbeat goes through just fine and doesn't >> cause a disconnect, so only when both incoming and outgoing messages >> occur together, this seems to be an issue, as if the proxy were a >> one-way-street that shuts down if there is traffic from both sides at >> the same time. >> >> The proxy settings used in my test environment initiator.cfg file are >> as follows: >> >> ProxyHost=194.191.110.6 ProxyPassword=test ProxyPort=3128 >> ProxyType=http ProxyUser=test ProxyVersion=1.1 >> >> There are also config parameters ProxyDomain and ProxyWorkstation >> available in Quickfix/j, but I didn't find out what they are used for. >> >> I implemented my own workaround now be initiating a logout/logon >> after not receiving an incoming message for 3 minutes, but that only >> works because we know that we basically get a constant stream of >> messages, so when we receive nothing for 3 minutes something must be >> wrong, but that's more a workaround then a solution, same as setting >> heartbeat interval to 100'000 to avoid getting into this situation in >> the first place. >> >> Regards, Florian >> >> PS: The newer answer in this mail-thread from Andrew Marlow seems to >> belong to another topic altogether... >> >> >> On 02.12.2020 15:28, Christoph John wrote: >>> Hi, >>> >>> can you share some message and event logs? >>> Are you saying that the connection only breaks when a Heartbeat >>> *needs* to be sent, i.e. when there is always traffic the connection >>> works, but in quiet times the connection will break? >>> >>> Cheers, >>> Chris. >>> >>> >>> On 24.11.20 09:57, Florian Leu wrote: >>>> QuickFIX/J Documentation: http://www.quickfixj.org/documentation/ >>>> QuickFIX/J Support: http://www.quickfixj.org/support/ >>>> >>>> >>>> Hi everyone >>>> >>>> I have a weird issue that my quickfix initiator works fine when >>>> connecting directly to the message supplier, but as soon as we >>>> connect via an HTTP proxy, it works for the first n seconds and >>>> then the messages stop arriving. >>>> >>>> I've now managed to trace the problem to the fact that those n >>>> seconds are exactly what is configured for the heartbeat interval, >>>> i.e. if I set this heartbeat interval higher, then it works longer >>>> and if I set it e.g. to 5, then it already stops after 5 seconds. >>>> >>>> At first I assumed the counter-party just closes the connection if >>>> they don't receive the heartbeat on time, but then I learned that >>>> after a missing heartbeat message first a test request is sent out >>>> and only after that a connection would be closed. So it seems to me >>>> that actually the sending out of the heartbeat from the Quickfix/j >>>> library (version 2.2.0) via proxy somehow seems to block the >>>> incoming traffic, as though the proxy was a one way street. Since >>>> the logon message exchange works, I know that communicating both >>>> ways actually works, but it seems that sending out a heartbeat >>>> while simultaneously processing an incoming stream of messages >>>> seems to somehow cut of this incoming stream for good. I can >>>> literally see in my logs that the incoming messages are processed >>>> via the fromApp method exactly up to the point that my first >>>> heartbeat is sent out, so logging this heartbeat via the toAdmin >>>> method is always the last thing logged before the communication >>>> stops altogether....no new incoming or outgoing messages are logged >>>> until the software is restarted. >>>> >>>> Has anyone come across a similar issue? And is there a solution for >>>> it? For the moment we think about just disabling the heartbeat >>>> messages altogether (that seems to work for now), but that it more >>>> of a hack workaround in my opinion and not an actual solution. >>>> >>>> Thanks for your support and regards, >>>> Florian >>>> >>>> >>>> >>>> _______________________________________________ >>>> Quickfixj-users mailing list >>>> Qui...@li... >>>> https://lists.sourceforge.net/lists/listinfo/quickfixj-users >>> >>> -- >>> Christoph John >>> Software Engineering >>> T +49 241 557080-28 >>> chr...@ma... >>> >>> MACD GmbH >>> Oppenhoffallee 103 >>> 52066 Aachen, Germany >>> www.macd.com >>> >>> Amtsgericht Aachen: HRB 8151 >>> Ust.-Id: DE 813021663 >>> Geschäftsführer: George Macdonald >> >> >> _______________________________________________ >> Quickfixj-users mailing list >> Qui...@li... >> https://lists.sourceforge.net/lists/listinfo/quickfixj-users > -- > Colin DuPlantis > Chief Architect, Marketcetera > Download, Run, Trade > 888.868.4884 > https://www.marketcetera.com > > > _______________________________________________ > Quickfixj-users mailing list > Qui...@li... > https://lists.sourceforge.net/lists/listinfo/quickfixj-users |