[Quickfix-developers] Intermittent disconnect problem
From: Bishop, B. <Bar...@gs...> - 2004-11-29 15:32:29
Hi,

I have a problem with a FIX session. The QuickFIX engine seems to disconnect itself from its peer randomly but infrequently. The message 'Disconnecting' appears in the event log with no other useful information. This can happen a few times a day or not at all for a week, and it is completely unrelated to how busy the session is.

Background: we are using QuickFIX on a multi-processor machine running Solaris 5.8. We have been using QuickFIX 1.4.0 since that version was released. We had big problems building it on Solaris (and still do), so we settled on our own build script that combines all the engine source and the JNI layer into one big binary. This binary has been in production for approximately two years for two of our client FIX connections and has worked perfectly. The same binary is now in use with a third connection to a different client, and that connection shows the problem described above.

Detailed analysis of many packet-sniffing captures shows nothing wrong with the TCP session. In fact, QuickFIX closes the session gracefully in the usual FIN-ACK manner (not with a reset). Immediately after disconnecting it tries to reconnect, and it usually succeeds straight away with only a few seconds of outage. Sometimes, however, it cannot sort itself out and gets into a cycle involving the message "logon response received before sending logon", which can take a long time (minutes) to stabilise.

I've looked through lots of mailing-list threads but can't find anything similar to the disconnect problem. I did, however, find plenty about problems with reconnecting and getting into loops that are hard to break out of. Consequently, I downloaded and built version 1.9.2 (again with my own build script) and have been testing with it. I was rather disappointed to find that when I kill the peer, QuickFIX still logs just the message "Disconnecting". I was hoping for more detailed information, since this was mentioned as an enhancement in one of the release notes.
So, my questions for anyone who has struggled to the end of this rather long email (sorry) are:

1) Does anyone have any idea what's going on?
2) Is there a way to increase the amount of detail in log messages, especially those to do with disconnection events?
3) What sort of thing would cause QuickFIX to disconnect without saying why?

Thanks in advance,
barry

PS I like the new bug tracker. I'd be happy to get involved with the Java port; I would guess that a lot of people would want it.