Thread: [Quickfix-developers] System clock issues
From: John H. <jr...@ya...> - 2008-07-30 16:07:12
I recently had two different scenarios occur which impacted my quickfix implementation, and I thought I'd describe them here in case anybody else runs into a similar situation, as well as to ask if anybody has suggestions on how best to handle either situation the next time it occurs.

By way of background, we've created an algorithmic high-frequency trading system that utilizes the latest out-of-the-box quickfix implementation. The system is written in .NET 2.0. We have a FIX "router" which is a gateway to 2 direct exchanges and 2 brokers (one US, one LatAm). The router serves about 20 client implementations here in the office. All clients are running on Windows XP.

Scenario #1 ("My life in IT would be much easier without Users"):

One of my traders starts screaming that his trading application has gone nuts. Investigation shows that his system clock is not set to the proper date. Further digging reveals that the trader in question was "bored, and just playing with the system clock. Why would that affect our trading system?" Diagnosis: Quickfix discovered that the date had changed on the system clock and reset the daily session, setting incoming and outgoing sequence numbers back to 1. This caused an immediate disconnect from the FIX router, and subsequent reconnects failed due to sequence number mismatches.

From my perspective, quickfix behaved exactly as it should in this scenario. Amusingly enough, the same scenario recurred with a different trader a few weeks later. I can't resist sharing the explanation from the trader: "I was online reserving a car for my vacation and when it asked me to enter a date for pickup, I thought that was where I should enter it". Priceless.

Scenario #2 ("#^$&# Windows..."):

Today one of our traders suddenly experienced a disconnect from our FIX router in the middle of a large sequence of trades. The loss of connection was not a welcome event at that moment. After much digging, we finally discovered that Windows had decided on its own to initiate its weekly clock sync to internet time. Because of hardware vagaries, this workstation's system time had drifted a little over two minutes since its last sync. When the system time was reset, quickfix determined that too much time had elapsed during a heartbeat request (send it at 11:06:58 and two seconds later get a response, but the system time is now 11:09:23), so it initiated a disconnect on timeout. Quickfix then automatically (and successfully) logged on again as determined by my config file settings, but the market had slipped on the trades in progress, so the experience was considered a big problem.

In both cases I believe that quickfix behaved exactly as expected. I suppose the reason I'm writing all this is to see if anybody has any suggestions on 1) how to handle either of these scenarios more gracefully within my code, or 2) how to prevent similar situations from occurring from an IT management perspective (workstation lockdown, etc.). If anybody has any thoughts on the topic I'd love to hear them.

(And before you suggest it, I asked: Getting rid of all my users is not an option....)

Many thanks,
John
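One way to approach the "handle it more gracefully within my code" question is for the application itself to notice when the wall clock has been stepped, by comparing how far the wall clock moved against how far a monotonic clock moved over the same interval. The sketch below is only an illustration in Python, not part of QuickFIX and not something the thread confirms anyone used. `ClockJumpDetector`, its 1-second tolerance, and the injected clock functions are hypothetical names and values. The demo feeds in the timestamps from Scenario #2.

```python
import time


class ClockJumpDetector:
    """Flags wall-clock steps by comparing wall-clock and monotonic deltas."""

    def __init__(self, tolerance_s=1.0, wall=time.time, mono=time.monotonic):
        # tolerance_s is an assumed threshold; tune it for the environment.
        self.tolerance_s = tolerance_s
        self._wall = wall
        self._mono = mono
        self._last_wall = wall()
        self._last_mono = mono()

    def check(self):
        """Return the wall-clock step in seconds since the previous sample,
        or 0.0 if the two clocks agree within the tolerance."""
        now_wall, now_mono = self._wall(), self._mono()
        step = (now_wall - self._last_wall) - (now_mono - self._last_mono)
        self._last_wall, self._last_mono = now_wall, now_mono
        return step if abs(step) > self.tolerance_s else 0.0


def seconds_of_day(hours, minutes, seconds):
    """Convert a time of day to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds


# Replay Scenario #2 with fake clocks: the wall clock reads 11:06:58, then
# 11:09:23, while only two real seconds pass on the monotonic clock.
wall_samples = iter([seconds_of_day(11, 6, 58), seconds_of_day(11, 9, 23)])
mono_samples = iter([0.0, 2.0])

detector = ClockJumpDetector(
    wall=lambda: next(wall_samples),
    mono=lambda: next(mono_samples),
)
step = detector.check()
print(f"wall clock stepped by {step} seconds")  # 145 s apparent - 2 s real = 143.0
```

A detector like this does not change how the FIX engine reacts to the step. It only gives the application a chance to log the event or hold new orders until the session has settled. A changed date, as in Scenario #1, would show up the same way, just as a much larger step.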
From: Shane T. <str...@co...> - 2008-07-30 16:12:58
John,

It might be worth configuring QuickFIX with "CheckLatency=N" to remove time validation, though I don't think this is recommended unless your counterparty uses local time instead of GMT or something similar.

--
Shane Trotter
Connamara Systems, LLC
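For reference, the setting Shane mentions goes in the QuickFIX settings file. The fragment below is a sketch: placing it under `[DEFAULT]` is an assumption (it can also be set per session), and the `MaxLatency` value of 300 is purely illustrative and not a figure from the thread. `CheckLatency` controls whether the SendingTime on incoming messages is validated against the local clock, and `MaxLatency` is the allowed difference in seconds (default 120). The thread does not establish whether either setting would have prevented the heartbeat timeout in Scenario #2.

```ini
[DEFAULT]
# Shane's suggestion: skip the SendingTime latency check entirely.
CheckLatency=N

# Alternative (not from the thread): keep the check but widen the window.
# The value 300 is a hypothetical example.
# CheckLatency=Y
# MaxLatency=300
```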
From: John H. <jo...@ha...> - 2008-07-30 16:18:44
Understood, and thank you. I'm hesitant to try this for fear that it's a case of throwing the baby out with the bath water.

I've just now added the following line to my app at startup:

    Shell("w32tm /resync", AppWinStyle.Hide, False)

This should in theory force a resync, which ought to prevent an inadvertent sync during trading hours since Windows then reschedules the next sync for a week later....

jh
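Note that passing `False` as the last argument to `Shell` means the application does not wait for `w32tm` to finish, so it cannot tell whether the resync succeeded before trading starts. A variant that waits and checks the exit code might look like the sketch below. It is written in Python for illustration rather than in the VB.NET of the original app, and it assumes `w32tm` is on the PATH and that the process has rights to trigger a resync. `force_time_resync` and its `runner` parameter are hypothetical names, not part of any library.

```python
import subprocess


def force_time_resync(runner=subprocess.run, timeout_s=30):
    """Ask the Windows Time service to resync now.

    Returns True if the command ran and exited with code 0, else False.
    `runner` exists only so the call can be replaced with a fake in tests.
    """
    try:
        result = runner(
            ["w32tm", "/resync"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing executable, e.g. on a non-Windows host.
        print(f"resync could not be run: {exc}")
        return False

    if result.returncode != 0:
        print(f"resync failed with exit code {result.returncode}: {result.stderr}")
        return False
    return True


if __name__ == "__main__":
    # Run once at startup, before any FIX session is opened.
    if not force_time_resync():
        print("warning: clock was not resynced; a later sync may step the clock")
```

Running this before the FIX sessions log on keeps any clock step out of trading hours. Whether Windows then defers its next scheduled sync by a full week, as John expects, is his assumption and is not verified here.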