Thread: [Quickfix-developers] Intermittent disconnects on Solaris
From: <kri...@rb...> - 2005-03-21 19:02:41
|
I have isolated the cause of the intermittent disconnects caused by QuickFIX on Solaris. The problem was due to the use of the I_NREAD ioctl to determine whether a readable socket was at EOF or not.

In certain circumstances, which seem to involve high network traffic and low machine load, I_NREAD will return zero for a readable socket that actually has data. In such cases, QuickFIX would erroneously close the socket.

I replaced the I_NREAD code in socket_disconnected() with a one-byte recv() using the MSG_PEEK flag. This appears to have resolved this rather troublesome issue for us in production.

I will send patches against 1.9.4 for this fix, as well as other changes required to get QuickFIX to compile on Solaris/SunPRO 5.3 w/Roguewave STL, after I have tested them with SunPRO and the default STL, as well as gcc on Solaris.

Regards,

- Kris

Utility.cpp
===========

OLD
---
bool socket_disconnected( int s )
{
  QF_STACK_PUSH(socket_disconnected)

  unsigned long read;
#ifdef _MSC_VER
  ioctlsocket( s, FIONREAD, &read );
#elif defined(USING_STREAMS)
  ioctl( s, I_NREAD, &read );
#else
  ioctl( s, FIONREAD, &read );
#endif
  return read == 0;

  QF_STACK_POP
}

NEW
---
bool socket_disconnected( int s )
{
  QF_STACK_PUSH(socket_disconnected)

#if defined(_MSC_VER) || !defined(USING_STREAMS)
  unsigned long read;
#ifdef _MSC_VER
  ::ioctlsocket( s, FIONREAD, &read );
#else
  ::ioctl( s, FIONREAD, &read );
#endif
  return read == 0;
#elif defined(USING_STREAMS)
  char byte;
  return ::recv (s, &byte, sizeof (byte), MSG_PEEK) <= 0;
#endif

  QF_STACK_POP
}
|
From: Caleb E. <cal...@gm...> - 2005-03-21 19:20:36
|
On Mon, 21 Mar 2005 19:02:19 -0000, kri...@rb... <kri...@rb...> wrote:

> I have isolated the cause of the intermittent disconnects caused by QuickFIX on Solaris. The problem was due to the use of the I_NREAD ioctl to determine whether a readable socket was EOF or not.
>
> In certain circumstances which seem to involve high network traffic and low machine load, I_NREAD will return zero for a readable socket that actually has data. In such cases, QuickFIX would erroneously close the socket.
>
> I replaced the I_NREAD code in socket_disconnected() one byte recv() with the MSG_PEEK flag. This appears to have resolved this rather troublesome issue for us in production.

Shouldn't QuickFIX just rely on recv returning 0 to detect a socket disconnect, instead of relying on this ioctl? I've never seen a socket-based application using this technique to detect disconnects before. Clearly it's not 100% reliable.

--
Caleb Epstein
caleb dot epstein at gmail dot com
|
From: Oren M. <or...@qu...> - 2005-03-21 19:22:48
|
Yeah, I agree.

--oren
|