RE: [Quickfix-developers] Intermittent disconnects on Solaris
From: <kri...@rb...> - 2005-03-22 12:06:20
Oren, Caleb -

When debugging this problem, I was quite surprised by QuickFIX's socket
handling. Despite the use of select(), non-blocking I/O is not actually used,
which seems to be why platform-specific ioctls are used (to figure out how
much you can read without blocking).

As for using recv()/read() == 0 to determine EOF, the fact that the actual
socket reads and writes are scattered among various functions and methods of
unrelated classes makes this quite a bit harder to do. It also complicates
error reporting (i.e. errno on failed reads/writes) and the disconnection
reasons that I've seen people requesting on the list.

There is a SocketConnection class which 'owns' a connected socket and there
are functions in Utility.cpp for doing socket I/O, yet grep'ing for recv()
calls returns two calls to the syscall recv(), and neither one is in either
SocketConnection or Utility.cpp. FIX::Parser's readFromStream() method seems
an especially dubious place to put recv(); the fact that FIX::Parser has a
member variable for holding a file descriptor as well as a pointer to an
std::istream also strikes me as wrong.

  lsdev2:~/work/quickfix/src/C++ $ grep -n -1 recv *.cpp *.h
  Parser.cpp-144-
  Parser.cpp:145:  size = recv( m_socket, m_readBuffer, m_bufferSize, 0 );
  Parser.cpp-146-  if ( size <= 0 ) throw RecvFailed();
  --
  ThreadedSocketConnection.cpp-96-  buffer = new char[ bytes + 1 ];
  ThreadedSocketConnection.cpp:97:  int result = recv( m_socket, buffer, bytes, 0 );
  ThreadedSocketConnection.cpp-98-  if ( result <= 0 ) { throw std::exception(); }

During my attempts to recreate the Solaris disconnect problem, I wrote sample
QuickFIX apps that would send torrents of messages.
It was quite easy to lock up two QuickFIX apps talking to each other during a
reconnect: both sides would request resends from each other, and then the
SocketInitiator and SocketAcceptor threads in the apps would block on send()
when resending the messages to each other. Each send() would not return until
the other side called recv(), which that side could not do because it was
itself blocked in send(). The apps would then sleep forever.

This problem can only be avoided by using non-blocking I/O or having one
thread for reading and another for writing, neither of which SocketInitiator
or SocketAcceptor does (I haven't looked at ThreadedSocketInitiator).

Addressing this would be a fair amount of work; however, the patch above has
eliminated the disconnects which were causing us major headaches as they
tended to happen at around market close.

Regards,
- Kris

-----Original Message-----
From: Oren Miller [mailto:or...@qu...]
Sent: 21 March 2005 19:23
To: Caleb Epstein; Peterson, Kristofer
Cc: qui...@li...; Bar...@gs...
Subject: Re: [Quickfix-developers] Intermittent disconnects on Solaris

Yeah, I agree.

--oren

----- Original Message -----
From: "Caleb Epstein" <cal...@gm...>
To: <kri...@rb...>
Cc: <qui...@li...>; <Bar...@gs...>
Sent: Monday, March 21, 2005 1:20 PM
Subject: Re: [Quickfix-developers] Intermittent disconnects on Solaris

> QuickFIX Documentation:
> http://www.quickfixengine.org/quickfix/doc/html/index.html
> QuickFIX FAQ: http://www.quickfixengine.org/wikifix/index.php?QuickFixFAQ
> QuickFIX Support: http://www.quickfixengine.org/services.html
>
> On Mon, 21 Mar 2005 19:02:19 -0000, kri...@rb...
> <kri...@rb...> wrote:
>
>> I have isolated the cause of the intermittent disconnects caused by
>> QuickFIX on Solaris. The problem was due to the use of the I_NREAD ioctl
>> to determine whether a readable socket was EOF or not.
>>
>> In certain circumstances which seem to involve high network traffic and
>> low machine load, I_NREAD will return zero for a readable socket that
>> actually has data. In such cases, QuickFIX would erroneously close the
>> socket.
>>
>> I replaced the I_NREAD code in socket_disconnected() with a one-byte
>> recv() using the MSG_PEEK flag. This appears to have resolved this rather
>> troublesome issue for us in production.
>>
>
> Shouldn't QuickFIX just rely on recv returning 0 to detect a socket
> disconnect, instead of relying on this ioctl? I've never seen a
> socket-based application using this technique to detect disconnects
> before. Clearly it's not 100% reliable.
>
> --
> Caleb Epstein
> caleb dot epstein at gmail dot com
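
For readers following the thread, the fix described in the quoted message (a one-byte recv() with MSG_PEEK in place of the I_NREAD ioctl) might look roughly like the sketch below. The actual patch is not shown in this thread, so this is only an illustration: peer_has_disconnected() and demo_peek_check() are hypothetical names, the choice to treat hard errors such as ECONNRESET as a disconnect is an assumption, and the helper assumes select() has just reported the descriptor as readable, so the peek does not block on a blocking socket.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper, not QuickFIX's actual socket_disconnected().
// Precondition (assumption): select() has just reported fd as readable,
// so the one-byte peek returns immediately even on a blocking socket.
bool peer_has_disconnected( int fd )
{
  char byte;
  ssize_t result;
  do
  {
    // MSG_PEEK leaves the byte in the receive queue for the real read.
    result = recv( fd, &byte, 1, MSG_PEEK );
  } while ( result < 0 && errno == EINTR );

  if ( result > 0 ) return false; // data is pending: still connected
  if ( result == 0 ) return true; // orderly shutdown by the peer (EOF)

  // Assumption: any other error counts as a disconnect, except
  // "would block", which only means there was nothing to read.
  return errno != EAGAIN && errno != EWOULDBLOCK;
}

// Usage sketch: a local socket pair stands in for a FIX session.
void demo_peek_check()
{
  int sv[2];
  if ( socketpair( AF_UNIX, SOCK_STREAM, 0, sv ) != 0 ) return;

  const char msg = 'x';
  ssize_t written = write( sv[1], &msg, 1 );
  bool withData = peer_has_disconnected( sv[0] );   // a byte is queued

  char got = 0;
  ssize_t readBack = recv( sv[0], &got, 1, 0 );     // peek did not consume it

  close( sv[1] );
  bool afterClose = peer_has_disconnected( sv[0] ); // recv() now sees EOF
  close( sv[0] );

  assert( written == 1 && !withData );
  assert( readBack == 1 && got == 'x' );
  assert( afterClose );
}
```

Unlike a byte-count ioctl, this relies only on recv()'s own return value, which is the same signal Caleb suggests using: a return of 0 on a readable stream socket means the peer has closed the connection.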