[Quickfix-developers] Reconnect failed if initiator and acceptor are connected locally under Sun Solaris
From: <ale...@el...> - 2007-01-22 15:07:50
Hello all,
> We believe that we have found an error in QuickFIX version 1.12.4.
>
> The error occurs under the following conditions: a QuickFIX initiator and an acceptor are connected locally under Sun Solaris 5.10 with the QuickFIX engine v1.12.4 (under Microsoft Windows the reconnect problem does not occur). With the acceptor started first, the initiator connects successfully, as lines 1 to 3 of the first Fixengine event log abstract show. Terminating the acceptor program leads to the disconnect (lines 4 to 6). With the reconnect interval set to 30 seconds, the initiator tries to connect again. Line 7 is the last entry in the event log for this session, because the engine never tries to reconnect again.
> The second abstract shows the same situation with the initiator and the acceptor running on different host machines. In this case a connect succeeds (lines 17 to 19) after several unsuccessful reconnects (lines 7 to 15).
>
--> We have analyzed this and, in our opinion, the reason for this erroneous behaviour is insufficient error handling in the functions SocketInitiator::doConnect and SocketConnector::connect.
We added error handling to the latest revision of both files, so that the initiator application keeps trying to reconnect even if a reconnect attempt fails (see the third log abstract, section 5, at the end of this mail).
> Any comments or suggestions on this would be appreciated.
>
> 1) Abstract of Fixengine event log when the initiator and acceptor applications are running on the same host machine (host1):
> 1 20070119-11:20:45 : Connecting to host1 on port 50002
> 2 20070119-11:20:45 : Initiated logon request
> 3 20070119-11:20:45 : Received logon response
> 4 20070119-11:20:58 : Received logout request
> 5 20070119-11:20:58 : Sending logout response
> 6 20070119-11:20:58 : Disconnecting
> 7 20070119-11:21:16 : Connecting to host1 on port 50002
> End of abstract.
>
> 2) Abstract of Fixengine event log when the initiator and acceptor applications are running on different host machines (I: host1 / A: host2):
> 1 20070119-16:08:51 : Connecting to host2 on port 50002
> 2 20070119-16:08:51 : Initiated logon request
> 3 20070119-16:08:51 : Received logon response
> 4 20070119-16:09:09 : Received logout request
> 5 20070119-16:09:09 : Sending logout response
> 6 20070119-16:09:09 : Disconnecting
> 7 20070119-16:09:22 : Connecting to host2 on port 50002
> 8 20070119-16:09:22 : Initiated logon request
> 9 20070119-16:09:22 : Socket Error: Connection refused
> 10 20070119-16:09:22 : Disconnecting
> 11 20070119-16:09:52 : Connecting to host2 on port 50002
> 12 20070119-16:09:52 : Initiated logon request
> 13 20070119-16:09:52 : Socket Error: Connection refused
> 14 20070119-16:09:52 : Disconnecting
> 15 20070119-16:10:22 : Connecting to host2 on port 50002
> 16 ....
> 17 20070119-16:13:22 : Connecting to host2 on port 50002
> 18 20070119-16:13:22 : Initiated logon request
> 19 20070119-16:13:22 : Received logon response
> 20 20070119-16:13:22 : Received ResendRequest FROM: 18 TO: 0
> End of abstract.
>
> 3) Modifications in SocketInitiator::doConnect (file quickfix/src/C++/SocketInitiator.cpp):
> // old implementation which does not contain correct error handling, quickfix/src/C++/SocketInitiator.cpp - Revision 1775
> /*
> bool SocketInitiator::doConnect( const SessionID& s, const Dictionary& d )
> { QF_STACK_PUSH(SocketInitiator::doConnect)
>
>   try
>   {
>     std::string address;
>     short port = 0;
>     Session* session = Session::lookupSession( s );
>     if( !session->isSessionTime() ) return false;
>
>     Log* log = session->getLog();
>
>     getHost( s, d, address, port );
>
>     log->onEvent( "Connecting to " + address + " on port " + IntConvertor::convert((unsigned short)port) );
>     int result = m_connector.connect( address, port, m_noDelay );
>
>     setPending( s );
>
>     m_pendingConnections[ result ]
>       = new SocketConnection( *this, s, result, &m_connector.getMonitor() );
>
>     return true;
>   }
>   catch ( std::exception& ) { return false; }
>
>   QF_STACK_POP
> }
> */
>
> // new implementation with correct error handling
> bool SocketInitiator::doConnect( const SessionID& s, const Dictionary& d )
> { QF_STACK_PUSH(SocketInitiator::doConnect)
>
>   try
>   {
>     std::string address;
>     short port = 0;
>     Session* session = Session::lookupSession( s );
>     if( !session->isSessionTime() ) return false;
>
>     Log* log = session->getLog();
>
>     getHost( s, d, address, port );
>
>     log->onEvent( "Connecting to " + address + " on port " + IntConvertor::convert((unsigned short)port) );
>     int result = m_connector.connect( address, port, m_noDelay );
>     // added error handling ( Reconnect failed if initiator and an acceptor are connected locally under Sun Solaris )
>     if ( result == -1 )
>     {
>       // capture errno before any further library call can change it (needs <cerrno> and <cstring>)
>       const int err = errno;
>       log->onEvent( "Error on Connecting to " + address + " on port "
>                     + IntConvertor::convert((unsigned short)port) + ": " + strerror(err)
>                     + "(" + IntConvertor::convert(err) + ")" );
>       return false;
>     }
>     // end of modification
>
>     setPending( s );
>
>     m_pendingConnections[ result ]
>       = new SocketConnection( *this, s, result, &m_connector.getMonitor() );
>
>     return true;
>   }
>   catch ( std::exception& ) { return false; }
>
>   QF_STACK_POP
> }
>
> 4) Modifications in SocketConnector::connect (file quickfix/src/C++/SocketConnector.cpp):
> // old implementation which does not contain correct error handling, quickfix/src/C++/SocketConnector.cpp - Revision 1637
> /*
> int SocketConnector::connect( const std::string& address, int port, bool noDelay )
> { QF_STACK_PUSH(SocketConnector::connect)
>
>   int socket = socket_createConnector();
>
>   if ( socket != -1 )
>   {
>     if( noDelay )
>       socket_setsockopt( socket, TCP_NODELAY );
>     m_monitor.addConnect( socket );
>     socket_connect( socket, address.c_str(), port );
>   }
>   return socket;
>
>   QF_STACK_POP
> }
> */
>
> // new implementation with correct error handling
> int SocketConnector::connect( const std::string& address, int port, bool noDelay )
> { QF_STACK_PUSH(SocketConnector::connect)
>
>   int socket = socket_createConnector();
>
>   if ( socket != -1 )
>   {
>     int retVal = 0;
>     if( noDelay )
>       retVal = socket_setsockopt( socket, TCP_NODELAY );
>
>     if ( retVal == 0 ) // everything fine so far
>     {
>       if ( socket_connect( socket, address.c_str(), port ) != 0 && errno != EINPROGRESS )
>       {
>         int savedErrno = errno; // socket_close may overwrite errno
>         socket_close( socket );
>         errno = savedErrno;     // keep the connect error for the caller's log message
>         return -1;
>       }
>
>       m_monitor.addConnect( socket );
>     }
>     else
>     {
>       socket_close( socket ); // do not leak the descriptor when setsockopt fails
>       return -1;
>     }
>   }
>   return socket;
>
>   QF_STACK_POP
> }
>
> 5) Abstract of modified Fixengine event log when the initiator and acceptor applications are running on the same host machine (host1):
> 1 20070122-11:10:29 : Connecting to host1 on port 50002
> 2 20070122-11:10:29 : Initiated logon request
> 3 20070122-11:10:29 : Received logon response
> 4 20070122-11:10:42 : Received logout request
> 5 20070122-11:10:42 : Sending logout response
> 6 20070122-11:10:42 : Disconnecting
> 7 20070122-11:11:00 : Connecting to host1 on port 50002
> 8 20070122-11:11:00 : Error on Connecting to host1 on port 50002: Transport endpoint is not connected(134)
> 9 20070122-11:11:30 : Connecting to host1 on port 50002
> 10 20070122-11:11:30 : Error on Connecting to host1 on port 50002: Transport endpoint is not connected(134)
> 11 20070122-11:12:00 : Connecting to host1 on port 50002
> 12 20070122-11:12:00 : Initiated logon request
> 13 20070122-11:12:00 : Received logon response
> End of abstract.
>
Bye
Alex