Hello Sean,

We are using a file store for the message store. Our app calls setrlimit to raise its file descriptor limit to 2048, and it currently runs with 1301 descriptors open without problems.
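
For reference, here is a minimal sketch of the distinction Sean raises in the quoted message below. setrlimit changes how many descriptors the process may open, but it does not change FD_SETSIZE, which is a compile-time constant (1024 on Linux with glibc). If the engine's socket code uses select(), as Sean's FD_SET remark suggests, then any descriptor numbered FD_SETSIZE or higher cannot safely be placed in an fd_set, whatever the rlimit says. The socket values in the traces further down (s=1216, s=1170) are both above 1024, which would be consistent with this, although the traces alone do not prove it. The helper names here are hypothetical and are not part of QuickFIX.

```cpp
#include <sys/resource.h>
#include <sys/select.h>
#include <cassert>

// Hypothetical helper (not part of QuickFIX): make sure the soft
// RLIMIT_NOFILE is at least `wanted`, capped at the hard limit.
// Returns true if the call sequence succeeded.
bool raiseSoftFdLimit(rlim_t wanted) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return false;
    if (rl.rlim_cur >= wanted) return true;  // already high enough
    if (rl.rlim_max != RLIM_INFINITY && wanted > rl.rlim_max)
        wanted = rl.rlim_max;                // cannot exceed the hard limit
    rl.rlim_cur = wanted;
    return setrlimit(RLIMIT_NOFILE, &rl) == 0;
}

// Hypothetical helper: true if `fd` can legally be stored in an fd_set.
// POSIX leaves FD_SET undefined for fd < 0 or fd >= FD_SETSIZE.
bool fitsInFdSet(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

// Hypothetical guarded wrapper around FD_SET: fails loudly in a debug
// build instead of silently writing outside the fd_set.
void addToSet(int fd, fd_set* set) {
    assert(fitsInFdSet(fd));
    FD_SET(fd, set);
}
```

With these, raiseSoftFdLimit(2048) mirrors what our app does at startup, while fitsInFdSet(1216) would return false on a system where FD_SETSIZE is 1024.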

Thanks.

On 3/21/06, Sean Kirkpatrick <sean.kirkpatrick@pipelinefinancial.com> wrote:
What are you using for the message store?  If the acceptor is using files, then I believe 4 file descriptors are created per session (seqnums, session, body, and header).  Throw a socket in per session and you are most likely going over the 1024 FD_SET limit with 288 sessions...
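
The estimate above can be sketched as a small calculation. The per-session counts (4 store files plus 1 socket) are the figures assumed in this message rather than measured values, and the function names are hypothetical.

```cpp
#include <sys/select.h>
#include <cassert>

// Assumed per-session descriptor cost, taken from the estimate above:
// 4 store files (seqnums, session, body, header) plus 1 socket.
const int kFilesPerSession = 4;
const int kSocketsPerSession = 1;

// Estimated number of descriptors held open for `sessions` sessions.
int estimatedDescriptors(int sessions) {
    assert(sessions >= 0);
    return sessions * (kFilesPerSession + kSocketsPerSession);
}

// True if the estimate is larger than what an fd_set can address.
bool mayExceedFdSet(int sessions) {
    return estimatedDescriptors(sessions) > FD_SETSIZE;
}
```

For 288 sessions the estimate is 288 * 5 = 1440 descriptors, which is above an FD_SETSIZE of 1024 even before counting the descriptors the process uses for anything else.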
 
--Sean
-----Original Message-----
From: quickfix-developers-admin@lists.sourceforge.net [mailto:quickfix-developers-admin@lists.sourceforge.net] On Behalf Of James Reed
Sent: Tuesday, March 21, 2006 11:50 AM
To: quickfix-developers@lists.sourceforge.net
Subject: [Quickfix-developers] crashes while under moderate message load

Hello all,

We've experienced unexplained crashes in which no core dump is generated and nothing in the logs indicates the cause. The crashes tend to occur when a large number of clients attempt to connect at once, or when a significant portion of the connected clients send messages within a short span of time.

We are using QuickFIX 1.10.2, with g++ 3.2.3, on RHEL AS Release 3. Our application uses one Initiator and one Acceptor. The Acceptor is configured with 288 Sessions. Our application is also configured to use a StdOutLogger to minimize usage of file descriptors. The fdlimit for the user account running the process is 2048. I know there is not much to go on from the information presented, but does anyone have any ideas about what could possibly be happening here?

Thanks.

We've run the application in a debugger and found the following upon crashing:

Stack Traces

Incident #1:

#0  0x00d22b60 in pthread_detach () from /lib/tls/libpthread.so.0
#1  0x00a4dc3b in FIX::thread_detach (thread=0) at Utility.cpp:344
#2  0x00a07334 in FIX::ThreadedSocketAcceptor::removeThread (this=0x837d210, s=1216)
    at stl_map.h:221
#3  0x00a07453 in FIX::ThreadedSocketAcceptor::socketThread (p=0x6ea61a10)
    at ThreadedSocketConnection.h:51
#4  0x00d21dec in start_thread () from /lib/tls/libpthread.so.0
#5  0x005a8a2a in clone () from /lib/tls/libc.so.6

Incident #2:
 
#0  0x00a8bb60 in pthread_detach () from /lib/tls/libpthread.so.0
#1  0x00de7c3b in FIX::thread_detach (thread=0) at Utility.cpp:344
#2  0x00da1334 in FIX::ThreadedSocketAcceptor::removeThread (this=0xb6a4f448, s=1170)
    at stl_map.h:221
#3  0x00da1453 in FIX::ThreadedSocketAcceptor::socketThread (p=0xa95f8a10)
    at ThreadedSocketConnection.h:51
#4  0x00a8adec in start_thread () from /lib/tls/libpthread.so.0
#5  0x00595a2a in clone () from /lib/tls/libc.so.6

Incident #3:
 
#0  0x0027e6d7 in std::string::_Rep::_M_grab () from /usr/lib/libstdc++.so.5
#1  0x0027e81c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.5
#2  0x08077b56 in FieldBase (this=0x92e9b68, _ctor_arg=@0x9379608) at SessionID.h:45
#3  0x08077afa in StringField (this=0x92e9b68, _ctor_arg=@0x9379608) at SessionID.h:45
#4  0x08077a2e in BeginString (this=0x92e9b68, _ctor_arg=@0x9379608) at SessionID.h:45
#5  0x0807782f in SessionID (this=0x92e9b68, _ctor_arg=@0x9379608) at SessionID.h:119
#6  0x00f158bb in std::_Rb_tree<FIX::SessionID, FIX::SessionID, std::_Identity<FIX::SessionID>, std::less<FIX::SessionID>, std::allocator<FIX::SessionID> >::_M_copy (
    this=0xb75dea20, __x=0x93795f8, __p=0x92e9af8) at new:89
#7  0x00f15df0 in _Rb_tree (this=0xb75dea20, __x=@0xbfff97ec) at stl_tree.h:648
#8  0x00f14189 in FIX::Initiator::isLoggedOn (this=0xbfff97d0) at stl_set.h:131
#9  0x00f19961 in FIX::SocketInitiator::onStart (this=0xbfff97d0)
    at SocketInitiator.cpp:86
#10 0x00f142e7 in FIX::Initiator::startThread (p=0xb75de8c0) at Initiator.cpp:243
#11 0x0058fdec in start_thread () from /lib/tls/libpthread.so.0
#12 0x00393a2a in clone () from /lib/tls/libc.so.6