[Quickfix-developers] ThreadedSocketConnection: fix for memory growth under heavy load
From: Caleb E. <cal...@gm...> - 2005-10-18 22:23:45
I have spent the past couple of days torture-testing a QuickFIX C++ application with some simple test harnesses that bombard it with messages. In the process of this testing, I noticed that my application was using a ridiculous amount of memory, many times larger than would be expected from the size of the test data. So I ran everything through valgrind (not much help - no leaks detected), and eventually came upon Google's excellent "tcmalloc" library (see http://goog-perftools.sourceforge.net/), which can be used to dump statistics about heap usage. It turned out that 95% of the memory was taken up by the allocations done in ThreadedSocketConnection::read!

The overall design of the ThreadedSocketConnection class is relatively simple, and on paper looks like it should work correctly. There are two threads for each Session: one reads from a socket into a new'd char buffer and then pushes the buffer + size onto a queue; the other pops elements off of this queue, adds the data to the Parser's buffer, calls the Parser class to process its buffer, and finally delete[]'s the buffer it got from the queue.

The code appears to be correct, and doesn't leak memory in the purest sense, but the real-world behavior when incoming traffic is high is not good. When a counterparty sends data faster than your application can process it, the "read" thread in the ThreadedSocketConnection will do a fine job keeping up with the socket I/O, but the queue in between these two threads will grow very large, containing all of the data that has been read from the socket but has yet to be processed by the Parser and Session. In a perfect world, this memory usage would be identical to the size of the messages that have been recv'd, but memory allocators are inefficient, and what actually happens is that the application ends up using perhaps 10-20x more memory than this!
My application, which should have used perhaps 125 MB of memory (most of it due to an mmap'ed transaction log file), had ballooned up to 775 MB! As a comparison, the incoming FIX log file is only 38 MB in size.

Thankfully, there is a simple fix to this problem, and it actually simplifies the code and eliminates one thread per session. Here's my version of ThreadedSocketConnection::read:

---8<---
bool ThreadedSocketConnection::read()
{
  QF_STACK_PUSH(ThreadedSocketConnection::read)

  int bytes = 0;
  char buffer[BUFSIZ];

  try
  {
    if ( !socket_isValid( m_socket ) ) return false;
    socket_fionread( m_socket, bytes );
    // Clamp to the buffer size: FIONREAD may report zero, or more
    // pending bytes than our fixed buffer can hold.
    if ( !bytes || bytes > (int)sizeof(buffer) )
      bytes = sizeof(buffer);
    int result = recv( m_socket, buffer, bytes, 0 );
    if ( result <= 0 )
      throw SocketRecvFailed();
    m_parser.addToStream( buffer, result );
    processStream();
    return true;
  }
  catch ( SocketRecvFailed& e )
  {
    if ( m_pSession )
      m_pSession->getLog()->onEvent( e.what() );
    return false;
  }

  QF_STACK_POP
}
---8<---

Compared to the current QF version, this function:

- Uses a single fixed buffer for all reads, keeping memory usage more or less constant regardless of incoming data rates
- Doesn't spawn an extra thread for parsing/application callbacks
- Causes the counterparty to block when the QuickFIX application can't keep up with incoming data

I think all of these are benefits, though some might argue the last point is not, and that the current read-as-fast-as-possible behavior is desirable. Perhaps this could be made configurable on a per-Session basis if folks want the old behavior, but I personally much prefer this approach, where the sender is blocked if we can't handle their flow fast enough.

Thoughts? Opinions?

--
Caleb Epstein
caleb dot epstein at gmail dot com