[Quickfix-developers] ThreadedSocketConnection: fix for memory growth under heavy load
From: Caleb E. <cal...@gm...> - 2005-10-18 22:23:45
I have spent the past couple of days torture-testing a QuickFIX C++ application with some simple test harnesses that bombard it with messages. In the process of this testing, I noticed that my application was using a ridiculous amount of memory, many times larger than would be expected from the size of the test data. So I ran everything through valgrind (not much help - no leaks detected), and eventually came upon Google's excellent "tcmalloc" library (see http://goog-perftools.sourceforge.net/), which can be used to dump statistics about heap usage. It turned out that 95% of the memory was taken up by the allocations done in ThreadedSocketConnection::read!

The overall design of the ThreadedSocketConnection class is relatively simple, and on paper looks like it should work correctly. There are two threads for each Session: one reads from a socket into a new'd char buffer and then pushes the buffer + size onto a queue; the other pops elements off of this queue, adds the data to the Parser's buffer, calls the Parser class to process its buffer, and finally delete[]'s the buffer it got from the queue.

The code appears to be correct, and doesn't leak memory in the purest sense, but the real-world behavior when incoming traffic is high is not good. When a counterparty sends data faster than your application can process it, the "read" thread in the ThreadedSocketConnection will do a fine job keeping up with the socket I/O, but the queue in between these two threads will grow very large, containing all of the data that has been read from the socket but has yet to be processed by the Parser and Session. In a perfect world, this memory usage would be identical to the size of the messages that have been recv'd, but memory allocators are inefficient, and what actually happens is that the application ends up using perhaps 10-20x more memory than this!
My application, which should have used perhaps 125 MB of memory (most of it due to an mmap'ed transaction log file), had ballooned up to 775 MB! As a comparison, the incoming FIX log file is only 38 MB in size.

Thankfully, there is a simple fix to this problem, and it actually simplifies the code and eliminates one thread per session. Here's my version of ThreadedSocketConnection::read:

---8<---
bool ThreadedSocketConnection::read()
{
  QF_STACK_PUSH(ThreadedSocketConnection::read)

  int bytes = 0;
  char buffer[BUFSIZ];

  try
  {
    if ( !socket_isValid( m_socket ) ) return false;
    socket_fionread( m_socket, bytes );
    // Clamp to the buffer size: FIONREAD may report zero, or more
    // pending bytes than our fixed buffer can hold.
    if ( !bytes || bytes > (int)sizeof(buffer) )
      bytes = sizeof(buffer);
    int result = recv( m_socket, buffer, bytes, 0 );
    if ( result <= 0 )
      throw SocketRecvFailed();
    m_parser.addToStream( buffer, result );
    processStream();
    return true;
  }
  catch ( SocketRecvFailed& e )
  {
    if ( m_pSession )
      m_pSession->getLog()->onEvent( e.what() );
    return false;
  }

  QF_STACK_POP
}
---8<---

Compared to the current QF version, this function:

- Uses a single fixed buffer for all reads, keeping memory usage more or less constant regardless of incoming data rates
- Doesn't spawn an extra thread for parsing/application callbacks
- Causes the counterparty to block when the QuickFIX application can't keep up with incoming data

I think all of these are benefits, though some might argue the last point is not, and that the current read-as-fast-as-possible behavior is desirable. Perhaps this could be made configurable on a per-Session basis if folks want the old behavior, but I personally much prefer this approach, where the sender is blocked if we can't handle their flow fast enough.

Thoughts? Opinions?

--
Caleb Epstein
caleb dot epstein at gmail dot com