Re: [Quickfix-developers] ThreadedSocketConnection: fix for memory growth under heavy load
From: Oren M. <or...@qu...> - 2005-10-19 15:11:16
Well, the reason for the second thread is that during testing the socket would appear to get overloaded and the operating system would break the connection. This would happen during any extended period where the traffic from the sender exceeded what the receiver could process. The whole thing would then pretty much collapse, because resend requests would ensure equilibrium was never reached.

I don't know why the sender wasn't being blocked. It seemed to me that data not picked up by recv was being placed in a buffer, which would at some point get overloaded and cause the socket to break. Can you verify that during extended periods of load this implementation remains stable and will simply throttle the sender? We would need to verify this on both Unix and Windows, I think.

--oren

----- Original Message -----
From: Caleb Epstein
To: qui...@li...
Sent: Tuesday, October 18, 2005 5:23 PM
Subject: [Quickfix-developers] ThreadedSocketConnection: fix for memory growth under heavy load

I have spent the past couple of days torture-testing a QuickFIX C++ application with some simple test harnesses that bombard it with messages. In the process of this testing, I noticed that my application was using a ridiculous amount of memory, many times larger than would be expected from the size of the test data. So I ran everything through valgrind (not much help; no leaks detected), and eventually came upon Google's excellent "tcmalloc" library (see http://goog-perftools.sourceforge.net/), which can be used to dump statistics about heap usage. It turned out that 95% of the memory was taken up by the allocations done by ThreadedSocketConnection::read!

The overall design of the ThreadedSocketConnection class is relatively simple, and on paper looks like it should work correctly.
There are two threads for each Session: one reads from the socket into a new'd char buffer and pushes the buffer + size onto a queue; the other pops elements off of this queue, adds the data to the Parser's buffer, calls the Parser class to process its buffer, and finally delete[]'s the buffer it got from the queue. The code appears to be correct, and doesn't leak memory in the purest sense, but the real-world behavior when incoming traffic is high is not good.

When a counterparty sends data faster than your application can process it, the "read" thread in the ThreadedSocketConnection will do a fine job keeping up with the socket I/O, but the queue in between these two threads will end up growing very large, containing all of the data that has been read from the socket but which has yet to be processed by the Parser and Session. In a perfect world, this memory usage would be identical to the size of the messages that have been recv'd, but memory allocators are inefficient, and what ends up happening is that the application uses perhaps 10-20x more memory than this! My application, which should have used perhaps 125 MB of memory (most of it due to an mmap'ed transaction log file), had ballooned up to 775 MB! As a comparison, the incoming FIX log file is only 38 MB in size.

Thankfully, there is a simple fix to this problem, and it actually simplifies the code and eliminates one thread per session.
Here's my version of ThreadedSocketConnection::read:

---8<---
bool ThreadedSocketConnection::read()
{
  QF_STACK_PUSH(ThreadedSocketConnection::read)

  int bytes = 0;
  char buffer[BUFSIZ];

  try
  {
    if ( !socket_isValid( m_socket ) ) return false;

    socket_fionread( m_socket, bytes );
    // Clamp to the buffer size: fionread may report more than BUFSIZ
    // bytes pending, and recv must not be asked to overrun the buffer.
    if ( !bytes || bytes > (int)sizeof(buffer) )
      bytes = sizeof(buffer);

    int result = recv( m_socket, buffer, bytes, 0 );
    if ( result <= 0 )
      throw SocketRecvFailed();

    m_parser.addToStream( buffer, result );
    processStream();
    return true;
  }
  catch ( SocketRecvFailed& e )
  {
    if ( m_pSession )
      m_pSession->getLog()->onEvent( e.what() );
    return false;
  }

  QF_STACK_POP
}
---8<---

Compared to the current QF version, this function:

  * Uses a single fixed buffer for all reads, keeping memory usage more or less constant regardless of incoming data rates
  * Doesn't spawn an extra thread for parsing/application callbacks
  * Causes the counterparty to block when the QuickFIX application can't keep up with incoming data

I think all of these are benefits, though some might argue the last point is not, and that the current read-as-fast-as-possible behavior is desirable. Perhaps this could be made configurable on a per-Session basis if folks want the old behavior, but I personally much prefer this approach, where the sender is blocked if we can't handle their flow fast enough.

Thoughts? Opinions?

--
Caleb Epstein
caleb dot epstein at gmail dot com