Re: [Quickfix-developers] ThreadedSocketConnection: fix for memory growth under heavy load
From: Oren M. <or...@qu...> - 2005-10-19 15:11:16
Well, the reason for the second thread is that during testing the socket would appear to get overloaded and the operating system would break the connection. This would happen during any extended period where the traffic from the sender exceeded what the receiver could process. The whole thing would then pretty much collapse, because resend requests would ensure equilibrium was never reached.

I don't know why the sender wasn't being blocked. It seemed to me that data not picked up by recv was being placed in a buffer, which would at some point get overloaded and cause the socket to break. Can you verify that during extended periods of load this implementation remains stable and will simply throttle the sender? We would need to verify this on both Unix and Windows, I think.

--oren

----- Original Message -----
From: Caleb Epstein
To: qui...@li...
Sent: Tuesday, October 18, 2005 5:23 PM
Subject: [Quickfix-developers] ThreadedSocketConnection: fix for memory growth under heavy load

I have spent the past couple of days torture-testing a QuickFIX C++ application with some simple test harnesses that bombard it with messages. In the process of this testing, I noticed that my application was using a ridiculous amount of memory, many times larger than would be expected from the size of the test data. So I ran everything through valgrind (not much help; no leaks detected), and eventually came upon Google's excellent "tcmalloc" library (see http://goog-perftools.sourceforge.net/), which can be used to dump statistics about heap usage. It turned out that 95% of the memory was taken up by the allocations done by ThreadedSocketConnection::read!

The overall design of the ThreadedSocketConnection class is relatively simple, and on paper looks like it should work correctly.
There are two threads for each Session: one reads from the socket into a new'd char buffer and pushes the buffer + size onto a queue; the other pops elements off of this queue, adds the data to the Parser's buffer, calls the Parser class to process its buffer, and finally delete[]'s the buffer it got from the queue. The code appears to be correct, and doesn't leak memory in the purest sense, but the real-world behavior when incoming traffic is high is not good.

When a counterparty sends data faster than your application can process it, the "read" thread in the ThreadedSocketConnection will do a fine job keeping up with the socket I/O, but the queue in between these two threads will end up growing very large, containing all of the data that has been read from the socket but which has yet to be processed by the Parser and Session. In a perfect world, this memory usage would be identical to the size of the messages that have been recv'd, but memory allocators are inefficient, and what ends up happening is that the application uses perhaps 10-20x more memory than this! My application, which should have used perhaps 125 MB of memory (most of it due to an mmap'ed transaction log file), had ballooned up to 775 MB! As a comparison, the incoming FIX log file is only 38 MB in size.

Thankfully, there is a simple fix to this problem, and it actually simplifies the code and eliminates one thread per session.
Here's my version of ThreadedSocketConnection::read:

---8<---
bool ThreadedSocketConnection::read()
{
  QF_STACK_PUSH(ThreadedSocketConnection::read)

  int bytes = 0;
  char buffer[BUFSIZ];

  try
  {
    if ( !socket_isValid( m_socket ) ) return false;

    socket_fionread( m_socket, bytes );
    // Clamp to the buffer size: fionread may report more than BUFSIZ
    // bytes pending, and recv must not be asked to overrun the buffer.
    if ( !bytes || bytes > (int)sizeof(buffer) )
      bytes = sizeof(buffer);

    int result = recv( m_socket, buffer, bytes, 0 );
    if ( result <= 0 )
      throw SocketRecvFailed();

    m_parser.addToStream( buffer, result );
    processStream();
    return true;
  }
  catch ( SocketRecvFailed& e )
  {
    if ( m_pSession )
      m_pSession->getLog()->onEvent( e.what() );
    return false;
  }

  QF_STACK_POP
}
---8<---

Compared to the current QF version, this function:

  * Uses a single fixed buffer for all reads, keeping memory usage more or less constant regardless of incoming data rates
  * Doesn't spawn an extra thread for parsing/application callbacks
  * Causes the counterparty to block when the QuickFIX application can't keep up with incoming data

I think all of these are benefits, though some might argue the last point is not, and that the current read-as-fast-as-possible behavior is desirable. Perhaps this could be made configurable on a per-Session basis if folks want the old behavior, but I personally much prefer this approach, where the sender is blocked if we can't handle their flow fast enough.

Thoughts? Opinions?

--
Caleb Epstein
caleb dot epstein at gmail dot com