Re: [Quickfix-developers] Stability problems
From: <OM...@th...> - 2002-10-10 18:21:38
Loic,

We have never run quickfix on a multi-processor Solaris machine, so in that respect you are doing something that we have never tested. Our clients tend to use linux or windows, so those versions may tend to be a little more robust.

In general I would recommend using the normal SocketInitiator over the Threaded one under Solaris. We have had trouble getting consistently stable performance from the threaded implementations under Solaris, and since we have not as of yet had the demand, little work has been done to significantly improve it.

Much work has been done in the upcoming release to address threading and crash recovery situations in general, as well as overall performance. This will probably help to improve the Solaris version as well. When we were implementing a market data server under 1.2.1 we ran into many of these issues and I believe we have solved most of them.

We plan on addressing differences between builds by setting up an online build page with a suite of automated tests. This is intended to go online early next week. We have a suite of unit tests and functional tests that we run with each build. We also want to design a set of performance tests to run with each build.

--oren

Loic Guezennec <loi...@sw...>
Sent by: qui...@li...urceforge.net
10/10/2002 02:46 PM

To: qui...@li...
cc:
Subject: [Quickfix-developers] Stability problems

I have implemented a buy-side with Quickfix which I hope to use in prod soon. The platform is Solaris 8 sparc multi-processor. The compiler is gcc 2.95.3. The application runs well when heartbeating and under light load.
I have severe instability problems when I apply a load test of 50 orders in one go. This happens systematically. I believe I am experiencing the problems described by Gene Gorokhovsky with the threading issues. The results so far are segmentation faults, bus errors and also perhaps a deadlock... The latter is hard for me to troubleshoot as I am not an expert on threads.

An alarming point for me is the following: when the engine crashes, I can lose messages. This also seems to go along with the message from Constantin about crash scenarios.

Now my questions are:
- Is quickfix known to be unstable on some platforms (e.g. Sun)?
- Is there a preferred platform / architecture to use it on (OS / single or multi-proc / threaded or non-threaded...)?

I have tried both threaded and non-threaded socket initiators with no luck. Any feedback on what to do would be great.

An example from attaching gdb to the process:

Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/librt.so.1...done.
Reading symbols from /usr/local/lib/libxml2.so.2...done.
Reading symbols from /usr/lib/libz.so...done.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/local/lib/libstdc++.so.2.10.0...done.
Reading symbols from /usr/lib/libm.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libaio.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1...done.
Reading symbols from /usr/lib/libthread.so.1...done.
sol-thread active.
Symbols already loaded for /usr/lib/libpthread.so.1
Symbols already loaded for /usr/lib/librt.so.1
Symbols already loaded for /usr/local/lib/libxml2.so.2
Symbols already loaded for /usr/lib/libz.so
Symbols already loaded for /usr/lib/libsocket.so.1
Symbols already loaded for /usr/lib/libnsl.so.1
Symbols already loaded for /usr/local/lib/libstdc++.so.2.10.0
Symbols already loaded for /usr/lib/libm.so.1
Symbols already loaded for /usr/lib/libc.so.1
Symbols already loaded for /usr/lib/libaio.so.1
Symbols already loaded for /usr/lib/libdl.so.1
Symbols already loaded for /usr/lib/libmp.so.2
Symbols already loaded for /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1
Symbols already loaded for /usr/lib/libthread.so.1
0xff0194a0 in door_restart () from /usr/lib/libc.so.1
(gdb) continue
Continuing.
[New Thread 4 (LWP 5)]
[Switching to Thread 4 (LWP 5)]

Program received signal SIGSEGV, Segmentation fault.
0x142130 in __default_alloc_template<false, 0>::allocate (__n=32)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/stl_alloc.h:422
422     *__my_free_list = __result -> _M_free_list_link;
(gdb) bt
#0  0x142130 in __default_alloc_template<false, 0>::allocate (__n=32)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/stl_alloc.h:422
#1  0xc5148 in __nw__Q2t12basic_string3ZcZt18string_char_traits1ZcZt24__default_alloc_template2b0i0_3RepUiUi (s=16, extra=16)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.cc:33
#2  0xc5488 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::Rep::create (extra=16)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.cc:60
#3  0xc858c in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xfeeff390, pos=0, n1=0, s=0x24c2e0 "164\0306421", n2=3)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.cc:164
#4  0x165d34 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::assign (this=0xfeeff390, s=0x24c2e0 "164\0306421", n=3)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.h:218
#5  0x196d98 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::basic_string (this=0xfeeff390, s=0x24c2e0 "164\0306421", n=3)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.h:176
#6  0x18cf30 in stringbuf::str (this=0xfeeff29c)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/sstream:77
#7  0x1a0ca0 in stringstream::str (this=0xfeeff290)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/sstream:330
#8  0x1a3b98 in FIX::CheckSumConvertor::convert (value=164) at FieldConvertors.h:115
#9  0x1a1810 in FIX::CheckSumField::CheckSumField (this=0xfeeff4f0, field=10, data=164) at Field.h:328
#10 0x19e960 in FIX::CheckSum::CheckSum (this=0xfeeff4f0, value=164) at Fields.h:68
#11 0x192ff8 in FIX::Message::checkSum (this=0xfeeffac8) at Message.h:292
#12 0x188a40 in FIX::Message::getString (this=0xfeeffac8) at Message.h:147
#13 0xae760 in FIX::Session::sendRaw (this=0x251b28, message=@0xfeeffac8, msgSeqNum=0) at Session.cpp:323
#14 0xae498 in FIX::Session::send (this=0x251b28, message=@0xfeeffac8) at Session.cpp:293
#15 0xb3950 in FIX::Session::sendToTarget (message=@0xfeeffac8) at Session.cpp:849
#16 0x91278 in Application::enterOrderSingle (this=0xffbefc40, mapOrd={ _M_t = {<_Rb_tree_base<pair<const basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > >,allocator<basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > > >> = {<_Rb_tree_alloc_base<pair<const basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > >,allocator<basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > >,>> = {_M_header = 0x26d250}, <No data fields>}, _M_node_count = 31, _M_key_compare = {<binary_function<basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,bool>> = {<No data fields>}}})
    at Application.cpp:1064
#17 0x8f3d4 in Application::onRun (this=0xffbefc40) at Application.cpp:827
#18 0xb6c5c in FIX::Initiator::startThread (p=0xffbefb58) at Initiator.cpp:151
(gdb)

Loic Guezennec
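A note on frames #8 to #11 of the backtrace, for readers less familiar with FIX: at the moment of the crash the engine was building the CheckSum field (tag 10, value 164 here) for an outgoing message. In the FIX protocol that value is the sum of the message bytes preceding the checksum field, modulo 256, written as exactly three digits. The following is a minimal sketch of that step only, not QuickFIX's own code; the helper names `fixChecksum` and `formatChecksum` are hypothetical, and it assumes a compiler with `std::snprintf` (C++11 or later, so not the gcc 2.95.3 used above).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of QuickFIX): sum every byte of the
// message text that precedes the checksum field, modulo 256.
unsigned fixChecksum(const std::string& bytes) {
    unsigned sum = 0;
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        // Cast through unsigned char so bytes >= 0x80 are not negative.
        sum += static_cast<unsigned char>(bytes[i]);
    }
    return sum % 256;
}

// Hypothetical helper (not part of QuickFIX): render the checksum as
// exactly three zero-padded decimal digits, as tag 10 requires.
std::string formatChecksum(unsigned value) {
    assert(value < 256);  // a FIX checksum always fits in one byte
    char buf[4];          // three digits plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%03u", value);
    return std::string(buf, 3);
}
```

With these helpers, `formatChecksum(164)` yields the three characters `164` that appear as the string argument in frames #3 to #5. The sketch only illustrates what was being computed; it says nothing about why the allocation underneath it faulted.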