Thread: Re: [Quickfix-developers] Stability problems
From: <OM...@th...> - 2002-10-10 18:21:38
Loic,

We have never run QuickFIX on a multi-processor Solaris machine, so in that respect you are doing something we have never tested. Our clients tend to use Linux or Windows, so those versions tend to be a little more robust. In general I would recommend using the normal SocketInitiator over the threaded one under Solaris. We have had trouble getting consistently stable performance from the threaded implementations under Solaris, and since we have not yet had the demand, little work has been done to significantly improve it.

Much work has been done in the upcoming release to address threading and crash recovery situations in general, as well as overall performance. This will probably help improve the Solaris version as well. When we were implementing a market data server under 1.2.1 we ran into many of these issues, and I believe we have solved most of them.

We plan on addressing differences between builds by setting up an online build page with a suite of automated tests. This is intended to go online early next week. We have a suite of unit tests and functional tests that we run with each build. We also want to design a set of performance tests to run with each build.

--oren

Loic Guezennec <loi...@sw...>
Sent by: qui...@li...urceforge.net
10/10/2002 02:46 PM
To: qui...@li...
cc:
Subject: [Quickfix-developers] Stability problems

I have implemented a buy-side with Quickfix which I hope to use in prod soon.

The platform is Solaris 8 SPARC, multi-processor. The compiler is gcc 2.95.3.

The application runs well when heartbeating and under light load.
I have severe instability problems when I apply a load test of 50 orders in one go. This happens systematically.

I believe I am experiencing the problems described by Gene Gorokhovsky with the threading issues. The results so far are segmentation faults, bus errors and also perhaps a deadlock... The latter is hard for me to troubleshoot as I am not an expert on threads.

An alarming point for me is the following: at times when the engine crashes, I can lose messages. This also seems to go along with the message from Constantin about crash scenarios.

Now my questions are:

- Is Quickfix known to be unstable on some platforms (e.g. Sun)?
- Is there a preferred platform / architecture to use it on (OS / single or multi-proc / threaded or non-threaded...)?
  I have tried both threaded and non-threaded socket initiators with no luck.

Any feedback on what to do would be great.

An example from attaching gdb to the process:

Reading symbols from /usr/lib/libpthread.so.1...done.
Reading symbols from /usr/lib/librt.so.1...done.
Reading symbols from /usr/local/lib/libxml2.so.2...done.
Reading symbols from /usr/lib/libz.so...done.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/local/lib/libstdc++.so.2.10.0...done.
Reading symbols from /usr/lib/libm.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libaio.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1...done.
Reading symbols from /usr/lib/libthread.so.1...done.
sol-thread active.
Symbols already loaded for /usr/lib/libpthread.so.1
Symbols already loaded for /usr/lib/librt.so.1
Symbols already loaded for /usr/local/lib/libxml2.so.2
Symbols already loaded for /usr/lib/libz.so
Symbols already loaded for /usr/lib/libsocket.so.1
Symbols already loaded for /usr/lib/libnsl.so.1
Symbols already loaded for /usr/local/lib/libstdc++.so.2.10.0
Symbols already loaded for /usr/lib/libm.so.1
Symbols already loaded for /usr/lib/libc.so.1
Symbols already loaded for /usr/lib/libaio.so.1
Symbols already loaded for /usr/lib/libdl.so.1
Symbols already loaded for /usr/lib/libmp.so.2
Symbols already loaded for /usr/platform/SUNW,Ultra-80/lib/libc_psr.so.1
Symbols already loaded for /usr/lib/libthread.so.1
0xff0194a0 in door_restart () from /usr/lib/libc.so.1
(gdb) continue
Continuing.
[New Thread 4 (LWP 5)]
[Switching to Thread 4 (LWP 5)]

Program received signal SIGSEGV, Segmentation fault.
0x142130 in __default_alloc_template<false, 0>::allocate (__n=32)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/stl_alloc.h:422
422         *__my_free_list = __result -> _M_free_list_link;
(gdb) bt
#0  0x142130 in __default_alloc_template<false, 0>::allocate (__n=32)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/stl_alloc.h:422
#1  0xc5148 in __nw__Q2t12basic_string3ZcZt18string_char_traits1ZcZt24__default_alloc_template2b0i0_3RepUiUi (s=16, extra=16)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.cc:33
#2  0xc5488 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::Rep::create (extra=16)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.cc:60
#3  0xc858c in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::replace (this=0xfeeff390, pos=0, n1=0, s=0x24c2e0 "164\0306421", n2=3)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.cc:164
#4  0x165d34 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::assign (this=0xfeeff390, s=0x24c2e0 "164\0306421", n=3)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.h:218
#5  0x196d98 in basic_string<char, string_char_traits<char>, __default_alloc_template<false, 0> >::basic_string (this=0xfeeff390, s=0x24c2e0 "164\0306421", n=3)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/std/bastring.h:176
#6  0x18cf30 in stringbuf::str (this=0xfeeff29c)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/sstream:77
#7  0x1a0ca0 in stringstream::str (this=0xfeeff290)
    at /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/2.95.3/../../../../include/g++-3/sstream:330
#8  0x1a3b98 in FIX::CheckSumConvertor::convert (value=164) at FieldConvertors.h:115
#9  0x1a1810 in FIX::CheckSumField::CheckSumField (this=0xfeeff4f0, field=10, data=164) at Field.h:328
#10 0x19e960 in FIX::CheckSum::CheckSum (this=0xfeeff4f0, value=164) at Fields.h:68
#11 0x192ff8 in FIX::Message::checkSum (this=0xfeeffac8) at Message.h:292
#12 0x188a40 in FIX::Message::getString (this=0xfeeffac8) at Message.h:147
#13 0xae760 in FIX::Session::sendRaw (this=0x251b28, message=@0xfeeffac8, msgSeqNum=0) at Session.cpp:323
#14 0xae498 in FIX::Session::send (this=0x251b28, message=@0xfeeffac8) at Session.cpp:293
#15 0xb3950 in FIX::Session::sendToTarget (message=@0xfeeffac8) at Session.cpp:849
#16 0x91278 in Application::enterOrderSingle (this=0xffbefc40, mapOrd={ _M_t = {<_Rb_tree_base<pair<const basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > >,allocator<basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > > >> = {<_Rb_tree_alloc_base<pair<const basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > >,allocator<basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> > >,>> = {_M_header = 0x26d250}, <No data fields>}, _M_node_count = 31, _M_key_compare = {<binary_function<basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,basic_string<char,string_char_traits<char>, __default_alloc_template<false,0> >,bool>> = {<No data fields>}, <No data fields>}}})
    at Application.cpp:1064
#17 0x8f3d4 in Application::onRun (this=0xffbefc40) at Application.cpp:827
#18 0xb6c5c in FIX::Initiator::startThread (p=0xffbefb58) at Initiator.cpp:151
(gdb)

Loic Guezennec

_______________________________________________
Quickfix-developers mailing list
Qui...@li...
https://lists.sourceforge.net/lists/listinfo/quickfix-developers
From: <OM...@th...> - 2002-10-14 23:11:48
Gene,

I've done some tests and it appears your analysis is dead on. Thanks for the lead. I am thinking of having autotools turn on _PTHREADS by default so that everything will work right out of the box, but recommend that people either upgrade their compiler or move to STLPort if using Solaris.

--oren

Gene Gorokhovsky <mus...@ya...>
Sent by: qui...@li...urceforge.net
10/10/2002 08:20 PM
To: Loic Guezennec <loi...@sw...>, qui...@li...
cc:
Subject: Re: [Quickfix-developers] Stability problems

Although Quickfix code may be partly to blame, the stack that you have shown could also be caused by a subtle misconfiguration of gcc and the STL (part of libstdc++). Some people have reported that, despite documentation saying it matters only for Objective-C, gcc itself should be compiled with configure --enable-threads, and that the C++ compiler is affected by this setting. I cannot vouch for this though, since I have always preferred Sun's own cc, and have had significantly fewer headaches with it (for some $$).

Also, in gcc 2.95.x + STL, defining _PTHREADS turns on a more robust (and significantly slower) implementation of locking in the STL allocator (exactly where your stack shows the crash). This apparently has been fixed in 3.2, and the flag no longer has any effect on the code. Try defining this and rerunning your tests.

Also, some older implementations of the STL had a reference-counted std::string, which made strings not thread-safe even for reading. This certainly has been fixed with the gcc 3.2 release, but I am not sure about older versions.

Yet another option would be to switch to the STLPort implementation of the STL. It has had thread-safe strings from the get-go.

Gene
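Gene's point about allocator locking can be illustrated with a small sketch. The crashing frame pops the head of a free list (*__my_free_list = __result -> _M_free_list_link), and that read-then-write is only safe if no other thread touches the list in between. The code below is not the gcc 2.95.3 STL allocator; FreeListPool and hammerPool are hypothetical names, and std::mutex and std::thread assume a modern C++11 compiler rather than the toolchain discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fixed-size pool with an intrusive free list. It only mimics the
// shape of the pop in the crashing frame; it is not the SGI/gcc allocator.
class FreeListPool {
    struct Node { Node* next = nullptr; };

public:
    explicit FreeListPool(std::size_t count) : nodes_(count) {
        // Thread every node onto the free list.
        for (std::size_t i = 0; i < count; ++i) {
            nodes_[i].next = head_;
            head_ = &nodes_[i];
        }
    }

    // Pop one node. The lock makes "read head, then advance head" a single
    // step as far as other threads are concerned. Without it, two threads can
    // pop the same node and later follow a stale link.
    void* allocate() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (head_ == nullptr) return nullptr;
        Node* result = head_;
        head_ = result->next;
        return result;
    }

    // Push a node back onto the free list under the same lock.
    void deallocate(void* p) {
        std::lock_guard<std::mutex> guard(mutex_);
        Node* node = static_cast<Node*>(p);
        node->next = head_;
        head_ = node;
    }

    // Count the nodes currently on the free list.
    std::size_t freeCount() {
        std::lock_guard<std::mutex> guard(mutex_);
        std::size_t n = 0;
        for (Node* p = head_; p != nullptr; p = p->next) ++n;
        return n;
    }

private:
    std::vector<Node> nodes_;
    Node* head_ = nullptr;
    std::mutex mutex_;
};

// Run `threads` workers that each allocate and free `rounds` times, then
// report how many nodes are back on the free list.
std::size_t hammerPool(FreeListPool& pool, int threads, int rounds) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&pool, rounds] {
            for (int i = 0; i < rounds; ++i) {
                void* p = pool.allocate();
                if (p != nullptr) pool.deallocate(p);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return pool.freeCount();
}
```

With the lock in place, a pool of 64 nodes hammered by several threads should end with all 64 nodes back on the list. Removing the two lock_guard lines in allocate and deallocate turns this into the kind of data race the backtrace suggests, which is also why the locked variant costs extra time on every string allocation.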
From: Gene G. <mus...@ya...> - 2002-10-15 01:21:30
After some digging, it appears that libstdc++ snapshot 9 (libstdc++-2.90.8.tar.gz) is the last "official" gcc libstdc++ which works with 2.95.x. Judging from its documentation, strings are MT-safe in that release on many platforms including Solaris. For gcc 2.95.3 it requires a patch (http://gcc.gnu.org/libstdc++/libstdc++-2.90.8-compat-gcc-2.95.3.diff).

I am not sure defining _PTHREADS out of the box is a great idea -- string performance (and QuickFIX depends heavily on strings) does suffer tremendously. This link

http://gcc.gnu.org/ml/libstdc++/2001-05/msg00384.html

has some interesting discussion, and also results which show that the performance of a map of strings can decrease three-fold with _PTHREADS defined.

So the Solaris options could be:

1) Using gcc 3.2 (requires recompilation of all C++ libraries)
2) Staying with gcc 2.95.x and
   a) Using STLPort (4.0 and higher work with gcc 2.95.x)
   b) Upgrading libstdc++ to v3 snapshot 9 with the patch and building it with configure --enable-threads
   c) Defining _PTHREADS (least effort, performance penalty)

Gene
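For option 2c, one cheap sanity check is to confirm that the define actually reached the translation unit being compiled. The sketch below only reports which of the macro names mentioned in this thread (_PTHREADS and __STL_PTHREADS) were visible at compile time. The function name stlThreadingMode is hypothetical, and whether either macro changes allocator behaviour depends entirely on the STL headers in use.

```cpp
#include <cassert>
#include <string>

// Report which thread-related macro from this discussion was defined when this
// file was compiled. This checks visibility only; it does not prove that the
// STL in use honours the macro.
std::string stlThreadingMode() {
#if defined(__STL_PTHREADS)
    return "__STL_PTHREADS";
#elif defined(_PTHREADS)
    return "_PTHREADS";
#else
    return "none";
#endif
}
```

Printing this value at startup, once built with and once built without -D_PTHREADS on the compiler command line, would show whether the build system is passing the define through to every library that uses strings.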
From: <OM...@th...> - 2002-10-15 02:22:08
I'm wondering if we can create an autoconf script to detect this. Due to the nature of the problem, it probably can't be 100% deterministic, but I imagine we can create a test that will make catching it extremely likely. If the problem is detected, at that point we can either define __STL_PTHREADS and display a warning, or stop the compilation altogether and recommend the options you outlined.

I think it is important to try to automate this as much as possible because everything *seems* ok, but really is not at all. And a warning in the documentation, no matter how prominent, is likely to be glossed over. Loic's broker thankfully had a test that exposed this for him, but a lot of brokers don't have such tests, so I think we should. It would also be a good idea to start adding load tests to the automated test suite.

--oren
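A probe of the kind Oren describes might look like the sketch below: several threads format integers through a stringstream, the same path as FIX::CheckSumConvertor::convert in the backtrace, and check what comes back. This is an assumption about what such a test could do, not the project's actual configure check. formatValue and stressStrings are hypothetical names, and std::thread and std::atomic assume a C++11 compiler, where the 2002 toolchain would have used pthreads directly.

```cpp
#include <atomic>
#include <cassert>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Format a checksum-like value through a stringstream into a freshly
// allocated std::string, as the crashing code path did.
std::string formatValue(int value) {
    std::stringstream stream;
    stream << value;
    return stream.str();
}

// Hypothetical probe: every worker formats integers in a tight loop and parses
// them back. It returns the number of round-trip mismatches. On a broken STL
// the more likely outcome is a crash, and a clean run is evidence, not proof.
int stressStrings(int threads, int iterations) {
    std::atomic<int> mismatches(0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&mismatches, iterations, t] {
            for (int i = 0; i < iterations; ++i) {
                int value = (t * iterations + i) % 256;
                if (std::stoi(formatValue(value)) != value) ++mismatches;
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return mismatches.load();
}
```

A build script could compile and run something like this and treat a crash or a non-zero result as the signal to warn or stop. As Oren notes, the result is probabilistic, so more threads and more iterations raise the odds of catching a bad configuration at the cost of a slower configure step.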
From: Gene G. <mus...@ya...> - 2002-11-05 06:03:25
Locker(m_mutex) should be moved from Session::send to Session::sendRaw. Otherwise there is a race condition between app-level and admin-level messages if the application spawns threads, separate from the primary Session queue processing, to send app-level messages.

I actually ran into this because I have another thread per session that provides an asynchronous outgoing queue for sendToTarget.

Gene
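The idea behind Gene's suggestion can be sketched as follows: if both the app-level path and the admin-level path end up in sendRaw, taking the lock there covers every caller, whereas a lock taken only in send leaves the admin path unprotected. SketchSession below is a hypothetical stand-in, not QuickFIX's Session class, and std::mutex stands in for the Locker/m_mutex pair named above.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a FIX session; only the locking layout matters.
class SketchSession {
public:
    // App-level entry point: it takes no lock of its own and only forwards.
    void send(const std::string& body) { sendRaw(body); }

    // Admin-level messages (heartbeats and the like) reach sendRaw directly.
    void sendHeartbeat() { sendRaw("heartbeat"); }

    // Snapshot of the sequence numbers in the order they were written.
    std::vector<int> sentSeqNums() {
        std::lock_guard<std::mutex> guard(mutex_);
        return sent_;
    }

private:
    // The lock sits at the single choke point, so assigning the sequence
    // number and writing the message happen as one step for every caller.
    void sendRaw(const std::string& body) {
        std::lock_guard<std::mutex> guard(mutex_);
        int seq = nextSeqNum_++;
        wire_.push_back(std::to_string(seq) + ":" + body);
        sent_.push_back(seq);
    }

    std::mutex mutex_;
    int nextSeqNum_ = 1;
    std::vector<int> sent_;
    std::vector<std::string> wire_;
};

// Drive app-level and admin-level sends from two threads at once and check
// that sequence numbers went out as 1..N with no gap, duplicate or reordering.
bool sequenceIsContiguous(int perThread) {
    SketchSession session;
    std::thread app([&] {
        for (int i = 0; i < perThread; ++i) session.send("order");
    });
    std::thread admin([&] {
        for (int i = 0; i < perThread; ++i) session.sendHeartbeat();
    });
    app.join();
    admin.join();

    std::vector<int> seqs = session.sentSeqNums();
    if (seqs.size() != static_cast<std::size_t>(2 * perThread)) return false;
    for (std::size_t i = 0; i < seqs.size(); ++i) {
        if (seqs[i] != static_cast<int>(i) + 1) return false;
    }
    return true;
}
```

Moving the lock_guard from sendRaw up into send would leave sendHeartbeat unguarded in this sketch, which is the shape of the race described above: two threads could then take or write sequence numbers out of order.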