Thread: Re: [Quickfix-developers] Usage of non-threadsafe C functions in QF
From: <OM...@th...> - 2003-04-03 09:26:13
It appears that under Windows they are not thread safe. From MSDN:

"gmtime, mktime, and localtime all use a single statically allocated tm structure for the conversion. Each call to one of these routines destroys the result of the previous call."

In fact, if I am reading that correctly, all three methods share the same static structure.

I'd prefer not to lock (long term) because this is a frequently called method, and locking would add some unwanted contention. What might have to be done in the future is to start using the Win32 API equivalents instead.

This looks like an important fix, so it should probably go out with 1.4.1, initially using locking for Windows.

--oren

Joerg Thoennes <Joe...@ma...>
Sent by: qui...@li...urceforge.net
04/03/2003 02:47 AM
Please respond to Joerg.Thoennes

To: developers QuickFIX <qui...@li...urceforge.net>
cc:
Subject: [Quickfix-developers] Usage of non-threadsafe C functions in QF

Hi,

inspired by Barry Bishop's core dumps on a fast multiprocessor Solaris machine, I just checked whether QF uses functions which have re-entrant equivalents on Solaris or Linux (suffixed by _r).

Here is what I found:

    ./include/FieldTypes.h: *static_cast < tm* > ( this ) = *gmtime( &sec );
    ./include/FieldTypes.h: *static_cast < tm* > ( this ) = *localtime( &time );
    ./include/FieldTypes.h: *static_cast < tm* > ( this ) = *gmtime( &t );
    ./src/C++/FieldTypes.h: *static_cast < tm* > ( this ) = *gmtime( &sec );
    ./src/C++/FieldTypes.h: *static_cast < tm* > ( this ) = *localtime( &time );
    ./src/C++/FieldTypes.h: *static_cast < tm* > ( this ) = *gmtime( &t );
    ./src/C++/Utility.cpp: buf = gethostbyname( name );
    ./src/C++/Utility.cpp: return inet_ntoa( **paddr );

The last function probably does not count, since the man page states that the returned buffer is in a thread-local data space (and the _r variant is not mentioned in the man pages).

BUT: The other functions can really account for some core dumps. Actually, we used all these functions in our software until some release consistently core dumped on our fast multiprocessor production machine. As a quick fix we used the pbind command to assign only one processor to the process. Analysis of the core file revealed the culprits listed above. After replacing the usage of gmtime(), localtime() and gethostbyname() by their _r variants, core dumping stopped and all has gone fine since then.

Oren, I have no way to check whether the functions on Windows are thread safe by default. But I would suggest putting these functions into Utility.cpp as well and perhaps protecting the Windows variants by a mutex if they are unsafe. If you need an example for the UNIX implementation, I could extract one out of our code.

Cheers, Jörg

--
Joerg Thoennes
http://macd.com
Tel.: +49 (0)241 44597-24
Macdonald Associates GmbH
Fax : +49 (0)241 44597-10
Lothringer Str. 52, D-52070 Aachen

_______________________________________________
Quickfix-developers mailing list
Qui...@li...
https://lists.sourceforge.net/lists/listinfo/quickfix-developers
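The interim locking approach discussed in this message could look roughly like the following. This is a minimal sketch and not QuickFIX code: the names timeMutex, lockedGmtime and lockedLocaltime are hypothetical, and it uses std::mutex from a modern C++ standard library rather than whatever lock class QF 1.4.x provided. One mutex covers both functions because, per the MSDN quote, they may share the same static structure.

```cpp
#include <cassert>
#include <ctime>
#include <mutex>

// Hypothetical helpers, not part of QuickFIX.
// gmtime() and localtime() may share one static buffer, so a single
// mutex guards both of them.
inline std::mutex& timeMutex() {
    static std::mutex m;
    return m;
}

// Returns a private copy of the UTC broken-down time for t.
// The copy is made while the lock is held, so another thread cannot
// overwrite the static buffer in between.
inline std::tm lockedGmtime(std::time_t t) {
    std::lock_guard<std::mutex> guard(timeMutex());
    const std::tm* shared = std::gmtime(&t);
    return shared ? *shared : std::tm();
}

// Same idea for local time.
inline std::tm lockedLocaltime(std::time_t t) {
    std::lock_guard<std::mutex> guard(timeMutex());
    const std::tm* shared = std::localtime(&t);
    return shared ? *shared : std::tm();
}
```

The lock only helps if every caller in the process goes through these wrappers; any code that still calls gmtime() or localtime() directly can overwrite the shared buffer. That is one reason the _r variants (or the Win32 equivalents) are the better long-term choice.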
From: Joerg T. <Joe...@ma...> - 2003-04-03 10:50:23
OM...@th... wrote:
> It appears under windows they are not thread safe.
> [....]
> This looks like an important fix so it should probably go out with 1.4.1,
> initially using locking for windows.

May I provide some sample code from our software.

For gethostbyname_r() there are differences between Linux and Solaris:

    /* Get the IP address of the localhost.
     * Note: This thread-safe version needs some extra data structures.
     */
    {
        struct hostent host;
        struct hostent *host_ptr = NULL;
        char gethostname_buffer[ 1024 ];
        int error;

    #ifdef SYSTEM_linux
        /* Linux: the result pointer is returned through the fifth argument. */
        gethostbyname_r( "localhost", &host, gethostname_buffer, sizeof( gethostname_buffer ), &host_ptr, &error );
    #endif
    #ifdef SYSTEM_sunos
        /* Solaris: the result pointer is the return value. */
        host_ptr = gethostbyname_r( "localhost", &host, gethostname_buffer, sizeof( gethostname_buffer ), &error );
    #endif

        if ( NULL == host_ptr ) {
            logError( "openSocketQueue: Could not resolve 'localhost'" );
            return -1;
        }

        memcpy( &sin.sin_addr, host_ptr->h_addr_list[0], sizeof( sin.sin_addr ) );
    }

localtime() and gmtime() are simple, since you only have to provide an extra buffer instead of using the returned value. Note that the man pages may be missing on Linux. But you could use http://docs.sun.com to look up the Solaris man pages, which are equivalent here:

    SYNOPSIS
        #include <time.h>
        [...deleted...]
        struct tm *localtime_r(const time_t *clock, struct tm *res);
        struct tm *gmtime_r(const time_t *clock, struct tm *res);

    DESCRIPTION
        [...]
        The localtime_r() and gmtime_r() functions have the same
        functionality as localtime() and gmtime() respectively, except
        that the caller must supply a buffer res to store the result.
        [...]

    ERRORS
        The ctime_r() and asctime_r() functions will fail if:
        ERANGE  The length of the buffer supplied by the caller is not
                large enough to store the result.

Cheers, Jörg
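To illustrate the "extra buffer" usage described for gmtime_r(), here is a small sketch for POSIX systems. The function name utcTimestamp is hypothetical and not taken from QuickFIX or from the software mentioned above; the YYYYMMDD-HH:MM:SS layout is the FIX UTC timestamp format, chosen here only as a familiar example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <time.h>

// Hypothetical helper, not QuickFIX code: formats a time_t as
// YYYYMMDD-HH:MM:SS in UTC using the re-entrant gmtime_r().
inline std::string utcTimestamp(time_t t) {
    struct tm result;                       // caller-owned buffer, one per call
    if (gmtime_r(&t, &result) == NULL) {
        return std::string();               // conversion failed
    }
    char buf[32];
    const std::size_t n = strftime(buf, sizeof buf, "%Y%m%d-%H:%M:%S", &result);
    return std::string(buf, n);
}
```

Because each call writes into its own stack-allocated struct tm, two threads calling utcTimestamp() at the same time never touch shared storage, so no lock is needed.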
From: Gene G. <mus...@ya...> - 2003-04-03 18:26:31
Attachments:
fixpatch0403.zip
1) Acceptor leaks the thread in which onRun is running. The code uses the wrong function (the one returning a boolean, not a thread id) to spawn the onRun thread, and then uses the resulting "1" instead of the correct thread id to join. The consequence is that acceptor->stop (called after onRun) continues running after acceptor.start returns and the acceptor instance gets deleted, which often results in crashes in my server.

2) After fixing the above, a race condition in ThreadedSocketAcceptor becomes apparent. The onStop thread iterates over the collection of open connection threads, trying to join them. This loop is protected by a mutex. However, the connection threads themselves attempt to erase themselves from the same collection, and deadlock on the same mutex. The fix is to treat shutdown as a special case: when an m_stop flag is set, skip the per-thread cleanup. m_stop has to be protected by a separate mutex. My patch uses a class MtVar (MtVar.h, tested internally on all platforms) which makes it possible to change only the declaration

    bool m_stop

to

    MtVar<bool> m_stop

and have the exact same semantics elsewhere, only in a thread-safe way.

Four files are attached (in a zip): patches for Acceptor.cpp, ThreadedSocketAcceptor.cpp and .h, and MtVar.h.

Gene
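The attached MtVar.h is not reproduced in this thread, so the following is only a guess at what such a wrapper might look like: a value guarded by its own mutex, with an assignment operator and an implicit conversion so that code such as `m_stop = true;` and `if (m_stop)` compiles unchanged. Everything here, apart from the name MtVar and the usage `MtVar<bool> m_stop`, is an assumption, and it is written against std::mutex rather than the platform locks available in 2003.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical reconstruction; the real MtVar.h from the patch is not shown.
template <typename T>
class MtVar {
public:
    MtVar() : m_value() {}
    explicit MtVar(const T& value) : m_value(value) {}

    // Copying would read the other object's value without a clear
    // locking order, so it is disabled in this sketch.
    MtVar(const MtVar&) = delete;
    MtVar& operator=(const MtVar&) = delete;

    // Locked write: lets `m_stop = true;` keep working as before.
    MtVar& operator=(const T& value) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_value = value;
        return *this;
    }

    // Locked read: lets `if (m_stop)` keep working as before.
    operator T() const {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_value;
    }

private:
    mutable std::mutex m_mutex;  // separate from the mutex guarding the thread collection
    T m_value;
};
```

A wrapper like this makes each individual read or write safe, but a read followed by a write is still two separate locked operations. For a simple stop flag that is sufficient; in current C++ a std::atomic<bool> would serve the same purpose.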
From: Gene G. <mus...@ya...> - 2003-04-04 19:29:56
Attachments:
Session_cpp.patch
I have a FIX::Session that has messages sent to it from multiple threads. I have discovered (the hard way) that under certain conditions Session::send crashes. It happens when Session::disconnect is called simultaneously from another thread, for example when the FIX session counterparty closes the socket. Session::disconnect deletes m_Responder; if that happens after Session::send has checked that m_Responder is present, but before it has finished using it, a crash ensues.

The fix is to guard Session::disconnect with the same mutex that guards Session::sendRaw. I have attached a patch that does just that.

Gene
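The shape of that fix can be sketched as follows. These are hypothetical stand-in classes, not the QuickFIX Session or Responder and not the contents of Session_cpp.patch; the point is only that the presence check, the use of the responder, and its destruction all happen under one mutex.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the object that writes to the socket.
struct Responder {
    std::string sent;
    bool send(const std::string& msg) { sent += msg; return true; }
};

// Hypothetical stand-in for the session; not the QuickFIX class.
class SessionSketch {
public:
    explicit SessionSketch(std::unique_ptr<Responder> r)
        : m_responder(std::move(r)) {}

    // The check and the use of m_responder form one critical section.
    bool send(const std::string& msg) {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (!m_responder) return false;     // already disconnected
        return m_responder->send(msg);
    }

    // Takes the same mutex, so the responder cannot be destroyed
    // while send() is in the middle of using it.
    void disconnect() {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_responder.reset();
    }

private:
    std::mutex m_mutex;
    std::unique_ptr<Responder> m_responder;
};
```

With this arrangement a send() that races with disconnect() either completes before the responder is released or sees that it is gone and returns false.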
From: Vamsi K. <Vam...@ib...> - 2003-04-07 14:09:29
Hi,

I was trying to poll an external source (pre-composed FIX messages residing either in a database or MQSeries, or some proprietary command messages which start or start FIXClient component).

Can somebody tell me how? I was trying to use the onRun function. It seems that it gets invoked only once.

Thanks in advance,
Vamsi
From: Vamsi K. <Vam...@ib...> - 2003-04-07 16:23:42
I guess I was not clear. I am developing a FIX client which maintains a session and also polls an external source for pre-built FIX-formatted messages (body only), adds a header, and sends them to the FIX server.

Can somebody tell me how to do that?

Vamsi

/-----Original Message-----
/From: qui...@li... [mailto:quickfix-dev...@li...] On Behalf Of Vamsi Krishna
/Sent: Monday, April 07, 2003 10:09 AM
/To: qui...@li...
/Cc: OM...@th...
/Subject: [Quickfix-developers] OnRun Function?
/
/HI
/I was trying to poll on external source ( pre-composed FIX messages
/residing either in Database or MQSeries or some proprietory command
/messages which start or start FIXClient component).
/
/Can somebody tell me how? I was trying to use onRun function. It seems
/that it will get invoked only once.
/
/Thanks in advance
/Vamsi
From: Nicholas P. <nic...@sl...> - 2003-04-07 17:40:35
I have discovered a bug in QuickFIX when dealing with the 355 tag in the Email message. I am working with a vendor who has chosen to embed custom FIX tags within the 355 block of an email message. QuickFIX does not parse this message correctly at all. Instead of looking at the 354 tag and determining the length of the 355 tag from that, it assumes that the next <eos> after the start of the 355 tag indicates its end. While the embedding of FIX within a tag like the 355 tag is a dubious choice, I have to receive these messages. I will endeavor to make a patch for this bug.

The other bug I have found has to do with the way the BodyLength is calculated. The message I receive from this vendor has a tag that is repeated, once outside of a repeating group and once within. QuickFIX incorrectly calculates the body length by counting this repeated tag only once, and thus rejects the message outright.

Finally, a question on QuickFIX internals. I have noticed that there is a distinct lack of logging within the various components that make up QuickFIX. For example, when a message is rejected due to a bad BodyLength or Checksum, QuickFIX prints no error messages at all. Another example would be in parsing the XML file for validation: if there is an error in the XML it prints an error, but not the tag that caused it, making diagnosis of the problem much more difficult. Is there any reason for this? Is there a canonical way to log from within other QuickFIX modules that I could use to log these errors?

Thanks,
-Nick
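The length-prefixed parsing described in this message, where the 354 value gives the byte count of the 355 value, could be sketched like this. It is not QuickFIX code and not the promised patch: extractEncodedText is a hypothetical name, and the sketch assumes the field delimiter is the SOH byte (0x01) and that 355 immediately follows 354.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not QuickFIX code. Fills `data` with the value of
// tag 355, taking exactly as many bytes as tag 354 announces, so
// delimiter bytes embedded in the data do not end the field early.
inline bool extractEncodedText(const std::string& msg, std::string& data) {
    const char SOH = '\x01';
    const std::string lenTag = std::string(1, SOH) + "354=";

    const std::size_t lenPos = msg.find(lenTag);
    if (lenPos == std::string::npos) return false;
    const std::size_t lenStart = lenPos + lenTag.size();
    const std::size_t lenEnd = msg.find(SOH, lenStart);
    if (lenEnd == std::string::npos || lenEnd == lenStart) return false;

    // Parse the announced length, rejecting non-digits and absurd values.
    std::size_t len = 0;
    for (std::size_t i = lenStart; i < lenEnd; ++i) {
        if (msg[i] < '0' || msg[i] > '9') return false;
        len = len * 10 + static_cast<std::size_t>(msg[i] - '0');
        if (len > msg.size()) return false;
    }

    // The data field must follow directly and be closed by a delimiter.
    const std::size_t dataTag = lenEnd + 1;
    if (msg.compare(dataTag, 4, "355=") != 0) return false;
    const std::size_t dataStart = dataTag + 4;
    if (dataStart + len >= msg.size()) return false;
    if (msg[dataStart + len] != SOH) return false;

    data.assign(msg, dataStart, len);
    return true;
}
```

Checking that a delimiter sits exactly at the announced end is a cheap sanity test: if the 354 value is wrong, the function fails instead of silently returning a truncated or overlong field.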
From: Oren M. <ore...@ya...> - 2003-04-07 18:07:51
> I have discovered a bug in QuickFIX when dealing with the 355 tag in the
> Email message. I am working with a vendor who has chosen to embed Custom
> FIX tags within the 355 block of an email message. QuickFIX does not
> parse this message correctly at all.

QuickFIX does not yet fully support DATA fields (it treats them like normal strings, as you have observed). So this is new functionality that needs to be added.

> The other bug I have found has to do with the way the BodyLength is
> calculated. The message I receive from this vendor has a tag that is
> repeated, once outside of a repeating group, and once within. QuickFIX
> incorrectly calculates the body length by only counting this repeated
> tag once, and thus rejects the message outright.

Can you provide the message (or a similar one, with the data changed), so we can add it to our test suite?

> Finally, a question on QuickFIX internals. I have noticed that there is
> a distinct lack of logging within the various components that make up
> QuickFIX. For example, when a message is rejected due to a bad
> BodyLength or Checksum QuickFIX prints no error messages at all? Another
> example would be in parsing the XML file for validation, if there is an
> error in the XML it prints an error, but not the tag that caused it,
> making diagnosis of the problem much more difficult. Is there any reason
> for this? Is there a canonical way to log from within other QuickFIX
> modules that I could use to log these errors?

This depends on what version you are using. Earlier versions had no significant logging at all. In the newer releases it has gotten pretty good. You should look at the Log and LogFactory interface. Implementations that come with QF are the ScreenLogFactory, FileLogFactory, and MySQLLogFactory. The LogFactory is an optional interface that is passed in to the initiator and acceptor. If you do not pass one in, you will get no logging.

There is no reason that messages are at their current detail level. Patches that improve on error messages are welcome.
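As a way to cross-check a reported BodyLength problem like the one above, the expected value can be computed from the raw bytes instead of from parsed fields, so a tag that appears twice is naturally counted twice. The sketch below is not QuickFIX code; wireBodyLength is a hypothetical name, and it assumes SOH (0x01) delimiters, with the body running from the byte after the delimiter that ends the 9= field up to and including the delimiter just before the 10= field.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not QuickFIX code: counts the bytes that the
// BodyLength field (tag 9) is expected to cover in a raw message.
inline bool wireBodyLength(const std::string& msg, std::size_t& length) {
    const char SOH = '\x01';
    const std::string soh(1, SOH);

    // Locate the BodyLength field and the delimiter that ends it.
    const std::size_t nine = msg.find(soh + "9=");
    if (nine == std::string::npos) return false;
    const std::size_t nineEnd = msg.find(SOH, nine + 1);
    if (nineEnd == std::string::npos) return false;
    const std::size_t bodyStart = nineEnd + 1;

    // Locate the delimiter that precedes the trailing CheckSum field.
    const std::size_t trailer = msg.rfind(soh + "10=");
    if (trailer == std::string::npos || trailer + 1 < bodyStart) return false;

    length = trailer + 1 - bodyStart;   // includes the delimiter before 10=
    return true;
}
```

Comparing this number with the value carried in tag 9 of a captured message shows whether the sender or the receiver is miscounting, which may help when preparing a sample message for the test suite.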