Thread: Re: [Quickfix-developers] Build problems on Solaris
From: Joerg T. <Joe...@ma...> - 2003-04-03 11:35:00
Hi Barry,

this is an easy one...

> The only problem is that I don't seem able to execute pbind properly against
> any process that I own. Insufficient rights I am guessing. For example:

No, invalid argument, as pbind tells you.

> bash-2.03$ /usr/sbin/psrinfo
> 1       on-line   since 07/19/02 18:35:00
> 2       on-line   since 07/19/02 18:35:02

From this output, you see that you have processors 1 and 2 (see first column).

> bash-2.03$ /usr/sbin/pbind -b 0 3761
> /usr/sbin/pbind: cannot bind pid 3761: Invalid argument

But you try to bind to processor 0 (-b 0), which simply does not exist -- therefore the invalid argument.

I am sorry for the rather sketchy explanation in my last mail. I thought it was obvious to use psrinfo to look up the available processor numbers (on my Sun: 0, but on your Sun: 1 and 2), and I also assumed that there is always a processor 0.

Nevertheless, feel free to ask further questions.

Actually, we used this approach to hotfix a threading problem that showed up under heavy load on a multiprocessor machine, while development took place on a single-processor one. But later we discovered that we had used C library functions which are not thread-safe. After we fixed this, we could drop the pbind.

For most non-thread-safe functions, there are thread-safe (re-entrant) variants suffixed by "_r". In our case, gethostbyname() was one of the culprits. We replaced it with gethostbyname_r(), which led to a somewhat more complex call syntax.

Oren, perhaps QF uses some non-thread-safe functions (e.g. gethostbyname()). In my experience, this can work for a long time, but if you switch to a really fast MP machine (for production), the program starts to dump core.
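The static-buffer problem behind gethostbyname() can be illustrated with a simpler pair from the same "_r" family. The following is only a sketch under assumptions: it uses gmtime_r() as a stand-in, because the argument list of gethostbyname_r() differs between platforms, and the helper name utc_year() is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200112L  /* expose the POSIX declaration of gmtime_r() */
#include <assert.h>
#include <time.h>

/* Return the UTC calendar year for timestamp `t`.
 *
 * gmtime() hands back a pointer to one static struct tm shared by the whole
 * process, so a second thread calling it can overwrite the result before the
 * first thread has read it. gmtime_r() writes into a buffer owned by the
 * caller, so each thread works on its own copy. */
int utc_year(time_t t)
{
    struct tm result;                      /* caller-owned, on this thread's stack */
    struct tm *ok = gmtime_r(&t, &result);

    assert(ok != NULL);                    /* fails only if the year does not fit an int */
    (void)ok;                              /* keep the build quiet when NDEBUG is set */
    return result.tm_year + 1900;          /* tm_year counts years since 1900 */
}
```

Calling utc_year(0) yields 1970. The extra result argument is the same kind of change that makes a gethostbyname_r() call longer than the gethostbyname() call it replaces.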
On my machine, I looked for all "_r"-suffixed functions in the following way:

    joerg@polaris:/usr/lib $ nm -A -g -n lib*.so | egrep '_r$' | grep -v UNDEF
    libXm.so: [6411] | 1542716| 252|FUNC |GLOB |0 |12 |XmFontListCreate_r
    libXm.so: [7899] | 1542040| 256|FUNC |GLOB |0 |12 |XmFontListEntryCreate_r
    libXm.so: [8495] | 1542980| 12|FUNC |GLOB |0 |12 |XmStringCreateFontList_r
    libXm.so: [6958] | 483476| 96|FUNC |GLOB |0 |12 |_XmReCacheLabG_r
    libbsm.so: [865] | 30892| 408|FUNC |GLOB |0 |9 |getauclassent_r
    libbsm.so: [586] | 31364| 76|FUNC |GLOB |0 |9 |getauclassnam_r
    libbsm.so: [587] | 36924| 424|FUNC |GLOB |0 |9 |getauevent_r
    libbsm.so: [727] | 37412| 92|FUNC |GLOB |0 |9 |getauevnam_r
    libbsm.so: [814] | 37504| 88|FUNC |GLOB |0 |9 |getauevnum_r
    libbsm.so: [826] | 54888| 424|FUNC |GLOB |0 |9 |getauuserent_r
    libbsm.so: [583] | 55368| 168|FUNC |GLOB |0 |9 |getauusernam_r
    libc.so: [2699] | 227336| 332|FUNC |GLOB |0 |9 |__posix_asctime_r
    libc.so: [3019] | 227852| 68|FUNC |GLOB |0 |9 |__posix_ctime_r
    libc.so: [3752] | 608620| 100|FUNC |GLOB |0 |9 |__posix_getgrgid_r
    libc.so: [4553] | 608856| 92|FUNC |GLOB |0 |9 |__posix_getgrnam_r
    libc.so: [4510] | 240376| 120|FUNC |GLOB |0 |9 |__posix_getlogin_r
    libc.so: [3910] | 611572| 100|FUNC |GLOB |0 |9 |__posix_getpwnam_r
    libc.so: [3028] | 611336| 100|FUNC |GLOB |0 |9 |__posix_getpwuid_r
    libc.so: [3346] | 615168| 200|FUNC |GLOB |0 |9 |__posix_readdir_r
    libc.so: [3199] | 349788| 148|FUNC |GLOB |0 |9 |__posix_ttyname_r
    libc.so: [4378] | 227668| 48|FUNC |GLOB |0 |9 |_asctime_r
    libc.so: [3478] | 615368| 48|FUNC |GLOB |0 |9 |_ctermid_r
    libc.so: [4082] | 227780| 72|FUNC |GLOB |0 |9 |_ctime_r
    libc.so: [3184] | 609252| 124|FUNC |GLOB |0 |9 |_fgetgrent_r
    libc.so: [3038] | 611976| 124|FUNC |GLOB |0 |9 |_fgetpwent_r
    libc.so: [3294] | 613520| 124|FUNC |GLOB |0 |9 |_fgetspent_r
    libc.so: [3835] | 609044| 208|FUNC |GLOB |0 |9 |_getgrent_r
    libc.so: [3797] | 607960| 256|FUNC |GLOB |0 |9 |_getgrgid_r
    libc.so: [4619] | 607628| 332|FUNC |GLOB |0 |9 |_getgrnam_r
    libc.so: [3589] | 240080| 296|FUNC |GLOB |0 |9 |_getlogin_r
    libc.so: [4636] | 611768| 208|FUNC |GLOB |0 |9 |_getpwent_r
    libc.so: [3228] | 610772| 244|FUNC |GLOB |0 |9 |_getpwnam_r
    libc.so: [4694] | 611016| 184|FUNC |GLOB |0 |9 |_getpwuid_r
    libc.so: [3016] | 613312| 208|FUNC |GLOB |0 |9 |_getspent_r
    libc.so: [3807] | 613080| 136|FUNC |GLOB |0 |9 |_getspnam_r
    libc.so: [3116] | 336028| 72|FUNC |GLOB |0 |9 |_gmtime_r
    libc.so: [3720] | 335940| 68|FUNC |GLOB |0 |9 |_localtime_r
    libc.so: [3929] | 614524| 92|FUNC |GLOB |0 |9 |_rand_r
    libc.so: [3110] | 614616| 348|FUNC |GLOB |0 |9 |_readdir64_r
    libc.so: [3092] | 614964| 204|FUNC |GLOB |0 |9 |_readdir_r
    libc.so: [4684] | 324656| 128|FUNC |GLOB |0 |9 |_strtok_r
    libc.so: [3166] | 616336| 176|FUNC |GLOB |0 |9 |_tmpnam_r
    libc.so: [4325] | 349056| 732|FUNC |GLOB |0 |9 |_ttyname_r
    libc.so: [3870] | 608484| 136|FUNC |GLOB |0 |9 |_uncached_getgrgid_r
    libc.so: [4153] | 608720| 136|FUNC |GLOB |0 |9 |_uncached_getgrnam_r
    libc.so: [4681] | 611436| 136|FUNC |GLOB |0 |9 |_uncached_getpwnam_r
    libc.so: [4479] | 611200| 136|FUNC |GLOB |0 |9 |_uncached_getpwuid_r
    libc.so: [3587] | 227668| 48|FUNC |WEAK |0 |9 |asctime_r
    libc.so: [3559] | 615368| 48|FUNC |WEAK |0 |9 |ctermid_r
    libc.so: [2899] | 227780| 72|FUNC |WEAK |0 |9 |ctime_r
    libc.so: [4579] | 609252| 124|FUNC |WEAK |0 |9 |fgetgrent_r
    libc.so: [3187] | 611976| 124|FUNC |WEAK |0 |9 |fgetpwent_r
    libc.so: [3755] | 613520| 124|FUNC |WEAK |0 |9 |fgetspent_r
    libc.so: [3185] | 609044| 208|FUNC |WEAK |0 |9 |getgrent_r
    libc.so: [2760] | 607960| 256|FUNC |WEAK |0 |9 |getgrgid_r
    libc.so: [3782] | 607628| 332|FUNC |WEAK |0 |9 |getgrnam_r
    libc.so: [4251] | 240080| 296|FUNC |WEAK |0 |9 |getlogin_r
    libc.so: [3271] | 243412| 184|FUNC |GLOB |0 |9 |getnetgrent_r
    libc.so: [3520] | 611768| 208|FUNC |WEAK |0 |9 |getpwent_r
    libc.so: [4097] | 610772| 244|FUNC |WEAK |0 |9 |getpwnam_r
    libc.so: [3320] | 611016| 184|FUNC |WEAK |0 |9 |getpwuid_r
    libc.so: [3851] | 613312| 208|FUNC |WEAK |0 |9 |getspent_r
    libc.so: [4436] | 613080| 136|FUNC |WEAK |0 |9 |getspnam_r
    libc.so: [3362] | 336028| 72|FUNC |WEAK |0 |9 |gmtime_r
    libc.so: [3898] | 335940| 68|FUNC |WEAK |0 |9 |localtime_r
    libc.so: [3182] | 614524| 92|FUNC |WEAK |0 |9 |rand_r
    libc.so: [3279] | 614616| 348|FUNC |WEAK |0 |9 |readdir64_r
    libc.so: [3418] | 614964| 204|FUNC |WEAK |0 |9 |readdir_r
    libc.so: [3288] | 324656| 128|FUNC |WEAK |0 |9 |strtok_r
    libc.so: [4516] | 616336| 176|FUNC |WEAK |0 |9 |tmpnam_r
    libc.so: [3538] | 349056| 732|FUNC |WEAK |0 |9 |ttyname_r
    libm.so: [594] | 18020| 16|FUNC |GLOB |0 |9 |__gamma_r
    libm.so: [612] | 28796| 16|FUNC |GLOB |0 |9 |__lgamma_r
    libm.so: [658] | 18020| 16|FUNC |WEAK |0 |9 |gamma_r
    libm.so: [632] | 28796| 16|FUNC |WEAK |0 |9 |lgamma_r
    libnsl.so: [3766] | 435368| 52|FUNC |GLOB |0 |9 |__nis_map_group_r
    libnsl.so: [3773] | 234608| 160|FUNC |GLOB |0 |9 |_switch_gethostbyaddr_r
    libnsl.so: [3501] | 119800| 148|FUNC |GLOB |0 |9 |_switch_gethostbyname_r
    libnsl.so: [3929] | 234768| 160|FUNC |GLOB |0 |9 |_switch_getipnodebyaddr_r
    libnsl.so: [3543] | 234460| 148|FUNC |GLOB |0 |9 |_switch_getipnodebyname_r
    libnsl.so: [3646] | 216360| 48|FUNC |GLOB |0 |9 |_uncached_gethostbyaddr_r
    libnsl.so: [4061] | 216348| 12|FUNC |GLOB |0 |9 |_uncached_gethostbyname_r
    libnsl.so: [4253] | 216612| 200|FUNC |GLOB |0 |9 |gethostbyaddr_r
    libnsl.so: [3714] | 216408| 204|FUNC |GLOB |0 |9 |gethostbyname_r
    libnsl.so: [4141] | 218508| 144|FUNC |GLOB |0 |9 |gethostent_r
    libnsl.so: [4225] | 228912| 148|FUNC |GLOB |0 |9 |getrpcbyname_r
    libnsl.so: [3561] | 229060| 148|FUNC |GLOB |0 |9 |getrpcbynumber_r
    libnsl.so: [4424] | 229328| 132|FUNC |GLOB |0 |9 |getrpcent_r
    libnsl.so: [3817] | 114092| 76|FUNC |GLOB |0 |9 |inet_ntoa_r
    libnsl.so: [3526] | 415004| 96|FUNC |GLOB |0 |9 |nis_leaf_of_r
    libnsl.so: [4178] | 430712| 176|FUNC |GLOB |0 |9 |nis_sperror_r
    libsocket.so: [273] | 20944| 152|FUNC |GLOB |0 |9 |getnetbyaddr_r
    libsocket.so: [261] | 20796| 148|FUNC |GLOB |0 |9 |getnetbyname_r
    libsocket.so: [343] | 21220| 132|FUNC |GLOB |0 |9 |getnetent_r
    libsocket.so: [307] | 22304| 148|FUNC |GLOB |0 |9 |getprotobyname_r
    libsocket.so: [234] | 22452| 148|FUNC |GLOB |0 |9 |getprotobynumber_r
    libsocket.so: [463] | 22716| 132|FUNC |GLOB |0 |9 |getprotoent_r
    libsocket.so: [458] | 23488| 156|FUNC |GLOB |0 |9 |getservbyname_r
    libsocket.so: [464] | 23644| 160|FUNC |GLOB |0 |9 |getservbyport_r
    libsocket.so: [378] | 24268| 136|FUNC |GLOB |0 |9 |getservent_r

Cheers, Jörg

--
Joerg Thoennes
http://macd.com                 Tel.: +49 (0)241 44597-24
Macdonald Associates GmbH       Fax : +49 (0)241 44597-10
Lothringer Str. 52, D-52070 Aachen
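As a rough sketch of how one of the functions from the listing is called, the following assumes strtok_r() as declared by POSIX. The helper count_tokens() and the 256-byte working buffer are hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200112L  /* expose the POSIX declaration of strtok_r() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the blank- or tab-separated tokens in `text` without modifying it.
 *
 * strtok() remembers its scan position in hidden static storage, so two
 * threads tokenizing at the same time corrupt each other's position.
 * strtok_r() keeps that position in `saveptr`, which the caller supplies. */
size_t count_tokens(const char *text)
{
    char buf[256];           /* working copy; longer input is truncated */
    char *saveptr = NULL;    /* per-call scan state instead of a hidden global */
    char *tok;
    size_t n = 0;

    assert(text != NULL);
    strncpy(buf, text, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (tok = strtok_r(buf, " \t", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, " \t", &saveptr)) {
        n++;
    }
    return n;
}
```

For example, count_tokens("pbind -b 1 3761") returns 4. Because the scan state lives in a local variable, concurrent calls from different threads do not interfere with each other.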
From: Joerg T. <Joe...@ma...> - 2003-04-03 11:47:10
Hi Barry,

> No, invalid argument as pbind tells you.
>
>> bash-2.03$ /usr/sbin/psrinfo
>> 1       on-line   since 07/19/02 18:35:00
>> 2       on-line   since 07/19/02 18:35:02
>
> From this output, you see that you have processors 1 and 2 (see first
> column).
>
>> bash-2.03$ /usr/sbin/pbind -b 0 3761
>> /usr/sbin/pbind: cannot bind pid 3761: Invalid argument
>
> But you try to bind to processor 0 (-b 0) which simply does not exist --
> therefore the invalid argument.

Can you please confirm whether this works now? Does it fix your problem?

> Actually, we used this approach to hotfix a threading problem under
> heavy load on a multiprocessor machine while development took place
> on a single processor one. But later we discovered that we had used
> C library functions which are not thread-safe. After we fixed this,
> we could drop the pbind.
>
> For most non-thread-safe functions, there are thread-safe (re-entrant)
> variants suffixed by "_r". In our case, gethostbyname() was one of the
> culprits. We replaced it with gethostbyname_r(), which led to a
> somewhat more complex call syntax.

Perhaps you have been experiencing exactly this kind of situation. It would be good to have a stack trace of the core dump:

    $ pstack core

In the stack trace we could see where the error occurred. For more such useful utilities, see "man proc", "man truss", "man gcore", and "man coreadm".

In addition, beginning with Solaris 8 there are two thread libraries:

1. The old T1 lib in /lib/libthread.so implements a complex N:M mapping of N user threads to M<<N kernel threads called LWPs (Light Weight Processes). Due to this complexity, there are many issues in the bug database.

2. The new T2 lib in /lib/lwp/libthread.so simply maps every user thread to an LWP. This is much simpler, and new kernels can handle lots of LWPs. Therefore, this is the default in Solaris 9.

Prepend /usr/lib/lwp to your LD_LIBRARY_PATH to activate it:

    export LD_LIBRARY_PATH=/usr/lib/lwp:$LD_LIBRARY_PATH

or

    setenv LD_LIBRARY_PATH /usr/lib/lwp:$LD_LIBRARY_PATH

In our experience, if you mix a Java application with JNI/native libs which create threads on their own, this is always a good choice to avoid threading problems caused by buggy libs.

Cheers, Jörg

--
Joerg Thoennes
http://macd.com                 Tel.: +49 (0)241 44597-24
Macdonald Associates GmbH       Fax : +49 (0)241 44597-10
Lothringer Str. 52, D-52070 Aachen