From: SourceForge.net <no...@so...> - 2006-09-07 20:28:14
Bugs item #999544, was opened at 2004-07-28 11:32
Message generated for change (Comment added) made by dgp
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=999544&group_id=10894

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.

Category: 48. Threading
Group: obsolete: 8.4.4
Status: Open
Resolution: None
Priority: 7
Submitted By: Rob Crittenden (rcrittenden0569)
Assigned to: Zoran Vasiljevic (vasiljevic)
Summary: Threaded Tcl calls non-thread-safe library functions

Initial Comment:
This thread-safety issue is well documented in this closed bug
http://sourceforge.net/tracker/?group_id=10894&atid=110894&func=detail&aid=217833

Tcl calls at least the following non-thread-safe functions:

getgrnam()
getpwnam()
getpwuid()
gethostbyaddr()
gethostbyname()

A stack trace from AOLserver 4.0.1 with Tcl 8.4.4 running on Solaris 8
that crashed calling gethostbyname is:

----------------- lwp# 281 / thread# 280 --------------------
fefbb2c8 __inet_address_is_local_af (5f75d18, 2, 0, 2, 2ee4f, c5189ad9) + 108
fefba358 order_haddrlist_af (ff02263c, ffff, 2, 2edd8, 2, 4b9eae88) + 274
fef9b5f8 _get_hostserv_inetnetdir_byname (2edd8, c5189be0, 0, c5189be0, ff01ef28, ffffffff) + 830
fefb54c0 gethostbyname_r (4ba27760, 2edbc, 2edd0, 920, 2edbc, ff01ef28) + a0
ff34c944 GetAddr (0, c518a08c, c5189cd0, c518a890, 3226db80, 0) + 28
ff34c734 DnsGet (0, c5189db8, 3ae558, c518a08c, 3226db80, 1) + 134

----------------- lwp# 280 / thread# 279 --------------------
ff11f20c _lwp_sema_wait (c5d8be60, fef6c000, 0, c5d8bd98, 0, 0) + c
fef490d8 _swtch (c5d8bd98, 0, fef6c000, 5, 1000, 0) + 158
fef482ec _cond_wait (c5d8bd98, 4356, fef6c000, fef6e968, ff022668, 10) + d4
fef4c164 rw_rdlock (fef7785c, 5000, fef6c000, 5257, ff01ef28, ff022668) + d8
fefba1dc order_haddrlist_af (ff02263c, ff022648, 2, 2ede4, 2, 2edbc) + f8
fef9b5f8 _get_hostserv_inetnetdir_byname (2ede4, c5d881d8, 0, c5d881d8, ff01ef28, ffffffff) + 830
fefb54c0 gethostbyname_r (475a17d0, 2edbc, 2edd0, 920, 2edbc, ff01ef28) + a0
ff291a18 CreateSocketAddress (c5d883b8, 13f97230, c5d88264, ff13c000, 33fb4a02, 0) + 58
ff291704 CreateSocket (3be6eec0, 0, ffffffff, 0, 0, 0) + 20
ff291ac8 Tcl_OpenTcpClient (3be6eec0, 50, 13f97230, 0, 0, 0) + 2c

Don't be confused by the reference to gethostbyname_r.
gethostbyname apparently calls gethostbyname_r using the same
shared return buffer.

----------------------------------------------------------------------

>Comment By: Don Porter (dgp)
Date: 2006-09-07 16:28

Message:
Logged In: YES
user_id=80530

All my issues are now resolved.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 14:55

Message:
Logged In: YES
user_id=95086

All right. This should be all fixed now. It was indeed an
alignment problem of char* arrays. I believe both threaded and
non-threaded tests must go OK now in both 8.4 and 8.5 branches.
If not, you will tell me on time ;-)

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 12:17

Message:
Logged In: YES
user_id=95086

But this still does not work everywhere...
Let me study this in more detail...
I will take care of that in the next 1/2 of hour.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 12:15

Message:
Logged In: YES
user_id=95086

Damn! I'm pretty blind!

Replace (in CopyArray)

    new = (char **)buf;

with

    new = &buf;

and recompile.
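
Editor's sketch: the CopyArray comments in this thread describe a caller-supplied buffer that holds a copy of the pointer array followed by the strings it references, and a bus error caused by treating an arbitrarily aligned char buffer as an array of char *. The code below is a minimal illustration of that layout with the alignment handled explicitly. It is not the code that was committed to Tcl; the function name copy_string_array, its out parameter, and the padding strategy are assumptions made for this example, and only the NULL-terminated-array case is covered.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not Tcl's CopyArray.
 *
 * Copies the NULL-terminated array of strings "src" into "buf".
 * Layout inside buf: [padding][pointer array, NULL-terminated][strings].
 * The padding makes the pointer array start on an address that is a
 * multiple of sizeof(char *), which avoids the misaligned access that
 * strict-alignment CPUs such as SPARC report as a bus error.
 *
 * Returns the number of bytes of buf consumed, or -1 with
 * errno = ERANGE if the buffer is too small.
 */
long copy_string_array(char **src, char *buf, size_t buflen, char ***out)
{
    size_t n = 0, i, pad, used;
    char **copy;
    char *strings;

    /* Count the entries in the source array. */
    while (src[n] != NULL) {
        n++;
    }

    /* Bytes to skip so that the pointer array is suitably aligned. */
    pad = (sizeof(char *) - ((uintptr_t)buf % sizeof(char *))) % sizeof(char *);

    /* Reserve room for n pointers plus the terminating NULL. */
    used = pad + (n + 1) * sizeof(char *);
    if (used > buflen) {
        errno = ERANGE;
        return -1;
    }

    copy = (char **)(void *)(buf + pad);
    strings = buf + used;

    /* Copy each string behind the pointer array and record its address. */
    for (i = 0; i < n; i++) {
        size_t len = strlen(src[i]) + 1;
        if (len > buflen - used) {
            errno = ERANGE;
            return -1;
        }
        memcpy(strings, src[i], len);
        copy[i] = strings;
        strings += len;
        used += len;
    }
    copy[n] = NULL;

    *out = copy;
    return (long)used;
}
```

A caller would pass a buffer of its own, for example a per-thread one, and read the copied array through out. Because of the padding step, handing in a deliberately misaligned address such as raw + 1 still yields an aligned pointer array.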
----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 12:01

Message:
Logged In: YES
user_id=95086

Well, it says in the code:

/*
 *---------------------------------------------------------------------------
 *
 * CopyArray --
 *
 *      Copies array of NULL-terminated or fixed-length strings
 *      to the private buffer, honouring the size of the buffer.
 *
 * Results:
 *      Number of bytes copied on success or -1 on error (errno = ERANGE)
 *
 * Side effects:
 *      None.
 *
 *---------------------------------------------------------------------------
 */

This is basically copying the src[0] .... src[n] to a
user-passed buffer. The buffer stores the copy of the array
and following it the strings referenced by the array.

I look and still do not see the problem. There is one
assignment there

    len = (sizeof(char *)*(i + 1)); /* Leave place for the array */
    new = (char **)buf;

But this should be innocent really...

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-09-07 11:53

Message:
Logged In: YES
user_id=80530

The functioning of CopyArray() is not immediately
clear to me (Comments please!)...

but it looks like an array of (char) is getting
cast to an array of (char *) ?

That does seem like an operation that's perilous
when it comes to alignment matters.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 11:52

Message:
Logged In: YES
user_id=95086

Ah! All right. Yes, this is the alignment problem.
Let me see if I can get this one blindly (I think I can).
Also, what I will do is to make a !defined(TCL_THREADS)
which will simply fallback to the mt-unsafe call.
This will take care of the issue more "naturally".
Nevertheless it is good to know that there is an
alignment problem there which I will fix now.
When this is done, we must see other errors you
mentioned which appear in the threaded build.
Hmhm...

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-09-07 11:42

Message:
Logged In: YES
user_id=80530

gdb is not on this system.
trying (unfamiliar) dbx, I see
a suggestion the issue is alignment?

% file attributes ~
signal BUS (invalid address alignment) in CopyArray Symbol *0xc23588

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 11:31

Message:
Logged In: YES
user_id=95086

Even more weird. I need not tell you that on

bash-2.03$ uname -a
SunOS Develop 5.8 Generic_117350-20 sun4u sparc SUNW,Sun-Blade-2500 Solaris
bash-2.03$ gcc -v
Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.8/3.3.2/specs
Configured with: ../configure --with-as=/usr/ccs/bin/as --with-ld=/usr/ccs/bin/ld --disable-nls --disable-libgcj --enable-languages=c,c++
Thread model: posix
gcc version 3.3.2

I get all working fine? We will have to see what's up there...
I will shortly checkin a tclUnixCompat.c with fallback to
original mt-unsafe calls in case TCL_THREADS is not defined.
But it would help me if you can isolate a single test and run
it under debugger so we see at which point it breaks. Can you
do that?

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-09-07 11:23

Message:
Logged In: YES
user_id=80530

Same failures using gcc 3.2.3.

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-07 11:02

Message:
Logged In: YES
user_id=95086

There you go...
Well, I have only GCC and this revealed no problems:

./tclsh
% info patch
8.4.14
% file attributes ~
-group develop -owner zoran -permissions 040755

This is most odd. Do you have a chance to compile with GCC
just to see if this makes any difference?
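
Editor's sketch: the comments above mention falling back to the original mt-unsafe calls when TCL_THREADS is not defined. One way such a compile-time fallback could look is shown below. This is an illustration only, not the contents of tclUnixCompat.c: the function lookup_uid and the fixed 4096-byte scratch buffer are assumptions for this example, and the getpwnam_r call uses the POSIX five-argument signature, which older platforms may not provide by default.

```c
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not taken from Tcl): look up the uid that
 * belongs to a user name. Returns 0 on success, -1 if the user does
 * not exist or the lookup fails.
 */
int lookup_uid(const char *name, uid_t *uid)
{
#ifdef TCL_THREADS
    /* Threaded build: the reentrant call writes into caller-owned storage. */
    struct passwd pwd;
    struct passwd *result = NULL;
    char buf[4096]; /* assumed large enough for this sketch */

    if (getpwnam_r(name, &pwd, buf, sizeof buf, &result) != 0 || result == NULL) {
        return -1;
    }
    *uid = result->pw_uid;
#else
    /* Non-threaded build: the plain call and its static result are fine. */
    struct passwd *result = getpwnam(name);

    if (result == NULL) {
        return -1;
    }
    *uid = result->pw_uid;
#endif
    return 0;
}
```

With this shape, a build without threads never touches the buffer-copying path at all, which is the "more natural" behaviour the comment refers to.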
----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-09-07 10:59

Message:
Logged In: YES
user_id=80530

The --enable-threads build is much better.
Just a couple failed tests:

==== unixFCmd-15.1 SetGroupAttribute - invalid group FAILED
==== Contents of test case:

    catch {file delete -force -- foo.test}
    list [catch {file attributes foo.test -group foozzz} msg] $msg [file delete -force -- foo.test]

---- Result was:
1 {could not set group for file "foo.test": no such file or directory} {}
---- Result should have been (exact matching):
1 {could not set group for file "foo.test": group "foozzz" does not exist} {}
==== unixFCmd-15.1 FAILED

==== unixFCmd-16.3 SetOwnerAttribute - invalid owner FAILED
==== Contents of test case:

    catch {file delete -force -- foo.test}
    list [catch {file attributes foo.test -owner foozzz} msg] $msg

---- Result was:
1 {could not set owner for file "foo.test": no such file or directory}
---- Result should have been (exact matching):
1 {could not set owner for file "foo.test": user "foozzz" does not exist}
==== unixFCmd-16.3 FAILED

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2006-09-07 10:49

Message:
Logged In: YES
user_id=80530

latest sources have many crashes on both
HEAD and 8.4.-branch on Solaris 8 in
the --disable-threads configuration
when compiled with
Forte Developer 7 C 5.4 2002/03/09.

First example:

% info patch
8.4.14
% file attributes ~
Bus Error

----------------------------------------------------------------------

Comment By: Zoran Vasiljevic (vasiljevic)
Date: 2006-09-06 09:43

Message:
Logged In: YES
user_id=95086

Took me quite some time to get rid of those :-(
Anyways... this is now done in core-8-4-branch and
head branch. I'm leaving the ticket open for some
weeks now and unless we find some other non-mt-safe
call in Tcl core, I will close it.
----------------------------------------------------------------------

Comment By: Andreas Kupries (andreas_kupries)
Date: 2004-08-13 12:03

Message:
Logged In: YES
user_id=75003

I am not sure why assigned to me. Giving to Zoran for comments.

----------------------------------------------------------------------