From: SourceForge.net <no...@so...> - 2012-07-19 21:48:11
Bugs item #3544685, was opened at 2012-07-16 10:50
Message generated for change (Comment added) made by ferrieux
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110894&aid=3544685&group_id=10894

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.
Category: 49. Threading
Group: development: 8.5.12
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Stuart Cassoff (stwo)
Assigned to: Stuart Cassoff (stwo)
Summary: Threaded build failures on OpenBSD-current

Initial Comment:
Trunk and 8.5.12rc0.

async-4.3 fails on i386 and amd64 (sporadic)
"Test file error: child killed: SIGABRT"

interp-36.7 fails on i386 and amd64

event-14.1 fails on i386 (sporadic), difficult to reproduce,
try repeatedly running the event tests with minimal delay between runs.
I am currently able to reproduce this one with trunk but not 8.5.12rc0.

----------------------------------------------------------------------

>Comment By: Alexandre Ferrieux (ferrieux)
Date: 2012-07-19 14:48

Message:
Thanks to a ktrace output of a similar failure for event-11.4, here's a
detailed analysis:

event-11.4 is:

    after 10; update
    after 100 {set x x-done}
    after 200 {set y y-done}
    after 300 {set z z-done}
    after idle {set q q-done}
    => vwait on y and expect z to be not done yet

The test fails because the pthread_cond_timedwait from 100 to 200
actually sees 500ms elapsed. As a consequence, the Tcl core immediately
fires all due timers, including z.

So my take is that on this system, mysterious chunks of 400ms get stolen
by the system clock, or the CLOCK_REALTIME mode of pthread_cond_timedwait
is awfully buggy.

----------------------------------------------------------------------

Comment By: Stuart Cassoff (stwo)
Date: 2012-07-19 11:57

Message:
Async tests all pass now; the mutex lock was the thing needed, thanks.
With the latest 8.5 trunk I just got an event-12.4 failure which looks
like the other event failure and the interp one as well.
Is it just a matter of the tests or is there a problem?
I've never seen these errors in a non-threaded build.

The event-12.4 failure looks like this:

==== event-12.4 Tcl_UpdateCmd procedure FAILED
==== Contents of test case:

    foreach i [after info] {
        after cancel $i
    }
    after 10; update; # On Mac make sure update won't take long
    after 200 {set x x-done}
    after 600 {set y y-done}
    after idle {set z z-done}
    set x before
    set y before
    set z before
    after 300
    update
    list $x $y $z

---- Result was:
x-done y-done z-done
---- Result should have been (exact matching):
x-done before z-done
==== event-12.4 FAILED

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2012-07-19 11:08

Message:
The fix was required, regardless of other lurking nasties. That's why I put
it in the main 8.5 line.
Anyway Stu, please update us now.

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2012-07-19 10:54

Message:
Stuart is the best person to answer that first.

That's why I put it in a branch ;-)

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2012-07-19 10:48

Message:
Hum, this only fixes the elephant in the front chair.
What's the status of the other failures reported here?

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2012-07-19 10:47

Message:
> Of course, by "trunk" I mean "tip", of core-8-5-branch.

Sure

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2012-07-19 10:44

Message:
Of course, by "trunk" I mean "tip", of core-8-5-branch.

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2012-07-19 10:43

Message:
No problem. I first assigned this issue to me, because
it was still assigned to Joe. Now assigned to you,
for handling it further.

----------------------------------------------------------------------

Comment By: Alexandre Ferrieux (ferrieux)
Date: 2012-07-19 10:39

Message:
Sorry, committed to trunk in the meantime. The chat is a nice place to
synchronize ;)

----------------------------------------------------------------------

Comment By: Jan Nijtmans (nijtmans)
Date: 2012-07-19 10:34

Message:
Looks like you're right. Fix committed to branch bug-3544685.

Please test.

----------------------------------------------------------------------

Comment By: Stuart Cassoff (stwo)
Date: 2012-07-19 09:48

Message:
The async test problems appear to be caused by an imperfect backport
resulting in a mismatched mutex lock/unlock pair (lock was missing).
Bug 2981154, checkin 15a55ecb19.

----------------------------------------------------------------------

Comment By: Stuart Cassoff (stwo)
Date: 2012-07-19 09:03

Message:
Re async-4.3:
There may be a specific problem with async-4.3 but for now the SIGABRT
comes from other code in async.test which can be boiled down to:

    $ make runtest
    % testasync delete [testasync create q]
    Abort trap (core dumped)

----------------------------------------------------------------------

Comment By: Stuart Cassoff (stwo)
Date: 2012-07-19 08:00

Message:
interp-36.7 passes if "after 10 ..." is changed to "after 200 ..."

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2012-07-19 06:57

Message:
So, timing sensitivity could be an issue.

Does it help to change the "after 10 ..." to "after 200 ..."?

----------------------------------------------------------------------

Comment By: Stuart Cassoff (stwo)
Date: 2012-07-18 21:20

Message:
This is with 8.5.12rc1.
I can't get info for async-4.3 anymore since it just SIGABRTs, I guess,
but iirc the reason for failure was similar.

==== interp-36.7 SlaveBgerror sets error handler of slave [1999035] FAILED
==== Contents of test case:

    slave eval {
        variable done {}
        after 0 error foo
        after 10 [list ::set [namespace which -variable done] {}]
        vwait [namespace which -variable done]
    }
    set result

---- Result was:
untouched
---- Result should have been (exact matching):
foo
==== interp-36.7 FAILED

----------------------------------------------------------------------

Comment By: Don Porter (dgp)
Date: 2012-07-17 10:20

Message:
What does the interp-36.7 failure look like?

----------------------------------------------------------------------
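
One possible way to check the event-11.4 timing analysis in this thread outside the test harness is to record when each timer actually fires. The following is only a sketch and is not part of the Tcl test suite: the proc name recordTimers and the array fired are made up for illustration, and it assumes Tcl 8.5 or later (for apply and clock milliseconds).

```tcl
# Illustrative sketch only: recordTimers and ::fired are hypothetical names,
# not taken from event.test.

# Schedule one timer per {delay name} pair; each timer stores the number of
# milliseconds that really elapsed since scheduling into ::fired(name).
proc recordTimers {delays} {
    global fired
    array unset fired
    set t0 [clock milliseconds]
    foreach {ms name} $delays {
        after $ms [list apply {{name t0} {
            set ::fired($name) [expr {[clock milliseconds] - $t0}]
        }} $name $t0]
    }
    return $t0
}

# Same delays and wait condition as event-11.4: wait for y, then look at
# which timers have fired and how late they were.
recordTimers {100 x 200 y 300 z}
vwait fired(y)
parray fired

# Clean up any timer that is still pending (z, on a well-behaved system).
foreach id [after info] { after cancel $id }
```

If the timed wait overshoots as described in the analysis, z would be expected to appear in the output alongside y with a similar elapsed value; otherwise only x and y should be present when vwait returns.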