[SSI-devel] [ ssic-linux-Bugs-2010453 ] OpenSSI fails the glibc tst-basic3 test
From: SourceForge.net <no...@so...> - 2008-09-14 03:08:05
Bugs item #2010453, was opened at 2008-07-04 05:32
Message generated for change (Comment added) made by rogertsang
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=405834&aid=2010453&group_id=32541

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Process Management
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Hughes (hughesj)
Assigned to: Nobody/Anonymous (nobody)
Summary: OpenSSI fails the glibc tst-basic3 test

Initial Comment:
The test starts some threads, waits for them to exit, then sends a signal. It complains that the signal is not received.

$ cc -pthread tst-basic3.c
$ ./a.out
starting 20 + 1 threads
20 left
19 left
18 left
17 left
16 left
15 left
14 left
13 left
12 left
11 left
10 left
9 left
8 left
7 left
6 left
5 left
4 left
3 left
2 left
1 left
0 left
final_test has been called
Expected signal 'User defined signal 1' from child, got none

----------------------------------------------------------------------

>Comment By: Roger Tsang (rogertsang)
Date: 2008-09-13 23:07

Message:
Looks like waitpid() / kill() missed the threads.

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-07-04 07:08

Message:
Logged In: YES
user_id=166336
Originator: YES

Here's a trace of the same process on a stock 2.6.11 kernel. The odd behaviour of the processes (the "parent" thread exits before the child) is the same. What's different on the non-OpenSSI kernel is the signal delivery: that works.
2581 clone(child_stack=0xb7e9b4c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0xb7e9bbf8, {entry_number:6, base_addr:0xb7e9bbb0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7e9bbf8) = 2582
2581 write(1, " 1 left in pid 2581\n", 20) = 20
[...]
2581 _exit(0) = ?
2582 write(1, " 0 left in pid 2581\n", 20) = 20
2582 write(2, "final_test has been called from "..., 41) = 41
2582 kill(2581, SIGUSR1) = 0
2582 --- SIGUSR1 (User defined signal 1) @ 0 (0) ---

File Added: zz-2.6.11

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-07-04 06:17

Message:
Logged In: YES
user_id=166336
Originator: YES

Excerpts from trace:

69921 execve("./a.out", ["./a.out"], [/* 15 vars */]) = 0
[...]
69921 write(2, "Start main pid 69921\n", 21) = 21
69921 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7db8708) = 69922
[ that "clone" is a fork ]
69921 waitpid(69922, <unfinished ...>
[...]
69922 write(2, "test running as pid 69922\n", 26) = 26
69922 clone(child_stack=0xb7db74c4, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED, parent_tidptr=0xb7db7bf8, {entry_number:6, base_addr:0xb7db7bb0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}, child_tidptr=0xb7db7bf8) = 69923
[ that "clone" is a pthread_create ]
[...]
69922 munmap(0xb75b1000, 27800) = 0
69922 futex(0xb75b0be4, FUTEX_WAKE, 2147483647) = 0
69922 _exit(0)
[ weird, why is 69922 exiting? I'd expect it to be 69923 ]
69923 write(1, " 0 left in pid 69922\n", 21) = 21
69923 write(2, "final_test has been called from "..., 42) = 42
69923 kill(69922, SIGUSR1) = -1 ESRCH (No such process)
69923 exit_group(0)

It looks like the thread is running in the wrong clone.

----------------------------------------------------------------------

Comment By: John Hughes (hughesj)
Date: 2008-07-04 06:12

Message:
Logged In: YES
user_id=166336
Originator: YES

Here's what goes wrong:

69772 execve("./tst-basic3", ["./tst-basic3"], [/* 15 vars */]) = 0
[...]
69793 write(1, " 0 left\n", 8) = 8
69793 write(1, "final_test has been called", 26) = 26
69793 write(1, "\n", 1) = 1
69793 kill(69773, SIGUSR1) = -1 ESRCH (No such process)
69793 exit_group(0) = ?
69772 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 69773
69772 --- SIGCHLD (Child exited) @ 0 (0) ---
69772 futex(0xb7f409c0, FUTEX_WAKE, 2147483647) = 0
69772 write(2, "Expected signal \'User defined si"..., 61) = 61
69772 exit_group(1) = ?

I.e. the signal is being sent to the wrong process, 69773 instead of 69772. Why? The code is "kill (getpid(), SIGUSR1)".

Weird, it looks like the group leader process is exiting. Here's another trace:

File Added: zz-strace
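The pattern the comments above keep circling (the thread-group leader leaves via pthread_exit() before the other threads, and the last thread to finish then runs an exit handler that calls kill(getpid(), SIGUSR1)) can be sketched as a small standalone C program. This is a hedged approximation, not the actual glibc tst-basic3.c source: only final_test is taken from the test output; the use of atexit(), the helper names worker, child_body and run_repro, and the 100 ms delay are assumptions made for illustration. The delay only makes the leader-exits-first ordering likely, it does not guarantee it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* atexit() handler (assumed mechanism). glibc runs it from whichever
   thread is the last to exit; by then the thread-group leader has
   already left via pthread_exit(). getpid() still returns the
   thread-group ID, i.e. the leader's PID, the target seen in the traces. */
static void final_test(void)
{
    fputs("final_test has been called\n", stderr);
    /* With the default disposition this should terminate the whole
       process. If kill() returns, delivery failed (ESRCH in the OpenSSI
       trace) and the process goes on to exit with status 0. */
    if (kill(getpid(), SIGUSR1) != 0)
        perror("kill");
}

/* Hypothetical worker: sleeps briefly so that, in practice, the leader
   reaches pthread_exit() first, matching the order seen in the traces. */
static void *worker(void *arg)
{
    (void)arg;
    struct timespec delay = { 0, 100L * 1000 * 1000 }; /* 100 ms */
    nanosleep(&delay, NULL);
    return NULL;
}

/* Runs in the forked child and never returns. */
static void child_body(int nthreads)
{
    atexit(final_test);
    for (int i = 0; i < nthreads; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) != 0)
            _exit(2);
        pthread_detach(t);
    }
    /* Only the leader exits here; the process ends (via exit(0), which
       runs final_test) when the last remaining thread finishes. */
    pthread_exit(NULL);
}

/* Forks a child running child_body() and reports how it ended: the
   terminating signal number, 0 for any normal exit (the "got none"
   case), or -1 if fork()/waitpid() failed. */
static int run_repro(int nthreads)
{
    assert(nthreads > 0);
    fflush(NULL); /* avoid duplicating buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        child_body(nthreads); /* does not return */

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Called as run_repro(20) from a test main(), this should return SIGUSR1 on a stock Linux kernel, matching the 2.6.11 trace where kill() returns 0 and SIGUSR1 is delivered. On a kernel behaving like the OpenSSI trace, kill() would fail with ESRCH, the child would exit with status 0, and run_repro() would return 0, the equivalent of "got none". The fork() mirrors the test's own structure, so the fatal signal only takes down the child and the parent can inspect the wait status.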