From: Alex I. <ale...@in...> - 2005-03-17 21:49:41
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
Jeremy,<br>
<br>
I am using RedHat 8 (2.4.9-9 kernel, glibc 2.3.2, pthreads 0.10) and MontaVista
Linux (kernel 2.4.20, glibc 2.2.5, pthreads 0.9). Unfortunately, I can't
get you a test case: the application is huge, dynamically loads about 20
shared libs, and must run in a sandbox with other apps.<br>
<br>
However, I did some debugging, and the problem seems to be in the thread
cleanup done by shutdown_actions() and reap_threads(). My threads share
a common semaphore, so when we are "zapping" the threads, the ones that
are not blocked exit cleanly, but some threads are still inside the __pthread_sigsuspend()
call, which does not appear to be a cancellation point. Therefore, these suspended
threads never exit and valgrind never returns from reap_threads(). At the
same time, the main thread has already exited, so I end up with a bunch of
orphaned threads in the "ps" output.<br>
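<br>
For illustration only, the shutdown pattern can be sketched in C roughly as
follows. This is not code from the real application, and I have not confirmed
that it reproduces the hang under valgrind. The names run_shutdown_scenario,
worker and trigger are made up for the sketch, and sigsuspend() stands in for
the pause() loop so that the signal cannot be lost between the flag check and
the wait.<br>

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>

#define MAX_WORKERS 16   /* arbitrary bound for the sketch */

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  never = PTHREAD_COND_INITIALIZER; /* never signalled */
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER; /* a worker has parked */
static int blocked  = 0;   /* workers that have reached the wait */
static int expected = 0;   /* workers the trigger thread waits for */
static pthread_t main_thread;
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int sig) { (void)sig; got_sigint = 1; }

/* Hypothetical worker: parks forever in pthread_cond_wait(). */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    blocked++;
    pthread_cond_signal(&ready);
    for (;;)
        pthread_cond_wait(&never, &lock);
    return NULL; /* not reached */
}

/* Hypothetical thread that decides to shut down and signals the main thread. */
static void *trigger(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (blocked < expected)
        pthread_cond_wait(&ready, &lock);
    pthread_mutex_unlock(&lock);
    pthread_kill(main_thread, SIGINT);
    return NULL;
}

/* Returns the number of workers still parked when the main thread
   has handled SIGINT, or -1 on error. Meant to be called once. */
int run_shutdown_scenario(int nworkers)
{
    struct sigaction sa;
    sigset_t block_int, old_mask, wait_mask;
    pthread_t tid, trig;
    int i, parked;

    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;
    expected = nworkers;
    main_thread = pthread_self();

    /* Block SIGINT so it is only delivered inside sigsuspend(). */
    sigemptyset(&block_int);
    sigaddset(&block_int, SIGINT);
    pthread_sigmask(SIG_BLOCK, &block_int, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGINT);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tid, NULL, worker, NULL) != 0)
            return -1;
        pthread_detach(tid);
    }
    if (pthread_create(&trig, NULL, trigger, NULL) != 0)
        return -1;

    /* Main thread sleeps until the trigger thread sends SIGINT. */
    while (!got_sigint)
        sigsuspend(&wait_mask);
    pthread_join(trig, NULL);
    pthread_sigmask(SIG_SETMASK, &old_mask, NULL);

    pthread_mutex_lock(&lock);
    parked = blocked;
    pthread_mutex_unlock(&lock);
    return parked;
}
```

Calling run_shutdown_scenario(4) from main() and then returning leaves the
four workers parked in pthread_cond_wait() while the main thread exits, which
is the situation described above.<br>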
<br>
This seems to be a generic problem ... right now I got around it by commenting
out the call to reap_threads() in shutdown_actions() - at that point we nuked
all threads, and if some of them have not exited, there is nothing that can
be done. I do get a summary now.<br>
<br>
Let me know of the implications of commenting out that call or if there is
a better way to fix this. I am attaching the dump from valgrind assertion
failure, which shows the state of the threads that are lingering at exit.<br>
<br>
Thanks a lot, as always!<br>
Alex<br>
<br>
valgrind: core_os.c:85 (vgArch_terminate): Assertion `vgPlain_count_living_threads()
== 0' failed.<br>
==20739== at 0xB002E30D: ??? (vg_mylibc.c:1166)<br>
<br>
sched status:<br>
running_tid=0<br>
<br>
Thread 4: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B975383: Process::TrigWait(unsigned int) (/probe/SimXinuRef/v503/os/src/geoXinu/src/sys/kernel/trigger.cc:113)<br>
<br>
Thread 7: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B9816B6: KSemaphore::Wait(void) const (/home/agi/probe/v503.sim/os/src/geoXinu/src/../../libKal/src/xinu/KSemaphore.cpp:174)<br>
<br>
Thread 8: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B9816B6: KSemaphore::Wait(void) const (/home/agi/probe/v503.sim/os/src/geoXinu/src/../../libKal/src/xinu/KSemaphore.cpp:174)<br>
<br>
Thread 10: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B97405C: Process::Receive(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/sys/kernel/receive.cc:25)<br>
<br>
Thread 14: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B975383: Process::TrigWait(unsigned int) (/probe/SimXinuRef/v503/os/src/geoXinu/src/sys/kernel/trigger.cc:113)<br>
<br>
<br>
<br>
Jeremy Fitzhardinge wrote:<br>
<blockquote type="cite" cite="mid...@go...">
<pre wrap="">Alex Ivershen wrote:
</pre>
<blockquote type="cite">
<pre wrap="">I am having a little problem with 2.4.0.rc4 handling signals in my app
- 2.2.0 worked fine. The application logic is as follows - the main
thread spawns about a dozen threads, then sits in pause() call in a
forever loop. At some point one of the threads determines that the
application must be shut down and sends a SIGINT to the main thread,
which does some cleanup and exits the application.
With rc4 it results in a bunch of defunct threads and I never get a
summary on exit. Incidentally, I don't get a summary when calling
_exit() from within the code either (it used to work). Please, let me
know what workarounds I can use here. Below is the signal trace.
</pre>
</blockquote>
<pre wrap=""><!---->
What kernel/libc/libpthread/distro are you using? Is it possible to
isolate this in a test-case program?
J</pre>
</blockquote>
</body>
</html>