|
From: Jeremy F. <je...@go...> - 2005-03-15 03:00:27
Attachments:
NEWS.txt
|
OK, I've put 2.4.0.rc4 up at http://www.goop.org/~jeremy/valgrind/dist/

Changes since rc3 are:

* fix unexpected SIGSEGV when using memcheck on programs where the
  first write to a particular 64k chunk is done by the FPU
* fix a problem with the sys_futex wrapper which was inspecting the
  wrong arguments for FUTEX_REQUEUE
* format fixup for a debug printf

The full changelog since 2.2.0 is attached.

Unless something major turns up, I think this will be the last release
candidate.

J
|
|
From: Christian P. <tr...@ge...> - 2005-03-16 21:30:29
|
On Tuesday 15 March 2005 4:00 am, Jeremy Fitzhardinge wrote:
> OK, I've put 2.4.0.rc4 up at http://www.goop.org/~jeremy/valgrind/dist/
>
> Changes since rc3 are:
>
> * fix unexpected SIGSEGV when using memcheck on programs where the
>   first write to a particular 64k chunk is done by the FPU
> * fix a problem with the sys_futex wrapper which was inspecting the
>   wrong arguments for FUTEX_REQUEUE
> * format fixup for a debug printf

By the way, I still read this list carefully even though I can't use
valgrind anymore since I changed my system to amd64 :(. I really do
appreciate all the work you've put into this tool, and I'm happily
looking forward to getting an amd64 version of valgrind whenever it is
done :)

So, thanks,
Christian Parpart.

--
Netiquette: http://www.ietf.org/rfc/rfc1855.txt
 22:27:11 up 138 days, 14:57, 0 users, load average: 0.23, 0.36, 0.49
|
|
From: Alex I. <ale...@in...> - 2005-03-17 19:06:54
|
Guys,

I am having a little problem with 2.4.0.rc4 handling signals in my app -
2.2.0 worked fine. The application logic is as follows - the main thread
spawns about a dozen threads, then sits in a pause() call in a forever
loop. At some point one of the threads determines that the application
must be shut down and sends a SIGINT to the main thread, which does some
cleanup and exits the application.

With rc4 it results in a bunch of defunct threads and I never get a
summary on exit. Incidentally, I don't get a summary when calling
_exit() from within the code either (it used to work). Please let me
know what workarounds I can use here. Below is the signal trace.

Thanks!
Alex Ivershen

==15915== Process terminating with default action of signal 2 (SIGINT)
==15915==    at 0x1BA79741: pause (in /lib/libpthread-0.10.so)
==15915==    by 0x8048B8B: main (/home/agi/probe/v503.sim/SimXinuApps/SimXinu/src/SimXinu.cc:109)
--15915-- kill_thread zaps tid 2 lwp 16332
--15915-- kill_thread zaps tid 3 lwp 16334
--15915-- kill_thread zaps tid 4 lwp 16344
--15915-- kill_thread zaps tid 5 lwp 16345
--15915-- kill_thread zaps tid 6 lwp 16347
--15915-- kill_thread zaps tid 7 lwp 16348
--15915-- kill_thread zaps tid 8 lwp 16349
--15915-- kill_thread zaps tid 10 lwp 16352
--15915-- kill_thread zaps tid 12 lwp 16355
--15915-- kill_thread zaps tid 13 lwp 16357
--15915-- kill_thread zaps tid 14 lwp 16359
--15915-- kill_thread zaps tid 16 lwp 17182
--15915-- kill_thread zaps tid 17 lwp 17183
--15915-- kill_thread zaps tid 18 lwp 17233
--15915-- kill_thread zaps tid 19 lwp 17293
--15915-- kill_thread zaps tid 20 lwp 17299
--15915-- kill_thread zaps tid 21 lwp 17352
--15915-- kill_thread zaps tid 22 lwp 17366
--16332-- sigvgkill for lwp 16332 tid 2
--16332-- Sending SIGVGCHLD to master tid=1 lwp=15915
--16345-- sigvgkill for lwp 16345 tid 5
--16334-- sigvgkill for lwp 16334 tid 3
--16345-- Sending SIGVGCHLD to master tid=1 lwp=15915
--16347-- sigvgkill for lwp 16347 tid 6
--16347-- Sending SIGVGCHLD to master tid=1 lwp=15915
--16355-- sigvgkill for lwp 16355 tid 12
--16355-- Sending SIGVGCHLD to master tid=1 lwp=15915
--16357-- sigvgkill for lwp 16357 tid 13
--16357-- Sending SIGVGCHLD to master tid=1 lwp=15915
--15915-- Got 63 (code=-6) from tid lwp 16332
--15915-- Got 63 (code=-6) from tid lwp 16345
--15915-- Got 63 (code=-6) from tid lwp 16347
--15915-- Got 63 (code=-6) from tid lwp 16355
--15915-- Got 63 (code=-6) from tid lwp 16357
--16334-- Sending SIGVGCHLD to master tid=1 lwp=15915
--15915-- Got 63 (code=-6) from tid lwp 16334
--16353-- Sending SIGVGCHLD to master tid=1 lwp=15915
--15915-- Got 63 (code=-6) from tid lwp 16353

------------------------------------------------------------------------
Confidentiality Notice: This e-mail transmission may contain
confidential and/or privileged information that is intended only for the
individual or entity named in the e-mail address. If you are not the
intended recipient, you are hereby notified that any disclosure,
copying, distribution or reliance upon the contents of this e-mail
message is strictly prohibited. If you have received this e-mail
transmission in error, please reply to the sender, so that proper
delivery can be arranged, and please delete the message from your
computer. Thank you.
Tektronix Texas, LLC formerly Inet Technologies, Inc.
------------------------------------------------------------------------
|
|
From: Jeremy F. <je...@go...> - 2005-03-17 21:06:28
|
Alex Ivershen wrote:
> I am having a little problem with 2.4.0.rc4 handling signals in my app
> - 2.2.0 worked fine. The application logic is as follows - the main
> thread spawns about a dozen threads, then sits in pause() call in a
> forever loop. At some point one of the threads determines that the
> application must be shut down and sends a SIGINT to the main thread,
> which does some cleanup and exits the application.
>
> With rc4 it results in a bunch of defunct threads and I never get a
> summary on exit. Incidentally, I don't get a summary when calling
> _exit() from within the code either (it used to work). Please, let me
> know what workarounds I can use here. Below is the signal trace.
What kernel/libc/libpthread/distro are you using? Is it possible to
isolate this in a test-case program?
J
|
|
From: Alex I. <ale...@in...> - 2005-03-17 21:49:41
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
Jeremy,<br>
<br>
I am using RedHat 8 (2.4.9-9 kernel, glibc 2.3.2, pthreads 0.10) and MontaVista
Linux (kernel 2.4.20, glibc 2.2.5, pthreads 0.9). Unfortunately, I can't
get you a test case - the application is huge, dynamically loads about 20
shared libs and must be running in a sandbox with other apps.<br>
<br>
However, I did some debugging, and it seems that the problem is in cleaning
up the threads in shutdown_actions() and reap_threads(). My threads have
a common semaphore, so when we are "zapping" the threads, those ones that
are not blocked are going out cleanly, but some threads are still in __pthread_sigsuspend()
call, and it looks like it's not a cancellation point. Therefore, these suspended
threads never exit and valgrind never returns from reap_threads(). At the
same time the main thread already exited - thus I end up with a bunch of
orphaned threads on "ps".<br>
<br>
This seems to be a generic problem ... right now I got around it by commenting
out the call to reap_threads() in shutdown_actions() - at that point we nuked
all threads, and if some of them have not exited, there is nothing that can
be done. I do get a summary now.<br>
<br>
Let me know of the implications of commenting out that call or if there is
a better way to fix this. I am attaching the dump from valgrind assertion
failure, which shows the state of the threads that are lingering at exit.<br>
<br>
Thanks a lot, as always!<br>
Alex<br>
<br>
valgrind: core_os.c:85 (vgArch_terminate): Assertion `vgPlain_count_living_threads()
== 0' failed.<br>
==20739== at 0xB002E30D: ??? (vg_mylibc.c:1166)<br>
<br>
sched status:<br>
running_tid=0<br>
<br>
Thread 4: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B975383: Process::TrigWait(unsigned int) (/probe/SimXinuRef/v503/os/src/geoXinu/src/sys/kernel/trigger.cc:113)<br>
<br>
Thread 7: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B9816B6: KSemaphore::Wait(void) const (/home/agi/probe/v503.sim/os/src/geoXinu/src/../../libKal/src/xinu/KSemaphore.cpp:174)<br>
<br>
Thread 8: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B9816B6: KSemaphore::Wait(void) const (/home/agi/probe/v503.sim/os/src/geoXinu/src/../../libKal/src/xinu/KSemaphore.cpp:174)<br>
<br>
Thread 10: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B97405C: Process::Receive(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/sys/kernel/receive.cc:25)<br>
<br>
Thread 14: status = VgTs_WaitSys<br>
==20739== at 0x1BA7527D: __pthread_sigsuspend (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA7408C: __pthread_wait_for_restart_signal (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1BA70D77: pthread_cond_wait@@GLIBC_2.3.2 (in /lib/libpthread-0.10.so)<br>
==20739== by 0x1B95E251: BinarySemaphore::Wait(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/BinarySemaphore.cc:100)<br>
==20739== by 0x1B9652B5: Process::SchedulerSuspend(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/process.cc:993)<br>
==20739== by 0x1B96665E: resched(void) (/home/agi/probe/v503.sim/os/src/geoXinu/src/arch/sun-sparc/resched.cc:35)<br>
==20739== by 0x1B975383: Process::TrigWait(unsigned int) (/probe/SimXinuRef/v503/os/src/geoXinu/src/sys/kernel/trigger.cc:113)<br>
<br>
</body>
</html>
|
|
From: Jeremy F. <je...@go...> - 2005-03-17 22:56:32
|
Alex Ivershen wrote:
> I am using RedHat 8 (2.4.9-9 kernel, glibc 2.3.2, pthreads 0.10) and
> MontaVista Linux (kernel 2.4.20, glibc 2.2.5, pthreads 0.9).
> Unfortunately, I can't get you a test case - the application is huge,
> dynamically loads about 20 shared libs and must be running in a
> sandbox with other apps.
>
> However, I did some debugging, and it seems that the problem is in
> cleaning up the threads in shutdown_actions() and reap_threads(). My
> threads have a common semaphore, so when we are "zapping" the threads,
> those ones that are not blocked are going out cleanly, but some
> threads are still in __pthread_sigsuspend() call, and it looks like
> it's not a cancellation point. Therefore, these suspended threads
> never exit and valgrind never returns from reap_threads(). At the
> same time the main thread already exited - thus I end up with a bunch
> of orphaned threads on "ps".
The zap should be pretty unconditional, and make any blocking syscall
unblock. It's possible the signal mask isn't getting set right and it's
blocking the VKI_SIGVGKILL signal rather than having it interrupt the
syscall. I'll take a closer look later.
J
|
|
From: Alex I. <ale...@in...> - 2005-03-18 00:49:38
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
Jeremy, I found it. Somewhere somehow VKI_SIGVGKILL is getting blocked (I
haven't found where yet). I added a line in do_setmask() line 690 - right
after removing SIGKILL and SIGSTOP from the mask we should also remove SIGVGKILL.
Everything fell into place, the app shuts down and cleans up its threads
perfectly. You might wanna patch it into the current CVS head.<br>
<br>
Thanks!<br>
Alex<br>
<br>
<br>
<pre class="moz-signature" cols="$mailwrapcol">--
Alex G. Ivershen Tektronix, Inc.
Network Products Dept. 1500 N. Greenville Ave.
Tektronix, Inc. Richardson, TX 75081
Phone: +1-469-330-4295 USA
"I have noticed that the people who are late are often so much jollier than
the people who have to wait for them." E. V. Lucas
</pre>
<br>
</body>
</html>
|
|
From: Jeremy F. <je...@go...> - 2005-03-18 03:04:58
|
Alex Ivershen wrote:
> Jeremy, I found it. Somewhere somehow VKI_SIGVGKILL is getting
> blocked (I haven't found where yet). I added a line in do_setmask()
> line 690 - right after removing SIGKILL and SIGSTOP from the mask we
> should also remove SIGVGKILL. Everything fell into place, the app
> shuts down and cleans up its threads perfectly. You might wanna patch
> it into the current CVS head.
Yep, that's the kind of thing I was expecting. Thanks for tracking it down!
Hm, though I'm confused. sanitize_client_sigmask should remove that
signal before blocking in a syscall, so there should be no way that it
is blocked...
J
|
|
From: Jeremy F. <je...@go...> - 2005-03-18 22:57:46
|
Alex Ivershen wrote:
> Jeremy, I found it. Somewhere somehow VKI_SIGVGKILL is getting
> blocked (I haven't found where yet). I added a line in do_setmask()
> line 690 - right after removing SIGKILL and SIGSTOP from the mask we
> should also remove SIGVGKILL. Everything fell into place, the app
> shuts down and cleans up its threads perfectly. You might wanna patch
> it into the current CVS head.
Could you confirm the change I checked in last night fixes your problem?
J
|