From: ISHIKAWA,chiaki <ish...@yk...> - 2012-12-20 02:43:27
(2012/12/19 3:57), Philippe Waroquiers wrote:
> On Tue, 2012-12-18 at 21:00 +0900, ISHIKAWA,chiaki wrote:
>> (2012/12/18 8:07), Philippe Waroquiers wrote:
>>> Destruction of unknown cond var is probably/maybe bug
>>> https://bugs.kde.org/show_bug.cgi?id=307082
>>>
>> I have produced a patch to take care of the issue.
>> But before that, I have a question.
>>
>> Q1: Why does valgrind not complain if I compile & link
>> Marc's code (in the bug entry which was given as a reminder that
>> "unknown cond var" may be a bug or false positive.) in the following manner,
>>
>> cc -o /tmp/a.out marc.c
> No idea. Maybe a problem of redirection caused by static linking ?
>
>
>> Q2: I have produced a work-in-progress patch to take care this issue.
>> I wonder if the developers in the know can take a look and improve it.
>>
>> The patch is posted to the bug entry
>> https://bugs.kde.org/show_bug.cgi?id=307082
> I took a quick look at the patch, approach looks ok to me.
> No time to look more in depth at this now however :(.
>

Thank you again.

With my patch, I tested mozilla thunderbird mail client under helgrind, and found that most of the warning messages (destruction of unknown cond var) were bogus. Only a few warnings now come from external libraries, and so in this sense mozilla thunderbird is OK.
[And the patch does not seem to introduce serious bugs so far.]

However, I am still struggling to figure out whether I can learn which tasks are possibly waiting on a cond var being destroyed.

The message is something like this:

==4103== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon
==4103==    at 0x4027B9E: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==4103==    by 0x4029A01: pthread_cond_destroy@* (hg_intercepts.c:958)
==4103==    by 0x47193BA: PR_DestroyCondVar (ptsynch.c:372)
==4103==    by 0x5947C40: nsHTTPListener::~nsHTTPListener() (CondVar.h:56)
==4103==    by 0x5947D82: nsHTTPListener::Release() (nsNSSCallbacks.cpp:536)
==4103==    by 0x603EFCA: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (nsCOMPtr.h:442)

I wanted to print out the IDs of the tasks that are waiting on this cond variable.

Now my tentative conclusion is that it is impossible to know which tasks are waiting, even going outside helgrind. Here is my reasoning. I wonder if I am right or wrong. I am discussing the situation on Linux.

Let nWaiters be the number of tasks waiting.

1. The specification of pthread_cond_signal() does not say which task is unblocked. So all helgrind can do is decrement nWaiters by one. (pthread_cond_broadcast() releases all the tasks instead.) helgrind can't really know which task is being removed from the waiting list, and so decrementing nWaiters is all it does (I think).

2. My desire was just to print out the IDs of the tasks still waiting. OK, let me go outside helgrind. Is it possible to do so by modifying libpthread?

So I thought I could tweak libpthread and print the task list, if it maintains a list of tasks that are waiting.

Under Debian GNU/Linux, which I use, the pthread library seems to come from libc. It is actually libc6 and is an alias of eglibc, which is a streamlined libc that can be used on embedded systems. So this is the source I looked at.

I looked inside the source code and found that, since the pthread semantics related to cond vars are such that the library only needs to release ONE unspecified task on pthread_cond_signal(), the library does not seem to contain an explicit list of waiting tasks.
The thread library relies on the "futex" kernel mechanism to take care of blocking and releasing the tasks. futex is a kernel mechanism developed to take care of 1-1 user/kernel task space mapping, and the thread functions seem to use futex for synchronization by directly invoking this kernel API.

Basically, the pthread functions don't use a library-level task list and such, but rely exclusively on the futex mechanism inside the kernel to take care of task synchronization. So at user level, it is not possible to print the tasks waiting on a cond var when the cond var is being destroyed (or, for that matter, to know the task IDs to begin with).

So my tentative conclusion is that it is impossible to know which task(s) are still blocked on a cond var when the cond var is being destroyed.

Maybe at the kernel level we can know (not sure), but invoking extra kernel calls just to look at this internal data structure (if possible at all) may introduce extra thread context switches due to such kernel calls (being cancellation points, maybe) and disturb libpthread and helgrind operation... So I am inclined to avoid it and have decided to forget about it.

So I am stuck. I thought it was easy, but going down to the kernel level seems too heavy-weight an operation (AND it is not portable, and I am not sure it is possible to begin with).

Actually, pthread_cond_destroy() ought to return EBUSY when there is at least one task waiting, so a careful program can do something about such a situation (but in the mozilla thunderbird case, it looks like the error is printed when a class object is destroyed, and the whole memory area in which the cond var is located seems to be released along with the class object or something, and the error code is being ignored, I am afraid). So anything goes.
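
To make the EBUSY point concrete, here is a minimal sketch of a destroy wrapper that reports the return code instead of dropping it. The helper name checked_cond_destroy is made up for illustration and is not part of NSPR, Mozilla, or helgrind. Whether EBUSY is actually returned when tasks are still waiting depends on the pthread implementation, so treat this as a diagnostic aid and not as a guarantee.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of NSPR, Mozilla, or helgrind):
 * destroy a condition variable and report a non-zero return code
 * instead of silently ignoring it. */
static int checked_cond_destroy(pthread_cond_t *cv, const char *where)
{
    assert(cv != NULL);

    int rc = pthread_cond_destroy(cv);
    if (rc != 0) {
        /* EBUSY, where the implementation reports it, suggests that
         * some task is still blocked on the cond var. */
        fprintf(stderr, "%s: pthread_cond_destroy failed: %s%s\n",
                where, strerror(rc),
                rc == EBUSY ? " (tasks still waiting?)" : "");
    }
    return rc;
}
```

A destructor could call checked_cond_destroy(&cv, "nsHTTPListener") in place of the bare destroy call, so that at least a log line appears before the memory is released.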

Granted, most of the observed cases seem to be related to the shutdown of the thunderbird mail client (when many objects are destroyed) and may not bring serious consequences. BUT shutdown is where many crashes are reported today, and I have a feeling that this destruction of cond variables which still have some tasks waiting may contribute to a portion of those crashes.

So I wonder if people who have worked on helgrind agree that it is indeed very difficult to figure out, at user level, exactly which tasks are waiting when a cond var is being destroyed.

Also, does anyone have a clever idea about how to debug this situation?

TIA
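
One user-level possibility, sketched here under heavy assumptions: if the application could route its waits through a wrapper, the wrapper itself could record which tasks are blocked, since each task knows its own ID when it enters and leaves the wait. Everything below (tracked_cond, tracked_cond_wait, tracked_cond_report, the demo_* names, MAX_WAITERS) is hypothetical and is not something helgrind or libpthread provides. Printing pthread_t as an unsigned long is a Linux-specific assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define MAX_WAITERS 64              /* arbitrary limit for this sketch */

/* Hypothetical wrapper: a cond var plus the tasks currently blocked on it. */
typedef struct {
    pthread_cond_t cond;
    pthread_t      waiters[MAX_WAITERS];
    int            n_waiters;
} tracked_cond;

static int tracked_cond_init(tracked_cond *tc)
{
    tc->n_waiters = 0;
    return pthread_cond_init(&tc->cond, NULL);
}

/* Caller must hold m, as with pthread_cond_wait(); m also protects the table. */
static int tracked_cond_wait(tracked_cond *tc, pthread_mutex_t *m)
{
    pthread_t self = pthread_self();

    assert(tc->n_waiters < MAX_WAITERS);
    tc->waiters[tc->n_waiters++] = self;       /* register before blocking */

    int rc = pthread_cond_wait(&tc->cond, m);

    /* The mutex is held again here: remove our own entry. */
    for (int i = 0; i < tc->n_waiters; i++) {
        if (pthread_equal(tc->waiters[i], self)) {
            tc->waiters[i] = tc->waiters[--tc->n_waiters];
            break;
        }
    }
    return rc;
}

/* Caller must hold the mutex. Prints the waiters and returns their count. */
static int tracked_cond_report(const tracked_cond *tc, FILE *out)
{
    for (int i = 0; i < tc->n_waiters; i++)
        fprintf(out, "still waiting: thread %lu\n",
                (unsigned long)tc->waiters[i]); /* Linux-specific cast */
    return tc->n_waiters;
}

/* Small demonstration: one task blocks, the main task lists it. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static tracked_cond    demo_tc;
static int             demo_ready = 0;

static void *demo_waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_mutex);
    while (!demo_ready)                        /* guard against spurious wakeups */
        tracked_cond_wait(&demo_tc, &demo_mutex);
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Returns the number of waiters seen while the demo task was blocked. */
static int demo_count_waiters(void)
{
    pthread_t t;
    int seen = 0;

    tracked_cond_init(&demo_tc);
    pthread_create(&t, NULL, demo_waiter, NULL);

    while (seen == 0) {
        pthread_mutex_lock(&demo_mutex);
        seen = tracked_cond_report(&demo_tc, stderr);
        if (seen > 0) {                        /* waiter registered: release it */
            demo_ready = 1;
            pthread_cond_signal(&demo_tc.cond);
        }
        pthread_mutex_unlock(&demo_mutex);
        if (seen == 0)
            sched_yield();                     /* give the waiter time to block */
    }
    pthread_join(t, NULL);
    return seen;
}
```

A destroy wrapper could call tracked_cond_report() just before pthread_cond_destroy() to list the tasks still blocked. This only covers waits that go through the wrapper, so waits issued directly by external libraries would remain invisible.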