|
From: ISHIKAWA,chiaki <ish...@yk...> - 2012-12-17 12:23:47
|
While running the Mozilla Thunderbird mail client under helgrind, I got the following message:

==13832== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon

(See Mozilla bugzilla entry https://bugzilla.mozilla.org/show_bug.cgi?id=819445
~nsHTTPListener() destroys condition variable on which other threads are blocked.)

I wonder if we can learn WHICH THREADs, maybe the thread ids, were waiting on the said condition variable when this message is printed.

It would be great at least to learn which thread (maybe its tid or its starting address or whatever) is waiting on the condition variable which is being destroyed, and it would be insanely great if we could also learn exactly WHICH condition variable is meant (maybe its address?).

I think it is definitely worthwhile to print the address of the condition variable being destroyed even if symbolic information is not available, because in the same log I often see "destruction of unknown cond var" as well. I wonder if we can correlate the addresses printed by these warning messages to see whether one thread is prematurely destroying a cond variable which other threads still assume to exist, etc.

Looking at helgrind/hg_main.c starting at line 2153 (I am quoting the function map_cond_to_CVInfo_delete ( ThreadId tid, void* cond ) below), I think printing the value of 'cond' as an address in hexadecimal format would be enough to print the address of the condition variable. (I am not familiar with how to print the symbolic information.)

OR, since CVInfo is defined as follows early in hg_main.c,

   typedef
      struct {
         SO*   so;       /* libhb-allocated SO */
         void* mx_ga;    /* addr of associated mutex, if any */
         UWord nWaiters; /* # threads waiting on the CV */
      }
      CVInfo;

maybe we should print the value of mx_ga; I am not sure. I trust the valgrind developers to know which is correct.
Also, for printing out the thread id or something, cvi->mx_ga (the mx_ga field of the CVInfo found for 'cond') can be used to call

   Lock *lk = map_locks_maybe_lookup( (Addr)cvi->mx_ga );

and then lk->heldBy seems to contain the list of thread information. If so, we can iterate through it to print the task ID, etc.

=== QUOTE of map_cond_to_CVInfo_delete():

static void map_cond_to_CVInfo_delete ( ThreadId tid, void* cond ) {
   Thread* thr;
   UWord keyW, valW;

   thr = map_threads_maybe_lookup( tid );
   tl_assert(thr); /* cannot fail - Thread* must already exist */

   map_cond_to_CVInfo_INIT();
   if (VG_(delFromFM)( map_cond_to_CVInfo, &keyW, &valW, (UWord)cond )) {
      CVInfo* cvi = (CVInfo*)valW;
      tl_assert(keyW == (UWord)cond);
      tl_assert(cvi);
      tl_assert(cvi->so);
      if (cvi->nWaiters > 0) {
         HG_(record_error_Misc)(thr,
            "pthread_cond_destroy:"
            " destruction of condition variable being waited upon");
      }
      libhb_so_dealloc(cvi->so);
      cvi->mx_ga = 0;
      HG_(free)(cvi);
   } else {
      HG_(record_error_Misc)(thr,
         "pthread_cond_destroy: destruction of unknown cond var");
   }
}

=== END QUOTE

Comment: In both places where a warning is printed, I would like to see the address of the condition variable, and hopefully symbolic information if it is available. Also, when a cond variable is destroyed while some tasks are waiting on it, I would like to know the task ID(s) waiting on it.

I run helgrind with the following parameters; adding a few other options such as --fair-sched=yes, etc. does not change the situation much.
env GTK_IM_MODULE=xim valgrind --tool=helgrind \
    ~/TB-NEW/TB-3HG/objdir-tb3/mozilla/dist/bin/thunderbird-bin \
    -profile /TB-NEW/TB-3HG/objdir-tb3/mozilla/_tests/mozmill/mozmillprofile \
    -jsbridge 24242 -foreground

The message I got in one run (the Mozilla bugzilla entry points at an uploaded full log of a different run):

==13832== ----------------------------------------------------------------
==13832==
==13832== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon
==13832==    at 0x4027A7F: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==13832==    by 0x4029781: pthread_cond_destroy@* (hg_intercepts.c:958)
==13832==    by 0x47191BA: PR_DestroyCondVar (ptsynch.c:340)
==13832==    by 0x58C1A60: nsHTTPListener::~nsHTTPListener() (CondVar.h:56)
==13832==    by 0x58C1AF7: nsHTTPListener::Release() (nsNSSCallbacks.cpp:536)
==13832==    by 0x5F9C160: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (nsCOMPtr.h:440)
==13832==    by 0x4C84784: nsStreamLoader::OnStopRequest(nsIRequest*, nsISupports*, tag_nsresult) (nsCOMPtr.h:622)
==13832==    by 0x4CFED70: mozilla::net::HttpBaseChannel::DoNotifyListener() (HttpBaseChannel.cpp:1463)
==13832==    by 0x4D0159C: mozilla::net::nsHttpChannel::HandleAsyncAbort() (HttpBaseChannel.h:347)
==13832==    by 0x4CFFFBA: nsRunnableMethodImpl<void (mozilla::net::nsHttpChannel::*)(), true>::Run() (nsThreadUtils.h:349)
==13832==    by 0x5FDE0EB: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:612)
==13832==    by 0x5FF556F: NS_InvokeByIndex_P (in /TB-NEW/TB-3HG/objdir-tb3/mozilla/toolkit/library/libxul.so)
==13832==

PS: Unfortunately, due to the sheer size of thunderbird and its libraries, --read-var-info blows up my small PC's memory. I can't run the program with --read-var-info under 32-bit Linux. Under 64-bit Linux it runs, but paging is so heavy (I have about 6 GB dedicated to the VMplayer in which this is done) that the testing harness for thunderbird times out and stops the testing.
I wait 20 minutes for the initial network connection so that TB can be manipulated remotely through its own RPC by the test harness, but it fails due to timeout. There is simply too much paging with 4-6 GB of memory available under 64-bit Linux if --read-var-info is specified.

So trying to obtain the symbolic information of the condition variable on my PC may be difficult: unless I have 8 GB or more, it may not be possible to run thunderbird under helgrind using --read-var-info. But in other cases, where the memory demand is not that high, or with a powerful PC with 16 GB of memory, say, learning the whereabouts and/or symbolic information of the condition variable that is being destroyed will be very useful for debugging purposes.

PPS: An excerpt of the "destruction of unknown cond var" log.

It would also be interesting to see the address of the "unknown cond var" printed. Coupled with the proposed printing of the address (at least, even if symbolic information is not available) of a cond variable destroyed while it is still waited upon, we could compare such addresses to see if one routine is destroying a cond variable which another thread is about to destroy (again? maybe a race or improper locking of a critical region?).

The following warning is the first "destruction of unknown cond var" warning after the "destruction of condition variable being waited upon" discussed above, and I wonder if the unknown cond var is the one that was destroyed above. (The address [which may be bogus by now] may help us to find out.)
==13832== ----------------------------------------------------------------
==13832==
==13832== Thread #28: pthread_cond_destroy: destruction of unknown cond var
==13832==    at 0x4027A7F: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==13832==    by 0x4029781: pthread_cond_destroy@* (hg_intercepts.c:958)
==13832==    by 0x47191BA: PR_DestroyCondVar (ptsynch.c:340)
==13832==    by 0x47ABE12: nssCertificate_Destroy (certificate.c:128)
==13832==    by 0x47ABE6E: NSSCertificate_Destroy (certificate.c:150)
==13832==    by 0x47A900B: CERT_DestroyCertificate (stanpcertdb.c:795)
==13832==    by 0x47EB163: pkix_pl_Cert_Destroy (pkix_pl_cert.c:1167)
==13832==    by 0x4803FFA: PKIX_PL_Object_DecRef (pkix_pl_object.c:891)
==13832==    by 0x47E2FD7: pkix_List_Destroy (pkix_list.c:89)
==13832==    by 0x4803FFA: PKIX_PL_Object_DecRef (pkix_pl_object.c:891)
==13832==    by 0x47E301F: pkix_List_Destroy (pkix_list.c:93)
==13832==    by 0x4803FFA: PKIX_PL_Object_DecRef (pkix_pl_object.c:891)
==13832==
==13832== ----------------------------------------------------------------
==13832== |
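The request in the message above (print the address of the condition variable, and ideally how many threads wait on it) can be illustrated with a small stand-alone sketch. This is not helgrind code: `CVInfoSketch` and `format_cond_destroy_msg` are hypothetical names, the struct only mirrors two fields of the quoted `CVInfo`, and the standard `snprintf` stands in for whatever formatting helper helgrind would actually use.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for helgrind's CVInfo; NOT the real type. */
typedef struct {
    void         *mx_ga;     /* address of the associated mutex, if any */
    unsigned long nWaiters;  /* number of threads waiting on the CV */
} CVInfoSketch;

/* Build the warning text for destroying `cond`.
   `cvi` is NULL when the cond var is not known to the tool.
   Returns the snprintf result, or 0 when there is nothing to report. */
int format_cond_destroy_msg(char *buf, size_t len,
                            const void *cond, const CVInfoSketch *cvi)
{
    if (cvi == NULL)
        return snprintf(buf, len,
            "pthread_cond_destroy: destruction of unknown cond var %p",
            (void *)cond);

    if (cvi->nWaiters > 0)
        return snprintf(buf, len,
            "pthread_cond_destroy: destruction of condition variable %p"
            " being waited upon (%lu waiter(s), mutex %p)",
            (void *)cond, cvi->nWaiters, cvi->mx_ga);

    /* Known cond var with no waiters: no warning. */
    if (len > 0)
        buf[0] = '\0';
    return 0;
}
```

With both warnings carrying the `%p` value, the two kinds of message in a log could be matched up by address, which is the correlation asked for above. Note that this prints a count only; the quoted `CVInfo` does not store which threads are waiting.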
|
From: Philippe W. <phi...@sk...> - 2012-12-17 23:07:34
|
On Mon, 2012-12-17 at 21:03 +0900, ISHIKAWA,chiaki wrote:
> During running mozilla thunderbird mail client under helgrind,
> I got the following message:
>
> ==13832== Thread #1: pthread_cond_destroy: destruction of condition
> variable being waited upon
>
>
> (See mozilla bugzilla entry
> https://bugzilla.mozilla.org/show_bug.cgi?id=819445
> ~nsHTTPListener() destroys condition variable on which other threads
> are blocked.)
>
>
> I wonder if we can learn WHICH THREADs, maybe the thread ids, were waiting
> on the said condition variable when this message is printed.

Assuming you have a recent version of Valgrind, you can activate the embedded gdbserver and then use GDB to examine the state of all the other threads when the above error is reported. You will then see which threads are waiting on this cond var.

>
> It would be at least great to learn which thread (maybe tid or its
> starting address or whatever) is waiting on the condition variable
> which is being destroyed, and also, it would be insanely great, if we
> can learn WHICH condition variable exactly is talked about (maybe its
> address?) is known.
>
> I think it is definitely worthwhile if we can print the address of the
> condition variable being destroyed even if symbolic information is not
> available because in the same log I often see something about
> "destruction of unknown cond var" also. I wonder if we can correlate
> the addresses printed by these warning messages to see if one thread
> is prematurely destroying a cond variable which other threads really
> assume to continue to exist, etc.
Destruction of unknown cond var is probably/maybe bug
https://bugs.kde.org/show_bug.cgi?id=307082

>
> By looking at helgrind/hg_main.c starting at line 2153 (I am quoting
> the function map_cond_to_CVInfo_delete ( ThreadId tid, void* cond )
> below, I think printing the value of 'cond' as address
> in hexadecimal format would be enough to print the address of
> condition variable (I am not familiar how to print the symbolic
> information.) OR, since

I have in a corner a patch for helgrind which prints symbolic information for the lock addresses. The patch is not finished yet.

It would be worth filing a wish bug in bugzilla saying that helgrind could use --read-var-info=yes to show more info about cond var addresses, lock addresses, etc.

Philippe |
|
From: ISHIKAWA,chiaki <ish...@yk...> - 2012-12-18 12:00:27
|
(2012/12/18 8:07), Philippe Waroquiers wrote:
> Destruction of unknown cond var is probably/maybe bug
> https://bugs.kde.org/show_bug.cgi?id=307082
>

I have produced a patch to take care of the issue. But before that, I have a question.

Q1: Why does valgrind not complain if I compile & link Marc's code (in the bug entry which was given as a reminder that "unknown cond var" may be a bug or false positive) in the following manner,

    cc -o /tmp/a.out marc.c

(Note that there is no -lpthread parameter)

and then run

    valgrind --tool=helgrind /tmp/a.out

There is no warning or error at all (under Linux, that is).

I wonder WHICH library is used for pthread_cond_init() and friends. This is just out of curiosity, and is not related to the next, more important request/question.

Q2: I have produced a work-in-progress patch to take care of this issue. I wonder if the developers in the know can take a look and improve it.

The patch is posted to the bug entry
https://bugs.kde.org/show_bug.cgi?id=307082
I also posted the output of the regression test for helgrind.

Before the patch, valgrind 3.8.1 reported the issue as Marc reported. (marc.c is the code reported by Marc.)

ishikawa@debian-vbox-ci:/tmp$ gcc -g -o /tmp/a.out -lpthread ~/Dropbox/marc.c
ishikawa@debian-vbox-ci:/tmp$ /tmp/a.out
ishikawa@debian-vbox-ci:/tmp$ valgrind --tool=helgrind /tmp/a.out
==22785== Helgrind, a thread error detector
==22785== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==22785== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==22785== Command: /tmp/a.out
==22785==
==22785== ---Thread-Announcement------------------------------------------
==22785==
==22785== Thread #1 is the program's root thread
==22785==
==22785== ----------------------------------------------------------------
==22785==
==22785== Thread #1: pthread_cond_destroy: destruction of unknown cond var
==22785==    at 0x4027B1E: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==22785==    by 0x402987F: pthread_cond_destroy@* (hg_intercepts.c:958)
==22785==    by 0x80484F4: main (marc.c:6)
==22785==
==22785==
==22785== For counts of detected and suppressed errors, rerun with: -v
==22785== Use --history-level=approx or =none to gain increased speed, at
==22785== the cost of reduced accuracy of conflicting-access information
==22785== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

But after the patch, it runs as follows. Note that the false positive warning is no longer there.

$ valgrind --tool=helgrind /tmp/a.out
==25080== Helgrind, a thread error detector
==25080== Copyright (C) 2007-2012, and GNU GPL'd, by OpenWorks LLP et al.
==25080== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==25080== Command: /tmp/a.out
==25080==
==25080==
==25080== For counts of detected and suppressed errors, rerun with: -v
==25080== Use --history-level=approx or =none to gain increased speed, at
==25080== the cost of reduced accuracy of conflicting-access information
==25080== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
ishikawa@debian-vbox-ci:/tmp$

Not sure, though, if it works with statically initialized data as in

    pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

[Note: No, obviously it can't. This is because the mapping of the cond var to the CVInfo structure cannot be set up explicitly at the time of pthread_cond_init(), since that function is never called.
I tested it to confirm this observation by a slight modification of Marc's code.]

I really want to get to the bottom of the problem I was analyzing (random crashes of the Mozilla Thunderbird mail client; I suspect races because they are random). Seeing a mail client crash while it is exiting is very distressing: I have to wonder whether the e-mail from an important client that just arrived was stored properly in the file before the crash :-(

So if developers in the know can take a look at the patch and offer suggestions for improvement, I am all ears, so that I can use a patched 3.8.1-b to dig into the problem with Mozilla Thunderbird (with the reduced clutter of "unknown cond var"). I will use the suggestion to check for the task id or something using the vgdbserver feature. Or maybe I can print the address of cond variables by modifying the code.

Q3:
> I have in a corner a patch for helgrind which print symbolic information
> for the lock addresses. Patch not finished yet.
>
> Would be worth filing a wish bug in bugzilla telling that helgrind
> could use --read-var-info=yes to show more info about
> cond var addresses, lock addresses, etc.
>
> Philippe

You mean helgrind can't use the information obtained by --read-var-info=yes (!?). That is tough, indeed. Maybe that is why the current helgrind doesn't even seem to attempt to print the address of a cond var, even in the form of an offset in the heap, or an offset in the stack, etc.

TIA |
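marc.c itself is not quoted anywhere in this thread. As a rough guess only, based on the trace above (pthread_cond_destroy called directly from main) and on the discussion of pthread_cond_init() versus PTHREAD_COND_INITIALIZER, the two cases being compared might look like the following sketch. The function names are made up for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Case 1: the cond var is set up by an explicit pthread_cond_init() call,
   so a tool that intercepts that call has a chance to learn about the
   cond var before it is destroyed. Returns 0 on success. */
int init_then_destroy(void)
{
    pthread_cond_t cond;
    int rc = pthread_cond_init(&cond, NULL);
    if (rc != 0)
        return rc;
    return pthread_cond_destroy(&cond);
}

/* Case 2: the cond var is set up by the static initializer, so no
   function call happens before the destroy.
   Call this at most once: it destroys a static object. */
int static_init_then_destroy(void)
{
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    return pthread_cond_destroy(&cond);
}
```

Both functions are valid uses of the pthread API; the difference is only in what an intercepting tool gets to observe before the destroy call.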
|
From: John R. <jr...@bi...> - 2012-12-18 16:08:19
|
> cc -o /tmp/a.out marc.c
>
> (Note that there is no -lpthread parameter)
>
> and then run
>
> valgrind --tool=helgrind /tmp/a.out
>
> There is no warning or error at all. (under linux, that is).
>
> I wonder WHICH library is used for pthread_cond_init() and friends.

The utility /usr/bin/ldd, which is a bash shell script that is part of glibc, answers such questions:

$ ldd /tmp/a.out
        linux-vdso.so.1 => (0x00007fffb71ff000)
        libc.so.6 => /lib64/libc.so.6 (0x00000030f0400000)
        /lib64/ld-linux-x86-64.so.2 (0x00000030f0000000)

and that is all.

Using gdb:

$ gdb /tmp/a.out
(gdb) b main
(gdb) run
(gdb) info shared
From                To                  Syms Read   Shared Object Library
0x00000030f0000b20  0x00000030f001a2e9  Yes         /lib64/ld-linux-x86-64.so.2
0x00000030f041ef60  0x00000030f055f7a0  Yes         /lib64/libc.so.6
(gdb)

-- |
|
From: Philippe W. <phi...@sk...> - 2012-12-18 18:57:54
|
On Tue, 2012-12-18 at 21:00 +0900, ISHIKAWA,chiaki wrote:
> (2012/12/18 8:07), Philippe Waroquiers wrote:
> > Destruction of unknown cond var is probably/maybe bug
> > https://bugs.kde.org/show_bug.cgi?id=307082
> >
> I have produced a patch to take care of the issue.
> But before that, I have a question.
>
> Q1: Why does valgrind not complain if I compile & link
> Marc's code (in the bug entry which was given as a reminder that
> "unknown cond var" may be a bug or false positive.) in the following manner,
>
> cc -o /tmp/a.out marc.c

No idea. Maybe a problem of redirection caused by static linking?

> Q2: I have produced a work-in-progress patch to take care this issue.
> I wonder if the developers in the know can take a look and improve it.
>
> The patch is posted to the bug entry
> https://bugs.kde.org/show_bug.cgi?id=307082

I took a quick look at the patch; the approach looks ok to me. No time to look more in depth at this now, however :(.

> Not sure, though if it works with the initialized data as in
> pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
> pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

I assume there is no function call for the above, so there is no way for Valgrind to know that it is ok to destroy cond.

>
> [Note: No it can't be obviously. This is because the mapping of cond var
> to CVInfo structure can not be done explicitly using the timing of
> pthread_cond_init(). I tested it to confirm this observation by a slight
> modification of marc's code.]
>
> Q3:
> > I have in a corner a patch for helgrind which print symbolic information
> > for the lock addresses. Patch not finished yet.
> >
> > Would be worth filing a wish bug in bugzilla telling that helgrind
> > could use --read-var-info=yes to show more info about
> > cond var addresses, lock addresses, etc.
> >
> > Philippe
>
> You mean helgrind can't use the information obtained by
> --read-var-info=yes (!?). That is tough, indeed.
helgrind uses --read-var-info=yes to report details about the addresses involved in a race condition. It does not use it to describe locks, cond vars, etc. For this, it might be good to file a wish bug.

Philippe |
|
From: ISHIKAWA,chiaki <ish...@yk...> - 2012-12-20 02:43:27
|
(2012/12/19 3:57), Philippe Waroquiers wrote:
> On Tue, 2012-12-18 at 21:00 +0900, ISHIKAWA,chiaki wrote:
>> (2012/12/18 8:07), Philippe Waroquiers wrote:
>>> Destruction of unknown cond var is probably/maybe bug
>>> https://bugs.kde.org/show_bug.cgi?id=307082
>>>
>> I have produced a patch to take care of the issue.
>> But before that, I have a question.
>>
>> Q1: Why does valgrind not complain if I compile & link
>> Marc's code (in the bug entry which was given as a reminder that
>> "unknown cond var" may be a bug or false positive.) in the following manner,
>>
>> cc -o /tmp/a.out marc.c
> No idea. Maybe a problem of redirection caused by static linking ?
>
>
>> Q2: I have produced a work-in-progress patch to take care this issue.
>> I wonder if the developers in the know can take a look and improve it.
>>
>> The patch is posted to the bug entry
>> https://bugs.kde.org/show_bug.cgi?id=307082
> I took a quick look at the patch, approach looks ok to me.
> No time to look more in depth at this now however :(.
>

Thank you again.

With my patch, I tested the Mozilla Thunderbird mail client under helgrind, and found that most of the warning messages (destruction of unknown cond var) were bogus. Only a few warnings now come from external libraries, so in this sense Mozilla Thunderbird is OK. [And the patch does not seem to introduce serious bugs so far.]

However, I am still struggling to figure out whether I can learn which tasks are possibly waiting on a cond var being destroyed.
The message is something like this:

==4103== Thread #1: pthread_cond_destroy: destruction of condition variable being waited upon
==4103==    at 0x4027B9E: pthread_cond_destroy_WRK (hg_intercepts.c:940)
==4103==    by 0x4029A01: pthread_cond_destroy@* (hg_intercepts.c:958)
==4103==    by 0x47193BA: PR_DestroyCondVar (ptsynch.c:372)
==4103==    by 0x5947C40: nsHTTPListener::~nsHTTPListener() (CondVar.h:56)
==4103==    by 0x5947D82: nsHTTPListener::Release() (nsNSSCallbacks.cpp:536)
==4103==    by 0x603EFCA: nsCOMPtr_base::assign_with_AddRef(nsISupports*) (nsCOMPtr.h:442)

I wanted to print out the IDs of the tasks that are waiting on this cond variable.

Now my tentative conclusion is that it is impossible to know which tasks are waiting, even going outside helgrind. Here is my reasoning; I wonder if I am right or wrong. I am discussing the situation under Linux.

Let nWaiters be the number of tasks waiting.

1. The specification of pthread_cond_signal() does not say which task is unblocked, so all helgrind can do is decrement nWaiters by one. (pthread_cond_broadcast() releases all the tasks instead.) helgrind can't really know which task is being removed from the waiting list, and so decrementing nWaiters is all it does (I think).

2. My desire was just to print out the task ids still waiting. OK, let me go outside helgrind. Is it possible to do so by modifying libpthread? I thought I could tweak libpthread and print the task list, if it maintains a list of tasks that are waiting.

Under Debian GNU/Linux, which I use, the pthread library seems to come from libc. It is actually libc6, which is an alias of eglibc, a streamlined libc that can be used on embedded systems. So this is the source I looked at.

I looked inside the source code and found that, since the pthread semantics related to cond vars are such that the library only needs to release ONE unspecified task on pthread_cond_signal(), the library does not seem to contain an explicit list of waiting tasks.
The thread library relies on the "futex" kernel mechanism to take care of blocking and releasing tasks. futex is a kernel mechanism developed to support the 1-1 user/kernel task mapping, and the thread functions seem to use futex for synchronization by invoking this kernel API directly.

Basically, the pthread functions don't use a library-level task list and such, but rely exclusively on the futex mechanism inside the kernel to take care of task synchronization. So at user level, it is not possible to print the tasks waiting on a cond var when the cond var is being destroyed (or, for that matter, to know the task ids to begin with).

So my tentative conclusion is that it is impossible to know which task(s) are still blocked on a cond var when the cond var is being destroyed. Maybe at the kernel level we can know (not sure), but invoking extra kernel calls just to learn this internal data structure (if possible at all) may introduce extra thread context switches (such kernel calls being cancellation points, maybe) and disturb libpthread and helgrind operation... So I am inclined to avoid it and have decided to forget about it.

So I am stuck. I thought it was easy, but going down to kernel level seems too heavyweight an operation (AND it is not portable, and I am not sure it is possible to begin with).

Actually, pthread_cond_destroy() ought to return EBUSY when there is at least one task waiting, so a careful program can do something about such a situation. (But in the Mozilla Thunderbird case, it looks as if the error is printed when a class object is destroyed, and the whole memory area in which the cond var is located seems to be released along with the class object or something, and the error code is being ignored, I am afraid. So anything goes.)
Granted, most of the observed cases seem to be related to the shutdown of the Thunderbird mail client (when many objects are destroyed) and may not bring serious consequences. BUT shutdown is where many crashes are reported today, and I have a feeling that this destruction of a cond variable which still has some tasks waiting may contribute to a portion of the crashes.

So I wonder if people who have worked on helgrind agree that it is indeed very difficult to figure out, at user level, exactly which tasks are waiting when a cond var is being destroyed.

Also, does anyone have a clever idea about how to debug this situation?

TIA |
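One user-level way to approach the question above, independent of helgrind and of libpthread internals, is for the application itself to record who is waiting. The following is only a sketch under assumptions: `tracked_cond`, `tc_init`, `tc_wait` and `tc_destroy` are hypothetical names, every waiter must go through `tc_wait`, and the same mutex must protect every call. Printing a `pthread_t` as `unsigned long` works with glibc on Linux but is not portable.

```c
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

#define TC_MAX_WAITERS 64

/* Hypothetical wrapper: a cond var plus a record of the threads in tc_wait(). */
typedef struct {
    pthread_cond_t cv;
    pthread_t      waiters[TC_MAX_WAITERS];
    int            n_waiters;
} tracked_cond;

int tc_init(tracked_cond *tc)
{
    tc->n_waiters = 0;
    return pthread_cond_init(&tc->cv, NULL);
}

/* Caller must hold mx, exactly as for pthread_cond_wait(). The waiter
   record is only touched while mx is held, so mx also protects it. */
int tc_wait(tracked_cond *tc, pthread_mutex_t *mx)
{
    pthread_t self = pthread_self();
    int rc, i;

    if (tc->n_waiters < TC_MAX_WAITERS)
        tc->waiters[tc->n_waiters++] = self;

    rc = pthread_cond_wait(&tc->cv, mx);   /* mx is held again on return */

    /* Remove our own entry by swapping in the last one. */
    for (i = 0; i < tc->n_waiters; i++) {
        if (pthread_equal(tc->waiters[i], self)) {
            tc->waiters[i] = tc->waiters[--tc->n_waiters];
            break;
        }
    }
    return rc;
}

/* Caller must hold the mutex used with tc_wait(). If threads are still
   recorded as waiting, list them on stderr and refuse with EBUSY. */
int tc_destroy(tracked_cond *tc)
{
    int i;

    if (tc->n_waiters > 0) {
        for (i = 0; i < tc->n_waiters; i++)
            fprintf(stderr, "cond var %p: thread %lu still waiting\n",
                    (void *)&tc->cv, (unsigned long)tc->waiters[i]);
        return EBUSY;
    }
    return pthread_cond_destroy(&tc->cv);
}
```

The sketch does not handle thread cancellation inside `tc_wait`, which would leave a stale entry, and it silently stops recording beyond `TC_MAX_WAITERS` threads. It also only helps for code that can be changed to use the wrapper, which may not be practical for a code base the size of Thunderbird.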
|
From: ISHIKAWA,chiaki <ish...@yk...> - 2012-12-24 00:43:31
|
A new patch was posted to
https://bugs.kde.org/show_bug.cgi?id=307082

This cleans up the previous patch that I posted, and fixes the following problems:

- pthread_cond_init() was not handled properly, AND

- a newly discovered problem: helgrind erroneously destroyed its internal data for the cond var even when pthread_cond_destroy() returns EBUSY, in which case the pthread library retains its own internal data because there are still tasks waiting on the condition variable.

With this, I get saner output for some sample programs (modified by inserting pthread_cond_destroy() in a place or two) when they are run under helgrind.

The sample programs are from
http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/multi-thread.html
specifically,
http://www.cs.kent.edu/~ruttan/sysprog/lectures/multi-thread/thread-pool-server.c

Another sample I used is from
https://www.securecoding.cert.org/confluence/display/seccode/POS54-C.+Notify+all+POSIX+threads+waiting+on+a+condition+variable+instead+of+a+single+thread
but this one caused a hang when I tinkered with pthread_cond_destroy(), and I suspect I created an invalid program (rather than helgrind being buggy).

Anyway, I am now running Mozilla Thunderbird under helgrind again to track down a possible thread-related problem.

Again, a review of the patch by the valgrind developers will be appreciated.

(Gee, I hate my misspellings and incorrect grammar in my post to https://bugs.kde.org/show_bug.cgi?id=307082
I was not even drinking egg nog. A peril of posting something in the middle of the night when one ought to be sleeping.)

TIA

(2012/12/23 22:06), Philippe Waroquiers wrote:
> On Sun, 2012-12-23 at 03:36 +0900, ISHIKAWA,chiaki wrote:
>
>> Since the testing is automated (and shrouded in many layers of
>> test scripts), it is not easy to figure out how we can
>> attach gdb to vgdbserver, and also, it is not quite clear
>> how we can prepare the input for vgdbserver/gdb interaction.
>> Maybe if I were to start from the scratch, I could do it.
>> But mozmill test setup has been there already.
> I have no knowledge of mozmill but I suppose it is possible to run one
> single specific test of the test suite (up to now, I have never
> seen an automatic reg test setup which did not allow to run one
> single test).
>
> So, you should be able to do the same as what you would do for a
> "manual" test: just ensure the test tool passes --vgdb-error=0 (or
> rather than 0, the error nr you want to dig into).
> The app will then stop at the time of the error.
>
> It looks to me a lot easier/faster than having helgrind reporting
> the list of threads waiting (and if you would need other information
> such as stack trace, local var values, ... you can look at all this
> with GDB).
>
>
> Philippe
>
>
> |
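The second problem described in the message above (dropping the tool's record of a cond var even though pthread_cond_destroy() returned EBUSY) can be illustrated with a toy tracking table. This is not the patch posted to the bug entry and not helgrind code; `cv_track`, `cv_lookup` and `cv_on_destroy_done` are hypothetical names, and the fixed-size array merely stands in for helgrind's map from cond var address to CVInfo.

```c
#include <errno.h>
#include <stddef.h>

#define CV_MAX_TRACKED 32

/* Toy stand-in for a tool-side record of one cond var. */
typedef struct {
    const void *cond;    /* address of the tracked cond var */
    int         in_use;
} cv_entry;

static cv_entry cv_table[CV_MAX_TRACKED];

cv_entry *cv_lookup(const void *cond)
{
    int i;
    for (i = 0; i < CV_MAX_TRACKED; i++)
        if (cv_table[i].in_use && cv_table[i].cond == cond)
            return &cv_table[i];
    return NULL;
}

/* Would be called when pthread_cond_init() is observed. */
cv_entry *cv_track(const void *cond)
{
    int i;
    cv_entry *e = cv_lookup(cond);
    if (e != NULL)
        return e;
    for (i = 0; i < CV_MAX_TRACKED; i++) {
        if (!cv_table[i].in_use) {
            cv_table[i].cond = cond;
            cv_table[i].in_use = 1;
            return &cv_table[i];
        }
    }
    return NULL;  /* table full */
}

/* Would be called AFTER the real pthread_cond_destroy() returned `rc`.
   Returns -1 for an unknown cond var, 0 if the record was kept,
   1 if the record was dropped. */
int cv_on_destroy_done(const void *cond, int rc)
{
    cv_entry *e = cv_lookup(cond);
    if (e == NULL)
        return -1;
    if (rc != 0)
        return 0;   /* e.g. EBUSY: the library kept the cond var, so do we */
    e->in_use = 0;
    return 1;
}
```

The point of the sketch is the ordering: the decision to forget a cond var is taken from the return value of the real destroy call, so a failed destroy followed by a later successful one is not reported as "unknown cond var".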
|
From: Philippe W. <phi...@sk...> - 2012-12-22 17:58:51
|
> helgrind can't really know which task is being removed from the waiting list and
> so decrementing nWaiters is all it does (I think).
>

I think it does a lot more (otherwise helgrind could not follow at all what happens with cond variables). See e.g. pthread_cond_wait_WRK.

>
> Also, does anyone have a clever idea about how to debug this situation?

As mentioned previously, if you use vgdb, it should be trivial to find which thread is doing what. E.g. do this in a GDB attached to the Valgrind embedded gdbserver:

    thread apply all bt

The stack traces will allow you to determine which threads are waiting on a cond var. You can then examine which cond var is being waited upon.

Philippe |
|
From: ISHIKAWA,chiaki <ish...@yk...> - 2012-12-22 18:36:55
|
Thank you for your response.

(2012/12/23 2:58), Philippe Waroquiers wrote:
>
>> helgrind can't really know which task is being removed from the waiting list and
>> so decrementing nWaiters is all it does (I think).
>>
> I think it does a lot more (otherwise helgrind could not follow at all what would
> happen with cond variables).
> See e.g. pthread_cond_wait_WRK

I will study it more.

>> Also, does anyone have a clever idea about how to debug this situation?
> As mentioned previously, if you use vgdb, it should be trivial to find which thread is doing what.
> E.g. do in a GDB attached to the Valgrind embedded gdbserver:
> thread apply all bt
>
> The stack traces will allow to determine which thread is waiting on a cond var.
> You can then examine which cond var is being waited upon.
>

Sorry, I was not making myself clear.

I was wondering about finding out, in an AUTOMATED TEST SETUP, which tasks are waiting on a given cond variable that is about to be destroyed and for which pthread_cond_destroy() would return EBUSY (per POSIX).

If I am to debug a single program that produces the problem in response to my keyboard/mouse input, your suggestion of using vgdbserver/gdb works very well. (Yes, I have figured out how to run gdb from the start: --vgdb-error=0 did the trick.)

When I say AUTOMATED TEST SETUP, I am talking about the Mozilla Thunderbird testing harness invoked by "make mozmill". It is a rather complicated setup. It runs a test sequence (mimicking user input through a description of user actions), and in so doing, it shows that Thunderbird invokes pthread_cond_destroy() for a few cond vars which helgrind thinks still have threads waiting on them.

Since the testing is automated (and shrouded in many layers of test scripts), it is not easy to figure out how we can attach gdb to vgdbserver, and it is also not quite clear how we can prepare the input for the vgdbserver/gdb interaction. Maybe if I were to start from scratch, I could do it.
But the mozmill test setup has been there already. Thus a non-interactive, helgrind-initiated output is preferable in this case.

> Philippe

Thank you again.

Chiaki Ishikawa |