|
From: Stefan S. <en...@ho...> - 2011-10-06 08:02:05
|
Hi,

I have a mutex deadlock in my code that I can trigger fairly reliably when running the test case under valgrind with memcheck. Of course this is not an error for memcheck, so it simply hangs. When I run valgrind with --db-attach I can press Ctrl-C and print the backtrace. The trouble is that I already know where it hangs. I need the backtraces of the other threads, the ones currently holding the mutex.

(gdb) info threads
* 1 process 22412  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
(gdb) bt
Thread 1 (process 22412):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x0000000009b955d9 in _L_lock_953 () from /lib/libpthread.so.0
#2  0x0000000009b953fb in __pthread_mutex_lock (mutex=0xffd26d8) at pthread_mutex_lock.c:61
#3  0x00000000099166ac in IA__g_static_rec_mutex_lock (mutex=0xffd26d0) at /tmp/glib2.0.0xzuTt/glib2.0-2.24.1/glib/gthread.c:1420

Where are the other threads?

Stefan
|
|
From: WAROQUIERS P. <phi...@eu...> - 2011-10-06 08:18:53
|
> ...
> valgrind with --db-attach I can Ctrl-C and print the backtrace. the
> ...
> where are the other threads?
> ...

--db-attach has several limitations (among others, it shows only one thread, and you cannot set breakpoints, continue, etc.).

You can try the 3.7.0 SVN version: it has an integrated gdbserver that lets you fully debug your executable under valgrind (including looking at all threads).

=> You must download and compile the latest version from SVN; see http://www.valgrind.org/downloads/repository.html

Then give the option --vgdb-error=0 and follow the on-screen instructions (if you build the HTML docs, the Valgrind gdbserver functionality is documented in its own section).

Note that to investigate deadlocks and/or race conditions, you might try helgrind or drd (among other things, helgrind verifies the lock acquisition order).

Philippe

____

This message and any files transmitted with it are legally privileged and intended for the sole use of the individual(s) or entity to whom they are addressed. If you are not the intended recipient, please notify the sender by reply and delete the message and any attachments from your system. Any unauthorised use or disclosure of the content of this message is strictly prohibited and may be unlawful.

Nothing in this e-mail message amounts to a contractual or legal commitment on the part of EUROCONTROL, unless it is confirmed by appropriately signed hard copy.

Any views expressed in this message are those of the sender.
|
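To try the gdbserver workflow on something small first, a stand-in program can help. The sketch below is not the code from this thread: lock_a, lock_b, worker and count_blocked_threads are made-up names. Two threads take two mutexes in opposite order, and barriers make sure each thread holds its first mutex before it tries the second. It uses pthread_mutex_trylock so that it reports the stuck state instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two mutexes involved in a deadlock. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;
static pthread_barrier_t both_tried;

typedef struct {
    pthread_mutex_t *first;   /* taken unconditionally */
    pthread_mutex_t *second;  /* only tried, never waited for */
    int second_busy;          /* 1 if the second mutex was held elsewhere */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *w = p;

    pthread_mutex_lock(w->first);
    /* Wait until the other thread holds its first mutex as well. */
    pthread_barrier_wait(&both_hold_first);

    /* A blocking pthread_mutex_lock() here would wait forever. */
    int rc = pthread_mutex_trylock(w->second);
    w->second_busy = (rc == EBUSY);

    /* Keep the first mutex until both threads have made their attempt. */
    pthread_barrier_wait(&both_tried);
    if (rc == 0)
        pthread_mutex_unlock(w->second);
    pthread_mutex_unlock(w->first);
    return NULL;
}

/* Runs both workers; returns how many would have blocked on lock(). */
int count_blocked_threads(void)
{
    worker_arg a = { &lock_a, &lock_b, 0 };
    worker_arg b = { &lock_b, &lock_a, 0 };
    pthread_t ta, tb;

    pthread_barrier_init(&both_hold_first, NULL, 2);
    pthread_barrier_init(&both_tried, NULL, 2);
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried);

    return a.second_busy + b.second_busy;
}
```

Swapping pthread_mutex_trylock for pthread_mutex_lock should turn this into a real hang. Started under valgrind with --vgdb-error=0, gdb can attach with "target remote | vgdb", after which "info threads" and "thread apply all bt" list every thread rather than only the waiting one.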
|
From: Stefan S. <en...@ho...> - 2011-10-06 09:11:24
|
Hi,

thanks for your quick reply.

On 10/06/2011 10:18 AM, WAROQUIERS Philippe wrote:
>> ...
>> valgrind with --db-attach I can Ctrl-C and print the backtrace. the
>> ...
>> where are the other threads?
>> ...
> The --db-attach has several limitations (a.o. it only shows one thread,
> you can't put breaks, continue, etc).
>
> You can try the 3.7.0 SVN version : this has an integrated gdbserver
> allowing to fully debug your executable under valgrind (including
> looking at all threads etc).
>
> => you must download and compile the last version from SVN.
> see http://www.valgrind.org/downloads/repository.html
>
> Then you give the option --vgdb-error=0, and you follow the on-screen
> instructions (if you build the html doc, the Valgrind gdbserver
> functionalities are documented in a specific section).
>
> Note that to investigate deadlocks and/or race conditions,
> you might try helgrind or drd. (a.o., helgrind has a lock acquisition
> order verification).

Unfortunately the reports from helgrind don't look useful to me. I don't know whether something in glib confuses helgrind (by the way, I am using valgrind 3.6.0).

For example, look at this report:

==22748== Thread #1: lock order "0xF941080 before 0xF934DC0" violated
==22748==    at 0x4C28B0C: pthread_mutex_lock (hg_intercepts.c:465)
==22748==    by 0x84369DD: gst_pad_unlink (gstpad.c:1758)
==22748==    by 0x4E60F60: unlink_wire (setup.c:564)
==22748==    by 0x4E650FC: del_wire_in_pipeline (setup.c:811)
==22748==    by 0x98E0D72: g_hash_table_foreach (ghash.c:1325)
==22748==    by 0x4E62196: bt_setup_update_pipeline (setup.c:886)
==22748==    by 0x4E6296F: bt_setup_remove_wire (setup.c:1098)
==22748==    by 0x40F1D8: test_btcore_net_static4 (e-network.c:374)
==22748==    by 0x43B285: srunner_run_all (in /home/ensonic/projects/buzztard/buzztard/tests/.libs/lt-bt_core)
==22748==    by 0x409CC6: main (m-bt-core.c:102)
==22748==   Required order was established by acquisition of lock at 0xF941080
==22748==    at 0x4C28B0C: pthread_mutex_lock (hg_intercepts.c:465)
==22748==    by 0x842E05D: gst_pad_link_prepare (gstpad.c:1975)
==22748==    by 0x84330BC: gst_pad_link_full (gstpad.c:2113)
==22748==    by 0x4E64F12: add_wire_in_pipeline (setup.c:507)
==22748==    by 0x98E0D72: g_hash_table_foreach (ghash.c:1325)
==22748==    by 0x4E61B69: bt_setup_update_pipeline (setup.c:878)
==22748==    by 0x4E6349F: bt_setup_add_wire (setup.c:1019)
==22748==    by 0x4E7B8D5: bt_wire_constructed (wire.c:1219)
==22748==    by 0x8F1FA57: g_object_newv (gobject.c:1289)
==22748==    by 0x8F202AC: g_object_new_valist (gobject.c:1377)
==22748==    by 0x8F204F0: g_object_new (gobject.c:1095)
==22748==    by 0x4E79943: bt_wire_new (wire.c:723)
==22748==    by 0x40EFB2: test_btcore_net_static4 (e-network.c:350)
==22748==    by 0x43B285: srunner_run_all (in /home/ensonic/projects/buzztard/buzztard/tests/.libs/lt-bt_core)
==22748==    by 0x409CC6: main (m-bt-core.c:102)
==22748==   followed by a later acquisition of lock at 0xF934DC0
==22748==    at 0x4C28B0C: pthread_mutex_lock (hg_intercepts.c:465)
==22748==    by 0x842DB66: gst_pad_get_caps_unlocked (gstpad.c:2245)
==22748==    by 0x842E3D7: gst_pad_link_prepare (gstpad.c:1852)
==22748==    by 0x84330BC: gst_pad_link_full (gstpad.c:2113)
==22748==    by 0x4E64F12: add_wire_in_pipeline (setup.c:507)
==22748==    by 0x98E0D72: g_hash_table_foreach (ghash.c:1325)
==22748==    by 0x4E61B69: bt_setup_update_pipeline (setup.c:878)
==22748==    by 0x4E6349F: bt_setup_add_wire (setup.c:1019)
==22748==    by 0x4E7B8D5: bt_wire_constructed (wire.c:1219)
==22748==    by 0x8F1FA57: g_object_newv (gobject.c:1289)
==22748==    by 0x8F202AC: g_object_new_valist (gobject.c:1377)
==22748==    by 0x8F204F0: g_object_new (gobject.c:1095)
==22748==    by 0x4E79943: bt_wire_new (wire.c:723)
==22748==    by 0x40EFB2: test_btcore_net_static4 (e-network.c:350)
==22748==    by 0x43B285: srunner_run_all (in /home/ensonic/projects/buzztard/buzztard/tests/.libs/lt-bt_core)
==22748==    by 0x409CC6: main (m-bt-core.c:102)

gstpad.c:
1756   GST_OBJECT_LOCK (srcpad);
1757
1758   GST_OBJECT_LOCK (sinkpad);
...
1970   GST_OBJECT_LOCK (srcpad);
1971
1972   if (G_UNLIKELY (GST_PAD_PEER (srcpad) != NULL))
1973     goto src_was_linked;
1974
1975   GST_OBJECT_LOCK (sinkpad);
1976
1977   if (G_UNLIKELY (GST_PAD_PEER (sinkpad) != NULL))
1978     goto sink_was_linked;
...
2243   GST_OBJECT_UNLOCK (pad);
2244   result = GST_PAD_GETCAPSFUNC (pad) (pad);
2245   GST_OBJECT_LOCK (pad);

The first two snippets use the same lock order, and the third snippet temporarily releases a lock.

At the point where the code locks up, I get no complaint from helgrind. I will try my luck with building 3.7 now.

Stefan
|
|
From: WAROQUIERS P. <phi...@eu...> - 2011-10-06 09:29:03
|
>...
>2243 GST_OBJECT_UNLOCK (pad);
>2244 result = GST_PAD_GETCAPSFUNC (pad) (pad);
>2245 GST_OBJECT_LOCK (pad);
>
>The first two snippets use the same lock order and the third snipped
>temporarily releases a lock.

Since line 2243 unlocks pad, there must be another place where pad is locked. That other place might take part in the cycle of lock acquisitions.

>
>At the point where the code locks up, I get no complaint from helgrind.
>Will try my luck with building 3.7 now.

With 3.7.0 SVN, you can use the debugger (see --vgdb-error=0) to understand the deadlock under memcheck. You might also try helgrind again (Julian made several improvements to helgrind, in particular to the error messages).

Philippe
|
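One way to see how an unlock followed by a relock can take part in such a cycle is to replay the quoted lock sequences through a toy order checker. The sketch below is a hypothetical checker, not helgrind's actual algorithm, and all of its names (order_graph, og_lock, replay_link, replay_unlink) are made up. It also assumes that the pad relocked at line 2245 is the source pad locked at line 1970, and that the sink pad lock from line 1975 is still held at that point; the thread does not confirm either.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 4

/* Hypothetical checker state, not helgrind's data structures. */
typedef struct {
    bool before[MAX_LOCKS][MAX_LOCKS]; /* before[a][b]: a held when b taken */
    bool held[MAX_LOCKS];
} order_graph;

void og_init(order_graph *g) { memset(g, 0, sizeof *g); }

/* Depth-first search for a recorded chain "from before ... before to". */
bool og_reaches(const order_graph *g, int from, int to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < MAX_LOCKS; next++)
        if (g->before[from][next] && !seen[next] &&
            og_reaches(g, next, to, seen))
            return true;
    return false;
}

/* Take `lock`; return how many held locks it should have preceded. */
int og_lock(order_graph *g, int lock)
{
    int violations = 0;
    for (int h = 0; h < MAX_LOCKS; h++) {
        if (!g->held[h] || h == lock)
            continue;
        bool seen[MAX_LOCKS] = { false };
        if (og_reaches(g, lock, h, seen))
            violations++;           /* recorded order says lock before h */
        g->before[h][lock] = true;  /* remember the order just observed */
    }
    g->held[lock] = true;
    return violations;
}

void og_unlock(order_graph *g, int lock) { g->held[lock] = false; }

enum { SRCPAD, SINKPAD };

/* Second and third gstpad.c snippets, under the assumed pad mapping. */
int replay_link(order_graph *g)
{
    int v = 0;
    v += og_lock(g, SRCPAD);   /* line 1970 */
    v += og_lock(g, SINKPAD);  /* line 1975: records SRCPAD before SINKPAD */
    og_unlock(g, SRCPAD);      /* line 2243 */
    v += og_lock(g, SRCPAD);   /* line 2245: SINKPAD held, order inverts */
    og_unlock(g, SINKPAD);
    og_unlock(g, SRCPAD);
    return v;
}

/* First gstpad.c snippet. */
int replay_unlink(order_graph *g)
{
    int v = 0;
    v += og_lock(g, SRCPAD);   /* line 1756 */
    v += og_lock(g, SINKPAD);  /* line 1758 */
    og_unlock(g, SINKPAD);
    og_unlock(g, SRCPAD);
    return v;
}
```

On a fresh graph, replay_unlink alone reports nothing. After replay_link, the relock has recorded "SINKPAD before SRCPAD", so the same unlink sequence is flagged once. Under the stated assumptions this mirrors the shape of the quoted report, where the order is established at gstpad.c:1975 and gstpad.c:2245 and violated at gstpad.c:1758.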
|
From: Stefan S. <en...@ho...> - 2011-10-07 12:40:38
|
On 10/06/2011 11:28 AM, WAROQUIERS Philippe wrote:
>> ...
>> 2243 GST_OBJECT_UNLOCK (pad);
>> 2244 result = GST_PAD_GETCAPSFUNC (pad) (pad);
>> 2245 GST_OBJECT_LOCK (pad);
>>
>> The first two snippets use the same lock order and the third snipped
>> temporarily releases a lock.
> As line 2243 unlocks pad, there must be another place where pad is
> locked.
> This other place might intervene in the cycle of lock acquisition.
>

I don't think so. The code is not related.

>> At the point where the code locks up, I get no complaint from helgrind.
>> Will try my luck with building 3.7 now.
> With 3.7.0 SVN, you can try the debugger (cfr --vgdb-error=0) to
> understand the deadlock under memcheck.

Yes, the vgdb server worked great, and I now know the two places involved in the deadlock. Thanks for your work on the tool!

> You might also try again with helgrind (Julian did several improvements
> in helgrind, in particular improving the error messages).

I am still not convinced. I get lots of warnings from helgrind. Some of them make sense, although I found many that cause no harm in practice.

Stefan
|
|
From: Stefan S. <en...@ho...> - 2011-10-07 14:23:44
|
On 10/07/2011 02:40 PM, Stefan Sauer wrote:
> On 10/06/2011 11:28 AM, WAROQUIERS Philippe wrote:
>>> ...
>>> 2243 GST_OBJECT_UNLOCK (pad);
>>> 2244 result = GST_PAD_GETCAPSFUNC (pad) (pad);
>>> 2245 GST_OBJECT_LOCK (pad);
>>>
>>> The first two snippets use the same lock order and the third snipped
>>> temporarily releases a lock.
>> As line 2243 unlocks pad, there must be another place where pad is
>> locked.
>> This other place might intervene in the cycle of lock acquisition.
>>
> I don't think so. The code is not related.

I uploaded a log:
http://hora-obscura.de/~ensonic/helgrind.log

The GStreamer version is from git HEAD; most complaints are against e.g.
http://cgit.freedesktop.org/gstreamer/gstreamer/tree/gst/gstpad.c

I read valgrind/docs/html/hg-manual.html#hg-manual.lock-orders again. I believe I understand the problem with differing lock orders, but in the cases in the log file I can't see a problem. Also, the repeated reports look a lot like duplicates.

Stefan

> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
|