|
From: Stefan K. <en...@ho...> - 2010-07-08 08:30:29
|
hi,
is anyone aware of a valgrind tool that can help me debug a deadlock
in a highly threaded program? The program can easily create hundreds of
threads.
What I am looking for is a tool that tracks, for each thread, which
mutexes are locked (incl. the stackframe of the lock) and whether it is
waiting on a mutex (also including the stackframe). When the app
deadlocks, the collected data can be represented as a directed graph
("thread -> mutex" for a held lock and "mutex -> thread" for a pending
lock) and one could run Tarjan's strongly connected components algorithm
[1][2] to detect cycles. For each cycle found it could print the
involved threads with their backtraces.
Stefan
[1]
http://en.wikipedia.org/wiki/Tarjan%E2%80%99s_strongly_connected_components_algorithm
[2] http://www.logarithmic.net/pfh/blog/01208083168
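
The graph idea described above can be sketched roughly as follows. This is only an illustration, not an existing valgrind tool: the function names (tarjan_scc, deadlock_cycles) and the input format (lists of (thread, mutex) pairs) are assumptions made for the example, and backtraces are left out. The edge directions follow the message: "thread -> mutex" for a held lock, "mutex -> thread" for a pending lock.

```python
from collections import defaultdict


def tarjan_scc(graph):
    """Return the strongly connected components of a directed graph.

    graph maps a node to a list of successor nodes. Recursive variant;
    very long wait chains could hit Python's recursion limit.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop it off the stack
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return components


def deadlock_cycles(held, pending):
    """held / pending: iterables of (thread, mutex) pairs (assumed format)."""
    graph = defaultdict(list)
    for thread, mutex in held:
        graph[("thread", thread)].append(("mutex", mutex))
    for thread, mutex in pending:
        graph[("mutex", mutex)].append(("thread", thread))
    # The graph is bipartite, so it has no self-loops; any component
    # with more than one node therefore contains a cycle.
    return [sorted(c) for c in tarjan_scc(graph) if len(c) > 1]
```

With T1 holding M1 and waiting on M2, and T2 holding M2 and waiting on M1, the only component reported contains those four nodes; a thread that merely holds a lock does not show up.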
|
|
From: Konstantin S. <kon...@gm...> - 2010-07-08 08:34:57
|
--tool=helgrind
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Sprint
> What will you do first with EVO, the first 4G phone?
> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
> _______________________________________________
> Valgrind-users mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-users
|
|
From: Stefan K. <en...@ho...> - 2010-07-08 09:09:31
|
On 08.07.2010 11:34, Konstantin Serebryany wrote:
> --tool=helgrind
>
Nope. helgrind does not complain. Does it run cycle checks on-the-fly?
Or how would it detect that the app deadlocked? I was thinking of
writing an LD_PRELOAD based toy; there I would ctrl-c the app and then
run the cycle checks and dump the results. I have found no evidence in
the docs that I can signal helgrind to tell it that the app has now deadlocked.
Stefan
|
|
From: Konstantin S. <kon...@gm...> - 2010-07-08 09:30:42
|
On Thu, Jul 8, 2010 at 1:09 PM, Stefan Kost <en...@ho...> wrote:
> On 08.07.2010 11:34, Konstantin Serebryany wrote:
>> --tool=helgrind
>
> Nope. helgrind does not complain. Does it run cycle checks on-the-fly?

Yes, http://valgrind.org/docs/manual/hg-manual.html#hg-manual.lock-orders

> Or how would it detect that the app deadlocked.

helgrind finds cycles in lock ordering; a deadlock does not have to
actually happen during the execution.

Does your program use pthread_mutex_ or something else?
Is the program dynamically linked?

--kcc
|
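
To illustrate what "cycles in lock ordering" means, here is a minimal sketch of the general technique. It is not helgrind's implementation; the class LockOrderChecker and its methods are hypothetical names, and it assumes non-recursive locks. It records an edge "A -> B" whenever B is acquired while A is held, and flags an acquisition that would close a cycle, even if no deadlock happens in that run.

```python
from collections import defaultdict


class LockOrderChecker:
    """Toy lock-order checker (hypothetical; assumes non-recursive locks)."""

    def __init__(self):
        self.order = defaultdict(set)  # lock -> locks acquired while it was held
        self.held = defaultdict(list)  # thread -> locks currently held

    def _reachable(self, start, goal):
        # Depth-first search over the recorded ordering edges.
        seen, todo = set(), [start]
        while todo:
            node = todo.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            todo.extend(self.order.get(node, ()))
        return False

    def acquire(self, thread, lock):
        """Record an acquisition; return the (earlier, lock) pairs that
        contradict an ordering seen before."""
        violations = []
        for earlier in self.held[thread]:
            # The new edge earlier -> lock closes a cycle if lock
            # already leads back to earlier.
            if self._reachable(lock, earlier):
                violations.append((earlier, lock))
            self.order[earlier].add(lock)
        self.held[thread].append(lock)
        return violations

    def release(self, thread, lock):
        self.held[thread].remove(lock)
```

If thread T1 takes A then B and releases both, and later T2 takes B then A, the second acquisition by T2 is reported although the two threads never blocked each other.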
|
From: Stefan K. <en...@ho...> - 2010-07-08 10:18:17
|
On 08.07.2010 12:30, Konstantin Serebryany wrote:
> On Thu, Jul 8, 2010 at 1:09 PM, Stefan Kost <en...@ho...> wrote:
>> Nope. helgrind does not complain. Does it run cycle checks on-the-fly?
>
> Yes, http://valgrind.org/docs/manual/hg-manual.html#hg-manual.lock-orders

hm, then it should detect the problem indeed.

>> Or how would it detect that the app deadlocked.
>
> helgrind finds cycles in lock ordering, deadlock does not have to
> actually happen during the execution.
>
> Does your program use pthread_mutex_ or something else?
> Is the program dynamically linked?

The application is a benchmark for gstreamer, using glib's gthread
(which uses pthread on linux). The program is dynamically linked. If I
ctrl-c the app under gdb and dump all stackframes, I have a lot of
stackframes like the two below:

#0 0x0012d422 in __kernel_vsyscall ()
#1 0x00325af9 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142
#2 0x00328e1c in _L_cond_lock_826 () from /lib/tls/i686/cmov/libpthread.so.0
#3 0x00328c40 in __pthread_mutex_cond_lock (mutex=0x824e6b0) at ../nptl/pthread_mutex_lock.c:61
#4 0x003230b3 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:203
...

and

#0 0x0012d422 in __kernel_vsyscall ()
#1 0x00323015 in pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/pthread_cond_wait.S:122
...

Stefan
|
|
From: Konstantin S. <kon...@gm...> - 2010-07-08 10:24:14
|
The stack trace from gdb suggests that your program is blocked on
pthread_cond_wait, which does not necessarily mean there is a mutex
deadlock.
You might be waiting for some condition which never becomes true.

--kcc
|
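
The difference between a mutex deadlock and a wait on a condition that never becomes true can be shown with a small sketch. It uses Python's threading.Condition rather than pthreads, and the function name run and the "ready" flag are made up for the example. When nobody ever sets the flag, the waiter blocks inside the wait (here bounded by a timeout) although no lock-order cycle exists, so a lock-order checker would have nothing to report.

```python
import threading


def run(signal_ready, timeout=0.2):
    """Wait for a flag; return True if it was set, False if the wait timed out."""
    cond = threading.Condition()
    state = {"ready": False}

    def producer():
        with cond:
            state["ready"] = True
            cond.notify_all()

    worker = None
    if signal_ready:
        worker = threading.Thread(target=producer)
        worker.start()

    with cond:
        # wait_for re-checks the predicate after every wakeup and gives up
        # after the timeout; without a timeout this would block forever
        # when no thread ever sets the flag.
        ok = cond.wait_for(lambda: state["ready"], timeout=timeout)

    if worker is not None:
        worker.join()
    return ok
```

Calling run(True) returns True, while run(False) returns False once the timeout expires; the latter is the situation the gdb backtraces point at.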
|
From: Stefan K. <en...@ho...> - 2010-07-08 11:36:38
|
On 08.07.2010 13:23, Konstantin Serebryany wrote:
> The stack trace from gdb suggests that your program is blocked on
> pthread_cond_wait, which does not necessarily mean there is a mutex
> deadlock.
> You might be waiting for some condition which never becomes true.

Thanks for the help. And this was actually the case. With some
combinations of sort | uniq I found the cause.

Stefan
|
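
The message does not say what exactly was piped through sort | uniq. One plausible reading is grouping identical backtraces so that the odd thread stands out among hundreds of similar ones. A rough sketch of that idea follows; the function name stack_histogram is hypothetical, and the input is assumed to look like the output of gdb's "thread apply all bt", with "Thread N (...)" header lines followed by "#n" frames.

```python
import re
from collections import Counter

# Matches "#1  0x00323015 in pthread_cond_wait@@GLIBC_2.3.2 () ..." and
# captures the function name; the address part is optional.
FRAME = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")


def stack_histogram(gdb_output):
    """Group threads by their sequence of function names, most common first."""
    stacks = []
    current = None
    for line in gdb_output.splitlines():
        line = line.strip()
        if line.startswith("Thread "):
            # A new thread header closes the previous backtrace.
            if current:
                stacks.append(tuple(current))
            current = []
        elif current is not None:
            match = FRAME.match(line)
            if match:
                current.append(match.group(1))
    if current:
        stacks.append(tuple(current))
    return Counter(stacks).most_common()
```

Addresses and source locations are dropped on purpose so that threads blocked at the same place compare equal; wrapped continuation lines that do not start with "#n" are ignored.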