Marcin Slusarz <marcin.slusarz@gmail.com> wrote on 01/15/2011 03:13:15 PM:

> From: Marcin Slusarz <marcin.slusarz@gmail.com>

> To: LKML <linux-kernel@vger.kernel.org>, oprofile-list@lists.sf.net
> Date: 01/16/2011 12:22 AM
> Subject: oprofile: possible circular locking dependency detected
>
> Hi
>
> Lockdep finds possible circular locking dependency during
> opcontrol --start on 2.6.37 kernel:
>
> [   22.457859] =======================================================
> [   22.458289] [ INFO: possible circular locking dependency detected ]
> [   22.458508] 2.6.37+ #539
> [   22.458720] -------------------------------------------------------
> [   22.458939] oprofiled/2365 is trying to acquire lock:
> [   22.459156]  (buffer_mutex){+.+...}, at:
> sync_buffer+0x2f/0x3e5
> [   22.459658]
> [   22.459659] but task is already holding lock:
> [   22.460083]  ((task_exit_notifier).rwsem){++++..}, at:
> __blocking_notifier_call_chain+0x5c/0x8e
> [   22.460590]
> [   22.460590] which lock already depends on the new lock.
> [   22.460591]
> [   22.461224]
> [   22.461225] the existing dependency chain (in reverse order) is:
> [   22.461652]
> [   22.461652] -> #1 ((task_exit_notifier).rwsem){++++..}:
> [   22.462183]         lock_acquire+0x144/0x189
> [   22.462438]         down_write+0x3f/0x62
> [   22.462695]        
> blocking_notifier_chain_register+0x4d/0x85
> [   22.463124]         profile_event_register+0x28/0x2a
> [   22.463380]         sync_start+0x42/0xb8
> [   22.463633]         oprofile_setup+0x5b/0xa1
> [   22.463888]         event_buffer_open+0x54/0x87
> [   22.464142]         __dentry_open+0x21e/0x354
> [   22.464398]         nameidata_to_filp+0x50/0x57
> [   22.464652]         do_last+0x480/0x5c7
> [   22.464907]         do_filp_open+0x1d3/0x5ee
> [   22.465161]         do_sys_open+0x5b/0xf8
> [   22.465415]         sys_open+0x1b/0x1d
> [   22.465668]         system_call_fastpath+0x16/0x1b
> [   22.465924]
> [   22.465925] -> #0 (buffer_mutex){+.+...}:
> [   22.466453]         __lock_acquire+0xf85/0x1602
> [   22.466708]         lock_acquire+0x144/0x189
> [   22.466961]         mutex_lock_nested+0x5e/0x304
> [   22.467216]         sync_buffer+0x2f/0x3e5
> [   22.467470]         task_exit_notify+0x11/0x15
> [   22.467725]         notifier_call_chain+0x74/0xa1
> [   22.467980]        
> __blocking_notifier_call_chain+0x71/0x8e
> [   22.468409]        
> blocking_notifier_call_chain+0xf/0x11
> [   22.468665]         profile_task_exit+0x15/0x17
> [   22.468920]         do_exit+0x25/0x7a7
> [   22.469175]         do_group_exit+0x7e/0xa9
> [   22.469429]         sys_exit_group+0x12/0x16
> [   22.469683]         system_call_fastpath+0x16/0x1b
> [   22.469938]
> [   22.469939] other info that might help us debug this:
> [   22.469939]
> [   22.470573] 1 lock held by oprofiled/2365:
> [   22.470788]  #0:  ((task_exit_notifier).rwsem){++++..}, at:
> __blocking_notifier_call_chain+0x5c/0x8e
> [   22.471329]
> [   22.471329] stack backtrace:
> [   22.471752] Pid: 2365, comm: oprofiled Not tainted 2.6.37+ #539
> [   22.471970] Call Trace:
> [   22.472183]   print_circular_bug+0xae/0xbd
> [   22.472403]   __lock_acquire+0xf85/0x1602
> [   22.472624]   ? sub_preempt_count+0x9e/0xb1
> [   22.472845]   ? sync_buffer+0x2f/0x3e5
> [   22.473063]   lock_acquire+0x144/0x189
> [   22.473282]   ? sync_buffer+0x2f/0x3e5
> [   22.473500]   ? check_usage_backwards+0x0/0xc9
> [   22.473721]   ? sync_buffer+0x2f/0x3e5
> [   22.473940]   ? sync_buffer+0x2f/0x3e5
> [   22.474158]   mutex_lock_nested+0x5e/0x304
> [   22.474378]   ? sync_buffer+0x2f/0x3e5
> [   22.474597]   sync_buffer+0x2f/0x3e5
> [   22.474815]   ?
> __blocking_notifier_call_chain+0x5c/0x8e
> [   22.475037]   ?
> __blocking_notifier_call_chain+0x5c/0x8e
> [   22.475259]   task_exit_notify+0x11/0x15
> [   22.475478]   notifier_call_chain+0x74/0xa1
> [   22.475698]   __blocking_notifier_call_chain+0x71/0x8e
> [   22.475920]   ? do_fork+0x2e9/0x3cf
> [   22.476138]   blocking_notifier_call_chain+0xf/0x11
> [   22.476359]   profile_task_exit+0x15/0x17
> [   22.476578]   do_exit+0x25/0x7a7
> [   22.476797]   ? retint_swapgs+0xe/0x13
> [   22.477017]   do_group_exit+0x7e/0xa9
> [   22.477236]   sys_exit_group+0x12/0x16
> [   22.477454]   system_call_fastpath+0x16/0x1b

We ran into this last week or so when we were trying to find an issue with perf and OProfile.
It comes about because you have the config options LOCKDEP_SUPPORT (I am fairly sure that is
the config option) enabled.  We spent some time looking into it and decided that since
sync_start() must complete before the sync_buffer() routine can be called that you couldn't
get a deadlock between these routines.  However, the same lock dependencies exist in sync_stop()
and sync_buffer().  In this case, the sync_buffer() routine has been enabled and could be
running when sync_exit() stops.  At least, we couldn't see any reason why that would not be
the case.  We never got to proving that it actually ever happens.

From searching through the changes, it appears that last fall, I believe it was the Sept 2010
time frame Robert Richter added the mutex to sync_start() and sync_stop() as part of
fixing another issue.  If I recall correctly, the issue was trying to process samples for a
process after the task struct for the process was gone.  The mutexes were added as well as
moving some code around to correct the issue.  

We looked at the code with the mutexes and don't think the are needed.  I was planning on posting
a message to the list asking about this change but hadn't gotten to it when this message came out.  
I guess what needs to be done is to evaluate if we really need the mutex in the sync_start()
and sync_stop() functions.

              Carl Love

>
> ------------------------------------------------------------------------------
> Protect Your Site and Customers from Malware Attacks
> Learn about various malware tactics and how to avoid them. Understand
> malware threats, the impact they can have on your business, and how you
> can protect your company and customers by using code signing.
> http://p.sf.net/sfu/oracle-sfdevnl
> _______________________________________________
> oprofile-list mailing list
> oprofile-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oprofile-list