From: Chandra S. <sek...@us...> - 2005-11-23 23:38:04
|
Andrew, I posted this set of patch to lkml last Friday as an RFC. Can you consider these for -mm inclusion. These patches apply on 2.6.15-rc2. Thanks, chandra Here are the details: In 2.6.14, notifier chains are unsafe. notifier_call_chain() walks through the list of a call chain without any protection. Alan Stern <st...@ro...> brought the issue and suggested a fix in lkml on Oct 24 2005: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 There was a lot of discussion on that thread regarding the issue, and following were the conclusions about the requirements of the notifier call mechanism: - The chain list has to be protected in all the places where the list is accessed. - We need a new notifier_head data structure to encompass the head of the notifier chain and a semaphore that protects the list. - There should be two types of call chains: one that is called in a process context and another that is called in atomic/interrupt context. - No lock should be acquired in notifier_call_chain() for an atomic-type chain. - notifier_chain_register() and notifier_chain_unregister() should be called only from process context. - notifier_chain_unregister() should _not_ be called from a callout routine. I posted an RFC that meets the above listed requirements last Friday: - http://marc.theaimsgroup.com/?l=linux-kernel&m=113175279131346&w=2 Paul McKenney provided some suggestions w.r.t RCU usage. This patchset fixes the issues he raised. Keith Owens posted some changes to the diechain for various architectures; his changes are included here. This is posted as an RFC as we want to get a green signal from the owners of the files that our classification of chains as ATOMIC or BLOCKING is ok. Please comment. This patchset has 7 patches: 1 of 7: Changes the definition of the heads. Same as what was posted last Friday with changes w.r.t Paul's comments. 2 of 7: Changes that affected only the notifier_head definition. 3 of 7: Changes in which we removed some protection (it's no longer needed as the basic infrastructure itself provides the protection). 4 of 7: changes for diechain for different architectures. 5 of 7: changes removing calls to notifier_unregister in the callout. 6 of 7: changes to dcdbas.c (requires special handling). 7 of 7: changes to make usb_notify to use the notify chain infrastructure instead of its own. ---------------------------------------- Here are the list of chains and their classification: BLOCKING: +++++++++ arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c: cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c: cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c: sleep_notifier_list drivers/macintosh/via-pmu68k.c: sleep_notifier_list drivers/macintosh/windfarm_core.c: wf_client_list drivers/usb/core/notify.c: usb_notifier_list drivers/video/fbmem.c: fb_notifier_list kernel/cpu.c: cpu_chain kernel/module.c: module_notify_list kernel/profile.c: munmap_notifier kernel/profile.c: task_exit_notifier kernel/sys.c: reboot_notifier_list net/core/dev.c: netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain ATOMIC: +++++++ arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain nen/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Chandra S. <sek...@us...> - 2005-11-19 02:19:38
|
In 2.6.14, notifier chains are unsafe. notifier_call_chain() walks through the list of a call chain without any protection. Alan Stern <st...@ro...> brought the issue and suggested a fix in lkml on Oct 24 2005: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 There was a lot of discussion on that thread regarding the issue, and following were the conclusions about the requirements of the notifier call mechanism: - The chain list has to be protected in all the places where the list is accessed. - We need a new notifier_head data structure to encompass the head of the notifier chain and a semaphore that protects the list. - There should be two types of call chains: one that is called in a process context and another that is called in atomic/interrupt context. - No lock should be acquired in notifier_call_chain() for an atomic-type chain. - notifier_chain_register() and notifier_chain_unregister() should be called only from process context. - notifier_chain_unregister() should _not_ be called from a callout routine. I posted an RFC that meets the above listed requirements last Friday: - http://marc.theaimsgroup.com/?l=linux-kernel&m=113175279131346&w=2 Paul McKenney provided some suggestions w.r.t RCU usage. This patchset fixes the issues he raised. Keith Owens posted some changes to the diechain for various architectures; his changes are included here. This is posted as an RFC as we want to get a green signal from the owners of the files that our classification of chains as ATOMIC or BLOCKING is ok. Please comment. This patchset has 7 patches: 1 of 7: Changes the definition of the heads. Same as what was posted last Friday with changes w.r.t Paul's comments. 2 of 7: Changes that affected only the notifier_head definition. 3 of 7: Changes in which we removed some protection (it's no longer needed as the basic infrastructure itself provides the protection). 4 of 7: changes for diechain for different architectures. 5 of 7: changes removing calls to notifier_unregister in the callout. 6 of 7: changes to dcdbas.c (requires special handling). 7 of 7: changes to make usb_notify to use the notify chain infrastructure instead of its own. ---------------------------------------- Here are the list of chains and their classification: BLOCKING: +++++++++ arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c: cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c: cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c: sleep_notifier_list drivers/macintosh/via-pmu68k.c: sleep_notifier_list drivers/macintosh/windfarm_core.c: wf_client_list drivers/usb/core/notify.c: usb_notifier_list drivers/video/fbmem.c: fb_notifier_list kernel/cpu.c: cpu_chain kernel/module.c: module_notify_list kernel/profile.c: munmap_notifier kernel/profile.c: task_exit_notifier kernel/sys.c: reboot_notifier_list net/core/dev.c: netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain ATOMIC: +++++++ arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain nen/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Andrew M. <ak...@os...> - 2005-11-27 04:08:08
|
Chandra Seetharaman <sek...@us...> wrote: > > Andrew, > > I posted this set of patch to lkml last Friday as an RFC. Can you > consider these for -mm inclusion. This all looks exotically complex. > ... > > Here are the details: > In 2.6.14, notifier chains are unsafe. notifier_call_chain() walks through > the list of a call chain without any protection. > > Alan Stern <st...@ro...> brought the issue and suggested a fix > in lkml on Oct 24 2005: > http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 > > There was a lot of discussion on that thread regarding the issue, and > following were the conclusions about the requirements of the notifier > call mechanism: > > - The chain list has to be protected in all the places where the > list is accessed. > - We need a new notifier_head data structure to encompass the head > of the notifier chain and a semaphore that protects the list. > - There should be two types of call chains: one that is called in > a process context and another that is called in atomic/interrupt > context. > - No lock should be acquired in notifier_call_chain() for an > atomic-type chain. > - notifier_chain_register() and notifier_chain_unregister() should > be called only from process context. > - notifier_chain_unregister() should _not_ be called from a > callout routine. > > I posted an RFC that meets the above listed requirements last Friday: > - http://marc.theaimsgroup.com/?l=linux-kernel&m=113175279131346&w=2 > > Paul McKenney provided some suggestions w.r.t RCU usage. This patchset fixes > the issues he raised. Keith Owens posted some changes to the diechain for > various architectures; his changes are included here. - You don't state _why_ a callback cannot call notifier_chain_unregister(). I assume that's because of the use of write_lock() locking? We could do this with a new callback function return code and do it in the core, or just change the code so it is permitted. - You don't explain why RCU has been introduced into this subsystem. Seems overkillish, or was it done as a way to solve the correctness problems? - Generally, please don't put so much stuff into the [patch 0/N] email. We never put empty patches into git so some sucker has to move all the [0/N] content into [1/N]. Better that it's you than me. - Overall take on the patches: the problem here is that notifier chains try to provide their own locking. Each time we design a container which does that, we screw it up and we regret it. Please consider removing all locking from the notifer chains and moving it into the callers. A migration path would be: - Introduce a new notifier API which is wholly unlocked - Migrate all callers over to that - Remove the current implementation Note that with this scheme, callbacks which wish to call the unregister function can do that, as long as they are not using read_lock() or down_read() during the chain traversal. |
From: Andi K. <ak...@su...> - 2005-11-27 13:47:48
|
> - Introduce a new notifier API which is wholly unlocked The old notifiers were already wholly unlocked. So it wouldn't even need any changes. Just additional locks everywhere. I agree it's the cleaner implementation. -Andi |
From: Keith O. <ka...@sg...> - 2005-11-27 16:00:11
|
On Sun, 27 Nov 2005 14:47:36 +0100, Andi Kleen <ak...@su...> wrote: >akpm wrote >> - Introduce a new notifier API which is wholly unlocked > >The old notifiers were already wholly unlocked. So it wouldn't >even need any changes. Just additional locks everywhere. Wrong. The existing implementation is racy as hell. There is NO locking on the existing chains, these patches make the notifier chains race free. Some of the notifier callbacks are used in weird contexts, including NMI, so the only option for those chains is RCU. Obviously those callbacks cannot sleep. Other chains are used in more normal context _AND_ the callbacks want to sleep, so those chains need to use sleeping locks. |
From: Andi K. <ak...@su...> - 2005-11-27 17:27:43
|
On Mon, Nov 28, 2005 at 02:59:05AM +1100, Keith Owens wrote: > On Sun, 27 Nov 2005 14:47:36 +0100, > Andi Kleen <ak...@su...> wrote: > >akpm wrote > >> - Introduce a new notifier API which is wholly unlocked > > > >The old notifiers were already wholly unlocked. So it wouldn't > >even need any changes. Just additional locks everywhere. > > Wrong. Did you actually read what I wrote? -Andi |
From: Keith O. <ka...@sg...> - 2005-11-27 17:40:11
|
On Sun, 27 Nov 2005 18:27:25 +0100, Andi Kleen <ak...@su...> wrote: >On Mon, Nov 28, 2005 at 02:59:05AM +1100, Keith Owens wrote: >> On Sun, 27 Nov 2005 14:47:36 +0100, >> Andi Kleen <ak...@su...> wrote: >> >akpm wrote >> >> - Introduce a new notifier API which is wholly unlocked >> > >> >The old notifiers were already wholly unlocked. So it wouldn't >> >even need any changes. Just additional locks everywhere. >> >> Wrong. > >Did you actually read what I wrote? Of course I did. The whole point is that _ALL_ of the existing notifier chain callback code is racy[*]. Saying that the code can be left without any changes is simply ignoring the existing races. They _ALL_ need to be fixed. [*] Except for one bit of hand crafted locking in USB. |
From: Andrew M. <ak...@os...> - 2005-11-27 19:57:09
|
Keith Owens <ka...@sg...> wrote: > > On Sun, 27 Nov 2005 18:27:25 +0100, > Andi Kleen <ak...@su...> wrote: > >On Mon, Nov 28, 2005 at 02:59:05AM +1100, Keith Owens wrote: > >> On Sun, 27 Nov 2005 14:47:36 +0100, > >> Andi Kleen <ak...@su...> wrote: > >> >akpm wrote > >> >> - Introduce a new notifier API which is wholly unlocked > >> > > >> >The old notifiers were already wholly unlocked. The register and unregister functions take a write_lock on notifier_lock. notifier_call_chain() runs unlocked. > So it wouldn't > >> >even need any changes. Just additional locks everywhere. > >> > >> Wrong. > > > >Did you actually read what I wrote? > > Of course I did. No you didn't! > The whole point is that _ALL_ of the existing > notifier chain callback code is racy[*]. yup. > Saying that the code can be > left without any changes is simply ignoring the existing races. They > _ALL_ need to be fixed. We're saying that kernel/sys.c:notifier_lock should be removed and that all callers of notifier_chain_register(), notifier_chain_unregister() and notifier_call_chain() should be changed to define and use their own lock. So the _callers_ get to decide whether they're going to use down(), spin_lock(), down_read(), read_lock(), preempt_disable(), local_irq_disable() or whatever. Furthermore we should alter notifier_call_chain() so that a callback may safely perform notifier_chain_unregister() - that's sane and easy enough. |
From: Greg KH <gr...@kr...> - 2005-11-27 22:04:11
|
On Sun, Nov 27, 2005 at 11:56:40AM -0800, Andrew Morton wrote: > We're saying that kernel/sys.c:notifier_lock should be removed and that all > callers of notifier_chain_register(), notifier_chain_unregister() and > notifier_call_chain() should be changed to define and use their own lock. > > So the _callers_ get to decide whether they're going to use down(), > spin_lock(), down_read(), read_lock(), preempt_disable(), local_irq_disable() > or whatever. I completely agree. I just watched in mute horror as Chandra and Alan spun off into the rcu blackhole trying to create one-size-fits-all notifiers. Making the user do the locking it needs to do is simple and sane. And the reason USB's notifiers are implemented correctly, is they don't use the notifier core, but rather, reimplemented their own, due to the locking mess. thanks, greg k-h |
From: Paul E. M. <pa...@us...> - 2005-11-28 02:43:07
|
On Sun, Nov 27, 2005 at 02:03:29PM -0800, Greg KH wrote: > On Sun, Nov 27, 2005 at 11:56:40AM -0800, Andrew Morton wrote: > > We're saying that kernel/sys.c:notifier_lock should be removed and that all > > callers of notifier_chain_register(), notifier_chain_unregister() and > > notifier_call_chain() should be changed to define and use their own lock. > > > > So the _callers_ get to decide whether they're going to use down(), > > spin_lock(), down_read(), read_lock(), preempt_disable(), local_irq_disable() > > or whatever. > > I completely agree. I just watched in mute horror as Chandra and Alan > spun off into the rcu blackhole trying to create one-size-fits-all > notifiers. RCU as blackhole? I am impressed -- even -I- don't find RCU to have the same limiting-case attraction that a black hole does. ;-) (My apologies, Greg, but you did leave yourself open for this!) > Making the user do the locking it needs to do is simple and sane. > > And the reason USB's notifiers are implemented correctly, is they don't > use the notifier core, but rather, reimplemented their own, due to the > locking mess. The locking mess is due to the fact that the notifier chain itself must be protected by something or another, right? Here are the options I have come across: 1. The notifier chain is guarded by a separate notifier-chain lock. We then have the following deadlock situation: o The subsystem requesting the notifier likely has to hold one of its own locks when registering and unregistering, which means that the notifier lock must be subordinate to the subsystem lock. o But when invoking the notifiers, the notifier lock must be held while walking the chain. Since the subsystem being notified likely has to acquire one of its own locks, the subsystem lock must be subordinate to the notifier lock. There are a number of ways of breaking this deadlock, some of which are listed below. Note that this deadlock situation can arise both for spinlocks and for sleeplocks. I believe that this deadlock situation is the core reason why notifier locking is so difficult to get right. Even ignoring the deadlock, this does not work for NMIs. 2. Each element of the notifier chain is guarded by reference counts, and the chain itself is guarded by a lock. Each element of the chain holds a reference to the next element in the chain, and new elements are always added to the end of the chain. The traversal code looks something like the following: spin_lock(&chain_lock); p = head; atomic_inc(&p->refcnt); while (p != NULL) { spin_unlock(&chain_lock); p->func(p->arg); spin_lock(&chain_lock); q = p->next; atomic_inc(&q->refcnt); if (atomic_dec_and_test(&p->refcnt)) { kfree(p); } p = q; } spin_unlock(&chain_lock); Note that all actual deletion is done under chain_lock. This works (I think...), and the subsystem code can use a single lock that the notifier lock (chain_lock in this case) is subordinate to. But the notifier traversal loop is quite heavyweight. And it cannot be invoked from NMI handlers without risk of self-deadlock. 3. Like #1, but require that the subsystem have at least two locks, one of which is subordinate to the notifier lock (and is acquired from the notifier callback function), and the other of which is superior to the notifier lock, and is held when registering and deregistering callbacks. This can work, but could significantly complicate the subsystem locking, and also will require that some operations acquire two locks instead of just one, since one must hold both to exclude both notifications and register/unregister operations. This approach also fails to handle notifications from NMIs. 4. "Just say no" to a separate notifier-chain lock, and guard the hole thing with whatever mechanism the subsystem deems appropriate. This can be made to work with NMI-based notification, since such subsystems can use RCU or whatever other lock-free mechanism they desire. I believe that Greg took this "just say no" approach in USB, but could easily be missing something. This works with minimal added complexity to the subsystem, but requires that each subsystem have its own notifier chain, since it does not make sense to have a single chain guarded by multiple locks. But it means that notifier code must be replicated in a number of subsystems, which seems sub-optimal. This might be the best we can do, but in the interest of completeness... 5. Use RCU to traverse the notifier chain and use simple locking to guard updates, similar to Alan's and Chandra's patch. This avoids the deadlock, and handles NMIs nicely, but does not tolerate synchronous notification callbacks that sleep. (Cases that can tolerate asynchronous notification can make use of workqueues or similar mechanisms to defer the sleeping code so that it is not executed in the scope of the rcu_read_lock() protecting the notifier-chain traversal.) 6. Have separate mechanisms for the non-sleeping and the synchronous sleeping cases. For example, #5 for non-sleeping and either #2, #3, or #4 for the synchronous sleeping case. This works in all cases, and achieves a high degree of code commonality, but does require two separate APIs. 7. Use wait-free synchronization everywhere. This has some issues with complexity, last I checked. 8. Come up with a safe and sane RCU implementation that tolerates general blocking in read-side critical sections. Note that although some realtime implementations of RCU do tolerate blocking, they do so only in the special case that priority inheritance applies to. Might happen, but I would not recommend holding your breath. Any options I missed? Thanx, Paul |
From: Andrew M. <ak...@os...> - 2005-11-28 04:58:09
|
"Paul E. McKenney" <pa...@us...> wrote: > > Any options I missed? Stop using the notifier chains from NMI context - it's too hard. Use a fixed-size array in the NMI code instead. |
From: Andi K. <ak...@su...> - 2005-11-28 04:59:35
|
On Sun, Nov 27, 2005 at 08:57:45PM -0800, Andrew Morton wrote: > "Paul E. McKenney" <pa...@us...> wrote: > > > > Any options I missed? > > Stop using the notifier chains from NMI context - it's too hard. Use a > fixed-size array in the NMI code instead. Or just don't unregister. That is what I did for the debug notifiers. -Andi |
From: Paul E. M. <pa...@us...> - 2005-11-28 05:05:25
|
On Mon, Nov 28, 2005 at 05:59:22AM +0100, Andi Kleen wrote: > On Sun, Nov 27, 2005 at 08:57:45PM -0800, Andrew Morton wrote: > > "Paul E. McKenney" <pa...@us...> wrote: > > > > > > Any options I missed? > > > > Stop using the notifier chains from NMI context - it's too hard. Use a > > fixed-size array in the NMI code instead. > > Or just don't unregister. That is what I did for the debug notifiers. So the thought is to replicate the notifier code in all non-NMI subsystems that require it? Thanx, Paul |
From: Andi K. <ak...@su...> - 2005-11-28 05:15:43
|
On Sun, Nov 27, 2005 at 09:05:22PM -0800, Paul E. McKenney wrote: > On Mon, Nov 28, 2005 at 05:59:22AM +0100, Andi Kleen wrote: > > On Sun, Nov 27, 2005 at 08:57:45PM -0800, Andrew Morton wrote: > > > "Paul E. McKenney" <pa...@us...> wrote: > > > > > > > > Any options I missed? > > > > > > Stop using the notifier chains from NMI context - it's too hard. Use a > > > fixed-size array in the NMI code instead. > > > > Or just don't unregister. That is what I did for the debug notifiers. > > So the thought is to replicate the notifier code in all non-NMI subsystems > that require it? The non NMI subsystems just need to get appropiate locking (or preferable just use RCU if the notifiers can't sleep) -Andi |
From: Keith O. <ka...@sg...> - 2005-11-28 08:32:49
|
On Mon, 28 Nov 2005 05:59:22 +0100, Andi Kleen <ak...@su...> wrote: >On Sun, Nov 27, 2005 at 08:57:45PM -0800, Andrew Morton wrote: >> "Paul E. McKenney" <pa...@us...> wrote: >> > >> > Any options I missed? >> >> Stop using the notifier chains from NMI context - it's too hard. Use a >> fixed-size array in the NMI code instead. > >Or just don't unregister. That is what I did for the debug notifiers. Unregister is not the only problem. Chain traversal races with register as well. |
From: Andi K. <ak...@su...> - 2005-11-28 12:07:30
|
On Mon, Nov 28, 2005 at 07:31:36PM +1100, Keith Owens wrote: > On Mon, 28 Nov 2005 05:59:22 +0100, > Andi Kleen <ak...@su...> wrote: > >On Sun, Nov 27, 2005 at 08:57:45PM -0800, Andrew Morton wrote: > >> "Paul E. McKenney" <pa...@us...> wrote: > >> > > >> > Any options I missed? > >> > >> Stop using the notifier chains from NMI context - it's too hard. Use a > >> fixed-size array in the NMI code instead. > > > >Or just don't unregister. That is what I did for the debug notifiers. > > Unregister is not the only problem. Chain traversal races with > register as well. Either it follows the old next or the new next. Both are valid. The only problem is that there isn't a write barrier between n->next = *list; *list=n; in notifier_chain_register, which might hit on non i386 architectures. -Andi |
From: Paul E. M. <pa...@us...> - 2005-11-28 19:55:38
|
On Mon, Nov 28, 2005 at 01:07:11PM +0100, Andi Kleen wrote: > On Mon, Nov 28, 2005 at 07:31:36PM +1100, Keith Owens wrote: > > On Mon, 28 Nov 2005 05:59:22 +0100, > > Andi Kleen <ak...@su...> wrote: > > >On Sun, Nov 27, 2005 at 08:57:45PM -0800, Andrew Morton wrote: > > >> "Paul E. McKenney" <pa...@us...> wrote: > > >> > > > >> > Any options I missed? > > >> > > >> Stop using the notifier chains from NMI context - it's too hard. Use a > > >> fixed-size array in the NMI code instead. > > > > > >Or just don't unregister. That is what I did for the debug notifiers. > > > > Unregister is not the only problem. Chain traversal races with > > register as well. > > Either it follows the old next or the new next. Both are valid. > The only problem is that there isn't a write barrier between > > n->next = *list; > *list=n; > > in notifier_chain_register, which might hit on non i386 architectures. Coding as follows: n->next = *list; rcu_assign_pointer(*list, n); will provide memory barriers as needed, even if you are never removing elements. Thanx, Paul |
From: Alan C. <al...@lx...> - 2005-12-06 15:36:31
|
On Llu, 2005-11-28 at 19:31 +1100, Keith Owens wrote: > >Or just don't unregister. That is what I did for the debug notifiers. > > Unregister is not the only problem. Chain traversal races with > register as well. There are some NMI handler registration functions and attempts at safe code for it in the unmerged experimental part of the bluesmoke (bluesmoke.sf.net) project that may be useful perhaps ? |
From: Chandra S. <sek...@us...> - 2005-11-28 18:58:29
|
On Sat, 2005-11-26 at 20:07 -0800, Andrew Morton wrote: > Chandra Seetharaman <sek...@us...> wrote: > > > > Andrew, > > > > I posted this set of patch to lkml last Friday as an RFC. Can you > > consider these for -mm inclusion. > > This all looks exotically complex. > > > ... > > > > Here are the details: > > In 2.6.14, notifier chains are unsafe. notifier_call_chain() walks through > > the list of a call chain without any protection. > > > > Alan Stern <st...@ro...> brought the issue and suggested a fix > > in lkml on Oct 24 2005: > > http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 > > > > There was a lot of discussion on that thread regarding the issue, and > > following were the conclusions about the requirements of the notifier > > call mechanism: > > > > - The chain list has to be protected in all the places where the > > list is accessed. > > - We need a new notifier_head data structure to encompass the head > > of the notifier chain and a semaphore that protects the list. > > - There should be two types of call chains: one that is called in > > a process context and another that is called in atomic/interrupt > > context. > > - No lock should be acquired in notifier_call_chain() for an > > atomic-type chain. > > - notifier_chain_register() and notifier_chain_unregister() should > > be called only from process context. > > - notifier_chain_unregister() should _not_ be called from a > > callout routine. > > > > I posted an RFC that meets the above listed requirements last Friday: > > - http://marc.theaimsgroup.com/?l=linux-kernel&m=113175279131346&w=2 > > > > Paul McKenney provided some suggestions w.r.t RCU usage. This patchset fixes > > the issues he raised. Keith Owens posted some changes to the diechain for > > various architectures; his changes are included here. > > - You don't state _why_ a callback cannot call > notifier_chain_unregister(). I assume that's because of the use of > write_lock() locking? Yes. Also, for both type of notifiers we use semaphore to protect the list, so we cannot call unregister from atomic context. > > We could do this with a new callback function return code and do it in > the core, or just change the code so it is permitted. That is a good idea, we can do that. > > - You don't explain why RCU has been introduced into this subsystem. > Seems overkillish, or was it done as a way to solve the correctness > problems? After discussions in lkml (and looking at usages of notifier chains in code), the consensus was that we needed two types of notifier chains, atomic and blocking. During the discussions we concluded that blocking notifiers can be easily protected with a semaphore. Locks is not good enough for atomic notifiers as they are called from NMI context too. So, we needed a lockless way to protect the list traversal, hence RCU. > > - Generally, please don't put so much stuff into the [patch 0/N] email. > We never put empty patches into git so some sucker has to move all the > [0/N] content into [1/N]. Better that it's you than me. Will do. > > - Overall take on the patches: the problem here is that notifier chains > try to provide their own locking. Each time we design a container which > does that, we screw it up and we regret it. > The problem is that the current notifier chain mechanism gives the notion of protection (lock acquired in register/unregister), but does not protect list traversal. > Please consider removing all locking from the notifer chains and moving > it into the callers. > > A migration path would be: > > - Introduce a new notifier API which is wholly unlocked > > - Migrate all callers over to that > > - Remove the current implementation We can just remove the write_lock from register/unregister of current implementation and it will do what you are suggesting. Also, with the current implementation, subsystems can assume that there is no locks, but none of them do. They work around the locking/race issues different ways like (1) not unregistering at all (most of them) or having (2) their own _complete_ notify chain logic (usb). (1) is a workaround and (2) is redundant code. > > Note that with this scheme, callbacks which wish to call the unregister > function can do that, as long as they are not using read_lock() or > down_read() during the chain traversal. > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |