From: Matt H. <mat...@us...> - 2006-05-05 07:08:50
|
On Thu, 2006-05-04 at 15:21 +0200, Jes Sorensen wrote:
> Matt Helsley wrote:
> > On Tue, 2006-05-02 at 11:45 +0200, Jes Sorensen wrote:
> >> so focused on memory consumption. However the real case is that you are
> >> likely to have maybe 2-3 notifiers per task, since what the per task
> >> notifier chain solves is also it gets rid of a ton of rarely used per
> >> task callouts that are currently hard coded.
> >
> > I wonder if you're over-exaggerating what it can get rid of. They seem
> > useful, but "a ton"..
>
> Why write it with a pen if you can have it on a neon sign? Right now
> we have a lot of things in the fork path which is to support what I'd
> qualify as uninteresting for the majority of users, such as
> security_task_alloc, audit_alloc, copy_keys ... unfortunately the
> distros have become obsessed with the selinux/apparmor stuff so
> everybody is being force-fed it now :( Anyway probably better to
> discuss over beer :)

Ahh, interesting. I don't know selinux and apparmor well enough to
determine if they can be factored out as a per-task notifier block or
not.

I find writing with a pen is faster than making a neon sign. ;)

<snip>

> > I distinctly recall that you used a boot-time anecdote -- which is a
> > distinctly different set of paths than we're talking about here -- to
> > justify your argument.
>
> The boot time case was just an example of what happens when a lock is
> too contested. The problem in that case wasn't specific to booting; even
> if one managed to boot the machine, the cost of bouncing cache lines
> around can be quite high if one does it too much.
>
> > I'm saying we should go with the simpler solution until you can
> > empirically show that a more complex solution is needed. Perhaps you
> > could test this patch or something like it on your 512-way to show what
> > kind of impact it has:
> >
> > http://lkml.org/lkml/2006/1/12/379
> >
> > Then if you find there are problems we can choose a more complex
> > solution for the global notifier chain.
>
> Note that what was requested recently was a blocking semaphore, not
> a spin lock. In addition it is going to depend on the actual load; for
> the case where one launches NR_CPUS jobs and they all run for hours it's
> not going to cost a lot. The point I am after is that we know that locks
> are expensive, we should stop going around adding more unless we
> absolutely have to.

The lock patch was intended to get some idea of the lock contention
and/or cache line bouncing that would be seen. You could replace the
lock with a semaphore and make it more accurate -- which is why I said
"something like it" :).

I disagree with your criteria for when to use locks. My criteria would
take code complexity into account. I would trade minuscule performance
differences for significantly simpler code.

> Another example I can mention, which wasn't an actual lock, just a hot
> cacheline, was back when I wrote the acenic GigE driver. Going from a
> global SLAB pointer to per-CPU gave a 10% net performance increase on
> a 2 CPU 400MHz P2 box. You don't need a 512 CPU machine to see the
> effects of this, it's all about the conditions.

A nice improvement. Please consider your points (that locks aren't free
and cache lines can be hot) well-taken. I don't think you've connected
these examples with the logic needed to support your assertion. You
need to show that the paths in your anecdotes and the paths we're
considering are very similar. Otherwise, while interesting, they are
irrelevant to our discussion.

Cheers,
	-Matt Helsley
|
From: Jes S. <je...@sg...> - 2006-05-08 08:44:29
|
Matt Helsley wrote:
> On Thu, 2006-05-04 at 15:21 +0200, Jes Sorensen wrote:
>> Note that what was requested recently was a blocking semaphore, not
>> a spin lock. In addition it is going to depend on the actual load; for
>> the case where one launches NR_CPUS jobs and they all run for hours it's
>> not going to cost a lot. The point I am after is that we know that locks
>> are expensive, we should stop going around adding more unless we
>> absolutely have to.
>
> The lock patch was intended to get some idea of the lock contention
> and/or cache line bouncing that would be seen. You could replace the
> lock with a semaphore and make it more accurate -- which is why I said
> "something like it" :).

Hi Matt,

It'd be worth trying to see how it shows up in lmbench. If I can find
some time and some time on a large box I might give it a whirl.

> I disagree with your criteria for when to use locks. My criteria would
> take code complexity into account. I would trade minuscule performance
> differences for significantly simpler code.

The problem I am trying to raise awareness of is that there are a lot
of people out there who seem to think that if something doesn't show on
a 2-way it doesn't show at all. I don't think you're saying that, but
one does get a bit thick-skulled after a while.

> I don't think you've connected these examples with the logic needed to
> support your assertion. You need to show that the paths in your
> anecdotes and the paths we're considering are very similar. Otherwise,
> while interesting, they are irrelevant to our discussion.

The paths here are definitely different, but we should always bear the
cost in mind. Locks are good for some things, but one should never add
one without considering the impact.

For the fork/exit path we are talking about we should expect to take
the lock twice for each task run. For some runs this clearly won't be
measurable, but for others it will.

Cheers,
Jes
|
From: Jes S. <je...@sg...> - 2006-05-02 09:27:54
|
Chandra Seetharaman wrote:
> On Mon, 2006-05-01 at 12:07 +0200, Jes Sorensen wrote:
>> that someone will unregister a file system like that while the task that
>> accessed it is still live? The cost of adding locking is far worse than
>> the above IMHO fairly artificial case.
>
> even if you consider this as an artificial case, don't you think the
> kernel code should work properly ?

There are many cases where the kernel is optimized for the common path
and the obscure path is penalized. This is exactly what I am arguing
here.

> 1: static int __kprobes notifier_call_chain(struct notifier_block **nl,
> 2:                 unsigned long val, void *v)
> 3: {
> 4:         int ret = NOTIFY_DONE;
> 5:         struct notifier_block *nb;
> 6:
> 7:         nb = rcu_dereference(*nl);
> 8:         while (nb) {
> 9:                 ret = nb->notifier_call(nb, val, v);
> a:                 if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
> b:                         break;
> c:                 nb = rcu_dereference(nb->next);
> d:         }
> e:         return ret;
> f: }
>
> line 7 gets nb, line c resets nb with nb->next, if nb is freed in
> notifier_call on line 9, how can we be sure that nb->next in line c is
> sane ?
>
> Hence, IMO you cannot free the notifier block from the callback itself.

I was just sitting looking at this, and you're right. This is a very
unfortunate side-effect of RCU'ifying the notifier code. Being able to
have a cleanup notifier run makes a ton of sense, so maybe there ought
to be a special notifier call run for this or a flag indicating that
the notifier block should be kfree'ed after the rcu_dereference. Adding
Alan Stern to the CC since he wrote the current notifier
implementation.

Alan, any suggestions?

>>> even though the notifier block is per task, notifier_head in the task is
>>> global, in other words, there can be many users (of the task notify
>>> mechanism) that can be doing one of the following at the _same time_
>>> with a single notifier head:
>>> - register a new notifier block
>>> - unregister a notifier block
>>> - walk through the call chain in notifier_call_chain
>>
>> Nobody will walk the notifier chain outside of task context, anything
>> doing that is broken. In addition, since the registration is done by
>> the task itself this isn't a problem and unregistration is done at
>> exit time.
>
> So your model is based on all of the above operations (register,
> unregister and call_chain) happening in the 'said' task's context.
> right ?

Correct.

> I have a module that is interested in few tasks in the system. When my
> module is insmoded, how do i get into each process's context to register
> my notifier_block ?

As I said before, you launch it the way cpusets do it and all children
will inherit it. However if we can find a solution that allows us to
use the atomic's RCU approach then I think that will be a reasonable
compromise. The read-write semaphore is just too nasty to add for it.

>> It's totally valid for the mechanism. If you wish to apply accounting or
>> resource control to the system, then it's to be expected that this is
>> enabled by the init task and not at random later. It doesn't mean the
>> module has to be in use at all times, but once you insmod your accounting
>> then it's on the system. Hence the cpusets comparison makes sense IMHO.
>
> kernel does not impose "once you insmod your accounting then it's on the
> system". So, we _should_ not have that as an assumption.

Since when were you able to rmmod the bsd accounting subsystem?

> Task notify should be able to handle insmod and rmmod of a module
> cleanly.

I have already given you a way to do this without incurring
unreasonable cost. This argument is artificial and doesn't match actual
use of modules. In fact, when the new module support was implemented
there was a strong argument for disallowing module unload altogether.

>> For things that want to know about every task in the system it doesn't
>> make sense to register it with a task notifier. It should rather be put
>> in with a fixed callout in the path or there should be two levels of
>> notifier chains, global notifiers and task notifiers.
>
> So, you are suggesting that there should be two kind of notifiers, one
> for global notification and one for per task ?

I think we're talking about two totally different things. A global
notifier chain as you suggest is not feasible for per-task notification
as it would cripple performance completely. However per-task
notification is probably not suitable for other things.

> Still need to resolve the register/unregister/locking above for task
> notify.

Correct.

>> Pardon my ignorance, but what is MVFS? What does it do and how does it
>> rely on this feature.
>
> In an earlier thread on this same topic, John Kohl mentioned about it.
> Here is the link
> http://sourceforge.net/mailarchive/message.php?msg_id=13827470
>
> John can provide more info.

Well, reading John's posting it looks like per-task notification like I
propose would match exactly what he is asking for.

Cheers,
Jes
|
From: Alan S. <st...@ro...> - 2006-05-02 15:37:17
|
On Tue, 2 May 2006, Jes Sorensen wrote:
> I was just sitting looking at this, and you're right. This is a very
> unfortunate side-effect of RCU'ifying the notifier code. Being able to
> have a cleanup notifier run makes a ton of sense, so maybe there ought
> to be a special notifier call run for this or a flag indicating that the
> notifier block should be kfree'ed after the rcu_dereference. Adding Alan
> Stern to the CC since he wrote the current notifier implementation.
>
> Alan, any suggestions?

I think Matt may be on the right path in

http://marc.theaimsgroup.com/?l=lse-tech&m=114654571406352&w=2

although I don't understand most of what he says.

I gather that a large part of the problem comes from the usage patterns
you expect. There will be some task_notifier users that are interested
in every task in the system, and there will be others that are
interested in only a handful of tasks. This certainly suggests using
two different registration mechanisms and data structures, even though
the form of the callouts might be the same for the two cases.

The first case is easy. There can be a single global blocking_notifier
chain (kind of like Chandra's original suggestion but not atomic) that
gets called out on every interesting event. Anybody who cares about
every task in the system registers on this chain.

Locking shouldn't be an issue. The blocking notifier implementation
uses rw-semaphores, which are pretty efficient in the no-writers case.
(The cost is essentially that of a cache miss and a memory barrier --
pretty small compared to all the other overhead of task creation or
removal.) The only time a write-lock would be needed is when someone is
registering or unregistering on the chain, which won't happen very
often.

The second case (people interested in some tasks but not all) is more
complicated. Even before trying to decide on an implementation, there
are some obvious questions. If someone is interested in only some
tasks, how does he know which tasks to be interested in? How does he
know when he wants to start being interested in a new task? How does he
know when he wants to stop being interested in a task (other than the
obvious cases of task exit or module unloading)? If he's interested in
a task, is he also automatically interested in the task's children?

The best approach will depend on the answers to these and possibly
other related questions.

Alan Stern
|
From: Jes S. <je...@sg...> - 2006-05-04 13:00:35
|
Alan Stern wrote:
> On Tue, 2 May 2006, Jes Sorensen wrote:
>
>> I was just sitting looking at this, and you're right. This is a very
>> unfortunate side-effect of RCU'ifying the notifier code. Being able to
>> have a cleanup notifier run makes a ton of sense, so maybe there ought
>> to be a special notifier call run for this or a flag indicating that the
>> notifier block should be kfree'ed after the rcu_dereference. Adding Alan
>> Stern to the CC since he wrote the current notifier implementation.
>>
>> Alan, any suggestions?
>
> I think Matt may be on the right path in
>
> http://marc.theaimsgroup.com/?l=lse-tech&m=114654571406352&w=2
>
> although I don't understand most of what he says.

Hi Alan,

As I wrote in my response to that posting he isn't .... or rather we're
onto oranges and apples. Two different problems we're trying to solve,
and while a solution for one of the problems can be abused to solve the
other problem, it is going to be very expensive doing so.

> I gather that a large part of the problem comes from the usage patterns
> you expect. There will be some task_notifier users that are interested
> in every task in the system, and there will be others that are interested
> in only a handful of tasks. This certainly suggests using two different
> registration mechanisms and data structures, even though the form of the
> callouts might be the same for the two cases.

Correct.

> The first case is easy. There can be a single global blocking_notifier
> chain (kind of like Chandra's original suggestion but not atomic) that
> gets called out on every interesting event. Anybody who cares about every
> task in the system registers on this chain.
>
> Locking shouldn't be an issue. The blocking notifier implementation uses
> rw-semaphores, which are pretty efficient in the no-writers case. (The
> cost is essentially that of a cache miss and a memory barrier -- pretty
> small compared to all the other overhead of task creation or removal.)
> The only time a write-lock would be needed is when someone is registering
> or unregistering on the chain, which won't happen very often.

I have to correct you here. Locking on a global notifier is a major
issue. There is a misconception that rwlocks are free as long as one is
a reader. This is not the case: you still need to go on the bus with an
atomic operation, resulting in the cache line being bounced around. On
small systems you probably won't notice it, but on a large one which
runs many small tasks, eg. forks constantly, it will become noticeable.

This is why I am saying that any global notifier really needs to use
the RCU version and not the blocking lock version, but that may have
impact on memory allocation etc.

> The second case (people interested in some tasks but not all) is more
> complicated. Even before trying to decide on an implementation, there are
> some obvious questions. If someone is interested in only some tasks, how
> does he know which tasks to be interested in? How does he know when he
> wants to start being interested in a new task? How does he know when he
> wants to stop being interested in a task (other than the obvious cases
> of task exit or module unloading)? If he's interested in a task, is he
> also automatically interested in the task's children?

Which tasks are of interest is either inherited, in which case we can
do it completely lockless, or it will be explicitly decided by another
user app by doing a syscall requesting a certain feature to be attached
to a given task.

For when to stop being interested, my argument is that it is sufficient
that the user is interested until the task dies. However if really
needed this could be done via RCU ... but then that has other
unpleasant side effects.

As for the children, that is up to the given client, which it will
determine by providing a fork() hook.

The reason I pulled you onto the CC list is the issue of how to solve
the problem of being able to unregister yourself from a notifier list
from the notifier callout. The use of RCU in the core notifier
implementation makes this really difficult, so the question is should
one just bite the bullet and do a special notifier implementation for
this, or can we modify the code to handle it, eg. by having the
notifier callout have a special flag in the return value requesting the
code to unregister and free the notifier block? Or does it make more
sense to have a notifier_chain_drain() call that unregisters and frees
all?

Cheers,
Jes
|
From: Alan S. <st...@ro...> - 2006-05-04 17:27:26
|
On Thu, 4 May 2006, Jes Sorensen wrote:
> > The first case is easy. There can be a single global blocking_notifier
> > chain (kind of like Chandra's original suggestion but not atomic) that
> > gets called out on every interesting event. Anybody who cares about every
> > task in the system registers on this chain.
> >
> > Locking shouldn't be an issue. The blocking notifier implementation uses
> > rw-semaphores, which are pretty efficient in the no-writers case. (The
> > cost is essentially that of a cache miss and a memory barrier -- pretty
> > small compared to all the other overhead of task creation or removal.)
> > The only time a write-lock would be needed is when someone is registering
> > or unregistering on the chain, which won't happen very often.
>
> I have to correct you here. Locking on a global notifier is a major
> issue. There is a misconception that rwlocks are free as long as one
> is a reader. This is not the case, you still need to go on the bus
> with an atomic operation resulting in the cache line being bounced
> around.

That's pretty much what I said, isn't it?

> On small systems you probably won't notice it, but on a large
> one which runs many small tasks, eg. forks constantly, it will become
> noticeable.

Isn't a cache-line bounce still very small compared to all the overhead
of a fork or an exit?

> This is why I am saying that any global notifier really needs to use
> the RCU version and not the blocking lock version, but that may have
> impact on memory allocation etc.

I assumed that a notifier called during fork might need to block. If
that's so then you have no choice but to use a blocking notifier.
However, if all of the callouts on the chain can run in an atomic
context then an atomic notifier (with RCU) will work okay. It's up to
you and the users to decide which best suits your needs.

> > The second case (people interested in some tasks but not all) is more
> > complicated. Even before trying to decide on an implementation, there are
> > some obvious questions. If someone is interested in only some tasks, how
> > does he know which tasks to be interested in? How does he know when he
> > wants to start being interested in a new task? How does he know when he
> > wants to stop being interested in a task (other than the obvious cases
> > of task exit or module unloading)? If he's interested in a task, is he
> > also automatically interested in the task's children?
>
> Which tasks are of interest is either inherited, in which case we can
> do it completely lockless, or it will be explicitly decided by another
> user app by doing a syscall requesting a certain feature to be attached
> to a given task.
>
> For when to stop being interested my argument is that it is sufficient
> that the user is interested until the task dies. However if really
> needed this could be done via RCU ... but then that has other unpleasant
> side effects.

This was one of the issues you and Chandra disagreed about. I don't
know what's right. I don't even know how a driver can tell when to stop
being interested in some task.

> As for the children that is up to the given client, which it will
> determine by providing a fork() hook.
>
> The reason I pulled you onto the CC list is the issue of how to solve
> the problem of being able to unregister yourself from a notifier list
> from the notifier callout? The use of RCU in the core notifier
> implementation makes this really difficult, so the question is should
> one just bite the bullet and do a special notifier implementation for
> this or can we modify the code to handle it, eg. by having the notifier
> callout have a special flag in the return value requesting the code to
> unregister and free the notifier block? Or does it make more sense to
> have a notifier_chain_drain() call that unregisters and frees all?

Well, this is why I asked all those questions. It's not clear to me
that what you're doing really matches the capabilities of a
notifier_chain, even apart from the issue of unregistering.

(Note that the general question of how to implement something like a
notifier chain while allowing unregistration from within a callout is
extremely difficult, regardless of the tricks you play. In fact it's
unsolvable without extensive locking and synchronization, which I'm
sure you want to avoid.)

However... This is how it looks right now. You could use notifier
chains, with one chain per process. That chain would contain
notifier_blocks for all the drivers interested in the process (but not
interested in all processes). It would have to be a very unusual usage
pattern because all the notifier_block structures would be allocated
dynamically.

As for unregistering, I don't see why a driver would want to unregister
from one of these chains during a callout. The only event that comes to
mind is "exit", but there's no need to unregister then because the
entire chain is about to be destroyed.

But let's assume some driver does indeed have a reason to stop being
interested in a particular process. Here's how it can be done without
trying to unregister during a callout. When the notifier_block
structures are allocated dynamically, allocate them as part of a larger
structure that includes a driver_flags field. When a driver is no
longer interested in a process, it sets the driver_flags for the
corresponding notifier_block to a non-zero value. Then at the start of
the callout routine, it tests the driver_flags and returns immediately
if the value is non-zero. If the driver still wants to, it can use
schedule_work() to unregister itself permanently from the chain.

Alan Stern
|
From: <jt...@us...> - 2006-05-04 18:57:45
|
>>>>> "Alan" == Alan Stern <st...@ro...> writes:

Alan> I assumed that a notifier called during fork might need to block.

That's certainly the case for ClearCase MVFS--we'd need to allocate
additional state for the new child process, and that could block
waiting for memory.

-- 
John Kohl
Senior Software Engineer - Rational Software - IBM Software Group
Lexington, Massachusetts, USA
jt...@us... <http://www.ibm.com/software/rational/>
|
From: Jes S. <je...@sg...> - 2006-05-05 08:02:28
|
John T. Kohl wrote:
>>>>>> "Alan" == Alan Stern <st...@ro...> writes:
>
> Alan> I assumed that a notifier called during fork might need to block.
>
> That's certainly the case for ClearCase MVFS--we'd need to allocate
> additional state for the new child process, and that could block waiting
> for memory.

But you also mentioned that you need per-task data associated with it,
is that correct? In that case Jack's original task notifier
implementation probably would suit you far better.

Cheers,
Jes
|
From: Christoph H. <hc...@ls...> - 2006-05-08 07:55:55
|
On Thu, May 04, 2006 at 02:57:18PM -0400, John T. Kohl wrote:
> >>>>> "Alan" == Alan Stern <st...@ro...> writes:
>
> Alan> I assumed that a notifier called during fork might need to block.
>
> That's certainly the case for ClearCase MVFS--we'd need to allocate
> additional state for the new child process, and that could block waiting
> for memory.

Any such interface would be a _GPL export anyway, so please sod off
here with your illegal crap.
|
From: Jes S. <je...@sg...> - 2006-05-05 08:01:06
|
Alan Stern wrote:
> On Thu, 4 May 2006, Jes Sorensen wrote:
>> I have to correct you here. Locking on a global notifier is a major
>> issue. There is a misconception that rwlocks are free as long as one
>> is a reader. This is not the case, you still need to go on the bus
>> with an atomic operation resulting in the cache line being bounced
>> around.
>
> That's pretty much what I said, isn't it?

Hi Alan,

Maybe I misread you; my point was merely to underline the fact that
locks aren't cheap at all, in particular the rwlocks.

>> On small systems you probably won't notice it, but on a large
>> one which runs many small tasks, eg. forks constantly, it will become
>> noticeable.
>
> Isn't a cache-line bounce still very small compared to all the overhead
> of a fork or an exit?

It probably is relatively small, however adding another expensive
contention point just because there already are a few really isn't that
good an idea.

>> This is why I am saying that any global notifier really needs to use
>> the RCU version and not the blocking lock version, but that may have
>> impact on memory allocation etc.
>
> I assumed that a notifier called during fork might need to block. If
> that's so then you have no choice but to use a blocking notifier.
> However, if all of the callouts on the chain can run in an atomic
> context then an atomic notifier (with RCU) will work okay. It's up to you
> and the users to decide which best suits your needs.

The per-task notification that we are proposing doesn't have this
problem because it is only ever called in task context; it can block
just as anything else in that path is allowed to do.

>> The reason I pulled you onto the CC list is the issue of how to solve
>> the problem of being able to unregister yourself from a notifier list
>> from the notifier callout? The use of RCU in the core notifier
>> implementation makes this really difficult, so the question is should
>> one just bite the bullet and do a special notifier implementation for
>> this or can we modify the code to handle it, eg. by having the notifier
>> callout have a special flag in the return value requesting the code to
>> unregister and free the notifier block? Or does it make more sense to
>> have a notifier_chain_drain() call that unregisters and frees all?
>
> Well, this is why I asked all those questions. It's not clear to me that
> what you're doing really matches the capabilities of a notifier_chain,
> even apart from the issue of unregistering.

I am coming to the conclusion that the generic notifier chain code
probably isn't suited for this :( It had been my initial hope that we
could share the code, but the RCU'ification actually prevents it, at
least in its current form.

> (Note that the general question of how to implement something like a
> notifier chain while allowing unregistration from within a callout is
> extremely difficult, regardless of the tricks you play. In fact it's
> unsolvable without extensive locking and synchronization, which I'm sure
> you want to avoid.)

It's not that difficult as long as you have the right constraints
around it, eg. if it is only ever accessible from the task's own
context then you can get away with it totally lock free.

> As for unregistering, I don't see why a driver would want to unregister
> from one of these chains during a callout. The only event that comes to
> mind is "exit", but there's no need to unregister then because the entire
> chain is about to be destroyed.

You often want to have an exit call performed first, so you need the
callout before destroying the chain. One way to do it would be two
functions: first a normal callout, then a drain/destroy call that
flushes the chain and releases it all.

Cheers,
Jes
|
From: Alan S. <st...@ro...> - 2006-05-05 15:14:18
|
On Fri, 5 May 2006, Jes Sorensen wrote:
> Hi Alan,
>
> Maybe I misread you, my point was merely to underline the fact that
> locks aren't cheap at all, in particular the rwlocks.
>
>>> On small systems you probably won't notice it, but on a large
>>> one which runs many small tasks, eg. forks constantly, it will become
>>> noticeable.
>>
>> Isn't a cache-line bounce still very small compared to all the overhead
>> of a fork or an exit?
>
> It probably is relatively small, however adding another expensive
> contention point just because there already are a few really isn't
> that good an idea.

Basically what you need is a variant rw-semaphore that has no
cache-bounce overhead in the readers-only case. Such a thing could be
implemented using a per-CPU reader count. This would solve part of your
problem (the global chain of drivers who are interested in every task).

Somebody else once complained about the overhead of using rw-semaphores
for blocking notifiers. He seemed to be more concerned about the cost
of the memory barriers than the cost of the cache-line bouncing. I
can't think of any way to eliminate barriers when coordinating multiple
CPUs...

>> I assumed that a notifier called during fork might need to block. If
>> that's so then you have no choice but to use a blocking notifier.
>> However, if all of the callouts on the chain can run in an atomic
>> context then an atomic notifier (with RCU) will work okay. It's up to you
>> and the users to decide which best suits your needs.
>
> The per-task notification that we are proposing doesn't have this
> problem because it is only ever called in task context, it can block
> just as anything else in that path is allowed to do.

Which rules out using RCU, since you're not allowed to block while
holding an rcu_read_lock.

> I am coming to the conclusion that the generic notifier chain code
> probably isn't suited for this :( It had been my initial hope that we
> could share the code, but the RCU'ification actually prevents it, at
> least in its current form.

Perhaps. It may not be very easy to find anything better, however.

>> (Note that the general question of how to implement something like a
>> notifier chain while allowing unregistration from within a callout is
>> extremely difficult, regardless of the tricks you play. In fact it's
>> unsolvable without extensive locking and synchronization, which I'm sure
>> you want to avoid.)
>
> It's not that difficult as long as you have the right constraints around
> it, eg. if it is only ever accessible from the task's own context then
> you can get away with it totally lock free.

Unregistration for rmmod becomes very difficult. How does the driver
get into the contexts of all the tasks it is interested in, so that it
can unregister?

Also, this constraint is obviously impossible to satisfy for drivers
that are interested in every task.

> > As for unregistering, I don't see why a driver would want to unregister
> > from one of these chains during a callout. The only event that comes to
> > mind is "exit", but there's no need to unregister then because the entire
> > chain is about to be destroyed.
>
> You often want to have an exit call performed first so you need the
> callout before destroying the chain.

I don't understand. You want to have an exit call performed first
before what? And of course you have to perform the callouts before
destroying the chain -- without the chain you can't do the callouts at
all!

> One way to do it would be two
> functions, first a normal callout then a drain/destroy call that flushes
> the chain and releases it all.

That's what I suggested above. Except for draining/flushing the chain;
I don't know what you mean by that. Do you mean waiting for all users
of the chain to finish? Why would you need to wait if the chain is
never accessed outside the task's context?

> One way to get around that would be to do something like this. However
> I am starting to think that having a notifier_chain_destroy might be a
> better way to go to avoid calling notifier_chain_unregister repeatedly
> just to wipe things out. Patch #2 of the attached.

I don't follow your patches. For one thing, your
NOTIFIER_UNREGISTER_FREE patch will oops because the code holds an
rcu_read_lock() while calling synchronize_rcu(). You could move
synchronize_rcu outside the locked region -- but that begs the question
of why you have synchronize_rcu there at all. What good does it do?

For another, I don't understand why you modified the atomic notifier
routines. Didn't you say that your callouts will need to be able to
block? That makes them not atomic, by definition, and not RCU-able.

And I don't think it's a good idea to add atomic_notifier_chain_drain
to kernel/sys.c. It's not part of the regular notifier infrastructure;
it's something special for your individual application.

Alan Stern
|
From: Jes S. <je...@sg...> - 2006-05-05 15:22:11
|
Alan Stern wrote: > Basically what you need is a variant rw-semaphore that has no cache-bounce > overhead in the readers-only case. Such a thing could be implemented > using a per-CPU reader count. This would solve part of your problem (the > global chain of drivers who are interested in every task). > > Somebody else once complained about the overhead of using rw-semaphores > for blocking notifiers. He seemed to be more concerned about the cost of > the memory barriers than the cost of the cache-line bouncing. I can't > think of any way to eliminate barriers when coordinating multiple CPUs... Well the memory barriers basically enforce things hitting the bus, so it's the same thing. >> The per-task notification that we are proposing doesn't have this >> problem because it is only ever called in task context, it can block >> just as anything else in that path is allowed to do. > > Which rules out using RCU, since you're not allowed to block while holding > an rcu_read_lock. Correct, Jack's original patches didn't require RCU. I was trying to share code, but it seems that's not an option. >> It's not that difficult as long as you have the right constraints around >> it, eg. if it is only ever accessible from the task's own context then >> you can get away with it totally lock free. > > Unregistration for rmmod becomes very difficult. How does the driver get > into the contexts of all the tasks it is interested in, so that it can > unregister? > > Also, this constraint is obviously impossible to satisfy for drivers > that are interested in every task. The drivers interested in every task would benefit from a global chain as opposed to a per-task one. Anything registered per task will need to wait until the task exits, which isn't an unreasonable requirement. >>> As for unregistering, I don't see why a driver would want to unregister >>> from one of these chains during a callout.
The only event that comes to >>> mind is "exit", but there's no need to unregister then because the entire >>> chain is about to be destroyed. >> You often want to have an exit call performed first so you need the >> callout before destroying the chain. > > I don't understand. You want to have an exit call performed first before > what? And of course you have to perform the callouts before destroying > the chain -- without the chain you can't do the callouts at all! Before the task is destroyed. Which is why one would do a last callout, then destroy the chain. >> One way to do it would be two >> functions, first a normal callout then a drain/destroy call that flushes >> the chain and releases it all. > > That's what I suggested above. Except for draining/flushing the chain; I > don't know what you mean by that. Do you mean waiting for all users of > the chain to finish? Why would you need to wait if the chain is never > accessed outside the task's context? No I mean by releasing all notifier blocks in the chain, assuming they were all kmalloc'ed (which one would expect if there's a registration per task). >> One way to get around that would be to do something like this. However >> I am starting to think that having a notifier_chain_destroy might be a >> better way to go to avoid calling notifier_chain_unregister repeatedly >> just to wipe things out. Patch #2 of the attached. > > I don't follow your patches. > > For one thing, your NOTIFIER_UNREGISTER_FREE patch will oops because the > code holds an rcu_read_lock() while calling synchronize_rcu(). You could > move synchronize_rcu outside the locked region -- but that begs the > question of why you have synchronize_rcu there at all. What good does it > do? > > For another, I don't understand why you modified the atomic notifier > routines. Didn't you say that your callouts will need to be able to > block? That makes them not atomic, by definition, and not RCU-able. 
The patches were meant more to show the idea, I didn't claim they were final production quality<tm>. But you're right, doing it in the raw chain would make more sense. > And I don't think it's a good idea to add atomic_notifier_chain_drain to > kernel/sys.c. It's not part of the regular notifier infrastructure; it's > something special for your individual application. It happens to be applicable to this application, but I don't see why it wouldn't work for other cases, hence placing it in kernel/sys.c Jes |
From: Alan S. <st...@ro...> - 2006-05-05 18:24:04
|
On Fri, 5 May 2006, Jes Sorensen wrote: > > Somebody else once complained about the overhead of using rw-semaphores > > for blocking notifiers. He seemed to be more concerned about the cost of > > the memory barriers than the cost of the cache-line bouncing. I can't > > think of any way to eliminate barriers when coordinating multiple CPUs... > > Well the memory barriers basically enforce things hitting the bus, so > it's the same thing. No it isn't. Things will hit the bus eventually, whether the barrier is there or not. All the barrier does is slow down the pipeline somewhat. > The drivers interested in every task would benefit from a global chain > as opposed to a per-task one. Anything registered per task > will need to wait until the task exits, which isn't an unreasonable > requirement. Okay, I'll take your word for it. Of course, there might be situations where a driver _is_ in a task's context and wants to unregister from that task's chain. Such a possibility should be allowed for. Then the driver wouldn't have to wait for the task to exit. > > That's what I suggested above. Except for draining/flushing the chain; I > > don't know what you mean by that. Do you mean waiting for all users of > > the chain to finish? Why would you need to wait if the chain is never > > accessed outside the task's context? > > No I mean by releasing all notifier blocks in the chain, assuming they > were all kmalloc'ed (which one would expect if there's a registration > per task). That's what I meant by "destroying the chain". "Flushing/draining" would mean something else. Never mind; I think we're in agreement. > > I don't follow your patches. ... > The patches were meant more to show the idea, I didn't claim they were > final production quality<tm>. But you're right, doing it in the raw > chain would make more sense. All right then.
How do you feel about combining a global raw_notifier chain (for drivers interested in all tasks) that is protected by a "no-cache-bounce-for-readers" rw-semaphore together with a per-task raw_notifier chain that is protected by nothing (accessed only from within the task's context)? You'd have to write an API for registering and unregistering on each type of chain, including dynamic allocation of the notifier_blocks, plus an API for invoking the chains (easy) and one for destroying the per-task chains (called only from do_exit or one of its subordinates). > > And I don't think it's a good idea to add atomic_notifier_chain_drain to > > kernel/sys.c. It's not part of the regular notifier infrastructure; it's > > something special for your individual application. > > It happens to be applicable to this application, but I don't see why > it wouldn't work for other cases, hence placing it in kernel/sys.c My feeling for now is keep it separate, since it's just a part of your private API. If other parts of the kernel want to use it then it can be moved later. The "no-cache-bounce" rw-sem, on the other hand, is a candidate for mainlining. We could see what Andrew Morton and others think about it. Alan Stern |
From: Jes S. <je...@sg...> - 2006-05-08 08:31:16
|
Alan Stern wrote: > On Fri, 5 May 2006, Jes Sorensen wrote: >> The patches were meant more to show the idea, I didn't claim they were >> final production quality<tm>. But you're right, doing it in the raw >> chain would make more sense. > > All right then. How do you feel about combining a global raw_notifier > chain (for drivers interested in all tasks) that is protected by a > "no-cache-bounce-for-readers" rw-semaphore together with a per-task > raw_notifier chain that is protected by nothing (accessed only from within > the task's context)? How do you anticipate achieving that no cache bounce for readers? Whenever a reader takes a semaphore it still has to go and update the semaphore; when the next reader comes in on a different CPU, that reader still has to pull over the cache line to update it. This is why I keep pointing out that rw locks are no better when it comes to cache impact! > You'd have to write an API for registering and > unregistering on each type of chain, including dynamic allocation of the > notifier_blocks, plus an API for invoking the chains (easy) and one for > destroying the per-task chains (called only from do_exit or one of its > subordinates). I'd go back to Jack's original patch for this; it shouldn't be too big a deal. >> It happens to be applicable to this application, but I don't see why >> it wouldn't work for other cases, hence placing it in kernel/sys.c > > My feeling for now is keep it separate, since it's just a part of your > private API. If other parts of the kernel want to use it then it can be > moved later. It would never be part of any private API; it would be for 'public' use, EXPORT_SYMBOL_GPL naturally. > The "no-cache-bounce" rw-sem, on the other hand, is a candidate for > mainlining. We could see what Andrew Morton and others think about it. I am very curious how you expect to achieve this .... Jes |
From: Alan S. <st...@ro...> - 2006-05-08 18:34:57
|
On Mon, 8 May 2006, Jes Sorensen wrote: > > All right then. How do you feel about combining a global raw_notifier > > chain (for drivers interested in all tasks) that is protected by a > > "no-cache-bounce-for-readers" rw-semaphore together with a per-task > > raw_notifier chain that is protected by nothing (accessed only from within > > the task's context)? > > How do you anticipate that no-cache-bounce-for-readers? Whenever a > reader takes a semaphore they still have to go and update the semaphore, > when the next reader comes in on a different CPU that reader still has > to pull over the cacheline to update it. In a word: per-cpu reader counts. > This is why I keep pointing out > that rw locks are no better when it comes to cache impact! Per-cpu data structures were devised specifically to reduce or eliminate cache impact. > > The "no-cache-bounce" rw-sem, on the other hand, is a candidate for > > mainlining. We could see what Andrew Morton and others think about it. > > I am very curious how you expect to achieve this .... An initial implementation is attached. Let me know what you think of it. Alan Stern |
From: Jes S. <je...@sg...> - 2006-05-11 14:37:47
|
Alan Stern wrote: > On Mon, 8 May 2006, Jes Sorensen wrote: >>> The "no-cache-bounce" rw-sem, on the other hand, is a candidate for >>> mainlining. We could see what Andrew Morton and others think about it. >> I am very curious how you expect to achieve this .... > > An initial implementation is attached. Let me know what you think of it. Ok, had to chew on this one a couple of times ... clever :) Obviously not a generic replacement for rwlocks but for the case where it's write-once, read-unlimited, like with the global notifier chain, then I think it could work well. Obviously the write case is going to be "interesting" on a system with 512 CPUs :) I don't think it makes sense to provide the write_trylock version since one really shouldn't be using the qrwsem like that, and I think we should avoid encouraging people using them if they are not absolutely sure they need them for their code. Would be interesting to investigate if there are other cases where we could benefit from this. Cheers, Jes |
From: Alan S. <st...@ro...> - 2006-05-11 17:43:54
|
On Thu, 11 May 2006, Jes Sorensen wrote: > > An initial implementation is attached. Let me know what you think of it. > > Ok, had to chew on this one a couple of times ... clever :) > > Obviously not a generic replacement for rwlocks but for the case where > it's write-once, read-unlimited, like with the global notifier chain, > then I think it could work well. Obviously the write case is going to be > "interesting" on a system with 512 CPUs :) Especially since the global notifier chain will block all forks and exits until it acquires the lock! Better make sure none of the callouts try to start a new thread or wait for an old one to exit. (In general, a callout shouldn't do anything that would invoke the notifier chain recursively -- it can lead to deadlock.) > I don't think it makes sense to provide the write_trylock version since > one really shouldn't be using the qrwsem like that, and I think we > should avoid encouraging people using them if they are not absolutely > sure they need them for their code. I provided all the standard interfaces because it was no additional trouble to do so. But you're right that they shouldn't be used other than in exceptional circumstances. > Would be interesting to investigate if there are other cases where we > could benefit from this. Could be. Posting on LKML might catch someone's attention. Anyway, this should be good enough for you to use with your task notifiers. Are there any other aspects that still need to be straightened out? Alan Stern |
From: Jes S. <je...@sg...> - 2006-05-12 10:06:36
|
Alan Stern wrote: > On Thu, 11 May 2006, Jes Sorensen wrote: >> Obviously not a generic replacement for rwlocks but for the case where >> it's write-once, read-unlimited, like with the global notifier chain, >> then I think it could work well. Obviously the write case is going to be >> "interesting" on a system with 512 CPUs :) > > Especially since the global notifier chain will block all forks and exits > until it acquires the lock! Better make sure none of the callouts try to > start a new thread or wait for an old one to exit. (In general, a callout > shouldn't do anything that would invoke the notifier chain recursively -- > it can lead to deadlock.) Hmmm, there's a side effect I hadn't thought about, I wonder how bad it will hit. >> Would be interesting to investigate if there are other cases where we >> could benefit from this. > > Could be. Posting on LKML might catch someone's attention. > > Anyway, this should be good enough for you to use with your task > notifiers. Are there any other aspects that still need to be straightened > out? Actually I saw this more as a solution to the IBM guys' global notifier locking issue. For task notification, one can do it totally lock free ;) Cheers, Jes |
From: Alan S. <st...@ro...> - 2006-05-04 18:50:05
|
Alan Stern wrote: > I assumed that a notifier called during fork might need to block. If > that's so then you have no choice but to use a blocking notifier. I forgot to mention -- if you can guarantee that these per-task notifier chains will never be used outside the task's context, then a raw_notifier will work. No need for locking when by definition only a single CPU can be using the resource at any time. Nor is unregistering from within a callout a problem. Of course, this assumes you can make that guarantee. The implication is that a driver can't register or unregister for one of these chains unless the driver is running in the task's context. That makes rmmod awkward, as Chandra pointed out. Alan Stern |
From: Chandra S. <sek...@us...> - 2006-05-04 21:34:33
|
On Thu, 2006-05-04 at 14:23 -0400, Alan Stern wrote: > Alan Stern wrote: > > > I assumed that a notifier called during fork might need to block. If > > that's so then you have no choice but to use a blocking notifier. > > I forgot to mention -- if you can guarantee that these per-task notifier > chains will never be used outside the task's context, then a raw_notifier > will work. No need for locking when by definition only a single CPU can > be using the resource at any time. Nor is unregistering from within a > callout a problem. unregistering from the callout might still be a problem, since we cannot be sure that the next pointer in the freed notifier_block will be sane. > > Of course, this assumes you can make that guarantee. The implication is > that a driver can't register or unregister for one of these chains unless > the driver is running in the task's context. That makes rmmod awkward, as > Chandra pointed out. > > Alan Stern > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Paul E. M. <pa...@us...> - 2006-05-04 23:20:26
|
On Thu, May 04, 2006 at 02:34:23PM -0700, Chandra Seetharaman wrote: > On Thu, 2006-05-04 at 14:23 -0400, Alan Stern wrote: > > Alan Stern wrote: > > > > > I assumed that a notifier called during fork might need to block. If > > > that's so then you have no choice but to use a blocking notifier. > > > > I forgot to mention -- if you can guarantee that these per-task notifier > > chains will never be used outside the task's context, then a raw_notifier > > will work. No need for locking when by definition only a single CPU can > > be using the resource at any time. Nor is unregistering from within a > > callout a problem. > > unregistering from the callout might still be a problem, since we cannot > be sure that the next pointer in the freed notifier_block will be sane. If you reference-count the entire chain, then any subsequently removed notifier_block is guaranteed to remain sane. Splitting the reference count across multiple CPUs keeps the cost of reference counting quite low. Thanx, Paul > > Of course, this assumes you can make that guarantee. The implication is > > that a driver can't register or unregister for one of these chains unless > > the driver is running in the task's context. That makes rmmod awkward, as > > Chandra pointed out. > > > > Alan Stern |
From: Chandra S. <sek...@us...> - 2006-05-05 00:12:55
|
On Thu, 2006-05-04 at 16:20 -0700, Paul E. McKenney wrote: > On Thu, May 04, 2006 at 02:34:23PM -0700, Chandra Seetharaman wrote: > > On Thu, 2006-05-04 at 14:23 -0400, Alan Stern wrote: > > > Alan Stern wrote: > > > > > > > I assumed that a notifier called during fork might need to block. If > > > > that's so then you have no choice but to use a blocking notifier. > > > > > > I forgot to mention -- if you can guarantee that these per-task notifier > > > chains will never be used outside the task's context, then a raw_notifier > > > will work. No need for locking when by definition only a single CPU can > > > be using the resource at any time. Nor is unregistering from within a > > > callout a problem. > > > > unregistering from the callout might still be a problem, since we cannot > > be sure that the next pointer in the freed notifier_block will be sane. > > If you reference-count the entire chain, then any subsequently removed > notifier_block is guaranteed to remain sane. Splitting the reference > count across multiple CPUs keeps the cost of reference counting quite > low. Paul, I am not able to get my head around the reference count logic. call_chain is defined as:

static int __kprobes notifier_call_chain(struct notifier_block **nl,
                                         unsigned long val, void *v)
{
1:	int ret = NOTIFY_DONE;
2:	struct notifier_block *nb;
3:
4:	nb = rcu_dereference(*nl);
5:	while (nb) {
6:		ret = nb->notifier_call(nb, val, v);
7:		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
8:			break;
9:		nb = rcu_dereference(nb->next);
a:	}
b:	return ret;
}

Assume that the user of the notifier chain kmalloc'd a notifier_block and registered it initially. When the user's notifier_call is called from line 6 above and the user kfrees the notifier_block, how would having a reference count for the whole chain help in making sure that nb->next is sane in line 9? Or are you describing a different mechanism that doesn't use this notifier chain code? Thanks in advance for clarifying this.
chandra |
From: Alan S. <st...@ro...> - 2006-05-05 00:48:50
|
On Thu, 4 May 2006, Chandra Seetharaman wrote: > > > unregistering from the callout might still be a problem, since we cannot > > > be sure that the next pointer in the freed notifier_block will be sane. > > > > If you reference-count the entire chain, then any subsequently removed > > notifier_block is guaranteed to remain sane. Splitting the reference > > count across multiple CPUs keeps the cost of reference counting quite > > low. > > Paul, > > I am not able to get my head around to understand the reference count > logic. > > call_chain is defined as: > static int __kprobes notifier_call_chain(struct notifier_block **nl, > unsigned long val, void *v) > { > 1: int ret = NOTIFY_DONE; > 2: struct notifier_block *nb; > 3: > 4: nb = rcu_dereference(*nl); > 5: while (nb) { > 6: ret = nb->notifier_call(nb, val, v); > 7: if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > 8: break; > 9: nb = rcu_dereference(nb->next); > a: } > b: return ret; > } > > Assume that the user of the notifier chain kmalloc'd a notifier_block > and registered initially. > > When the user's notifier_call is called from line 6 above, the user > kfrees the notifier_block, how having a reference count for the whole > chain would help in making sure that nb->next is sane in line 9 ? We can always change the code:

	while (nb) {
		next_nb = rcu_dereference(nb->next);
		ret = nb->notifier_call(nb, val, v);
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
	}

Alan Stern |
From: Chandra S. <sek...@us...> - 2006-05-05 01:02:25
|
On Thu, 2006-05-04 at 20:48 -0400, Alan Stern wrote: > On Thu, 4 May 2006, Chandra Seetharaman wrote: > > > > > unregistering from the callout might still be a problem, since we cannot > > > > be sure that the next pointer in the freed notifier_block will be sane. > > > > > > If you reference-count the entire chain, then any subsequently removed > > > notifier_block is guaranteed to remain sane. Splitting the reference > > > count across multiple CPUs keeps the cost of reference counting quite > > > low. > > > > Paul, > > > > I am not able to get my head around to understand the reference count > > logic. > > > > call_chain is defined as: > > static int __kprobes notifier_call_chain(struct notifier_block **nl, > > unsigned long val, void *v) > > { > > 1: int ret = NOTIFY_DONE; > > 2: struct notifier_block *nb; > > 3: > > 4: nb = rcu_dereference(*nl); > > 5: while (nb) { > > 6: ret = nb->notifier_call(nb, val, v); > > 7: if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > > 8: break; > > 9: nb = rcu_dereference(nb->next); > > a: } > > b: return ret; > > } > > > > Assume that the user of the notifier chain kmalloc'd a notifier_block > > and registered initially. > > > > When the user's notifier_call is called from line 6 above, the user > > kfrees the notifier_block, how having a reference count for the whole > > chain would help in making sure that nb->next is sane in line 9 ? > > We can always change the code: > > while (nb) { > next_nb = rcu_dereference(nb->next); > ret = nb->notifier_call(nb, val, v); > if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > break; > nb = next_nb; > } I think we should make it that way. Also, should the assignment n->next = *nl; in

static int notifier_chain_register(struct notifier_block **nl,
                                   struct notifier_block *n)
{
	while ((*nl) != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &((*nl)->next);
	}
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	return 0;
}

be a rcu_assign_pointer?
chandra |
From: Alan S. <st...@ro...> - 2006-05-05 01:15:05
|
On Thu, 4 May 2006, Chandra Seetharaman wrote: > > We can always change the code: > > > > while (nb) { > > next_nb = rcu_dereference(nb->next); > > ret = nb->notifier_call(nb, val, v); > > if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > > break; > > nb = next_nb; > > } > > I think we should make it that way. > > Also, should the assignment n->next = *nl; in > > static int notifier_chain_register(struct notifier_block **nl, > struct notifier_block *n) > { > while ((*nl) != NULL) { > if (n->priority > (*nl)->priority) > break; > nl = &((*nl)->next); > } > n->next = *nl; > rcu_assign_pointer(*nl, n); > return 0; > } > > be a rcu_assign_pointer ? No, it doesn't need to be. What rcu_assign_pointer does is add a memory barrier just before the assignment. So for example, we don't want some other CPU seeing the *nl = n assignment _before_ it sees the n->next = *nl assignment. That's what the rcu_assign_pointer macro prevents. But since there are no global assignments before n->next = *nl, there doesn't need to be a memory barrier in front of it. Alan Stern |