From: Matt H. <mat...@us...> - 2006-05-05 07:08:50
|
On Thu, 2006-05-04 at 15:21 +0200, Jes Sorensen wrote:
> Matt Helsley wrote:
> > On Tue, 2006-05-02 at 11:45 +0200, Jes Sorensen wrote:
> >> so focused on memory consumption. However the real case is that you are
> >> likely to have maybe 2-3 notifiers per task, since what the per task
> >> notifier chain solves is also it gets rid of a ton of rarely used per
> >> task callouts that are currently hard coded.
> >
> > I wonder if you're over-exaggerating what it can get rid of. They seem
> > useful, but "a ton"..
>
> Why write it with a pen if you can have it on a neon sign? Right now
> we have a lot of things in the fork path which is to support what I'd
> qualify as uninteresting for the majority of users, such as
> security_task_alloc, audit_alloc, copy_keys ... unfortunately the
> distros have become obsessed with the selinux/apparmor stuff so
> everybody is being force-fed it now :( Anyway probably better to
> discuss over beer :)

Ahh, interesting. I don't know selinux and apparmor well enough to
determine if they can be factored out as a per-task notifier block or
not.

I find writing with a pen is faster than making a neon sign. ;)

<snip>

> > I distinctly recall that you used a boot-time anecdote -- which is a
> > distinctly different set of paths than we're talking about here -- to
> > justify your argument.
>
> The boot time case was just an example of what happens when a lock is
> too contested. The problem in that case wasn't specific to booting; even
> if one managed to boot the machine, the cost of bouncing cache lines
> around can be quite high if one does it too much.
>
> > I'm saying we should go with the simpler solution until you can
> > empirically show that a more complex solution is needed. Perhaps you
> > could test this patch or something like it on your 512-way to show what
> > kind of impact it has:
> >
> > http://lkml.org/lkml/2006/1/12/379
> >
> > Then if you find there are problems we can choose a more complex
> > solution for the global notifier chain.
>
> Note that what was requested recently was a blocking semaphore, not
> a spin lock. In addition it is going to depend on the actual load; for
> the case where one launches NR_CPUS jobs and they all run for hours it's
> not going to cost a lot. The point I am after is that we know that locks
> are expensive, we should stop going around adding more unless we
> absolutely have to.

The lock patch was intended to get some idea of the lock contention
and/or cache line bouncing that would be seen. You could replace the
lock with a semaphore and make it more accurate -- which is why I said
"something like it" :).

I disagree with your criteria for when to use locks. My criteria would
take code complexity into account. I would trade minuscule performance
differences for significantly simpler code.

> Another example I can mention, which wasn't an actual lock, just a hot
> cacheline, was back when I wrote the acenic GigE driver. Going from a
> global SLAB pointer to per-CPU gave a 10% net performance increase on
> a 2 CPU 400MHz P2 box. You don't need a 512 CPU machine to see the
> effects of this, it's all about the conditions.

A nice improvement. Please consider your points (that locks aren't free
and cache lines can be hot) well-taken. I don't think you've connected
these examples with the logic needed to support your assertion. You
need to show that the paths in your anecdotes and the paths we're
considering are very similar. Otherwise, while interesting, they are
irrelevant to our discussion.

Cheers,
	-Matt Helsley
|
From: Jes S. <je...@sg...> - 2006-05-08 08:44:29
|
Matt Helsley wrote:
> On Thu, 2006-05-04 at 15:21 +0200, Jes Sorensen wrote:
>> Note that what was requested recently was a blocking semaphore, not
>> a spin lock. In addition it is going to depend on the actual load; for
>> the case where one launches NR_CPUS jobs and they all run for hours it's
>> not going to cost a lot. The point I am after is that we know that locks
>> are expensive, we should stop going around adding more unless we
>> absolutely have to.
>
> The lock patch was intended to get some idea of the lock contention
> and/or cache line bouncing that would be seen. You could replace the
> lock with a semaphore and make it more accurate -- which is why I said
> "something like it" :).

Hi Matt,

It'd be worth trying to see how it shows up in lmbench. If I can find
some time and some time on a large box I might give it a whirl.

> I disagree with your criteria for when to use locks. My criteria would
> take code complexity into account. I would trade minuscule performance
> differences for significantly simpler code.

The problem I am trying to raise awareness of is that there are a lot
of people out there who seem to think that if something doesn't show on
a 2-way it doesn't show at all. I don't think you're saying that, but
one does get a bit thick-skulled after a while.

> I don't think you've connected these examples with the logic needed to
> support your assertion. You need to show that the paths in your
> anecdotes and the paths we're considering are very similar. Otherwise,
> while interesting, they are irrelevant to our discussion.

The paths here are definitely different, but we should always bear the
cost in mind. Locks are good for some things, but one should never add
one without considering the impact.

For the fork/exit path we are talking about we should expect to take
the lock twice for each task run. For some runs this clearly won't be
measurable, but for others it will.

Cheers,
Jes
|
From: Jes S. <je...@sg...> - 2006-05-02 09:27:54
|
Chandra Seetharaman wrote:
> On Mon, 2006-05-01 at 12:07 +0200, Jes Sorensen wrote:
>> that someone will unregister a file system like that while the task that
>> accessed it is still live? The cost of adding locking is far worse than
>> the above IMHO fairly artificial case.
>
> even if you consider this as an artificial case, don't you think the
> kernel code should work properly ?

There are many cases where the kernel is optimized for the common path
and the obscure path is penalized. This is exactly what I am arguing
here.

> 1: static int __kprobes notifier_call_chain(struct notifier_block **nl,
> 2:                 unsigned long val, void *v)
> 3: {
> 4:         int ret = NOTIFY_DONE;
> 5:         struct notifier_block *nb;
> 6:
> 7:         nb = rcu_dereference(*nl);
> 8:         while (nb) {
> 9:                 ret = nb->notifier_call(nb, val, v);
> a:                 if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
> b:                         break;
> c:                 nb = rcu_dereference(nb->next);
> d:         }
> e:         return ret;
> f: }
>
> line 7 gets nb, line c resets nb with nb->next, if nb is freed in
> notifier_call on line 9, how can we be sure that nb->next in line c is
> sane ?
>
> Hence, IMO you cannot free the notifier block from the callback itself.

I was just sitting looking at this, and you're right. This is a very
unfortunate side-effect of RCU'ifying the notifier code. Being able to
have a cleanup notifier run makes a ton of sense, so maybe there ought
to be a special notifier call run for this or a flag indicating that
the notifier block should be kfree'ed after the rcu_dereference. Adding
Alan Stern to the CC since he wrote the current notifier
implementation.

Alan, any suggestions?

>>> even though the notifier block is per task, notifier_head in the task is
>>> global, in other words, there can be many users (of the task notify
>>> mechanism) that can be doing one of the following at the _same time_
>>> with a single notifier head:
>>> - register a new notifier block
>>> - unregister a notifier block
>>> - walk through the call chain in notifier_call_chain
>>
>> Nobody will walk the notifier chain outside of task context, anything
>> doing that is broken. In addition, since the registration is done by
>> the task itself this isn't a problem and unregistration is done at
>> exit time.
>
> So your model is based on all of the above operations (register,
> unregister and call_chain) happening in the 'said' task's context.
> right ?

Correct.

> I have a module that is interested in few tasks in the system. When my
> module is insmoded, how do i get into each process's context to register
> my notifier_block ?

As I said before, you launch it the way cpusets do it and all children
will inherit it. However if we can find a solution that allows us to
use the atomic's RCU approach then I think that will be a reasonable
compromise. The read-write semaphore is just too nasty to add for it.

>> It's totally valid for the mechanism. If you wish to apply accounting or
>> resource control to the system, then it's to be expected that this is
>> enabled by the init task and not at random later. It doesn't mean the
>> module has to be in use at all times, but once you insmod your accounting
>> then it's on the system. Hence the cpusets comparison makes sense IMHO.
>
> kernel does not impose "once you insmod your accounting then it's on the
> system". So, we _should_ not have that as an assumption.

Since when were you able to rmmod the bsd accounting subsystem?

> Task notify should be able to handle insmod and rmmod of a module
> cleanly.

I have already given you a way to do this without incurring
unreasonable cost. This argument is artificial and doesn't match actual
use of modules. In fact, when the new module support was implemented
there was a strong argument for disallowing module unload altogether.

>> For things that want to know about every task in the system it doesn't
>> make sense to register it with a task notifier. It should rather be put
>> in with a fixed callout in the path or there should be two levels of
>> notifier chains, global notifiers and task notifiers.
>
> So, you are suggesting that there should be two kind of notifiers, one
> for global notification and one for per task ?

I think we're talking about two totally different things. A global
notifier chain as you suggest is not feasible for per-task notification
as it would cripple performance completely. However per-task
notification is probably not suitable for other things.

> Still need to resolve the register/unregister/locking above for task
> notify.

Correct.

>> Pardon my ignorance, but what is MVFS? What does it do and how does it
>> rely on this feature.
>
> In an earlier thread on this same topic, John Kohl mentioned about it.
> Here is the link
> http://sourceforge.net/mailarchive/message.php?msg_id=13827470
>
> John can provide more info.

Well, reading John's posting it looks like per-task notification like I
propose would match exactly what he is asking for.

Cheers,
Jes
|
From: Alan S. <st...@ro...> - 2006-05-02 15:37:17
|
On Tue, 2 May 2006, Jes Sorensen wrote:
> I was just sitting looking at this, and you're right. This is a very
> unfortunate side-effect of RCU'ifying the notifier code. Being able to
> have a cleanup notifier run makes a ton of sense, so maybe there ought
> to be a special notifier call run for this or a flag indicating that the
> notifier block should be kfree'ed after the rcu_dereference. Adding Alan
> Stern to the CC since he wrote the current notifier implementation.
>
> Alan, any suggestions?

I think Matt may be on the right path in

http://marc.theaimsgroup.com/?l=lse-tech&m=114654571406352&w=2

although I don't understand most of what he says.

I gather that a large part of the problem comes from the usage patterns
you expect. There will be some task_notifier users that are interested
in every task in the system, and there will be others that are
interested in only a handful of tasks. This certainly suggests using
two different registration mechanisms and data structures, even though
the form of the callouts might be the same for the two cases.

The first case is easy. There can be a single global blocking_notifier
chain (kind of like Chandra's original suggestion but not atomic) that
gets called out on every interesting event. Anybody who cares about
every task in the system registers on this chain.

Locking shouldn't be an issue. The blocking notifier implementation
uses rw-semaphores, which are pretty efficient in the no-writers case.
(The cost is essentially that of a cache miss and a memory barrier --
pretty small compared to all the other overhead of task creation or
removal.) The only time a write-lock would be needed is when someone is
registering or unregistering on the chain, which won't happen very
often.

The second case (people interested in some tasks but not all) is more
complicated. Even before trying to decide on an implementation, there
are some obvious questions. If someone is interested in only some
tasks, how does he know which tasks to be interested in? How does he
know when he wants to start being interested in a new task? How does he
know when he wants to stop being interested in a task (other than the
obvious cases of task exit or module unloading)? If he's interested in
a task, is he also automatically interested in the task's children?

The best approach will depend on the answers to these and possibly
other related questions.

Alan Stern
|
From: Jes S. <je...@sg...> - 2006-05-04 13:00:35
|
Alan Stern wrote:
> On Tue, 2 May 2006, Jes Sorensen wrote:
>
>> I was just sitting looking at this, and you're right. This is a very
>> unfortunate side-effect of RCU'ifying the notifier code. Being able to
>> have a cleanup notifier run makes a ton of sense, so maybe there ought
>> to be a special notifier call run for this or a flag indicating that the
>> notifier block should be kfree'ed after the rcu_dereference. Adding Alan
>> Stern to the CC since he wrote the current notifier implementation.
>>
>> Alan, any suggestions?
>
> I think Matt may be on the right path in
>
> http://marc.theaimsgroup.com/?l=lse-tech&m=114654571406352&w=2
>
> although I don't understand most of what he says.

Hi Alan,

As I wrote in my response to that posting he isn't .... or rather we're
onto oranges and apples. Two different problems we're trying to solve,
and while a solution for one of the problems can be abused to solve the
other problem, it is going to be very expensive doing so.

> I gather that a large part of the problem comes from the usage patterns
> you expect. There will be some task_notifier users that are interested
> in every task in the system, and there will be others that are interested
> in only a handful of tasks. This certainly suggests using two different
> registration mechanisms and data structures, even though the form of the
> callouts might be the same for the two cases.

Correct.

> The first case is easy. There can be a single global blocking_notifier
> chain (kind of like Chandra's original suggestion but not atomic) that
> gets called out on every interesting event. Anybody who cares about every
> task in the system registers on this chain.
>
> Locking shouldn't be an issue. The blocking notifier implementation uses
> rw-semaphores, which are pretty efficient in the no-writers case. (The
> cost is essentially that of a cache miss and a memory barrier -- pretty
> small compared to all the other overhead of task creation or removal.)
> The only time a write-lock would be needed is when someone is registering
> or unregistering on the chain, which won't happen very often.

I have to correct you here. Locking on a global notifier is a major
issue. There is a misconception that rwlocks are free as long as one is
a reader. This is not the case: you still need to go on the bus with an
atomic operation, resulting in the cache line being bounced around. On
small systems you probably won't notice it, but on a large one which
runs many small tasks, eg. forks constantly, it will become noticeable.

This is why I am saying that any global notifier really needs to use
the RCU version and not the blocking lock version, but that may have
impact on memory allocation etc.

> The second case (people interested in some tasks but not all) is more
> complicated. Even before trying to decide on an implementation, there are
> some obvious questions. If someone is interested in only some tasks, how
> does he know which tasks to be interested in? How does he know when he
> wants to start being interested in a new task? How does he know when he
> wants to stop being interested in a task (other than the obvious cases
> of task exit or module unloading)? If he's interested in a task, is he
> also automatically interested in the task's children?

Which tasks are of interest is either inherited, in which case we can
do it completely lockless, or it will be explicitly decided by another
user app by doing a syscall requesting a certain feature to be attached
to a given task.

For when to stop being interested, my argument is that it is sufficient
that the user is interested until the task dies. However if really
needed this could be done via RCU ... but then that has other
unpleasant side effects.

As for the children, that is up to the given client, which it will
determine by providing a fork() hook.

The reason I pulled you onto the CC list is the issue of how to solve
the problem of being able to unregister yourself from a notifier list
from the notifier callout. The use of RCU in the core notifier
implementation makes this really difficult, so the question is should
one just bite the bullet and do a special notifier implementation for
this, or can we modify the code to handle it, eg. by having the
notifier callout have a special flag in the return value requesting the
code to unregister and free the notifier block? Or does it make more
sense to have a notifier_chain_drain() call that unregisters and frees
all?

Cheers,
Jes
|
From: Alan S. <st...@ro...> - 2006-05-04 17:27:26
|
On Thu, 4 May 2006, Jes Sorensen wrote:
> > The first case is easy. There can be a single global blocking_notifier
> > chain (kind of like Chandra's original suggestion but not atomic) that
> > gets called out on every interesting event. Anybody who cares about every
> > task in the system registers on this chain.
> >
> > Locking shouldn't be an issue. The blocking notifier implementation uses
> > rw-semaphores, which are pretty efficient in the no-writers case. (The
> > cost is essentially that of a cache miss and a memory barrier -- pretty
> > small compared to all the other overhead of task creation or removal.)
> > The only time a write-lock would be needed is when someone is registering
> > or unregistering on the chain, which won't happen very often.
>
> I have to correct you here. Locking on a global notifier is a major
> issue. There is a misconception that rwlocks are free as long as one
> is a reader. This is not the case, you still need to go on the bus
> with an atomic operation resulting in the cache line being bounced
> around.

That's pretty much what I said, isn't it?

> On small systems you probably won't notice it, but on a large
> one which runs many small tasks, eg. forks constantly, it will become
> noticeable.

Isn't a cache-line bounce still very small compared to all the overhead
of a fork or an exit?

> This is why I am saying that any global notifier really needs to use
> the RCU version and not the blocking lock version, but that may have
> impact on memory allocation etc.

I assumed that a notifier called during fork might need to block. If
that's so then you have no choice but to use a blocking notifier.
However, if all of the callouts on the chain can run in an atomic
context then an atomic notifier (with RCU) will work okay. It's up to
you and the users to decide which best suits your needs.

> > The second case (people interested in some tasks but not all) is more
> > complicated. Even before trying to decide on an implementation, there are
> > some obvious questions. If someone is interested in only some tasks, how
> > does he know which tasks to be interested in? How does he know when he
> > wants to start being interested in a new task? How does he know when he
> > wants to stop being interested in a task (other than the obvious cases
> > of task exit or module unloading)? If he's interested in a task, is he
> > also automatically interested in the task's children?
>
> Which tasks are of interest is either inherited, in which case we can
> do it completely lockless, or it will be explicitly decided by another
> user app by doing a syscall requesting a certain feature to be attached
> to a given task.
>
> For when to stop being interested my argument is that it is sufficient
> that the user is interested until the task dies. However if really
> needed this could be done via RCU ... but then that has other unpleasant
> side effects.

This was one of the issues you and Chandra disagreed about. I don't
know what's right. I don't even know how a driver can tell when to stop
being interested in some task.

> As for the children that is up to the given client, which it will
> determine by providing a fork() hook.
>
> The reason I pulled you onto the CC list is the issue of how to solve
> the problem of being able to unregister yourself from a notifier list
> from the notifier callout? The use of RCU in the core notifier
> implementation makes this really difficult, so the question is should
> one just bite the bullet and do a special notifier implementation for
> this or can we modify the code to handle it, eg. by having the notifier
> callout have a special flag in the return value requesting the code to
> unregister and free the notifier block? Or does it make more sense to
> have a notifier_chain_drain() call that unregisters and frees all?

Well, this is why I asked all those questions. It's not clear to me
that what you're doing really matches the capabilities of a
notifier_chain, even apart from the issue of unregistering.

(Note that the general question of how to implement something like a
notifier chain while allowing unregistration from within a callout is
extremely difficult, regardless of the tricks you play. In fact it's
unsolvable without extensive locking and synchronization, which I'm
sure you want to avoid.)

However... This is how it looks right now. You could use notifier
chains, with one chain per process. That chain would contain
notifier_blocks for all the drivers interested in the process (but not
interested in all processes). It would have to be a very unusual usage
pattern because all the notifier_block structures would be allocated
dynamically.

As for unregistering, I don't see why a driver would want to unregister
from one of these chains during a callout. The only event that comes to
mind is "exit", but there's no need to unregister then because the
entire chain is about to be destroyed.

But let's assume some driver does indeed have a reason to stop being
interested in a particular process. Here's how it can be done without
trying to unregister during a callout. When the notifier_block
structures are allocated dynamically, allocate them as part of a larger
structure that includes a driver_flags field. When a driver is no
longer interested in a process, it sets the driver_flags for the
corresponding notifier_block to a non-zero value. Then at the start of
the callout routine, it tests the driver_flags and returns immediately
if the value is non-zero. If the driver still wants to, it can use
schedule_work() to unregister itself permanently from the chain.

Alan Stern
|
From: <jt...@us...> - 2006-05-04 18:57:45
|
>>>>> "Alan" == Alan Stern <st...@ro...> writes:

Alan> I assumed that a notifier called during fork might need to block.

That's certainly the case for ClearCase MVFS--we'd need to allocate
additional state for the new child process, and that could block
waiting for memory.

-- 
John Kohl
Senior Software Engineer - Rational Software - IBM Software Group
Lexington, Massachusetts, USA
jt...@us... <http://www.ibm.com/software/rational/>
|
From: Jes S. <je...@sg...> - 2006-05-05 08:02:28
|
John T. Kohl wrote:
>>>>>> "Alan" == Alan Stern <st...@ro...> writes:
>
> Alan> I assumed that a notifier called during fork might need to block.
>
> That's certainly the case for ClearCase MVFS--we'd need to allocate
> additional state for the new child process, and that could block waiting
> for memory.

But you also mentioned that you need per-task data associated with it,
is that correct? In that case Jack's original task notifier
implementation probably would suit you far better.

Cheers,
Jes
|
From: Christoph H. <hc...@ls...> - 2006-05-08 07:55:55
|
On Thu, May 04, 2006 at 02:57:18PM -0400, John T. Kohl wrote:
> >>>>> "Alan" == Alan Stern <st...@ro...> writes:
>
> Alan> I assumed that a notifier called during fork might need to block.
>
> That's certainly the case for ClearCase MVFS--we'd need to allocate
> additional state for the new child process, and that could block waiting
> for memory.

Any such interface would be a _GPL export anyway, so please sod off
here with your illegal crap.
|
From: Jes S. <je...@sg...> - 2006-05-05 08:01:06
|
Alan Stern wrote:
> On Thu, 4 May 2006, Jes Sorensen wrote:
>> I have to correct you here. Locking on a global notifier is a major
>> issue. There is a misconception that rwlocks are free as long as one
>> is a reader. This is not the case, you still need to go on the bus
>> with an atomic operation resulting in the cache line being bounced
>> around.
>
> That's pretty much what I said, isn't it?

Hi Alan,

Maybe I misread you; my point was merely to underline the fact that
locks aren't cheap at all, in particular the rwlocks.

>> On small systems you probably won't notice it, but on a large
>> one which runs many small tasks, eg. forks constantly, it will become
>> noticeable.
>
> Isn't a cache-line bounce still very small compared to all the overhead
> of a fork or an exit?

It probably is relatively small, however adding another expensive
contention point just because there already are a few really isn't that
good an idea.

>> This is why I am saying that any global notifier really needs to use
>> the RCU version and not the blocking lock version, but that may have
>> impact on memory allocation etc.
>
> I assumed that a notifier called during fork might need to block. If
> that's so then you have no choice but to use a blocking notifier.
> However, if all of the callouts on the chain can run in an atomic
> context then an atomic notifier (with RCU) will work okay. It's up to you
> and the users to decide which best suits your needs.

The per-task notification that we are proposing doesn't have this
problem because it is only ever called in task context; it can block
just as anything else in that path is allowed to do.

>> The reason I pulled you onto the CC list is the issue of how to solve
>> the problem of being able to unregister yourself from a notifier list
>> from the notifier callout? The use of RCU in the core notifier
>> implementation makes this really difficult, so the question is should
>> one just bite the bullet and do a special notifier implementation for
>> this or can we modify the code to handle it, eg. by having the notifier
>> callout have a special flag in the return value requesting the code to
>> unregister and free the notifier block? Or does it make more sense to
>> have a notifier_chain_drain() call that unregisters and frees all?
>
> Well, this is why I asked all those questions. It's not clear to me that
> what you're doing really matches the capabilities of a notifier_chain,
> even apart from the issue of unregistering.

I am coming to the conclusion that the generic notifier chain code
probably isn't suited for this :( It had been my initial hope that we
could share the code, but the RCU'ification actually prevents it, at
least in its current form.

> (Note that the general question of how to implement something like a
> notifier chain while allowing unregistration from within a callout is
> extremely difficult, regardless of the tricks you play. In fact it's
> unsolvable without extensive locking and synchronization, which I'm sure
> you want to avoid.)

It's not that difficult as long as you have the right constraints
around it, eg. if it is only ever accessible from the task's own
context then you can get away with it totally lock free.

> As for unregistering, I don't see why a driver would want to unregister
> from one of these chains during a callout. The only event that comes to
> mind is "exit", but there's no need to unregister then because the entire
> chain is about to be destroyed.

You often want to have an exit call performed first, so you need the
callout before destroying the chain. One way to do it would be two
functions: first a normal callout, then a drain/destroy call that
flushes the chain and releases it all.

Cheers,
Jes
|
From: Alan S. <st...@ro...> - 2006-05-05 15:14:18
|
On Fri, 5 May 2006, Jes Sorensen wrote:
> Hi Alan,
>
> Maybe I misread you, my point was merely to underline the fact that
> locks aren't cheap at all, in particular the rwlocks.
>
>>> On small systems you probably won't notice it, but on a large
>>> one which runs many small tasks, eg. forks constantly, it will become
>>> noticeable.
>>
>> Isn't a cache-line bounce still very small compared to all the overhead
>> of a fork or an exit?
>
> It probably is relatively small, however adding another expensive
> contention point just because there already are a few really isn't
> that good an idea.

Basically what you need is a variant rw-semaphore that has no
cache-bounce overhead in the readers-only case. Such a thing could be
implemented using a per-CPU reader count. This would solve part of your
problem (the global chain of drivers who are interested in every task).

Somebody else once complained about the overhead of using rw-semaphores
for blocking notifiers. He seemed to be more concerned about the cost
of the memory barriers than the cost of the cache-line bouncing. I
can't think of any way to eliminate barriers when coordinating multiple
CPUs...

>> I assumed that a notifier called during fork might need to block. If
>> that's so then you have no choice but to use a blocking notifier.
>> However, if all of the callouts on the chain can run in an atomic
>> context then an atomic notifier (with RCU) will work okay. It's up to you
>> and the users to decide which best suits your needs.
>
> The per-task notification that we are proposing doesn't have this
> problem because it is only ever called in task context, it can block
> just as anything else in that path is allowed to do.

Which rules out using RCU, since you're not allowed to block while
holding an rcu_read_lock.

> I am coming to the conclusion that the generic notifier chain code
> probably isn't suited for this :( It had been my initial hope that we
> could share the code, but the RCU'ification actually prevents it, at
> least in its current form.

Perhaps. It may not be very easy to find anything better, however.

>> (Note that the general question of how to implement something like a
>> notifier chain while allowing unregistration from within a callout is
>> extremely difficult, regardless of the tricks you play. In fact it's
>> unsolvable without extensive locking and synchronization, which I'm sure
>> you want to avoid.)
>
> It's not that difficult as long as you have the right constraints around
> it, eg. if it is only ever accessible from the task's own context then
> you can get away with it totally lock free.

Unregistration for rmmod becomes very difficult. How does the driver
get into the contexts of all the tasks it is interested in, so that it
can unregister?

Also, this constraint is obviously impossible to satisfy for drivers
that are interested in every task.

> > As for unregistering, I don't see why a driver would want to unregister
> > from one of these chains during a callout. The only event that comes to
> > mind is "exit", but there's no need to unregister then because the entire
> > chain is about to be destroyed.
>
> You often want to have an exit call performed first so you need the
> callout before destroying the chain.

I don't understand. You want to have an exit call performed first
before what? And of course you have to perform the callouts before
destroying the chain -- without the chain you can't do the callouts at
all!

> One way to do it would be two
> functions, first a normal callout then a drain/destroy call that flushes
> the chain and releases it all.

That's what I suggested above. Except for draining/flushing the chain;
I don't know what you mean by that. Do you mean waiting for all users
of the chain to finish? Why would you need to wait if the chain is
never accessed outside the task's context?

> One way to get around that would be to do something like this. However
> I am starting to think that having a notifier_chain_destroy might be a
> better way to go to avoid calling notifier_chain_unregister repeatedly
> just to wipe things out. Patch #2 of the attached.

I don't follow your patches. For one thing, your
NOTIFIER_UNREGISTER_FREE patch will oops because the code holds an
rcu_read_lock() while calling synchronize_rcu(). You could move
synchronize_rcu outside the locked region -- but that begs the question
of why you have synchronize_rcu there at all. What good does it do?

For another, I don't understand why you modified the atomic notifier
routines. Didn't you say that your callouts will need to be able to
block? That makes them not atomic, by definition, and not RCU-able.

And I don't think it's a good idea to add atomic_notifier_chain_drain
to kernel/sys.c. It's not part of the regular notifier infrastructure;
it's something special for your individual application.

Alan Stern
|
From: Jes S. <je...@sg...> - 2006-05-05 15:22:11
|
Alan Stern wrote: > Basically what you need is a variant rw-semaphore that has no cache-bounce > overhead in the readers-only case. Such a thing could be implemented > using a per-CPU reader count. This would solve part of your problem (the > global chain of drivers who are interested in every task). > > Somebody else once complained about the overhead of using rw-semaphores > for blocking notifiers. He seemed to be more concerned about the cost of > the memory barriers than the cost of the cache-line bouncing. I can't > think of any way to eliminate barriers when coordinating multiple CPUs... Well the memory barriers basically enforce things hitting the bus, so it's the same thing. >> The per-task notification that we are proposing doesn't have this >> problem because it is only ever called in task context, it can block >> just as anything else in that path is allowed to do. > > Which rules out using RCU, since you're not allowed to block while holding > an rcu_read_lock. Correct, Jack's original patches didn't require RCU. I was trying to share code, but it seems that's not an option. >> It's not that difficult as long as you have the right constraints around >> it, eg. if it is only ever accessible from the task's own context then >> you can get away with it totally lock free. > > Unregistration for rmmod becomes very difficult. How does the driver get > into the contexts of all the tasks it is interested in, so that it can > unregister? > > Also, this constraint is obviously impossible to satisfy for drivers > that are interested in every task. The drivers interested in every task would benefit from a global chain as opposed to a per-task one. Anything registered per task will need to wait until the task exits, which isn't an unreasonable requirement. >>> As for unregistering, I don't see why a driver would want to unregister >>> from one of these chains during a callout.
The only event that comes to >>> mind is "exit", but there's no need to unregister then because the entire >>> chain is about to be destroyed. >> You often want to have an exit call performed first so you need the >> callout before destroying the chain. > > I don't understand. You want to have an exit call performed first before > what? And of course you have to perform the callouts before destroying > the chain -- without the chain you can't do the callouts at all! Before the task is destroyed. Which is why one would do a last callout, then destroy the chain. >> One way to do it would be two >> functions, first a normal callout then a drain/destroy call that flushes >> the chain and releases it all. > > That's what I suggested above. Except for draining/flushing the chain; I > don't know what you mean by that. Do you mean waiting for all users of > the chain to finish? Why would you need to wait if the chain is never > accessed outside the task's context? No I mean by releasing all notifier blocks in the chain, assuming they were all kmalloc'ed (which one would expect if there's a registration per task). >> One way to get around that would be to do something like this. However >> I am starting to think that having a notifier_chain_destroy might be a >> better way to go to avoid calling notifier_chain_unregister repeatedly >> just to wipe things out. Patch #2 of the attached. > > I don't follow your patches. > > For one thing, your NOTIFIER_UNREGISTER_FREE patch will oops because the > code holds an rcu_read_lock() while calling synchronize_rcu(). You could > move synchronize_rcu outside the locked region -- but that begs the > question of why you have synchronize_rcu there at all. What good does it > do? > > For another, I don't understand why you modified the atomic notifier > routines. Didn't you say that your callouts will need to be able to > block? That makes them not atomic, by definition, and not RCU-able. 
The patches were meant more to show the idea, I didn't claim they were final production quality<tm>. But you're right, doing it in the raw chain would make more sense. > And I don't think it's a good idea to add atomic_notifier_chain_drain to > kernel/sys.c. It's not part of the regular notifier infrastructure; it's > something special for your individual application. It happens to be applicable to this application, but I don't see why it wouldn't work for other cases, hence placing it in kernel/sys.c Jes |
From: Alan S. <st...@ro...> - 2006-05-05 18:24:04
|
On Fri, 5 May 2006, Jes Sorensen wrote: > > Somebody else once complained about the overhead of using rw-semaphores > > for blocking notifiers. He seemed to be more concerned about the cost of > > the memory barriers than the cost of the cache-line bouncing. I can't > > think of any way to eliminate barriers when coordinating multiple CPUs... > > Well the memory barriers basically enforce things hitting the bus, so > it's the same thing. No it isn't. Things will hit the bus eventually, whether the barrier is there or not. All the barrier does is slow down the pipeline somewhat. > The drivers interested in every task would benefit from a global chain > as opposed to a per-task one. Anything registered per task > will need to wait until the task exits, which isn't an unreasonable > requirement. Okay, I'll take your word for it. Of course, there might be situations where a driver _is_ in a task's context and wants to unregister from that task's chain. Such a possibility should be allowed for. Then the driver wouldn't have to wait for the task to exit. > > That's what I suggested above. Except for draining/flushing the chain; I > > don't know what you mean by that. Do you mean waiting for all users of > > the chain to finish? Why would you need to wait if the chain is never > > accessed outside the task's context? > > No I mean by releasing all notifier blocks in the chain, assuming they > were all kmalloc'ed (which one would expect if there's a registration > per task). That's what I meant by "destroying the chain". "Flushing/draining" would mean something else. Never mind; I think we're in agreement. > > I don't follow your patches. ... > The patches were meant more to show the idea, I didn't claim they were > final production quality<tm>. But you're right, doing it in the raw > chain would make more sense. All right then.
How do you feel about combining a global raw_notifier chain (for drivers interested in all tasks) that is protected by a "no-cache-bounce-for-readers" rw-semaphore together with a per-task raw_notifier chain that is protected by nothing (accessed only from within the task's context)? You'd have to write an API for registering and unregistering on each type of chain, including dynamic allocation of the notifier_blocks, plus an API for invoking the chains (easy) and one for destroying the per-task chains (called only from do_exit or one of its subordinates). > > And I don't think it's a good idea to add atomic_notifier_chain_drain to > > kernel/sys.c. It's not part of the regular notifier infrastructure; it's > > something special for your individual application. > > It happens to be applicable to this application, but I don't see why > it wouldn't work for other cases, hence placing it in kernel/sys.c My feeling for now is keep it separate, since it's just a part of your private API. If other parts of the kernel want to use it then it can be moved later. The "no-cache-bounce" rw-sem, on the other hand, is a candidate for mainlining. We could see what Andrew Morton and others think about it. Alan Stern |
From: Jes S. <je...@sg...> - 2006-05-08 08:31:16
|
Alan Stern wrote: > On Fri, 5 May 2006, Jes Sorensen wrote: >> The patches were meant more to show the idea, I didn't claim they were >> final production quality<tm>. But you're right, doing it in the raw >> chain would make more sense. > > All right then. How do you feel about combining a global raw_notifier > chain (for drivers interested in all tasks) that is protected by a > "no-cache-bounce-for-readers" rw-semaphore together with a per-task > raw_notifier chain that is protected by nothing (accessed only from within > the task's context)? How do you anticipate achieving that no cache bounce for readers? Whenever a reader takes a semaphore it still has to go and update the semaphore; when the next reader comes in on a different CPU, that reader still has to pull over the cache line to update it. This is why I keep pointing out that rw locks are no better when it comes to cache impact! > You'd have to write an API for registering and > unregistering on each type of chain, including dynamic allocation of the > notifier_blocks, plus an API for invoking the chains (easy) and one for > destroying the per-task chains (called only from do_exit or one of its > subordinates). I'd go back to Jack's original patch for this; it shouldn't be too big a deal. >> It happens to be applicable to this application, but I don't see why >> it wouldn't work for other cases, hence placing it in kernel/sys.c > > My feeling for now is keep it separate, since it's just a part of your > private API. If other parts of the kernel want to use it then it can be > moved later. It would never be part of any private API; it would be for 'public' use, EXPORT_SYMBOL_GPL naturally. > The "no-cache-bounce" rw-sem, on the other hand, is a candidate for > mainlining. We could see what Andrew Morton and others think about it. I am very curious how you expect to achieve this .... Jes |
From: Alan S. <st...@ro...> - 2006-05-08 18:34:57
|
On Mon, 8 May 2006, Jes Sorensen wrote: > > All right then. How do you feel about combining a global raw_notifier > > chain (for drivers interested in all tasks) that is protected by a > > "no-cache-bounce-for-readers" rw-semaphore together with a per-task > > raw_notifier chain that is protected by nothing (accessed only from within > > the task's context)? > > How do you anticipate that no-cache-bounce-for-readers? Whenever a > reader takes a semaphore they still have to go and update the semaphore, > when the next reader comes in on a different CPU that reader still has > to pull over the cacheline to update it. In a word: per-cpu reader counts. > This is why I keep pointing out > that rw locks are no better when it comes to cache impact! Per-cpu data structures were devised specifically to reduce or eliminate cache impact. > > The "no-cache-bounce" rw-sem, on the other hand, is a candidate for > > mainlining. We could see what Andrew Morton and others think about it. > > I am very curious how you expect to achieve this .... An initial implementation is attached. Let me know what you think of it. Alan Stern |
From: Jes S. <je...@sg...> - 2006-05-11 14:37:47
|
Alan Stern wrote: > On Mon, 8 May 2006, Jes Sorensen wrote: >>> The "no-cache-bounce" rw-sem, on the other hand, is a candidate for >>> mainlining. We could see what Andrew Morton and others think about it. >> I am very curious how you expect to achieve this .... > > An initial implementation is attached. Let me know what you think of it. Ok, had to chew on this one a couple of times ... clever :) Obviously not a generic replacement for rwlocks but for the case where it's write-once, read-unlimited, like with the global notifier chain, then I think it could work well. Obviously the write case is going to be "interesting" on a system with 512 CPUs :) I don't think it makes sense to provide the write_trylock version since one really shouldn't be using the qrwsem like that, and I think we should avoid encouraging people using them if they are not absolutely sure they need them for their code. Would be interesting to investigate if there are other cases where we could benefit from this. Cheers, Jes |
From: Alan S. <st...@ro...> - 2006-05-11 17:43:54
|
On Thu, 11 May 2006, Jes Sorensen wrote: > > An initial implementation is attached. Let me know what you think of it. > > Ok, had to chew on this one a couple of times ... clever :) > > Obviously not a generic replacement for rwlocks but for the case where > it's write-once, read-unlimited, like with the global notifier chain, > then I think it could work well. Obviously the write case is going to be > "interesting" on a system with 512 CPUs :) Especially since the global notifier chain will block all forks and exits until it acquires the lock! Better make sure none of the callouts try to start a new thread or wait for an old one to exit. (In general, a callout shouldn't do anything that would invoke the notifier chain recursively -- it can lead to deadlock.) > I don't think it makes sense to provide the write_trylock version since > one really shouldn't be using the qrwsem like that, and I think we > should avoid encouraging people using them if they are not absolutely > sure they need them for their code. I provided all the standard interfaces because it was no additional trouble to do so. But you're right that they shouldn't be used other than in exceptional circumstances. > Would be interesting to investigate if there are other cases where we > could benefit from this. Could be. Posting on LKML might catch someone's attention. Anyway, this should be good enough for you to use with your task notifiers. Are there any other aspects that still need to be straightened out? Alan Stern |
From: Jes S. <je...@sg...> - 2006-05-12 10:06:36
|
Alan Stern wrote: > On Thu, 11 May 2006, Jes Sorensen wrote: >> Obviously not a generic replacement for rwlocks but for the case where >> it's write-once, read-unlimited, like with the global notifier chain, >> then I think it could work well. Obviously the write case is going to be >> "interesting" on a system with 512 CPUs :) > > Especially since the global notifier chain will block all forks and exits > until it acquires the lock! Better make sure none of the callouts try to > start a new thread or wait for an old one to exit. (In general, a callout > shouldn't do anything that would invoke the notifier chain recursively -- > it can lead to deadlock.) Hmmm, there's a side effect I hadn't thought about, I wonder how bad it will hit. >> Would be interesting to investigate if there are other cases where we >> could benefit from this. > > Could be. Posting on LKML might catch someone's attention. > > Anyway, this should be good enough for you to use with your task > notifiers. Are there any other aspects that still need to be straightened > out? Actually I saw this more as a solution to the IBM guys' global notifier locking issue. For task notification, one can do it totally lock free ;) Cheers, Jes |
From: Alan S. <st...@ro...> - 2006-05-04 18:50:05
|
Alan Stern wrote: > I assumed that a notifier called during fork might need to block. If > that's so then you have no choice but to use a blocking notifier. I forgot to mention -- if you can guarantee that these per-task notifier chains will never be used outside the task's context, then a raw_notifier will work. No need for locking when by definition only a single CPU can be using the resource at any time. Nor is unregistering from within a callout a problem. Of course, this assumes you can make that guarantee. The implication is that a driver can't register or unregister for one of these chains unless the driver is running in the task's context. That makes rmmod awkward, as Chandra pointed out. Alan Stern |
From: Chandra S. <sek...@us...> - 2006-05-04 21:34:33
|
On Thu, 2006-05-04 at 14:23 -0400, Alan Stern wrote: > Alan Stern wrote: > > > I assumed that a notifier called during fork might need to block. If > > that's so then you have no choice but to use a blocking notifier. > > I forgot to mention -- if you can guarantee that these per-task notifier > chains will never be used outside the task's context, then a raw_notifier > will work. No need for locking when by definition only a single CPU can > be using the resource at any time. Nor is unregistering from within a > callout a problem. unregistering from the callout might still be a problem, since we cannot be sure that the next pointer in the freed notifier_block will be sane. > > Of course, this assumes you can make that guarantee. The implication is > that a driver can't register or unregister for one of these chains unless > the driver is running in the task's context. That makes rmmod awkward, as > Chandra pointed out. > > Alan Stern > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Paul E. M. <pa...@us...> - 2006-05-04 23:20:26
|
On Thu, May 04, 2006 at 02:34:23PM -0700, Chandra Seetharaman wrote: > On Thu, 2006-05-04 at 14:23 -0400, Alan Stern wrote: > > Alan Stern wrote: > > > > > I assumed that a notifier called during fork might need to block. If > > > that's so then you have no choice but to use a blocking notifier. > > > > I forgot to mention -- if you can guarantee that these per-task notifier > > chains will never be used outside the task's context, then a raw_notifier > > will work. No need for locking when by definition only a single CPU can > > be using the resource at any time. Nor is unregistering from within a > > callout a problem. > > unregistering from the callout might still be a problem, since we cannot > be sure that the next pointer in the freed notifier_block will be sane. If you reference-count the entire chain, then any subsequently removed notifier_block is guaranteed to remain sane. Splitting the reference count across multiple CPUs keeps the cost of reference counting quite low. Thanx, Paul > > Of course, this assumes you can make that guarantee. The implication is > > that a driver can't register or unregister for one of these chains unless > > the driver is running in the task's context. That makes rmmod awkward, as > > Chandra pointed out. > > > > Alan Stern |
From: Chandra S. <sek...@us...> - 2006-05-05 00:12:55
|
On Thu, 2006-05-04 at 16:20 -0700, Paul E. McKenney wrote: > On Thu, May 04, 2006 at 02:34:23PM -0700, Chandra Seetharaman wrote: > > On Thu, 2006-05-04 at 14:23 -0400, Alan Stern wrote: > > > Alan Stern wrote: > > > > > > > I assumed that a notifier called during fork might need to block. If > > > > that's so then you have no choice but to use a blocking notifier. > > > > > > I forgot to mention -- if you can guarantee that these per-task notifier > > > chains will never be used outside the task's context, then a raw_notifier > > > will work. No need for locking when by definition only a single CPU can > > > be using the resource at any time. Nor is unregistering from within a > > > callout a problem. > > > > unregistering from the callout might still be a problem, since we cannot > > be sure that the next pointer in the freed notifier_block will be sane. > > If you reference-count the entire chain, then any subsequently removed > notifier_block is guaranteed to remain sane. Splitting the reference > count across multiple CPUs keeps the cost of reference counting quite > low. Paul, I am not able to get my head around the reference count logic. call_chain is defined as:

static int __kprobes notifier_call_chain(struct notifier_block **nl,
                                         unsigned long val, void *v)
{
1:	int ret = NOTIFY_DONE;
2:	struct notifier_block *nb;
3:
4:	nb = rcu_dereference(*nl);
5:	while (nb) {
6:		ret = nb->notifier_call(nb, val, v);
7:		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
8:			break;
9:		nb = rcu_dereference(nb->next);
a:	}
b:	return ret;
}

Assume that the user of the notifier chain kmalloc'd a notifier_block and registered it initially. When the user's notifier_call is called from line 6 above and the user kfrees the notifier_block, how would having a reference count for the whole chain help in making sure that nb->next is sane in line 9? Or are you describing a different mechanism that doesn't use this notifier chain code? Thanks in advance for clarifying this.
chandra |
From: Alan S. <st...@ro...> - 2006-05-05 00:48:50
|
On Thu, 4 May 2006, Chandra Seetharaman wrote: > > > unregistering from the callout might still be a problem, since we cannot > > > be sure that the next pointer in the freed notifier_block will be sane. > > > > If you reference-count the entire chain, then any subsequently removed > > notifier_block is guaranteed to remain sane. Splitting the reference > > count across multiple CPUs keeps the cost of reference counting quite > > low. > > Paul, > > I am not able to get my head around to understand the reference count > logic. > > call_chain is defined as: > static int __kprobes notifier_call_chain(struct notifier_block **nl, > unsigned long val, void *v) > { > 1: int ret = NOTIFY_DONE; > 2: struct notifier_block *nb; > 3: > 4: nb = rcu_dereference(*nl); > 5: while (nb) { > 6: ret = nb->notifier_call(nb, val, v); > 7: if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > 8: break; > 9: nb = rcu_dereference(nb->next); > a: } > b: return ret; > } > > Assume that the user of the notifier chain kmalloc'd a notifier_block > and registered initially. > > When the user's notifier_call is called from line 6 above, the user > kfrees the notifier_block, how having a reference count for the whole > chain would help in making sure that nb->next is sane in line 9 ? We can always change the code:

	while (nb) {
		next_nb = rcu_dereference(nb->next);
		ret = nb->notifier_call(nb, val, v);
		if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
			break;
		nb = next_nb;
	}

Alan Stern |
From: Chandra S. <sek...@us...> - 2006-05-05 01:02:25
|
On Thu, 2006-05-04 at 20:48 -0400, Alan Stern wrote: > On Thu, 4 May 2006, Chandra Seetharaman wrote: > > > > > unregistering from the callout might still be a problem, since we cannot > > > > be sure that the next pointer in the freed notifier_block will be sane. > > > > > > If you reference-count the entire chain, then any subsequently removed > > > notifier_block is guaranteed to remain sane. Splitting the reference > > > count across multiple CPUs keeps the cost of reference counting quite > > > low. > > > > Paul, > > > > I am not able to get my head around to understand the reference count > > logic. > > > > call_chain is defined as: > > static int __kprobes notifier_call_chain(struct notifier_block **nl, > > unsigned long val, void *v) > > { > > 1: int ret = NOTIFY_DONE; > > 2: struct notifier_block *nb; > > 3: > > 4: nb = rcu_dereference(*nl); > > 5: while (nb) { > > 6: ret = nb->notifier_call(nb, val, v); > > 7: if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > > 8: break; > > 9: nb = rcu_dereference(nb->next); > > a: } > > b: return ret; > > } > > > > Assume that the user of the notifier chain kmalloc'd a notifier_block > > and registered initially. > > > > When the user's notifier_call is called from line 6 above, the user > > kfrees the notifier_block, how having a reference count for the whole > > chain would help in making sure that nb->next is sane in line 9 ? > > We can always change the code: > > while (nb) { > next_nb = rcu_dereference(nb->next); > ret = nb->notifier_call(nb, val, v); > if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > break; > nb = next_nb; > } I think we should make it that way. Also, should the assignment n->next = *nl; in

static int notifier_chain_register(struct notifier_block **nl,
                                   struct notifier_block *n)
{
	while ((*nl) != NULL) {
		if (n->priority > (*nl)->priority)
			break;
		nl = &((*nl)->next);
	}
	n->next = *nl;
	rcu_assign_pointer(*nl, n);
	return 0;
}

be a rcu_assign_pointer?
chandra |
From: Alan S. <st...@ro...> - 2006-05-05 01:15:05
|
On Thu, 4 May 2006, Chandra Seetharaman wrote: > > We can always change the code: > > > > while (nb) { > > next_nb = rcu_dereference(nb->next); > > ret = nb->notifier_call(nb, val, v); > > if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK) > > break; > > nb = next_nb; > > } > > I think we should make it that way. > > Also, should the assignment n->next = *nl; in > > static int notifier_chain_register(struct notifier_block **nl, > struct notifier_block *n) > { > while ((*nl) != NULL) { > if (n->priority > (*nl)->priority) > break; > nl = &((*nl)->next); > } > n->next = *nl; > rcu_assign_pointer(*nl, n); > return 0; > } > > be a rcu_assign_pointer ? No, it doesn't need to be. What rcu_assign_pointer does is add a memory barrier just before the assignment. So for example, we don't want some other CPU seeing the *nl = n assignment _before_ it sees the n->next = *nl assignment. That's what the rcu_assign_pointer macro prevents. But since there are no global assignments before n->next = *nl, there doesn't need to be a memory barrier in front of it. Alan Stern |