From: Paul E. M. <pa...@us...> - 2006-01-17 17:26:39
On Thu, Jan 12, 2006 at 07:17:41AM -0800, Paul E. McKenney wrote:
> On Thu, Jan 12, 2006 at 06:50:34PM +1100, Keith Owens wrote:
> > "Paul E. McKenney" (on Wed, 11 Jan 2006 22:51:15 -0800) wrote:
> > >On Thu, Jan 12, 2006 at 05:19:01PM +1100, Keith Owens wrote:
> > >> OK, I have thought about it and ...
> > >>
> > >> notifier_call_chain_lockfree() must be called with preempt disabled.
> > >>
> > >Fair enough! A comment, perhaps? In a former life I would have also
> > >demanded debug code to verify that preemption/interrupts/whatever were
> > >actually disabled, given the very subtle nature of any resulting bugs...
> >
> > Comment - OK. Debug code is not required, the reference to
> > smp_processor_id() already does all the debug checks that
> > notifier_call_chain_lockfree() needs. CONFIG_PREEMPT_DEBUG is your
> > friend.
>
> Ah, debug_smp_processor_id(). Very cool!!!

One other thing -- given that you are requiring that the read side
have preemption disabled, another update-side option would be to
use synchronize_sched() to wait for all preemption-disabled code
segments to complete. This would allow you to remove the per-CPU
reference counters from the read side, but would require that the
update side either (1) be able to block or (2) be able to defer
the cleanup to process context.

Just another option...

							Thanx, Paul
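For concreteness, the update-side option Paul describes might look like
the sketch below. The chain layout and the unlink helper are hypothetical
(this is not the notifier_call_chain_lockfree() implementation); the
point is only that once readers run under preempt_disable(),
synchronize_sched() can replace per-CPU reference counting on the
update side:

	/* Read side: preemption disabled is the only protection;
	 * no per-CPU reference counts are taken. */
	static int call_chain(struct notifier_block *chain,
			      unsigned long event, void *data)
	{
		struct notifier_block *nb;

		preempt_disable();
		for (nb = chain; nb != NULL; nb = nb->next)
			nb->notifier_call(nb, event, data);
		preempt_enable();
		return 0;
	}

	/* Update side: unlink the entry (leaving the chain walkable for
	 * concurrent readers), then wait for every CPU to leave its
	 * preempt-disabled section. synchronize_sched() may block, so
	 * this must run in process context, or the kfree() must be
	 * deferred there. */
	static void remove_entry(struct notifier_block **chain,
				 struct notifier_block *nb)
	{
		unlink_from_chain(chain, nb);	/* hypothetical helper */
		synchronize_sched();
		kfree(nb);
	}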
From: Keith O. <ka...@sg...> - 2006-01-17 23:57:52
"Paul E. McKenney" (on Tue, 17 Jan 2006 09:26:17 -0800) wrote:
>On Thu, Jan 12, 2006 at 07:17:41AM -0800, Paul E. McKenney wrote:
>> On Thu, Jan 12, 2006 at 06:50:34PM +1100, Keith Owens wrote:
>> > "Paul E. McKenney" (on Wed, 11 Jan 2006 22:51:15 -0800) wrote:
>> > >On Thu, Jan 12, 2006 at 05:19:01PM +1100, Keith Owens wrote:
>> > >> OK, I have thought about it and ...
>> > >>
>> > >> notifier_call_chain_lockfree() must be called with preempt disabled.
>> > >>
>> > >Fair enough! A comment, perhaps? In a former life I would have also
>> > >demanded debug code to verify that preemption/interrupts/whatever were
>> > >actually disabled, given the very subtle nature of any resulting bugs...
>> >
>> > Comment - OK. Debug code is not required, the reference to
>> > smp_processor_id() already does all the debug checks that
>> > notifier_call_chain_lockfree() needs. CONFIG_PREEMPT_DEBUG is your
>> > friend.
>>
>> Ah, debug_smp_processor_id(). Very cool!!!
>
>One other thing -- given that you are requiring that the read side
>have preemption disabled, another update-side option would be to
>use synchronize_sched() to wait for all preemption-disabled code
>segments to complete. This would allow you to remove the per-CPU
>reference counters from the read side, but would require that the
>update side either (1) be able to block or (2) be able to defer
>the cleanup to process context.

Originally I looked at that code but the comment scared me off.
synchronize_sched() maps to synchronize_rcu() and the comment says that
this only protects against rcu_read_lock(), not against preempt_disable().

/**
 * synchronize_sched - block until all CPUs have exited any non-preemptive
 * kernel code sequences.
 *
 * This means that all preempt_disable code sequences, including NMI and
 * hardware-interrupt handlers, in progress on entry will have completed
 * before this primitive returns. However, this does not guarantee that
 * softirq handlers will have completed, since in some kernels
 *
 * This primitive provides the guarantees made by the (deprecated)
 * synchronize_kernel() API. In contrast, synchronize_rcu() only
 * guarantees that rcu_read_lock() sections will have completed.
 */
#define synchronize_sched() synchronize_rcu()
From: Paul E. M. <pa...@us...> - 2006-01-18 02:50:12
On Wed, Jan 18, 2006 at 10:57:47AM +1100, Keith Owens wrote:
> "Paul E. McKenney" (on Tue, 17 Jan 2006 09:26:17 -0800) wrote:

[ . . .]

> >One other thing -- given that you are requiring that the read side
> >have preemption disabled, another update-side option would be to
> >use synchronize_sched() to wait for all preemption-disabled code
> >segments to complete. This would allow you to remove the per-CPU
> >reference counters from the read side, but would require that the
> >update side either (1) be able to block or (2) be able to defer
> >the cleanup to process context.
>
> Originally I looked at that code but the comment scared me off.
> synchronize_sched() maps to synchronize_rcu() and the comment says that
> this only protects against rcu_read_lock(), not against preempt_disable().

Good point -- in mainline kernels (but not in some realtime variants),
the two guarantees happen to be one and the same. But the comment
certainly does not make this clear.

> /**
>  * synchronize_sched - block until all CPUs have exited any non-preemptive
>  * kernel code sequences.
>  *
>  * This means that all preempt_disable code sequences, including NMI and
>  * hardware-interrupt handlers, in progress on entry will have completed
>  * before this primitive returns. However, this does not guarantee that
>  * softirq handlers will have completed, since in some kernels
>  *
>  * This primitive provides the guarantees made by the (deprecated)
>  * synchronize_kernel() API. In contrast, synchronize_rcu() only
>  * guarantees that rcu_read_lock() sections will have completed.
>  */
> #define synchronize_sched() synchronize_rcu()

Does the following change help?

							Thanx, Paul

diff -urpNa -X dontdiff linux-2.6.15/include/linux/rcupdate.h linux-2.6.15-RCUcomment/include/linux/rcupdate.h
--- linux-2.6.15/include/linux/rcupdate.h	2006-01-02 19:21:10.000000000 -0800
+++ linux-2.6.15-RCUcomment/include/linux/rcupdate.h	2006-01-17 18:48:33.000000000 -0800
@@ -265,11 +265,14 @@ static inline int rcu_pending(int cpu)
  * This means that all preempt_disable code sequences, including NMI and
  * hardware-interrupt handlers, in progress on entry will have completed
  * before this primitive returns. However, this does not guarantee that
- * softirq handlers will have completed, since in some kernels
+ * softirq handlers will have completed, since in some kernels, these
+ * handlers can run in process context, and can block.
  *
  * This primitive provides the guarantees made by the (deprecated)
  * synchronize_kernel() API. In contrast, synchronize_rcu() only
  * guarantees that rcu_read_lock() sections will have completed.
+ * In "classic RCU", these two guarantees happen to be one and
+ * the same, but can differ in realtime RCU implementations.
  */
 #define synchronize_sched() synchronize_rcu()
From: Lee R. <rlr...@jo...> - 2006-01-18 02:55:11
On Tue, 2006-01-17 at 18:49 -0800, Paul E. McKenney wrote:
> - * softirq handlers will have completed, since in some kernels
> + * softirq handlers will have completed, since in some kernels, these
> + * handlers can run in process context, and can block.
>  *

I was under the impression that softirq handlers can run in process
context in all kernels. Specifically, in realtime variants softirqs
always run in process context, and in mainline this only happens under
high load.

Lee
From: Paul E. M. <pa...@us...> - 2006-01-18 06:29:52
On Tue, Jan 17, 2006 at 09:55:06PM -0500, Lee Revell wrote:
> On Tue, 2006-01-17 at 18:49 -0800, Paul E. McKenney wrote:
> > - * softirq handlers will have completed, since in some kernels
> > + * softirq handlers will have completed, since in some kernels, these
> > + * handlers can run in process context, and can block.
> >  *
>
> I was under the impression that softirq handlers can run in process
> context in all kernels. Specifically, in realtime variants softirqs
> always run in process context, and in mainline this only happens under
> high load.

We might be talking past each other on this one. If I am not getting
too confused, it is possible to configure a mainline kernel so that
the load cannot rise high enough to force softirqs into process
context. Although looking at 2.6.15, it appears that this would
require rebuilding after hand-editing the value of
MAX_SOFTIRQ_RESTART, which some might feel is too brutal a tweak to be
considered mere "configuration".

In any case, the key point of the comment is that synchronize_sched()
is not guaranteed to wait for all pending softirq handlers to complete.
Does the comment make that sufficiently clear?

							Thanx, Paul
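For reference, the mechanism under discussion looks roughly like the
following, paraphrased from 2.6-era kernel/softirq.c (exact details
vary by release): __do_softirq() reruns pending softirqs at most
MAX_SOFTIRQ_RESTART times, and only if work still remains does it wake
ksoftirqd, i.e. push the remainder into process context.

	#define MAX_SOFTIRQ_RESTART 10

	asmlinkage void __do_softirq(void)
	{
		int max_restart = MAX_SOFTIRQ_RESTART;
		__u32 pending = local_softirq_pending();

	restart:
		/* ... run the handler for each bit set in "pending" ... */

		pending = local_softirq_pending();
		if (pending && --max_restart)
			goto restart;

		if (pending)
			wakeup_softirqd();	/* remainder runs in ksoftirqd */
	}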
From: Matt H. <mat...@us...> - 2006-01-12 05:36:32
On Thu, 2006-01-12 at 14:29 +1100, Keith Owens wrote:
> John Hesterberg (on Wed, 11 Jan 2006 15:39:10 -0600) wrote:
> >On Wed, Jan 11, 2006 at 01:02:10PM -0800, Matt Helsley wrote:
> >> Have you looked at Alan Stern's notifier chain fix patch? Could that be
> >> used in task_notify?
> >
> >I have two concerns about an all-tasks notification interface.
> >First, we want this to scale, so don't want more global locks.
> >One unique part of the task notify is that it doesn't use locks.
>
> Neither does Alan Stern's atomic notifier chain. Indeed it cannot use
> locks, because the atomic notifier chains can be called from anywhere,
> including non-maskable interrupts. The downside is that Alan's atomic
> notifier chains require RCU.
>
> An alternative patch that requires no locks and does not even require
> RCU is in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=113392370322545&w=2

Interesting. Might the 'inuse' flags suffer from bouncing due to false
sharing?

Cheers,
	-Matt Helsley
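False sharing here would mean the per-CPU 'inuse' flags being packed
into one array so that every CPU's update dirties a cache line other
CPUs are also writing. A standard mitigation is to pad each entry out
to its own cache line; a minimal sketch, under the assumption that the
flags are plain per-CPU counters (the names and layout below are
hypothetical, not taken from Keith's patch):

	/* Packed as "int inuse[NR_CPUS];", neighbouring CPUs' entries
	 * share cache lines and every update bounces them. Padding
	 * each entry to a full cache line trades memory for isolation: */
	struct inuse_flag {
		int ctr;
	} ____cacheline_aligned_in_smp;

	struct lockfree_notifier_head {
		struct notifier_block *head;
		struct inuse_flag inuse[NR_CPUS];
	};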
From: Matt H. <mat...@us...> - 2006-01-11 22:48:32
On Wed, 2006-01-11 at 15:39 -0600, John Hesterberg wrote:
> On Wed, Jan 11, 2006 at 01:02:10PM -0800, Matt Helsley wrote:
> > On Wed, 2006-01-11 at 05:36 -0500, Jes Sorensen wrote:
> > > >>>>> "Shailabh" == Shailabh Nagar <na...@wa...> writes:
> > >
> > > Shailabh> Jes Sorensen wrote:
> > > >> I am quite concerned about that lock your patches put into struct
> > > >> task_struct through struct task_delay_info. Have you done any
> > > >> measurements on how this impacts performance on highly threaded
> > > >> apps on larger systems?
> > >
> > > Shailabh> I don't expect the lock contention to be high. The lock is
> > > Shailabh> held for a very short time (across two
> > > Shailabh> additions/increments). Moreover, it gets contended only when
> > > Shailabh> the stats are being read (either through /proc or
> > > Shailabh> connectors). Since the reading of stats won't be that
> > > Shailabh> frequent (the utility of these numbers is to influence the
> > > Shailabh> I/O priority/rss limit etc. which won't be done at a very
> > > Shailabh> small granularity anyway), I wouldn't expect a problem.
> > >
> > > Hi Shailabh,
> > >
> > > When this is read through connectors, it's initiated by the connectors
> > > code which is called from the task's context hence we don't need
> > > locking for that. It's very similar to the task_notify code I am about
> > > to post and I think the connector code could fit into that
> > > framework. The main issue is /proc, but then one could even have a
> > > mechanism with a hook when the task exits that pushes the data to a
> > > storage point which is lock protected.
> > >
> > > Even if a lock isn't contended, you are still going to see the cache
> > > lines bounce around due to the writes. It may not show up on a 4-way
> > > box but what happens on a 64-way? We have seen some pretty nasty
> > > effects on the bigger SN2 boxes with locks that were taken far too
> > > frequently, to the point where it would prevent the box from booting
> > > (now I don't expect it to be that severe here, but I'd still like to
> > > explore an approach of doing it lock free).
> > >
> > > Shailabh> But it's better to take some measurements anyway. Any
> > > Shailabh> suggestions on a benchmark?
> > >
> > > >> IMHO it seems to make more sense to use something like Jack's
> > > >> proposed task_notifier code to lock-lessly collect the data into task
> > > >> local data structures and then take the data from there and ship
> > > >> off to userland through netlink or similar like you are doing?
> > > >>
> > > >> I am working on modifying Jack's patch to carry task local data and
> > > >> use it for not just accounting but other areas that need optional
> > > >> callbacks (optional in the sense that it's a feature that can be
> > > >> enabled or disabled). Looking at Shailabh's delayacct_blkio()
> > > >> changes it seems that it would be really easy to fit those into
> > > >> that framework.
> > > >>
> > > >> Guess I should post some of this code .....
> > >
> > > Shailabh> Please do. If this accounting can fit into some other
> > > Shailabh> framework, that's fine too.
> > >
> > > Ok, finally, sorry for the delay. My current code snapshot is
> > > available at http://www.trained-monkey.org/~jes/patches/task_notify/ -
> > > it's a modified version of Jack's task_notify code, and three example
> > > users of it (the SysV IPC semundo semaphore, the key infrastructure
> > > and SGI's JOB module). The patch order should be task_notify.diff,
> > > task-notify-keys.diff, task-notify-semundo.diff, and
> > > task_notify-job.diff last.
> >
> > I can already tell you I don't like the "magic" mechanism to identify
> > notifier blocks. The problem is that it's yet another space of id
> > numbers that we have to manage -- either manually, by having a person
> > hand the numbers out to developers, or automatically using the idr code.
> > You could use the notifier block's address and avoid an additional id
> > space.
> >
> > Also, even if this mechanism goes into task_notify it needs a better
> > name than "magic".
> >
> > > I think task_notify should be usable for statistics gathering as
> > > well, the only issue is how to attach it to the processes we wish to
> > > gather accounting for. Personally I am not a big fan of the current
> > > concept where statistics are gathered for all tasks at all times but
> > > just not exported until accounting is enabled.
> >
> > Have you looked at Alan Stern's notifier chain fix patch? Could that be
> > used in task_notify?
> >
> > If not, perhaps the patch could use the standard kernel list idioms.
> >
> > Another potential user for the task_notify functionality is the process
> > events connector. The problem is it requires the ability to receive
> > notifications about all processes. Also, there's a chance that future
> > CKRM code could use all-task and per-task notification. Combined with
> > John Hesterberg's feedback I think there is strong justification for an
> > all-tasks notification interface.
>
> I have two concerns about an all-tasks notification interface.
> First, we want this to scale, so don't want more global locks.
> One unique part of the task notify is that it doesn't use locks.

Are you against global locks in these paths based solely on principle,
or have you measured their impact on scalability in those locations?
Without measurement there are two conflicting principles here: code
complexity vs. scalability.

If you've made measurements I'm curious to know what kind of locks were
measured, where they were used, what the load was doing -- as a rough
characterization of the pattern of notifications -- and how much it
impacted scalability. This information might help suggest a better
solution.

> Second, in at least some of the cases we're familiar with,
> even when we might need all-tasks notification we still need task-local
> data. That's been a problem with some of the global mechanisms I've
> seen discussed.

The per-task notification can allow you to have task-local data. Would
registration of per-task notification from within an all-task
notification be sufficient?

Cheers,
	-Matt Helsley
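One rough illustration of the closing question -- registering per-task
notification from within an all-task notification -- using entirely
hypothetical task_notify names (none of these calls exist in the
posted patches): the all-task callback fires for every task, and
attaches a per-task block, carrying task-local data, only to tasks it
cares about.

	static int all_task_cb(struct task_notifier *tn, unsigned long event,
			       void *v)
	{
		struct task_struct *tsk = v;

		/* TN_FORK is a guess at the patch's event names. */
		if (event == TN_FORK && task_is_interesting(tsk))
			task_notify_register(tsk, alloc_my_block(tsk));
		return 0;
	}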
From: Jes S. <je...@tr...> - 2006-01-12 10:01:11
>>>>> "Matt" == Matt Helsley <mat...@us...> writes:

Matt> On Wed, 2006-01-11 at 15:39 -0600, John Hesterberg wrote:
>> I have two concerns about an all-tasks notification interface.
>> First, we want this to scale, so don't want more global locks. One
>> unique part of the task notify is that it doesn't use locks.

Matt> Are you against global locks in these paths based solely on
Matt> principle or have you measured their impact on scalability in
Matt> those locations?

Matt,

It all depends on the specific location of the locks and how often
they are taken. As long as something is taken by the same CPU all the
time it is not going to be a major issue, but if we end up with
anything resembling a global lock, even if it is only held for a very
short time, it is going to bite badly. On a 4-way you probably won't
notice, but go to a 64-way and it bites; on a 512-way it eats you
alive (we had a problem in the timer code quite a while back that
prevented the machine from booting - it wasn't a lock that was held
for a long time, just the fact that every CPU would take it every HZ
was enough).

Matt> Without measurement there are two conflicting principles here:
Matt> code complexity vs. scalability.

The two should never be mutually exclusive as long as we are careful.

Matt> If you've made measurements I'm curious to know what kind of
Matt> locks were measured, where they were used, what the load was
Matt> doing -- as a rough characterization of the pattern of
Matt> notifications -- and how much it impacted scalability. This
Matt> information might help suggest a better solution.

The one I mentioned above was in the timer interrupt path.

Cheers,
Jes
From: Matt H. <mat...@us...> - 2006-01-12 23:26:45
On Thu, 2006-01-12 at 05:01 -0500, Jes Sorensen wrote:
> >>>>> "Matt" == Matt Helsley <mat...@us...> writes:
>
> Matt> On Wed, 2006-01-11 at 15:39 -0600, John Hesterberg wrote:
> >> I have two concerns about an all-tasks notification interface.
> >> First, we want this to scale, so don't want more global locks. One
> >> unique part of the task notify is that it doesn't use locks.
>
> Matt> Are you against global locks in these paths based solely on
> Matt> principle or have you measured their impact on scalability in
> Matt> those locations?
>
> Matt,
>
> It all depends on the specific location of the locks and how often
> they are taken. As long as something is taken by the same CPU all the
> time it is not going to be a major issue, but if we end up with
> anything resembling a global lock, even if it is only held for a very
> short time, it is going to bite badly. On a 4-way you probably won't
> notice, but go to a 64-way and it bites; on a 512-way it eats you
> alive (we had a problem in the timer code quite a while back that
> prevented the machine from booting - it wasn't a lock that was held
> for a long time, just the fact that every CPU would take it every HZ
> was enough).

OK, so you've established that global locks in timer paths are Bad.
However you haven't established that timer paths are almost the same
as the task paths we're talking about. I suspect timer paths are used
much more frequently than fork, exec, or exit.

I've appended a small patch that adds a global lock to the task_notify
paths for testing purposes. I'm curious to know what kind of a
performance difference you would see on your 64 or 512-way if you were
to run with it.

Looking at the ideas for lockless implementations I'm curious how well
Keith's notifier chains patch would work in this case. It does not
acquire a global lock in the "call" path and, as Keith suggested,
probably can be modified to avoid cache bouncing.

Cheers,
	-Matt Helsley

> Matt> Without measurement there are two conflicting principles here:
> Matt> code complexity vs. scalability.
>
> The two should never be mutually exclusive as long as we are careful.
>
> Matt> If you've made measurements I'm curious to know what kind of
> Matt> locks were measured, where they were used, what the load was
> Matt> doing -- as a rough characterization of the pattern of
> Matt> notifications -- and how much it impacted scalability. This
> Matt> information might help suggest a better solution.
>
> The one I mentioned above was in the timer interrupt path.
>
> Cheers,
> Jes

Index: linux-2.6.15/kernel/fork.c
===================================================================
--- linux-2.6.15.orig/kernel/fork.c
+++ linux-2.6.15/kernel/fork.c
@@ -849,10 +849,12 @@ asmlinkage long sys_set_tid_address(int
 	current->clear_child_tid = tidptr;

 	return current->pid;
 }

+extern spinlock_t proposed_global_lock;
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
  *
  * It copies the registers, and all the appropriate
@@ -1137,12 +1139,14 @@ static task_t *copy_process(unsigned lon
 		p->signal->tty = NULL;

 	nr_threads++;
 	total_forks++;
 	write_unlock_irq(&tasklist_lock);
+	spin_lock(&proposed_global_lock);
 	proc_fork_connector(p);
 	cpuset_fork(p);
+	spin_unlock(&proposed_global_lock);
 	retval = 0;

 fork_out:
 	if (retval)
 		return ERR_PTR(retval);
Index: linux-2.6.15/kernel/exit.c
===================================================================
--- linux-2.6.15.orig/kernel/exit.c
+++ linux-2.6.15/kernel/exit.c
@@ -784,10 +784,11 @@ static void exit_notify(struct task_stru
 	/* If the process is dead, release it - nobody will wait for it */
 	if (state == EXIT_DEAD)
 		release_task(tsk);
 }

+extern spinlock_t proposed_global_lock;
 fastcall NORET_TYPE void do_exit(long code)
 {
 	struct task_struct *tsk = current;
 	int group_dead;

@@ -844,10 +845,12 @@ fastcall NORET_TYPE void do_exit(long co
 	if (group_dead) {
 		del_timer_sync(&tsk->signal->real_timer);
 		exit_itimers(tsk->signal);
 		acct_process(code);
 	}
+	spin_lock(&proposed_global_lock);
+	spin_unlock(&proposed_global_lock);
 	exit_mm(tsk);

 	exit_sem(tsk);
 	__exit_files(tsk);
 	__exit_fs(tsk);
Index: linux-2.6.15/fs/exec.c
===================================================================
--- linux-2.6.15.orig/fs/exec.c
+++ linux-2.6.15/fs/exec.c
@@ -1121,10 +1121,12 @@ int search_binary_handler(struct linux_b
 	return retval;
 }

 EXPORT_SYMBOL(search_binary_handler);

+extern spinlock_t proposed_global_lock;
+
 /*
  * sys_execve() executes a new program.
  */
 int do_execve(char * filename,
 	char __user *__user *argv,
@@ -1190,10 +1192,12 @@ int do_execve(char * filename,
 	retval = copy_strings(bprm->argc, argv, bprm);
 	if (retval < 0)
 		goto out;

+	spin_lock(&proposed_global_lock);
+	spin_unlock(&proposed_global_lock);
 	retval = search_binary_handler(bprm,regs);
 	if (retval >= 0) {
 		free_arg_pages(bprm);

 		/* execve success */
Index: linux-2.6.15/init/main.c
===================================================================
--- linux-2.6.15.orig/init/main.c
+++ linux-2.6.15/init/main.c
@@ -440,10 +440,11 @@ void __init parse_early_param(void)
 /*
  *	Activate the first processor.
  */

+spinlock_t proposed_global_lock = SPIN_LOCK_UNLOCKED;
 asmlinkage void __init start_kernel(void)
 {
 	char * command_line;
 	extern struct kernel_param __start___param[], __stop___param[];

 /*
From: Jes S. <je...@tr...> - 2006-01-13 09:35:16
>>>>> "Matt" == Matt Helsley <mat...@us...> writes:

Matt> On Thu, 2006-01-12 at 05:01 -0500, Jes Sorensen wrote:
>> It all depends on the specific location of the locks and how often
>> they are taken. As long as something is taken by the same CPU all
>> the time it is not going to be a major issue, but if we end up with
>> anything resembling a global lock, even if it is only held for a
>> very short time, it is going to bite badly. On a 4-way you probably
>> won't notice, but go to a 64-way and it bites; on a 512-way it eats
>> you alive (we had a problem in the timer code quite a while back
>> that prevented the machine from booting - it wasn't a lock that was
>> held for a long time, just the fact that every CPU would take it
>> every HZ was enough).

Matt> OK, so you've established that global locks in timer paths are
Matt> Bad. However you haven't established that timer paths are
Matt> almost the same as the task paths we're talking about. I suspect
Matt> timer paths are used much more frequently than fork, exec, or
Matt> exit.

Hi Matt,

I wasn't trying to make it sound like this was an apples to apples
comparison; what I am saying is simply that locks aren't free. You are
totally right that fork/exec should be called a lot less frequently,
but the delay accounting data collection points will be in far more
places than that and they will all go for the lock.

Matt> I've appended a small patch that adds a global lock to the
Matt> task_notify paths for testing purposes. I'm curious to know what
Matt> kind of a performance difference you would see on your 64 or
Matt> 512-way if you were to run with it.

I don't have a 512-way to play with at the moment, but again I don't
think it makes sense to benchmark things which don't match what we are
looking at. If we can avoid the locks in the first place then there's
really no reason for not doing that.

Matt> Looking at the ideas for lockless implementations I'm curious
Matt> how well Keith's notifier chains patch would work in this
Matt> case. It does not acquire a global lock in the "call" path and,
Matt> as Keith suggested, probably can be modified to avoid cache
Matt> bouncing.

Yup, I was curious about that. I haven't had a chance to look at it
carefully yet.

Cheers,
Jes
From: Matt H. <mat...@us...> - 2006-01-14 07:34:53
On Fri, 2006-01-13 at 04:35 -0500, Jes Sorensen wrote:
> >>>>> "Matt" == Matt Helsley <mat...@us...> writes:
>
> Matt> On Thu, 2006-01-12 at 05:01 -0500, Jes Sorensen wrote:
> >> It all depends on the specific location of the locks and how often
> >> they are taken. As long as something is taken by the same CPU all
> >> the time it is not going to be a major issue, but if we end up with
> >> anything resembling a global lock, even if it is only held for a
> >> very short time, it is going to bite badly. On a 4-way you probably
> >> won't notice, but go to a 64-way and it bites; on a 512-way it eats
> >> you alive (we had a problem in the timer code quite a while back
> >> that prevented the machine from booting - it wasn't a lock that was
> >> held for a long time, just the fact that every CPU would take it
> >> every HZ was enough).
>
> Matt> OK, so you've established that global locks in timer paths are
> Matt> Bad. However you haven't established that timer paths are
> Matt> almost the same as the task paths we're talking about. I suspect
> Matt> timer paths are used much more frequently than fork, exec, or
> Matt> exit.
>
> Hi Matt,
>
> I wasn't trying to make it sound like this was an apples to apples
> comparison; what I am saying is simply that locks aren't free.

OK. I'm not sure what gave you the impression I thought they were.

> You are totally right that fork/exec should be called a lot less
> frequently, but the delay accounting data collection points will be
> in far more places than that and they will all go for the lock.

All of these places are highly likely to also be in the context of the
task that has the lock in its task->delays struct. This could be a
strongly recommended practice for taking the lock -- we can add a
comment next to the lock suggesting as much.

> Matt> I've appended a small patch that adds a global lock to the
> Matt> task_notify paths for testing purposes. I'm curious to know what
> Matt> kind of a performance difference you would see on your 64 or
> Matt> 512-way if you were to run with it.
>
> I don't have a 512-way to play with at the moment, but again I don't
> think it makes sense to benchmark things which don't match what we are
> looking at. If we can avoid the locks in the first place then there's
> really no reason for not doing that.

It could be good for establishing a lower bound on contention
resulting from locking implementations of task_notify. I think in that
respect it's comparing two species of apples.

We need to clarify what locking we're talking about -- locking in
task_notify to protect the notification list vs. locking in delayacct
to protect the accounting data consistency. I think that a simple
lockless approach may only be possible for task_notify.

Cheers,
	-Matt Helsley
From: Jes S. <je...@tr...> - 2006-01-11 10:36:42
>>>>> "Shailabh" == Shailabh Nagar <na...@wa...> writes:

Shailabh> Jes Sorensen wrote:
>> I am quite concerned about that lock your patches put into struct
>> task_struct through struct task_delay_info. Have you done any
>> measurements on how this impacts performance on highly threaded
>> apps on larger systems?

Shailabh> I don't expect the lock contention to be high. The lock is
Shailabh> held for a very short time (across two
Shailabh> additions/increments). Moreover, it gets contended only when
Shailabh> the stats are being read (either through /proc or
Shailabh> connectors). Since the reading of stats won't be that
Shailabh> frequent (the utility of these numbers is to influence the
Shailabh> I/O priority/rss limit etc. which won't be done at a very
Shailabh> small granularity anyway), I wouldn't expect a problem.

Hi Shailabh,

When this is read through connectors, it's initiated by the connectors
code which is called from the task's context, hence we don't need
locking for that. It's very similar to the task_notify code I am about
to post and I think the connector code could fit into that framework.
The main issue is /proc, but then one could even have a mechanism with
a hook when the task exits that pushes the data to a storage point
which is lock protected.

Even if a lock isn't contended, you are still going to see the cache
lines bounce around due to the writes. It may not show up on a 4-way
box but what happens on a 64-way? We have seen some pretty nasty
effects on the bigger SN2 boxes with locks that were taken far too
frequently, to the point where it would prevent the box from booting
(now I don't expect it to be that severe here, but I'd still like to
explore an approach of doing it lock free).

Shailabh> But it's better to take some measurements anyway. Any
Shailabh> suggestions on a benchmark?

>> IMHO it seems to make more sense to use something like Jack's
>> proposed task_notifier code to lock-lessly collect the data into task
>> local data structures and then take the data from there and ship
>> off to userland through netlink or similar like you are doing?
>>
>> I am working on modifying Jack's patch to carry task local data and
>> use it for not just accounting but other areas that need optional
>> callbacks (optional in the sense that it's a feature that can be
>> enabled or disabled). Looking at Shailabh's delayacct_blkio()
>> changes it seems that it would be really easy to fit those into
>> that framework.
>>
>> Guess I should post some of this code .....

Shailabh> Please do. If this accounting can fit into some other
Shailabh> framework, that's fine too.

Ok, finally, sorry for the delay. My current code snapshot is
available at http://www.trained-monkey.org/~jes/patches/task_notify/ -
it's a modified version of Jack's task_notify code, and three example
users of it (the SysV IPC semundo semaphore, the key infrastructure
and SGI's JOB module). The patch order should be task_notify.diff,
task-notify-keys.diff, task-notify-semundo.diff, and
task_notify-job.diff last.

I think task_notify should be usable for statistics gathering as well,
the only issue is how to attach it to the processes we wish to gather
accounting for. Personally I am not a big fan of the current concept
where statistics are gathered for all tasks at all times but just not
exported until accounting is enabled.

I just had a quick look at the connector code and I think it could
possibly be hooked into the task_notify code as well, as connectors
seem to be another optional feature.

The API for the task_notify code is not set in stone and we can add
more notifier hooks where needed. If someone has strong reasoning for
making changes to the API, then I'll be very open to that.

Anyway, let me know what you think?

Cheers,
Jes
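The exit-time hook Jes mentions ("pushes the data to a storage point
which is lock protected") could look something like the sketch below;
every name here is hypothetical. The task updates its own delay stats
locklessly while it runs, and the one lock is taken once, from the
task's own context, as it exits:

	/* Called from the exiting task's own context via task_notify. */
	static int delayacct_exit_cb(struct task_notifier *tn,
				     unsigned long event, void *v)
	{
		struct task_struct *tsk = v;

		spin_lock(&delayacct_store_lock);
		delayacct_store_add(tsk->delays);	/* fold into global store */
		spin_unlock(&delayacct_store_lock);
		return 0;
	}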
From: Matt H. <mat...@us...> - 2006-01-12 23:55:43
On Wed, 2006-01-11 at 05:36 -0500, Jes Sorensen wrote:
> >>>>> "Shailabh" == Shailabh Nagar <na...@wa...> writes:
>
> Shailabh> Jes Sorensen wrote:
> >> I am quite concerned about that lock your patches put into struct
> >> task_struct through struct task_delay_info. Have you done any
> >> measurements on how this impacts performance on highly threaded
> >> apps on larger systems?
>
> Shailabh> I don't expect the lock contention to be high. The lock is
> Shailabh> held for a very short time (across two
> Shailabh> additions/increments). Moreover, it gets contended only when
> Shailabh> the stats are being read (either through /proc or
> Shailabh> connectors). Since the reading of stats won't be that
> Shailabh> frequent (the utility of these numbers is to influence the
> Shailabh> I/O priority/rss limit etc. which won't be done at a very
> Shailabh> small granularity anyway), I wouldn't expect a problem.

Not just when the stats are being read, but when the stats of that
particular task (A) are being read by one task (B) AND updated by the
task itself (A).

> Hi Shailabh,
>
> When this is read through connectors, it's initiated by the connectors
> code which is called from the task's context, hence we don't need
> locking for that. It's very similar to the task_notify code I am about
> to post and I think the connector code could fit into that framework.
> The main issue is /proc, but then one could even have a mechanism with
> a hook when the task exits that pushes the data to a storage point
> which is lock protected.

Hmm, which task? The request for stats does not necessarily/usually
originate from the task we desire stats on, hence the synchronization.
In this way it's significantly different from lockless task_notify.

> Even if a lock isn't contended, you are still going to see the cache
> lines bounce around due to the writes. It may not show up on a 4-way
> box but what happens on a 64-way? We have seen some pretty nasty
> effects on the bigger SN2 boxes with locks that were taken far too
> frequently, to the point where it would prevent the box from booting
> (now I don't expect it to be that severe here, but I'd still like to
> explore an approach of doing it lock free).

You're referring to the performance impact of a global lock on a large
system. The lock Shailabh introduced is per-task. Those won't bounce
around nearly as much -- I think they bounce under the following
conditions:

	- The task in which the lock is embedded is in the cache of
	  processor A and the task reading the stats of that task is
	  on processor B.

	- The scheduler bounces a task between processor A and
	  processor B.

Am I missing any other circumstances under which it could bounce?

> Shailabh> But it's better to take some measurements anyway. Any
> Shailabh> suggestions on a benchmark?
>
> >> IMHO it seems to make more sense to use something like Jack's
> >> proposed task_notifier code to lock-lessly collect the data into task
> >> local data structures and then take the data from there and ship
> >> off to userland through netlink or similar like you are doing?
> >>
> >> I am working on modifying Jack's patch to carry task local data and
> >> use it for not just accounting but other areas that need optional
> >> callbacks (optional in the sense that it's a feature that can be
> >> enabled or disabled). Looking at Shailabh's delayacct_blkio()
> >> changes it seems that it would be really easy to fit those into
> >> that framework.
> >>
> >> Guess I should post some of this code .....
>
> Shailabh> Please do. If this accounting can fit into some other
> Shailabh> framework, that's fine too.
>
> Ok, finally, sorry for the delay. My current code snapshot is
> available at http://www.trained-monkey.org/~jes/patches/task_notify/ -
> it's a modified version of Jack's task_notify code, and three example
> users of it (the SysV IPC semundo semaphore, the key infrastructure
> and SGI's JOB module). The patch order should be task_notify.diff,
> task-notify-keys.diff, task-notify-semundo.diff, and
> task_notify-job.diff last.
>
> I think task_notify should be usable for statistics gathering as well,
> the only issue is how to attach it to the processes we wish to gather
> accounting for. Personally I am not a big fan of the current concept
> where statistics are gathered for all tasks at all times but just not
> exported until accounting is enabled.

The only potential problem I can see with using task_notify for
gathering statistics is that each section of code that increments
stats would have to look for its notify block in the task's list.

Cheers,
	-Matt Helsley
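The closing concern -- every stat-increment site having to find its
notify block in the task's list -- amounts to a list walk on each
accounted event. A sketch, with a hypothetical task_notifier layout
and field names (task_notify_list, delayacct_notify, delay_stats are
all invented for illustration):

	struct task_notifier {
		struct task_notifier *next;
		int (*notifier_call)(struct task_notifier *tn,
				     unsigned long event, void *v);
		void *data;		/* task-local client data */
	};

	/* Runs on every accounted event; the walk is the added cost. */
	static struct delay_stats *find_my_stats(struct task_struct *tsk)
	{
		struct task_notifier *tn;

		for (tn = tsk->task_notify_list; tn; tn = tn->next)
			if (tn->notifier_call == delayacct_notify)
				return tn->data;
		return NULL;
	}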
From: Matt H. <mat...@us...> - 2006-01-11 21:06:55
On Wed, 2006-01-11 at 05:36 -0500, Jes Sorensen wrote:
> >>>>> "Shailabh" == Shailabh Nagar <na...@wa...> writes:
>
> Shailabh> Jes Sorensen wrote:
> >> I am quite concerned about that lock your patches put into struct
> >> task_struct through struct task_delay_info. Have you done any
> >> measurements on how this impacts performance on highly threaded
> >> apps on larger systems?
>
> Shailabh> I don't expect the lock contention to be high. The lock is
> Shailabh> held for a very short time (across two
> Shailabh> additions/increments). Moreover, it gets contended only when
> Shailabh> the stats are being read (either through /proc or
> Shailabh> connectors). Since the reading of stats won't be that
> Shailabh> frequent (the utility of these numbers is to influence the
> Shailabh> I/O priority/rss limit etc. which won't be done at a very
> Shailabh> small granularity anyway), I wouldn't expect a problem.
>
> Hi Shailabh,
>
> When this is read through connectors, it's initiated by the connectors
> code which is called from the task's context, hence we don't need
> locking for that. It's very similar to the task_notify code I am about
> to post and I think the connector code could fit into that framework.
> The main issue is /proc, but then one could even have a mechanism with
> a hook when the task exits that pushes the data to a storage point
> which is lock protected.
>
> Even if a lock isn't contended, you are still going to see the cache
> lines bounce around due to the writes. It may not show up on a 4-way
> box but what happens on a 64-way? We have seen some pretty nasty
> effects on the bigger SN2 boxes with locks that were taken far too
> frequently, to the point where it would prevent the box from booting
> (now I don't expect it to be that severe here, but I'd still like to
> explore an approach of doing it lock free).
>
> Shailabh> But it's better to take some measurements anyway. Any
> Shailabh> suggestions on a benchmark?
>
> >> IMHO it seems to make more sense to use something like Jack's
> >> proposed task_notifier code to lock-lessly collect the data into task
> >> local data structures and then take the data from there and ship
> >> off to userland through netlink or similar like you are doing?
> >>
> >> I am working on modifying Jack's patch to carry task local data and
> >> use it for not just accounting but other areas that need optional
> >> callbacks (optional in the sense that it's a feature that can be
> >> enabled or disabled). Looking at Shailabh's delayacct_blkio()
> >> changes it seems that it would be really easy to fit those into
> >> that framework.
> >>
> >> Guess I should post some of this code .....
>
> Shailabh> Please do. If this accounting can fit into some other
> Shailabh> framework, that's fine too.
>
> Ok, finally, sorry for the delay. My current code snapshot is
> available at http://www.trained-monkey.org/~jes/patches/task_notify/ -
> it's a modified version of Jack's task_notify code, and three example
> users of it (the SysV IPC semundo semaphore, the key infrastructure
> and SGI's JOB module). The patch order should be task_notify.diff,
> task-notify-keys.diff, task-notify-semundo.diff, and
> task_notify-job.diff last.

I can already tell you I don't like the "magic" mechanism to identify
notifier blocks. The problem is that it's yet another space of id
numbers that we have to manage -- either manually, by having a person
hand the numbers out to developers, or automatically using the idr
code. You could use the notifier block's address and avoid an
additional id space.

Also, even if this mechanism goes into task_notify it needs a better
name than "magic".

> I think task_notify should be usable for statistics gathering as well,
> the only issue is how to attach it to the processes we wish to gather
> accounting for. Personally I am not a big fan of the current concept
> where statistics are gathered for all tasks at all times but just not
> exported until accounting is enabled.

Have you looked at Alan Stern's notifier chain fix patch? Could that
be used in task_notify?

If not, perhaps the patch could use the standard kernel list idioms.

Another potential user for the task_notify functionality is the
process events connector. The problem is it requires the ability to
receive notifications about all processes. Also, there's a chance that
future CKRM code could use all-task and per-task notification.
Combined with John Hesterberg's feedback I think there is strong
justification for an all-tasks notification interface.

If there was some way to quickly check for registration on an
all-tasks list you could do both all-task and per-task notification in
the same code. I took a shot at this a while back but the locking was
incomplete. Perhaps a per-cpu all-task notification list would satisfy
your performance-impact concerns.

<snip>

> The API for the task_notify code is not set in stone and we can add
> more notifier hooks where needed. If someone has strong reasoning for
> making changes to the API, then I'll be very open to that.

I'd like to see the all-task notification I mentioned above.
Notifications where uid/gids change could be useful for auditing and
the process events connector.

<snip>

Cheers,
	-Matt Helsley
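A sketch of the address-as-handle alternative suggested above, with
hypothetical task_notify function names: the caller allocates (or
declares) the block, keeps the pointer, and passes the same pointer to
deregister, so no separate id space is needed.

	static struct task_notifier my_block = {
		.notifier_call	= my_callback,
	};

	task_notify_register(tsk, &my_block);
	/* ... later, the same address identifies the block ... */
	task_notify_unregister(tsk, &my_block);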
From: Jes S. <je...@tr...> - 2006-01-12 09:51:36
>>>>> "Matt" == Matt Helsley <mat...@us...> writes:

Matt> On Wed, 2006-01-11 at 05:36 -0500, Jes Sorensen wrote:

Matt> I can already tell you I don't like the "magic" mechanism to
Matt> identify notifier blocks. The problem is that it's yet another
Matt> space of id numbers that we have to manage -- either manually,
Matt> by having a person hand the numbers out to developers, or
Matt> automatically using the idr code. You could use the notifier
Matt> block's address and avoid an additional id space.

Hi Matt,

I did debate with myself whether or not to take the 'magic' approach.
I don't think the allocation is a big issue, it's all compile time
determined and as long as people stick it in task_notifier.h it
doesn't matter if their magic number is changed when it goes into the
official tree.

Using the notifier block's address won't work; they are often
dynamically allocated, so a client searching for one will rarely know
the address.

The alternative is to search for a function pointer of one of the
callout functions; however, it requires that all users provide a
specific function pointer, or we need to add some flags for the search
function (eg. one of the TN_ ones could be passed in), but this would
complicate the search function and make it slower. This is the
approach Jack took in his original patch, however Jack's patch had
just a single notifier function and it was the user's responsibility
to write the code for the demultiplexing of the callouts. I do not
like this approach as it will be a lot more error prone with everyone
doing their own version.

Matt> Also, even if this mechanism goes into task_notify it needs a
Matt> better name than "magic".

Magic is fairly standard for this kind of feature in the kernel,
grep MAGIC include/linux/*.h ;-)

>> I think task_notify should be usable for statistics gathering as
>> well, the only issue is how to attach it to the processes we wish
>> to gather accounting for. Personally I am not a big fan of the
>> current concept where statistics are gathered for all tasks at all
>> times but just not exported until accounting is enabled.

Matt> Have you looked at Alan Stern's notifier chain fix patch?
Matt> Could that be used in task_notify?

No sorry, do you have a pointer?

Matt> If not, perhaps the patch could use the standard kernel list
Matt> idioms.

Matt> Another potential user for the task_notify functionality is
Matt> the process events connector. The problem is it requires the
Matt> ability to receive notifications about all processes. Also,
Matt> there's a chance that future CKRM code could use all-task and
Matt> per-task notification. Combined with John Hesterberg's feedback
Matt> I think there is strong justification for an all-tasks
Matt> notification interface.

I am wary of this. I don't see how we will effectively be able to do
an all-task notification without having to traverse the full task list
and take a ton of global locks. If anybody has any idea to get around
this problem I'd be very interested in hearing their suggestions.

Matt> If there was some way to quickly check for registration on an
Matt> all-tasks list you could do both all-task and per-task
Matt> notification in the same code. I took a shot at this a while
Matt> back but the locking was incomplete. Perhaps a per-cpu all-task
Matt> notification list would satisfy your performance-impact
Matt> concerns.

That could be the approach to take to get around it, but I must admit
I don't know the scheduler well enough to say whether we will be able
to assign all tasks to one single CPU at any given time, and then
there's the issue of locking when they migrate. But again, the latter
may already be handled by the scheduler?

>> The API for the task_notify code is not set in stone and we can add
>> more notifier hooks where needed. If someone has strong reasoning
>> for making changes to the API, then I'll be very open to that.

Matt> I'd like to see the all-task notification I mentioned above.
Matt> Notifications where uid/gids change could be useful for auditing
Matt> and the process events connector.

If anybody is willing to step up and propose a patch for this I'd be
interested in taking a look. I don't offhand see a simple solution
for it.

Regards,
Jes
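For contrast, the single-callback demultiplexing style from Jack's
original patch -- the one Jes argues is error prone when everyone
writes their own -- would look roughly like this. TN_FORK and TN_EXIT
are guesses at the patch's TN_ event values; the rest is hypothetical:

	static int my_task_notify(struct task_notifier *tn,
				  unsigned long event, void *v)
	{
		switch (event) {
		case TN_FORK:
			/* this client's fork handling */
			break;
		case TN_EXIT:
			/* this client's exit handling */
			break;
		default:
			break;
		}
		return 0;
	}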
From: Matt H. <mat...@us...> - 2006-01-12 23:07:08
On Thu, 2006-01-12 at 04:51 -0500, Jes Sorensen wrote:
> >>>>> "Matt" == Matt Helsley <mat...@us...> writes:
>
> Matt> On Wed, 2006-01-11 at 05:36 -0500, Jes Sorensen wrote:
>
> Matt> I can already tell you I don't like the "magic" mechanism to
> Matt> identify notifier blocks. The problem is that it's yet another
> Matt> space of id numbers that we have to manage -- either manually,
> Matt> by having a person hand the numbers out to developers, or
> Matt> automatically using the idr code. You could use the notifier
> Matt> block's address and avoid an additional id space.
>
> Hi Matt,
>
> I did debate with myself whether or not to take the 'magic' approach.
> I don't think the allocation is a big issue, it's all compile time
> determined and as long as people stick it in task_notifier.h it
> doesn't matter if their magic number is changed when it goes into the
> official tree.
>
> Using the notifier block's address won't work; they are often
> dynamically allocated, so a client searching for one will rarely know
> the address.

If the task_notify user is responsible for allocating the notifier
block then it could easily keep a copy of the pointer handy for the
corresponding deregistration.

> The alternative is to search for a function pointer of one of the
> callout functions; however, it requires that all users provide a
> specific function pointer, or we need to add some flags for the search
> function (eg. one of the TN_ ones could be passed in), but this would
> complicate the search function and make it slower. This is the
> approach Jack took in his original patch, however Jack's patch had
> just a single notifier function and it was the user's responsibility
> to write the code for the demultiplexing of the callouts. I do not
> like this approach as it will be a lot more error prone with everyone
> doing their own version.

I don't see how it will be error prone. Jack's interface was simple.
And if we're really worried about users messing it up we could
generate a set of macros that users must use to demultiplex the call.

One function pointer per event does bloat the notifier block
structure. Given that this notifier block may need to be replicated
per-task, this is a significant amount of space. Jack's demultiplexing
approach requires that the space be proportional to the number of
event functions instead of proportional to the number of notifier
blocks.

Furthermore, if new task events are added this structure would need to
be expanded. In contrast, Jack's approach only required the addition
of a new event value.

For these reasons I think using a single function pointer will be much
more effective.

> Matt> Also, even if this mechanism goes into task_notify it needs a
> Matt> better name than "magic".
>
> Magic is fairly standard for this kind of feature in the kernel,
> grep MAGIC include/linux/*.h ;-)

Using ids and pointers in the kernel is pretty standard too. I invite
you to grep for those and count them too ;)

Seriously, it's a nit, but I think the name could better reflect the
purpose of the field.

> >> I think task_notify should be usable for statistics gathering as
> >> well, the only issue is how to attach it to the processes we wish
> >> to gather accounting for. Personally I am not a big fan of the
> >> current concept where statistics are gathered for all tasks at all
> >> times but just not exported until accounting is enabled.
>
> Matt> Have you looked at Alan Stern's notifier chain fix patch?
> Matt> Could that be used in task_notify?
>
> No sorry, do you have a pointer?

No problem. Here it is:

http://marc.theaimsgroup.com/?l=linux-kernel&m=113407207418475&w=2

I think it would be ideal if task_notify could simply be a notifier
chain for notifying users of task events/changes.

> Matt> If not, perhaps the patch could use the standard kernel list
> Matt> idioms.
> Matt> Another potential user for the task_notify functionality is
> Matt> the process events connector. The problem is it requires the
> Matt> ability to receive notifications about all processes. Also,
> Matt> there's a chance that future CKRM code could use all-task and
> Matt> per-task notification. Combined with John Hesterberg's feedback
> Matt> I think there is strong justification for an all-tasks
> Matt> notification interface.
>
> I am wary of this. I don't see how we will effectively be able to do
> an all-task notification without having to traverse the full task list
> and take a ton of global locks. If anybody has any idea to get around
> this problem I'd be very interested in hearing their suggestions.

Why would we have to traverse the full task list?! Just add one
notifier block to a single, global list of notifiers. Keith's patch
demonstrates one potential method of avoiding a lock in the common
path: traversing the notification chain and making the calls.

> Matt> If there was some way to quickly check for registration on an
> Matt> all-tasks list you could do both all-task and per-task
> Matt> notification in the same code. I took a shot at this a while
> Matt> back but the locking was incomplete. Perhaps a per-cpu all-task
> Matt> notification list would satisfy your performance-impact
> Matt> concerns.
>
> That could be the approach to take to get around it, but I must admit
> I don't know the scheduler well enough to say whether we will be able
> to assign all tasks to one single CPU at any given time, and then
> there's the issue of locking when they migrate. But again, the latter
> may already be handled by the scheduler?

"All-task" means the notification block calls its function when any
task triggers a traversal of the notification lists. This does not
imply that registration/call/deregistration of an all-task
notification must traverse all tasks.

> >> The API for the task_notify code is not set in stone and we can add
> >> more notifier hooks where needed. If someone has strong reasoning
> >> for making changes to the API, then I'll be very open to that.
>
> Matt> I'd like to see the all-task notification I mentioned above.
> Matt> Notifications where uid/gids change could be useful for auditing
> Matt> and the process events connector.
>
> If anybody is willing to step up and propose a patch for this I'd be
> interested in taking a look. I don't offhand see a simple solution
> for it.

Keith posted an interesting patch for notification chains that I
believe could address your concerns -- though from what I've read it
needs to disable preemption. As far as adding notifications for
uid/gid change, that's relatively trivial. You might look at some
revisions of task_notify that Chandra Seetharaman and I have posted.

Cheers,
	-Matt
From: Jes S. <je...@tr...> - 2006-01-13 09:59:05
>>>>> "Matt" == Matt Helsley <mat...@us...> writes:

Matt> On Thu, 2006-01-12 at 04:51 -0500, Jes Sorensen wrote:
>> Using the notifier block's address won't work; they are often
>> dynamically allocated, so a client searching for one will rarely
>> know the address.

Matt> If the task_notify user is responsible for allocating the
Matt> notifier block then it could easily keep a copy of the pointer
Matt> handy for the corresponding deregistration.

And when they are dynamically allocated on a per-object level? Then
you'd have to keep linked lists around to keep track of them. Sorry,
doesn't work.

>> The alternative is to search for a function pointer of one of the
>> callout functions; however, it requires that all users provide a
>> specific function pointer, or we need to add some flags for the
>> search function (eg. one of the TN_ ones could be passed in), but
>> this would complicate the search function and make it slower. This
>> is the approach Jack took in his original patch, however Jack's
>> patch had just a single notifier function and it was the user's
>> responsibility to write the code for the demultiplexing of the
>> callouts. I do not like this approach as it will be a lot more
>> error prone with everyone doing their own version.

Matt> I don't see how it will be error prone. Jack's interface was
Matt> simple. And if we're really worried about users messing it up we
Matt> could generate a set of macros that users must use to
Matt> demultiplex the call.

Because experience shows that the more people copy the same code, the
more likely it is that someone will get it wrong.

Matt> One function pointer per event does bloat the notifier block
Matt> structure. Given that this notifier block may need to be
Matt> replicated per-task, this is a significant amount of
Matt> space. Jack's demultiplexing approach requires that the space be
Matt> proportional to the number of event functions instead of
Matt> proportional to the number of notifier blocks.

Matt> Furthermore, if new task events are added this structure would
Matt> need to be expanded. In contrast, Jack's approach only required
Matt> the addition of a new event value.

Matt> For these reasons I think using a single function pointer will
Matt> be much more effective.

So you add a few pointers per task compared to the code that does the
demultiplexing. We're talking 3-4 extra pointers in comparison to the
demultiplexing code *and* the extra function call. My approach is
consistent with what the rest of the kernel does for most structures,
struct file_operations etc. But if that's what makes the difference as
to whether we go for this type of API, then let's change it back. I am
not married to the API, but rather the functionality.

Matt> No problem. Here it is:
Matt> http://marc.theaimsgroup.com/?l=linux-kernel&m=113407207418475&w=2

Matt> I think it would be ideal if task_notify could simply be a
Matt> notifier chain for notifying users of task events/changes.

That's really what it was intended for, but of course only in the
task's own context.

>> That could be the approach to take to get around it, but I must
>> admit I don't know the scheduler well enough to say whether we will
>> be able to assign all tasks to one single CPU at any given time, and
>> then there's the issue of locking when they migrate. But again, the
>> latter may already be handled by the scheduler?

Matt> "All-task" means the notification block calls its function
Matt> when any task triggers a traversal of the notification
Matt> lists. This does not imply that registration/call/deregistration
Matt> of an all-task notification must traverse all tasks.

Well then the name is somewhat misleading ;-) Sounds almost like
something that could fit into Erik's Job stuff, but I guess it's a
question of what criteria those tasks are put onto that list with.

Being a little short on the history, are there any pointers to
examples or descriptions of what this would be used for? Curious to
understand the usage pattern.

Matt> I'd like to see the all-task notification I mentioned above.
Matt> Notifications where uid/gids change could be useful for auditing
Matt> and the process events connector.

>> If anybody is willing to step up and propose a patch for this I'd
>> be interested in taking a look. I don't offhand see a simple
>> solution for it.

Matt> Keith posted an interesting patch for notification chains that
Matt> I believe could address your concerns -- though from what I've
Matt> read it needs to disable preemption. As far as adding
Matt> notifications for uid/gid change, that's relatively trivial. You
Matt> might look at some revisions of task_notify that Chandra
Matt> Seetharaman and I have posted.

I did go through the archives a while back and I did notice that both
Jack and Erik pointed out a number of issues with those approaches. If
we are going to do the task-group-notify thing, then going with
Keith's approach is probably the best.

Cheers,
Jes
From: Jes S. <je...@tr...> - 2006-01-13 10:39:17
>>>>> "Matt" == Matt Helsley <mat...@us...> writes:

Matt> On Thu, 2006-01-12 at 04:51 -0500, Jes Sorensen wrote:

Matt> Have you looked at Alan Stern's notifier chain fix patch? Could
Matt> that be used in task_notify?
>> No sorry, do you have a pointer?

Matt> No problem. Here it is:
Matt> http://marc.theaimsgroup.com/?l=linux-kernel&m=113407207418475&w=2

Matt> I think it would be ideal if task_notify could simply be a
Matt> notifier chain for notifying users of task events/changes.

Ok, went back and looked at this. I think the core concept is fine,
but there are details such as having a data pointer associated with
the notifier block which is too important to leave out. Otherwise we
have to stick things into the task struct in many cases, which is a
waste of space. I also think it needs to be possible to search the
list for special slow path uses, to avoid us adding excessive amounts
of callbacks that are only used in one place by one client.

If we can cross-API it for task-group-notifiers then that should be
fine.

Cheers,
Jes
From: Matt H. <mat...@us...> - 2006-01-13 23:29:11
On Fri, 2006-01-13 at 05:38 -0500, Jes Sorensen wrote:
> >>>>> "Matt" == Matt Helsley <mat...@us...> writes:
>
> Matt> On Thu, 2006-01-12 at 04:51 -0500, Jes Sorensen wrote:
> Matt> Have you looked at Alan Stern's notifier chain fix patch? Could
> Matt> that be used in task_notify?
> >> No sorry, do you have a pointer?
>
> Matt> No problem. Here it is:
> Matt> http://marc.theaimsgroup.com/?l=linux-kernel&m=113407207418475&w=2
>
> Matt> I think it would be ideal if task_notify could simply be a
> Matt> notifier chain for notifying users of task events/changes.
>
> Ok, went back and looked at this. I think the core concept is fine,
> but there are details such as having a data pointer associated with
> the notifier block which is too important to leave out. Otherwise we
> have to stick things into the task struct in many cases, which is a
> waste of space. I also think it needs to be possible to search the
> list for special slow path uses, to avoid us adding excessive amounts
> of callbacks that are only used in one place by one client.

I agree that having a data pointer associated with the notifier block
is important. It helps us avoid increasing the size of task_struct for
each task_notify user and makes modularization of them possible.

Hmm, yes, excessive amounts of callbacks for those used by only one
client could be a problem. My first approach to that problem would be
to have one task_notify list per event rather than just a single list
for all events. This has ugly space implications. More importantly, I
don't think it's a problem yet. Until it's a problem we ought to go
with the simpler implementations. When it is a problem the task_notify
interface should insulate the users from such a change.

> If we can cross-API it for task-group-notifiers then that should be
> fine.
>
> Cheers,
> Jes

Yup.

Cheers,
	-Matt Helsley