From: Chandra S. <sek...@us...> - 2005-10-10 22:37:12
|
Jack, Forewarning: I am looking to get the least common functionality separated. It looks simpler than the earlier version and has less overhead. But, not everybody that would use this functionality would need the callout list to be maintained in the task and having the overhead of copying it over to a new process. For example, CKRM and Process Event Notifier would use this feature but don't need it to be maintained per task. Can we make this module an in-kernel task event notifier and move the functionality that is needed by PAGG to its module ? regards, chandra On Fri, 2005-10-07 at 09:26 -0500, Jack Steiner wrote: > Here is an alternate proposal for a PAGG / pnotifier mechanism. This > mechanism uses a new "task notifier" that is task specific. Unlike some > of the other callout mechanisms, callouts are attached to specific tasks. > There is no global callout list. Task notifiers are optional callouts > that can be registered on a per-task basic. > > The task_notifier mechanism does not provide all of the capabilities of PAGG. > Specifically, a task cannot register a task_notifier to an arbitrary task. > Registration is supported only against 1) the current task, or 2) a child > task during a clone. There is no support for adding notifiers to an > arbitrary task. > > Interesting points: > - no locks > - no global data maintained by the notifier mechanism > - 1 void* pointer added to task struct > - arbitrary number of callouts > - support for callout priority > - each callout can have private data (ex., embed task_notifier in larger > structure) but general callout mechanism is unaware of the data > > Loadable modules need to reference-count being in a notifier list & prevent > unloading if the reference-count is non-zero. Modules that use task_notifiers > are not visible to anything in the kernel if no task_notifiers are currently > registered. > > We have a couple of modules that use the current PAGG callout > mechanism. I converted one of these users (dplace) to use the new mechanism. > The conversion effort was trivial. We have additional modules that use PAGG. > These have not yet been analyzed to see if task_notifiers are sufficient > but it looks promising. > > I'm proposing this patch to see if other users of pagg / pnotify could > use this mechanism instead. The mechanism is lightweight, small & unintrusive. > Does this look like something that would be of general use. > > > Here is a trial patch against a SUSE kernel that also has the PAGG patch applied. > I added callouts in fork(), exec(), & exit() at the same point that PAGG > added callouts. Task_notifier callouts can be added in other places as needed. > > > > fs/exec.c | 2 > include/linux/init_task.h | 1 > include/linux/sched.h | 2 > include/linux/task_notifier.h | 56 ++++++++++++++++++++ > kernel/Makefile | 2 > kernel/exit.c | 2 > kernel/fork.c | 4 + > kernel/task_notifier.c | 113 ++++++++++++++++++++++++++++++++++++++++++ > > Index: linux/include/linux/task_notifier.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux/include/linux/task_notifier.h 2005-10-05 13:06:33.994660706 -0500 > @@ -0,0 +1,55 @@ > +/* > + * This file is subject to the terms and conditions of the GNU General Public > + * License. See the file "COPYING" in the main directory of this archive > + * for more details. > + * > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > + * > + * > + * Routines to manage task notifier chains for passing task state changes to any > + * interested routines. > + * > + */ > + > +#ifndef _LINUX_TASK_NOTIFIER_H > +#define _LINUX_TASK_NOTIFIER_H > +#include <linux/compiler.h> > +#include <linux/types.h> > +#include <linux/sched.h> > +#include <asm/current.h> > + > +struct task_notifier_block; > + > +typedef void (task_notifier_func)(unsigned int, void *, struct task_notifier_block *nb); > + > +struct task_notifier_block > +{ > + task_notifier_func *task_notifier_func; > + struct task_notifier_block *next; > + int priority; > + unsigned int id_select; > +}; > + > + > +extern void task_notifier_chain_register(struct task_notifier_block *nb); > +extern void task_notifier_chain_register_child(struct task_notifier_block *nb, struct task_struct *p); > +extern void task_notifier_chain_unregister(struct task_notifier_block *nb); > +extern struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func *func); > +extern void _task_notifier_call_chain(unsigned int id, void *arg); > + > +static inline void task_notifier_call_chain(unsigned int id, void *arg) > +{ > + if (unlikely(current->task_notifier)) > + _task_notifier_call_chain(id, arg); > +} > + > + > +/* > + * Notifier identifiers - used as bitmask for selections > + */ > + > +#define TN_FORK 0x00000001 > +#define TN_EXEC 0x00000002 > +#define TN_EXIT 0x00000004 > + > +#endif /* _LINUX_TASK_NOTIFIER_H */ > Index: linux/kernel/task_notifier.c > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux/kernel/task_notifier.c 2005-10-05 13:06:52.217418263 -0500 > @@ -0,0 +1,111 @@ > +/* > + * This file is subject to the terms and conditions of the GNU General Public > + * License. See the file "COPYING" in the main directory of this archive > + * for more details. > + * > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > + */ > + > +#include <linux/config.h> > +#include <linux/module.h> > +#include <linux/task_notifier.h> > +#include <asm/current.h> > + > +/** > + * task_notifier_chain_register_task - Add a task_notifier to a task. > + * @n: Entry to be added to the task notifier chain > + * @p: Task (must be current or a child being created during fork/clone > + */ > +static inline void task_notifier_chain_register_task(struct task_notifier_block > + *nb, struct task_struct *p) > +{ > + struct task_notifier_block **list = > + (struct task_notifier_block **)&p->task_notifier; > + > + while (*list && nb->priority > (*list)->priority) > + list = &((*list)->next); > + nb->next = *list; > + *list = nb; > +} > + > +/** > + * task_notifier_chain_register_child - Add a task_notifier to a task during fork/clone. > + * @n: Entry to be added to the task notifier chain > + * @p: Task (must be current or a child being created during fork/clone > + */ > +void task_notifier_chain_register_child(struct task_notifier_block *nb, > + struct task_struct *p) > +{ > + task_notifier_chain_register_task(nb, p); > +} > + > +/** > + * task_notifier_chain_register - Add a task_notifier to the current task > + * @n: Entry to be added to the task notifier chain > + */ > +void task_notifier_chain_register(struct task_notifier_block *nb) > +{ > + task_notifier_chain_register_task(nb, current); > +} > + > +/** > + * task_notifier_chain_unregister - Remove notifier from the current task > + * @nb: Entry in notifier chain > + * > + */ > + > +void task_notifier_chain_unregister(struct task_notifier_block *nb) > +{ > + struct task_notifier_block **list = > + (struct task_notifier_block **)¤t->task_notifier; > + > + while (*list && *list != nb) > + list = &((*list)->next); > + > + if (*list == nb) > + *list = nb->next; > +} > + > +/** > + * find_task_notifier_block_proc - Scan the current task's notifiers for a > + * task_notifier_block with a pointer to @func > + * @func: Function > + * > + */ > + > +struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func * > + func) > +{ > + struct task_notifier_block *nb = current->task_notifier; > + > + while (nb && nb->task_notifier_func != func) > + nb = nb->next; > + > + return nb; > +} > + > +/** > + * task_notifier_call_chain - Call functions in a notifier chain > + * @id: Callout identifier > + * @arg: Argument to pass to called fumctions > + * > + * Calls each function in a notifier chain in turn. > + * > + */ > + > +void _task_notifier_call_chain(unsigned int id, void *arg) > +{ > + struct task_notifier_block *nb, *next_nb = current->task_notifier; > + > + while ((nb = next_nb)) { > + next_nb = nb->next; > + if (nb->id_select & id) > + nb->task_notifier_func(id, arg, nb); > + } > +} > + > +EXPORT_SYMBOL_GPL(task_notifier_chain_register); > +EXPORT_SYMBOL_GPL(task_notifier_chain_register_child); > +EXPORT_SYMBOL_GPL(task_notifier_chain_unregister); > +EXPORT_SYMBOL_GPL(find_task_notifier_block_proc); > +EXPORT_SYMBOL_GPL(task_notifier_call_chain); > Index: linux/fs/exec.c > =================================================================== > --- linux.orig/fs/exec.c 2005-09-17 09:25:56.990242412 -0500 > +++ linux/fs/exec.c 2005-10-05 13:11:29.621874069 -0500 > @@ -51,6 +51,7 @@ > #include <linux/audit.h> > #include <linux/trigevent_hooks.h> > #include <linux/pagg.h> > +#include <linux/task_notifier.h> > #include <linux/acct.h> > > #include <asm/uaccess.h> > @@ -1209,6 +1210,7 @@ int do_execve(char * filename, > > free_arg_pages(&bprm); > pagg_exec(current); > + task_notifier_call_chain(TN_EXEC, NULL); /* should file or bprm be passed??? */ > > /* execve success */ > security_bprm_free(&bprm); > Index: linux/include/linux/init_task.h > =================================================================== > --- linux.orig/include/linux/init_task.h 2005-09-17 09:25:22.922404438 -0500 > +++ linux/include/linux/init_task.h 2005-09-17 09:45:28.637217785 -0500 > @@ -115,6 +115,7 @@ extern struct group_info init_groups; > .journal_info = NULL, \ > .map_base = __TASK_UNMAPPED_BASE, \ > .io_wait = NULL, \ > + .task_notifier = NULL, \ > INIT_TASK_PAGG(tsk) \ > } > > Index: linux/include/linux/sched.h > =================================================================== > --- linux.orig/include/linux/sched.h 2005-09-17 09:25:56.999030585 -0500 > +++ linux/include/linux/sched.h 2005-09-17 10:02:23.314663901 -0500 > @@ -601,7 +601,7 @@ struct task_struct { > struct list_head pagg_list; > struct rw_semaphore pagg_sem; > #endif > - > + void *task_notifier; > }; > > static inline pid_t process_group(struct task_struct *tsk) > Index: linux/kernel/exit.c > =================================================================== > --- linux.orig/kernel/exit.c 2005-09-17 09:25:57.001959975 -0500 > +++ linux/kernel/exit.c 2005-09-17 09:59:17.263210664 -0500 > @@ -31,6 +31,7 @@ > #include <linux/trigevent_hooks.h> > #include <linux/ltt.h> > #include <linux/pagg.h> > +#include <linux/task_notifier.h> > > #include <asm/uaccess.h> > #include <asm/pgtable.h> > @@ -863,6 +864,7 @@ asmlinkage NORET_TYPE void do_exit(long > > tsk->exit_code = code; > pagg_detach(tsk); > + task_notifier_call_chain(TN_EXIT, NULL); > exit_notify(tsk); > BUG_ON(!(current->flags & PF_DEAD)); > schedule(); > Index: linux/kernel/fork.c > =================================================================== > --- linux.orig/kernel/fork.c 2005-09-17 09:25:57.884683056 -0500 > +++ linux/kernel/fork.c 2005-10-07 09:15:25.285049535 -0500 > @@ -37,6 +37,7 @@ > #include <linux/trigevent_hooks.h> > #include <linux/cpuset.h> > #include <linux/pagg.h> > +#include <linux/task_notifier.h> > #include <linux/acct.h> > > #include <linux/ckrm.h> > @@ -1065,6 +1066,8 @@ struct task_struct *copy_process(unsigne > * process aggregate containers as the parent process. > */ > pagg_attach(p, current); > + p->task_notifier = NULL; > + task_notifier_call_chain(TN_FORK, p); > > /* > * Ok, make it visible to the rest of the system. > @@ -1150,6 +1153,7 @@ fork_out: > > bad_fork_cleanup_namespace: > pagg_detach(p); > + task_notifier_call_chain(TN_EXIT, NULL); /* should this be a unique TN_xxx id?? */ > exit_namespace(p); > bad_fork_cleanup_mm: > exit_mm(p); > Index: linux/kernel/Makefile > =================================================================== > --- linux.orig/kernel/Makefile 2005-09-17 09:25:27.373125449 -0500 > +++ linux/kernel/Makefile 2005-09-17 10:23:09.740024850 -0500 > @@ -7,7 +7,7 @@ obj-y = sched.o fork.o exec_domain.o > sysctl.o capability.o ptrace.o timer.o user.o \ > signal.o sys.o kmod.o workqueue.o pid.o \ > rcupdate.o intermodule.o extable.o params.o posix-timers.o \ > - kthread.o ckrm/ > + kthread.o ckrm/ task_notifier.o > > obj-$(CONFIG_FUTEX) += futex.o > obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Matt H. <mat...@us...> - 2005-10-11 01:45:45
|
On Mon, 2005-10-10 at 15:37 -0700, Chandra Seetharaman wrote: > Jack, > > Forewarning: I am looking to get the least common functionality > separated. > > It looks simpler than the earlier version and has less overhead. But, > not everybody that would use this functionality would need the callout > list to be maintained in the task and having the overhead of copying it > over to a new process. For example, CKRM and Process Event Notifier > would use this feature but don't need it to be maintained per task. > > Can we make this module an in-kernel task event notifier and move the > functionality that is needed by PAGG to its module ? > > regards, > > chandra > > On Fri, 2005-10-07 at 09:26 -0500, Jack Steiner wrote: > > Here is an alternate proposal for a PAGG / pnotifier mechanism. This > > mechanism uses a new "task notifier" that is task specific. Unlike some > > of the other callout mechanisms, callouts are attached to specific tasks. > > There is no global callout list. Task notifiers are optional callouts > > that can be registered on a per-task basic. > > > > The task_notifier mechanism does not provide all of the capabilities of PAGG. > > Specifically, a task cannot register a task_notifier to an arbitrary task. > > Registration is supported only against 1) the current task, or 2) a child > > task during a clone. There is no support for adding notifiers to an > > arbitrary task. > > > > Interesting points: > > - no locks > > - no global data maintained by the notifier mechanism > > - 1 void* pointer added to task struct > > - arbitrary number of callouts > > - support for callout priority > > - each callout can have private data (ex., embed task_notifier in larger > > structure) but general callout mechanism is unaware of the data > > > > Loadable modules need to reference-count being in a notifier list & prevent > > unloading if the reference-count is non-zero. Modules that use task_notifiers > > are not visible to anything in the kernel if no task_notifiers are currently > > registered. > > > > We have a couple of modules that use the current PAGG callout > > mechanism. I converted one of these users (dplace) to use the new mechanism. > > The conversion effort was trivial. We have additional modules that use PAGG. > > These have not yet been analyzed to see if task_notifiers are sufficient > > but it looks promising. > > > > I'm proposing this patch to see if other users of pagg / pnotify could > > use this mechanism instead. The mechanism is lightweight, small & unintrusive. > > Does this look like something that would be of general use. > > > > > > Here is a trial patch against a SUSE kernel that also has the PAGG patch applied. > > I added callouts in fork(), exec(), & exit() at the same point that PAGG > > added callouts. Task_notifier callouts can be added in other places as needed. > > This looks much better than pnotify for the purposes of the Process Events Connector. However it still adds per-task list overhead that the connector currently does not need since it's interested in all tasks. > > > > > > fs/exec.c | 2 > > include/linux/init_task.h | 1 > > include/linux/sched.h | 2 > > include/linux/task_notifier.h | 56 ++++++++++++++++++++ > > kernel/Makefile | 2 > > kernel/exit.c | 2 > > kernel/fork.c | 4 + > > kernel/task_notifier.c | 113 ++++++++++++++++++++++++++++++++++++++++++ nit: What happened to the diffstat summary line? Sometimes the histogram can't represent the small number of removals that took place... > > Index: linux/include/linux/task_notifier.h > > =================================================================== > > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > > +++ linux/include/linux/task_notifier.h 2005-10-05 13:06:33.994660706 -0500 > > @@ -0,0 +1,55 @@ > > +/* > > + * This file is subject to the terms and conditions of the GNU General Public > > + * License. See the file "COPYING" in the main directory of this archive > > + * for more details. > > + * > > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > > + * > > + * > > + * Routines to manage task notifier chains for passing task state changes to any > > + * interested routines. > > + * > > + */ > > + > > +#ifndef _LINUX_TASK_NOTIFIER_H > > +#define _LINUX_TASK_NOTIFIER_H > > +#include <linux/compiler.h> > > +#include <linux/types.h> > > +#include <linux/sched.h> > > +#include <asm/current.h> > > + > > +struct task_notifier_block; > > + > > +typedef void (task_notifier_func)(unsigned int, void *, struct task_notifier_block *nb); > > + > > +struct task_notifier_block > > +{ > > + task_notifier_func *task_notifier_func; > > + struct task_notifier_block *next; I think the lists should use the common kernel list macros and idioms. > > + int priority; The implied use of the "priority" field bugs me. It appears "priority" is meant to establish a static ordering between notifiers. This assumes consistent use of "priority" by related notifiers. I think this coordination is better performed explicitly and entirely by the related notifiers rather than involving the task_notifier interface. > > + unsigned int id_select; From what I could see this is not really an id so much as a mask. Perhaps a better name could be found. I'm partial to "event_mask". The fact that it is a mask suggests to me that the "id" term should be dropped. Further comments on this below. > > +}; > > + > > + > > +extern void task_notifier_chain_register(struct task_notifier_block *nb); > > +extern void task_notifier_chain_register_child(struct task_notifier_block *nb, struct task_struct *p); > > +extern void task_notifier_chain_unregister(struct task_notifier_block *nb); > > +extern struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func *func); > > +extern void _task_notifier_call_chain(unsigned int id, void *arg); > > +static inline void task_notifier_call_chain(unsigned int id, void *arg) > > +{ > > + if (unlikely(current->task_notifier)) > > + _task_notifier_call_chain(id, arg); > > +} The signature of the task_notifier_call_chain() function does not support delta-style notifications. See my next comment. > > +/* > > + * Notifier identifiers - used as bitmask for selections > > + */ > > + > > +#define TN_FORK 0x00000001 > > +#define TN_EXEC 0x00000002 > > +#define TN_EXIT 0x00000004 I think there should be a TN_DESTROY to distinguish between a true exit and when the notifier block associated with a task is destroyed. TN_DESTROY might happen by itself if copy_process fails, but TN_EXIT will always be accompanied by TN_DESTROY. More on this in many of my other comments. The Process Events Connector patch is also interested in ID changes (uid, gid, euid, egid, etc.). In order to fully represent the change event I've chosen to report the previous ID and the new ID. The connector can't use task_notifiers for all its events because there is no means of passing both values. > > +#endif /* _LINUX_TASK_NOTIFIER_H */ > > Index: linux/kernel/task_notifier.c > > =================================================================== > > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > > +++ linux/kernel/task_notifier.c 2005-10-05 13:06:52.217418263 -0500 > > @@ -0,0 +1,111 @@ > > +/* > > + * This file is subject to the terms and conditions of the GNU General Public > > + * License. See the file "COPYING" in the main directory of this archive > > + * for more details. > > + * > > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > > + */ > > + > > +#include <linux/config.h> > > +#include <linux/module.h> > > +#include <linux/task_notifier.h> > > +#include <asm/current.h> > > + > > +/** > > + * task_notifier_chain_register_task - Add a task_notifier to a task. > > + * @n: Entry to be added to the task notifier chain I assume you meant @nb > > + * @p: Task (must be current or a child being created during fork/clone > > + */ > > +static inline void task_notifier_chain_register_task(struct task_notifier_block > > + *nb, struct task_struct *p) > > +{ > > + struct task_notifier_block **list = > > + (struct task_notifier_block **)&p->task_notifier; > > + > > + while (*list && nb->priority > (*list)->priority) > > + list = &((*list)->next); > > + nb->next = *list; > > + *list = nb; > > +} > > + > > +/** > > + * task_notifier_chain_register_child - Add a task_notifier to a task during fork/clone. > > + * @n: Entry to be added to the task notifier chain I assume you meant @nb > > + * @p: Task (must be current or a child being created during fork/clone > > + */ > > +void task_notifier_chain_register_child(struct task_notifier_block *nb, > > + struct task_struct *p) > > +{ > > + task_notifier_chain_register_task(nb, p); > > +} > > + > > +/** > > + * task_notifier_chain_register - Add a task_notifier to the current task > > + * @n: Entry to be added to the task notifier chain I assume you meant @nb > > + */ > > +void task_notifier_chain_register(struct task_notifier_block *nb) > > +{ > > + task_notifier_chain_register_task(nb, current); > > +} > > + > > +/** > > + * task_notifier_chain_unregister - Remove notifier from the current task > > + * @nb: Entry in notifier chain > > + * > > + */ > > + > > +void task_notifier_chain_unregister(struct task_notifier_block *nb) > > +{ > > + struct task_notifier_block **list = > > + (struct task_notifier_block **)¤t->task_notifier; > > + > > + while (*list && *list != nb) > > + list = &((*list)->next); > > + > > + if (*list == nb) > > + *list = nb->next; > > +} The notifier removes itself from current's list in the context of the current process. Doesn't this mean that each notifier would need to get called for TN_EXIT (or TN_DESTROY) in order to decrement module reference counts correctly? Perhaps you ought to check for TN_EXIT (or TN_DESTROY) with the "id" during registration and either exit with an error or issue a warning. Automatically adding TN_EXIT seems to be a bad idea however. This would lock modules in the kernel until referring processes exit and hence may confuse/annoy users by preventing them from removing modules. > > +/** > > + * find_task_notifier_block_proc - Scan the current task's notifiers for a > > + * task_notifier_block with a pointer to @func > > + * @func: Function > > + * > > + */ > > + > > +struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func * > > + func) > > +{ > > + struct task_notifier_block *nb = current->task_notifier; > > + > > + while (nb && nb->task_notifier_func != func) > > + nb = nb->next; > > + > > + return nb; > > +} > > + > > +/** > > + * task_notifier_call_chain - Call functions in a notifier chain > > + * @id: Callout identifier id doesn't really identify the "callout" -- it identifies the reason the task notifier function is being called, and it really should be considered a mask. This function could be used to report multiple events with a single "call". I think this is a useful feature that a mask offers at no additional expense. For instance, if you add TN_DESTROY, then you will always report TN_DESTROY with TN_EXIT but not always TN_EXIT with TN_DESTROY (see my comments on copy_process). By |ing the events together you can report both events with a single call to a task_notifier_func. > > + * @arg: Argument to pass to called fumctions > > + * > > + * Calls each function in a notifier chain in turn. > > + * > > + */ > > + > > +void _task_notifier_call_chain(unsigned int id, void *arg) > > +{ > > + struct task_notifier_block *nb, *next_nb = current->task_notifier; > > + > > + while ((nb = next_nb)) { > > + next_nb = nb->next; > > + if (nb->id_select & id) > > + nb->task_notifier_func(id, arg, nb); > > + } > > +} > > + > > +EXPORT_SYMBOL_GPL(task_notifier_chain_register); > > +EXPORT_SYMBOL_GPL(task_notifier_chain_register_child); > > +EXPORT_SYMBOL_GPL(task_notifier_chain_unregister); > > +EXPORT_SYMBOL_GPL(find_task_notifier_block_proc); > > +EXPORT_SYMBOL_GPL(task_notifier_call_chain); > > Index: linux/fs/exec.c > > =================================================================== > > --- linux.orig/fs/exec.c 2005-09-17 09:25:56.990242412 -0500 > > +++ linux/fs/exec.c 2005-10-05 13:11:29.621874069 -0500 > > @@ -51,6 +51,7 @@ > > #include <linux/audit.h> > > #include <linux/trigevent_hooks.h> > > #include <linux/pagg.h> > > +#include <linux/task_notifier.h> > > #include <linux/acct.h> > > > > #include <asm/uaccess.h> > > @@ -1209,6 +1210,7 @@ int do_execve(char * filename, > > > > free_arg_pages(&bprm); > > pagg_exec(current); > > + task_notifier_call_chain(TN_EXEC, NULL); /* should file or bprm be passed??? */ It should be something besides NULL since that just wastes the parameter :) Whatever gets passed should lead unambiguously to the exec'ing task. I think that the task struct pointer should be reachable within notifiers without relying on "current". Passing in the task directly would be consistent with the other notifiers, and it would allow you to specify exactly the prototype of the notifier function -- no more void*. > > /* execve success */ > > security_bprm_free(&bprm); > > Index: linux/include/linux/init_task.h > > =================================================================== > > --- linux.orig/include/linux/init_task.h 2005-09-17 09:25:22.922404438 -0500 > > +++ linux/include/linux/init_task.h 2005-09-17 09:45:28.637217785 -0500 > > @@ -115,6 +115,7 @@ extern struct group_info init_groups; > > .journal_info = NULL, \ > > .map_base = __TASK_UNMAPPED_BASE, \ > > .io_wait = NULL, \ > > + .task_notifier = NULL, \ > > INIT_TASK_PAGG(tsk) \ > > } > > > > Index: linux/include/linux/sched.h > > =================================================================== > > --- linux.orig/include/linux/sched.h 2005-09-17 09:25:56.999030585 -0500 > > +++ linux/include/linux/sched.h 2005-09-17 10:02:23.314663901 -0500 > > @@ -601,7 +601,7 @@ struct task_struct { > > struct list_head pagg_list; > > struct rw_semaphore pagg_sem; > > #endif > > - > > + void *task_notifier; This apparently has to be a void* because struct task_notifier_block is not defined. Perhaps including the header and using the type is best. > > }; > > > > static inline pid_t process_group(struct task_struct *tsk) > > Index: linux/kernel/exit.c > > =================================================================== > > --- linux.orig/kernel/exit.c 2005-09-17 09:25:57.001959975 -0500 > > +++ linux/kernel/exit.c 2005-09-17 09:59:17.263210664 -0500 > > @@ -31,6 +31,7 @@ > > #include <linux/trigevent_hooks.h> > > #include <linux/ltt.h> > > #include <linux/pagg.h> > > +#include <linux/task_notifier.h> > > > > #include <asm/uaccess.h> > > #include <asm/pgtable.h> > > @@ -863,6 +864,7 @@ asmlinkage NORET_TYPE void do_exit(long > > > > tsk->exit_code = code; > > pagg_detach(tsk); > > + task_notifier_call_chain(TN_EXIT, NULL); Why not pass in tsk instead of NULL? Is tsk guaranteed to be == current in this case? Seems to me it would be best to be consistent and simply pass tsk in. I think the "id" should be TN_EXIT|TN_DESTROY. See next comment. > > exit_notify(tsk); > > BUG_ON(!(current->flags & PF_DEAD)); > > schedule(); > > Index: linux/kernel/fork.c > > =================================================================== > > --- linux.orig/kernel/fork.c 2005-09-17 09:25:57.884683056 -0500 > > +++ linux/kernel/fork.c 2005-10-07 09:15:25.285049535 -0500 > > @@ -37,6 +37,7 @@ > > #include <linux/trigevent_hooks.h> > > #include <linux/cpuset.h> > > #include <linux/pagg.h> > > +#include <linux/task_notifier.h> > > #include <linux/acct.h> > > > > #include <linux/ckrm.h> > > @@ -1065,6 +1066,8 @@ struct task_struct *copy_process(unsigne > > * process aggregate containers as the parent process. > > */ > > pagg_attach(p, current); > > + p->task_notifier = NULL; > > + task_notifier_call_chain(TN_FORK, p); > > > > /* > > * Ok, make it visible to the rest of the system. > > @@ -1150,6 +1153,7 @@ fork_out: > > > > bad_fork_cleanup_namespace: > > pagg_detach(p); > > + task_notifier_call_chain(TN_EXIT, NULL); /* should this be a unique TN_xxx id?? */ I think this should be a different ID because it's not an exit -- the process never really started in the first place. I think it should be something like TN_DESTROY. Then copy_process reports TN_DESTROY while do_exit reports both TN_EXIT and TN_DESTROY. > > exit_namespace(p); > > bad_fork_cleanup_mm: > > exit_mm(p); > > Index: linux/kernel/Makefile > > =================================================================== > > --- linux.orig/kernel/Makefile 2005-09-17 09:25:27.373125449 -0500 > > +++ linux/kernel/Makefile 2005-09-17 10:23:09.740024850 -0500 > > @@ -7,7 +7,7 @@ obj-y = sched.o fork.o exec_domain.o > > sysctl.o capability.o ptrace.o timer.o user.o \ > > signal.o sys.o kmod.o workqueue.o pid.o \ > > rcupdate.o intermodule.o extable.o params.o posix-timers.o \ > > - kthread.o ckrm/ > > + kthread.o ckrm/ task_notifier.o > > > > obj-$(CONFIG_FUTEX) += futex.o > > obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o > > Cheers, -Matt Helsley |
From: Jack S. <st...@sg...> - 2005-10-11 03:21:04
|
On Mon, Oct 10, 2005 at 03:37:08PM -0700, Chandra Seetharaman wrote: > Jack, > > Forewarning: I am looking to get the least common functionality > separated. > > It looks simpler than the earlier version and has less overhead. But, > not everybody that would use this functionality would need the callout > list to be maintained in the task and having the overhead of copying it > over to a new process. For example, CKRM and Process Event Notifier > would use this feature but don't need it to be maintained per task. Interesting. That points out our different environments. For many of the HPC workloads that use the PAGG/pnotify/task-notifiers, keeping the lists in the task_struct is a performance win because only a small percentage of tasks actually have non-null lists. And most that do are long running tasks. A global list adds more overhead & has locking issues.. Am I making a mistake trying to create a single mechanism that all can use? Do we need different but similar mechanisms? Current proposals include: - task_notifiers: per-task list, no locks, restricted to own-task attachment - pnotify/PAGG: per-task list, locks, no restrictions on task attachment - global list: nothing in task_struct, global lock, no restrictions on task attachment Hmmmm.... > > Can we make this module an in-kernel task event notifier and move the > functionality that is needed by PAGG to its module ? > > regards, > > chandra > > On Fri, 2005-10-07 at 09:26 -0500, Jack Steiner wrote: > > Here is an alternate proposal for a PAGG / pnotifier mechanism. This > > mechanism uses a new "task notifier" that is task specific. Unlike some > > of the other callout mechanisms, callouts are attached to specific tasks. > > There is no global callout list. Task notifiers are optional callouts > > that can be registered on a per-task basic. > > > > The task_notifier mechanism does not provide all of the capabilities of PAGG. > > Specifically, a task cannot register a task_notifier to an arbitrary task. > > Registration is supported only against 1) the current task, or 2) a child > > task during a clone. There is no support for adding notifiers to an > > arbitrary task. > > > > Interesting points: > > - no locks > > - no global data maintained by the notifier mechanism > > - 1 void* pointer added to task struct > > - arbitrary number of callouts > > - support for callout priority > > - each callout can have private data (ex., embed task_notifier in larger > > structure) but general callout mechanism is unaware of the data > > > > Loadable modules need to reference-count being in a notifier list & prevent > > unloading if the reference-count is non-zero. Modules that use task_notifiers > > are not visible to anything in the kernel if no task_notifiers are currently > > registered. > > > > We have a couple of modules that use the current PAGG callout > > mechanism. I converted one of these users (dplace) to use the new mechanism. > > The conversion effort was trivial. We have additional modules that use PAGG. > > These have not yet been analyzed to see if task_notifiers are sufficient > > but it looks promising. > > > > I'm proposing this patch to see if other users of pagg / pnotify could > > use this mechanism instead. The mechanism is lightweight, small & unintrusive. > > Does this look like something that would be of general use. > > > > > > Here is a trial patch against a SUSE kernel that also has the PAGG patch applied. > > I added callouts in fork(), exec(), & exit() at the same point that PAGG > > added callouts. Task_notifier callouts can be added in other places as needed. > > > > > > > > fs/exec.c | 2 > > include/linux/init_task.h | 1 > > include/linux/sched.h | 2 > > include/linux/task_notifier.h | 56 ++++++++++++++++++++ > > kernel/Makefile | 2 > > kernel/exit.c | 2 > > kernel/fork.c | 4 + > > kernel/task_notifier.c | 113 ++++++++++++++++++++++++++++++++++++++++++ > > > > Index: linux/include/linux/task_notifier.h > > =================================================================== > > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > > +++ linux/include/linux/task_notifier.h 2005-10-05 13:06:33.994660706 -0500 > > @@ -0,0 +1,55 @@ > > +/* > > + * This file is subject to the terms and conditions of the GNU General Public > > + * License. See the file "COPYING" in the main directory of this archive > > + * for more details. > > + * > > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > > + * > > + * > > + * Routines to manage task notifier chains for passing task state changes to any > > + * interested routines. > > + * > > + */ > > + > > +#ifndef _LINUX_TASK_NOTIFIER_H > > +#define _LINUX_TASK_NOTIFIER_H > > +#include <linux/compiler.h> > > +#include <linux/types.h> > > +#include <linux/sched.h> > > +#include <asm/current.h> > > + > > +struct task_notifier_block; > > + > > +typedef void (task_notifier_func)(unsigned int, void *, struct task_notifier_block *nb); > > + > > +struct task_notifier_block > > +{ > > + task_notifier_func *task_notifier_func; > > + struct task_notifier_block *next; > > + int priority; > > + unsigned int id_select; > > +}; > > + > > + > > +extern void task_notifier_chain_register(struct task_notifier_block *nb); > > +extern void task_notifier_chain_register_child(struct task_notifier_block *nb, struct task_struct *p); > > +extern void task_notifier_chain_unregister(struct task_notifier_block *nb); > > +extern struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func *func); > > +extern void _task_notifier_call_chain(unsigned int id, void *arg); > > + > > +static inline void task_notifier_call_chain(unsigned int id, void *arg) > > +{ > > + if (unlikely(current->task_notifier)) > > + _task_notifier_call_chain(id, arg); > > +} > > + > > + > > +/* > > + * Notifier identifiers - used as bitmask for selections > > + */ > > + > > +#define TN_FORK 0x00000001 > > +#define TN_EXEC 0x00000002 > > +#define TN_EXIT 0x00000004 > > + > > +#endif /* _LINUX_TASK_NOTIFIER_H */ > > Index: linux/kernel/task_notifier.c > > =================================================================== > > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > > +++ linux/kernel/task_notifier.c 2005-10-05 13:06:52.217418263 -0500 > > @@ -0,0 +1,111 @@ > > +/* > > + * This file is subject to the terms and conditions of the GNU General Public > > + * License. See the file "COPYING" in the main directory of this archive > > + * for more details. > > + * > > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > > + */ > > + > > +#include <linux/config.h> > > +#include <linux/module.h> > > +#include <linux/task_notifier.h> > > +#include <asm/current.h> > > + > > +/** > > + * task_notifier_chain_register_task - Add a task_notifier to a task. > > + * @n: Entry to be added to the task notifier chain > > + * @p: Task (must be current or a child being created during fork/clone > > + */ > > +static inline void task_notifier_chain_register_task(struct task_notifier_block > > + *nb, struct task_struct *p) > > +{ > > + struct task_notifier_block **list = > > + (struct task_notifier_block **)&p->task_notifier; > > + > > + while (*list && nb->priority > (*list)->priority) > > + list = &((*list)->next); > > + nb->next = *list; > > + *list = nb; > > +} > > + > > +/** > > + * task_notifier_chain_register_child - Add a task_notifier to a task during fork/clone. > > + * @n: Entry to be added to the task notifier chain > > + * @p: Task (must be current or a child being created during fork/clone > > + */ > > +void task_notifier_chain_register_child(struct task_notifier_block *nb, > > + struct task_struct *p) > > +{ > > + task_notifier_chain_register_task(nb, p); > > +} > > + > > +/** > > + * task_notifier_chain_register - Add a task_notifier to the current task > > + * @n: Entry to be added to the task notifier chain > > + */ > > +void task_notifier_chain_register(struct task_notifier_block *nb) > > +{ > > + task_notifier_chain_register_task(nb, current); > > +} > > + > > +/** > > + * task_notifier_chain_unregister - Remove notifier from the current task > > + * @nb: Entry in notifier chain > > + * > > + */ > > + > > +void task_notifier_chain_unregister(struct task_notifier_block *nb) > > +{ > > + struct task_notifier_block **list = > > + (struct task_notifier_block **)¤t->task_notifier; > > + > > + while (*list && *list != nb) > > + list = &((*list)->next); > > + > > + if (*list == nb) > > + *list = nb->next; > > +} > > + > > +/** > > + * find_task_notifier_block_proc - Scan the current task's notifiers for a > > + * task_notifier_block with a pointer to @func > > + * @func: Function > > + * > > + */ > > + > > +struct task_notifier_block *find_task_notifier_block_proc(task_notifier_func * > > + func) > > +{ > > + struct task_notifier_block *nb = current->task_notifier; > > + > > + while (nb && nb->task_notifier_func != func) > > + nb = nb->next; > > + > > + return nb; > > +} > > + > > +/** > > + * task_notifier_call_chain - Call functions in a notifier chain > > + * @id: Callout identifier > > + * @arg: Argument to pass to called fumctions > > + * > > + * Calls each function in a notifier chain in turn. > > + * > > + */ > > + > > +void _task_notifier_call_chain(unsigned int id, void *arg) > > +{ > > + struct task_notifier_block *nb, *next_nb = current->task_notifier; > > + > > + while ((nb = next_nb)) { > > + next_nb = nb->next; > > + if (nb->id_select & id) > > + nb->task_notifier_func(id, arg, nb); > > + } > > +} > > + > > +EXPORT_SYMBOL_GPL(task_notifier_chain_register); > > +EXPORT_SYMBOL_GPL(task_notifier_chain_register_child); > > +EXPORT_SYMBOL_GPL(task_notifier_chain_unregister); > > +EXPORT_SYMBOL_GPL(find_task_notifier_block_proc); > > +EXPORT_SYMBOL_GPL(task_notifier_call_chain); > > Index: linux/fs/exec.c > > =================================================================== > > --- linux.orig/fs/exec.c 2005-09-17 09:25:56.990242412 -0500 > > +++ linux/fs/exec.c 2005-10-05 13:11:29.621874069 -0500 > > @@ -51,6 +51,7 @@ > > #include <linux/audit.h> > > #include <linux/trigevent_hooks.h> > > #include <linux/pagg.h> > > +#include <linux/task_notifier.h> > > #include <linux/acct.h> > > > > #include <asm/uaccess.h> > > @@ -1209,6 +1210,7 @@ int do_execve(char * filename, > > > > free_arg_pages(&bprm); > > pagg_exec(current); > > + task_notifier_call_chain(TN_EXEC, NULL); /* should file or bprm be passed??? */ > > > > /* execve success */ > > security_bprm_free(&bprm); > > Index: linux/include/linux/init_task.h > > =================================================================== > > --- linux.orig/include/linux/init_task.h 2005-09-17 09:25:22.922404438 -0500 > > +++ linux/include/linux/init_task.h 2005-09-17 09:45:28.637217785 -0500 > > @@ -115,6 +115,7 @@ extern struct group_info init_groups; > > .journal_info = NULL, \ > > .map_base = __TASK_UNMAPPED_BASE, \ > > .io_wait = NULL, \ > > + .task_notifier = NULL, \ > > INIT_TASK_PAGG(tsk) \ > > } > > > > Index: linux/include/linux/sched.h > > =================================================================== > > --- linux.orig/include/linux/sched.h 2005-09-17 09:25:56.999030585 -0500 > > +++ linux/include/linux/sched.h 2005-09-17 10:02:23.314663901 -0500 > > @@ -601,7 +601,7 @@ struct task_struct { > > struct list_head pagg_list; > > struct rw_semaphore pagg_sem; > > #endif > > - > > + void *task_notifier; > > }; > > > > static inline pid_t process_group(struct task_struct *tsk) > > Index: linux/kernel/exit.c > > =================================================================== > > --- linux.orig/kernel/exit.c 2005-09-17 09:25:57.001959975 -0500 > > +++ linux/kernel/exit.c 2005-09-17 09:59:17.263210664 -0500 > > @@ -31,6 +31,7 @@ > > #include <linux/trigevent_hooks.h> > > #include <linux/ltt.h> > > #include <linux/pagg.h> > > +#include <linux/task_notifier.h> > > > > #include <asm/uaccess.h> > > #include <asm/pgtable.h> > > @@ -863,6 +864,7 @@ asmlinkage NORET_TYPE void do_exit(long > > > > tsk->exit_code = code; > > pagg_detach(tsk); > > + task_notifier_call_chain(TN_EXIT, NULL); > > exit_notify(tsk); > > BUG_ON(!(current->flags & PF_DEAD)); > > schedule(); > > Index: linux/kernel/fork.c > > =================================================================== > > --- linux.orig/kernel/fork.c 2005-09-17 09:25:57.884683056 -0500 > > +++ linux/kernel/fork.c 2005-10-07 09:15:25.285049535 -0500 > > @@ -37,6 +37,7 @@ > > #include <linux/trigevent_hooks.h> > > #include <linux/cpuset.h> > > #include <linux/pagg.h> > > +#include <linux/task_notifier.h> > > #include <linux/acct.h> > > > > #include <linux/ckrm.h> > > @@ -1065,6 +1066,8 @@ struct task_struct *copy_process(unsigne > > * process aggregate containers as the parent process. > > */ > > pagg_attach(p, current); > > + p->task_notifier = NULL; > > + task_notifier_call_chain(TN_FORK, p); > > > > /* > > * Ok, make it visible to the rest of the system. > > @@ -1150,6 +1153,7 @@ fork_out: > > > > bad_fork_cleanup_namespace: > > pagg_detach(p); > > + task_notifier_call_chain(TN_EXIT, NULL); /* should this be a unique TN_xxx id?? */ > > exit_namespace(p); > > bad_fork_cleanup_mm: > > exit_mm(p); > > Index: linux/kernel/Makefile > > =================================================================== > > --- linux.orig/kernel/Makefile 2005-09-17 09:25:27.373125449 -0500 > > +++ linux/kernel/Makefile 2005-09-17 10:23:09.740024850 -0500 > > @@ -7,7 +7,7 @@ obj-y = sched.o fork.o exec_domain.o > > sysctl.o capability.o ptrace.o timer.o user.o \ > > signal.o sys.o kmod.o workqueue.o pid.o \ > > rcupdate.o intermodule.o extable.o params.o posix-timers.o \ > > - kthread.o ckrm/ > > + kthread.o ckrm/ task_notifier.o > > > > obj-$(CONFIG_FUTEX) += futex.o > > obj-$(CONFIG_GENERIC_ISA_DMA) += dma.o > > > -- > > ---------------------------------------------------------------------- > Chandra Seetharaman | Be careful what you choose.... > - sek...@us... | .......you may get it. > ---------------------------------------------------------------------- > -- Thanks Jack Steiner (st...@sg...) 651-683-5302 Principal Engineer SGI - Silicon Graphics, Inc. |
From: Matt H. <mat...@us...> - 2005-10-11 05:00:22
|
On Mon, 2005-10-10 at 22:21 -0500, Jack Steiner wrote: > On Mon, Oct 10, 2005 at 03:37:08PM -0700, Chandra Seetharaman wrote: > > Jack, > > > > Forewarning: I am looking to get the least common functionality > > separated. > > > > It looks simpler than the earlier version and has less overhead. But, > > not everybody that would use this functionality would need the callout > > list to be maintained in the task and having the overhead of copying it > > over to a new process. For example, CKRM and Process Event Notifier > > would use this feature but don't need it to be maintained per task. > > Interesting. That points out our different environments. For many of the > HPC workloads that use the PAGG/pnotify/task-notifiers, keeping the lists > in the task_struct is a performance win because only a small percentage > of tasks actually have non-null lists. And most that do are long > running tasks. A global list adds more overhead & has locking issues.. > > Am I making a mistake trying to create a single mechanism that all can > use? Do we need different but similar mechanisms? Current proposals include: > > - task_notifiers: per-task list, no locks, restricted to own-task attachment > - pnotify/PAGG: per-task list, locks, no restrictions on task attachment > - global list: nothing in task_struct, global lock, no restrictions on task attachment > > Hmmmm.... Unfortunately nothing else I've thought of so far beats calling the code directly -- which avoids locking and per-task data structures. Might it be possible to manage a global task_notifier list without a lock? Perhaps using RCU? In the case of RCU I think we'd have to disallow self-removal within notifier functions on the global list. Addition and removal would take place at module load/unload time and be protected with a spinlock while the read and calling of the notifier function would happen within an RCU read lock. However, I'm always skeptical of my understanding of RCU's (mis)uses so I'm not sure if this would work correctly. Comments? Cheers, -Matt Helsley The patch I've trimmed from this reply is at: http://marc.theaimsgroup.com/?l=lse-tech&m=112869558116290&q=raw |
From: Chandra S. <sek...@us...> - 2005-10-11 18:47:20
|
On Mon, 2005-10-10 at 22:21 -0500, Jack Steiner wrote: > On Mon, Oct 10, 2005 at 03:37:08PM -0700, Chandra Seetharaman wrote: > > Jack, > > > > Forewarning: I am looking to get the least common functionality > > separated. > > > > It looks simpler than the earlier version and has less overhead. But, > > not everybody that would use this functionality would need the callout > > list to be maintained in the task and having the overhead of copying it > > over to a new process. For example, CKRM and Process Event Notifier > > would use this feature but don't need it to be maintained per task. > > Interesting. That points out our different environments. For many of the > HPC workloads that use the PAGG/pnotify/task-notifiers, keeping the lists > in the task_struct is a performance win because only a small percentage > of tasks actually have non-null lists. And most that do are long > running tasks. A global list adds more overhead & has locking issues.. Let me make it clear that i am not advocating PAGG to drop its valuable features in order to achieve the common ground. What i am trying to do is to move the minimalistic functionality required by all of us into a smaller patch with little or no overhead while moving the overhead to the corresponding module. For example, PAGG can maintain its callout list in the task data structure and call the callouts from the global callout function, if it is present. Agreed that it will be a function call overhead if there are no callout registered for that task. PAGG don't have to drop the feature of attaching the notifier list to any random task. > > Am I making a mistake trying to create a single mechanism that all can > use? Do we need different but similar mechanisms? Current proposals include: > > - task_notifiers: per-task list, no locks, restricted to own-task attachment > - pnotify/PAGG: per-task list, locks, no restrictions on task attachment > - global list: nothing in task_struct, global lock, no restrictions on task attachment > > Hmmmm.... We can use rcu(as Matt suggested) to reduce the global locking overhead as we are only concerned about the notifiers disappearing from the linked list when the callouts are being called. FYI... kernel/auditsc.c has a similar usage with _no_ spin locks involved. I will write up some code and post it today. > > > > > Can we make this module an in-kernel task event notifier and move the > > functionality that is needed by PAGG to its module ? > > > > regards, > > > > chandra > > The patch I've trimmed from this reply is at: http://marc.theaimsgroup.com/?l=lse-tech&m=112869558116290&q=raw -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Chandra S. <sek...@us...> - 2005-10-12 02:49:40
|
On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > I will write up some code and post it today. Here is the code. Do let me know how it stands. Task Event notifier On exec, fork, exit, real/effective gid/uid changes notify the registered modules. fs/exec.c | 2 include/linux/task_notify.h | 44 +++++++++++++++++++++ init/Kconfig | 10 ++++ kernel/Makefile | 1 kernel/exit.c | 2 kernel/fork.c | 2 kernel/sys.c | 12 +++++ kernel/task_notify.c | 89 ++++++++++++++++++++++++++++++++++++ ++++++++ 8 files changed, 162 insertions(+) Index: l2613-t_notify/fs/exec.c =================================================================== --- l2613-t_notify.orig/fs/exec.c +++ l2613-t_notify/fs/exec.c @@ -48,6 +48,7 @@ #include <linux/syscalls.h> #include <linux/rmap.h> #include <linux/acct.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/mmu_context.h> @@ -1104,6 +1105,7 @@ int search_binary_handler(struct linux_b fput(bprm->file); bprm->file = NULL; current->did_exec = 1; + task_notify_event(TASK_NOTIFY_EXEC, current); return retval; } read_lock(&binfmt_lock); Index: l2613-t_notify/include/linux/task_notify.h =================================================================== --- /dev/null +++ l2613-t_notify/include/linux/task_notify.h @@ -0,0 +1,44 @@ +/* + * task_notify.h - Task event notification + * + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 + * + * Provides the interface and macros for managing in-kernel task + * notification. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#ifndef _LINUX_TASK_NOTIFY_H +#define _LINUX_TASK_NOTIFY_H + +#define TASK_NOTIFY_FORK 0x00000001 +#define TASK_NOTIFY_EXEC 0x00000002 +#define TASK_NOTIFY_EXIT 0x00000004 +#define TASK_NOTIFY_UID 0x00000008 +#define TASK_NOTIFY_GID 0x00000010 + +#define TASK_NOTIFY_ALL 0x0000001f + +#ifdef CONFIG_TASK_NOTIFY + +typedef void (task_notifier_func)(unsigned int, struct task_struct *); + +extern int task_notify_register(task_notifier_func *, unsigned int); +extern void task_notify_unregister(task_notifier_func *); +extern void task_notify_event(unsigned int event, struct task_struct *task); + +#else /* !CONFIG_TASK_NOTIFY */ + +static inline void task_notify_event(unsigned int i, struct task_struct *t) {} + +#endif /* CONFIG_TASK_NOTIFY */ + +#endif /* _LINUX_TASK_NOTIFY_H */ Index: l2613-t_notify/init/Kconfig =================================================================== --- l2613-t_notify.orig/init/Kconfig +++ l2613-t_notify/init/Kconfig @@ -146,6 +146,16 @@ config BSD_PROCESS_ACCT_V3 for processing it. A preliminary version of these tools is available at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. +config TASK_NOTIFY + bool "Task Notification" + depends on EXPERIMENTAL + help + Enabling this provides a in kernel notification of task events. + Modules can register to be notified for specific task events and + get notified when any task goes thru that event. + + Say N if you are unsure. + config SYSCTL bool "Sysctl support" ---help--- Index: l2613-t_notify/kernel/task_notify.c =================================================================== --- /dev/null +++ l2613-t_notify/kernel/task_notify.c @@ -0,0 +1,89 @@ +/* task_notify.c - in-kernel task event notification + * + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 + * + * Provides API for task event registration and notification. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#include <linux/config.h> +#include <linux/module.h> +#include <linux/list.h> +#include <linux/rcupdate.h> +#include <linux/task_notify.h> + +static LIST_HEAD(notify_list); + +struct task_notifier_block { + struct list_head list; + struct rcu_head rcu; + task_notifier_func *func; + unsigned int mask; +}; + +/******************************************************************* + * Event callback invocation + *******************************************************************/ +int +task_notify_register(task_notifier_func *func, unsigned int mask) +{ + struct task_notifier_block *nb; + + if ((func == NULL) || ((mask & ~TASK_NOTIFY_ALL) != 0)) + return -EINVAL; + + nb = kmalloc(sizeof(struct task_notifier_block), GFP_KERNEL); + if (nb == NULL) + return -ENOMEM; + nb->func = func; + nb->mask = mask; + list_add_rcu(&nb->list, ¬ify_list); + return 0; +} + +static void +task_notify_free(struct rcu_head *head) +{ + struct task_notifier_block *nb = + container_of(head, struct task_notifier_block, rcu); + kfree(nb); + return; +} + +void +task_notify_unregister(task_notifier_func *func) +{ + struct task_notifier_block *nb; + + /* Do not use the _rcu iterator here */ + list_for_each_entry(nb, ¬ify_list, list) { + if (nb->func == func) { + list_del_rcu(&nb->list); + call_rcu(&nb->rcu, task_notify_free); + return; + } + } + return; +} + +EXPORT_SYMBOL_GPL(task_notify_register); +EXPORT_SYMBOL_GPL(task_notify_unregister); + +void +task_notify_event(unsigned int event, struct task_struct *task) +{ + struct task_notifier_block *nb; + + rcu_read_lock(); + list_for_each_entry_rcu(nb, ¬ify_list, list) { + if ((nb->mask & event) == event) + nb->func(event, task); + } + rcu_read_unlock(); + return; +} Index: l2613-t_notify/kernel/exit.c =================================================================== --- l2613-t_notify.orig/kernel/exit.c +++ l2613-t_notify/kernel/exit.c @@ -28,6 +28,7 @@ #include <linux/cpuset.h> #include <linux/syscalls.h> #include <linux/signal.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/unistd.h> @@ -851,6 +852,7 @@ fastcall NORET_TYPE void do_exit(long co module_put(tsk->binfmt->module); tsk->exit_code = code; + task_notify_event(TASK_NOTIFY_EXIT, tsk); exit_notify(tsk); #ifdef CONFIG_NUMA mpol_free(tsk->mempolicy); Index: l2613-t_notify/kernel/fork.c =================================================================== --- l2613-t_notify.orig/kernel/fork.c +++ l2613-t_notify/kernel/fork.c @@ -41,6 +41,7 @@ #include <linux/profile.h> #include <linux/rmap.h> #include <linux/acct.h> +#include <linux/task_notify.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -1102,6 +1103,7 @@ static task_t *copy_process(unsigned lon __ptrace_link(p, current->parent); cpuset_fork(p); + task_notify_event(TASK_NOTIFY_FORK, p); attach_pid(p, PIDTYPE_PID, p->pid); attach_pid(p, PIDTYPE_TGID, p->tgid); Index: l2613-t_notify/kernel/Makefile =================================================================== --- l2613-t_notify.orig/kernel/Makefile +++ l2613-t_notify/kernel/Makefile @@ -30,6 +30,7 @@ obj-$(CONFIG_SYSFS) += ksysfs.o obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SECCOMP) += seccomp.o +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) # According to Alan Modra <al...@li...>, the -fno-omit-frame- pointer is Index: l2613-t_notify/kernel/sys.c =================================================================== --- l2613-t_notify.orig/kernel/sys.c +++ l2613-t_notify/kernel/sys.c @@ -31,6 +31,7 @@ #include <linux/compat.h> #include <linux/syscalls.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/io.h> @@ -583,6 +584,7 @@ asmlinkage long sys_setregid(gid_t rgid, current->egid = new_egid; current->gid = new_rgid; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } @@ -622,6 +624,7 @@ asmlinkage long sys_setgid(gid_t gid) return -EPERM; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } @@ -712,6 +715,8 @@ asmlinkage long sys_setreuid(uid_t ruid, key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); } @@ -759,6 +764,8 @@ asmlinkage long sys_setuid(uid_t uid) key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); } @@ -807,6 +814,8 @@ asmlinkage long sys_setresuid(uid_t ruid key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); } @@ -858,6 +867,7 @@ asmlinkage long sys_setresgid(gid_t rgid current->sgid = sgid; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } @@ -900,6 +910,7 @@ asmlinkage long sys_setfsuid(uid_t uid) } key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); @@ -929,6 +940,7 @@ asmlinkage long sys_setfsgid(gid_t gid) current->fsgid = gid; key_fsgid_changed(current); } + task_notify_event(TASK_NOTIFY_GID, current); return old_fsgid; } -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Chandra S. <sek...@us...> - 2005-10-12 03:11:08
|
On Tue, 2005-10-11 at 19:49 -0700, Chandra Seetharaman wrote: > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > I will write up some code and post it today. > Here is the code i used to test it: kernel/task_notify.c | 25 +++++++++++++++++++++++++ 1 files changed, 25 insertions(+) Index: l2613-t_notify/kernel/task_notify.c =================================================================== --- l2613-t_notify.orig/kernel/task_notify.c +++ l2613-t_notify/kernel/task_notify.c @@ -87,3 +87,28 @@ task_notify_event(unsigned int event, st rcu_read_unlock(); return; } + +void +my_notify_func(unsigned int event, struct task_struct *tsk) +{ + printk("received event 0x%x for task <%s:%d>\n", + event, tsk->comm, tsk->pid); + return; +} + +static int __init +init_task_notify_user(void) +{ + printk("registering myfunc\n"); + return task_notify_register(my_notify_func, TASK_NOTIFY_ALL); +} + +static void __exit +exit_task_notify_user(void) +{ + printk("unregistering myfunc\n"); + task_notify_unregister(my_notify_func); +} + +module_init(init_task_notify_user); +module_exit(exit_task_notify_user); |
From: Jack S. <st...@sg...> - 2005-10-12 19:31:54
|
On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > I will write up some code and post it today. The patch looks very similar to the original task_notifier patch but has some significant changes that concern me. These changes make the patch much less useful for our needs, (although I'm sure the changes make it better suited for your needs :-) The changes that concern me are: 1) Task notifiers are registered in a global list instead of a per-task list. This makes sense if most tasks require callouts, but for our usage, only a small percentage of fork/exec/exits actually need a callout. Scanning the notifier list and calling each function on every fork/exit/exec is unnecessary overhead if it is not needed. On the otherhand, if all tasks needed the callouts (your case, I think), I can see the advantage of a global list. 2) There is no easy way to associate any per-task data with a task. The original task_notifier mechanism registered a notifier block for each task. Typically, the notifier block would be embedded in a larger container that contains the additional per-task data. Locating the task data is nothing more than a typecast (embedded in macros, of course). Nothing prevents the notifier function itself from managing per-task state in additional data structures but it is not straightforward. Although it may be able to be done without locks (not sure), there are frequently 100's of tasks per notifier function that require task data. Managing the data in a single linked list is not practical. An array, hash or tree structure is probably needed. Certainly doable, but not trivial without adding data to the task_struct. 3) The notifier functions are called from within a rcu_read_lock()/ rcu_read_unlock() block. Doesn't this mean that the notifier function cannot sleep or do anything that has the potential to sleep? That would be a significant limitation. Let me explain our typical usage of PAGG/pnotifier/task_callouts. Maybe this will help explain some of the issues and tradeoffs that concern me. (I know that your requirements are different). We currently have 4 users of the PAGG/pnotifier/task_callout mechanism. That means that at all times there would be 4 notifier functions registered in a global callout list. Of the 4 functions, 3 are infrequently used although all 4 are usually in use by at least 1 task. At any point in time, _most_ tasks typically require no notifiers. A typical system has 1000's or 10000's of tasks. At any point in time, most tasks have no need for any notifiers. There is roughly no more than 1 task per cpu that is using a notifier function although the number may be more or less than this (our big systems have 512p). Of the 4 users of the PAGG/pnotifier/task_callout mechanism, 3 require per-task data to manage additional task state. Requiring 4 separate functions to scan for per-task state when typically none exists is unnecessary overhead. For our typical usage, statistically most fork/exec/exits do not need a callout. Unless I'm overlooking something, using the newer mechanism would require the following at every fork/exec/exit: - scan the global notifier list - call 4 notifier functions - each notifier function scans it's own task-state data to see if the task needs additional work done as part of the callout - no task data found because callout was not needed - callout was a waste of time With the original task_notifier mechanism, the notifier_block pointer in most task_structs is NULL. The notifier mechanism is skipped and incurs almost zero overhead. I want to reiterate an earlier point. Although our requirements appear to very similar, it may be a mistake to try to create a _single_ mechanism that covers both cases. Maybe a better approach is to design two similar mechanisms, each optimized for a particular purpose. The 2 mechanisms should have similar APIs and could possibly share some code. > > Here is the code. Do let me know how it stands. > Task Event notifier > > On exec, fork, exit, real/effective gid/uid changes notify the > registered > modules. > > fs/exec.c | 2 > include/linux/task_notify.h | 44 +++++++++++++++++++++ > init/Kconfig | 10 ++++ > kernel/Makefile | 1 > kernel/exit.c | 2 > kernel/fork.c | 2 > kernel/sys.c | 12 +++++ > kernel/task_notify.c | 89 ++++++++++++++++++++++++++++++++++++ > ++++++++ > 8 files changed, 162 insertions(+) > > Index: l2613-t_notify/fs/exec.c > =================================================================== > --- l2613-t_notify.orig/fs/exec.c > +++ l2613-t_notify/fs/exec.c > @@ -48,6 +48,7 @@ > #include <linux/syscalls.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/mmu_context.h> > @@ -1104,6 +1105,7 @@ int search_binary_handler(struct linux_b > fput(bprm->file); > bprm->file = NULL; > current->did_exec = 1; > + task_notify_event(TASK_NOTIFY_EXEC, current); > return retval; > } > read_lock(&binfmt_lock); > Index: l2613-t_notify/include/linux/task_notify.h > =================================================================== > --- /dev/null > +++ l2613-t_notify/include/linux/task_notify.h > @@ -0,0 +1,44 @@ > +/* > + * task_notify.h - Task event notification > + * > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > + * > + * Provides the interface and macros for managing in-kernel task > + * notification. > + * > + * This program is free software; you can redistribute it and/or modify > it > + * under the terms of version 2.1 of the GNU Lesser General Public > License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it would be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > + * > + */ > + > +#ifndef _LINUX_TASK_NOTIFY_H > +#define _LINUX_TASK_NOTIFY_H > + > +#define TASK_NOTIFY_FORK 0x00000001 > +#define TASK_NOTIFY_EXEC 0x00000002 > +#define TASK_NOTIFY_EXIT 0x00000004 > +#define TASK_NOTIFY_UID 0x00000008 > +#define TASK_NOTIFY_GID 0x00000010 > + > +#define TASK_NOTIFY_ALL 0x0000001f > + > +#ifdef CONFIG_TASK_NOTIFY > + > +typedef void (task_notifier_func)(unsigned int, struct task_struct *); > + > +extern int task_notify_register(task_notifier_func *, unsigned int); > +extern void task_notify_unregister(task_notifier_func *); > +extern void task_notify_event(unsigned int event, struct task_struct > *task); > + > +#else /* !CONFIG_TASK_NOTIFY */ > + > +static inline void task_notify_event(unsigned int i, struct task_struct > *t) {} > + > +#endif /* CONFIG_TASK_NOTIFY */ > + > +#endif /* _LINUX_TASK_NOTIFY_H */ > Index: l2613-t_notify/init/Kconfig > =================================================================== > --- l2613-t_notify.orig/init/Kconfig > +++ l2613-t_notify/init/Kconfig > @@ -146,6 +146,16 @@ config BSD_PROCESS_ACCT_V3 > for processing it. A preliminary version of these tools is available > at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. > > +config TASK_NOTIFY > + bool "Task Notification" > + depends on EXPERIMENTAL > + help > + Enabling this provides a in kernel notification of task events. > + Modules can register to be notified for specific task events and > + get notified when any task goes thru that event. > + > + Say N if you are unsure. > + > config SYSCTL > bool "Sysctl support" > ---help--- > Index: l2613-t_notify/kernel/task_notify.c > =================================================================== > --- /dev/null > +++ l2613-t_notify/kernel/task_notify.c > @@ -0,0 +1,89 @@ > +/* task_notify.c - in-kernel task event notification > + * > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > + * > + * Provides API for task event registration and notification. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + */ > + > +#include <linux/config.h> > +#include <linux/module.h> > +#include <linux/list.h> > +#include <linux/rcupdate.h> > +#include <linux/task_notify.h> > + > +static LIST_HEAD(notify_list); > + > +struct task_notifier_block { > + struct list_head list; > + struct rcu_head rcu; > + task_notifier_func *func; > + unsigned int mask; > +}; > + > +/******************************************************************* > + * Event callback invocation > + *******************************************************************/ > +int > +task_notify_register(task_notifier_func *func, unsigned int mask) > +{ > + struct task_notifier_block *nb; > + > + if ((func == NULL) || ((mask & ~TASK_NOTIFY_ALL) != 0)) > + return -EINVAL; > + > + nb = kmalloc(sizeof(struct task_notifier_block), GFP_KERNEL); > + if (nb == NULL) > + return -ENOMEM; > + nb->func = func; > + nb->mask = mask; > + list_add_rcu(&nb->list, ¬ify_list); > + return 0; > +} > + > +static void > +task_notify_free(struct rcu_head *head) > +{ > + struct task_notifier_block *nb = > + container_of(head, struct task_notifier_block, rcu); > + kfree(nb); > + return; > +} > + > +void > +task_notify_unregister(task_notifier_func *func) > +{ > + struct task_notifier_block *nb; > + > + /* Do not use the _rcu iterator here */ > + list_for_each_entry(nb, ¬ify_list, list) { > + if (nb->func == func) { > + list_del_rcu(&nb->list); > + call_rcu(&nb->rcu, task_notify_free); > + return; > + } > + } > + return; > +} > + > +EXPORT_SYMBOL_GPL(task_notify_register); > +EXPORT_SYMBOL_GPL(task_notify_unregister); > + > +void > +task_notify_event(unsigned int event, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + rcu_read_lock(); > + list_for_each_entry_rcu(nb, ¬ify_list, list) { > + if ((nb->mask & event) == event) > + nb->func(event, task); > + } > + rcu_read_unlock(); > + return; > +} > Index: l2613-t_notify/kernel/exit.c > =================================================================== > --- l2613-t_notify.orig/kernel/exit.c > +++ l2613-t_notify/kernel/exit.c > @@ -28,6 +28,7 @@ > #include <linux/cpuset.h> > #include <linux/syscalls.h> > #include <linux/signal.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/unistd.h> > @@ -851,6 +852,7 @@ fastcall NORET_TYPE void do_exit(long co > module_put(tsk->binfmt->module); > > tsk->exit_code = code; > + task_notify_event(TASK_NOTIFY_EXIT, tsk); > exit_notify(tsk); > #ifdef CONFIG_NUMA > mpol_free(tsk->mempolicy); > Index: l2613-t_notify/kernel/fork.c > =================================================================== > --- l2613-t_notify.orig/kernel/fork.c > +++ l2613-t_notify/kernel/fork.c > @@ -41,6 +41,7 @@ > #include <linux/profile.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/pgtable.h> > #include <asm/pgalloc.h> > @@ -1102,6 +1103,7 @@ static task_t *copy_process(unsigned lon > __ptrace_link(p, current->parent); > > cpuset_fork(p); > + task_notify_event(TASK_NOTIFY_FORK, p); > > attach_pid(p, PIDTYPE_PID, p->pid); > attach_pid(p, PIDTYPE_TGID, p->tgid); > Index: l2613-t_notify/kernel/Makefile > =================================================================== > --- l2613-t_notify.orig/kernel/Makefile > +++ l2613-t_notify/kernel/Makefile > @@ -30,6 +30,7 @@ obj-$(CONFIG_SYSFS) += ksysfs.o > obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ > obj-$(CONFIG_CRASH_DUMP) += crash_dump.o > obj-$(CONFIG_SECCOMP) += seccomp.o > +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o > > ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) > # According to Alan Modra <al...@li...>, the -fno-omit-frame- > pointer is > Index: l2613-t_notify/kernel/sys.c > =================================================================== > --- l2613-t_notify.orig/kernel/sys.c > +++ l2613-t_notify/kernel/sys.c > @@ -31,6 +31,7 @@ > > #include <linux/compat.h> > #include <linux/syscalls.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/io.h> > @@ -583,6 +584,7 @@ asmlinkage long sys_setregid(gid_t rgid, > current->egid = new_egid; > current->gid = new_rgid; > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -622,6 +624,7 @@ asmlinkage long sys_setgid(gid_t gid) > return -EPERM; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -712,6 +715,8 @@ asmlinkage long sys_setreuid(uid_t ruid, > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, > LSM_SETID_RE); > } > > @@ -759,6 +764,8 @@ asmlinkage long sys_setuid(uid_t uid) > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, > LSM_SETID_ID); > } > > @@ -807,6 +814,8 @@ asmlinkage long sys_setresuid(uid_t ruid > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, > LSM_SETID_RES); > } > > @@ -858,6 +867,7 @@ asmlinkage long sys_setresgid(gid_t rgid > current->sgid = sgid; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -900,6 +910,7 @@ asmlinkage long sys_setfsuid(uid_t uid) > } > > key_fsuid_changed(current); > + task_notify_event(TASK_NOTIFY_UID, current); > > security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, > LSM_SETID_FS); > > @@ -929,6 +940,7 @@ asmlinkage long sys_setfsgid(gid_t gid) > current->fsgid = gid; > key_fsgid_changed(current); > } > + task_notify_event(TASK_NOTIFY_GID, current); > return old_fsgid; > } > > > -- > > ---------------------------------------------------------------------- > Chandra Seetharaman | Be careful what you choose.... > - sek...@us... | .......you may get it. > ---------------------------------------------------------------------- > -- Thanks Jack Steiner (st...@sg...) 651-683-5302 Principal Engineer SGI - Silicon Graphics, Inc. |
From: Matt H. <mat...@us...> - 2005-10-12 23:24:41
|
On Wed, 2005-10-12 at 14:31 -0500, Jack Steiner wrote: > On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > > > I will write up some code and post it today. > > > The patch looks very similar to the original task_notifier patch but has some > significant changes that concern me. These changes make the patch much less > useful for our needs, (although I'm sure the changes make it better suited for > your needs :-) > > > The changes that concern me are: > > 1) Task notifiers are registered in a global list instead of a per-task > list. This makes sense if most tasks require callouts, but for > our usage, only a small percentage of fork/exec/exits actually > need a callout. Scanning the notifier list and calling > each function on every fork/exit/exec is unnecessary overhead > if it is not needed. On the otherhand, if all tasks needed > the callouts (your case, I think), I can see the advantage of a global list. > > 2) There is no easy way to associate any per-task data with a task. The > original task_notifier mechanism registered a notifier block for > each task. Typically, the notifier block would be embedded in a larger > container that contains the additional per-task data. Locating the > task data is nothing more than a typecast (embedded in macros, of course). > > Nothing prevents the notifier function itself from managing per-task state > in additional data structures but it is not straightforward. Although > it may be able to be done without locks (not sure), there are > frequently 100's of tasks per notifier function that require task data. > Managing the data in a single linked list is not practical. An array, hash > or tree structure is probably needed. Certainly doable, but not trivial > without adding data to the task_struct. > > 3) The notifier functions are called from within a rcu_read_lock()/ > rcu_read_unlock() block. Doesn't this mean that the notifier function > cannot sleep or do anything that has the potential to sleep? > That would be a significant limitation. > > > > Let me explain our typical usage of PAGG/pnotifier/task_callouts. Maybe this > will help explain some of the issues and tradeoffs that concern me. (I know > that your requirements are different). > > > We currently have 4 users of the PAGG/pnotifier/task_callout mechanism. That > means that at all times there would be 4 notifier functions registered in a > global callout list. Of the 4 functions, 3 are infrequently used although all 4 > are usually in use by at least 1 task. At any point in time, _most_ tasks > typically require no notifiers. > > A typical system has 1000's or 10000's of tasks. At any point in time, most > tasks have no need for any notifiers. There is roughly no more than 1 task per > cpu that is using a notifier function although the number may be more or less > than this (our big systems have 512p). > > Of the 4 users of the PAGG/pnotifier/task_callout mechanism, 3 require per-task > data to manage additional task state. Requiring 4 separate functions to > scan for per-task state when typically none exists is unnecessary overhead. > > For our typical usage, statistically most fork/exec/exits do not need a callout. > Unless I'm overlooking something, using the newer mechanism would require the > following at every fork/exec/exit: > > - scan the global notifier list > - call 4 notifier functions > - each notifier function scans it's own task-state data to see if the > task needs additional work done as part of the callout > - no task data found because callout was not needed > - callout was a waste of time > > With the original task_notifier mechanism, the notifier_block pointer in most > task_structs is NULL. The notifier mechanism is skipped and incurs almost zero > overhead. > > > I want to reiterate an earlier point. Although our requirements appear to very > similar, it may be a mistake to try to create a _single_ mechanism that covers > both cases. Maybe a better approach is to design two similar mechanisms, each > optimized for a particular purpose. The 2 mechanisms should have similar > APIs and could possibly share some code. Jack, OK, how about this patch? It's essentially an amalgam of the features of task_notifiers and task_notify based on Chandra's task_notify patch. It's also completely untested. This is just to demonstrate what I'm proposing. Cheers, -Matt Helsley -- Task Event notifier On exec, fork, exit, real/effective gid/uid changes notify the registered callers. --- Applies to 2.6.14-rc4 Documentation/task_notify.txt | 48 ++++++++ fs/exec.c | 2 include/linux/init_task.h | 1 include/linux/sched.h | 4 include/linux/task_notify.h | 53 +++++++++ init/Kconfig | 10 + kernel/Makefile | 1 kernel/exit.c | 2 kernel/fork.c | 5 kernel/sys.c | 12 ++ kernel/task_notify.c | 183 ++++++++++++++++++++++++++++++++++ 11 files changed, 321 insertions(+) Index: linux-2.6.14-rc4/fs/exec.c =================================================================== --- linux-2.6.14-rc4.orig/fs/exec.c +++ linux-2.6.14-rc4/fs/exec.c @@ -46,10 +46,11 @@ #include <linux/mount.h> #include <linux/security.h> #include <linux/syscalls.h> #include <linux/rmap.h> #include <linux/acct.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/mmu_context.h> #ifdef CONFIG_KMOD @@ -1098,10 +1099,11 @@ int search_binary_handler(struct linux_b allow_write_access(bprm->file); if (bprm->file) fput(bprm->file); bprm->file = NULL; current->did_exec = 1; + task_notify_event(TASK_NOTIFY_EXEC, current); return retval; } read_lock(&binfmt_lock); put_binfmt(fmt); if (retval != -ENOEXEC || bprm->mm == NULL) Index: linux-2.6.14-rc4/include/linux/task_notify.h =================================================================== --- /dev/null +++ linux-2.6.14-rc4/include/linux/task_notify.h @@ -0,0 +1,53 @@ +/* + * task_notify.h - Task event notification + * + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 + * (C) Matt Helsley, IBM Corp. 2005 + * + * Provides the interface and macros for managing in-kernel task + * notification. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of version 2.1 of the GNU Lesser General Public License + * as published by the Free Software Foundation. + * + * This program is distributed in the hope that it would be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + * + */ + +#ifndef _LINUX_TASK_NOTIFY_H +#define _LINUX_TASK_NOTIFY_H + +typedef __bitwise unsigned int task_notify_event; +#define TASK_NOTIFY_FORK ((__force task_notify_event)0x00000001) +#define TASK_NOTIFY_EXEC ((__force task_notify_event)0x00000002) +#define TASK_NOTIFY_EXIT ((__force task_notify_event)0x00000004) +#define TASK_NOTIFY_DESTROY ((__force task_notify_event)0x00000008) +#define TASK_NOTIFY_UID ((__force task_notify_event)0x00000010) +#define TASK_NOTIFY_GID ((__force task_notify_event)0x00000020) +#define TASK_NOTIFY_ALL_EVENTS ((__force task_notify_event)0x0000003f) + +typedef __bitwise unsigned int task_notify_flag; +#define TASK_NOTIFY_DONT_SLEEP ((__force task_notify_flag)0x00000001) + +#ifdef CONFIG_TASK_NOTIFY +typedef void (task_notify_func)(unsigned int events, unsigned int flags, + struct task_struct * task, void *watcher_data); +extern struct task_notifier_block; + +extern int task_notify_watch_all (task_notify_func *, void*, unsigned int) +extern int task_notify_watch_task (task_notify_func *, void*, unsigned int, + struct task_struct *) +extern void task_notify_ignore_all (task_notify_func *) +extern void task_notify_ignore_task (task_notify_func *, struct task_struct *) +extern void task_notify_event(unsigned int, struct task_struct *); + +#else /* !CONFIG_TASK_NOTIFY */ + +static inline void task_notify_event(unsigned int i, struct task_struct *t) {} + +#endif /* CONFIG_TASK_NOTIFY */ + +#endif /* _LINUX_TASK_NOTIFY_H */ Index: linux-2.6.14-rc4/init/Kconfig =================================================================== --- linux-2.6.14-rc4.orig/init/Kconfig +++ linux-2.6.14-rc4/init/Kconfig @@ -160,10 +160,20 @@ config BSD_PROCESS_ACCT_V3 process and it's parent. Note that this file format is incompatible with previous v0/v1/v2 file formats, so you will need updated tools for processing it. A preliminary version of these tools is available at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. +config TASK_NOTIFY + bool "Task Notification" + depends on EXPERIMENTAL + help + Enabling this provides a in kernel notification of task events. + Modules can register to be notified for specific task events and + get notified when any task goes thru that event. + + Say N if you are unsure. + config SYSCTL bool "Sysctl support" ---help--- The sysctl interface provides a means of dynamically changing certain kernel parameters and variables on the fly without requiring Index: linux-2.6.14-rc4/kernel/task_notify.c =================================================================== --- /dev/null +++ linux-2.6.14-rc4/kernel/task_notify.c @@ -0,0 +1,183 @@ +/* task_notify.c - in-kernel task event notification + * + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 + * (C) Matt Helsley, IBM Corp. 2005 + * + * Provides API for task event registration and notification. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + */ + +#include <linux/config.h> +#include <linux/module.h> +#include <linux/list.h> +#include <linux/rcupdate.h> +#include <linux/task_notify.h> + +static LIST_HEAD(notify_list); + +struct task_notifier_block { + struct list_head list; + union { + struct rcu_head rcu; + void *private_data; /* when not freeing via rcu -- bad idea? */ + }; + task_notify_func *func; + unsigned int event_mask; /* set of events to pass to func */ +}; + +static kmem_cache_t *nb_cache = NULL; + +void __init task_notify_init (void) +{ + /* Info on SLAB_DESTROY_BY_RCU : + * http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/1108.html */ + nb_cache = kmem_cache_create("task_notify", + sizeof(struct task_notifier_block), 0, + SLAB_KERNEL | SLAB_NO_REAP | + SLAB_DESTROY_BY_RCU, NULL, NULL); +} +core_initcall(task_notify_init); + +void __exit_call task_notify_exit (void) +{ + kmem_cache_destroy(nb_cache); + nb_cache = NULL; +} + +/* internal use only */ +#define TASK_NOTIFY_ALL_FLAGS ((__force task_notify_flag_t)0x0001) + +static inline struct task_notifier_block* +get_notifier_block (task_notify_func *func, void* private_data, + unsigned short event_mask) +{ + struct task_notifier_block *nb; + + nb = kmem_cache_alloc(nb_cache, GFP_KERNEL); + if (nb == NULL) + return NULL; + nb->private_data = private_data; + nb->func = func; + nb->event_mask = event_mask; +} + +static inline void put_notifier_block (struct task_notifier_block *nb) +{ + kmem_cache_free(nb_cache, nb); +} + +/******************************************************************* + * Event callback invocation + *******************************************************************/ + +int +task_notify_watch_all (task_notify_func *func, void* func_data, + unsigned short event_mask) +{ + struct task_notifier_block *nb; + + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) + return -EINVAL; + + nb = get_notifier_block(func, func_data, + event_mask, TASK_NOTIFY_DONT_SLEEP); + if (nb == NULL) + return -ENOMEM; + list_add_rcu(&nb->list, ¬ify_list); + return 0; +} + +int +task_notify_watch_task (task_notify_func *func, void* func_data, + unsigned short event_mask, struct task_struct *task) +{ + struct task_notifier_block *nb; + + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) + return -EINVAL; + + nb = get_notifier_block(func, func_data, event_mask, 0); + if (nb == NULL) + return -ENOMEM; + list_add(&nb->list, &task->task_watchers); + return 0; +} + +static void +task_notify_free (struct rcu_head *head) +{ + struct task_notifier_block *nb = + container_of(head, struct task_notifier_block, rcu); + put_notifier_block(nb); + return; +} + +void +task_notify_ignore_all (task_notify_func *func) +{ + struct task_notifier_block *nb; + void *private_data; + + /* Do not use the _rcu iterator here */ + list_for_each_entry(nb, ¬ify_list, list) { + if (nb->func != func) + continue; + list_del_rcu(&nb->list); + private_data = nb->private_data; + call_rcu(&nb->rcu, task_notify_free); + + /* Must we restrict from sleeping? */ + nb->func(0, TASK_NOTIFY_DONT_SLEEP, NULL, private_data); + return; + } + return; +} + +void +task_notify_ignore_task (task_notify_func *func, struct task_struct *task) +{ + struct task_notifier_block *nb; + + list_for_each_entry(nb, &task->task_watchers, list) { + if (nb->func != func) + continue; + list_del(&nb->list); + nb->func(0, 0, task, nb->private_data); + put_notifier_block(nb); + return; + } + return; +} + +EXPORT_SYMBOL_GPL(task_notify_watch_all); +EXPORT_SYMBOL_GPL(task_notify_watch_task); +EXPORT_SYMBOL_GPL(task_notify_ignore_all); +EXPORT_SYMBOL_GPL(task_notify_ignore_task); + +void +task_notify_event(unsigned int event, struct task_struct *task) +{ + struct task_notifier_block *nb; + + /* + * Call the per-task watchers -- use safe variant in case + * watchers want to remove themselves + */ + list_for_each_entry_safe(nb, nbn, &task->task_watchers, list) + if ((nb->event_mask & event) == event) + nb->func(event, 0, task, nb->private_data); + + /* Call the watchers interested in all tasks */ + rcu_read_lock(); + list_for_each_entry_rcu(nb, ¬ify_list, list) + if ((nb->event_mask & event) == event) + nb->func(event, TASK_NOTIFY_DONT_SLEEP, task, + nb->private_data); + rcu_read_unlock(); + return; +} Index: linux-2.6.14-rc4/kernel/exit.c =================================================================== --- linux-2.6.14-rc4.orig/kernel/exit.c +++ linux-2.6.14-rc4/kernel/exit.c @@ -26,10 +26,11 @@ #include <linux/proc_fs.h> #include <linux/mempolicy.h> #include <linux/cpuset.h> #include <linux/syscalls.h> #include <linux/signal.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/unistd.h> #include <asm/pgtable.h> #include <asm/mmu_context.h> @@ -861,10 +862,11 @@ fastcall NORET_TYPE void do_exit(long co module_put(tsk->thread_info->exec_domain->module); if (tsk->binfmt) module_put(tsk->binfmt->module); tsk->exit_code = code; + task_notify_event(TASK_NOTIFY_EXIT, tsk); exit_notify(tsk); #ifdef CONFIG_NUMA mpol_free(tsk->mempolicy); tsk->mempolicy = NULL; #endif Index: linux-2.6.14-rc4/kernel/fork.c =================================================================== --- linux-2.6.14-rc4.orig/kernel/fork.c +++ linux-2.6.14-rc4/kernel/fork.c @@ -40,10 +40,11 @@ #include <linux/mount.h> #include <linux/audit.h> #include <linux/profile.h> #include <linux/rmap.h> #include <linux/acct.h> +#include <linux/task_notify.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> #include <asm/uaccess.h> #include <asm/mmu_context.h> @@ -1036,10 +1037,12 @@ static task_t *copy_process(unsigned lon /* ok, now we should be set up.. */ p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL); p->pdeath_signal = 0; p->exit_state = 0; + p->task_watchers = NULL; + /* * Ok, make it visible to the rest of the system. * We dont wake it up yet. */ p->group_leader = p; @@ -1144,10 +1147,12 @@ static task_t *copy_process(unsigned lon attach_pid(p, PIDTYPE_SID, p->signal->session); if (p->pid) __get_cpu_var(process_counts)++; } + task_notify_event(TASK_NOTIFY_FORK, p); + if (!current->signal->tty && p->signal->tty) p->signal->tty = NULL; nr_threads++; total_forks++; Index: linux-2.6.14-rc4/kernel/Makefile =================================================================== --- linux-2.6.14-rc4.orig/kernel/Makefile +++ linux-2.6.14-rc4/kernel/Makefile @@ -30,10 +30,11 @@ obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_SYSFS) += ksysfs.o obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SECCOMP) += seccomp.o +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) # According to Alan Modra <al...@li...>, the -fno-omit-frame-pointer is # needed for x86 only. Why this used to be enabled for all architectures is beyond # me. I suspect most platforms don't need this, but until we know that for sure Index: linux-2.6.14-rc4/kernel/sys.c =================================================================== --- linux-2.6.14-rc4.orig/kernel/sys.c +++ linux-2.6.14-rc4/kernel/sys.c @@ -29,10 +29,11 @@ #include <linux/tty.h> #include <linux/signal.h> #include <linux/compat.h> #include <linux/syscalls.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/io.h> #include <asm/unistd.h> @@ -621,10 +622,11 @@ asmlinkage long sys_setregid(gid_t rgid, current->sgid = new_egid; current->fsgid = new_egid; current->egid = new_egid; current->gid = new_rgid; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } /* * setgid() is implemented like SysV w/ SAVED_IDS @@ -660,10 +662,11 @@ asmlinkage long sys_setgid(gid_t gid) } else return -EPERM; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } static int set_user(uid_t new_ruid, int dumpclear) { @@ -750,10 +753,12 @@ asmlinkage long sys_setreuid(uid_t ruid, current->suid = current->euid; current->fsuid = current->euid; key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); } @@ -797,10 +802,12 @@ asmlinkage long sys_setuid(uid_t uid) current->fsuid = current->euid = uid; current->suid = new_suid; key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); } /* @@ -845,10 +852,12 @@ asmlinkage long sys_setresuid(uid_t ruid if (suid != (uid_t) -1) current->suid = suid; key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); } asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid) { @@ -896,10 +905,11 @@ asmlinkage long sys_setresgid(gid_t rgid current->gid = rgid; if (sgid != (gid_t) -1) current->sgid = sgid; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid) { @@ -938,10 +948,11 @@ asmlinkage long sys_setfsuid(uid_t uid) } current->fsuid = uid; } key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); return old_fsuid; } @@ -967,10 +978,11 @@ asmlinkage long sys_setfsgid(gid_t gid) smp_wmb(); } current->fsgid = gid; key_fsgid_changed(current); } + task_notify_event(TASK_NOTIFY_GID, current); return old_fsgid; } asmlinkage long sys_times(struct tms __user * tbuf) { Index: linux-2.6.14-rc4/include/linux/init_task.h =================================================================== --- linux-2.6.14-rc4.orig/include/linux/init_task.h +++ linux-2.6.14-rc4/include/linux/init_task.h @@ -120,10 +120,11 @@ extern struct group_info init_groups; .alloc_lock = SPIN_LOCK_UNLOCKED, \ .proc_lock = SPIN_LOCK_UNLOCKED, \ .journal_info = NULL, \ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ .fs_excl = ATOMIC_INIT(0), \ + .task_watchers = NULL, \ } #define INIT_CPU_TIMERS(cpu_timers) \ { \ Index: linux-2.6.14-rc4/include/linux/sched.h =================================================================== --- linux-2.6.14-rc4.orig/include/linux/sched.h +++ linux-2.6.14-rc4/include/linux/sched.h @@ -34,10 +34,11 @@ #include <linux/percpu.h> #include <linux/topology.h> #include <linux/seccomp.h> #include <linux/auxvec.h> /* For AT_VECTOR_SIZE */ +#include <linux/task_notify.h> struct exec_domain; /* * cloning flags: @@ -811,10 +812,13 @@ struct task_struct { struct cpuset *cpuset; nodemask_t mems_allowed; int cpuset_mems_generation; #endif atomic_t fs_excl; /* holding fs exclusive resources */ +#ifdef CONFIG_TASK_NOTIFY + struct task_notifier_block *task_watchers; +#endif }; static inline pid_t process_group(struct task_struct *tsk) { return tsk->signal->pgrp; Index: linux-2.6.14-rc4/Documentation/task_notify.txt =================================================================== --- /dev/null +++ linux-2.6.14-rc4/Documentation/task_notify.txt @@ -0,0 +1,48 @@ +task_notify is a means for a kernel system or module to attach per-task data +and/or get called at certain important events in the lifetime of a task. + +Events include: + + fork + exec + exit + uid/gid change + per-task data going away (more on this below) + +A module interested in these events must register itself with one of two +task_notify_watch functions by specifying the set of interested events, a +function to be called during the event, and a data pointer to be passed to +the function. While functions may be registered multiple times callers should +be aware that the first registration of the function is removed first. + +Registering with task_notify_watch_all() allows the caller to watch the +specified set of events for all tasks. Note that in this case the data passed +to the function isn't associated with a task. Furthermore, functions registered +for all tasks must not sleep when the flag TASK_NOTIFY_DONT_SLEEP is specified. + +If a caller is interested only in a single task or a small number of tasks, it +may choose to utilize the more effecient task_notify_watch_task() function. The +data passed can be specific to the task, or it can be shared. When this per-task +data pointer is no longer in use the function will be called with no events. + +When an event occurs the specified function and is called with several +parameters: + +events - the set of events that occurred + +flags - special flags indicating the context of the call: + TASK_NOTIFY_DONT_SLEEP tells the function it must not + sleep. This only will only happen if a function is registered + for all tasks and it gets called. + +data - the data pointer passed in at registration time + +If a data pointer is provided, one last invocation of the function will be made +with no events. This indicates that the reference to the data is being dropped. +As with other invocations, functions registered for all tasks must still honor +the TASK_NOTIFY_DONT_SLEEP flag. + +When a caller is no longer interested in being called during a task event it +must use a corresponding task_notify_ignore call by specifying the function +which should ignore events. The first registration of the specified function +will be removed. |
From: Chandra S. <sek...@us...> - 2005-10-12 23:54:19
|
On Wed, 2005-10-12 at 14:31 -0500, Jack Steiner wrote: > On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > > > I will write up some code and post it today. > > > The patch looks very similar to the original task_notifier patch but has some yup... That is why i was trying to get to a small subset (instead of having multiple callouts from fork()/exec() etc., to satisfy for each of our needs) > significant changes that concern me. These changes make the patch much less > useful for our needs, (although I'm sure the changes make it better suited for > your needs :-) CKRM needs are very simple. It just needs global notifiers that is called on every task. I could use your task notifier, but it is heavy weight for our usage. I would guess that is the case with the Process Event connector (formerly fork/exit connectors) too. > > > The changes that concern me are: > > 1) Task notifiers are registered in a global list instead of a per-task > list. This makes sense if most tasks require callouts, but for > our usage, only a small percentage of fork/exec/exits actually > need a callout. Scanning the notifier list and calling > each function on every fork/exit/exec is unnecessary overhead > if it is not needed. On the otherhand, if all tasks needed > the callouts (your case, I think), I can see the advantage of a global list. Yes, that is correct. > > 2) There is no easy way to associate any per-task data with a task. The > original task_notifier mechanism registered a notifier block for > each task. Typically, the notifier block would be embedded in a larger > container that contains the additional per-task data. Locating the > task data is nothing more than a typecast (embedded in macros, of course). > > Nothing prevents the notifier function itself from managing per-task state > in additional data structures but it is not straightforward. Although > it may be able to be done without locks (not sure), there are > frequently 100's of tasks per notifier function that require task data. > Managing the data in a single linked list is not practical. An array, hash > or tree structure is probably needed. Certainly doable, but not trivial > without adding data to the task_struct. I agree that PAGG (for that matter CKRM also) need to add some field in the task data structure. In your implementaion it was a void pointer in the task data structure. Can't you keep the same and use it ? > > 3) The notifier functions are called from within a rcu_read_lock()/ > rcu_read_unlock() block. Doesn't this mean that the notifier function > cannot sleep or do anything that has the potential to sleep? > That would be a significant limitation. oh yeah... i did not think about that. Need to think about a different way to do it. > > > > Let me explain our typical usage of PAGG/pnotifier/task_callouts. Maybe this > will help explain some of the issues and tradeoffs that concern me. (I know > that your requirements are different). > > > We currently have 4 users of the PAGG/pnotifier/task_callout mechanism. That > means that at all times there would be 4 notifier functions registered in a > global callout list. Of the 4 functions, 3 are infrequently used although all 4 > are usually in use by at least 1 task. At any point in time, _most_ tasks > typically require no notifiers. > > A typical system has 1000's or 10000's of tasks. At any point in time, most > tasks have no need for any notifiers. There is roughly no more than 1 task per > cpu that is using a notifier function although the number may be more or less > than this (our big systems have 512p). > > Of the 4 users of the PAGG/pnotifier/task_callout mechanism, 3 require per-task > data to manage additional task state. Requiring 4 separate functions to > scan for per-task state when typically none exists is unnecessary overhead. > > For our typical usage, statistically most fork/exec/exits do not need a callout. > Unless I'm overlooking something, using the newer mechanism would require the > following at every fork/exec/exit: > > - scan the global notifier list > - call 4 notifier functions > - each notifier function scans it's own task-state data to see if the > task needs additional work done as part of the callout > - no task data found because callout was not needed > - callout was a waste of time > > With the original task_notifier mechanism, the notifier_block pointer in most > task_structs is NULL. The notifier mechanism is skipped and incurs almost zero > overhead. Correct me if I am grossing wrong in the following usage model for PAGG: You could maintain your notifiers in the task structure ( same as your current implementation ), register a single global function and do your callouts in that global function depending on what is there in the given task data structure. >From the code point of view, you would register your _task_notifier_call_chain() (in your patch) with task_notify_register() (in my patch). So, the overhead for tasks with no notifiers would be calling your registered global function. > > > I want to reiterate an earlier point. Although our requirements appear to very > similar, it may be a mistake to try to create a _single_ mechanism that covers > both cases. Maybe a better approach is to design two similar mechanisms, each > optimized for a particular purpose. The 2 mechanisms should have similar > APIs and could possibly share some code. > > > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Paul E. M. <pa...@us...> - 2005-10-25 01:11:01
|
On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > I will write up some code and post it today. > > Here is the code. Do let me know how it stands. > Task Event notifier > > On exec, fork, exit, real/effective gid/uid changes notify the > registered > modules. Seems plausible -- a long-lived notification receiver would cause this to be quite read-intensive, so likely a reasonable application of RCU. However, I don't see how the update side is safe. What prevents multiple updates from running concurrently? (See interspersed for more detail.) Thanx, Paul > fs/exec.c | 2 > include/linux/task_notify.h | 44 +++++++++++++++++++++ > init/Kconfig | 10 ++++ > kernel/Makefile | 1 > kernel/exit.c | 2 > kernel/fork.c | 2 > kernel/sys.c | 12 +++++ > kernel/task_notify.c | 89 ++++++++++++++++++++++++++++++++++++ > ++++++++ > 8 files changed, 162 insertions(+) > > Index: l2613-t_notify/fs/exec.c > =================================================================== > --- l2613-t_notify.orig/fs/exec.c > +++ l2613-t_notify/fs/exec.c > @@ -48,6 +48,7 @@ > #include <linux/syscalls.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/mmu_context.h> > @@ -1104,6 +1105,7 @@ int search_binary_handler(struct linux_b > fput(bprm->file); > bprm->file = NULL; > current->did_exec = 1; > + task_notify_event(TASK_NOTIFY_EXEC, current); > return retval; > } > read_lock(&binfmt_lock); > Index: l2613-t_notify/include/linux/task_notify.h > =================================================================== > --- /dev/null > +++ l2613-t_notify/include/linux/task_notify.h > @@ -0,0 +1,44 @@ > +/* > + * task_notify.h - Task event notification > + * > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > + * > + * Provides the interface and macros for managing in-kernel task > + * notification. > + * > + * This program is free software; you can redistribute it and/or modify > it > + * under the terms of version 2.1 of the GNU Lesser General Public > License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it would be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > + * > + */ > + > +#ifndef _LINUX_TASK_NOTIFY_H > +#define _LINUX_TASK_NOTIFY_H > + > +#define TASK_NOTIFY_FORK 0x00000001 > +#define TASK_NOTIFY_EXEC 0x00000002 > +#define TASK_NOTIFY_EXIT 0x00000004 > +#define TASK_NOTIFY_UID 0x00000008 > +#define TASK_NOTIFY_GID 0x00000010 > + > +#define TASK_NOTIFY_ALL 0x0000001f > + > +#ifdef CONFIG_TASK_NOTIFY > + > +typedef void (task_notifier_func)(unsigned int, struct task_struct *); > + > +extern int task_notify_register(task_notifier_func *, unsigned int); > +extern void task_notify_unregister(task_notifier_func *); > +extern void task_notify_event(unsigned int event, struct task_struct > *task); > + > +#else /* !CONFIG_TASK_NOTIFY */ > + > +static inline void task_notify_event(unsigned int i, struct task_struct > *t) {} > + > +#endif /* CONFIG_TASK_NOTIFY */ > + > +#endif /* _LINUX_TASK_NOTIFY_H */ > Index: l2613-t_notify/init/Kconfig > =================================================================== > --- l2613-t_notify.orig/init/Kconfig > +++ l2613-t_notify/init/Kconfig > @@ -146,6 +146,16 @@ config BSD_PROCESS_ACCT_V3 > for processing it. A preliminary version of these tools is available > at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. > > +config TASK_NOTIFY > + bool "Task Notification" > + depends on EXPERIMENTAL > + help > + Enabling this provides a in kernel notification of task events. > + Modules can register to be notified for specific task events and > + get notified when any task goes thru that event. > + > + Say N if you are unsure. > + > config SYSCTL > bool "Sysctl support" > ---help--- > Index: l2613-t_notify/kernel/task_notify.c > =================================================================== > --- /dev/null > +++ l2613-t_notify/kernel/task_notify.c > @@ -0,0 +1,89 @@ > +/* task_notify.c - in-kernel task event notification > + * > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > + * > + * Provides API for task event registration and notification. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + */ > + > +#include <linux/config.h> > +#include <linux/module.h> > +#include <linux/list.h> > +#include <linux/rcupdate.h> > +#include <linux/task_notify.h> > + > +static LIST_HEAD(notify_list); > + > +struct task_notifier_block { > + struct list_head list; > + struct rcu_head rcu; > + task_notifier_func *func; > + unsigned int mask; > +}; > + > +/******************************************************************* > + * Event callback invocation > + *******************************************************************/ > +int > +task_notify_register(task_notifier_func *func, unsigned int mask) > +{ > + struct task_notifier_block *nb; > + > + if ((func == NULL) || ((mask & ~TASK_NOTIFY_ALL) != 0)) > + return -EINVAL; > + > + nb = kmalloc(sizeof(struct task_notifier_block), GFP_KERNEL); > + if (nb == NULL) > + return -ENOMEM; > + nb->func = func; > + nb->mask = mask; > + list_add_rcu(&nb->list, ¬ify_list); What prevents a pair of task_notify_register() invocations from running concurrently, mangling notify_list in the process? > + return 0; > +} > + > +static void > +task_notify_free(struct rcu_head *head) > +{ > + struct task_notifier_block *nb = > + container_of(head, struct task_notifier_block, rcu); > + kfree(nb); > + return; > +} > + > +void > +task_notify_unregister(task_notifier_func *func) > +{ > + struct task_notifier_block *nb; > + > + /* Do not use the _rcu iterator here */ > + list_for_each_entry(nb, ¬ify_list, list) { > + if (nb->func == func) { > + list_del_rcu(&nb->list); Same here -- what prevents a pair of task_notify_unregister()s from running concurrently, either with each other or with a task_notify_register(), again mangling notify_list? > + call_rcu(&nb->rcu, task_notify_free); > + return; > + } > + } > + return; > +} > + > +EXPORT_SYMBOL_GPL(task_notify_register); > +EXPORT_SYMBOL_GPL(task_notify_unregister); > + > +void > +task_notify_event(unsigned int event, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + rcu_read_lock(); > + list_for_each_entry_rcu(nb, ¬ify_list, list) { > + if ((nb->mask & event) == event) > + nb->func(event, task); > + } > + rcu_read_unlock(); > + return; > +} > Index: l2613-t_notify/kernel/exit.c > =================================================================== > --- l2613-t_notify.orig/kernel/exit.c > +++ l2613-t_notify/kernel/exit.c > @@ -28,6 +28,7 @@ > #include <linux/cpuset.h> > #include <linux/syscalls.h> > #include <linux/signal.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/unistd.h> > @@ -851,6 +852,7 @@ fastcall NORET_TYPE void do_exit(long co > module_put(tsk->binfmt->module); > > tsk->exit_code = code; > + task_notify_event(TASK_NOTIFY_EXIT, tsk); > exit_notify(tsk); > #ifdef CONFIG_NUMA > mpol_free(tsk->mempolicy); > Index: l2613-t_notify/kernel/fork.c > =================================================================== > --- l2613-t_notify.orig/kernel/fork.c > +++ l2613-t_notify/kernel/fork.c > @@ -41,6 +41,7 @@ > #include <linux/profile.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/pgtable.h> > #include <asm/pgalloc.h> > @@ -1102,6 +1103,7 @@ static task_t *copy_process(unsigned lon > __ptrace_link(p, current->parent); > > cpuset_fork(p); > + task_notify_event(TASK_NOTIFY_FORK, p); > > attach_pid(p, PIDTYPE_PID, p->pid); > attach_pid(p, PIDTYPE_TGID, p->tgid); > Index: l2613-t_notify/kernel/Makefile > =================================================================== > --- l2613-t_notify.orig/kernel/Makefile > +++ l2613-t_notify/kernel/Makefile > @@ -30,6 +30,7 @@ obj-$(CONFIG_SYSFS) += ksysfs.o > obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ > obj-$(CONFIG_CRASH_DUMP) += crash_dump.o > obj-$(CONFIG_SECCOMP) += seccomp.o > +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o > > ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) > # According to Alan Modra <al...@li...>, the -fno-omit-frame- > pointer is > Index: l2613-t_notify/kernel/sys.c > =================================================================== > --- l2613-t_notify.orig/kernel/sys.c > +++ l2613-t_notify/kernel/sys.c > @@ -31,6 +31,7 @@ > > #include <linux/compat.h> > #include <linux/syscalls.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/io.h> > @@ -583,6 +584,7 @@ asmlinkage long sys_setregid(gid_t rgid, > current->egid = new_egid; > current->gid = new_rgid; > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -622,6 +624,7 @@ asmlinkage long sys_setgid(gid_t gid) > return -EPERM; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -712,6 +715,8 @@ asmlinkage long sys_setreuid(uid_t ruid, > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, > LSM_SETID_RE); > } > > @@ -759,6 +764,8 @@ asmlinkage long sys_setuid(uid_t uid) > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, > LSM_SETID_ID); > } > > @@ -807,6 +814,8 @@ asmlinkage long sys_setresuid(uid_t ruid > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, > LSM_SETID_RES); > } > > @@ -858,6 +867,7 @@ asmlinkage long sys_setresgid(gid_t rgid > current->sgid = sgid; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -900,6 +910,7 @@ asmlinkage long sys_setfsuid(uid_t uid) > } > > key_fsuid_changed(current); > + task_notify_event(TASK_NOTIFY_UID, current); > > security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, > LSM_SETID_FS); > > @@ -929,6 +940,7 @@ asmlinkage long sys_setfsgid(gid_t gid) > current->fsgid = gid; > key_fsgid_changed(current); > } > + task_notify_event(TASK_NOTIFY_GID, current); > return old_fsgid; > } > > > -- > > ---------------------------------------------------------------------- > Chandra Seetharaman | Be careful what you choose.... > - sek...@us... | .......you may get it. > ---------------------------------------------------------------------- > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: > Power Architecture Resource Center: Free content, downloads, discussions, > and more. http://solutions.newsforge.com/ibmarch.tmpl > _______________________________________________ > Lse-tech mailing list > Lse...@li... > https://lists.sourceforge.net/lists/listinfo/lse-tech > |
From: Matt H. <mat...@us...> - 2005-10-25 05:22:39
|
On Mon, 2005-10-24 at 18:11 -0700, Paul E. McKenney wrote: > On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > > > I will write up some code and post it today. > > > > Here is the code. Do let me know how it stands. > > Task Event notifier > > > > On exec, fork, exit, real/effective gid/uid changes notify the > > registered > > modules. > > Seems plausible -- a long-lived notification receiver would cause this > to be quite read-intensive, so likely a reasonable application of RCU. > > However, I don't see how the update side is safe. What prevents multiple > updates from running concurrently? (See interspersed for more detail.) > > Thanx, Paul > > > fs/exec.c | 2 > > include/linux/task_notify.h | 44 +++++++++++++++++++++ > > init/Kconfig | 10 ++++ > > kernel/Makefile | 1 > > kernel/exit.c | 2 > > kernel/fork.c | 2 > > kernel/sys.c | 12 +++++ > > kernel/task_notify.c | 89 ++++++++++++++++++++++++++++++++++++ > > ++++++++ > > 8 files changed, 162 insertions(+) > > > > Index: l2613-t_notify/fs/exec.c > > =================================================================== > > --- l2613-t_notify.orig/fs/exec.c > > +++ l2613-t_notify/fs/exec.c > > @@ -48,6 +48,7 @@ > > #include <linux/syscalls.h> > > #include <linux/rmap.h> > > #include <linux/acct.h> > > +#include <linux/task_notify.h> > > > > #include <asm/uaccess.h> > > #include <asm/mmu_context.h> > > @@ -1104,6 +1105,7 @@ int search_binary_handler(struct linux_b > > fput(bprm->file); > > bprm->file = NULL; > > current->did_exec = 1; > > + task_notify_event(TASK_NOTIFY_EXEC, current); > > return retval; > > } > > read_lock(&binfmt_lock); > > Index: l2613-t_notify/include/linux/task_notify.h > > =================================================================== > > --- /dev/null > > +++ l2613-t_notify/include/linux/task_notify.h > > @@ -0,0 +1,44 @@ > > +/* > > + * task_notify.h - Task event notification > > + * > > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > > + * > > + * Provides the interface and macros for managing in-kernel task > > + * notification. > > + * > > + * This program is free software; you can redistribute it and/or modify > > it > > + * under the terms of version 2.1 of the GNU Lesser General Public > > License > > + * as published by the Free Software Foundation. > > + * > > + * This program is distributed in the hope that it would be useful, but > > + * WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > > + * > > + */ > > + > > +#ifndef _LINUX_TASK_NOTIFY_H > > +#define _LINUX_TASK_NOTIFY_H > > + > > +#define TASK_NOTIFY_FORK 0x00000001 > > +#define TASK_NOTIFY_EXEC 0x00000002 > > +#define TASK_NOTIFY_EXIT 0x00000004 > > +#define TASK_NOTIFY_UID 0x00000008 > > +#define TASK_NOTIFY_GID 0x00000010 > > + > > +#define TASK_NOTIFY_ALL 0x0000001f > > + > > +#ifdef CONFIG_TASK_NOTIFY > > + > > +typedef void (task_notifier_func)(unsigned int, struct task_struct *); > > + > > +extern int task_notify_register(task_notifier_func *, unsigned int); > > +extern void task_notify_unregister(task_notifier_func *); > > +extern void task_notify_event(unsigned int event, struct task_struct > > *task); > > + > > +#else /* !CONFIG_TASK_NOTIFY */ > > + > > +static inline void task_notify_event(unsigned int i, struct task_struct > > *t) {} > > + > > +#endif /* CONFIG_TASK_NOTIFY */ > > + > > +#endif /* _LINUX_TASK_NOTIFY_H */ > > Index: l2613-t_notify/init/Kconfig > > =================================================================== > > --- l2613-t_notify.orig/init/Kconfig > > +++ l2613-t_notify/init/Kconfig > > @@ -146,6 +146,16 @@ config BSD_PROCESS_ACCT_V3 > > for processing it. A preliminary version of these tools is available > > at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. > > > > +config TASK_NOTIFY > > + bool "Task Notification" > > + depends on EXPERIMENTAL > > + help > > + Enabling this provides a in kernel notification of task events. > > + Modules can register to be notified for specific task events and > > + get notified when any task goes thru that event. > > + > > + Say N if you are unsure. > > + > > config SYSCTL > > bool "Sysctl support" > > ---help--- > > Index: l2613-t_notify/kernel/task_notify.c > > =================================================================== > > --- /dev/null > > +++ l2613-t_notify/kernel/task_notify.c > > @@ -0,0 +1,89 @@ > > +/* task_notify.c - in-kernel task event notification > > + * > > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > > + * > > + * Provides API for task event registration and notification. > > + * > > + * This program is free software; you can redistribute it and/or modify > > + * it under the terms of the GNU General Public License as published by > > + * the Free Software Foundation; either version 2 of the License, or > > + * (at your option) any later version. > > + * > > + */ > > + > > +#include <linux/config.h> > > +#include <linux/module.h> > > +#include <linux/list.h> > > +#include <linux/rcupdate.h> > > +#include <linux/task_notify.h> > > + > > +static LIST_HEAD(notify_list); > > + > > +struct task_notifier_block { > > + struct list_head list; > > + struct rcu_head rcu; > > + task_notifier_func *func; > > + unsigned int mask; > > +}; > > + > > +/******************************************************************* > > + * Event callback invocation > > + *******************************************************************/ > > +int > > +task_notify_register(task_notifier_func *func, unsigned int mask) > > +{ > > + struct task_notifier_block *nb; > > + > > + if ((func == NULL) || ((mask & ~TASK_NOTIFY_ALL) != 0)) > > + return -EINVAL; > > + > > + nb = kmalloc(sizeof(struct task_notifier_block), GFP_KERNEL); > > + if (nb == NULL) > > + return -ENOMEM; > > + nb->func = func; > > + nb->mask = mask; > > + list_add_rcu(&nb->list, ¬ify_list); > > What prevents a pair of task_notify_register() invocations from running > concurrently, mangling notify_list in the process? Good point! It's missing the "write-side" synchronization. I think a simple spin lock will do: spin_lock(¬ify_list_modify_lock); list_add_rcu(&nb->list, ¬ify_list); spin_unlock(¬ify_list_modify_lock); > > + return 0; > > +} > > + > > +static void > > +task_notify_free(struct rcu_head *head) > > +{ > > + struct task_notifier_block *nb = > > + container_of(head, struct task_notifier_block, rcu); > > + kfree(nb); > > + return; > > +} > > + > > +void > > +task_notify_unregister(task_notifier_func *func) > > +{ > > + struct task_notifier_block *nb; > > + > > + /* Do not use the _rcu iterator here */ > > + list_for_each_entry(nb, ¬ify_list, list) { > > + if (nb->func == func) { > > + list_del_rcu(&nb->list); > > Same here -- what prevents a pair of task_notify_unregister()s > from running concurrently, either with each other or with a > task_notify_register(), again mangling notify_list? I think getting the spinlock here too will fix it. Will wrapping the list_del_rcu() call tightly: spin_lock(¬ify_list_modify_lock); list_del_rcu(&nb->list); spin_unlock(¬ify_list_modify_lock); work or is some _rcu variant of list_for_each_entry() required (perhaps a _safe() variant too)? Thanks, -Matt Helsley matthltc @ us.ibm.com |
From: Paul E. M. <pa...@us...> - 2005-10-25 11:03:51
|
On Mon, Oct 24, 2005 at 10:18:36PM -0700, Matt Helsley wrote: > On Mon, 2005-10-24 at 18:11 -0700, Paul E. McKenney wrote: > > On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > > > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > > > > > I will write up some code and post it today. > > > > > > Here is the code. Do let me know how it stands. > > > Task Event notifier > > > > > > On exec, fork, exit, real/effective gid/uid changes notify the > > > registered > > > modules. > > > > Seems plausible -- a long-lived notification receiver would cause this > > to be quite read-intensive, so likely a reasonable application of RCU. > > > > However, I don't see how the update side is safe. What prevents multiple > > updates from running concurrently? (See interspersed for more detail.) > > > > Thanx, Paul [ . . . ] > > > +int > > > +task_notify_register(task_notifier_func *func, unsigned int mask) > > > +{ > > > + struct task_notifier_block *nb; > > > + > > > + if ((func == NULL) || ((mask & ~TASK_NOTIFY_ALL) != 0)) > > > + return -EINVAL; > > > + > > > + nb = kmalloc(sizeof(struct task_notifier_block), GFP_KERNEL); > > > + if (nb == NULL) > > > + return -ENOMEM; > > > + nb->func = func; > > > + nb->mask = mask; > > > + list_add_rcu(&nb->list, ¬ify_list); > > > > What prevents a pair of task_notify_register() invocations from running > > concurrently, mangling notify_list in the process? > > Good point! It's missing the "write-side" synchronization. I think a > simple spin lock will do: > > spin_lock(¬ify_list_modify_lock); > list_add_rcu(&nb->list, ¬ify_list); > spin_unlock(¬ify_list_modify_lock); Works for me! > > > + return 0; > > > +} > > > + > > > +static void > > > +task_notify_free(struct rcu_head *head) > > > +{ > > > + struct task_notifier_block *nb = > > > + container_of(head, struct task_notifier_block, rcu); > > > + kfree(nb); > > > + return; > > > +} > > > + > > > +void > > > +task_notify_unregister(task_notifier_func *func) > > > +{ > > > + struct task_notifier_block *nb; > > > + > > > + /* Do not use the _rcu iterator here */ > > > + list_for_each_entry(nb, ¬ify_list, list) { > > > + if (nb->func == func) { > > > + list_del_rcu(&nb->list); > > > > Same here -- what prevents a pair of task_notify_unregister()s > > from running concurrently, either with each other or with a > > task_notify_register(), again mangling notify_list? > > I think getting the spinlock here too will fix it. Will wrapping the > list_del_rcu() call tightly: > spin_lock(¬ify_list_modify_lock); > list_del_rcu(&nb->list); > spin_unlock(¬ify_list_modify_lock); > > work or is some _rcu variant of list_for_each_entry() required (perhaps > a _safe() variant too)? Good catch! But there is more than one possible fix. The list_del_rcu() primitive should leave the pointers around that list_for_each_entry() needs, so it should not be necessary to use list_for_each_entry_safe(). However, to rely on that, one would need to enclose this loop in rcu_read_lock()/rcu_read_unlock() for realtime kernels. So either of the following two fixes may be used: 1. Change the list_for_each_entry() to list_for_each_entry_safe() 2. Enclose the loop with rcu_read_lock() and rcu_read_unlock(), and also change list_for_each_entry() to list_for_each_entry_rcu() #1 does have the virtue of requiring fewer lines of code, and of being more conventional. Thanx, Paul |
From: Chandra S. <sek...@us...> - 2005-10-25 18:26:11
|
Thanks Paul for your comments. There is still discussion whether we want to use RCU at all as some of the callouts might want to block(which are currently being called between rcu_read_lock()/rcu_read_unlock()). Will follow option 1 (below) with lock, if we are going ahead with it.. Thanks, chandra On Tue, 2005-10-25 at 04:04 -0700, Paul E. McKenney wrote: > > > > I think getting the spinlock here too will fix it. Will wrapping the > > list_del_rcu() call tightly: > > spin_lock(¬ify_list_modify_lock); > > list_del_rcu(&nb->list); > > spin_unlock(¬ify_list_modify_lock); > > > > work or is some _rcu variant of list_for_each_entry() required (perhaps > > a _safe() variant too)? > > Good catch! But there is more than one possible fix. > > The list_del_rcu() primitive should leave the pointers around that > list_for_each_entry() needs, so it should not be necessary to use > list_for_each_entry_safe(). However, to rely on that, one would > need to enclose this loop in rcu_read_lock()/rcu_read_unlock() for > realtime kernels. > > So either of the following two fixes may be used: > > 1. Change the list_for_each_entry() to list_for_each_entry_safe() > > 2. Enclose the loop with rcu_read_lock() and rcu_read_unlock(), and > also change list_for_each_entry() to list_for_each_entry_rcu() > > #1 does have the virtue of requiring fewer lines of code, and of being > more conventional. > > Thanx, Paul > > > ------------------------------------------------------- > This SF.Net email is sponsored by the JBoss Inc. > Get Certified Today * Register for a JBoss Training Course > Free Certification Exam for All Training Attendees Through End of 2005 > Visit http://www.jboss.com/services/certification for more information > _______________________________________________ > Lse-tech mailing list > Lse...@li... > https://lists.sourceforge.net/lists/listinfo/lse-tech > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Erik J. <er...@sg...> - 2005-10-25 18:31:30
|
> There is still discussion whether we want to use RCU at all as some of > the callouts might want to block(which are currently being called Right. For what I need, there are some semaphores that have to be held and memory that needs to be allocated... So I need to be able to sleep in order to use this for my needs anyway... > between rcu_read_lock()/rcu_read_unlock()). > > Will follow option 1 (below) with lock, if we are going ahead with it.. > > Thanks, > > chandra > > On Tue, 2005-10-25 at 04:04 -0700, Paul E. McKenney wrote: > > > > > > I think getting the spinlock here too will fix it. Will wrapping the > > > list_del_rcu() call tightly: > > > spin_lock(¬ify_list_modify_lock); > > > list_del_rcu(&nb->list); > > > spin_unlock(¬ify_list_modify_lock); > > > > > > work or is some _rcu variant of list_for_each_entry() required (perhaps > > > a _safe() variant too)? > > > > Good catch! But there is more than one possible fix. > > > > The list_del_rcu() primitive should leave the pointers around that > > list_for_each_entry() needs, so it should not be necessary to use > > list_for_each_entry_safe(). However, to rely on that, one would > > need to enclose this loop in rcu_read_lock()/rcu_read_unlock() for > > realtime kernels. > > > > So either of the following two fixes may be used: > > > > 1. Change the list_for_each_entry() to list_for_each_entry_safe() > > > > 2. Enclose the loop with rcu_read_lock() and rcu_read_unlock(), and > > also change list_for_each_entry() to list_for_each_entry_rcu() > > > > #1 does have the virtue of requiring fewer lines of code, and of being > > more conventional. > > > > Thanx, Paul > > > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by the JBoss Inc. > > Get Certified Today * Register for a JBoss Training Course > > Free Certification Exam for All Training Attendees Through End of 2005 > > Visit http://www.jboss.com/services/certification for more information > > _______________________________________________ > > Lse-tech mailing list > > Lse...@li... > > https://lists.sourceforge.net/lists/listinfo/lse-tech > > > -- > > ---------------------------------------------------------------------- > Chandra Seetharaman | Be careful what you choose.... > - sek...@us... | .......you may get it. > ---------------------------------------------------------------------- > -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota |
From: Paul E. M. <pa...@us...> - 2005-10-25 19:22:01
|
On Tue, Oct 25, 2005 at 01:31:12PM -0500, Erik Jacobson wrote: > > There is still discussion whether we want to use RCU at all as some of > > the callouts might want to block(which are currently being called > > Right. For what I need, there are some semaphores that have to be held > and memory that needs to be allocated... So I need to be able to sleep > in order to use this for my needs anyway... OK... If the performance is good enough with locking, then just use locking. ;-) If performance becomes a problem, there are a number of possible approaches, each with advantages and disadvantages: o Use reference counts. If need be, keep the reference counts per-CPU, combining them only when it is time to reap the element. Still need memory barriers. o Keep the reference count for the list as a whole, and replace the entire list on update. This reduces the amount of reference counting, especially for long lists, reducing the number of expensive memory barriers required. o Tag each notifier with a sequence number, use RCU to scan the list multiple times, using the sequence number to avoid invoking the same notifier twice. This algorithm is quadratic, and so is reasonable only if the list is guaranteed to remain reasonably short. o Keep a per-task list of notifiers, thereby avoiding -all- locking. (Hey, at least this approach will keep memory manufacturers happy!) I am sure that there are other approaches as well... Thanx, Paul |
From: Matt H. <mat...@us...> - 2005-10-25 21:10:17
|
On Tue, 2005-10-25 at 13:31 -0500, Erik Jacobson wrote: > > There is still discussion whether we want to use RCU at all as some of > > the callouts might want to block(which are currently being called > > Right. For what I need, there are some semaphores that have to be held > and memory that needs to be allocated... So I need to be able to sleep > in order to use this for my needs anyway... Did you see my variant that used RCU on the global list but did not require locks for the per-task list? It included some documentation. Cheers, -Matt Helsley |
From: Chandra S. <sek...@us...> - 2005-10-25 21:35:36
|
On Tue, 2005-10-25 at 14:04 -0700, Matt Helsley wrote: > On Tue, 2005-10-25 at 13:31 -0500, Erik Jacobson wrote: > > > There is still discussion whether we want to use RCU at all as some of > > > the callouts might want to block(which are currently being called > > > > Right. For what I need, there are some semaphores that have to be held > > and memory that needs to be allocated... So I need to be able to sleep > > in order to use this for my needs anyway... > > Did you see my variant that used RCU on the global list but did not > require locks for the per-task list? It included some documentation. Hmm.. I thought your "Process Event Connector" also needed the ability to block in the callout routine. Are you sure connector/netlink doesn't need to block. Apart from that, as i already mentioned it is better to stack up the per-task notifier on top of the simple notifier, instead of crowding the task notifier. chandra > > Cheers, > -Matt Helsley > > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: <kin...@au...> - 2005-10-28 06:25:09
|
On Tue, Oct 25, 2005 at 02:04:53PM -0700, Matt Helsley wrote: > On Tue, 2005-10-25 at 13:31 -0500, Erik Jacobson wrote: > > > There is still discussion whether we want to use RCU at all as some of > > > the callouts might want to block(which are currently being called > > > > Right. For what I need, there are some semaphores that have to be held > > and memory that needs to be allocated... So I need to be able to sleep > > in order to use this for my needs anyway... > > Did you see my variant that used RCU on the global list but did not > require locks for the per-task list? It included some documentation. Matt, Are you saying that if our callouts need to block we should be using the per-task list callouts? -- Kingsley |
From: Matt H. <mat...@us...> - 2005-10-28 10:31:12
|
On Fri, 2005-10-28 at 16:15 +1000, kin...@au... wrote: > On Tue, Oct 25, 2005 at 02:04:53PM -0700, Matt Helsley wrote: > > On Tue, 2005-10-25 at 13:31 -0500, Erik Jacobson wrote: > > > > There is still discussion whether we want to use RCU at all as some of > > > > the callouts might want to block(which are currently being called > > > > > > Right. For what I need, there are some semaphores that have to be held > > > and memory that needs to be allocated... So I need to be able to sleep > > > in order to use this for my needs anyway... > > > > Did you see my variant that used RCU on the global list but did not > > require locks for the per-task list? It included some documentation. > > Matt, > > Are you saying that if our callouts need to block we should be using > the per-task list callouts? Yes. Though callouts that might otherwise need to block could perhaps utilize a work queue to delay processing. This is a definite limitation of the patch I posted. A rwsem might be more appropriate than RCU in this case. For reference, Paul McKenney posted some interesting alternatives and this LKML thread entitled "Notifier chains are unsafe": http://lkml.org/lkml/2005/10/24/165 discusses many of the same issues/solutions. Cheers, -Matt Helsley |
From: Erik J. <er...@sg...> - 2005-11-10 00:53:33
|
Here is a new version of the patch. This now builds and at least the task based notifiers are now working. A reminder, the "watch all tasks" functionality is written in such a way that users cannot sleep. This is because it makes use of RCU. An example of something you cannot do, then, would be any kmalloc/kmem_cache_alloc/etc with GFP_KERNEL, which can sleep. In my opinion, this limits you too much. However, there has been pushback on using rwsem locks for protection in that area and I mostly don't need the global notifier anyway, I need the per-task ones. So I won't argue on this any longer :) I made several adjustments to make Matt's version of the patch come to life. Matt and I kicked around some ideas in email too, some of which are implemented here. It seems like the task_watcher list never shrinks. It just has parts of the list pointed at NULL when they aren't in use. Some changes from Matt's version of the patch: When task_notify_event is called to notify task_notifiers of a fork, task_notify_event was looking at p's (in-construction-child) watcher list. That isn't interesting since p isn't running and it would always have an empty watcher list. I changed task_notify_event to search the parent's watcher list in the fork case. It also provides the parent's data for the function call. It might be good to add this to the README if it's kept. Matt had dome suggestions to clean up what I had done, which I put in to place too. I need a way to get to the notifier block task-associated data outside of the callout context. In my Job example, this would happen when you want to know the Job ID of a given process. One could argue that pnotify/PAGG layered on top could handle this. However, I'm trying to avoid using pnotify/PAGG altogether. I added a task_notify_get_data function that returns the data pointer for the function pointer and task matching the request. However, there are lingering concerns about safety with this function that I'd like to see discussed. In other words, this function isn't done yet. For my Job example, there needed to be a way for an arbitrary process to add to the watcher list of another process. Because of pushback, I've dumbed down Job (removing much of it's functionality) to make use of this patch here. To bring my Job patch functionality back to 100%, I'd need locks for the per-task watcher lists. I need to decide if this reduced version of Job will be sufficient. Indeed, it may be necessary to have some sort of locking mechanisms for task_notify_get_data anyway. I have a version of the Job patch (let me know if you're interested) working with this. I didn't run regression tests against because many of the tests rely on features that are missing now :) Basic functionality does seem to work. The job patch exercises per-task watchers and the FORK and EXIT events. PS: See the FIXMEs for questions and areas where feedback would be nice. Index: linux/fs/exec.c =================================================================== --- linux.orig/fs/exec.c 2005-11-09 14:27:31.260288424 -0600 +++ linux/fs/exec.c 2005-11-09 14:27:38.637486413 -0600 @@ -48,6 +48,7 @@ #include <linux/syscalls.h> #include <linux/rmap.h> #include <linux/acct.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/mmu_context.h> @@ -1100,6 +1101,7 @@ fput(bprm->file); bprm->file = NULL; current->did_exec = 1; + task_notify_event(TASK_NOTIFY_EXEC, current); return retval; } read_lock(&binfmt_lock); Index: linux/include/linux/task_notify.h =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux/include/linux/task_notify.h 2005-11-09 18:47:39.741702629 -0600 @@ -0,0 +1,58 @@ +/* task_notify.h - Task event notification + * + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. + * (C) Chandra Seetharaman, IBM Corp. 2005 + * (C) Matt Helsley, IBM Corp. 2005 + * + * Provides the interface and macros for managing in-kernel task + * notification. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * 10/7/2005 Jack Steiner <st...@sg...> Original task_notifiers + * proposal. + * + * 11/9/2005 Erik Jacobson <er...@sg...> add task_notify_get_data. + * Changes to get it building and working. + * + */ + + +#ifndef _LINUX_TASK_NOTIFY_H +#define _LINUX_TASK_NOTIFY_H + +typedef __bitwise unsigned int task_notify_event_bitwise; +#define TASK_NOTIFY_FORK ((__force task_notify_event_bitwise)0x00000001) +#define TASK_NOTIFY_EXEC ((__force task_notify_event_bitwise)0x00000002) +#define TASK_NOTIFY_EXIT ((__force task_notify_event_bitwise)0x00000004) +#define TASK_NOTIFY_DESTROY ((__force task_notify_event_bitwise)0x00000008) +#define TASK_NOTIFY_UID ((__force task_notify_event_bitwise)0x00000010) +#define TASK_NOTIFY_GID ((__force task_notify_event_bitwise)0x00000020) +#define TASK_NOTIFY_ALL_EVENTS ((__force task_notify_event_bitwise)0x0000003f) + +typedef __bitwise unsigned int task_notify_flag; +#define TASK_NOTIFY_DONT_SLEEP ((__force task_notify_flag)0x00000001) + +#ifdef CONFIG_TASK_NOTIFY +typedef void (task_notify_func)(unsigned int events, unsigned int flags, + struct task_struct * task, void *watcher_data); + +extern int task_notify_watch_all (task_notify_func *, void*, unsigned int); +extern int task_notify_watch_task (task_notify_func *, void*, unsigned int, + struct task_struct *); +extern void task_notify_ignore_all (task_notify_func *); +extern void task_notify_ignore_task (task_notify_func *, struct task_struct *); +extern void task_notify_event(unsigned int, struct task_struct *); +extern void * task_notify_get_data(task_notify_func *, struct task_struct *); + + +#else /* !CONFIG_TASK_NOTIFY */ + +static inline void task_notify_event(unsigned int i, struct task_struct *t) {} + +#endif /* CONFIG_TASK_NOTIFY */ + +#endif /* _LINUX_TASK_NOTIFY_H */ Index: linux/init/Kconfig =================================================================== --- linux.orig/init/Kconfig 2005-11-09 14:27:31.262241355 -0600 +++ linux/init/Kconfig 2005-11-09 18:47:20.132323251 -0600 @@ -162,6 +162,16 @@ for processing it. A preliminary version of these tools is available at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. +config TASK_NOTIFY + bool "Task Notification" + depends on EXPERIMENTAL + help + Enabling this provides a in kernel notification of task events. + Modules can register to be notified for specific task events and + get notified when any task goes thru that event. + + Say N if you are unsure. + config SYSCTL bool "Sysctl support" ---help--- Index: linux/kernel/task_notify.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux/kernel/task_notify.c 2005-11-09 18:47:28.563126040 -0600 @@ -0,0 +1,245 @@ +/* task_notify.c - in-kernel task event notification + * + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. + * (C) Chandra Seetharaman, IBM Corp. 2005 + * (C) Matt Helsley, IBM Corp. 2005 + * + * Provides API for task event registration and notification. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * 10/7/2005 Jack Steiner <st...@sg...> Original task_notifiers + * proposal. + * + * 11/9/2005 Erik Jacobson <er...@sg...> add task_notify_get_data to + * retrieve task-associated data. Changed task_notify_event + * to better handle fork events. Several changes and bug fixes to + * get it building and working. + * + */ + +#include <linux/config.h> +#include <linux/module.h> +#include <linux/list.h> +#include <linux/rcupdate.h> +#include <linux/task_notify.h> + +static LIST_HEAD(notify_list); + +struct task_notifier_block { + unsigned int event_mask; /* set of events to pass to func */ + struct list_head list; + union { + struct rcu_head rcu; + void *private_data; /* when not freeing via rcu -- bad idea? */ + }; + task_notify_func *func; +}; + +static kmem_cache_t *nb_cache = NULL; + +static int __init task_notify_init(void) +{ + /* Info on SLAB_DESTROY_BY_RCU : + * http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/1108.html */ + nb_cache = kmem_cache_create("task_notify", + sizeof(struct task_notifier_block), 0, + SLAB_NO_REAP | SLAB_DESTROY_BY_RCU, NULL, + NULL); + if(!nb_cache) + return -1; + else + return 0; + +} +core_initcall(task_notify_init); + +/* FIXME - commented out as it leads to failing to build + * with 'symbol x: discarded in section...' error + */ +//void __exit_call task_notify_exit (void) +//{ +// kmem_cache_destroy(nb_cache); +// nb_cache = NULL; +//} + +/* internal use only */ +#define TASK_NOTIFY_ALL_FLAGS ((__force task_notify_flag_t)0x0001) + +static inline struct task_notifier_block* +get_notifier_block (task_notify_func *func, void* private_data, + unsigned int event_mask, task_notify_flag flag) +{ + struct task_notifier_block *nb; + + nb = kmem_cache_alloc(nb_cache, GFP_KERNEL); + if (nb == NULL) + return NULL; + nb->private_data = private_data; + nb->func = func; + nb->event_mask = event_mask; + + return nb; +} + +static inline void put_notifier_block (struct task_notifier_block *nb) +{ + kmem_cache_free(nb_cache, nb); +} + +/******************************************************************* + * Event callback invocation + *******************************************************************/ + +int +task_notify_watch_all (task_notify_func *func, void* func_data, + unsigned int event_mask) +{ + struct task_notifier_block *nb; + + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) + return -EINVAL; + + nb = get_notifier_block(func, func_data, + event_mask, TASK_NOTIFY_DONT_SLEEP); + if (nb == NULL) + return -ENOMEM; + list_add_rcu(&nb->list, ¬ify_list); + return 0; +} + +int +task_notify_watch_task (task_notify_func *func, void* func_data, + unsigned int event_mask, struct task_struct *task) +{ + struct task_notifier_block *nb; + + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) + return -EINVAL; + + + nb = get_notifier_block(func, func_data, event_mask, 0); + if (nb == NULL) + return -ENOMEM; + + list_add(&nb->list, &task->task_watchers); + + return 0; +} + +static void +task_notify_free (struct rcu_head *head) +{ + struct task_notifier_block *nb = + container_of(head, struct task_notifier_block, rcu); + put_notifier_block(nb); + return; +} + +void +task_notify_ignore_all (task_notify_func *func) +{ + struct task_notifier_block *nb; + void *private_data; + + /* Do not use the _rcu iterator here */ + list_for_each_entry(nb, ¬ify_list, list) { + if (nb->func != func) + continue; + list_del_rcu(&nb->list); + private_data = nb->private_data; + call_rcu(&nb->rcu, task_notify_free); + + /* Must we restrict from sleeping? */ + nb->func(0, TASK_NOTIFY_DONT_SLEEP, NULL, private_data); + return; + } + return; +} + +void +task_notify_ignore_task (task_notify_func *func, struct task_struct *task) +{ + struct task_notifier_block *nb; + + list_for_each_entry(nb, &task->task_watchers, list) { + if (nb->func != func) + continue; + list_del(&nb->list); + nb->func(0, 0, task, nb->private_data); + put_notifier_block(nb); + return; + } + return; +} + +/* FIXME - possible issues: + * What if osmething disappears from the list? However, as written, we + * seem to not delete from the watcher list but rather set indiviudal + * entries to NULL when they are ignored. So maybe this isn't a worry. + * + * For the data pointer return, it could disappear on us and there is no + * locking here. What to do.... + */ +void * +task_notify_get_data(task_notify_func *func, struct task_struct *task) +{ + struct task_notifier_block *nb; + + list_for_each_entry(nb, &task->task_watchers, list) { + if (nb->func == func) + return(nb->private_data); + } + return NULL; +} + +EXPORT_SYMBOL_GPL(task_notify_watch_all); +EXPORT_SYMBOL_GPL(task_notify_watch_task); +EXPORT_SYMBOL_GPL(task_notify_ignore_all); +EXPORT_SYMBOL_GPL(task_notify_ignore_task); +EXPORT_SYMBOL_GPL(task_notify_get_data); + +void +task_notify_event(unsigned int event, struct task_struct *task) +{ + struct task_notifier_block *nb; + struct task_notifier_block *nbn; + struct list_head *per_task_watcher_list; + + /* + * Call the per-task watchers -- use safe variant in case + * watchers want to remove themselves + */ + + /* if the event is fork, we need to look at the parent's watchers + * since the child isn't even running yet. + * + * FIXME: This special case for FORK needs to be in the README! + * For the case of fork, we return the in-construction-child + * but look at the parent's watcher list and return the parent's + * data. An in-construction-child couldn't have a watcher list + * or data yet. + */ + if (event == TASK_NOTIFY_FORK) { + per_task_watcher_list = &task->parent->task_watchers; + } + else { + per_task_watcher_list = &task->task_watchers; + } + + list_for_each_entry_safe(nb, nbn, per_task_watcher_list, list) + if ((nb->event_mask & event) == event) + nb->func(event, 0, task, nb->private_data); + + /* Call the watchers interested in all tasks */ + rcu_read_lock(); + list_for_each_entry_rcu(nb, ¬ify_list, list) + if ((nb->event_mask & event) == event) + nb->func(event, TASK_NOTIFY_DONT_SLEEP, task, + nb->private_data); + rcu_read_unlock(); + return; +} Index: linux/kernel/exit.c =================================================================== --- linux.orig/kernel/exit.c 2005-11-09 14:27:31.263217821 -0600 +++ linux/kernel/exit.c 2005-11-09 14:27:38.646274604 -0600 @@ -31,6 +31,7 @@ #include <linux/cpuset.h> #include <linux/syscalls.h> #include <linux/signal.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/unistd.h> @@ -866,6 +867,7 @@ module_put(tsk->binfmt->module); tsk->exit_code = code; + task_notify_event(TASK_NOTIFY_EXIT, tsk); exit_notify(tsk); #ifdef CONFIG_NUMA mpol_free(tsk->mempolicy); Index: linux/kernel/fork.c =================================================================== --- linux.orig/kernel/fork.c 2005-11-09 14:27:31.263217821 -0600 +++ linux/kernel/fork.c 2005-11-09 18:47:20.134276182 -0600 @@ -42,6 +42,7 @@ #include <linux/profile.h> #include <linux/rmap.h> #include <linux/acct.h> +#include <linux/task_notify.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -1038,6 +1039,8 @@ p->pdeath_signal = 0; p->exit_state = 0; + INIT_LIST_HEAD(&p->task_watchers); + /* * Ok, make it visible to the rest of the system. * We dont wake it up yet. @@ -1146,6 +1149,8 @@ __get_cpu_var(process_counts)++; } + task_notify_event(TASK_NOTIFY_FORK, p); + if (!current->signal->tty && p->signal->tty) p->signal->tty = NULL; Index: linux/kernel/Makefile =================================================================== --- linux.orig/kernel/Makefile 2005-11-09 14:27:31.264194287 -0600 +++ linux/kernel/Makefile 2005-11-09 18:47:20.133299716 -0600 @@ -32,6 +32,7 @@ obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SECCOMP) += seccomp.o +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) # According to Alan Modra <al...@li...>, the -fno-omit-frame-pointer is Index: linux/kernel/sys.c =================================================================== --- linux.orig/kernel/sys.c 2005-11-09 14:27:31.264194287 -0600 +++ linux/kernel/sys.c 2005-11-09 14:27:38.649204000 -0600 @@ -31,6 +31,7 @@ #include <linux/compat.h> #include <linux/syscalls.h> +#include <linux/task_notify.h> #include <asm/uaccess.h> #include <asm/io.h> @@ -583,6 +584,7 @@ current->egid = new_egid; current->gid = new_rgid; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } @@ -622,6 +624,7 @@ return -EPERM; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } @@ -712,6 +715,8 @@ key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); } @@ -759,6 +764,8 @@ key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); } @@ -807,6 +814,8 @@ key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); + return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); } @@ -858,6 +867,7 @@ current->sgid = sgid; key_fsgid_changed(current); + task_notify_event(TASK_NOTIFY_GID, current); return 0; } @@ -900,6 +910,7 @@ } key_fsuid_changed(current); + task_notify_event(TASK_NOTIFY_UID, current); security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); @@ -929,6 +940,7 @@ current->fsgid = gid; key_fsgid_changed(current); } + task_notify_event(TASK_NOTIFY_GID, current); return old_fsgid; } Index: linux/include/linux/init_task.h =================================================================== --- linux.orig/include/linux/init_task.h 2005-11-09 14:27:31.262241355 -0600 +++ linux/include/linux/init_task.h 2005-11-09 14:27:38.650180466 -0600 @@ -122,6 +122,7 @@ .journal_info = NULL, \ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ .fs_excl = ATOMIC_INIT(0), \ + .task_watchers = LIST_HEAD_INIT(tsk.task_watchers), \ } Index: linux/include/linux/sched.h =================================================================== --- linux.orig/include/linux/sched.h 2005-11-09 14:27:31.262241355 -0600 +++ linux/include/linux/sched.h 2005-11-09 14:27:38.651156932 -0600 @@ -36,6 +36,7 @@ #include <linux/seccomp.h> #include <linux/auxvec.h> /* For AT_VECTOR_SIZE */ +#include <linux/task_notify.h> struct exec_domain; @@ -801,6 +802,9 @@ int cpuset_mems_generation; #endif atomic_t fs_excl; /* holding fs exclusive resources */ +#ifdef CONFIG_TASK_NOTIFY + struct list_head task_watchers; +#endif }; static inline pid_t process_group(struct task_struct *tsk) Index: linux/Documentation/task_notify.txt =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ linux/Documentation/task_notify.txt 2005-11-09 14:27:38.652133397 -0600 @@ -0,0 +1,48 @@ +task_notify is a means for a kernel system or module to attach per-task data +and/or get called at certain important events in the lifetime of a task. + +Events include: + + fork + exec + exit + uid/gid change + per-task data going away (more on this below) + +A module interested in these events must register itself with one of two +task_notify_watch functions by specifying the set of interested events, a +function to be called during the event, and a data pointer to be passed to +the function. While functions may be registered multiple times callers should +be aware that the first registration of the function is removed first. + +Registering with task_notify_watch_all() allows the caller to watch the +specified set of events for all tasks. Note that in this case the data passed +to the function isn't associated with a task. Furthermore, functions registered +for all tasks must not sleep when the flag TASK_NOTIFY_DONT_SLEEP is specified. + +If a caller is interested only in a single task or a small number of tasks, it +may choose to utilize the more effecient task_notify_watch_task() function. The +data passed can be specific to the task, or it can be shared. When this per-task +data pointer is no longer in use the function will be called with no events. + +When an event occurs the specified function and is called with several +parameters: + +events - the set of events that occurred + +flags - special flags indicating the context of the call: + TASK_NOTIFY_DONT_SLEEP tells the function it must not + sleep. This only will only happen if a function is registered + for all tasks and it gets called. + +data - the data pointer passed in at registration time + +If a data pointer is provided, one last invocation of the function will be made +with no events. This indicates that the reference to the data is being dropped. +As with other invocations, functions registered for all tasks must still honor +the TASK_NOTIFY_DONT_SLEEP flag. + +When a caller is no longer interested in being called during a task event it +must use a corresponding task_notify_ignore call by specifying the function +which should ignore events. The first registration of the specified function +will be removed. |
From: Paul E. M. <pa...@us...> - 2005-11-10 02:41:20
|
On Wed, Nov 09, 2005 at 06:53:19PM -0600, Erik Jacobson wrote: > Here is a new version of the patch. This now builds and at least the > task based notifiers are now working. > > A reminder, the "watch all tasks" functionality is written in such a way that > users cannot sleep. This is because it makes use of RCU. An example of > something you cannot do, then, would be any kmalloc/kmem_cache_alloc/etc with > GFP_KERNEL, which can sleep. There was some work on RCU implementations that could tolerate blocking in RCU read-side critical sections a few years ago (e.g., Dipankar's rcu-preempt), but these all had problems with indefinite-duration grace periods. Although I haven't given up on this problem, I don't expect anything useful any time soon. :-( In any case, seems like it would have have a large read-to-update ratio, which would be appropriate for RCU. More comments interspersed, search for empty lines. Thanx, Paul > In my opinion, this limits you too much. However, there has been > pushback on using rwsem locks for protection in that area and I mostly > don't need the global notifier anyway, I need the per-task ones. So > I won't argue on this any longer :) > > I made several adjustments to make Matt's version of the patch come to life. > Matt and I kicked around some ideas in email too, some of which are > implemented here. > > It seems like the task_watcher list never shrinks. It just has parts of > the list pointed at NULL when they aren't in use. > > Some changes from Matt's version of the patch: > > When task_notify_event is called to notify task_notifiers of a fork, > task_notify_event was looking at p's (in-construction-child) watcher list. > That isn't interesting since p isn't running and it would always have an empty > watcher list. I changed task_notify_event to search the parent's watcher > list in the fork case. It also provides the parent's data for the function > call. It might be good to add this to the README if it's kept. > Matt had dome suggestions to clean up what I had done, which I put in to > place too. > > I need a way to get to the notifier block task-associated data outside > of the callout context. In my Job example, this would happen when you want > to know the Job ID of a given process. One could argue that pnotify/PAGG > layered on top could handle this. However, I'm trying to avoid using > pnotify/PAGG altogether. I added a task_notify_get_data function that > returns the data pointer for the function pointer and task matching the > request. However, there are lingering concerns about safety with this > function that I'd like to see discussed. In other words, this function > isn't done yet. > > For my Job example, there needed to be a way for an arbitrary process to > add to the watcher list of another process. Because of pushback, I've > dumbed down Job (removing much of it's functionality) to make use of > this patch here. To bring my Job patch functionality back to 100%, I'd need > locks for the per-task watcher lists. I need to decide if this reduced > version of Job will be sufficient. Indeed, it may be necessary to > have some sort of locking mechanisms for task_notify_get_data anyway. > > I have a version of the Job patch (let me know if you're interested) working > with this. I didn't run regression tests against because many of the tests > rely on features that are missing now :) Basic functionality does seem > to work. The job patch exercises per-task watchers and the FORK and > EXIT events. > > PS: See the FIXMEs for questions and areas where feedback would be nice. > > > Index: linux/fs/exec.c > =================================================================== > --- linux.orig/fs/exec.c 2005-11-09 14:27:31.260288424 -0600 > +++ linux/fs/exec.c 2005-11-09 14:27:38.637486413 -0600 > @@ -48,6 +48,7 @@ > #include <linux/syscalls.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/mmu_context.h> > @@ -1100,6 +1101,7 @@ > fput(bprm->file); > bprm->file = NULL; > current->did_exec = 1; > + task_notify_event(TASK_NOTIFY_EXEC, current); > return retval; > } > read_lock(&binfmt_lock); > Index: linux/include/linux/task_notify.h > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux/include/linux/task_notify.h 2005-11-09 18:47:39.741702629 -0600 > @@ -0,0 +1,58 @@ > +/* task_notify.h - Task event notification > + * > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > + * (C) Chandra Seetharaman, IBM Corp. 2005 > + * (C) Matt Helsley, IBM Corp. 2005 > + * > + * Provides the interface and macros for managing in-kernel task > + * notification. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * 10/7/2005 Jack Steiner <st...@sg...> Original task_notifiers > + * proposal. > + * > + * 11/9/2005 Erik Jacobson <er...@sg...> add task_notify_get_data. > + * Changes to get it building and working. > + * > + */ > + > + > +#ifndef _LINUX_TASK_NOTIFY_H > +#define _LINUX_TASK_NOTIFY_H > + > +typedef __bitwise unsigned int task_notify_event_bitwise; > +#define TASK_NOTIFY_FORK ((__force task_notify_event_bitwise)0x00000001) > +#define TASK_NOTIFY_EXEC ((__force task_notify_event_bitwise)0x00000002) > +#define TASK_NOTIFY_EXIT ((__force task_notify_event_bitwise)0x00000004) > +#define TASK_NOTIFY_DESTROY ((__force task_notify_event_bitwise)0x00000008) > +#define TASK_NOTIFY_UID ((__force task_notify_event_bitwise)0x00000010) > +#define TASK_NOTIFY_GID ((__force task_notify_event_bitwise)0x00000020) > +#define TASK_NOTIFY_ALL_EVENTS ((__force task_notify_event_bitwise)0x0000003f) > + > +typedef __bitwise unsigned int task_notify_flag; > +#define TASK_NOTIFY_DONT_SLEEP ((__force task_notify_flag)0x00000001) > + > +#ifdef CONFIG_TASK_NOTIFY > +typedef void (task_notify_func)(unsigned int events, unsigned int flags, > + struct task_struct * task, void *watcher_data); > + > +extern int task_notify_watch_all (task_notify_func *, void*, unsigned int); > +extern int task_notify_watch_task (task_notify_func *, void*, unsigned int, > + struct task_struct *); > +extern void task_notify_ignore_all (task_notify_func *); > +extern void task_notify_ignore_task (task_notify_func *, struct task_struct *); > +extern void task_notify_event(unsigned int, struct task_struct *); > +extern void * task_notify_get_data(task_notify_func *, struct task_struct *); > + > + > +#else /* !CONFIG_TASK_NOTIFY */ > + > +static inline void task_notify_event(unsigned int i, struct task_struct *t) {} > + > +#endif /* CONFIG_TASK_NOTIFY */ > + > +#endif /* _LINUX_TASK_NOTIFY_H */ > Index: linux/init/Kconfig > =================================================================== > --- linux.orig/init/Kconfig 2005-11-09 14:27:31.262241355 -0600 > +++ linux/init/Kconfig 2005-11-09 18:47:20.132323251 -0600 > @@ -162,6 +162,16 @@ > for processing it. A preliminary version of these tools is available > at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. > > +config TASK_NOTIFY > + bool "Task Notification" > + depends on EXPERIMENTAL > + help > + Enabling this provides a in kernel notification of task events. > + Modules can register to be notified for specific task events and > + get notified when any task goes thru that event. > + > + Say N if you are unsure. > + > config SYSCTL > bool "Sysctl support" > ---help--- > Index: linux/kernel/task_notify.c > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux/kernel/task_notify.c 2005-11-09 18:47:28.563126040 -0600 > @@ -0,0 +1,245 @@ > +/* task_notify.c - in-kernel task event notification > + * > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > + * (C) Chandra Seetharaman, IBM Corp. 2005 > + * (C) Matt Helsley, IBM Corp. 2005 > + * > + * Provides API for task event registration and notification. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + * 10/7/2005 Jack Steiner <st...@sg...> Original task_notifiers > + * proposal. > + * > + * 11/9/2005 Erik Jacobson <er...@sg...> add task_notify_get_data to > + * retrieve task-associated data. Changed task_notify_event > + * to better handle fork events. Several changes and bug fixes to > + * get it building and working. > + * > + */ > + > +#include <linux/config.h> > +#include <linux/module.h> > +#include <linux/list.h> > +#include <linux/rcupdate.h> > +#include <linux/task_notify.h> > + > +static LIST_HEAD(notify_list); > + > +struct task_notifier_block { > + unsigned int event_mask; /* set of events to pass to func */ > + struct list_head list; > + union { > + struct rcu_head rcu; > + void *private_data; /* when not freeing via rcu -- bad idea? */ > + }; > + task_notify_func *func; > +}; > + > +static kmem_cache_t *nb_cache = NULL; > + > +static int __init task_notify_init(void) > +{ > + /* Info on SLAB_DESTROY_BY_RCU : > + * http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/1108.html */ > + nb_cache = kmem_cache_create("task_notify", > + sizeof(struct task_notifier_block), 0, > + SLAB_NO_REAP | SLAB_DESTROY_BY_RCU, NULL, > + NULL); > + if(!nb_cache) > + return -1; > + else > + return 0; > + > +} > +core_initcall(task_notify_init); > + > +/* FIXME - commented out as it leads to failing to build > + * with 'symbol x: discarded in section...' error > + */ > +//void __exit_call task_notify_exit (void) > +//{ > +// kmem_cache_destroy(nb_cache); > +// nb_cache = NULL; > +//} > + > +/* internal use only */ > +#define TASK_NOTIFY_ALL_FLAGS ((__force task_notify_flag_t)0x0001) > + > +static inline struct task_notifier_block* > +get_notifier_block (task_notify_func *func, void* private_data, > + unsigned int event_mask, task_notify_flag flag) > +{ > + struct task_notifier_block *nb; > + > + nb = kmem_cache_alloc(nb_cache, GFP_KERNEL); > + if (nb == NULL) > + return NULL; > + nb->private_data = private_data; > + nb->func = func; > + nb->event_mask = event_mask; > + > + return nb; > +} > + > +static inline void put_notifier_block (struct task_notifier_block *nb) > +{ > + kmem_cache_free(nb_cache, nb); > +} > + > +/******************************************************************* > + * Event callback invocation > + *******************************************************************/ > + > +int > +task_notify_watch_all (task_notify_func *func, void* func_data, > + unsigned int event_mask) > +{ > + struct task_notifier_block *nb; > + > + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) > + return -EINVAL; > + > + nb = get_notifier_block(func, func_data, > + event_mask, TASK_NOTIFY_DONT_SLEEP); > + if (nb == NULL) > + return -ENOMEM; > + list_add_rcu(&nb->list, ¬ify_list); > + return 0; > +} How are concurrent task_notify_watch_all() invocations avoided? Seems like this could scramble notify_list. Note that list_add_rcu() protects RCU readers from the update -- it does -not- protect against other list_add_rcu() invocations. The usual approach is to have an update-side lock. If there is some sort of synchronization for notify_list, I am missing it completely. > + > +int > +task_notify_watch_task (task_notify_func *func, void* func_data, > + unsigned int event_mask, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) > + return -EINVAL; > + > + > + nb = get_notifier_block(func, func_data, event_mask, 0); > + if (nb == NULL) > + return -ENOMEM; > + > + list_add(&nb->list, &task->task_watchers); > + > + return 0; > +} Same here -- how are concurrent task_notify_watch_task() invocations avoided? A per-task lock would be one way to protect this. Some of the callers to task_notify_event() hold tasklist_lock, but it would be good to avoid more tasklist_lock... Also, I don't see any synchronization between task_notify_watch_task() and task_notify_event(). Or is it only legal to invoke task_notify_watch_task() from the task to be watched? If so, why can't task_notify_watch_task() drop the "task" argument and just use "current" instead? > + > +static void > +task_notify_free (struct rcu_head *head) > +{ > + struct task_notifier_block *nb = > + container_of(head, struct task_notifier_block, rcu); > + put_notifier_block(nb); > + return; > +} > + > +void > +task_notify_ignore_all (task_notify_func *func) > +{ > + struct task_notifier_block *nb; > + void *private_data; > + > + /* Do not use the _rcu iterator here */ > + list_for_each_entry(nb, ¬ify_list, list) { > + if (nb->func != func) > + continue; > + list_del_rcu(&nb->list); > + private_data = nb->private_data; > + call_rcu(&nb->rcu, task_notify_free); > + > + /* Must we restrict from sleeping? */ > + nb->func(0, TASK_NOTIFY_DONT_SLEEP, NULL, private_data); > + return; > + } > + return; > +} And same concern for concurrent calls to task_notify_ignore_all(). > + > +void > +task_notify_ignore_task (task_notify_func *func, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + list_for_each_entry(nb, &task->task_watchers, list) { > + if (nb->func != func) > + continue; > + list_del(&nb->list); > + nb->func(0, 0, task, nb->private_data); > + put_notifier_block(nb); > + return; > + } > + return; > +} And also for task_notify_ignore_task(). > + > +/* FIXME - possible issues: > + * What if osmething disappears from the list? However, as written, we > + * seem to not delete from the watcher list but rather set indiviudal > + * entries to NULL when they are ignored. So maybe this isn't a worry. > + * > + * For the data pointer return, it could disappear on us and there is no > + * locking here. What to do.... > + */ > +void * > +task_notify_get_data(task_notify_func *func, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + list_for_each_entry(nb, &task->task_watchers, list) { > + if (nb->func == func) > + return(nb->private_data); > + } > + return NULL; > +} > + > +EXPORT_SYMBOL_GPL(task_notify_watch_all); > +EXPORT_SYMBOL_GPL(task_notify_watch_task); > +EXPORT_SYMBOL_GPL(task_notify_ignore_all); > +EXPORT_SYMBOL_GPL(task_notify_ignore_task); > +EXPORT_SYMBOL_GPL(task_notify_get_data); > + > +void > +task_notify_event(unsigned int event, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + struct task_notifier_block *nbn; > + struct list_head *per_task_watcher_list; > + > + /* > + * Call the per-task watchers -- use safe variant in case > + * watchers want to remove themselves > + */ > + > + /* if the event is fork, we need to look at the parent's watchers > + * since the child isn't even running yet. > + * > + * FIXME: This special case for FORK needs to be in the README! > + * For the case of fork, we return the in-construction-child > + * but look at the parent's watcher list and return the parent's > + * data. An in-construction-child couldn't have a watcher list > + * or data yet. > + */ > + if (event == TASK_NOTIFY_FORK) { > + per_task_watcher_list = &task->parent->task_watchers; > + } > + else { > + per_task_watcher_list = &task->task_watchers; > + } > + > + list_for_each_entry_safe(nb, nbn, per_task_watcher_list, list) > + if ((nb->event_mask & event) == event) > + nb->func(event, 0, task, nb->private_data); The other end of the task_notify_watch_task() reader-updater concurrency. Again, I can't see how this is safe unless it is illegal to invoke task_notify_watch_task() on other than "current". Seems to me that you would need the rcu_read_lock() below to also cover the preceding loop and to use list_for_each_entry_rcu() instead of the list_for_each_entry_safe(). Please enlighten me if I am missing something subtle. Or obvious, for that matter. ;-) > + > + /* Call the watchers interested in all tasks */ > + rcu_read_lock(); > + list_for_each_entry_rcu(nb, ¬ify_list, list) > + if ((nb->event_mask & event) == event) > + nb->func(event, TASK_NOTIFY_DONT_SLEEP, task, > + nb->private_data); > + rcu_read_unlock(); > + return; > +} > Index: linux/kernel/exit.c > =================================================================== > --- linux.orig/kernel/exit.c 2005-11-09 14:27:31.263217821 -0600 > +++ linux/kernel/exit.c 2005-11-09 14:27:38.646274604 -0600 > @@ -31,6 +31,7 @@ > #include <linux/cpuset.h> > #include <linux/syscalls.h> > #include <linux/signal.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/unistd.h> > @@ -866,6 +867,7 @@ > module_put(tsk->binfmt->module); > > tsk->exit_code = code; > + task_notify_event(TASK_NOTIFY_EXIT, tsk); > exit_notify(tsk); > #ifdef CONFIG_NUMA > mpol_free(tsk->mempolicy); > Index: linux/kernel/fork.c > =================================================================== > --- linux.orig/kernel/fork.c 2005-11-09 14:27:31.263217821 -0600 > +++ linux/kernel/fork.c 2005-11-09 18:47:20.134276182 -0600 > @@ -42,6 +42,7 @@ > #include <linux/profile.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/pgtable.h> > #include <asm/pgalloc.h> > @@ -1038,6 +1039,8 @@ > p->pdeath_signal = 0; > p->exit_state = 0; > > + INIT_LIST_HEAD(&p->task_watchers); > + > /* > * Ok, make it visible to the rest of the system. > * We dont wake it up yet. > @@ -1146,6 +1149,8 @@ > __get_cpu_var(process_counts)++; > } > > + task_notify_event(TASK_NOTIFY_FORK, p); > + > if (!current->signal->tty && p->signal->tty) > p->signal->tty = NULL; > > Index: linux/kernel/Makefile > =================================================================== > --- linux.orig/kernel/Makefile 2005-11-09 14:27:31.264194287 -0600 > +++ linux/kernel/Makefile 2005-11-09 18:47:20.133299716 -0600 > @@ -32,6 +32,7 @@ > obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ > obj-$(CONFIG_CRASH_DUMP) += crash_dump.o > obj-$(CONFIG_SECCOMP) += seccomp.o > +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o > > ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) > # According to Alan Modra <al...@li...>, the -fno-omit-frame-pointer is > Index: linux/kernel/sys.c > =================================================================== > --- linux.orig/kernel/sys.c 2005-11-09 14:27:31.264194287 -0600 > +++ linux/kernel/sys.c 2005-11-09 14:27:38.649204000 -0600 > @@ -31,6 +31,7 @@ > > #include <linux/compat.h> > #include <linux/syscalls.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/io.h> > @@ -583,6 +584,7 @@ > current->egid = new_egid; > current->gid = new_rgid; > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -622,6 +624,7 @@ > return -EPERM; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -712,6 +715,8 @@ > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); > } > > @@ -759,6 +764,8 @@ > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); > } > > @@ -807,6 +814,8 @@ > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); > } > > @@ -858,6 +867,7 @@ > current->sgid = sgid; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > @@ -900,6 +910,7 @@ > } > > key_fsuid_changed(current); > + task_notify_event(TASK_NOTIFY_UID, current); > > security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); > > @@ -929,6 +940,7 @@ > current->fsgid = gid; > key_fsgid_changed(current); > } > + task_notify_event(TASK_NOTIFY_GID, current); > return old_fsgid; > } > > Index: linux/include/linux/init_task.h > =================================================================== > --- linux.orig/include/linux/init_task.h 2005-11-09 14:27:31.262241355 -0600 > +++ linux/include/linux/init_task.h 2005-11-09 14:27:38.650180466 -0600 > @@ -122,6 +122,7 @@ > .journal_info = NULL, \ > .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ > .fs_excl = ATOMIC_INIT(0), \ > + .task_watchers = LIST_HEAD_INIT(tsk.task_watchers), \ Doesn't the above break if you are running without CONFIG_TASK_NOTIFY? If so, one way to fix would be to define a macro that expands either to the above line if CONFIG_TASK_NOTIFY, or to nothingness otherwise. There might be a better way, but that is what comes immediately to mind. > } > > > Index: linux/include/linux/sched.h > =================================================================== > --- linux.orig/include/linux/sched.h 2005-11-09 14:27:31.262241355 -0600 > +++ linux/include/linux/sched.h 2005-11-09 14:27:38.651156932 -0600 > @@ -36,6 +36,7 @@ > #include <linux/seccomp.h> > > #include <linux/auxvec.h> /* For AT_VECTOR_SIZE */ > +#include <linux/task_notify.h> > > struct exec_domain; > > @@ -801,6 +802,9 @@ > int cpuset_mems_generation; > #endif > atomic_t fs_excl; /* holding fs exclusive resources */ > +#ifdef CONFIG_TASK_NOTIFY > + struct list_head task_watchers; > +#endif > }; > > static inline pid_t process_group(struct task_struct *tsk) > Index: linux/Documentation/task_notify.txt > =================================================================== > --- /dev/null 1970-01-01 00:00:00.000000000 +0000 > +++ linux/Documentation/task_notify.txt 2005-11-09 14:27:38.652133397 -0600 > @@ -0,0 +1,48 @@ > +task_notify is a means for a kernel system or module to attach per-task data > +and/or get called at certain important events in the lifetime of a task. > + > +Events include: > + > + fork > + exec > + exit > + uid/gid change > + per-task data going away (more on this below) > + > +A module interested in these events must register itself with one of two > +task_notify_watch functions by specifying the set of interested events, a > +function to be called during the event, and a data pointer to be passed to > +the function. While functions may be registered multiple times callers should > +be aware that the first registration of the function is removed first. > + > +Registering with task_notify_watch_all() allows the caller to watch the > +specified set of events for all tasks. Note that in this case the data passed > +to the function isn't associated with a task. Furthermore, functions registered > +for all tasks must not sleep when the flag TASK_NOTIFY_DONT_SLEEP is specified. > + > +If a caller is interested only in a single task or a small number of tasks, it > +may choose to utilize the more effecient task_notify_watch_task() function. The > +data passed can be specific to the task, or it can be shared. When this per-task > +data pointer is no longer in use the function will be called with no events. > + > +When an event occurs the specified function and is called with several > +parameters: > + > +events - the set of events that occurred > + > +flags - special flags indicating the context of the call: > + TASK_NOTIFY_DONT_SLEEP tells the function it must not > + sleep. This only will only happen if a function is registered > + for all tasks and it gets called. > + > +data - the data pointer passed in at registration time > + > +If a data pointer is provided, one last invocation of the function will be made > +with no events. This indicates that the reference to the data is being dropped. > +As with other invocations, functions registered for all tasks must still honor > +the TASK_NOTIFY_DONT_SLEEP flag. > + > +When a caller is no longer interested in being called during a task event it > +must use a corresponding task_notify_ignore call by specifying the function > +which should ignore events. The first registration of the specified function > +will be removed. > |
From: Chandra S. <sek...@us...> - 2005-11-10 22:01:05
|
Eric, Why can't PAGG put its functionality over a simplified task_notifier as I pointed here: http://marc.theaimsgroup.com/?l=lse- tech&m=113053026032200&w=2 IMHO, Adding the task specific watchers list crowds what otherwise is a simple implementation. Also it doesn't allow "task notifier" to fit nicely on top the basic notifier chain functionality (when it is made robust - please look at the discussion in lkml: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2) regards, chandra On Wed, 2005-11-09 at 18:53 -0600, Erik Jacobson wrote: > Here is a new version of the patch. This now builds and at least the > task based notifiers are now working. > > A reminder, the "watch all tasks" functionality is written in such a way that > users cannot sleep. This is because it makes use of RCU. An example of > something you cannot do, then, would be any kmalloc/kmem_cache_alloc/etc with > GFP_KERNEL, which can sleep. > > In my opinion, this limits you too much. However, there has been > pushback on using rwsem locks for protection in that area and I mostly > don't need the global notifier anyway, I need the per-task ones. So > I won't argue on this any longer :) > > I made several adjustments to make Matt's version of the patch come to life. > Matt and I kicked around some ideas in email too, some of which are > implemented here. > > It seems like the task_watcher list never shrinks. It just has parts of > the list pointed at NULL when they aren't in use. > > Some changes from Matt's version of the patch: > > When task_notify_event is called to notify task_notifiers of a fork, > task_notify_event was looking at p's (in-construction-child) watcher list. > That isn't interesting since p isn't running and it would always have an empty > watcher list. I changed task_notify_event to search the parent's watcher > list in the fork case. It also provides the parent's data for the function > call. It might be good to add this to the README if it's kept. > Matt had dome suggestions to clean up what I had done, which I put in to > place too. > > I need a way to get to the notifier block task-associated data outside > of the callout context. In my Job example, this would happen when you want > to know the Job ID of a given process. One could argue that pnotify/PAGG > layered on top could handle this. However, I'm trying to avoid using > pnotify/PAGG altogether. I added a task_notify_get_data function that > returns the data pointer for the function pointer and task matching the > request. However, there are lingering concerns about safety with this > function that I'd like to see discussed. In other words, this function > isn't done yet. > > For my Job example, there needed to be a way for an arbitrary process to > add to the watcher list of another process. Because of pushback, I've > dumbed down Job (removing much of it's functionality) to make use of > this patch here. To bring my Job patch functionality back to 100%, I'd need > locks for the per-task watcher lists. I need to decide if this reduced > version of Job will be sufficient. Indeed, it may be necessary to > have some sort of locking mechanisms for task_notify_get_data anyway. > > I have a version of the Job patch (let me know if you're interested) working > with this. I didn't run regression tests against because many of the tests > rely on features that are missing now :) Basic functionality does seem > to work. The job patch exercises per-task watchers and the FORK and > EXIT events. > > PS: See the FIXMEs for questions and areas where feedback would be nice. > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Erik J. <er...@sg...> - 2005-11-10 22:31:35
|
I thought there were a couple issues with your changes here: http://marc.theaimsgroup.com/?l=lse-tech&m=112908542104457&w=2 One is that the notification function pointers are called within rcu_read_lock. This means we cannot sleep... Including not being able to allocate memory in normal ways or using rwsems in the subscriber kernel modules. Another issue, as I recall, was that some felt the global nature of that was too costly. This is perhaps why it's using RCU at the moment. One might argue that it's a good fit for RCU, but I feel there is a lot of writing going on if you dig in to what is going on with typical users. Besides, as I said, it makes it so you can't sleep. Finally, I'd love for PAGG/pnotify to make it, even layered on something. But it's been pretty beat up and I'm nearly resigned to the fact that it won't make it in any form. At this point I'm looking for alternatives that fit my needs. Matt's version of the patch almost fits the bill for me, but is currently (even in my last posting) missing protections. I fear the protections will need to be rwsems protecting the task watcher list, but using semaphores seems to scare people and some felt the beauty of the original task notifier implementation from Jack was that it was lock-less. Unfortunately, even if I dumb down my subscriber modules to not need to add to watcher lists of other 'random' tasks (so only current can update), I still need to retrieve task specific data... and that operation needs protection anyway as far as I can tell (and as Paul pointed out). I've been considering adding protections to the version I sent last night in the form of rwsems protecting the watcher list that have to be held when accessing the list or task data. But we've been having trouble agreeing on one approach so I'm not sure which path to follow for my next experiment :) I'm all ears though and I do have some time to poke around with ideas if we can agree on something to try next. Should I try something closer to http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 but using rw semaphores instead of RCU? I could try to layer pnotify in that case, as long as sleeping is ok. However, if we feel this would be a futile exercise, I'd prefer not to waste the time :) > Why can't PAGG put its functionality over a simplified task_notifier as > I pointed here: http://marc.theaimsgroup.com/?l=lse- > tech&m=113053026032200&w=2 > > IMHO, Adding the task specific watchers list crowds what otherwise is a > simple implementation. Also it doesn't allow "task notifier" to fit > nicely on top the basic notifier chain functionality (when it is made > robust - please look at the discussion in lkml: > http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2) > > regards, > > chandra > > On Wed, 2005-11-09 at 18:53 -0600, Erik Jacobson wrote: > > Here is a new version of the patch. This now builds and at least the > > task based notifiers are now working. > > > > A reminder, the "watch all tasks" functionality is written in such a way that > > users cannot sleep. This is because it makes use of RCU. An example of > > something you cannot do, then, would be any kmalloc/kmem_cache_alloc/etc with > > GFP_KERNEL, which can sleep. > > > > In my opinion, this limits you too much. However, there has been > > pushback on using rwsem locks for protection in that area and I mostly > > don't need the global notifier anyway, I need the per-task ones. So > > I won't argue on this any longer :) > > > > I made several adjustments to make Matt's version of the patch come to life. > > Matt and I kicked around some ideas in email too, some of which are > > implemented here. > > > > It seems like the task_watcher list never shrinks. It just has parts of > > the list pointed at NULL when they aren't in use. > > > > Some changes from Matt's version of the patch: > > > > When task_notify_event is called to notify task_notifiers of a fork, > > task_notify_event was looking at p's (in-construction-child) watcher list. > > That isn't interesting since p isn't running and it would always have an empty > > watcher list. I changed task_notify_event to search the parent's watcher > > list in the fork case. It also provides the parent's data for the function > > call. It might be good to add this to the README if it's kept. > > Matt had dome suggestions to clean up what I had done, which I put in to > > place too. > > > > I need a way to get to the notifier block task-associated data outside > > of the callout context. In my Job example, this would happen when you want > > to know the Job ID of a given process. One could argue that pnotify/PAGG > > layered on top could handle this. However, I'm trying to avoid using > > pnotify/PAGG altogether. I added a task_notify_get_data function that > > returns the data pointer for the function pointer and task matching the > > request. However, there are lingering concerns about safety with this > > function that I'd like to see discussed. In other words, this function > > isn't done yet. > > > > For my Job example, there needed to be a way for an arbitrary process to > > add to the watcher list of another process. Because of pushback, I've > > dumbed down Job (removing much of it's functionality) to make use of > > this patch here. To bring my Job patch functionality back to 100%, I'd need > > locks for the per-task watcher lists. I need to decide if this reduced > > version of Job will be sufficient. Indeed, it may be necessary to > > have some sort of locking mechanisms for task_notify_get_data anyway. > > > > I have a version of the Job patch (let me know if you're interested) working > > with this. I didn't run regression tests against because many of the tests > > rely on features that are missing now :) Basic functionality does seem > > to work. The job patch exercises per-task watchers and the FORK and > > EXIT events. > > > > PS: See the FIXMEs for questions and areas where feedback would be nice. > > > > -- > > ---------------------------------------------------------------------- > Chandra Seetharaman | Be careful what you choose.... > - sek...@us... | .......you may get it. > ---------------------------------------------------------------------- > > > > > ------------------------------------------------------- > SF.Net email is sponsored by: > Tame your development challenges with Apache's Geronimo App Server. Download > it for free - -and be entered to win a 42" plasma tv or your very own > Sony(tm)PSP. Click here to play: http://sourceforge.net/geronimo.php > _______________________________________________ > Lse-tech mailing list > Lse...@li... > https://lists.sourceforge.net/lists/listinfo/lse-tech -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota |
From: Chandra S. <sek...@us...> - 2005-11-10 23:34:21
|
On Thu, 2005-11-10 at 16:31 -0600, Erik Jacobson wrote: Thanks for the fast reply Eric. > I thought there were a couple issues with your changes here: > http://marc.theaimsgroup.com/?l=lse-tech&m=112908542104457&w=2 > > One is that the notification function pointers are called within > rcu_read_lock. This means we cannot sleep... Including not being able to > allocate memory in normal ways or using rwsems in the subscriber > kernel modules. > > Another issue, as I recall, was that some felt the global nature of that > was too costly. This is perhaps why it's using RCU at the moment. One Of the three users that are being discussed in this thread (PAGG, Process Event Connector and CKRM), two of them need global callout. Only PAGG needs per task callout. > might argue that it's a good fit for RCU, but I feel there is a lot of > writing going on if you dig in to what is going on with typical users. > Besides, as I said, it makes it so you can't sleep. Good to know that these are your main concerns. If you follow the thread on notify_chain in lkml you can see that we came to a point where we concluded that we need to have two types of notifier chains: - call_chain called in process context (i.e allowed to block) - call_chain called in interrupt context (not allowed to block). Alan Stern and I are working together to get that implemented that way. Once that interface is robust and provide both types (and is acceptable to the community) then we can lay the simple task notifiers of both types over that. > > Finally, I'd love for PAGG/pnotify to make it, even layered on something. > But it's been pretty beat up and I'm nearly resigned to the fact that it > won't make it in any form. At this point I'm looking for alternatives > that fit my needs. Matt's version of the patch almost fits the bill > for me, but is currently (even in my last posting) missing protections. > > I fear the protections will need to be rwsems protecting the task watcher > list, but using semaphores seems to scare people and some felt the beauty of > the original task notifier implementation from Jack was that it was > lock-less. Unfortunately, even if I dumb down my subscriber modules to > not need to add to watcher lists of other 'random' tasks (so only current > can update), I still need to retrieve task specific data... and that > operation needs protection anyway as far as I can tell (and as Paul pointed > out). > > I've been considering adding protections to the version I sent last night > in the form of rwsems protecting the watcher list that have to be held > when accessing the list or task data. But we've been having trouble > agreeing on one approach so I'm not sure which path to follow for my > next experiment :) > > I'm all ears though and I do have some time to poke around with ideas if we > can agree on something to try next. Should I try something closer to > http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 > but using rw semaphores instead of RCU? I could try to layer pnotify in > that case, as long as sleeping is ok. However, if we feel this would be > a futile exercise, I'd prefer not to waste the time :) It gives a big sigh of relief to me that you are ok with the notifier chain approach. Will stabilize the notify chain infrastructure, then you can write up the simple task notify on top of that. Thanks again for your time and comments, chandra > <snip> -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |