From: <jt...@us...> - 2005-11-11 14:11:27
|
>>>>> "Chandra" == Chandra Seetharaman <sek...@us...> writes: Chandra> Of the three users that are being discussed in this thread Chandra> (PAGG, Process Event Connector and CKRM), two of them need Chandra> global callout. Only PAGG needs per task callout. Hi Chandra, I'm the tech lead for ClearCase MVFS, and we've been talking with Ram Pai about our need for MVFS-specific private process state. We need 100% reliable inheritance of our private per-process state to forked processes. We're happy to do it ourselves in private data (e.g. private hash table) if we can get callbacks on fork/exit. Or, if we had some other general mechanism to attach private data to a process, and get called on fork and exit, we'd be glad to hang it off of 'struct task' somewhere. We don't need to track every process in the system, only those that have called ClearCase specifically (and their descendents). We will need the ability to block (allocate memory, etc.) during any callbacks. So we'd like to add our voice to the desire for in-kernel process callbacks. -- John Kohl Senior Software Engineer Rational Software IBM Software Group Lexington, Massachusetts, USA jt...@us... The opinions expressed in this message do not reflect the views of my employer. |
From: Chandra S. <sek...@us...> - 2005-11-11 22:13:32
|
On Fri, 2005-11-11 at 09:11 -0500, John T. Kohl wrote: > >>>>> "Chandra" == Chandra Seetharaman <sek...@us...> writes: > > Chandra> Of the three users that are being discussed in this thread > Chandra> (PAGG, Process Event Connector and CKRM), two of them need > Chandra> global callout. Only PAGG needs per task callout. > > Hi Chandra, > > I'm the tech lead for ClearCase MVFS, and we've been talking with Ram > Pai about our need for MVFS-specific private process state. We need > 100% reliable inheritance of our private per-process state to forked > processes. We're happy to do it ourselves in private data (e.g. private > hash table) if we can get callbacks on fork/exit. Or, if we had some Good to know that there is one more user for this interface. It will have more value if we have more users. Any more users of this interface out there ? :) Besides new users, IMO it can be used in some of the existing usages too, thereby removing some existing code. > other general mechanism to attach private data to a process, and get > called on fork and exit, we'd be glad to hang it off of 'struct task' > somewhere. Agreed, some of us do need to have some task specific information (CKRM also needs). But, for the interface to be simple, i think is is a better idea to keep it outside of this basic infrastructure. > > We don't need to track every process in the system, only those that have > called ClearCase specifically (and their descendents). We will need the > ability to block (allocate memory, etc.) during any callbacks. > > So we'd like to add our voice to the desire for in-kernel process > callbacks. > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Kingsley C. <kin...@au...> - 2005-11-14 01:29:24
|
On Fri, Nov 11, 2005 at 02:13:17PM -0800, Chandra Seetharaman wrote: > On Fri, 2005-11-11 at 09:11 -0500, John T. Kohl wrote: > > >>>>> "Chandra" == Chandra Seetharaman <sek...@us...> writes: > > > > Chandra> Of the three users that are being discussed in this thread > > Chandra> (PAGG, Process Event Connector and CKRM), two of them need > > Chandra> global callout. Only PAGG needs per task callout. > > > > Hi Chandra, > > > > I'm the tech lead for ClearCase MVFS, and we've been talking with Ram > > Pai about our need for MVFS-specific private process state. We need > > 100% reliable inheritance of our private per-process state to forked > > processes. We're happy to do it ourselves in private data (e.g. private > > hash table) if we can get callbacks on fork/exit. Or, if we had some > > Good to know that there is one more user for this interface. It will > have more value if we have more users. > > Any more users of this interface out there ? :) Of course ;) We at Aurema need maintain per process data in our pEvnts kernel modules. Currently we are using PAGG's fork, exec, and exit callbacks in a 'global' sense. We would certainly like to see a common interface that continues to support these callbacks. Having uid and gid callbacks supported would be useful to us too. > Besides new users, IMO it can be used in some of the existing usages > too, thereby removing some existing code. > > > other general mechanism to attach private data to a process, and get > > called on fork and exit, we'd be glad to hang it off of 'struct task' > > somewhere. > > Agreed, some of us do need to have some task specific information (CKRM > also needs). But, for the interface to be simple, i think is is a better > idea to keep it outside of this basic infrastructure. Agreed. > > We don't need to track every process in the system, only those that have > > called ClearCase specifically (and their descendents). We will need the > > ability to block (allocate memory, etc.) during any callbacks. Likewise for us. We require being able to block during these callbacks. > > So we'd like to add our voice to the desire for in-kernel process > > callbacks. -- Kingsley |
From: Erik J. <er...@sg...> - 2005-10-28 14:00:16
|
> Are you saying that if our callouts need to block we should be using > the per-task list callouts? I'd like to see some benchmarks to understand the true cost of having an rwsem protecting the global callout so user functions can sleep there too. Not being able to sleep places great restrictions on what can be done. I agree it's nice that the per-task notifiers don't have this restriction, but it would be better if it wasn't there at all. I had done some benchmarking with pnotify, but those weren't global. I've had this feeling in my head that people are being extra-cautious about costly semaphores, which is good. But if we can prove it's nearly free and would allow flexibility and simplicity for users, I think that would be better. We'd still need to keep a watchful eye on any kernel users because it isn't free any more if the users take forever at that callout. One issue I have with RCU is that, while I understand the basics, I sometimes miss a subtle issue I didn't think about. That leads to bugs if nobody else catches them. If an rwsem is nearly free, the ease of use and readability are a big win as far as I'm concerned. If it's costly in terms of performance, I wouldn't be in favor of it :) I can help benchmark this if we can agree on test cases and tools for the benchmark. Probably we'll find it's costly, and then I'll be quiet :) -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota |
From: Paul E. M. <pa...@us...> - 2005-10-28 15:53:20
|
On Fri, Oct 28, 2005 at 09:00:04AM -0500, Erik Jacobson wrote: > One issue I have with RCU is that, while I understand the basics, I sometimes > miss a subtle issue I didn't think about. That leads to bugs if nobody > else catches them. If an rwsem is nearly free, the ease of use and > readability are a big win as far as I'm concerned. If simple locking does the job, then use simple locking. Even I can agree with that one. ;-) That said, please do let me know what subtleties of RCU are causing trouble. I have been using RCU for way too long to have very clear memories of learning it, and besides, I learned it incrementally over a very long time, making much of it up as I went along. So I really do need your help in pointing out the rough edges in RCU's API, implementation, and documentation. Some of the rough edges might well be inherent in the concept of RCU, much as deadlock is inherent in locking, but then again, until Manfred pointed out the possibility of burying memory barriers in the list macros, I used to think that explicit use of memory barriers was inherent to RCU. So, please don't keep troublesome subtle RCU-related issues a secret! Thanx, Paul |
From: Erik J. <er...@sg...> - 2005-10-28 15:15:35
|
I guess the other thing missing, from my perspective, is the ability for any process to be able to add to the watcher list of any task. In Linux Job, this is when you say "I would like that certain process to be part of a certain job container". This might involve locking on the watcher list. As stated previously, Jack Steiner doesn't agree with me here but it really is a need for the current Job without removing functionality which may be in use. It seems like (unless I'm reading wrong, which may be the case :) the watcher list in Matt's example is only added too. When a module decides it doesn't want to watch a task any more, it ignores it. That doesn't take anything out of the watcher list, it just makes it so an entry in the list points to NULL, right? Is there a safe way to add to the end of the watcher list without locks? Would locks be an option if we could show they don't cost much? Like I said earlier in the discussions, I'm certainly not interested in making Linux perform poorly and that issue takes precedent over anything else. Erik |
From: Paul E. M. <pa...@us...> - 2005-10-28 17:11:44
|
On Fri, Oct 28, 2005 at 10:15:17AM -0500, Erik Jacobson wrote: > I guess the other thing missing, from my perspective, is the ability for > any process to be able to add to the watcher list of any task. > > In Linux Job, this is when you say "I would like that certain process to be > part of a certain job container". > > This might involve locking on the watcher list. As stated previously, Jack > Steiner doesn't agree with me here but it really is a need for the current > Job without removing functionality which may be in use. > > It seems like (unless I'm reading wrong, which may be the case :) the watcher > list in Matt's example is only added too. When a module decides it > doesn't want to watch a task any more, it ignores it. That doesn't take > anything out of the watcher list, it just makes it so an entry in the list > points to NULL, right? > > Is there a safe way to add to the end of the watcher list without locks? Adding is easy. It is removing that is difficult, since you have to wait until readers are done before freeing the removed element. To add, something like the following will work: /* acquire some mutex to serialize updaters */ /* fill out structure. */ list_add_tail_rcu(p, &list); /* release the mutex. */ Traversal could be something like the following: list_for_each_entry(p, &list, member) do_stuff(rcu_dereference(p)); This does constitute minor abuse of these RCU interfaces, but it does work as long as one -never- deletes anything from the list. Is this what you had in mind? > Would locks be an option if we could show they don't cost much? Like I > said earlier in the discussions, I'm certainly not interested in making > Linux perform poorly and that issue takes precedent over anything else. If locks work well in a given situation, use locks! ;-) Thanx, Paul |
From: Erik J. <er...@sg...> - 2005-10-28 18:39:30
|
> This does constitute minor abuse of these RCU interfaces, but it does > work as long as one -never- deletes anything from the list. > Is this what you had in mind? Yes, thank you. :) I'm not sure it's "correct" for the watcher list to only grow and never clean up, but maybe I missed the code that does that... Or maybe we assume the number of users isn't big enough to really ever be a 'memory leak'. I suppose if you had 5 kernel modules using the task based notifiers, you'd at most only have 5 watchers... > If locks work well in a given situation, use locks! ;-) I'm happy to run tests when we have a working version of Matt's example patch going. I'm prepared to build pnotify on top of it as a test and see how the things that depend on pnotify work with it. I guess I'll need to come up with a pnotify kernel module that does some simple stuff that doesn't _require_ the locking and compare timings with it with locks vs no locks. I'm prepared to do that too. -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota |
From: Chandra S. <sek...@us...> - 2005-10-28 20:09:29
|
On Fri, 2005-10-28 at 10:15 -0500, Erik Jacobson wrote: Hi Eric, > I guess the other thing missing, from my perspective, is the ability for > any process to be able to add to the watcher list of any task. Yes, I agree. That problem can be taken care of by using the proposal I sent on this thread earlier (http://marc.theaimsgroup.com/?l=lse- tech&m=112924564123500&w=2 ). Basically by overlaying the pnotify functionality on top of the "basic task notifier" I posted (http://marc.theaimsgroup.com/?l=lse-tech&m=112908542104457&w=2). The logic i suggest is: - task notifier exists as a base infrastructure in kernel - PAGG adds necessary fields to the task data structure to maintain its watchers list - PAGG registers a _single_ global callout function with task notifier - PAGG defines and uses its per task register/unregister functions - PAGG updates/manages task's watchers list in these register/unregister functions. - PAGG gets notified by task notifier at specified task events through the registered callout - In the callout, PAGG decides what to do depending on the watcher list information in that task Let me know if there is something i missed. There is some discussion going on in lkml (http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2) to make sure the exiting call chain notification mechanism robust. Once it is robust, we can lay the task notifier functionality on top of it with even less code. regards, chandra > > In Linux Job, this is when you say "I would like that certain process to be > part of a certain job container". > > This might involve locking on the watcher list. As stated previously, Jack > Steiner doesn't agree with me here but it really is a need for the current > Job without removing functionality which may be in use. > > It seems like (unless I'm reading wrong, which may be the case :) the watcher > list in Matt's example is only added too. When a module decides it > doesn't want to watch a task any more, it ignores it. That doesn't take > anything out of the watcher list, it just makes it so an entry in the list > points to NULL, right? > > Is there a safe way to add to the end of the watcher list without locks? > > Would locks be an option if we could show they don't cost much? Like I > said earlier in the discussions, I'm certainly not interested in making > Linux perform poorly and that issue takes precedent over anything else. > > Erik > -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Jack S. <st...@sg...> - 2005-10-13 14:54:29
|
On Wed, Oct 12, 2005 at 04:53:25PM -0700, Chandra Seetharaman wrote: > On Wed, 2005-10-12 at 14:31 -0500, Jack Steiner wrote: > > On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > > > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > > > > > I will write up some code and post it today. > > > > > > The patch looks very similar to the original task_notifier patch but has some > > yup... That is why i was trying to get to a small subset (instead of > having multiple callouts from fork()/exec() etc., to satisfy for each of > our needs) > > > significant changes that concern me. These changes make the patch much less > > useful for our needs, (although I'm sure the changes make it better suited for > > your needs :-) > > CKRM needs are very simple. It just needs global notifiers that is > called on every task. I could use your task notifier, but it is heavy > weight for our usage. > > I would guess that is the case with the Process Event connector > (formerly fork/exit connectors) too. > > > > > > The changes that concern me are: > > > > 1) Task notifiers are registered in a global list instead of a per-task > > list. This makes sense if most tasks require callouts, but for > > our usage, only a small percentage of fork/exec/exits actually > > need a callout. Scanning the notifier list and calling > > each function on every fork/exit/exec is unnecessary overhead > > if it is not needed. On the otherhand, if all tasks needed > > the callouts (your case, I think), I can see the advantage of a global list. > > Yes, that is correct. > > > > 2) There is no easy way to associate any per-task data with a task. The > > original task_notifier mechanism registered a notifier block for > > each task. Typically, the notifier block would be embedded in a larger > > container that contains the additional per-task data. Locating the > > task data is nothing more than a typecast (embedded in macros, of course). > > > > Nothing prevents the notifier function itself from managing per-task state > > in additional data structures but it is not straightforward. Although > > it may be able to be done without locks (not sure), there are > > frequently 100's of tasks per notifier function that require task data. > > Managing the data in a single linked list is not practical. An array, hash > > or tree structure is probably needed. Certainly doable, but not trivial > > without adding data to the task_struct. > > I agree that PAGG (for that matter CKRM also) need to add some field in > the task data structure. In your implementaion it was a void pointer in > the task data structure. Can't you keep the same and use it ? Partially true. In my implementation, the void * pointer in the task_struct is actually a pointer to a task_notifier_block that describes a specific callback that is registered for the task. If no functions require per-task data, then there will be a linked list of task_notifier_block hanging off of the task_struct. If a function need per-task data, then it's task_notifier_block will be embedded within a larger container. The function can use container_of() to locate the per-task data. > > > > > 3) The notifier functions are called from within a rcu_read_lock()/ > > rcu_read_unlock() block. Doesn't this mean that the notifier function > > cannot sleep or do anything that has the potential to sleep? > > That would be a significant limitation. > > oh yeah... i did not think about that. Need to think about a different > way to do it. > > > > > > > > > Let me explain our typical usage of PAGG/pnotifier/task_callouts. Maybe this > > will help explain some of the issues and tradeoffs that concern me. (I know > > that your requirements are different). > > > > > > We currently have 4 users of the PAGG/pnotifier/task_callout mechanism. That > > means that at all times there would be 4 notifier functions registered in a > > global callout list. Of the 4 functions, 3 are infrequently used although all 4 > > are usually in use by at least 1 task. At any point in time, _most_ tasks > > typically require no notifiers. > > > > A typical system has 1000's or 10000's of tasks. At any point in time, most > > tasks have no need for any notifiers. There is roughly no more than 1 task per > > cpu that is using a notifier function although the number may be more or less > > than this (our big systems have 512p). > > > > Of the 4 users of the PAGG/pnotifier/task_callout mechanism, 3 require per-task > > data to manage additional task state. Requiring 4 separate functions to > > scan for per-task state when typically none exists is unnecessary overhead. > > > > For our typical usage, statistically most fork/exec/exits do not need a callout. > > Unless I'm overlooking something, using the newer mechanism would require the > > following at every fork/exec/exit: > > > > - scan the global notifier list > > - call 4 notifier functions > > - each notifier function scans it's own task-state data to see if the > > task needs additional work done as part of the callout > > - no task data found because callout was not needed > > - callout was a waste of time > > > > With the original task_notifier mechanism, the notifier_block pointer in most > > task_structs is NULL. The notifier mechanism is skipped and incurs almost zero > > overhead. > > Correct me if I am grossing wrong in the following usage model for PAGG: > > You could maintain your notifiers in the task structure ( same as your > current implementation ), register a single global function and do your > callouts in that global function depending on what is there in the given > task data structure. Can you be more specific. I don't think I understand. If you maintain notifiers in the task_struct, wouldn't the notifier contain the function pointer as well. Or are you proposing to keep only a "hint" mask in the task_struct. That would partially solve the overhead problem but doesnt help with per-task data. > > >From the code point of view, you would register your > _task_notifier_call_chain() (in your patch) with task_notify_register() > (in my patch). > > So, the overhead for tasks with no notifiers would be calling your > registered global function. Both PAGG and the task_notifier mechanism work about the same way. All notifier functions are per-task. There is no global callout list that is maintained & used at fork/exit/etc. If a task does not require a notifier function to be called at fork/exit/etc, it's task_notifier_block pointer in the task_struct is NULL. The inline code in fork/exit/etc detects the NULL & skips the out-of-line call to the function that performs the notifier callouts. > > > > > > > I want to reiterate an earlier point. Although our requirements appear to very > > similar, it may be a mistake to try to create a _single_ mechanism that covers > > both cases. Maybe a better approach is to design two similar mechanisms, each > > optimized for a particular purpose. The 2 mechanisms should have similar > > APIs and could possibly share some code. > > > > > > > > -- > > ---------------------------------------------------------------------- > Chandra Seetharaman | Be careful what you choose.... > - sek...@us... | .......you may get it. > ---------------------------------------------------------------------- > -- Thanks Jack Steiner (st...@sg...) 651-683-5302 Principal Engineer SGI - Silicon Graphics, Inc. |
From: Chandra S. <sek...@us...> - 2005-10-13 23:20:14
|
On Thu, 2005-10-13 at 09:54 -0500, Jack Steiner wrote: > > > You could maintain your notifiers in the task structure ( same as your > > current implementation ), register a single global function and do your > > callouts in that global function depending on what is there in the given > > task data structure. > > Can you be more specific. I don't think I understand. If you maintain notifiers > in the task_struct, wouldn't the notifier contain the function pointer as well. > Or are you proposing to keep only a "hint" mask in the task_struct. That would > partially solve the overhead problem but doesnt help with per-task data. Ok, i changed the code you posted to work on top of the task_notify patch i posted (as per my suggestion above). Let me know if it makes sense. note: since both code uses task*, i changed your code to be jtask*. > > > > > > >From the code point of view, you would register your > > _task_notifier_call_chain() (in your patch) with task_notify_register() > > (in my patch). > > > > So, the overhead for tasks with no notifiers would be calling your > > registered global function. in this code you can see that the additional overhead for tasks that have no notifiers attached is the calling of the function _jtask_notifier_call_chain(). > > Both PAGG and the task_notifier mechanism work about the same way. All notifier > functions are per-task. There is no global callout list that is maintained & > used at fork/exit/etc. > > If a task does not require a notifier function to be called at fork/exit/etc, > it's task_notifier_block pointer in the task_struct is NULL. The inline code in > fork/exit/etc detects the NULL & skips the out-of-line call to the function that > performs the notifier callouts. > include/linux/init_task.h | 1 include/linux/sched.h | 1 include/linux/task_notifier.h | 41 +++++++++++++ kernel/Makefile | 2 kernel/task_notifier.c | 128 ++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 172 insertions(+), 1 deletion(-) Index: l2613-t_notify/include/linux/task_notifier.h =================================================================== --- /dev/null +++ l2613-t_notify/include/linux/task_notifier.h @@ -0,0 +1,41 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. + * + * + * Routines to manage task notifier chains for passing task state changes to any + * interested routines. + * + */ + +#ifndef _LINUX_TASK_NOTIFIER_H +#define _LINUX_TASK_NOTIFIER_H +#include <linux/compiler.h> +#include <linux/types.h> +#include <linux/sched.h> +#include <linux/task_notify.h> + +#include <asm/current.h> + +struct jtask_notifier_block; + +typedef void (jtask_notifier_func)(unsigned int, struct jtask_notifier_block *nb); + +struct jtask_notifier_block +{ + jtask_notifier_func *jtask_notifier_func; + struct jtask_notifier_block *next; + int priority; + unsigned int id_select; +}; + + +extern void jtask_notifier_chain_register(struct jtask_notifier_block *nb); +extern void jtask_notifier_chain_register_child(struct jtask_notifier_block *nb, struct task_struct *p); +extern void jtask_notifier_chain_unregister(struct jtask_notifier_block *nb); +extern struct jtask_notifier_block *find_task_notifier_block_proc(jtask_notifier_func *func); + +#endif /* _LINUX_TASK_NOTIFIER_H */ Index: l2613-t_notify/kernel/task_notifier.c =================================================================== --- /dev/null +++ l2613-t_notify/kernel/task_notifier.c @@ -0,0 +1,128 @@ +/* + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + * + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. + */ + +#include <linux/config.h> +#include <linux/module.h> +#include <linux/task_notifier.h> +#include <asm/current.h> + +/** + * jtask_notifier_chain_register_task - Add a task_notifier to a task. + * @n: Entry to be added to the task notifier chain + * @p: Task (must be current or a child being created during fork/clone + */ +static inline void jtask_notifier_chain_register_task(struct jtask_notifier_block + *nb, struct task_struct *p) +{ + struct jtask_notifier_block **list = + (struct jtask_notifier_block **)&p->task_notifier; + + while (*list && nb->priority > (*list)->priority) + list = &((*list)->next); + nb->next = *list; + *list = nb; +} + +/** + * jtask_notifier_chain_register_child - Add a task_notifier to a task during fork/clone. + * @n: Entry to be added to the task notifier chain + * @p: Task (must be current or a child being created during fork/clone + */ +void jtask_notifier_chain_register_child(struct jtask_notifier_block *nb, + struct task_struct *p) +{ + jtask_notifier_chain_register_task(nb, p); +} + +/** + * jtask_notifier_chain_register - Add a task_notifier to the current task + * @n: Entry to be added to the task notifier chain + */ +void jtask_notifier_chain_register(struct jtask_notifier_block *nb) +{ + jtask_notifier_chain_register_task(nb, current); +} + +/** + * jtask_notifier_chain_unregister - Remove notifier from the current task + * @nb: Entry in notifier chain + * + */ + +void jtask_notifier_chain_unregister(struct jtask_notifier_block *nb) +{ + struct jtask_notifier_block **list = + (struct jtask_notifier_block **)¤t->task_notifier; + + while (*list && *list != nb) + list = &((*list)->next); + + if (*list == nb) + *list = nb->next; +} + +/** + * find_task_notifier_block_proc - Scan the current task's notifiers for a + * jtask_notifier_block with a pointer to @func + * @func: Function + * + */ + +struct jtask_notifier_block *find_task_notifier_block_proc(jtask_notifier_func * + func) +{ + struct jtask_notifier_block *nb = current->task_notifier; + + while (nb && nb->jtask_notifier_func != func) + nb = nb->next; + + return nb; +} + +/** + * jtask_notifier_call_chain - Call functions in a notifier chain + * @id: Callout identifier + * @arg: Argument to pass to called fumctions + * + * Calls each function in a notifier chain in turn. + * + */ + +static void _jtask_notifier_call_chain(unsigned int id, struct task_struct *tsk) +{ + struct jtask_notifier_block *nb, *next_nb = tsk->task_notifier; + + if (likely(tsk->task_notifier)) + return; + + while ((nb = next_nb)) { + next_nb = nb->next; + if (nb->id_select & id) + nb->jtask_notifier_func(id, nb); + /* comment by chandra: */ + /* This function now have to take tsk parameter */ + } +} + +void __init +jtask_notifier_init(void) +{ + task_notify_register(_jtask_notifier_call_chain, + TASK_NOTIFY_FORK|TASK_NOTIFY_EXEC|TASK_NOTIFY_EXIT); +} + +void __exit +jtask_notifier_exit(void) +{ + task_notify_unregister(_jtask_notifier_call_chain); +} + +EXPORT_SYMBOL_GPL(jtask_notifier_chain_register); +EXPORT_SYMBOL_GPL(jtask_notifier_chain_register_child); +EXPORT_SYMBOL_GPL(jtask_notifier_chain_unregister); +EXPORT_SYMBOL_GPL(find_task_notifier_block_proc); Index: l2613-t_notify/include/linux/init_task.h =================================================================== --- l2613-t_notify.orig/include/linux/init_task.h +++ l2613-t_notify/include/linux/init_task.h @@ -112,6 +112,7 @@ extern struct group_info init_groups; .journal_info = NULL, \ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ .fs_excl = ATOMIC_INIT(0), \ + .task_notifier = NULL, \ } Index: l2613-t_notify/include/linux/sched.h =================================================================== --- l2613-t_notify.orig/include/linux/sched.h +++ l2613-t_notify/include/linux/sched.h @@ -770,6 +770,7 @@ struct task_struct { int cpuset_mems_generation; #endif atomic_t fs_excl; /* holding fs exclusive resources */ + void *task_notifier; }; static inline pid_t process_group(struct task_struct *tsk) Index: l2613-t_notify/kernel/Makefile =================================================================== --- l2613-t_notify.orig/kernel/Makefile +++ l2613-t_notify/kernel/Makefile @@ -30,7 +30,7 @@ obj-$(CONFIG_SYSFS) += ksysfs.o obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ obj-$(CONFIG_CRASH_DUMP) += crash_dump.o obj-$(CONFIG_SECCOMP) += seccomp.o -obj-$(CONFIG_TASK_NOTIFY) += task_notify.o +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o task_notifier.o ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) # According to Alan Modra <al...@li...>, the -fno-omit-frame-pointer is -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Matt H. <mat...@us...> - 2005-10-14 01:26:49
|
On Thu, 2005-10-13 at 16:20 -0700, Chandra Seetharaman wrote: > On Thu, 2005-10-13 at 09:54 -0500, Jack Steiner wrote: <snip> > > > So, the overhead for tasks with no notifiers would be calling your > > > registered global function. > > in this code you can see that the additional overhead for tasks that > have no notifiers attached is the calling of the function > _jtask_notifier_call_chain(). I think this is an interesting approach, and quite possibly the right way to do it. My only concern is the synchronization primitive used to protect the global list might prevent notifier functions from sleeping. Perhaps a read-write semaphore would be better than RCU since it would allow them to sleep. > > Both PAGG and the task_notifier mechanism work about the same way. All notifier > > functions are per-task. There is no global callout list that is maintained & > > used at fork/exit/etc. > > > > If a task does not require a notifier function to be called at fork/exit/etc, > > it's task_notifier_block pointer in the task_struct is NULL. The inline code in > > fork/exit/etc detects the NULL & skips the out-of-line call to the function that > > performs the notifier callouts. > > > > > include/linux/init_task.h | 1 > include/linux/sched.h | 1 > include/linux/task_notifier.h | 41 +++++++++++++ > kernel/Makefile | 2 > kernel/task_notifier.c | 128 ++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 172 insertions(+), 1 deletion(-) > > Index: l2613-t_notify/include/linux/task_notifier.h > =================================================================== > --- /dev/null > +++ l2613-t_notify/include/linux/task_notifier.h > @@ -0,0 +1,41 @@ > +/* > + * This file is subject to the terms and conditions of the GNU General Public > + * License. See the file "COPYING" in the main directory of this archive > + * for more details. > + * > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > + * > + * > + * Routines to manage task notifier chains for passing task state changes to any > + * interested routines. > + * > + */ > + > +#ifndef _LINUX_TASK_NOTIFIER_H > +#define _LINUX_TASK_NOTIFIER_H > +#include <linux/compiler.h> > +#include <linux/types.h> > +#include <linux/sched.h> > +#include <linux/task_notify.h> > + > +#include <asm/current.h> > + > +struct jtask_notifier_block; > + > +typedef void (jtask_notifier_func)(unsigned int, struct jtask_notifier_block *nb); > + > +struct jtask_notifier_block > +{ > + jtask_notifier_func *jtask_notifier_func; > + struct jtask_notifier_block *next; > + int priority; > + unsigned int id_select; > +}; > + > + > +extern void jtask_notifier_chain_register(struct jtask_notifier_block *nb); > +extern void jtask_notifier_chain_register_child(struct jtask_notifier_block *nb, struct task_struct *p); > +extern void jtask_notifier_chain_unregister(struct jtask_notifier_block *nb); > +extern struct jtask_notifier_block *find_task_notifier_block_proc(jtask_notifier_func *func); > + > +#endif /* _LINUX_TASK_NOTIFIER_H */ > Index: l2613-t_notify/kernel/task_notifier.c > =================================================================== > --- /dev/null > +++ l2613-t_notify/kernel/task_notifier.c > @@ -0,0 +1,128 @@ > +/* > + * This file is subject to the terms and conditions of the GNU General Public > + * License. See the file "COPYING" in the main directory of this archive > + * for more details. > + * > + * Copyright (C) 2005 Silicon Graphics, Inc. All rights reserved. > + */ > + > +#include <linux/config.h> > +#include <linux/module.h> > +#include <linux/task_notifier.h> > +#include <asm/current.h> > + > +/** > + * jtask_notifier_chain_register_task - Add a task_notifier to a task. > + * @n: Entry to be added to the task notifier chain > + * @p: Task (must be current or a child being created during fork/clone > + */ > +static inline void jtask_notifier_chain_register_task(struct jtask_notifier_block > + *nb, struct task_struct *p) > +{ > + struct jtask_notifier_block **list = > + (struct jtask_notifier_block **)&p->task_notifier; > + > + while (*list && nb->priority > (*list)->priority) > + list = &((*list)->next); > + nb->next = *list; > + *list = nb; > +} > + > +/** > + * jtask_notifier_chain_register_child - Add a task_notifier to a task during fork/clone. > + * @n: Entry to be added to the task notifier chain > + * @p: Task (must be current or a child being created during fork/clone > + */ > +void jtask_notifier_chain_register_child(struct jtask_notifier_block *nb, > + struct task_struct *p) > +{ > + jtask_notifier_chain_register_task(nb, p); > +} > + > +/** > + * jtask_notifier_chain_register - Add a task_notifier to the current task > + * @n: Entry to be added to the task notifier chain > + */ > +void jtask_notifier_chain_register(struct jtask_notifier_block *nb) > +{ > + jtask_notifier_chain_register_task(nb, current); > +} > + > +/** > + * jtask_notifier_chain_unregister - Remove notifier from the current task > + * @nb: Entry in notifier chain > + * > + */ > + > +void jtask_notifier_chain_unregister(struct jtask_notifier_block *nb) > +{ > + struct jtask_notifier_block **list = > + (struct jtask_notifier_block **)¤t->task_notifier; > + > + while (*list && *list != nb) > + list = &((*list)->next); > + > + if (*list == nb) > + *list = nb->next; > +} > + > +/** > + * find_task_notifier_block_proc - Scan the current task's notifiers for a > + * jtask_notifier_block with a pointer to @func > + * @func: Function > + * > + */ > + > +struct jtask_notifier_block *find_task_notifier_block_proc(jtask_notifier_func * > + func) > +{ > + struct jtask_notifier_block *nb = current->task_notifier; > + > + while (nb && nb->jtask_notifier_func != func) > + nb = nb->next; > + > + return nb; > +} > + > +/** > + * jtask_notifier_call_chain - Call functions in a notifier chain > + * @id: Callout identifier > + * @arg: Argument to pass to called fumctions > + * > + * Calls each function in a notifier chain in turn. > + * > + */ > + > +static void _jtask_notifier_call_chain(unsigned int id, struct task_struct *tsk) > +{ > + struct jtask_notifier_block *nb, *next_nb = tsk->task_notifier; > + > + if (likely(tsk->task_notifier)) > + return; I think the test above should be inverted according to previous discussions: if (likely(!tsk->task_notifier)) return; <snip> Cheers, -Matt Helsley |
From: Jack S. <st...@sg...> - 2005-10-14 21:30:42
|
On Wed, Oct 12, 2005 at 04:17:56PM -0700, Matt Helsley wrote: > On Wed, 2005-10-12 at 14:31 -0500, Jack Steiner wrote: > > On Tue, Oct 11, 2005 at 07:49:21PM -0700, Chandra Seetharaman wrote: > > > On Tue, 2005-10-11 at 11:47 -0700, Chandra Seetharaman wrote: > > > > > > > I will write up some code and post it today. > > > > > > The patch looks very similar to the original task_notifier patch but has some > > significant changes that concern me. These changes make the patch much less > > useful for our needs, (although I'm sure the changes make it better suited for > > your needs :-) > > > > > > The changes that concern me are: > > > > 1) Task notifiers are registered in a global list instead of a per-task > > list. This makes sense if most tasks require callouts, but for > > our usage, only a small percentage of fork/exec/exits actually > > need a callout. Scanning the notifier list and calling > > each function on every fork/exit/exec is unnecessary overhead > > if it is not needed. On the otherhand, if all tasks needed > > the callouts (your case, I think), I can see the advantage of a global list. > > > > 2) There is no easy way to associate any per-task data with a task. The > > original task_notifier mechanism registered a notifier block for > > each task. Typically, the notifier block would be embedded in a larger > > container that contains the additional per-task data. Locating the > > task data is nothing more than a typecast (embedded in macros, of course). > > > > Nothing prevents the notifier function itself from managing per-task state > > in additional data structures but it is not straightforward. Although > > it may be able to be done without locks (not sure), there are > > frequently 100's of tasks per notifier function that require task data. > > Managing the data in a single linked list is not practical. An array, hash > > or tree structure is probably needed. Certainly doable, but not trivial > > without adding data to the task_struct. > > > > 3) The notifier functions are called from within a rcu_read_lock()/ > > rcu_read_unlock() block. Doesn't this mean that the notifier function > > cannot sleep or do anything that has the potential to sleep? > > That would be a significant limitation. > > > > > > > > Let me explain our typical usage of PAGG/pnotifier/task_callouts. Maybe this > > will help explain some of the issues and tradeoffs that concern me. (I know > > that your requirements are different). > > > > > > We currently have 4 users of the PAGG/pnotifier/task_callout mechanism. That > > means that at all times there would be 4 notifier functions registered in a > > global callout list. Of the 4 functions, 3 are infrequently used although all 4 > > are usually in use by at least 1 task. At any point in time, _most_ tasks > > typically require no notifiers. > > > > A typical system has 1000's or 10000's of tasks. At any point in time, most > > tasks have no need for any notifiers. There is roughly no more than 1 task per > > cpu that is using a notifier function although the number may be more or less > > than this (our big systems have 512p). > > > > Of the 4 users of the PAGG/pnotifier/task_callout mechanism, 3 require per-task > > data to manage additional task state. Requiring 4 separate functions to > > scan for per-task state when typically none exists is unnecessary overhead. > > > > For our typical usage, statistically most fork/exec/exits do not need a callout. > > Unless I'm overlooking something, using the newer mechanism would require the > > following at every fork/exec/exit: > > > > - scan the global notifier list > > - call 4 notifier functions > > - each notifier function scans it's own task-state data to see if the > > task needs additional work done as part of the callout > > - no task data found because callout was not needed > > - callout was a waste of time > > > > With the original task_notifier mechanism, the notifier_block pointer in most > > task_structs is NULL. The notifier mechanism is skipped and incurs almost zero > > overhead. > > > > > > I want to reiterate an earlier point. Although our requirements appear to very > > similar, it may be a mistake to try to create a _single_ mechanism that covers > > both cases. Maybe a better approach is to design two similar mechanisms, each > > optimized for a particular purpose. The 2 mechanisms should have similar > > APIs and could possibly share some code. > > Jack, > > OK, how about this patch? It's essentially an amalgam of the features > of task_notifiers and task_notify based on Chandra's task_notify patch. > > It's also completely untested. This is just to demonstrate what I'm > proposing. > > Cheers, > -Matt Helsley I like the way you have integrated the global & task-based notifier lists. That seems like the right way to go. On the task-based notifiers, I noticed: - it looks like you support attaching/removing a notifier from an arbitrary task (task_notify_watch_task(..., struct task_struct *)) - but there is no locking on the list management or walking routines Have we clearly established whether task_notify_watch_task() can target other than the current task? I know that some of the "job" interfaces use this capability of PAGG, but I don't know if it is essential. BTW, I leave tonight for 2 weeks vacation & will be unable to respond to future mail during that time. Don't take lack of response for lack of interest!! > > -- > > Task Event notifier > > On exec, fork, exit, real/effective gid/uid changes notify the registered > callers. > > --- > > Applies to 2.6.14-rc4 > > Documentation/task_notify.txt | 48 ++++++++ > fs/exec.c | 2 > include/linux/init_task.h | 1 > include/linux/sched.h | 4 > include/linux/task_notify.h | 53 +++++++++ > init/Kconfig | 10 + > kernel/Makefile | 1 > kernel/exit.c | 2 > kernel/fork.c | 5 > kernel/sys.c | 12 ++ > kernel/task_notify.c | 183 ++++++++++++++++++++++++++++++++++ > 11 files changed, 321 insertions(+) > > Index: linux-2.6.14-rc4/fs/exec.c > =================================================================== > --- linux-2.6.14-rc4.orig/fs/exec.c > +++ linux-2.6.14-rc4/fs/exec.c > @@ -46,10 +46,11 @@ > #include <linux/mount.h> > #include <linux/security.h> > #include <linux/syscalls.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/mmu_context.h> > > #ifdef CONFIG_KMOD > @@ -1098,10 +1099,11 @@ int search_binary_handler(struct linux_b > allow_write_access(bprm->file); > if (bprm->file) > fput(bprm->file); > bprm->file = NULL; > current->did_exec = 1; > + task_notify_event(TASK_NOTIFY_EXEC, current); > return retval; > } > read_lock(&binfmt_lock); > put_binfmt(fmt); > if (retval != -ENOEXEC || bprm->mm == NULL) > Index: linux-2.6.14-rc4/include/linux/task_notify.h > =================================================================== > --- /dev/null > +++ linux-2.6.14-rc4/include/linux/task_notify.h > @@ -0,0 +1,53 @@ > +/* > + * task_notify.h - Task event notification > + * > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > + * (C) Matt Helsley, IBM Corp. 2005 > + * > + * Provides the interface and macros for managing in-kernel task > + * notification. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of version 2.1 of the GNU Lesser General Public License > + * as published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it would be useful, but > + * WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. > + * > + */ > + > +#ifndef _LINUX_TASK_NOTIFY_H > +#define _LINUX_TASK_NOTIFY_H > + > +typedef __bitwise unsigned int task_notify_event; > +#define TASK_NOTIFY_FORK ((__force task_notify_event)0x00000001) > +#define TASK_NOTIFY_EXEC ((__force task_notify_event)0x00000002) > +#define TASK_NOTIFY_EXIT ((__force task_notify_event)0x00000004) > +#define TASK_NOTIFY_DESTROY ((__force task_notify_event)0x00000008) > +#define TASK_NOTIFY_UID ((__force task_notify_event)0x00000010) > +#define TASK_NOTIFY_GID ((__force task_notify_event)0x00000020) > +#define TASK_NOTIFY_ALL_EVENTS ((__force task_notify_event)0x0000003f) > + > +typedef __bitwise unsigned int task_notify_flag; > +#define TASK_NOTIFY_DONT_SLEEP ((__force task_notify_flag)0x00000001) > + > +#ifdef CONFIG_TASK_NOTIFY > +typedef void (task_notify_func)(unsigned int events, unsigned int flags, > + struct task_struct * task, void *watcher_data); > +extern struct task_notifier_block; > + > +extern int task_notify_watch_all (task_notify_func *, void*, unsigned int) > +extern int task_notify_watch_task (task_notify_func *, void*, unsigned int, > + struct task_struct *) > +extern void task_notify_ignore_all (task_notify_func *) > +extern void task_notify_ignore_task (task_notify_func *, struct task_struct *) > +extern void task_notify_event(unsigned int, struct task_struct *); > + > +#else /* !CONFIG_TASK_NOTIFY */ > + > +static inline void task_notify_event(unsigned int i, struct task_struct *t) {} > + > +#endif /* CONFIG_TASK_NOTIFY */ > + > +#endif /* _LINUX_TASK_NOTIFY_H */ > Index: linux-2.6.14-rc4/init/Kconfig > =================================================================== > --- linux-2.6.14-rc4.orig/init/Kconfig > +++ linux-2.6.14-rc4/init/Kconfig > @@ -160,10 +160,20 @@ config BSD_PROCESS_ACCT_V3 > process and it's parent. Note that this file format is incompatible > with previous v0/v1/v2 file formats, so you will need updated tools > for processing it. A preliminary version of these tools is available > at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. > > +config TASK_NOTIFY > + bool "Task Notification" > + depends on EXPERIMENTAL > + help > + Enabling this provides a in kernel notification of task events. > + Modules can register to be notified for specific task events and > + get notified when any task goes thru that event. > + > + Say N if you are unsure. > + > config SYSCTL > bool "Sysctl support" > ---help--- > The sysctl interface provides a means of dynamically changing > certain kernel parameters and variables on the fly without requiring > Index: linux-2.6.14-rc4/kernel/task_notify.c > =================================================================== > --- /dev/null > +++ linux-2.6.14-rc4/kernel/task_notify.c > @@ -0,0 +1,183 @@ > +/* task_notify.c - in-kernel task event notification > + * > + * Copyright (C) Chandra Seetharaman, IBM Corp. 2005 > + * (C) Matt Helsley, IBM Corp. 2005 > + * > + * Provides API for task event registration and notification. > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation; either version 2 of the License, or > + * (at your option) any later version. > + * > + */ > + > +#include <linux/config.h> > +#include <linux/module.h> > +#include <linux/list.h> > +#include <linux/rcupdate.h> > +#include <linux/task_notify.h> > + > +static LIST_HEAD(notify_list); > + > +struct task_notifier_block { > + struct list_head list; > + union { > + struct rcu_head rcu; > + void *private_data; /* when not freeing via rcu -- bad idea? */ > + }; > + task_notify_func *func; > + unsigned int event_mask; /* set of events to pass to func */ > +}; > + > +static kmem_cache_t *nb_cache = NULL; > + > +void __init task_notify_init (void) > +{ > + /* Info on SLAB_DESTROY_BY_RCU : > + * http://www.ussg.iu.edu/hypermail/linux/kernel/0407.1/1108.html */ > + nb_cache = kmem_cache_create("task_notify", > + sizeof(struct task_notifier_block), 0, > + SLAB_KERNEL | SLAB_NO_REAP | > + SLAB_DESTROY_BY_RCU, NULL, NULL); > +} > +core_initcall(task_notify_init); > + > +void __exit_call task_notify_exit (void) > +{ > + kmem_cache_destroy(nb_cache); > + nb_cache = NULL; > +} > + > +/* internal use only */ > +#define TASK_NOTIFY_ALL_FLAGS ((__force task_notify_flag_t)0x0001) > + > +static inline struct task_notifier_block* > +get_notifier_block (task_notify_func *func, void* private_data, > + unsigned short event_mask) > +{ > + struct task_notifier_block *nb; > + > + nb = kmem_cache_alloc(nb_cache, GFP_KERNEL); > + if (nb == NULL) > + return NULL; > + nb->private_data = private_data; > + nb->func = func; > + nb->event_mask = event_mask; > +} > + > +static inline void put_notifier_block (struct task_notifier_block *nb) > +{ > + kmem_cache_free(nb_cache, nb); > +} > + > +/******************************************************************* > + * Event callback invocation > + *******************************************************************/ > + > +int > +task_notify_watch_all (task_notify_func *func, void* func_data, > + unsigned short event_mask) > +{ > + struct task_notifier_block *nb; > + > + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) > + return -EINVAL; > + > + nb = get_notifier_block(func, func_data, > + event_mask, TASK_NOTIFY_DONT_SLEEP); > + if (nb == NULL) > + return -ENOMEM; > + list_add_rcu(&nb->list, ¬ify_list); > + return 0; > +} > + > +int > +task_notify_watch_task (task_notify_func *func, void* func_data, > + unsigned short event_mask, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + if ((func == NULL) || ((event_mask & ~TASK_NOTIFY_ALL_EVENTS) != 0)) > + return -EINVAL; > + > + nb = get_notifier_block(func, func_data, event_mask, 0); > + if (nb == NULL) > + return -ENOMEM; > + list_add(&nb->list, &task->task_watchers); > + return 0; > +} > + > +static void > +task_notify_free (struct rcu_head *head) > +{ > + struct task_notifier_block *nb = > + container_of(head, struct task_notifier_block, rcu); > + put_notifier_block(nb); > + return; > +} > + > +void > +task_notify_ignore_all (task_notify_func *func) > +{ > + struct task_notifier_block *nb; > + void *private_data; > + > + /* Do not use the _rcu iterator here */ > + list_for_each_entry(nb, ¬ify_list, list) { > + if (nb->func != func) > + continue; > + list_del_rcu(&nb->list); > + private_data = nb->private_data; > + call_rcu(&nb->rcu, task_notify_free); > + > + /* Must we restrict from sleeping? */ > + nb->func(0, TASK_NOTIFY_DONT_SLEEP, NULL, private_data); > + return; > + } > + return; > +} > + > +void > +task_notify_ignore_task (task_notify_func *func, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + list_for_each_entry(nb, &task->task_watchers, list) { > + if (nb->func != func) > + continue; > + list_del(&nb->list); > + nb->func(0, 0, task, nb->private_data); > + put_notifier_block(nb); > + return; > + } > + return; > +} > + > +EXPORT_SYMBOL_GPL(task_notify_watch_all); > +EXPORT_SYMBOL_GPL(task_notify_watch_task); > +EXPORT_SYMBOL_GPL(task_notify_ignore_all); > +EXPORT_SYMBOL_GPL(task_notify_ignore_task); > + > +void > +task_notify_event(unsigned int event, struct task_struct *task) > +{ > + struct task_notifier_block *nb; > + > + /* > + * Call the per-task watchers -- use safe variant in case > + * watchers want to remove themselves > + */ > + list_for_each_entry_safe(nb, nbn, &task->task_watchers, list) > + if ((nb->event_mask & event) == event) > + nb->func(event, 0, task, nb->private_data); > + > + /* Call the watchers interested in all tasks */ > + rcu_read_lock(); > + list_for_each_entry_rcu(nb, ¬ify_list, list) > + if ((nb->event_mask & event) == event) > + nb->func(event, TASK_NOTIFY_DONT_SLEEP, task, > + nb->private_data); > + rcu_read_unlock(); > + return; > +} > Index: linux-2.6.14-rc4/kernel/exit.c > =================================================================== > --- linux-2.6.14-rc4.orig/kernel/exit.c > +++ linux-2.6.14-rc4/kernel/exit.c > @@ -26,10 +26,11 @@ > #include <linux/proc_fs.h> > #include <linux/mempolicy.h> > #include <linux/cpuset.h> > #include <linux/syscalls.h> > #include <linux/signal.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/unistd.h> > #include <asm/pgtable.h> > #include <asm/mmu_context.h> > @@ -861,10 +862,11 @@ fastcall NORET_TYPE void do_exit(long co > module_put(tsk->thread_info->exec_domain->module); > if (tsk->binfmt) > module_put(tsk->binfmt->module); > > tsk->exit_code = code; > + task_notify_event(TASK_NOTIFY_EXIT, tsk); > exit_notify(tsk); > #ifdef CONFIG_NUMA > mpol_free(tsk->mempolicy); > tsk->mempolicy = NULL; > #endif > Index: linux-2.6.14-rc4/kernel/fork.c > =================================================================== > --- linux-2.6.14-rc4.orig/kernel/fork.c > +++ linux-2.6.14-rc4/kernel/fork.c > @@ -40,10 +40,11 @@ > #include <linux/mount.h> > #include <linux/audit.h> > #include <linux/profile.h> > #include <linux/rmap.h> > #include <linux/acct.h> > +#include <linux/task_notify.h> > > #include <asm/pgtable.h> > #include <asm/pgalloc.h> > #include <asm/uaccess.h> > #include <asm/mmu_context.h> > @@ -1036,10 +1037,12 @@ static task_t *copy_process(unsigned lon > /* ok, now we should be set up.. */ > p->exit_signal = (clone_flags & CLONE_THREAD) ? -1 : (clone_flags & CSIGNAL); > p->pdeath_signal = 0; > p->exit_state = 0; > > + p->task_watchers = NULL; > + > /* > * Ok, make it visible to the rest of the system. > * We dont wake it up yet. > */ > p->group_leader = p; > @@ -1144,10 +1147,12 @@ static task_t *copy_process(unsigned lon > attach_pid(p, PIDTYPE_SID, p->signal->session); > if (p->pid) > __get_cpu_var(process_counts)++; > } > > + task_notify_event(TASK_NOTIFY_FORK, p); > + > if (!current->signal->tty && p->signal->tty) > p->signal->tty = NULL; > > nr_threads++; > total_forks++; > Index: linux-2.6.14-rc4/kernel/Makefile > =================================================================== > --- linux-2.6.14-rc4.orig/kernel/Makefile > +++ linux-2.6.14-rc4/kernel/Makefile > @@ -30,10 +30,11 @@ obj-$(CONFIG_KPROBES) += kprobes.o > obj-$(CONFIG_SYSFS) += ksysfs.o > obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o > obj-$(CONFIG_GENERIC_HARDIRQS) += irq/ > obj-$(CONFIG_CRASH_DUMP) += crash_dump.o > obj-$(CONFIG_SECCOMP) += seccomp.o > +obj-$(CONFIG_TASK_NOTIFY) += task_notify.o > > ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y) > # According to Alan Modra <al...@li...>, the -fno-omit-frame-pointer is > # needed for x86 only. Why this used to be enabled for all architectures is beyond > # me. I suspect most platforms don't need this, but until we know that for sure > Index: linux-2.6.14-rc4/kernel/sys.c > =================================================================== > --- linux-2.6.14-rc4.orig/kernel/sys.c > +++ linux-2.6.14-rc4/kernel/sys.c > @@ -29,10 +29,11 @@ > #include <linux/tty.h> > #include <linux/signal.h> > > #include <linux/compat.h> > #include <linux/syscalls.h> > +#include <linux/task_notify.h> > > #include <asm/uaccess.h> > #include <asm/io.h> > #include <asm/unistd.h> > > @@ -621,10 +622,11 @@ asmlinkage long sys_setregid(gid_t rgid, > current->sgid = new_egid; > current->fsgid = new_egid; > current->egid = new_egid; > current->gid = new_rgid; > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > /* > * setgid() is implemented like SysV w/ SAVED_IDS > @@ -660,10 +662,11 @@ asmlinkage long sys_setgid(gid_t gid) > } > else > return -EPERM; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > static int set_user(uid_t new_ruid, int dumpclear) > { > @@ -750,10 +753,12 @@ asmlinkage long sys_setreuid(uid_t ruid, > current->suid = current->euid; > current->fsuid = current->euid; > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); > } > > > > @@ -797,10 +802,12 @@ asmlinkage long sys_setuid(uid_t uid) > current->fsuid = current->euid = uid; > current->suid = new_suid; > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); > } > > > /* > @@ -845,10 +852,12 @@ asmlinkage long sys_setresuid(uid_t ruid > if (suid != (uid_t) -1) > current->suid = suid; > > key_fsuid_changed(current); > > + task_notify_event(TASK_NOTIFY_UID, current); > + > return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); > } > > asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid) > { > @@ -896,10 +905,11 @@ asmlinkage long sys_setresgid(gid_t rgid > current->gid = rgid; > if (sgid != (gid_t) -1) > current->sgid = sgid; > > key_fsgid_changed(current); > + task_notify_event(TASK_NOTIFY_GID, current); > return 0; > } > > asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid) > { > @@ -938,10 +948,11 @@ asmlinkage long sys_setfsuid(uid_t uid) > } > current->fsuid = uid; > } > > key_fsuid_changed(current); > + task_notify_event(TASK_NOTIFY_UID, current); > > security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); > > return old_fsuid; > } > @@ -967,10 +978,11 @@ asmlinkage long sys_setfsgid(gid_t gid) > smp_wmb(); > } > current->fsgid = gid; > key_fsgid_changed(current); > } > + task_notify_event(TASK_NOTIFY_GID, current); > return old_fsgid; > } > > asmlinkage long sys_times(struct tms __user * tbuf) > { > Index: linux-2.6.14-rc4/include/linux/init_task.h > =================================================================== > --- linux-2.6.14-rc4.orig/include/linux/init_task.h > +++ linux-2.6.14-rc4/include/linux/init_task.h > @@ -120,10 +120,11 @@ extern struct group_info init_groups; > .alloc_lock = SPIN_LOCK_UNLOCKED, \ > .proc_lock = SPIN_LOCK_UNLOCKED, \ > .journal_info = NULL, \ > .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ > .fs_excl = ATOMIC_INIT(0), \ > + .task_watchers = NULL, \ > } > > > #define INIT_CPU_TIMERS(cpu_timers) \ > { \ > Index: linux-2.6.14-rc4/include/linux/sched.h > =================================================================== > --- linux-2.6.14-rc4.orig/include/linux/sched.h > +++ linux-2.6.14-rc4/include/linux/sched.h > @@ -34,10 +34,11 @@ > #include <linux/percpu.h> > #include <linux/topology.h> > #include <linux/seccomp.h> > > #include <linux/auxvec.h> /* For AT_VECTOR_SIZE */ > +#include <linux/task_notify.h> > > struct exec_domain; > > /* > * cloning flags: > @@ -811,10 +812,13 @@ struct task_struct { > struct cpuset *cpuset; > nodemask_t mems_allowed; > int cpuset_mems_generation; > #endif > atomic_t fs_excl; /* holding fs exclusive resources */ > +#ifdef CONFIG_TASK_NOTIFY > + struct task_notifier_block *task_watchers; > +#endif > }; > > static inline pid_t process_group(struct task_struct *tsk) > { > return tsk->signal->pgrp; > Index: linux-2.6.14-rc4/Documentation/task_notify.txt > =================================================================== > --- /dev/null > +++ linux-2.6.14-rc4/Documentation/task_notify.txt > @@ -0,0 +1,48 @@ > +task_notify is a means for a kernel system or module to attach per-task data > +and/or get called at certain important events in the lifetime of a task. > + > +Events include: > + > + fork > + exec > + exit > + uid/gid change > + per-task data going away (more on this below) > + > +A module interested in these events must register itself with one of two > +task_notify_watch functions by specifying the set of interested events, a > +function to be called during the event, and a data pointer to be passed to > +the function. While functions may be registered multiple times callers should > +be aware that the first registration of the function is removed first. > + > +Registering with task_notify_watch_all() allows the caller to watch the > +specified set of events for all tasks. Note that in this case the data passed > +to the function isn't associated with a task. Furthermore, functions registered > +for all tasks must not sleep when the flag TASK_NOTIFY_DONT_SLEEP is specified. > + > +If a caller is interested only in a single task or a small number of tasks, it > +may choose to utilize the more effecient task_notify_watch_task() function. The > +data passed can be specific to the task, or it can be shared. When this per-task > +data pointer is no longer in use the function will be called with no events. > + > +When an event occurs the specified function and is called with several > +parameters: > + > +events - the set of events that occurred > + > +flags - special flags indicating the context of the call: > + TASK_NOTIFY_DONT_SLEEP tells the function it must not > + sleep. This only will only happen if a function is registered > + for all tasks and it gets called. > + > +data - the data pointer passed in at registration time > + > +If a data pointer is provided, one last invocation of the function will be made > +with no events. This indicates that the reference to the data is being dropped. > +As with other invocations, functions registered for all tasks must still honor > +the TASK_NOTIFY_DONT_SLEEP flag. > + > +When a caller is no longer interested in being called during a task event it > +must use a corresponding task_notify_ignore call by specifying the function > +which should ignore events. The first registration of the specified function > +will be removed. > > -- Thanks Jack Steiner (st...@sg...) 651-683-5302 Principal Engineer SGI - Silicon Graphics, Inc. |
From: Chandra S. <sek...@us...> - 2005-10-14 23:03:07
|
On Fri, 2005-10-14 at 16:29 -0500, Jack Steiner wrote: <snip> > > Jack, > > > > OK, how about this patch? It's essentially an amalgam of the features > > of task_notifiers and task_notify based on Chandra's task_notify patch. > > > > It's also completely untested. This is just to demonstrate what I'm > > proposing. > > > > Cheers, > > -Matt Helsley > > I like the way you have integrated the global & task-based notifier lists. That > seems like the right way to go. Though it satisfies both the needs it unnecessarily crowds the module. PAGG can achieve the additional functionality just by layering it on top of the simple module. > > > On the task-based notifiers, I noticed: > > - it looks like you support attaching/removing a notifier from an > arbitrary task (task_notify_watch_task(..., struct task_struct *)) > > - but there is no locking on the list management or walking routines > > Have we clearly established whether task_notify_watch_task() can target > other than the current task? I know that some of the "job" interfaces > use this capability of PAGG, but I don't know if it is essential. > > > BTW, I leave tonight for 2 weeks vacation & will be unable to respond to > future mail during that time. Don't take lack of response for lack of interest!! -- ---------------------------------------------------------------------- Chandra Seetharaman | Be careful what you choose.... - sek...@us... | .......you may get it. ---------------------------------------------------------------------- |
From: Matt H. <mat...@us...> - 2005-10-17 22:11:26
|
On Fri, 2005-10-14 at 16:29 -0500, Jack Steiner wrote: > On Wed, Oct 12, 2005 at 04:17:56PM -0700, Matt Helsley wrote: <snip> > > Jack, > > > > OK, how about this patch? It's essentially an amalgam of the features > > of task_notifiers and task_notify based on Chandra's task_notify patch. > > > > It's also completely untested. This is just to demonstrate what I'm > > proposing. > > > > Cheers, > > -Matt Helsley > > I like the way you have integrated the global & task-based notifier lists. That > seems like the right way to go. > > > On the task-based notifiers, I noticed: > > - it looks like you support attaching/removing a notifier from an > arbitrary task (task_notify_watch_task(..., struct task_struct *)) > > - but there is no locking on the list management or walking routines > > Have we clearly established whether task_notify_watch_task() can target > other than the current task? I know that some of the "job" interfaces > use this capability of PAGG, but I don't know if it is essential. No we haven't established that -- however I think the same problem exists with your original registration functions. We could rely on callers to follow these rules or we could attempt to check them at runtime (perhaps only when debugging is configured..). I'll add the following to the documentation: - task_notify_watch_task() may only be called from process context of: - the process being watched - the process that is forking and will be the parent of the watched process And - task_notify_ignore_task() may only be called from the process context of the task being ignored. I believe these are the same constraints your original patch was operating under. Correct? While - task_notify_watch_all() and task_notify_ignore_all() may be called from any context > BTW, I leave tonight for 2 weeks vacation & will be unable to respond to > future mail during that time. Don't take lack of response for lack of interest!! OK, thanks for reviewing my patch and the vacation notice. Cheers, -Matt Helsley <matthltc @ us.ibm.com> |