From: Matt H. <mat...@us...> - 2006-09-29 02:13:14
This is version 2 of my Task Watchers patches.

Task watchers call functions whenever a task forks, execs, changes its
[re][ug]id, or exits. Task watchers are primarily useful to existing
kernel code as a means of making the code in fork and exit more
readable. Kernel code uses these paths by marking a function as a task
watcher, much like modules mark their init functions with
module_init(). This reduces the code length and complexity of
copy_process().

The first patch adds the basic infrastructure of task watchers:
notification function calls in the various paths and a table of
function pointers to be called. It uses an ELF section because parts of
the table must be gathered from all over the kernel code, and using the
linker is easier than resolving and maintaining complex header
interdependencies. An ELF table is also ideal because its read-only
nature means that neither locking nor linked-list traversal is
required.

Subsequent patches adapt existing parts of the kernel to use a task
watcher -- typically in the fork, clone, and exit paths:

	audit
	semundo
	cpusets
	mempolicy
	trace irqflags
	lockdep
	keys (for processes -- not for thread groups)
	process events connector

I'm working on three more patches that add support for creating a task
watcher from within a module using an ELF section. I've not posted that
work because it hasn't successfully booted, much less completed, the
small selection of smoke tests I ran on these patches.

TODO: Mark the task watcher table ELF section read-only. I've googled,
read man pages, navigated the info pages, tried using PHDR, and,
according to the output of objdump, had no success. I'd really
appreciate a pointer to an example showing what makes ld mark a kernel
ELF section read-only.
Changes:

v2:
	Dropped use of notifier chains
	Dropped per-task watchers
		Can be implemented on top of this
		Still requires notifier chains
	Dropped taskstats conversion
		Parts of taskstats had to move away from the regions of
		copy_process() and do_exit() where task watchers are notified
	Used linker script mechanism suggested by Al Viro
	Created one "list" of watchers per event, as requested by Andrew Morton
		No need to multiplex a single function call
		Easier to statically register/unregister watchers: 1 line of code
	val param now used for:
		WATCH_TASK_INIT:  clone_flags
		WATCH_TASK_CLONE: clone_flags
		WATCH_TASK_EXIT:  exit code
		WATCH_TASK_*:     <unused>
	Renamed notify_watchers() to notify_task_watchers()
	Replaced: if (err != 0) --> if (err)
	Added patches converting more "features" to use task watchers
	Added return code handling to WATCH_TASK_INIT
		Return code handling elsewhere didn't seem appropriate since
		there was generally no response necessary
	Fixed process keys free to handle failure in fork as originally
		coded in copy_process()
	Added process keys code to watch for [er][ug]id changes

v1:
	Added ability to cause fork to fail with NOTIFY_STOP_MASK
	Added WARN_ON() when watchers cause WATCH_TASK_FREE to stop early
	Moved fork invocation
	Moved exec invocation
	Added current as argument to exec invocation
	Moved exit code assignment
	Added id change invocations

v0:
	Based on Jes Sorensen's Task Notifiers patches

Cheers,
	-Matt Helsley

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:10
Change audit to register a task watcher function rather than modify the
copy_process() and do_exit() paths directly.

This removes an unlikely() hint from kernel/exit.c:

	if (unlikely(tsk->audit_context))
		audit_free(tsk);

The unlikely() is an artifact of audit_free()'s former invocation from
__put_task_struct() (commit: fa84cb935d4ec601528f5e2f0d5d31e7876a5044).
In the __put_task_struct() path it would clearly be called much more
frequently than from do_exit(), so the hint was justified there. In the
new location, however, it most likely has no measurable performance
impact.

Signed-off-by: Matt Helsley <mat...@us...>
Cc: Al Viro <vi...@ze...>
Cc: Steve Grubb <sg...@re...>
Cc: lin...@re...

---
 include/linux/audit.h |    4 ----
 kernel/auditsc.c      |   10 +++++++---
 kernel/exit.c         |    3 ---
 kernel/fork.c         |    7 +------
 4 files changed, 8 insertions(+), 16 deletions(-)

Index: linux-2.6.18-mm1/kernel/auditsc.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/auditsc.c
+++ linux-2.6.18-mm1/kernel/auditsc.c
@@ -63,10 +63,11 @@
 #include <linux/list.h>
 #include <linux/tty.h>
 #include <linux/selinux.h>
 #include <linux/binfmts.h>
 #include <linux/syscalls.h>
+#include <linux/task_watchers.h>

 #include "audit.h"

 extern struct list_head audit_filter_list[];
@@ -674,11 +675,11 @@ static inline struct audit_context *audi
  * Filter on the task information and allocate a per-task audit context
  * if necessary.  Doing so turns on system call auditing for the
  * specified task.  This is called from copy_process, so no lock is
  * needed.
  */
-int audit_alloc(struct task_struct *tsk)
+static int audit_alloc(unsigned long val, struct task_struct *tsk)
 {
 	struct audit_context *context;
 	enum audit_state state;

 	if (likely(!audit_enabled))
@@ -700,10 +701,11 @@ int audit_alloc(struct task_struct *tsk)
 	tsk->audit_context = context;
 	set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
 	return 0;
 }
+task_watcher_func(init, audit_alloc);

 static inline void audit_free_context(struct audit_context *context)
 {
 	struct audit_context *previous;
 	int count = 0;
@@ -1029,28 +1031,30 @@ static void audit_log_exit(struct audit_
  * audit_free - free a per-task audit context
  * @tsk: task whose audit context block to free
  *
  * Called from copy_process and do_exit
  */
-void audit_free(struct task_struct *tsk)
+static int audit_free(unsigned long val, struct task_struct *tsk)
 {
 	struct audit_context *context;

 	context = audit_get_context(tsk, 0, 0);
 	if (likely(!context))
-		return;
+		return 0;

 	/* Check for system calls that do not go through the exit
	 * function (e.g., exit_group), then free context block.
	 * We use GFP_ATOMIC here because we might be doing this
	 * in the context of the idle thread */
 	/* that can happen only if we are called from do_exit() */
 	if (context->in_syscall && context->auditable)
 		audit_log_exit(context, tsk);

 	audit_free_context(context);
+	return 0;
 }
+task_watcher_func(free, audit_free);

 /**
  * audit_syscall_entry - fill in an audit record at syscall entry
  * @tsk: task being audited
  * @arch: architecture type
Index: linux-2.6.18-mm1/include/linux/audit.h
===================================================================
--- linux-2.6.18-mm1.orig/include/linux/audit.h
+++ linux-2.6.18-mm1/include/linux/audit.h
@@ -326,12 +326,10 @@ struct mqstat;
 extern int __init audit_register_class(int class, unsigned *list);
 extern int audit_classify_syscall(int abi, unsigned syscall);
 #ifdef CONFIG_AUDITSYSCALL
 /* These are defined in auditsc.c */
 				/* Public API */
-extern int  audit_alloc(struct task_struct *task);
-extern void audit_free(struct task_struct *task);
 extern void audit_syscall_entry(int arch,
				int major, unsigned long a0, unsigned long a1,
				unsigned long a2, unsigned long a3);
 extern void audit_syscall_exit(int failed, long return_code);
 extern void __audit_getname(const char *name);
@@ -426,12 +424,10 @@ static inline int audit_mq_getsetattr(mq
		return __audit_mq_getsetattr(mqdes, mqstat);
	return 0;
 }
 extern int audit_n_rules;
 #else
-#define audit_alloc(t) ({ 0; })
-#define audit_free(t) do { ; } while (0)
 #define audit_syscall_entry(ta,a,b,c,d,e) do { ; } while (0)
 #define audit_syscall_exit(f,r) do { ; } while (0)
 #define audit_dummy_context() 1
 #define audit_getname(n) do { ; } while (0)
 #define audit_putname(n) do { ; } while (0)
Index: linux-2.6.18-mm1/kernel/fork.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/fork.c
+++ linux-2.6.18-mm1/kernel/fork.c
@@ -37,11 +37,10 @@
 #include <linux/jiffies.h>
 #include <linux/futex.h>
 #include <linux/rcupdate.h>
 #include <linux/ptrace.h>
 #include <linux/mount.h>
-#include <linux/audit.h>
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
@@ -1103,15 +1102,13 @@ static struct task_struct *copy_process(
 	p->blocked_on = NULL; /* not blocked yet */
 #endif

 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
-	if ((retval = audit_alloc(p)))
-		goto bad_fork_cleanup_security;
 	/* copy all the process information */
 	if ((retval = copy_semundo(clone_flags, p)))
-		goto bad_fork_cleanup_audit;
+		goto bad_fork_cleanup_security;
 	if ((retval = copy_files(clone_flags, p)))
 		goto bad_fork_cleanup_semundo;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
 	if ((retval = copy_sighand(clone_flags, p)))
@@ -1282,12 +1279,10 @@ bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
bad_fork_cleanup_semundo:
 	exit_sem(p);
-bad_fork_cleanup_audit:
-	audit_free(p);
bad_fork_cleanup_security:
 	security_task_free(p);
bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
Index: linux-2.6.18-mm1/kernel/exit.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/exit.c
+++ linux-2.6.18-mm1/kernel/exit.c
@@ -36,11 +36,10 @@
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
-#include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
 #include <linux/task_watchers.h>

 #include <asm/uaccess.h>
@@ -908,12 +907,10 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_robust_list(tsk);
 #if defined(CONFIG_FUTEX) && defined(CONFIG_COMPAT)
 	if (unlikely(tsk->compat_robust_list))
 		compat_exit_robust_list(tsk);
 #endif
-	if (unlikely(tsk->audit_context))
-		audit_free(tsk);
 	taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
 	taskstats_exit_free(tidstats);

 	exit_mm(tsk);
 	notify_task_watchers(WATCH_TASK_FREE, code, tsk);

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:10
Register a task watcher for cpusets instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <mat...@us...>
Cc: Paul Jackson <pj...@sg...>

---
 include/linux/cpuset.h |    4 ----
 kernel/cpuset.c        |    7 +++++--
 kernel/exit.c          |    2 --
 kernel/fork.c          |    6 +-----
 4 files changed, 6 insertions(+), 13 deletions(-)

Index: linux-2.6.18-mm1/kernel/fork.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/fork.c
+++ linux-2.6.18-mm1/kernel/fork.c
@@ -28,11 +28,10 @@
 #include <linux/mman.h>
 #include <linux/fs.h>
 #include <linux/nsproxy.h>
 #include <linux/capability.h>
 #include <linux/cpu.h>
-#include <linux/cpuset.h>
 #include <linux/security.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
 #include <linux/futex.h>
@@ -1059,17 +1058,16 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;

 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-	cpuset_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_copy(p->mempolicy);
 	if (IS_ERR(p->mempolicy)) {
 		retval = PTR_ERR(p->mempolicy);
 		p->mempolicy = NULL;
-		goto bad_fork_cleanup_cpuset;
+		goto bad_fork_cleanup_delays_binfmt;
 	}
 	mpol_fix_fork_child_flag(p);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
@@ -1280,13 +1278,11 @@ bad_fork_cleanup_files:
bad_fork_cleanup_security:
 	security_task_free(p);
bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
-bad_fork_cleanup_cpuset:
 #endif
-	cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
 	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
Index: linux-2.6.18-mm1/kernel/cpuset.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/cpuset.c
+++ linux-2.6.18-mm1/kernel/cpuset.c
@@ -47,10 +47,11 @@
 #include <linux/stat.h>
 #include <linux/string.h>
 #include <linux/time.h>
 #include <linux/backing-dev.h>
 #include <linux/sort.h>
+#include <linux/task_watchers.h>

 #include <asm/uaccess.h>
 #include <asm/atomic.h>
 #include <linux/mutex.h>
@@ -2173,17 +2174,18 @@ void __init cpuset_init_smp(void)
  *
  * At the point that cpuset_fork() is called, 'current' is the parent
  * task, and the passed argument 'child' points to the child task.
  **/
-void cpuset_fork(struct task_struct *child)
+static void cpuset_fork(unsigned long clone_flags, struct task_struct *child)
 {
 	task_lock(current);
 	child->cpuset = current->cpuset;
 	atomic_inc(&child->cpuset->count);
 	task_unlock(current);
 }
+task_watcher_func(init, cpuset_fork);

 /**
  * cpuset_exit - detach cpuset from exiting task
  * @tsk: pointer to task_struct of exiting process
  *
@@ -2240,11 +2242,11 @@ void cpuset_fork(struct task_struct *chi
  * to NULL here, and check in cpuset_update_task_memory_state()
  * for a NULL pointer.  This hack avoids that NULL check, for no
  * cost (other than this way too long comment ;).
  **/
-void cpuset_exit(struct task_struct *tsk)
+static void cpuset_exit(unsigned long exit_code, struct task_struct *tsk)
 {
 	struct cpuset *cs;

 	cs = tsk->cpuset;
 	tsk->cpuset = &top_cpuset; /* the_top_cpuset_hack - see above */
@@ -2259,10 +2261,11 @@ void cpuset_exit(struct task_struct *tsk
 		cpuset_release_agent(pathbuf);
 	} else {
 		atomic_dec(&cs->count);
 	}
 }
+task_watcher_func(free, cpuset_exit);

 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
  *
Index: linux-2.6.18-mm1/kernel/exit.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/exit.c
+++ linux-2.6.18-mm1/kernel/exit.c
@@ -27,11 +27,10 @@
 #include <linux/mount.h>
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/taskstats_kern.h>
 #include <linux/delayacct.h>
-#include <linux/cpuset.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
 #include <linux/posix-timers.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
@@ -918,11 +917,10 @@ fastcall NORET_TYPE void do_exit(long co
 	if (group_dead)
 		acct_process();
 	__exit_files(tsk);
 	__exit_fs(tsk);
 	exit_thread();
-	cpuset_exit(tsk);
 	exit_keys(tsk);

 	if (group_dead && tsk->signal->leader)
 		disassociate_ctty(1);
Index: linux-2.6.18-mm1/include/linux/cpuset.h
===================================================================
--- linux-2.6.18-mm1.orig/include/linux/cpuset.h
+++ linux-2.6.18-mm1/include/linux/cpuset.h
@@ -17,12 +17,10 @@ extern int number_of_cpusets; /* How many cpusets are defined in system? */

 extern int cpuset_init_early(void);
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
-extern void cpuset_fork(struct task_struct *p);
-extern void cpuset_exit(struct task_struct *p);
 extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 void cpuset_init_current_mems_allowed(void);
 void cpuset_update_task_memory_state(void);
 #define cpuset_nodes_subset_current_mems_allowed(nodes) \
@@ -68,12 +66,10 @@ extern void cpuset_track_online_nodes(vo
 #else /* !CONFIG_CPUSETS */

 static inline int cpuset_init_early(void) { return 0; }
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
-static inline void cpuset_fork(struct task_struct *p) {}
-static inline void cpuset_exit(struct task_struct *p) {}
 static inline cpumask_t cpuset_cpus_allowed(struct task_struct *p)
 {
 	return cpu_possible_map;
 }

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:11
Register an irq-flag-tracing task watcher instead of hooking into
copy_process().

Signed-off-by: Matt Helsley <mat...@us...>

---
 kernel/fork.c       |   19 -------------------
 kernel/irq/handle.c |   24 ++++++++++++++++++++++++
 2 files changed, 24 insertions(+), 19 deletions(-)

Index: linux-2.6.18-mm1/kernel/fork.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/fork.c
+++ linux-2.6.18-mm1/kernel/fork.c
@@ -1058,29 +1058,10 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;

 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_TRACE_IRQFLAGS
-	p->irq_events = 0;
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	p->hardirqs_enabled = 1;
-#else
-	p->hardirqs_enabled = 0;
-#endif
-	p->hardirq_enable_ip = 0;
-	p->hardirq_enable_event = 0;
-	p->hardirq_disable_ip = _THIS_IP_;
-	p->hardirq_disable_event = 0;
-	p->softirqs_enabled = 1;
-	p->softirq_enable_ip = _THIS_IP_;
-	p->softirq_enable_event = 0;
-	p->softirq_disable_ip = 0;
-	p->softirq_disable_event = 0;
-	p->hardirq_context = 0;
-	p->softirq_context = 0;
-#endif
 #ifdef CONFIG_LOCKDEP
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
 #endif
Index: linux-2.6.18-mm1/kernel/irq/handle.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/irq/handle.c
+++ linux-2.6.18-mm1/kernel/irq/handle.c
@@ -13,10 +13,11 @@
 #include <linux/irq.h>
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <linux/task_watchers.h>

 #include "internals.h"

 /**
  * handle_bad_irq - handle spurious and unhandled irqs
@@ -269,6 +270,29 @@ void early_init_irq_lock_class(void)
 	for (i = 0; i < NR_IRQS; i++)
 		lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
 }

+static int init_task_trace_irqflags(unsigned long clone_flags,
+				    struct task_struct *p)
+{
+	p->irq_events = 0;
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	p->hardirqs_enabled = 1;
+#else
+	p->hardirqs_enabled = 0;
+#endif
+	p->hardirq_enable_ip = 0;
+	p->hardirq_enable_event = 0;
+	p->hardirq_disable_ip = _THIS_IP_;
+	p->hardirq_disable_event = 0;
+	p->softirqs_enabled = 1;
+	p->softirq_enable_ip = _THIS_IP_;
+	p->softirq_enable_event = 0;
+	p->softirq_disable_ip = 0;
+	p->softirq_disable_event = 0;
+	p->hardirq_context = 0;
+	p->softirq_context = 0;
+	return 0;
+}
+task_watcher_func(init, init_task_trace_irqflags);
 #endif

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:12
Register a task watcher for lockdep instead of hooking into
copy_process().

Signed-off-by: Matt Helsley <mat...@us...>

---
 kernel/fork.c    |    5 -----
 kernel/lockdep.c |    9 +++++++++
 2 files changed, 9 insertions(+), 5 deletions(-)

Index: linux-2.6.18-mm1/kernel/fork.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/fork.c
+++ linux-2.6.18-mm1/kernel/fork.c
@@ -1058,15 +1058,10 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;

 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_LOCKDEP
-	p->lockdep_depth = 0; /* no locks held yet */
-	p->curr_chain_key = 0;
-	p->lockdep_recursion = 0;
-#endif

 	rt_mutex_init_task(p);

 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
Index: linux-2.6.18-mm1/kernel/lockdep.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/lockdep.c
+++ linux-2.6.18-mm1/kernel/lockdep.c
@@ -2555,10 +2555,19 @@ void __init lockdep_init(void)
 		INIT_LIST_HEAD(chainhash_table + i);

 	lockdep_initialized = 1;
 }

+static int init_task_lockdep(unsigned long clone_flags, struct task_struct *p)
+{
+	p->lockdep_depth = 0; /* no locks held yet */
+	p->curr_chain_key = 0;
+	p->lockdep_recursion = 0;
+	return 0;
+}
+task_watcher_func(init, init_task_lockdep);
+
 void __init lockdep_info(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");

 	printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:12
This optional patch adds a fork/exit rate measurement facility using
task watchers. It is intended as a tool for measuring the impact of
task watchers on fork- and exit-heavy workloads.

Signed-off-by: Matt Helsley <mat...@us...>

---
 kernel/Makefile   |    1
 kernel/twbench.c  |  103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug |   12 ++++++
 3 files changed, 116 insertions(+)

Index: linux-2.6.18-mm1/kernel/twbench.c
===================================================================
--- /dev/null
+++ linux-2.6.18-mm1/kernel/twbench.c
@@ -0,0 +1,103 @@
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/task_watchers.h>
+
+#include <linux/debugfs.h>
+#include <linux/time.h>
+#include <linux/preempt.h>
+
+#include <asm/atomic.h>
+#include <asm/div64.h>
+
+struct twb_counter {
+	atomic_t count;
+	u64 reset_time_ns;
+};
+
+static u64 read_reset_rate(void *p)
+{
+	u64 rate, ts_ns;
+	struct timespec ts;
+	struct twb_counter *ctr = p;
+
+	/* begin pseudo-atomic region */
+	preempt_disable();
+	rate = atomic_xchg(&ctr->count, 0);
+	ktime_get_ts(&ts);
+	preempt_enable();
+	/* end pseudo-atomic region */
+
+	ts_ns = timespec_to_ns(&ts);
+	rate *= NSEC_PER_SEC;
+	do_div(rate, (u32)(ts_ns - ctr->reset_time_ns));
+	ctr->reset_time_ns = ts_ns;
+	return rate;
+}
+
+/* Counter bits */
+static struct twb_counter num_clones;
+static struct twb_counter num_exits;
+
+#ifndef MODULE
+static int inc_clone(unsigned long val, struct task_struct *task)
+{
+	atomic_inc(&num_clones.count);
+	return 0;
+}
+task_watcher_func(clone, inc_clone);
+
+static int inc_exit(unsigned long val, struct task_struct *task)
+{
+	atomic_inc(&num_exits.count);
+	return 0;
+}
+task_watcher_func(exit, inc_exit);
+#endif
+
+/* Debugfs bits */
+static struct dentry *twb_root, *twb_clone_file, *twb_exit_file;
+
+/*
+ * NOTE: Because open doesn't reset the count nor the reset time,
+ * userspace must discard the first values read before starting
+ * to monitor the rates.
+ */
+DEFINE_SIMPLE_ATTRIBUTE(rrr, read_reset_rate, NULL, "%llu\n");
+
+static int __init twb_debugfs_init(void)
+{
+	twb_root = debugfs_create_dir("twbench", NULL);
+	if (!twb_root)
+		return -ENOMEM;
+	twb_clone_file = debugfs_create_file("clones", 0744, twb_root,
+					     &num_clones, &rrr);
+	twb_exit_file = debugfs_create_file("exits", 0744, twb_root,
+					    &num_exits, &rrr);
+	return 0;
+}
+#ifndef MODULE
+__initcall(twb_debugfs_init);
+#endif
+
+static void __exit twb_debugfs_exit(void)
+{
+	debugfs_remove(twb_clone_file);
+	debugfs_remove(twb_exit_file);
+	debugfs_remove(twb_root);
+}
+
+static int __init twb_mod_init(void)
+{
+	return twb_debugfs_init();
+}
+
+static void __exit twb_mod_exit(void)
+{
+	twb_debugfs_exit();
+}
+
+module_init(twb_mod_init);
+module_exit(twb_mod_exit);
Index: linux-2.6.18-mm1/lib/Kconfig.debug
===================================================================
--- linux-2.6.18-mm1.orig/lib/Kconfig.debug
+++ linux-2.6.18-mm1/lib/Kconfig.debug
@@ -443,10 +443,22 @@ config RCU_TORTURE_TEST
 	  Say Y here if you want RCU torture tests to start automatically
 	  at boot time (you probably don't).
 	  Say M if you want the RCU torture tests to build as a module.
 	  Say N if you are unsure.

+config TWBENCH
+	bool "Output fork/clone and exit rates"
+	depends on DEBUG_KERNEL && DEBUG_FS
+	default n
+	help
+	  Print out the rate at which the system is fork/cloning new
+	  processes to <debugfs>/twbench/clone
+	  Print out the rate at which the system is exiting existing
+	  processes to <debugfs>/twbench/exit
+
+	  If unsure, say N.
+
 config LKDTM
 	tristate "Linux Kernel Dump Test Tool Module"
 	depends on KPROBES
 	default n
 	help
Index: linux-2.6.18-mm1/kernel/Makefile
===================================================================
--- linux-2.6.18-mm1.orig/kernel/Makefile
+++ linux-2.6.18-mm1/kernel/Makefile
@@ -45,10 +45,11 @@ obj-$(CONFIG_KPROBES) += kprobes.o
 obj-$(CONFIG_SYSFS) += ksysfs.o
 obj-$(CONFIG_DETECT_SOFTLOCKUP) += softlockup.o
 obj-$(CONFIG_GENERIC_HARDIRQS) += irq/
 obj-$(CONFIG_SECCOMP) += seccomp.o
 obj-$(CONFIG_RCU_TORTURE_TEST) += rcutorture.o
+obj-$(CONFIG_TWBENCH) += twbench.o
 obj-$(CONFIG_DEBUG_SYNCHRO_TEST) += synchro-test.o
 obj-$(CONFIG_RELAY) += relay.o
 obj-$(CONFIG_UTS_NS) += utsname.o
 obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:14
Associate function calls with significant events in a task's lifetime,
much like we handle init/exit functions. This creates a table for each
of the following events in the task_watchers_table ELF section --
notify_task_watchers() is called with:

WATCH_TASK_INIT		at the beginning of a fork/clone system call, when
			the new task struct first becomes available.

WATCH_TASK_CLONE	just before returning successfully from a
			fork/clone.

WATCH_TASK_EXEC		just before successfully returning from the exec
			system call.

WATCH_TASK_UID		every time a task's real or effective user id
			changes.

WATCH_TASK_GID		every time a task's real or effective group id
			changes.

WATCH_TASK_EXIT		at the beginning of do_exit() when a task is
			exiting for any reason.

WATCH_TASK_FREE		before critical task structures like the mm_struct
			become inaccessible and the task is subsequently
			freed.

The next patch will add a debugfs interface for measuring fork and exit
rates, which can be used to calculate the overhead of the task watcher
infrastructure. Subsequent patches will make use of task watchers to
simplify fork, exit, and many of the system calls that set [er][ug]ids.

Signed-off-by: Matt Helsley <mat...@us...>
Cc: Andrew Morton <ak...@os...>
Cc: Jes Sorensen <je...@sg...>
Cc: Chandra S. Seetharaman <sek...@us...>
Cc: Christoph Hellwig <hc...@ls...>
Cc: Al Viro <vi...@ze...>
Cc: Steve Grubb <sg...@re...>
Cc: lin...@re...
Cc: Paul Jackson <pj...@sg...>

---
 fs/exec.c                         |    3 +++
 include/asm-generic/vmlinux.lds.h |   19 +++++++++++++++++++
 include/linux/task_watchers.h     |   31 +++++++++++++++++++++++++++++++
 kernel/Makefile                   |    2 +-
 kernel/exit.c                     |    3 +++
 kernel/fork.c                     |   15 +++++++++++----
 kernel/sys.c                      |    9 +++++++++
 kernel/task_watchers.c            |   37 +++++++++++++++++++++++++++++++++++++
 8 files changed, 114 insertions(+), 5 deletions(-)

Index: linux-2.6.18-mm1/kernel/sys.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/sys.c
+++ linux-2.6.18-mm1/kernel/sys.c
@@ -26,10 +26,11 @@
 #include <linux/dcookies.h>
 #include <linux/suspend.h>
 #include <linux/tty.h>
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
+#include <linux/task_watchers.h>
 #include <linux/getcpu.h>

 #include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
@@ -957,10 +958,11 @@ asmlinkage long sys_setregid(gid_t rgid,
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }

 /*
  * setgid() is implemented like SysV w/ SAVED_IDS
@@ -992,10 +994,11 @@ asmlinkage long sys_setgid(gid_t gid)
 	else
 		return -EPERM;

 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }

 static int set_user(uid_t new_ruid, int dumpclear)
 {
@@ -1080,10 +1083,11 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	current->suid = current->euid;
 	current->fsuid = current->euid;

 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);

 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
					 LSM_SETID_RE);
 }

@@ -1127,10 +1131,11 @@ asmlinkage long sys_setuid(uid_t uid)
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;

 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);

 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
					 LSM_SETID_ID);
 }

@@ -1175,10 +1180,11 @@ asmlinkage long sys_setresuid(uid_t ruid
 	if (suid != (uid_t) -1)
 		current->suid = suid;

 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);

 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
					 LSM_SETID_RES);
 }

 asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid)
@@ -1227,10 +1233,11 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }

 asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
 {
@@ -1268,10 +1275,11 @@ asmlinkage long sys_setfsuid(uid_t uid)
 		current->fsuid = uid;
 	}

 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);

 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1,
				  LSM_SETID_FS);
 	return old_fsuid;
 }
@@ -1295,10 +1303,11 @@ asmlinkage long sys_setfsgid(gid_t gid)
 			smp_wmb();
 		}
 		current->fsgid = gid;
 		key_fsgid_changed(current);
 		proc_id_connector(current, PROC_EVENT_GID);
+		notify_task_watchers(WATCH_TASK_GID, 0, current);
 	}
 	return old_fsgid;
 }

 asmlinkage long sys_times(struct tms __user * tbuf)
Index: linux-2.6.18-mm1/kernel/exit.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/exit.c
+++ linux-2.6.18-mm1/kernel/exit.c
@@ -39,10 +39,11 @@
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
+#include <linux/task_watchers.h>

 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
@@ -881,10 +882,11 @@ fastcall NORET_TYPE void do_exit(long co
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
 	}

 	tsk->flags |= PF_EXITING;
+	notify_task_watchers(WATCH_TASK_EXIT, code, tsk);

 	if (unlikely(in_atomic()))
 		printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n",
				current->comm, current->pid,
				preempt_count());
@@ -912,10 +914,11 @@ fastcall NORET_TYPE void do_exit(long co
 		audit_free(tsk);
 	taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
 	taskstats_exit_free(tidstats);

 	exit_mm(tsk);
+	notify_task_watchers(WATCH_TASK_FREE, code, tsk);

 	if (group_dead)
 		acct_process();
 	exit_sem(tsk);
 	__exit_files(tsk);
Index: linux-2.6.18-mm1/fs/exec.c
===================================================================
--- linux-2.6.18-mm1.orig/fs/exec.c
+++ linux-2.6.18-mm1/fs/exec.c
@@ -47,10 +47,11 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/audit.h>
+#include <linux/task_watchers.h>

 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>

 #ifdef CONFIG_KMOD
@@ -1082,10 +1083,12 @@ int search_binary_handler(struct linux_b
 			allow_write_access(bprm->file);
 			if (bprm->file)
 				fput(bprm->file);
 			bprm->file = NULL;
 			current->did_exec = 1;
+			notify_task_watchers(WATCH_TASK_EXEC, 0,
					     current);
 			proc_exec_connector(current);
 			return retval;
 		}
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);
Index: linux-2.6.18-mm1/include/linux/task_watchers.h
===================================================================
--- /dev/null
+++ linux-2.6.18-mm1/include/linux/task_watchers.h
@@ -0,0 +1,31 @@
+#ifndef _TASK_WATCHERS_H
+#define _TASK_WATCHERS_H
+#include <linux/sched.h>
+
+#define WATCH_TASK_INIT		0
+#define WATCH_TASK_CLONE	1
+#define WATCH_TASK_EXEC		2
+#define WATCH_TASK_UID		3
+#define WATCH_TASK_GID		4
+#define WATCH_TASK_EXIT		5
+#define WATCH_TASK_FREE		6
+#define NUM_WATCH_TASK_EVENTS	7
+
+#ifndef MODULE
+typedef int (*task_watcher_fn)(unsigned long, struct task_struct*);
+
+/*
+ * Watch for events occurring within a task and call the supplied function
+ * when (and only when) the given event happens.
+ * Only non-modular kernel code may register functions as task watchers.
+ */
+#define task_watcher_func(ev, fn) \
+static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
+	__attribute__ ((__section__ (".task_watchers." #ev))) = fn
+#else
+#error "task_watcher_func() macro may not be used in modules."
+#endif
+
+extern int notify_task_watchers(unsigned int ev_idx, unsigned long val,
+				struct task_struct *tsk);
+#endif /* _TASK_WATCHERS_H */
Index: linux-2.6.18-mm1/kernel/task_watchers.c
===================================================================
--- /dev/null
+++ linux-2.6.18-mm1/kernel/task_watchers.c
@@ -0,0 +1,37 @@
+#include <linux/task_watchers.h>
+
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern const task_watcher_fn __start_task_watchers_init[],
+	__start_task_watchers_clone[], __start_task_watchers_exec[],
+	__start_task_watchers_uid[], __start_task_watchers_gid[],
+	__start_task_watchers_exit[], __start_task_watchers_free[],
+	__stop_task_watchers_free[];
+
+/*
+ * Table of ptrs to the first watcher func for each WATCH_TASK_* event
+ */
+static const task_watcher_fn *twtable[] = {
+	__start_task_watchers_init,
+	__start_task_watchers_clone,
+	__start_task_watchers_exec,
+	__start_task_watchers_uid,
+	__start_task_watchers_gid,
+	__start_task_watchers_exit,
+	__start_task_watchers_free,
+	__stop_task_watchers_free,
+};
+
+int notify_task_watchers(unsigned int ev, unsigned long val,
+			 struct task_struct *tsk)
+{
+	const task_watcher_fn *tw_call;
+	int ret_err = 0, err;
+
+	/* Call all of the watchers, report the first error */
+	for (tw_call = twtable[ev]; tw_call < twtable[ev + 1]; tw_call++) {
+		err = (*tw_call)(val, tsk);
+		if (unlikely((err < 0) && (ret_err == 0)))
+			ret_err = err;
+	}
+	return ret_err;
+}
Index: linux-2.6.18-mm1/kernel/Makefile
===================================================================
--- linux-2.6.18-mm1.orig/kernel/Makefile
+++ linux-2.6.18-mm1/kernel/Makefile
@@ -6,11 +6,11 @@ obj-y     = sched.o fork.o exec_domain.o exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
+	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o task_watchers.o

 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_LOCKDEP) += lockdep.o
Index: linux-2.6.18-mm1/kernel/fork.c
===================================================================
--- linux-2.6.18-mm1.orig/kernel/fork.c
+++ linux-2.6.18-mm1/kernel/fork.c
@@ -46,10 +46,11 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
+#include <linux/task_watchers.h>

 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1051,10 +1052,18 @@ static struct task_struct *copy_process(
 	do_posix_clock_monotonic_gettime(&p->start_time);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
 	p->audit_context = NULL;
+
+	p->tgid = p->pid;
+	if (clone_flags & CLONE_THREAD)
+		p->tgid = current->tgid;
+
+	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
+	if (retval < 0)
+		goto bad_fork_cleanup_delays_binfmt;
 	cpuset_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_copy(p->mempolicy);
 	if (IS_ERR(p->mempolicy)) {
 		retval = PTR_ERR(p->mempolicy);
@@ -1092,14 +1101,10 @@ static struct task_struct *copy_process(
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif

-	p->tgid = p->pid;
-	if (clone_flags & CLONE_THREAD)
-		p->tgid = current->tgid;
-
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	if ((retval = audit_alloc(p)))
 		goto bad_fork_cleanup_security;
 	/* copy all the process information */
@@ -1256,10 +1261,11 @@ static struct task_struct *copy_process(
 	}

 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+	notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
 	proc_fork_connector(p);
 	return p;

bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
@@ -1288,10 +1294,11 @@ bad_fork_cleanup_policy:
bad_fork_cleanup_cpuset:
 #endif
 	cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
+	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
bad_fork_cleanup_put_domain:
 	module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:
Index: linux-2.6.18-mm1/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.18-mm1.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6.18-mm1/include/asm-generic/vmlinux.lds.h
@@ -42,10 +42,29 @@
 		VMLINUX_SYMBOL(__start_rio_route_ops) = .;		\
 		*(.rio_route_ops)					\
 		VMLINUX_SYMBOL(__end_rio_route_ops) = .;		\
 	}								\
									\
+	.task_watchers_table : AT(ADDR(.task_watchers_table) - LOAD_OFFSET) { \
+		*(.task_watchers_table)					\
+		VMLINUX_SYMBOL(__start_task_watchers_init) = .;		\
+		*(.task_watchers.init)					\
+		VMLINUX_SYMBOL(__start_task_watchers_clone) = .;	\
+		*(.task_watchers.clone)					\
+		VMLINUX_SYMBOL(__start_task_watchers_exec) = .;		\
+		*(.task_watchers.exec)					\
+		VMLINUX_SYMBOL(__start_task_watchers_uid) = .;		\
+		*(.task_watchers.uid)					\
+		VMLINUX_SYMBOL(__start_task_watchers_gid) = .;		\
+		*(.task_watchers.gid)					\
+		VMLINUX_SYMBOL(__start_task_watchers_exit) = .;		\
+		*(.task_watchers.exit)					\
+		VMLINUX_SYMBOL(__start_task_watchers_free) = .;		\
+		*(.task_watchers.free)					\
+		VMLINUX_SYMBOL(__stop_task_watchers_free) = .;		\
+	}								\
									\
 	/* Kernel symbol table: Normal symbols */			\
 	__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) {			\
 		VMLINUX_SYMBOL(__start___ksymtab) = .;			\
 		*(__ksymtab)						\
 		VMLINUX_SYMBOL(__stop___ksymtab) = .;			\

--
From: Matt H. <mat...@us...> - 2006-09-29 02:13:16
|
Make the semaphore undo code use a task watcher instead of hooking into copy_process() and do_exit() directly. Signed-off-by: Matt Helsley <mat...@us...> --- include/linux/sem.h | 17 ----------------- ipc/sem.c | 12 ++++++++---- kernel/exit.c | 1 - kernel/fork.c | 6 +----- 4 files changed, 9 insertions(+), 27 deletions(-) Index: linux-2.6.18-mm1/ipc/sem.c =================================================================== --- linux-2.6.18-mm1.orig/ipc/sem.c +++ linux-2.6.18-mm1/ipc/sem.c @@ -81,10 +81,11 @@ #include <linux/audit.h> #include <linux/capability.h> #include <linux/seq_file.h> #include <linux/mutex.h> #include <linux/nsproxy.h> +#include <linux/task_watchers.h> #include <asm/uaccess.h> #include "util.h" #define sem_ids(ns) (*((ns)->ids[IPC_SEM_IDS])) @@ -1286,11 +1287,11 @@ asmlinkage long sys_semop (int semid, st * See the notes above unlock_semundo() regarding the spin_lock_init() * in this code. Initialize the undo_list->lock here instead of get_undo_list() * because of the reasoning in the comment above unlock_semundo. */ -int copy_semundo(unsigned long clone_flags, struct task_struct *tsk) +static int copy_semundo(unsigned long clone_flags, struct task_struct *tsk) { struct sem_undo_list *undo_list; int error; if (clone_flags & CLONE_SYSVSEM) { @@ -1302,10 +1303,11 @@ int copy_semundo(unsigned long clone_fla } else tsk->sysvsem.undo_list = NULL; return 0; } +task_watcher_func(init, copy_semundo); /* * add semadj values to semaphores, free undo structures. * undo structures are not freed when semaphore arrays are destroyed * so some of them may be out of date. @@ -1315,22 +1317,22 @@ int copy_semundo(unsigned long clone_fla * should we queue up and wait until we can do so legally? * The original implementation attempted to do this (queue and wait). * The current implementation does not do so. The POSIX standard * and SVID should be consulted to determine what behavior is mandated. 
*/ -void exit_sem(struct task_struct *tsk) +static int exit_sem(unsigned long ignored, struct task_struct *tsk) { struct sem_undo_list *undo_list; struct sem_undo *u, **up; struct ipc_namespace *ns; undo_list = tsk->sysvsem.undo_list; if (!undo_list) - return; + return 0; if (!atomic_dec_and_test(&undo_list->refcnt)) - return; + return 0; ns = tsk->nsproxy->ipc_ns; /* There's no need to hold the semundo list lock, as current * is the last task exiting for this undo list. */ @@ -1393,11 +1395,13 @@ found: update_queue(sma); next_entry: sem_unlock(sma); } kfree(undo_list); + return 0; } +task_watcher_func(free, exit_sem); #ifdef CONFIG_PROC_FS static int sysvipc_sem_proc_show(struct seq_file *s, void *it) { struct sem_array *sma = it; Index: linux-2.6.18-mm1/kernel/exit.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/exit.c +++ linux-2.6.18-mm1/kernel/exit.c @@ -915,11 +915,10 @@ fastcall NORET_TYPE void do_exit(long co exit_mm(tsk); notify_task_watchers(WATCH_TASK_FREE, code, tsk); if (group_dead) acct_process(); - exit_sem(tsk); __exit_files(tsk); __exit_fs(tsk); exit_thread(); cpuset_exit(tsk); exit_keys(tsk); Index: linux-2.6.18-mm1/kernel/fork.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/fork.c +++ linux-2.6.18-mm1/kernel/fork.c @@ -1103,14 +1103,12 @@ static struct task_struct *copy_process( #endif if ((retval = security_task_alloc(p))) goto bad_fork_cleanup_policy; /* copy all the process information */ - if ((retval = copy_semundo(clone_flags, p))) - goto bad_fork_cleanup_security; if ((retval = copy_files(clone_flags, p))) - goto bad_fork_cleanup_semundo; + goto bad_fork_cleanup_security; if ((retval = copy_fs(clone_flags, p))) goto bad_fork_cleanup_files; if ((retval = copy_sighand(clone_flags, p))) goto bad_fork_cleanup_fs; if ((retval = copy_signal(clone_flags, p))) @@ -1277,12 +1275,10 @@ bad_fork_cleanup_sighand: 
__cleanup_sighand(p->sighand); bad_fork_cleanup_fs: exit_fs(p); /* blocking */ bad_fork_cleanup_files: exit_files(p); /* blocking */ -bad_fork_cleanup_semundo: - exit_sem(p); bad_fork_cleanup_security: security_task_free(p); bad_fork_cleanup_policy: #ifdef CONFIG_NUMA mpol_free(p->mempolicy); Index: linux-2.6.18-mm1/include/linux/sem.h =================================================================== --- linux-2.6.18-mm1.orig/include/linux/sem.h +++ linux-2.6.18-mm1/include/linux/sem.h @@ -136,25 +136,8 @@ struct sem_undo_list { struct sysv_sem { struct sem_undo_list *undo_list; }; -#ifdef CONFIG_SYSVIPC - -extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk); -extern void exit_sem(struct task_struct *tsk); - -#else -static inline int copy_semundo(unsigned long clone_flags, struct task_struct *tsk) -{ - return 0; -} - -static inline void exit_sem(struct task_struct *tsk) -{ - return; -} -#endif - #endif /* __KERNEL__ */ #endif /* _LINUX_SEM_H */ -- |
From: Matt H. <mat...@us...> - 2006-09-29 02:13:17
|
Make the Process events connector use task watchers instead of hooking the paths it's interested in. Signed-off-by: Matt Helsley <mat...@us...> --- drivers/connector/cn_proc.c | 52 +++++++++++++++++++++++++++++++------------- fs/exec.c | 1 include/linux/cn_proc.h | 21 ----------------- kernel/exit.c | 2 - kernel/fork.c | 2 - kernel/sys.c | 9 ------- 6 files changed, 37 insertions(+), 50 deletions(-) Index: linux-2.6.18-mm1/drivers/connector/cn_proc.c =================================================================== --- linux-2.6.18-mm1.orig/drivers/connector/cn_proc.c +++ linux-2.6.18-mm1/drivers/connector/cn_proc.c @@ -25,10 +25,11 @@ #include <linux/module.h> #include <linux/kernel.h> #include <linux/ktime.h> #include <linux/init.h> #include <linux/connector.h> +#include <linux/task_watchers.h> #include <asm/atomic.h> #include <linux/cn_proc.h> #define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event)) @@ -44,19 +45,20 @@ static inline void get_seq(__u32 *ts, in *ts = get_cpu_var(proc_event_counts)++; *cpu = smp_processor_id(); put_cpu_var(proc_event_counts); } -void proc_fork_connector(struct task_struct *task) +static int proc_fork_connector(unsigned long clone_flags, + struct task_struct *task) { struct cn_msg *msg; struct proc_event *ev; __u8 buffer[CN_PROC_MSG_SIZE]; struct timespec ts; if (atomic_read(&proc_event_num_listeners) < 1) - return; + return 0; msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ @@ -70,21 +72,24 @@ void proc_fork_connector(struct task_str memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); /* If cn_netlink_send() failed, the data is not sent */ cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); + return 0; } +task_watcher_func(clone, proc_fork_connector); -void proc_exec_connector(struct task_struct *task) +static int proc_exec_connector(unsigned long ignore, + 
struct task_struct *task) { struct cn_msg *msg; struct proc_event *ev; struct timespec ts; __u8 buffer[CN_PROC_MSG_SIZE]; if (atomic_read(&proc_event_num_listeners) < 1) - return; + return 0; msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ @@ -95,21 +100,23 @@ void proc_exec_connector(struct task_str memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); + return 0; } +task_watcher_func(exec, proc_exec_connector); -void proc_id_connector(struct task_struct *task, int which_id) +static int process_change_id(unsigned long which_id, struct task_struct *task) { struct cn_msg *msg; struct proc_event *ev; __u8 buffer[CN_PROC_MSG_SIZE]; struct timespec ts; if (atomic_read(&proc_event_num_listeners) < 1) - return; + return 0; msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; ev->what = which_id; ev->event_data.id.process_pid = task->pid; @@ -119,47 +126,64 @@ void proc_id_connector(struct task_struc ev->event_data.id.e.euid = task->euid; } else if (which_id == PROC_EVENT_GID) { ev->event_data.id.r.rgid = task->gid; ev->event_data.id.e.egid = task->egid; } else - return; + return 0; get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ ev->timestamp_ns = timespec_to_ns(&ts); memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); + return 0; +} + +static int proc_change_uid_connector(unsigned long ignore, + struct task_struct *task) +{ + return process_change_id(PROC_EVENT_UID, task); +} +task_watcher_func(uid, proc_change_uid_connector); + +static int proc_change_gid_connector(unsigned long ignore, + struct task_struct *task) +{ + return process_change_id(PROC_EVENT_GID, task); } +task_watcher_func(gid, 
proc_change_gid_connector); -void proc_exit_connector(struct task_struct *task) +static int proc_exit_connector(unsigned long code, struct task_struct *task) { struct cn_msg *msg; struct proc_event *ev; __u8 buffer[CN_PROC_MSG_SIZE]; struct timespec ts; if (atomic_read(&proc_event_num_listeners) < 1) - return; + return 0; msg = (struct cn_msg*)buffer; ev = (struct proc_event*)msg->data; get_seq(&msg->seq, &ev->cpu); ktime_get_ts(&ts); /* get high res monotonic timestamp */ ev->timestamp_ns = timespec_to_ns(&ts); ev->what = PROC_EVENT_EXIT; ev->event_data.exit.process_pid = task->pid; ev->event_data.exit.process_tgid = task->tgid; - ev->event_data.exit.exit_code = task->exit_code; + ev->event_data.exit.exit_code = code; ev->event_data.exit.exit_signal = task->exit_signal; memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id)); msg->ack = 0; /* not used */ msg->len = sizeof(*ev); cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL); + return 0; } +task_watcher_func(exit, proc_exit_connector); /* * Send an acknowledgement message to userspace * * Use 0 for success, EFOO otherwise. 
@@ -226,14 +250,12 @@ static void cn_proc_mcast_ctl(void *data */ static int __init cn_proc_init(void) { int err; - if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc", - &cn_proc_mcast_ctl))) { + err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl); + if (err) printk(KERN_WARNING "cn_proc failed to register\n"); - return err; - } - return 0; + return err; } module_init(cn_proc_init); Index: linux-2.6.18-mm1/kernel/fork.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/fork.c +++ linux-2.6.18-mm1/kernel/fork.c @@ -40,11 +40,10 @@ #include <linux/mount.h> #include <linux/profile.h> #include <linux/rmap.h> #include <linux/acct.h> #include <linux/tsacct_kern.h> -#include <linux/cn_proc.h> #include <linux/delayacct.h> #include <linux/taskstats_kern.h> #include <linux/random.h> #include <linux/task_watchers.h> @@ -1220,11 +1219,10 @@ static struct task_struct *copy_process( total_forks++; spin_unlock(¤t->sighand->siglock); write_unlock_irq(&tasklist_lock); notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p); - proc_fork_connector(p); return p; bad_fork_cleanup_namespaces: exit_task_namespaces(p); bad_fork_cleanup_mm: Index: linux-2.6.18-mm1/kernel/exit.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/exit.c +++ linux-2.6.18-mm1/kernel/exit.c @@ -29,11 +29,10 @@ #include <linux/taskstats_kern.h> #include <linux/delayacct.h> #include <linux/syscalls.h> #include <linux/signal.h> #include <linux/posix-timers.h> -#include <linux/cn_proc.h> #include <linux/mutex.h> #include <linux/futex.h> #include <linux/compat.h> #include <linux/pipe_fs_i.h> #include <linux/resource.h> @@ -925,11 +924,10 @@ fastcall NORET_TYPE void do_exit(long co module_put(task_thread_info(tsk)->exec_domain->module); if (tsk->binfmt) module_put(tsk->binfmt->module); tsk->exit_code = code; - proc_exit_connector(tsk); exit_notify(tsk); exit_task_namespaces(tsk); /* * This must 
happen late, after the PID is not * hashed anymore: Index: linux-2.6.18-mm1/kernel/sys.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/sys.c +++ linux-2.6.18-mm1/kernel/sys.c @@ -25,11 +25,10 @@ #include <linux/security.h> #include <linux/dcookies.h> #include <linux/suspend.h> #include <linux/tty.h> #include <linux/signal.h> -#include <linux/cn_proc.h> #include <linux/task_watchers.h> #include <linux/getcpu.h> #include <linux/compat.h> #include <linux/syscalls.h> @@ -956,11 +955,10 @@ asmlinkage long sys_setregid(gid_t rgid, (egid != (gid_t) -1 && egid != old_rgid)) current->sgid = new_egid; current->fsgid = new_egid; current->egid = new_egid; current->gid = new_rgid; - proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } /* @@ -991,11 +989,10 @@ asmlinkage long sys_setgid(gid_t gid) current->egid = current->fsgid = gid; } else return -EPERM; - proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } static int set_user(uid_t new_ruid, int dumpclear) @@ -1079,11 +1076,10 @@ asmlinkage long sys_setreuid(uid_t ruid, if (ruid != (uid_t) -1 || (euid != (uid_t) -1 && euid != old_ruid)) current->suid = current->euid; current->fsuid = current->euid; - proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); } @@ -1126,11 +1122,10 @@ asmlinkage long sys_setuid(uid_t uid) smp_wmb(); } current->fsuid = current->euid = uid; current->suid = new_suid; - proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); } @@ -1174,11 +1169,10 @@ asmlinkage long sys_setresuid(uid_t ruid } current->fsuid = current->euid; if (suid != (uid_t) -1) current->suid = suid; - proc_id_connector(current, PROC_EVENT_UID); 
notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); } @@ -1226,11 +1220,10 @@ asmlinkage long sys_setresgid(gid_t rgid if (rgid != (gid_t) -1) current->gid = rgid; if (sgid != (gid_t) -1) current->sgid = sgid; - proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid) @@ -1267,11 +1260,10 @@ asmlinkage long sys_setfsuid(uid_t uid) smp_wmb(); } current->fsuid = uid; } - proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); return old_fsuid; @@ -1294,11 +1286,10 @@ asmlinkage long sys_setfsgid(gid_t gid) if (gid != old_fsgid) { current->mm->dumpable = suid_dumpable; smp_wmb(); } current->fsgid = gid; - proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); } return old_fsgid; } Index: linux-2.6.18-mm1/fs/exec.c =================================================================== --- linux-2.6.18-mm1.orig/fs/exec.c +++ linux-2.6.18-mm1/fs/exec.c @@ -1085,11 +1085,10 @@ int search_binary_handler(struct linux_b fput(bprm->file); bprm->file = NULL; current->did_exec = 1; notify_task_watchers(WATCH_TASK_EXEC, 0, current); - proc_exec_connector(current); return retval; } read_lock(&binfmt_lock); put_binfmt(fmt); if (retval != -ENOEXEC || bprm->mm == NULL) Index: linux-2.6.18-mm1/include/linux/cn_proc.h =================================================================== --- linux-2.6.18-mm1.orig/include/linux/cn_proc.h +++ linux-2.6.18-mm1/include/linux/cn_proc.h @@ -95,27 +95,6 @@ struct proc_event { __u32 exit_code, exit_signal; } exit; } event_data; }; -#ifdef __KERNEL__ -#ifdef CONFIG_PROC_EVENTS -void proc_fork_connector(struct task_struct *task); -void proc_exec_connector(struct task_struct *task); -void 
proc_id_connector(struct task_struct *task, int which_id); -void proc_exit_connector(struct task_struct *task); -#else -static inline void proc_fork_connector(struct task_struct *task) -{} - -static inline void proc_exec_connector(struct task_struct *task) -{} - -static inline void proc_id_connector(struct task_struct *task, - int which_id) -{} - -static inline void proc_exit_connector(struct task_struct *task) -{} -#endif /* CONFIG_PROC_EVENTS */ -#endif /* __KERNEL__ */ #endif /* CN_PROC_H */ -- |
From: Matt H. <mat...@us...> - 2006-09-29 02:14:08
|
Register a NUMA mempolicy task watcher instead of hooking into copy_process() and do_exit() directly. Signed-off-by: Matt Helsley <mat...@us...> --- kernel/exit.c | 4 ---- kernel/fork.c | 15 +-------------- mm/mempolicy.c | 24 ++++++++++++++++++++++++ 3 files changed, 25 insertions(+), 18 deletions(-) Index: linux-2.6.18-mm1/mm/mempolicy.c =================================================================== --- linux-2.6.18-mm1.orig/mm/mempolicy.c +++ linux-2.6.18-mm1/mm/mempolicy.c @@ -87,10 +87,11 @@ #include <linux/seq_file.h> #include <linux/proc_fs.h> #include <linux/migrate.h> #include <linux/rmap.h> #include <linux/security.h> +#include <linux/task_watchers.h> #include <asm/tlbflush.h> #include <asm/uaccess.h> /* Internal flags */ @@ -1331,10 +1332,33 @@ struct mempolicy *__mpol_copy(struct mem } } return new; } +static int init_task_mempolicy(unsigned long clone_flags, + struct task_struct *tsk) +{ + tsk->mempolicy = mpol_copy(tsk->mempolicy); + if (IS_ERR(tsk->mempolicy)) { + int retval; + + retval = PTR_ERR(tsk->mempolicy); + tsk->mempolicy = NULL; + return retval; + } + mpol_fix_fork_child_flag(tsk); + return 0; +} +task_watcher_func(init, init_task_mempolicy); + +static int free_task_mempolicy(unsigned long ignored, struct task_struct *tsk) +{ + mpol_free(tsk->mempolicy); + tsk->mempolicy = NULL; + return 0; +} +task_watcher_func(free, free_task_mempolicy); + /* Slow path of a mempolicy comparison */ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) { if (!a || !b) return 0; Index: linux-2.6.18-mm1/kernel/fork.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/fork.c +++ linux-2.6.18-mm1/kernel/fork.c @@ -1058,19 +1058,10 @@ static struct task_struct *copy_process( p->tgid = current->tgid; retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p); if (retval < 0) goto bad_fork_cleanup_delays_binfmt; -#ifdef CONFIG_NUMA - p->mempolicy = mpol_copy(p->mempolicy); - if (IS_ERR(p->mempolicy)) { - retval =
PTR_ERR(p->mempolicy); - p->mempolicy = NULL; - goto bad_fork_cleanup_delays_binfmt; - } - mpol_fix_fork_child_flag(p); -#endif #ifdef CONFIG_TRACE_IRQFLAGS p->irq_events = 0; #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW p->hardirqs_enabled = 1; #else @@ -1099,11 +1090,11 @@ static struct task_struct *copy_process( #ifdef CONFIG_DEBUG_MUTEXES p->blocked_on = NULL; /* not blocked yet */ #endif if ((retval = security_task_alloc(p))) - goto bad_fork_cleanup_policy; + goto bad_fork_cleanup_delays_binfmt; /* copy all the process information */ if ((retval = copy_files(clone_flags, p))) goto bad_fork_cleanup_security; if ((retval = copy_fs(clone_flags, p))) goto bad_fork_cleanup_files; @@ -1275,14 +1266,10 @@ bad_fork_cleanup_fs: exit_fs(p); /* blocking */ bad_fork_cleanup_files: exit_files(p); /* blocking */ bad_fork_cleanup_security: security_task_free(p); -bad_fork_cleanup_policy: -#ifdef CONFIG_NUMA - mpol_free(p->mempolicy); -#endif bad_fork_cleanup_delays_binfmt: delayacct_tsk_free(p); notify_task_watchers(WATCH_TASK_FREE, 0, p); if (p->binfmt) module_put(p->binfmt->module); Index: linux-2.6.18-mm1/kernel/exit.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/exit.c +++ linux-2.6.18-mm1/kernel/exit.c @@ -930,14 +930,10 @@ fastcall NORET_TYPE void do_exit(long co tsk->exit_code = code; proc_exit_connector(tsk); exit_notify(tsk); exit_task_namespaces(tsk); -#ifdef CONFIG_NUMA - mpol_free(tsk->mempolicy); - tsk->mempolicy = NULL; -#endif /* * This must happen late, after the PID is not * hashed anymore: */ if (unlikely(!list_empty(&tsk->pi_state_list))) -- |
From: Matt H. <mat...@us...> - 2006-09-29 02:14:09
|
Make the keyring code use a task watcher to initialize and free per-task data. NOTE: We can't make copy_thread_group_keys() in copy_signal() a task watcher because it needs the task's signal field (struct signal_struct). Signed-off-by: Matt Helsley <mat...@us...> Cc: David Howells <dho...@re...> --- include/linux/key.h | 8 -------- kernel/exit.c | 2 -- kernel/fork.c | 6 +----- kernel/sys.c | 8 -------- security/keys/process_keys.c | 19 ++++++++++++------- 5 files changed, 13 insertions(+), 30 deletions(-) Index: linux-2.6.18-mm1/include/linux/key.h =================================================================== --- linux-2.6.18-mm1.orig/include/linux/key.h +++ linux-2.6.18-mm1/include/linux/key.h @@ -335,18 +335,14 @@ extern void keyring_replace_payload(stru */ extern struct key root_user_keyring, root_session_keyring; extern int alloc_uid_keyring(struct user_struct *user, struct task_struct *ctx); extern void switch_uid_keyring(struct user_struct *new_user); -extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk); extern int copy_thread_group_keys(struct task_struct *tsk); -extern void exit_keys(struct task_struct *tsk); extern void exit_thread_group_keys(struct signal_struct *tg); extern int suid_keys(struct task_struct *tsk); extern int exec_keys(struct task_struct *tsk); -extern void key_fsuid_changed(struct task_struct *tsk); -extern void key_fsgid_changed(struct task_struct *tsk); extern void key_init(void); #define __install_session_keyring(tsk, keyring) \ ({ \ struct key *old_session = tsk->signal->session_keyring; \ @@ -365,18 +361,14 @@ extern void key_init(void); #define key_ref_to_ptr(k) ({ NULL; }) #define is_key_possessed(k) 0 #define alloc_uid_keyring(u,c) 0 #define switch_uid_keyring(u) do { } while(0) #define __install_session_keyring(t, k) ({ NULL; }) -#define copy_keys(f,t) 0 #define copy_thread_group_keys(t) 0 -#define exit_keys(t) do { } while(0) #define exit_thread_group_keys(tg) do { } while(0) #define suid_keys(t) do { 
} while(0) #define exec_keys(t) do { } while(0) -#define key_fsuid_changed(t) do { } while(0) -#define key_fsgid_changed(t) do { } while(0) #define key_init() do { } while(0) /* Initial keyrings */ extern struct key root_user_keyring; extern struct key root_session_keyring; Index: linux-2.6.18-mm1/kernel/fork.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/fork.c +++ linux-2.6.18-mm1/kernel/fork.c @@ -1078,14 +1078,12 @@ static struct task_struct *copy_process( goto bad_fork_cleanup_fs; if ((retval = copy_signal(clone_flags, p))) goto bad_fork_cleanup_sighand; if ((retval = copy_mm(clone_flags, p))) goto bad_fork_cleanup_signal; - if ((retval = copy_keys(clone_flags, p))) - goto bad_fork_cleanup_mm; if ((retval = copy_namespaces(clone_flags, p))) - goto bad_fork_cleanup_keys; + goto bad_fork_cleanup_mm; retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs); if (retval) goto bad_fork_cleanup_namespaces; p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? 
child_tidptr : NULL; @@ -1227,12 +1225,10 @@ static struct task_struct *copy_process( proc_fork_connector(p); return p; bad_fork_cleanup_namespaces: exit_task_namespaces(p); -bad_fork_cleanup_keys: - exit_keys(p); bad_fork_cleanup_mm: if (p->mm) mmput(p->mm); bad_fork_cleanup_signal: cleanup_signal(p); Index: linux-2.6.18-mm1/security/keys/process_keys.c =================================================================== --- linux-2.6.18-mm1.orig/security/keys/process_keys.c +++ linux-2.6.18-mm1/security/keys/process_keys.c @@ -15,10 +15,11 @@ #include <linux/slab.h> #include <linux/keyctl.h> #include <linux/fs.h> #include <linux/err.h> #include <linux/mutex.h> +#include <linux/task_watchers.h> #include <asm/uaccess.h> #include "internal.h" /* session keyring create vs join semaphore */ static DEFINE_MUTEX(key_session_mutex); @@ -276,11 +277,11 @@ int copy_thread_group_keys(struct task_s /*****************************************************************************/ /* * copy the keys for fork */ -int copy_keys(unsigned long clone_flags, struct task_struct *tsk) +static int copy_keys(unsigned long clone_flags, struct task_struct *tsk) { key_check(tsk->thread_keyring); key_check(tsk->request_key_auth); /* no thread keyring yet */ @@ -290,10 +291,11 @@ int copy_keys(unsigned long clone_flags, key_get(tsk->request_key_auth); return 0; } /* end copy_keys() */ +task_watcher_func(init, copy_keys); /*****************************************************************************/ /* * dispose of thread group keys upon thread group destruction */ @@ -306,16 +308,17 @@ void exit_thread_group_keys(struct signa /*****************************************************************************/ /* * dispose of per-thread keys upon thread exit */ -void exit_keys(struct task_struct *tsk) +static int exit_keys(unsigned long exit_code, struct task_struct *tsk) { key_put(tsk->thread_keyring); key_put(tsk->request_key_auth); - + return 0; } /* end exit_keys() */ +task_watcher_func(free, 
exit_keys); /*****************************************************************************/ /* * deal with execve() */ @@ -356,35 +359,37 @@ int suid_keys(struct task_struct *tsk) /*****************************************************************************/ /* * the filesystem user ID changed */ -void key_fsuid_changed(struct task_struct *tsk) +static int key_fsuid_changed(unsigned long ignored, struct task_struct *tsk) { /* update the ownership of the thread keyring */ if (tsk->thread_keyring) { down_write(&tsk->thread_keyring->sem); tsk->thread_keyring->uid = tsk->fsuid; up_write(&tsk->thread_keyring->sem); } - + return 0; } /* end key_fsuid_changed() */ +task_watcher_func(uid, key_fsuid_changed); /*****************************************************************************/ /* * the filesystem group ID changed */ -void key_fsgid_changed(struct task_struct *tsk) +static int key_fsgid_changed(unsigned long ignored, struct task_struct *tsk) { /* update the ownership of the thread keyring */ if (tsk->thread_keyring) { down_write(&tsk->thread_keyring->sem); tsk->thread_keyring->gid = tsk->fsgid; up_write(&tsk->thread_keyring->sem); } - + return 0; } /* end key_fsgid_changed() */ +task_watcher_func(gid, key_fsgid_changed); /*****************************************************************************/ /* * search the process keyrings for the first matching key * - we use the supplied match function to see if the description (or other Index: linux-2.6.18-mm1/kernel/exit.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/exit.c +++ linux-2.6.18-mm1/kernel/exit.c @@ -12,11 +12,10 @@ #include <linux/capability.h> #include <linux/completion.h> #include <linux/personality.h> #include <linux/tty.h> #include <linux/namespace.h> -#include <linux/key.h> #include <linux/security.h> #include <linux/cpu.h> #include <linux/acct.h> #include <linux/tsacct_kern.h> #include <linux/file.h> @@ -917,11 +916,10 @@ fastcall NORET_TYPE 
void do_exit(long co if (group_dead) acct_process(); __exit_files(tsk); __exit_fs(tsk); exit_thread(); - exit_keys(tsk); if (group_dead && tsk->signal->leader) disassociate_ctty(1); module_put(task_thread_info(tsk)->exec_domain->module); Index: linux-2.6.18-mm1/kernel/sys.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/sys.c +++ linux-2.6.18-mm1/kernel/sys.c @@ -956,11 +956,10 @@ asmlinkage long sys_setregid(gid_t rgid, (egid != (gid_t) -1 && egid != old_rgid)) current->sgid = new_egid; current->fsgid = new_egid; current->egid = new_egid; current->gid = new_rgid; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } @@ -992,11 +991,10 @@ asmlinkage long sys_setgid(gid_t gid) current->egid = current->fsgid = gid; } else return -EPERM; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } @@ -1081,11 +1079,10 @@ asmlinkage long sys_setreuid(uid_t ruid, if (ruid != (uid_t) -1 || (euid != (uid_t) -1 && euid != old_ruid)) current->suid = current->euid; current->fsuid = current->euid; - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); } @@ -1129,11 +1126,10 @@ asmlinkage long sys_setuid(uid_t uid) smp_wmb(); } current->fsuid = current->euid = uid; current->suid = new_suid; - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); } @@ -1178,11 +1174,10 @@ asmlinkage long sys_setresuid(uid_t ruid } current->fsuid = current->euid; if (suid != (uid_t) -1) current->suid = suid; - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); 
notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); } @@ -1231,11 +1226,10 @@ asmlinkage long sys_setresgid(gid_t rgid if (rgid != (gid_t) -1) current->gid = rgid; if (sgid != (gid_t) -1) current->sgid = sgid; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } @@ -1273,11 +1267,10 @@ asmlinkage long sys_setfsuid(uid_t uid) smp_wmb(); } current->fsuid = uid; } - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); @@ -1301,11 +1294,10 @@ asmlinkage long sys_setfsgid(gid_t gid) if (gid != old_fsgid) { current->mm->dumpable = suid_dumpable; smp_wmb(); } current->fsgid = gid; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); } return old_fsgid; } -- |
From: Paul J. <pj...@sg...> - 2006-09-29 02:31:44
|
Matt wrote:
> -	cpuset_fork(p);
>  #ifdef CONFIG_NUMA
>  	p->mempolicy = mpol_copy(p->mempolicy);
>  	if (IS_ERR(p->mempolicy)) {
>  		retval = PTR_ERR(p->mempolicy);
>  		p->mempolicy = NULL;
> -		goto bad_fork_cleanup_cpuset;
> +		goto bad_fork_cleanup_delays_binfmt;
>  	}
>  	mpol_fix_fork_child_flag(p);
>  #endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	p->irq_events = 0;
> @@ -1280,13 +1278,11 @@ bad_fork_cleanup_files:
> bad_fork_cleanup_security:
>  	security_task_free(p);
> bad_fork_cleanup_policy:
>  #ifdef CONFIG_NUMA
>  	mpol_free(p->mempolicy);
> -bad_fork_cleanup_cpuset:
>  #endif
> -	cpuset_exit(p);
> bad_fork_cleanup_delays_binfmt:

The above code, before your change, had the effect that if mpol_copy()
failed, then the cpusets that were just set up by the cpuset_fork()
call were undone by a cpuset_exit() call.

From what I can tell, after your change, this is no longer done,
and a failed mpol_copy will leave cpusets in an incorrect state.

Am I missing something?

-- 
I won't rest till it's the best ... Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401
|
From: Paul J. <pj...@sg...> - 2006-09-29 02:32:47
|
Matt wrote:
> It is intended to be a tool for measuring the impact of task watchers
> on fork and exit-heavy workloads.

So ... you're keeping us in suspense ... what was the measured impact
of task watchers?

-- 
I won't rest till it's the best ... Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401
|
From: Paul J. <pj...@sg...> - 2006-09-29 02:41:50
|
How might this play with Paul Menage's <me...@go...> patch posted
earlier today on lkml:

	[RFC][PATCH 0/4] Generic container system

My cpuset_exit() call is getting popular - both you guys seem to have
designs on it.

Separate question - I guess that your task watcher mechanism, the
way it uses linker magic now, would not enable a loadable module
to plug into these various fork/exit/... hooks. Is that right?

Such an ability for loadable modules to get callouts on fork/exit
would be useful to some ... it could also be a controversial ability.

-- 
I won't rest till it's the best ... Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401
|
From: Matt H. <mat...@us...> - 2006-09-29 08:18:07
|
On Thu, 2006-09-28 at 19:41 -0700, Paul Jackson wrote: > How might this play with Paul Menage's <me...@go...> patch posted > earlier today on lkml: > > [RFC][PATCH 0/4] Generic container system > > My cpuset_exit() call is getting popular - both you guys seem to have > designs on it. > > Separate question - I guess that your task watcher mechanism, the > way it uses linker magic now, would not enable a loadable module > to plug into these various fork/exit/... hooks. Is that right? Yes. > Such an ability for loadable modules to get callouts on fork/exit > would be useful to some ... it could also be a controversial ability. I mentioned that I'm working on a patch to allow modules to watch tasks. It uses a similar technique but does require some list traversal during notification -- one list_head per event per module. Unlike the current series using task watchers from a module requires registration. The [un]registration functions are marked EXPORT_SYMBOL_GPL(). Perhaps that will dampen the controversy somewhat. However the module-enablement patch oopses on boot so I didn't post it. In case you're curious I've included it below: From: Matt Helsley <mat...@us...> Subject: [RFC][PATCH] Enable task watching from modules Allow modules to watch tasks initialize, clone/fork, exec, change [re][ug]ids, exit, and free. Adds a .task_watcher_table ELF section containing a series of arrays. Each array has 0 or more function pointers to the watcher functions. Symbols delineating the boundaries of the arrays within the ELF section are also created. At module load time we find the table section and keep a pointer to it. Then, whenever a watchable task event occurs we go over the list of modules interested in the event and call the functions in the module's table for that event. Module task watchers are just like regular task watchers except they require registration with a call to register_module_task_watchers(THIS_MODULE). 
This adds the module to the list (one for each event) of modules interested in watching tasks. To disable the task watchers the module must call unregister_module_task_watchers(THIS_MODULE). As with other unregistrations the module is responsible for calling unregister_module_task_watchers() before being unloaded. BUGS: oopses on boot inside notify_task_watchers()! Signed-off-by: Matt Helsley <mat...@us...> --- include/asm-generic/common.lds.h | 27 ++++++++ include/asm-generic/module.lds.S | 11 +++ include/asm-generic/vmlinux.lds.h | 23 +------ include/linux/kernel.h | 9 ++ include/linux/module.h | 5 + include/linux/task_watchers.h | 12 ++- kernel/module.c | 5 + kernel/task_watchers.c | 119 +++++++++++++++++++++++++++++++++++--- scripts/Makefile.modpost | 10 ++- scripts/mod/modpost.c | 21 ++++++ 10 files changed, 209 insertions(+), 33 deletions(-) Index: linux-2.6.18-mm1/kernel/task_watchers.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/task_watchers.c +++ linux-2.6.18-mm1/kernel/task_watchers.c @@ -1,6 +1,10 @@ -#include <linux/task_watchers.h> +#include <linux/module.h> +#include <linux/mutex.h> + +#define for_each_task_event(ev) \ +for ((ev) = 0; (ev) < NUM_WATCH_TASK_EVENTS; (ev)++) /* Defined in include/asm-generic/common.lds.h */ extern const task_watcher_fn __start_task_watchers_init[], __start_task_watchers_clone[], __start_task_watchers_exec[], __start_task_watchers_uid[], __start_task_watchers_gid[], @@ -8,30 +12,129 @@ extern const task_watcher_fn __start_tas __stop_task_watchers_free[]; /* * Tables of ptrs to the first watcher func for WATCH_TASK_* */ -static const task_watcher_fn *twtable[] = { +static const task_watcher_fn *twtable[] +__attribute__((section(".task_watchers_table"))) = { __start_task_watchers_init, __start_task_watchers_clone, __start_task_watchers_exec, __start_task_watchers_uid, __start_task_watchers_gid, __start_task_watchers_exit, __start_task_watchers_free, 
__stop_task_watchers_free, }; -int notify_task_watchers(unsigned int ev, unsigned long val, - struct task_struct *tsk) +static DEFINE_MUTEX(module_tw_list_mutex); +static struct list_head module_task_watchers[NUM_WATCH_TASK_EVENTS]; + +void init_task_watching_list_heads(struct list_head *lh_arr) +{ + unsigned int ev; + + for_each_task_event(ev) + INIT_LIST_HEAD(&lh_arr[ev]); +} + +static int __init init_tw_lists(void) +{ + init_task_watching_list_heads(module_task_watchers); + return 0; +} +core_initcall(init_tw_lists); + +static inline int __notify_tw_table(const task_watcher_fn *first, + const task_watcher_fn *end, + unsigned long val, + struct task_struct *tsk, + int ret_err) { const task_watcher_fn *tw_call; - int ret_err = 0, err; + int err; - /* Call all of the watchers, report the first error */ - for (tw_call = twtable[ev]; tw_call < twtable[ev + 1]; tw_call++) { + /* Call the watchers. Return the first error (but continue) */ + for (tw_call = first; tw_call < end; tw_call++) { err = (*tw_call)(val, tsk); - if (unlikely((err < 0) && (ret_err == NOTIFY_OK))) + if (unlikely((err < 0) && (ret_err == 0))) ret_err = err; } return ret_err; } + + +int notify_task_watchers(unsigned int ev, unsigned long val, + struct task_struct *tsk) +{ + int ret_err = 0; + struct list_head *elem, *next; + + /* last +1 of table == start of next table */ + ret_err = __notify_tw_table(twtable[ev], twtable[ev + 1], + val, tsk, ret_err); + + /* Call the module functions watching this event */ + rcu_read_lock(); + list_for_each_safe_rcu(elem, next, &(module_task_watchers[ev])) { + struct module *mod; + + mod = container_of(array_of(elem, ev), struct module, + task_watching_modules); + rcu_read_unlock(); + + /* this call makes unload of this mod unsafe */ + ret_err = __notify_tw_table(mod->twtable[ev], + mod->twtable[ev + 1], + val, tsk, ret_err); + rcu_read_lock(); + } + rcu_read_unlock(); + return ret_err; +} + +#define tw_table_empty(twtable, ev) ({ ((twtable[ev + 1] - twtable[ev]) 
== 0) ? 1 : 0; }) + +int register_module_task_watchers(struct module *mod) +{ + unsigned int ev; + + if (mod == NULL) + return 0; /* no need to register when linked to kernel */ + mutex_lock(&module_tw_list_mutex); + for_each_task_event(ev) { + /* list-emptying initialization must precede registration */ + BUG_ON(!list_empty(&mod->task_watching_modules[ev])); + if (tw_table_empty(mod->twtable, ev)) + continue; /* module has no watchers for event */ + mod->unsafe |= 1; /* notify_task_watchers() uses mod funcs */ + list_add_tail_rcu(&mod->task_watching_modules[ev], + &module_task_watchers[ev]); + __module_get(mod); /* prevent module removal before unreg */ + } + mutex_unlock(&module_tw_list_mutex); + return 0; +} +EXPORT_SYMBOL_GPL(register_module_task_watchers); + +int unregister_module_task_watchers(struct module *mod) +{ + unsigned int ev; + + if (mod == NULL) + return 0; + mutex_lock(&module_tw_list_mutex); + for_each_task_event(ev) { + /* exactly one registration must precede this unregistration */ + BUG_ON(list_empty(&mod->task_watching_modules[ev]) && + !tw_table_empty(mod->twtable, ev)); + if (list_empty(&mod->task_watching_modules[ev])) + continue; + list_del_rcu(&mod->task_watching_modules[ev]); + module_put(mod); + } + mutex_unlock(&module_tw_list_mutex); + synchronize_rcu(); /* wait for list_del_rcu'd list heads to be unused */ + init_task_watching_list_heads(mod->task_watching_modules); + return 0; +} +EXPORT_SYMBOL_GPL(unregister_module_task_watchers); Index: linux-2.6.18-mm1/include/linux/task_watchers.h =================================================================== --- linux-2.6.18-mm1.orig/include/linux/task_watchers.h +++ linux-2.6.18-mm1/include/linux/task_watchers.h @@ -9,23 +9,25 @@ #define WATCH_TASK_GID 4 #define WATCH_TASK_EXIT 5 #define WATCH_TASK_FREE 6 #define NUM_WATCH_TASK_EVENTS 7 -#ifndef MODULE + typedef int (*task_watcher_fn)(unsigned long, struct task_struct*); /* * Watch for events occuring within a task and call the supplied 
function * when (and only when) the given event happens. - * Only non-modular kernel code may register functions as task_watchers. */ #define task_watcher_func(ev, fn) \ static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \ __attribute__ ((__section__ (".task_watchers." #ev))) = fn -#else -#error "task_watcher() macro may not be used in modules." -#endif extern int notify_task_watchers(unsigned int ev_idx, unsigned long val, struct task_struct *tsk); + +#include <linux/module.h> +extern void init_task_watching_list_heads(struct list_head *lh_arr); + +int register_module_task_watchers(struct module *); +int unregister_module_task_watchers(struct module *); #endif /* _TASK_WATCHERS_H */ Index: linux-2.6.18-mm1/kernel/module.c =================================================================== --- linux-2.6.18-mm1.orig/kernel/module.c +++ linux-2.6.18-mm1/kernel/module.c @@ -1483,10 +1483,11 @@ static struct module *load_module(void _ unsigned int i; unsigned int symindex = 0; unsigned int strindex = 0; unsigned int setupindex; unsigned int exindex; + unsigned int twindex; unsigned int exportindex; unsigned int modindex; unsigned int obsparmindex; unsigned int infoindex; unsigned int gplindex; @@ -1588,10 +1589,11 @@ static struct module *load_module(void _ gplfuturecrcindex = find_sec(hdr, sechdrs, secstrings, "__kcrctab_gpl_future"); unusedcrcindex = find_sec(hdr, sechdrs, secstrings, "__kcrctab_unused"); unusedgplcrcindex = find_sec(hdr, sechdrs, secstrings, "__kcrctab_unused_gpl"); setupindex = find_sec(hdr, sechdrs, secstrings, "__param"); exindex = find_sec(hdr, sechdrs, secstrings, "__ex_table"); + twindex = find_sec(hdr, sechdrs, secstrings, ".task_watchers_table"); obsparmindex = find_sec(hdr, sechdrs, secstrings, "__obsparm"); versindex = find_sec(hdr, sechdrs, secstrings, "__versions"); infoindex = find_sec(hdr, sechdrs, secstrings, ".modinfo"); pcpuindex = find_pcpusec(hdr, sechdrs, secstrings); #ifdef ARCH_UNWIND_SECTION_NAME @@ -1837,10 
+1839,13 @@ static struct module *load_module(void _ / sizeof(struct kernel_param), NULL); if (err < 0) goto arch_cleanup; + mod->twtable = (void*)sechdrs[twindex].sh_addr; + init_task_watching_list_heads(mod->task_watching_modules); + err = mod_sysfs_setup(mod, (struct kernel_param *) sechdrs[setupindex].sh_addr, sechdrs[setupindex].sh_size / sizeof(struct kernel_param)); Index: linux-2.6.18-mm1/scripts/Makefile.modpost =================================================================== --- linux-2.6.18-mm1.orig/scripts/Makefile.modpost +++ linux-2.6.18-mm1/scripts/Makefile.modpost @@ -40,10 +40,18 @@ include scripts/Kbuild.include include scripts/Makefile.lib kernelsymfile := $(objtree)/Module.symvers modulesymfile := $(KBUILD_EXTMOD)/Module.symvers +# Step 0), make sure the supplemental module linker script has been made +moduleldscript := include/asm-generic/module.lds +quiet_cmd_cpp_lds_S = LDS $@ + cmd_cpp_lds_S = $(CPP) $(cpp_flags) -C -P -D__ASSEMBLY__ -o $@ $< + +%.lds: %.lds.S FORCE + $(call if_changed_dep,cpp_lds_S) + # Step 1), find all modules listed in $(MODVERDIR)/ __modules := $(sort $(shell grep -h '\.ko' /dev/null $(wildcard $(MODVERDIR)/*.mod))) modules := $(patsubst %.o,%.ko, $(wildcard $(__modules:.ko=.o))) _modpost: $(modules) @@ -93,11 +101,11 @@ targets += $(modules:.ko=.mod.o) # Step 6), final link of the modules quiet_cmd_ld_ko_o = LD [M] $@ cmd_ld_ko_o = $(LD) $(LDFLAGS) $(LDFLAGS_MODULE) -o $@ \ $(filter-out FORCE,$^) -$(modules): %.ko :%.o %.mod.o FORCE +$(modules): %.ko : %.o %.mod.o $(moduleldscript) FORCE $(call if_changed,ld_ko_o) targets += $(modules) Index: linux-2.6.18-mm1/include/asm-generic/module.lds.S =================================================================== --- /dev/null +++ linux-2.6.18-mm1/include/asm-generic/module.lds.S @@ -0,0 +1,11 @@ +#include <asm-generic/common.lds.h> +#include <asm/page.h> +#include <asm/cache.h> + +SECTIONS +{ + .text : { + *(.text*) + } + TWTABLE +} Index: 
linux-2.6.18-mm1/include/asm-generic/common.lds.h =================================================================== --- /dev/null +++ linux-2.6.18-mm1/include/asm-generic/common.lds.h @@ -0,0 +1,27 @@ +#ifndef LOAD_OFFSET +#define LOAD_OFFSET 0 +#endif + +#ifndef VMLINUX_SYMBOL +#define VMLINUX_SYMBOL(_sym_) _sym_ +#endif + +#define TWTABLE \ + .task_watchers_table : AT(ADDR(.task_watchers_table) - LOAD_OFFSET) { \ + *(.task_watchers_table) \ + VMLINUX_SYMBOL(__start_task_watchers_init) = .; \ + *(.task_watchers.init) \ + VMLINUX_SYMBOL(__start_task_watchers_clone) = .; \ + *(.task_watchers.clone) \ + VMLINUX_SYMBOL(__start_task_watchers_exec) = .; \ + *(.task_watchers.exec) \ + VMLINUX_SYMBOL(__start_task_watchers_uid) = .; \ + *(.task_watchers.uid) \ + VMLINUX_SYMBOL(__start_task_watchers_gid) = .; \ + *(.task_watchers.gid) \ + VMLINUX_SYMBOL(__start_task_watchers_exit) = .; \ + *(.task_watchers.exit) \ + VMLINUX_SYMBOL(__start_task_watchers_free) = .; \ + *(.task_watchers.free) \ + VMLINUX_SYMBOL(__stop_task_watchers_free) = .; \ + } Index: linux-2.6.18-mm1/include/asm-generic/vmlinux.lds.h =================================================================== --- linux-2.6.18-mm1.orig/include/asm-generic/vmlinux.lds.h +++ linux-2.6.18-mm1/include/asm-generic/vmlinux.lds.h @@ -1,5 +1,7 @@ +#include <asm-generic/common.lds.h> + #ifndef LOAD_OFFSET #define LOAD_OFFSET 0 #endif #ifndef VMLINUX_SYMBOL @@ -42,29 +44,10 @@ VMLINUX_SYMBOL(__start_rio_route_ops) = .; \ *(.rio_route_ops) \ VMLINUX_SYMBOL(__end_rio_route_ops) = .; \ } \ \ - .task_watchers_table : AT(ADDR(.task_watchers_table) - LOAD_OFFSET) { \ - *(.task_watchers_table) \ - VMLINUX_SYMBOL(__start_task_watchers_init) = .; \ - *(.task_watchers.init) \ - VMLINUX_SYMBOL(__start_task_watchers_clone) = .; \ - *(.task_watchers.clone) \ - VMLINUX_SYMBOL(__start_task_watchers_exec) = .; \ - *(.task_watchers.exec) \ - VMLINUX_SYMBOL(__start_task_watchers_uid) = .; \ - *(.task_watchers.uid) \ - 
VMLINUX_SYMBOL(__start_task_watchers_gid) = .; \ - *(.task_watchers.gid) \ - VMLINUX_SYMBOL(__start_task_watchers_exit) = .; \ - *(.task_watchers.exit) \ - VMLINUX_SYMBOL(__start_task_watchers_free) = .; \ - *(.task_watchers.free) \ - VMLINUX_SYMBOL(__stop_task_watchers_free) = .; \ - } \ - \ /* Kernel symbol table: Normal symbols */ \ __ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___ksymtab) = .; \ *(__ksymtab) \ VMLINUX_SYMBOL(__stop___ksymtab) = .; \ @@ -136,10 +119,12 @@ /* Kernel symbol table: strings */ \ __ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) { \ *(__ksymtab_strings) \ } \ \ + TWTABLE \ + \ /* Built-in module parameters. */ \ __param : AT(ADDR(__param) - LOAD_OFFSET) { \ VMLINUX_SYMBOL(__start___param) = .; \ *(__param) \ VMLINUX_SYMBOL(__stop___param) = .; \ Index: linux-2.6.18-mm1/include/linux/module.h =================================================================== --- linux-2.6.18-mm1.orig/include/linux/module.h +++ linux-2.6.18-mm1/include/linux/module.h @@ -15,10 +15,11 @@ #include <linux/kmod.h> #include <linux/elf.h> #include <linux/stringify.h> #include <linux/kobject.h> #include <linux/moduleparam.h> +#include <linux/task_watchers.h> #include <asm/local.h> #include <asm/module.h> /* Not Yet Implemented */ @@ -291,10 +292,14 @@ struct module /* Exception table */ unsigned int num_exentries; const struct exception_table_entry *extable; + /* Task watcher list elems and fn ptr table */ + struct list_head task_watching_modules[NUM_WATCH_TASK_EVENTS]; + const task_watcher_fn **twtable; + /* Startup function. 
*/ int (*init)(void); /* If this is non-NULL, vfree after init() returns */ void *module_init; Index: linux-2.6.18-mm1/scripts/mod/modpost.c =================================================================== --- linux-2.6.18-mm1.orig/scripts/mod/modpost.c +++ linux-2.6.18-mm1/scripts/mod/modpost.c @@ -1178,10 +1178,11 @@ static void check_exports(struct module static void add_header(struct buffer *b, struct module *mod) { buf_printf(b, "#include <linux/module.h>\n"); buf_printf(b, "#include <linux/vermagic.h>\n"); buf_printf(b, "#include <linux/compiler.h>\n"); + buf_printf(b, "#include <linux/task_watchers.h>\n"); buf_printf(b, "\n"); buf_printf(b, "MODULE_INFO(vermagic, VERMAGIC_STRING);\n"); buf_printf(b, "\n"); buf_printf(b, "struct module __this_module\n"); buf_printf(b, "__attribute__((section(\".gnu.linkonce.this_module\"))) = {\n"); @@ -1190,10 +1191,30 @@ static void add_header(struct buffer *b, buf_printf(b, " .init = init_module,\n"); if (mod->has_cleanup) buf_printf(b, "#ifdef CONFIG_MODULE_UNLOAD\n" " .exit = cleanup_module,\n" "#endif\n"); + buf_printf(b, "};\n\n"); + buf_printf(b, "/* Defined in include/asm-generic/common.lds.h */\n"); + buf_printf(b, "extern const task_watcher_fn __start_task_watchers_init[],\n"); + buf_printf(b, "\t__start_task_watchers_clone[],\n"); + buf_printf(b, "\t__start_task_watchers_exec[],\n"); + buf_printf(b, "\t__start_task_watchers_uid[],\n"); + buf_printf(b, "\t__start_task_watchers_gid[],\n"); + buf_printf(b, "\t__start_task_watchers_exit[],\n"); + buf_printf(b, "\t__start_task_watchers_free[],\n"); + buf_printf(b, "\t__stop_task_watchers_free[];\n\n"); + buf_printf(b, "static const task_watcher_fn *twtable[]\n"); + buf_printf(b, "__attribute_used__ __attribute__((section(\".task_watchers_table\"), used)) = {\n"); + buf_printf(b, "\t__start_task_watchers_init,\n"); + buf_printf(b, "\t__start_task_watchers_clone,\n"); + buf_printf(b, "\t__start_task_watchers_exec,\n"); + buf_printf(b, 
"\t__start_task_watchers_uid,\n"); + buf_printf(b, "\t__start_task_watchers_gid,\n"); + buf_printf(b, "\t__start_task_watchers_exit,\n"); + buf_printf(b, "\t__start_task_watchers_free,\n"); + buf_printf(b, "\t__stop_task_watchers_free,\n"); buf_printf(b, "};\n"); } /** * Record CRCs for unresolved symbols Index: linux-2.6.18-mm1/include/linux/kernel.h =================================================================== --- linux-2.6.18-mm1.orig/include/linux/kernel.h +++ linux-2.6.18-mm1/include/linux/kernel.h @@ -298,10 +298,19 @@ static inline int __attribute__ ((format */ #define container_of(ptr, type, member) ({ \ const typeof( ((type *)0)->member ) *__mptr = (ptr); \ (type *)( (char *)__mptr - offsetof(type,member) );}) +/** + * array_of - cast an element of an array out to the containing array + * @ptr: the pointer to the element. + * @i: the index within the array. + * + */ +#define array_of(ptr, i) \ + ({ const typeof(*(ptr)) *__mptr = (ptr); __mptr -= (i); __mptr; }) + /* * Check at compile time that something is of a particular type. * Always evaluates to 1 so you may use it easily in comparisons. */ #define typecheck(type,x) \ |
From: Paul M. <me...@go...> - 2006-09-29 16:22:38
|
On 9/28/06, Paul Jackson <pj...@sg...> wrote:
> How might this play with Paul Menage's <me...@go...> patch posted
> earlier today on lkml:
>
> [RFC][PATCH 0/4] Generic container system

I've not looked closely at Matt's patch, but I'm sure that there would
be no problems with hooking the container system to use task watchers
rather than patching fork.c/exit.c directly.

Paul
|
From: Matt H. <mat...@us...> - 2006-09-29 07:53:36
|
On Thu, 2006-09-28 at 19:31 -0700, Paul Jackson wrote:
> Matt wrote:
>
> > -	cpuset_fork(p);
> >  #ifdef CONFIG_NUMA
> >  	p->mempolicy = mpol_copy(p->mempolicy);
> >  	if (IS_ERR(p->mempolicy)) {
> >  		retval = PTR_ERR(p->mempolicy);
> >  		p->mempolicy = NULL;
> > -		goto bad_fork_cleanup_cpuset;
> > +		goto bad_fork_cleanup_delays_binfmt;
> >  	}
> >  	mpol_fix_fork_child_flag(p);
> >  #endif
> >  #ifdef CONFIG_TRACE_IRQFLAGS
> >  	p->irq_events = 0;
> > @@ -1280,13 +1278,11 @@ bad_fork_cleanup_files:
> > bad_fork_cleanup_security:
> >  	security_task_free(p);
> > bad_fork_cleanup_policy:
> >  #ifdef CONFIG_NUMA
> >  	mpol_free(p->mempolicy);
> > -bad_fork_cleanup_cpuset:
> >  #endif
> > -	cpuset_exit(p);
> > bad_fork_cleanup_delays_binfmt:
>
> The above code, before your change, had the effect that if mpol_copy()
> failed, then the cpusets that were just set up by the cpuset_fork()
> call were undone by a cpuset_exit() call.
>
> From what I can tell, after your change, this is no longer done,
> and a failed mpol_copy will leave cpusets in an incorrect state.
>
> Am I missing something?

If you look in the first patch there's a corresponding
notify_task_watchers(WATCH_TASK_FREE, tsk) below when we get a failure
from INIT. That in turn calls cpuset_exit() because a hunk of this
patch marks it for execution whenever a task is freed.

Cheers,
	-Matt Helsley
|
From: Paul J. <pj...@sg...> - 2006-09-29 08:03:20
|
Matt wrote: > If you look in the first patch there's a corresponding > notify_task_watchers(WATCH_TASK_FREE, tsk) ... Ok - thanks. Looks like I was missing something. Good. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.925.600.0401 |
From: Matt H. <mat...@us...> - 2006-09-29 19:39:08
|
On Thu, 2006-09-28 at 19:32 -0700, Paul Jackson wrote: > Matt wrote: > > It is intended to be a tool for measuring the impact of task watchers > > on fork and exit-heavy workloads. > > So ... you're keeping us in suspense ... what was the measured impact > of task watcher? Heh, sorry about that. I do have some initial kernbench numbers. I performed 10 successive runs of kernbench after each patch on an old 8-way: [PATCH 01/10] Task watchers v2 Task watchers v2 815.80user 113.69system 2:04.36elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 814.55user 114.25system 2:03.80elapsed 750%CPU (0avgtext+0avgdata 0maxresident)k 815.09user 115.11system 2:04.42elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 815.84user 114.08system 2:04.25elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k 814.60user 114.28system 2:04.41elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k 814.52user 113.09system 2:04.51elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k 816.23user 114.42system 2:04.64elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k 816.45user 113.39system 2:04.72elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k 815.62user 114.74system 2:04.71elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k 814.40user 112.94system 2:04.45elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k [PATCH 02/10] Task watchers v2 Benchmark 806.39user 113.23system 2:04.30elapsed 739%CPU (0avgtext+0avgdata 0maxresident)k 806.21user 112.87system 2:03.40elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k 811.69user 113.59system 2:03.59elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k 803.16user 113.37system 2:03.53elapsed 741%CPU (0avgtext+0avgdata 0maxresident)k 804.82user 112.25system 2:03.62elapsed 741%CPU (0avgtext+0avgdata 0maxresident)k 804.45user 113.37system 2:03.34elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k 806.35user 112.34system 2:02.96elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 804.35user 112.61system 2:02.96elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k 804.19user 112.98system 
2:02.91elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k 804.54user 113.23system 2:03.80elapsed 741%CPU (0avgtext+0avgdata 0maxresident)k Seems like this benchmark patch resulted in a consistent decrease in time spent in userspace. I'm rerunning kernbench without this patch and will post the results by Monday afternoon at the latest. [PATCH 03/10] Task watchers v2 Register audit task watcher 802.30user 113.37system 2:02.36elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k 802.19user 111.93system 2:03.68elapsed 739%CPU (0avgtext+0avgdata 0maxresident)k 800.90user 113.33system 2:03.02elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k 800.08user 112.56system 2:01.95elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k 801.53user 111.66system 2:02.87elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k 803.62user 112.22system 2:02.34elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k 803.65user 112.05system 2:03.07elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k 804.35user 112.35system 2:03.85elapsed 740%CPU (0avgtext+0avgdata 0maxresident)k 805.20user 112.25system 2:02.80elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 802.46user 113.74system 2:03.75elapsed 740%CPU (0avgtext+0avgdata 0maxresident)k [PATCH 04/10] Task watchers v2 Register semundo task watcher 799.99user 113.19system 2:02.50elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k 802.11user 112.51system 2:03.62elapsed 739%CPU (0avgtext+0avgdata 0maxresident)k 802.19user 112.40system 2:04.19elapsed 736%CPU (0avgtext+0avgdata 0maxresident)k 803.87user 113.05system 2:02.96elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k 802.56user 113.38system 2:02.52elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 803.14user 113.11system 2:02.95elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k 802.57user 113.66system 2:02.56elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 803.26user 113.90system 2:03.37elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k 806.66user 113.95system 2:03.20elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k 
805.21user 113.31system 2:04.20elapsed 739%CPU (0avgtext+0avgdata 0maxresident)k

[PATCH 05/10] Task watchers v2 Register cpuset task watcher
807.24user 112.35system 2:03.11elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k
804.43user 112.62system 2:03.08elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
805.80user 113.85system 2:04.10elapsed 741%CPU (0avgtext+0avgdata 0maxresident)k
806.28user 114.07system 2:03.47elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
807.08user 114.10system 2:04.14elapsed 742%CPU (0avgtext+0avgdata 0maxresident)k
807.39user 113.68system 2:03.57elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
807.60user 113.27system 2:03.69elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
807.06user 113.25system 2:03.89elapsed 742%CPU (0avgtext+0avgdata 0maxresident)k
808.79user 112.62system 2:03.31elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
805.21user 113.96system 2:03.75elapsed 742%CPU (0avgtext+0avgdata 0maxresident)k

[PATCH 06/10] Task watchers v2 Register NUMA mempolicy task watcher
804.88user 113.50system 2:03.04elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k
807.87user 114.55system 2:04.05elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k
812.19user 113.81system 2:04.21elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
810.73user 114.06system 2:04.26elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
808.18user 113.48system 2:03.06elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k
810.50user 112.26system 2:04.32elapsed 742%CPU (0avgtext+0avgdata 0maxresident)k
808.79user 113.65system 2:04.58elapsed 740%CPU (0avgtext+0avgdata 0maxresident)k
807.73user 113.55system 2:03.85elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k
806.90user 113.25system 2:03.83elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k
804.28user 113.75system 2:03.45elapsed 743%CPU (0avgtext+0avgdata 0maxresident)k

[PATCH 07/10] Task watchers v2 Register IRQ flag tracing task watcher
795.59user 111.36system 2:02.19elapsed 742%CPU (0avgtext+0avgdata 0maxresident)k
795.09user 112.89system 2:02.74elapsed 739%CPU (0avgtext+0avgdata 0maxresident)k
796.29user 112.05system 2:01.58elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
796.95user 112.99system 2:03.24elapsed 738%CPU (0avgtext+0avgdata 0maxresident)k
798.78user 111.45system 2:01.75elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
796.52user 112.00system 2:01.73elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k
797.40user 112.65system 2:02.02elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
797.02user 112.55system 2:01.50elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k
799.72user 111.69system 2:02.36elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
799.39user 111.93system 2:02.24elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k

[PATCH 08/10] Task watchers v2 Register lockdep task watcher
804.79user 111.78system 2:03.03elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
805.81user 110.94system 2:03.13elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
803.87user 112.09system 2:02.96elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
805.32user 113.28system 2:03.35elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
803.45user 112.53system 2:02.89elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
803.69user 112.29system 2:02.90elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
801.01user 112.27system 2:02.15elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
802.30user 112.59system 2:02.87elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
802.19user 111.69system 2:02.66elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
801.97user 111.75system 2:02.66elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k

[PATCH 09/10] Task watchers v2 Register process keyrings task watcher
792.91user 111.19system 2:00.91elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
793.50user 110.64system 2:02.37elapsed 738%CPU (0avgtext+0avgdata 0maxresident)k
795.85user 111.00system 2:01.39elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
794.46user 112.29system 2:01.17elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k
794.04user 111.91system 2:01.44elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
797.07user 110.94system 2:01.55elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
796.84user 110.37system 2:01.41elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
796.13user 110.69system 2:02.44elapsed 740%CPU (0avgtext+0avgdata 0maxresident)k
805.46user 110.87system 2:02.66elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
796.70user 111.36system 2:02.80elapsed 739%CPU (0avgtext+0avgdata 0maxresident)k

[PATCH 10/10] Task watchers v2 Register process events connector
807.06user 112.29system 2:04.01elapsed 741%CPU (0avgtext+0avgdata 0maxresident)k
807.76user 113.42system 2:03.14elapsed 748%CPU (0avgtext+0avgdata 0maxresident)k
806.95user 111.51system 2:03.33elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
805.65user 112.95system 2:04.04elapsed 740%CPU (0avgtext+0avgdata 0maxresident)k
803.63user 113.79system 2:03.01elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k
807.15user 111.49system 2:03.32elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
807.75user 112.02system 2:03.47elapsed 744%CPU (0avgtext+0avgdata 0maxresident)k
804.89user 113.48system 2:03.02elapsed 746%CPU (0avgtext+0avgdata 0maxresident)k
806.01user 112.71system 2:02.84elapsed 747%CPU (0avgtext+0avgdata 0maxresident)k
804.40user 112.63system 2:02.94elapsed 745%CPU (0avgtext+0avgdata 0maxresident)k

Cheers,
	-Matt Helsley
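[Editor's note: raw `time(1)` listings like the above are easier to compare once condensed into per-patch means. A minimal C helper for that is sketched below; `parse_time_line`, `accumulate`, and `mean_elapsed` are hypothetical names invented for illustration, not part of the posted patches.]

```c
/* Sketch: condense GNU time(1) lines such as
 *   "805.21user 113.31system 2:04.20elapsed 739%CPU (0avgtext+...)k"
 * into mean user/system/elapsed seconds per patch. Hypothetical
 * helper code, not part of the task watchers series. */
#include <stdio.h>

struct stats {
	double user, sys, elapsed;	/* running sums, in seconds */
	int n;				/* number of runs accumulated */
};

/* Parse one time(1) line; returns 0 on success, -1 on a non-data line
 * (e.g. a "[PATCH NN/10] ..." header). */
static int parse_time_line(const char *line, double *user, double *sys,
			   double *elapsed)
{
	int min;
	double sec;

	if (sscanf(line, "%lfuser %lfsystem %d:%lfelapsed",
		   user, sys, &min, &sec) != 4)
		return -1;
	*elapsed = 60.0 * min + sec;
	return 0;
}

/* Fold one line into the running sums; headers are silently skipped. */
static void accumulate(struct stats *st, const char *line)
{
	double u, s, e;

	if (parse_time_line(line, &u, &s, &e) == 0) {
		st->user += u;
		st->sys += s;
		st->elapsed += e;
		st->n++;
	}
}

static double mean_elapsed(const struct stats *st)
{
	return st->n ? st->elapsed / st->n : 0.0;
}
```

Feeding each block of ten runs through `accumulate` and printing the means per `[PATCH NN/10]` header would give the summary requested downthread.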
From: Paul J. <pj...@sg...> - 2006-09-29 20:13:55
Matt wrote:
> Heh, sorry about that. I do have some initial kernbench numbers.

Thanks. You mention that one of the patches, Benchmark, reduced
time spent in user space. I guess that means that patch hurt
something ... though I'm confused ... wouldn't these patches risk
spending more time in system space, not less in user space?

Do you have any analysis of the other runs? Just looking at raw
numbers, when it's not a benchmark I've used recently, kinda fuzzes
over my feeble brain.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401
From: Matt H. <mat...@us...> - 2006-09-30 00:01:27
On Fri, 2006-09-29 at 13:13 -0700, Paul Jackson wrote:
> Matt wrote:
> > Heh, sorry about that. I do have some initial kernbench numbers.
>
> Thanks. You mention that one of the patches, Benchmark, reduced
> time spent in user space. I guess that means that patch hurt
> something ... though I'm confused ... wouldn't these patches risk
> spending more time in system space, not less in user space?

I would have thought so too, but it also appears to consistently reduce
time spent in the kernel. This seems to imply that performance improves
once the first task watcher is added. My random guess is that there's a
branch misprediction when no watchers are registered.

My latest results will be more rigorous in that they show what a pure
2.6.18-mm1 run looks like. I've also removed the benchmark patch from
the series of runs. Unfortunately the full series takes approximately
24 hours to run, so it'll be a little while before I have the numbers.

> Do you have any analysis of the other runs? Just looking at raw
> numbers, when it's not a benchmark I've used recently, kinda fuzzes
> over my feeble brain.

Nope, sorry. I'll see what I can put together.

Cheers,
	-Matt Helsley
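[Editor's note: the mechanism under discussion, a read-only table of notification callbacks gathered by the linker from an ELF section, so no locking or list traversal is needed, can be sketched in userspace with GNU ld's automatic `__start_`/`__stop_` section symbols. All names below (`task_watcher`, `notify_task_watchers`, the `task_watchers` section) are illustrative stand-ins, not the kernel patch's actual identifiers, and the sketch assumes a GNU toolchain on an ELF platform.]

```c
/* Sketch: a linker-gathered, read-only table of watcher callbacks.
 * GNU ld defines __start_SECNAME/__stop_SECNAME for any section whose
 * name is a valid C identifier, so entries registered anywhere in the
 * program end up in one contiguous array with no runtime registration. */
#include <stdio.h>

typedef int (*task_watcher_fn)(unsigned long val, void *task);

/* Drop a pointer to fn into the "task_watchers" section. One line of
 * code registers a watcher, much like module_init() marks an initcall. */
#define task_watcher(fn) \
	static const task_watcher_fn __watcher_##fn \
	__attribute__((used, section("task_watchers"))) = fn

/* Bounds of the gathered table, supplied automatically by GNU ld. */
extern const task_watcher_fn __start_task_watchers[];
extern const task_watcher_fn __stop_task_watchers[];

static int seen;	/* test hook: counts watcher invocations */

static int on_fork(unsigned long clone_flags, void *task)
{
	seen++;
	return 0;	/* a nonzero return could make fork fail */
}
task_watcher(on_fork);

/* Walk the table: no locking, no list nodes, just a pointer sweep.
 * With an empty table the loop body never runs, which is where a
 * branch misprediction on the loop condition could plausibly hide. */
static int notify_task_watchers(unsigned long val, void *task)
{
	const task_watcher_fn *w;
	int err;

	for (w = __start_task_watchers; w < __stop_task_watchers; w++) {
		err = (*w)(val, task);
		if (err)
			return err;
	}
	return 0;
}
```

The read-only nature of the table is also why the remaining TODO in the head of this thread, getting ld to mark the section read-only, matters: the data never changes after link time.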
From: Paul J. <pj...@sg...> - 2006-09-30 00:04:14
Matt wrote:
> I would have thought so too, but it also appears to consistently reduce
> time spent in the kernel.

Interesting. Thanks for the continuing good work.

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401