From: Matt H. <mat...@us...> - 2006-11-03 04:28:00
This is version 2 of my Task Watchers patches.

Task watchers calls functions whenever a task forks, execs, changes its
[re][ug]id, or exits. Task watchers is primarily useful to existing kernel
code as a means of making the code in fork and exit more readable. Kernel
code uses these paths by marking a function as a task watcher, much like
modules mark their init functions with module_init(). This improves the
readability of copy_process().

The first patch adds the basic infrastructure of task watchers: notification
function calls in the various paths and a table of function pointers to be
called. It uses an ELF section because parts of the table must be gathered
from all over the kernel code, and using the linker is easier than resolving
and maintaining complex header interdependencies. An ELF table is also ideal
because its "readonly" nature means that no locking nor list traversal is
required.

Subsequent patches adapt existing parts of the kernel to use a task watcher --
typically in the fork, clone, and exit paths:

	FEATURE (notes)					RELEVANT CONFIG VARIABLE
	-----------------------------------------------------------------------
	audit						[ CONFIG_AUDIT ... ]
	semundo						[ CONFIG_SYSVIPC ]
	cpusets						[ CONFIG_CPUSETS ]
	mempolicy					[ CONFIG_NUMA ]
	trace irqflags					[ CONFIG_TRACE_IRQFLAGS ]
	lockdep						[ CONFIG_LOCKDEP ]
	keys (for processes -- not for thread groups)	[ CONFIG_KEYS ]
	process events connector			[ CONFIG_PROC_EVENTS ]

TODO:
	Mark the task watcher table ELF section read-only. I've tried to "fix"
	the .lds files to do this with no success. I'd really appreciate help
	from folks familiar with writing linker scripts.

I'm working on three more patches that add support for creating a task watcher
from within a module using an ELF section. They haven't received as much
attention since I've been focusing on measuring the performance impact of
these patches.
Changes:

since v2 RFC:
	Updated to 2.6.19-rc2-mm2
	Compiled, booted, tested, and benchmarked
	Testing
		Booted with audit=1 profile=2
		Enabled profiling tools
		Enabled auditing
		Ran random syscall test
		IRQ trace and lockdep CONFIG=y not tested
	Benchmarks
		A clone benchmark (try to clone as fast as possible)
			Unrealistic. Shows incremental cost of one task watcher
		A fork benchmark (try to fork as fast as possible)
			Unrealistic. Shows incremental cost of one task watcher
		Kernbench
			Closer to realistic.
		Result summaries follow changelog
		See patches for details
		Fork and clone samples available on request (too large for email)
		Fork and clone benchmark sources will be posted as replies to 00

v2:
	Dropped use of notifier chains
	Dropped per-task watchers
		Can be implemented on top of this
		Still requires notifier chains
	Dropped taskstats conversion
		Parts of taskstats had to move away from the regions of
		copy_process() and do_exit() where task watchers are notified
	Used linker script mechanism suggested by Al Viro
	Created one "list" of watchers per event as requested by Andrew Morton
		No need to multiplex a single function call
		Easier to statically register/unregister watchers: 1 line of code
	val param now used for:
		WATCH_TASK_INIT:  clone_flags
		WATCH_TASK_CLONE: clone_flags
		WATCH_TASK_EXIT:  exit code
		WATCH_TASK_*:     <unused>
	Renamed notify_watchers() to notify_task_watchers()
	Replaced: if (err != 0) --> if (err)
	Added patches converting more "features" to use task watchers
	Added return code handling to WATCH_TASK_INIT
		Return code handling elsewhere didn't seem appropriate since
		there was generally no response necessary
	Fixed process keys free to handle failure in fork as originally
		coded in copy_process
	Added process keys code to watch for [er][ug]id changes

v1:
	Added ability to cause fork to fail with NOTIFY_STOP_MASK
	Added WARN_ON() when watchers cause WATCH_TASK_FREE to stop early
	Moved fork invocation
	Moved exec invocation
	Added current as argument to exec invocation
	Moved exit code assignment
	Added id change invocations (70 insertions)

v0:
	Based on Jes Sorensen's Task Notifiers patches (posted to LSE-Tech)

Benchmark result summaries (sorry, this part is 86 columns):

System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM,
	2.6.19-rc2-mm2 kernel

Clone -- Incremental worst-case costs measured in tasks/second and as a
	 percentage of expected rate

Patch                         1       2       3      4      5       6       7       8       9
--------------------------------------------------------------------------------------
Incremental Cost (tasks/s) -38.12   12.5   -84    25.2  -187.5  -0.5834 -11.36  -125.2  -64.05
Cost Err                   122.3    17.84   67.11  61.03   41.8  34.64    45.53    58.28   53.18
Cost (%)                    -0.2     0.07   -0.5    0.1    -1    -0.004   -0.06    -0.7    -0.4
Cost Err (%)                 0.7     0.1     0.4    0.3     0.2   0.2      0.2      0.3     0.3

Fork -- Incremental worst-case costs measured in tasks/second and as a
	percentage of expected rate

Patch                         1       2       3       4      5      6      7      8      9
--------------------------------------------------------------------------------------
Incremental Cost (tasks/s) -64.58  -35.74  -33.29  -25.8  -139.5  -7.311  -9.2  -131.4  -50.47
Cost Err                    54.09   27.58   41.76   42.47   49.87  60.94  29.72   39.7    40.89
Cost (%)                    -0.3    -0.2    -0.2    -0.1    -0.8   -0.04  -0.05   -0.7    -0.3
Cost Err (%)                 0.3     0.2     0.2     0.2     0.3    0.3    0.2     0.2     0.2

Kernbench Measurements

Patch    Elapsed(s)  User(s)   System(s)  CPU(%)
-        124.406     439.947   46.615     390.700  <-- baseline 2.6.19-rc2-mm2
1        124.353     439.935   46.334     390.400
2        124.234     439.700   46.503     390.800
3        124.248     439.830   46.258     390.700
4        124.357     439.753   46.582     390.600
5        124.333     439.787   46.491     390.700
6        124.532     439.732   46.497     389.900
7        124.359     439.756   46.457     390.300
8        124.272     439.643   46.320     390.500
9        124.400     439.787   46.485     390.300
Mean:    124.349     439.787   46.454     390.490
Stddev:    0.087641    0.095917   0.115309   0.272641

Kernbench -- Incremental costs

Patch    Elapsed(s)  User(s)   System(s)  CPU(%)
1        -0.053      -0.012    -0.281     -0.3
2        -0.119      -0.235     0.169      0.4
3         0.014       0.130    -0.245     -0.1
4         0.109      -0.077     0.324     -0.1
5        -0.024       0.034    -0.091      0.1
6         0.199      -0.055     0.006     -0.8
7        -0.173       0.024    -0.040      0.4
8        -0.087      -0.113    -0.137      0.2
9         0.128       0.144     0.165     -0.2
Mean:     0.005875   -0.0185    0.018875  -0.0125
Stddev:   0.13094     0.12738   0.1877     0.39074

Andrew, please consider these patches for 2.6.20's -mm tree.

Cheers,
	-Matt Helsley

--
From: Matt H. <mat...@us...> - 2006-11-03 04:27:58
Change audit to register a task watcher function rather than modify the
copy_process() and do_exit() paths directly.

Removes an unlikely() hint from kernel/exit.c:

	if (unlikely(tsk->audit_context))
		audit_free(tsk);

This use of unlikely() is an artifact of audit_free()'s former invocation from
__put_task_struct() (commit: fa84cb935d4ec601528f5e2f0d5d31e7876a5044).
Clearly in the __put_task_struct() path it would be called much more
frequently than in do_exit(), and hence the use of unlikely() there was
justified. However, in the new location the hint most likely offers no
measurable performance impact.

Signed-off-by: Matt Helsley <mat...@us...>
Cc: Al Viro <vi...@ze...>
Cc: Steve Grubb <sg...@re...>
Cc: lin...@re...

---

 include/linux/audit.h |    4 ----
 kernel/auditsc.c      |   10 +++++++---
 kernel/exit.c         |    3 ---
 kernel/fork.c         |    7 +------
 4 files changed, 8 insertions(+), 16 deletions(-)

Benchmark results:

System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM,
	2.6.19-rc2-mm2 kernel

Clone -- Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    18053.2   18361.2   18474.4   18462     18594.7   18557.4
Dev       315.856   316.881   318.787   312.425   304.193   291.819
Err (%)     1.74958   1.72582   1.72557   1.69226   1.63592   1.57252

Fork -- Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    18008     18186     18400.6   18433.1   18481.1   18502.8
Dev       305.299   309.41    315.108   298.683   310.504   338.734
Err (%)     1.69536   1.70136   1.71248   1.62036   1.68011   1.83071

Kernbench: Elapsed: 124.234s  User: 439.7s  System: 46.503s  CPU: 390.8%

439.67user 46.48system 2:04.11elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.46system 2:03.71elapsed 393%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.47system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.64system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.46system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.50system 2:04.35elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.49system 2:04.39elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.66user 46.61system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.46system 2:04.57elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.46system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/auditsc.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/auditsc.c
+++ linux-2.6.19-rc2-mm2/kernel/auditsc.c
@@ -63,10 +63,11 @@
 #include <linux/list.h>
 #include <linux/tty.h>
 #include <linux/selinux.h>
 #include <linux/binfmts.h>
 #include <linux/syscalls.h>
+#include <linux/task_watchers.h>
 
 #include "audit.h"
 
 extern struct list_head audit_filter_list[];
@@ -677,11 +678,11 @@ static inline struct audit_context *audi
  * Filter on the task information and allocate a per-task audit context
  * if necessary.  Doing so turns on system call auditing for the
  * specified task.  This is called from copy_process, so no lock is
  * needed.
  */
-int audit_alloc(struct task_struct *tsk)
+static int audit_alloc(unsigned long val, struct task_struct *tsk)
 {
 	struct audit_context *context;
 	enum audit_state state;
 
 	if (likely(!audit_enabled))
@@ -703,10 +704,11 @@ int audit_alloc(struct task_struct *tsk)
 	tsk->audit_context  = context;
 	set_tsk_thread_flag(tsk, TIF_SYSCALL_AUDIT);
 	return 0;
 }
+task_watcher_func(init, audit_alloc);
 
 static inline void audit_free_context(struct audit_context *context)
 {
 	struct audit_context *previous;
 	int count = 0;
@@ -1035,28 +1037,30 @@ static void audit_log_exit(struct audit_
  * audit_free - free a per-task audit context
  * @tsk: task whose audit context block to free
  *
  * Called from copy_process and do_exit
  */
-void audit_free(struct task_struct *tsk)
+static int audit_free(unsigned long val, struct task_struct *tsk)
 {
 	struct audit_context *context;
 
 	context = audit_get_context(tsk, 0, 0);
 	if (likely(!context))
-		return;
+		return 0;
 
 	/* Check for system calls that do not go through the exit
 	 * function (e.g., exit_group), then free context block.
 	 * We use GFP_ATOMIC here because we might be doing this
	 * in the context of the idle thread */
	/* that can happen only if we are called from do_exit() */
 	if (context->in_syscall && context->auditable)
		audit_log_exit(context, tsk);
 
 	audit_free_context(context);
+	return 0;
 }
+task_watcher_func(free, audit_free);
 
 /**
  * audit_syscall_entry - fill in an audit record at syscall entry
  * @tsk: task being audited
  * @arch: architecture type
Index: linux-2.6.19-rc2-mm2/include/linux/audit.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/audit.h
+++ linux-2.6.19-rc2-mm2/include/linux/audit.h
@@ -332,12 +332,10 @@ struct mqstat;
 extern int  __init audit_register_class(int class, unsigned *list);
 extern int  audit_classify_syscall(int abi, unsigned syscall);
 #ifdef CONFIG_AUDITSYSCALL
 /* These are defined in auditsc.c */
 				/* Public API */
-extern int  audit_alloc(struct task_struct *task);
-extern void audit_free(struct task_struct *task);
 extern void audit_syscall_entry(int arch, int major, unsigned long a0,
 				unsigned long a1, unsigned long a2,
 				unsigned long a3);
 extern void audit_syscall_exit(int failed, long return_code);
 extern void __audit_getname(const char *name);
@@ -432,12 +430,10 @@ static inline int audit_mq_getsetattr(mq
 		return __audit_mq_getsetattr(mqdes, mqstat);
 	return 0;
 }
 extern int audit_n_rules;
 #else
-#define audit_alloc(t) ({ 0; })
-#define audit_free(t) do { ; } while (0)
 #define audit_syscall_entry(ta,a,b,c,d,e) do { ; } while (0)
 #define audit_syscall_exit(f,r) do { ; } while (0)
 #define audit_dummy_context() 1
 #define audit_getname(n) do { ; } while (0)
 #define audit_putname(n) do { ; } while (0)
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -37,11 +37,10 @@
 #include <linux/jiffies.h>
 #include <linux/futex.h>
 #include <linux/rcupdate.h>
 #include <linux/ptrace.h>
 #include <linux/mount.h>
-#include <linux/audit.h>
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
@@ -1095,15 +1094,13 @@ static struct task_struct *copy_process(
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
-	if ((retval = audit_alloc(p)))
-		goto bad_fork_cleanup_security;
 	/* copy all the process information */
 	if ((retval = copy_semundo(clone_flags, p)))
-		goto bad_fork_cleanup_audit;
+		goto bad_fork_cleanup_security;
 	if ((retval = copy_files(clone_flags, p)))
 		goto bad_fork_cleanup_semundo;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
 	if ((retval = copy_sighand(clone_flags, p)))
@@ -1274,12 +1271,10 @@ bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
 bad_fork_cleanup_semundo:
 	exit_sem(p);
-bad_fork_cleanup_audit:
-	audit_free(p);
 bad_fork_cleanup_security:
 	security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -37,11 +37,10 @@
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
-#include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
 #include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
@@ -912,12 +911,10 @@ fastcall NORET_TYPE void do_exit(long co
 		exit_robust_list(tsk);
 #if defined(CONFIG_FUTEX) && defined(CONFIG_COMPAT)
 	if (unlikely(tsk->compat_robust_list))
 		compat_exit_robust_list(tsk);
 #endif
-	if (unlikely(tsk->audit_context))
-		audit_free(tsk);
 	taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
 	taskstats_exit_free(tidstats);
 
 	exit_mm(tsk);
 
 	notify_task_watchers(WATCH_TASK_FREE, code, tsk);

--
From: Matt H. <mat...@us...> - 2006-11-03 04:28:00
Make the semaphore undo code use a task watcher instead of hooking into
copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <mat...@us...>

---

 include/linux/sem.h |   17 -----------------
 ipc/sem.c           |   12 ++++++++----
 kernel/exit.c       |    3 ---
 kernel/fork.c       |    6 +-----
 4 files changed, 9 insertions(+), 29 deletions(-)

Benchmark results:

System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM,
	2.6.19-rc2-mm2 kernel

Clone -- Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    17960.5   18169.3   18408.2   18479.9   18515.6   18465.4
Dev       305.381   314.209   292.395   284.992   299.331   295.311
Err (%)     1.70029   1.72934   1.5884    1.54217   1.61664   1.59927

Fork -- Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    18050.2   18141.4   18316.2   18386.2   18441.9   18476.2
Dev       295.68    312.922   296.962   298.81    300.985   294.046
Err (%)     1.63809   1.72491   1.62131   1.62519   1.63207   1.59149

Kernbench: Elapsed: 124.272s  User: 439.643s  System: 46.32s  CPU: 390.5%

439.64user 46.25system 2:04.46elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.70user 46.27system 2:04.04elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.64user 46.31system 2:04.18elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.49user 46.27system 2:04.41elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.55user 46.47system 2:04.32elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.29system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.61user 46.31system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.31system 2:04.02elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.49system 2:04.59elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.59user 46.23system 2:03.98elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/ipc/sem.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/ipc/sem.c
+++ linux-2.6.19-rc2-mm2/ipc/sem.c
@@ -81,10 +81,11 @@
 #include <linux/audit.h>
 #include <linux/capability.h>
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
 #include <linux/nsproxy.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include "util.h"
 
 #define sem_ids(ns)	(*((ns)->ids[IPC_SEM_IDS]))
@@ -1288,11 +1289,11 @@ asmlinkage long sys_semop (int semid, st
  * See the notes above unlock_semundo() regarding the spin_lock_init()
  * in this code.  Initialize the undo_list->lock here instead of get_undo_list()
  * because of the reasoning in the comment above unlock_semundo.
  */
-int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
+static int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
 {
 	struct sem_undo_list *undo_list;
 	int error;
 
 	if (clone_flags & CLONE_SYSVSEM) {
@@ -1304,10 +1305,11 @@ int copy_semundo(unsigned long clone_fla
 	} else
 		tsk->sysvsem.undo_list = NULL;
 
 	return 0;
 }
+task_watcher_func(init, copy_semundo);
 
 /*
  * add semadj values to semaphores, free undo structures.
  * undo structures are not freed when semaphore arrays are destroyed
  * so some of them may be out of date.
@@ -1317,22 +1319,22 @@ int copy_semundo(unsigned long clone_fla
  * should we queue up and wait until we can do so legally?
  * The original implementation attempted to do this (queue and wait).
  * The current implementation does not do so. The POSIX standard
  * and SVID should be consulted to determine what behavior is mandated.
  */
-void exit_sem(struct task_struct *tsk)
+static int exit_sem(unsigned long ignored, struct task_struct *tsk)
 {
 	struct sem_undo_list *undo_list;
 	struct sem_undo *u, **up;
 	struct ipc_namespace *ns;
 
 	undo_list = tsk->sysvsem.undo_list;
 	if (!undo_list)
-		return;
+		return 0;
 
 	if (!atomic_dec_and_test(&undo_list->refcnt))
-		return;
+		return 0;
 
 	ns = tsk->nsproxy->ipc_ns;
 	/* There's no need to hold the semundo list lock, as current
 	 * is the last task exiting for this undo list.
 	 */
@@ -1395,11 +1397,13 @@ found:
 		update_queue(sma);
 next_entry:
 		sem_unlock(sma);
 	}
 	kfree(undo_list);
+	return 0;
 }
+task_watcher_func(free, exit_sem);
 
 #ifdef CONFIG_PROC_FS
 static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
 {
 	struct sem_array *sma = it;
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -46,12 +46,10 @@
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
 
-extern void sem_exit (void);
-
 static void exit_mm(struct task_struct * tsk);
 
 static void __unhash_process(struct task_struct *p)
 {
 	nr_threads--;
@@ -919,11 +917,10 @@ fastcall NORET_TYPE void do_exit(long co
 	exit_mm(tsk);
 
 	notify_task_watchers(WATCH_TASK_FREE, code, tsk);
 	if (group_dead)
 		acct_process();
-	exit_sem(tsk);
 	__exit_files(tsk);
 	__exit_fs(tsk);
 	exit_thread();
 	cpuset_exit(tsk);
 	exit_keys(tsk);
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1095,14 +1095,12 @@ static struct task_struct *copy_process(
 #endif
 
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	/* copy all the process information */
-	if ((retval = copy_semundo(clone_flags, p)))
-		goto bad_fork_cleanup_security;
 	if ((retval = copy_files(clone_flags, p)))
-		goto bad_fork_cleanup_semundo;
+		goto bad_fork_cleanup_security;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
 	if ((retval = copy_sighand(clone_flags, p)))
 		goto bad_fork_cleanup_fs;
 	if ((retval = copy_signal(clone_flags, p)))
@@ -1269,12 +1267,10 @@ bad_fork_cleanup_sighand:
 	__cleanup_sighand(p->sighand);
 bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
-bad_fork_cleanup_semundo:
-	exit_sem(p);
 bad_fork_cleanup_security:
 	security_task_free(p);
 bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
Index: linux-2.6.19-rc2-mm2/include/linux/sem.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/sem.h
+++ linux-2.6.19-rc2-mm2/include/linux/sem.h
@@ -136,25 +136,8 @@ struct sem_undo_list {
 
 struct sysv_sem {
 	struct sem_undo_list *undo_list;
 };
 
-#ifdef CONFIG_SYSVIPC
-
-extern int copy_semundo(unsigned long clone_flags, struct task_struct *tsk);
-extern void exit_sem(struct task_struct *tsk);
-
-#else
-static inline int copy_semundo(unsigned long clone_flags, struct task_struct *tsk)
-{
-	return 0;
-}
-
-static inline void exit_sem(struct task_struct *tsk)
-{
-	return;
-}
-#endif
-
 #endif /* __KERNEL__ */
 
 #endif /* _LINUX_SEM_H */

--
From: Matt H. <mat...@us...> - 2006-11-03 04:28:00
Register an irq-flag-tracing task watcher instead of hooking into
copy_process().

Signed-off-by: Matt Helsley <mat...@us...>

---

 kernel/fork.c       |   19 -------------------
 kernel/irq/handle.c |   24 ++++++++++++++++++++++++
 2 files changed, 24 insertions(+), 19 deletions(-)

Benchmark results:

System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM,
	2.6.19-rc2-mm2 kernel

Clone -- Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    17826.5   18077.4   18160.1   18263.6   18343     18350.8
Dev       305.841   306.331   283.323   284.761   292.732   292.882
Err (%)     1.71565   1.69455   1.56014   1.55917   1.59588   1.59602

Fork -- Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    17813.5   18062.4   18140.5   18246.7   18237.8   18275.2
Dev       305.816   294.914   294.779   294.727   323.996   300.176
Err (%)     1.71677   1.63275   1.62498   1.61523   1.77651   1.64253

Kernbench: Elapsed: 124.4s  User: 439.787s  System: 46.485s  CPU: 390.3%

439.70user 46.43system 2:04.64elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.92user 46.38system 2:04.47elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.62system 2:04.44elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.83user 46.46system 2:04.29elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.47system 2:04.12elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.83user 46.49system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.42system 2:04.41elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.70user 46.64system 2:04.30elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.47system 2:04.76elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.47system 2:04.47elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,29 +1052,10 @@ static struct task_struct *copy_process(
 	p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_TRACE_IRQFLAGS
-	p->irq_events = 0;
-#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
-	p->hardirqs_enabled = 1;
-#else
-	p->hardirqs_enabled = 0;
-#endif
-	p->hardirq_enable_ip = 0;
-	p->hardirq_enable_event = 0;
-	p->hardirq_disable_ip = _THIS_IP_;
-	p->hardirq_disable_event = 0;
-	p->softirqs_enabled = 1;
-	p->softirq_enable_ip = _THIS_IP_;
-	p->softirq_enable_event = 0;
-	p->softirq_disable_ip = 0;
-	p->softirq_disable_event = 0;
-	p->hardirq_context = 0;
-	p->softirq_context = 0;
-#endif
 #ifdef CONFIG_LOCKDEP
 	p->lockdep_depth = 0; /* no locks held yet */
 	p->curr_chain_key = 0;
 	p->lockdep_recursion = 0;
 #endif
Index: linux-2.6.19-rc2-mm2/kernel/irq/handle.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/irq/handle.c
+++ linux-2.6.19-rc2-mm2/kernel/irq/handle.c
@@ -13,10 +13,11 @@
 #include <linux/irq.h>
 #include <linux/module.h>
 #include <linux/random.h>
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
+#include <linux/task_watchers.h>
 
 #include "internals.h"
 
 /**
  * handle_bad_irq - handle spurious and unhandled irqs
@@ -266,6 +267,29 @@ void early_init_irq_lock_class(void)
 	for (i = 0; i < NR_IRQS; i++)
 		lockdep_set_class(&irq_desc[i].lock, &irq_desc_lock_class);
 }
 
+static int init_task_trace_irqflags(unsigned long clone_flags,
+				    struct task_struct *p)
+{
+	p->irq_events = 0;
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+	p->hardirqs_enabled = 1;
+#else
+	p->hardirqs_enabled = 0;
+#endif
+	p->hardirq_enable_ip = 0;
+	p->hardirq_enable_event = 0;
+	p->hardirq_disable_ip = _THIS_IP_;
+	p->hardirq_disable_event = 0;
+	p->softirqs_enabled = 1;
+	p->softirq_enable_ip = _THIS_IP_;
+	p->softirq_enable_event = 0;
+	p->softirq_disable_ip = 0;
+	p->softirq_disable_event = 0;
+	p->hardirq_context = 0;
+	p->softirq_context = 0;
+	return 0;
+}
+task_watcher_func(init, init_task_trace_irqflags);
 #endif

--
From: Matt H. <mat...@us...> - 2006-11-03 04:28:01
Make the process events connector use task watchers instead of hooking the
paths it's interested in.

Signed-off-by: Matt Helsley <mat...@us...>

---

 drivers/connector/cn_proc.c |   52 +++++++++++++++++++++++++++++-----------
 fs/exec.c                   |    1 
 include/linux/cn_proc.h     |   21 -----------------
 kernel/exit.c               |    2 -
 kernel/fork.c               |    2 -
 kernel/sys.c                |    9 -------
 6 files changed, 37 insertions(+), 50 deletions(-)

Benchmark results:

System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM,
	2.6.19-rc2-mm2 kernel

Clone -- Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    17602.2   17876.7   17977.4   18075.5   18134.3   18151.5
Dev       291.294   376.373   277.882   288.971   278.25    276.3
Err (%)     1.65487   2.10539   1.54573   1.59869   1.53439   1.52219

Fork -- Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean    17691.1   17770.9   17932.6   17996     18096.4   18142.9
Dev       300.692   291.913   296.654   279.183   290.228   284.693
Err (%)     1.69968   1.64265   1.65428   1.55136   1.60379   1.56917

Kernbench: Elapsed: 124.359s  User: 439.756s  System: 46.457s  CPU: 390.3%

439.87user 46.42system 2:04.44elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.68user 46.42system 2:04.15elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.72user 46.64system 2:04.40elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.81user 46.42system 2:03.92elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.39system 2:04.48elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.66user 46.41system 2:04.70elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.59system 2:04.42elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.97user 46.46system 2:04.45elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.62user 46.40system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.73user 46.42system 2:04.30elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/drivers/connector/cn_proc.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/drivers/connector/cn_proc.c
+++ linux-2.6.19-rc2-mm2/drivers/connector/cn_proc.c
@@ -25,10 +25,11 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 #include <linux/ktime.h>
 #include <linux/init.h>
 #include <linux/connector.h>
+#include <linux/task_watchers.h>
 #include <asm/atomic.h>
 
 #include <linux/cn_proc.h>
 
 #define CN_PROC_MSG_SIZE (sizeof(struct cn_msg) + sizeof(struct proc_event))
@@ -44,19 +45,20 @@ static inline void get_seq(__u32 *ts, in
 	*ts = get_cpu_var(proc_event_counts)++;
 	*cpu = smp_processor_id();
 	put_cpu_var(proc_event_counts);
 }
 
-void proc_fork_connector(struct task_struct *task)
+static int proc_fork_connector(unsigned long clone_flags,
+			       struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 	struct timespec ts;
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -70,21 +72,24 @@ void proc_fork_connector(struct task_str
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 	/*  If cn_netlink_send() failed, the data is not sent */
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
 }
+task_watcher_func(clone, proc_fork_connector);
 
-void proc_exec_connector(struct task_struct *task)
+static int proc_exec_connector(unsigned long ignore,
+			       struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	struct timespec ts;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
@@ -95,21 +100,23 @@ void proc_exec_connector(struct task_str
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
 }
+task_watcher_func(exec, proc_exec_connector);
 
-void proc_id_connector(struct task_struct *task, int which_id)
+static int process_change_id(unsigned long which_id, struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 	struct timespec ts;
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	ev->what = which_id;
 	ev->event_data.id.process_pid = task->pid;
@@ -119,47 +126,64 @@ void proc_id_connector(struct task_struc
 		ev->event_data.id.e.euid = task->euid;
 	} else if (which_id == PROC_EVENT_GID) {
 		ev->event_data.id.r.rgid = task->gid;
 		ev->event_data.id.e.egid = task->egid;
 	} else
-		return;
+		return 0;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	ev->timestamp_ns = timespec_to_ns(&ts);
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
+}
+
+static int proc_change_uid_connector(unsigned long ignore,
+				     struct task_struct *task)
+{
+	return process_change_id(PROC_EVENT_UID, task);
+}
+task_watcher_func(uid, proc_change_uid_connector);
+
+static int proc_change_gid_connector(unsigned long ignore,
+				     struct task_struct *task)
+{
+	return process_change_id(PROC_EVENT_GID, task);
 }
+task_watcher_func(gid, proc_change_gid_connector);
 
-void proc_exit_connector(struct task_struct *task)
+static int proc_exit_connector(unsigned long code, struct task_struct *task)
 {
 	struct cn_msg *msg;
 	struct proc_event *ev;
 	__u8 buffer[CN_PROC_MSG_SIZE];
 	struct timespec ts;
 
 	if (atomic_read(&proc_event_num_listeners) < 1)
-		return;
+		return 0;
 
 	msg = (struct cn_msg*)buffer;
 	ev = (struct proc_event*)msg->data;
 	get_seq(&msg->seq, &ev->cpu);
 	ktime_get_ts(&ts); /* get high res monotonic timestamp */
 	ev->timestamp_ns = timespec_to_ns(&ts);
 	ev->what = PROC_EVENT_EXIT;
 	ev->event_data.exit.process_pid = task->pid;
 	ev->event_data.exit.process_tgid = task->tgid;
-	ev->event_data.exit.exit_code = task->exit_code;
+	ev->event_data.exit.exit_code = code;
 	ev->event_data.exit.exit_signal = task->exit_signal;
 	memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
 	msg->ack = 0; /* not used */
 	msg->len = sizeof(*ev);
 
 	cn_netlink_send(msg, CN_IDX_PROC, GFP_KERNEL);
+	return 0;
 }
+task_watcher_func(exit, proc_exit_connector);
 
 /*
  * Send an acknowledgement message to userspace
  *
  * Use 0 for success, EFOO otherwise.
@@ -226,14 +250,12 @@ static void cn_proc_mcast_ctl(void *data
  */
 static int __init cn_proc_init(void)
 {
 	int err;
 
-	if ((err = cn_add_callback(&cn_proc_event_id, "cn_proc",
-				   &cn_proc_mcast_ctl))) {
+	err = cn_add_callback(&cn_proc_event_id, "cn_proc", &cn_proc_mcast_ctl);
+	if (err)
 		printk(KERN_WARNING "cn_proc failed to register\n");
-		return err;
-	}
-	return 0;
+	return err;
 }
 module_init(cn_proc_init);
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -40,11 +40,10 @@
 #include <linux/mount.h>
 #include <linux/profile.h>
 #include <linux/rmap.h>
 #include <linux/acct.h>
 #include <linux/tsacct_kern.h>
-#include <linux/cn_proc.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 #include <linux/task_watchers.h>
 
@@ -1212,11 +1211,10 @@ static struct task_struct *copy_process(
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
 	notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
-	proc_fork_connector(p);
 	return p;
 
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -30,11 +30,10 @@
 #include <linux/taskstats_kern.h>
 #include <linux/delayacct.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
 #include <linux/posix-timers.h>
-#include <linux/cn_proc.h>
 #include <linux/mutex.h>
 #include <linux/futex.h>
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/resource.h>
@@ -927,11 +926,10 @@ fastcall NORET_TYPE void do_exit(long co
 		module_put(task_thread_info(tsk)->exec_domain->module);
 	if (tsk->binfmt)
 		module_put(tsk->binfmt->module);
 
 	tsk->exit_code = code;
-	proc_exit_connector(tsk);
 	exit_notify(tsk);
 	exit_task_namespaces(tsk);
 	/*
 	 * This must happen late, after the PID is not
 	 * hashed anymore:
Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -25,11 +25,10 @@
 #include <linux/security.h>
 #include <linux/dcookies.h>
 #include <linux/suspend.h>
 #include <linux/tty.h>
 #include <linux/signal.h>
-#include <linux/cn_proc.h>
 #include <linux/getcpu.h>
 #include <linux/seccomp.h>
 #include <linux/task_watchers.h>
 #include <linux/compat.h>
 
@@ -957,11 +956,10 @@ asmlinkage long sys_setregid(gid_t rgid,
 	    (egid != (gid_t) -1 && egid != old_rgid))
 		current->sgid = new_egid;
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
-	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 /*
@@ -992,11 +990,10 @@ asmlinkage long sys_setgid(gid_t gid)
 		current->egid = current->fsgid = gid;
 	}
 	else
 		return -EPERM;
 
-	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 static int set_user(uid_t new_ruid, int dumpclear)
@@ -1080,11 +1077,10 @@ asmlinkage long sys_setreuid(uid_t ruid,
 	if (ruid != (uid_t) -1 ||
 	    (euid != (uid_t) -1 && euid != old_ruid))
 		current->suid = current->euid;
 	current->fsuid = current->euid;
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
 					 LSM_SETID_RE);
 }
@@ -1127,11 +1123,10 @@ asmlinkage long sys_setuid(uid_t uid)
 		smp_wmb();
 	}
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
 					 LSM_SETID_ID);
 }
@@ -1175,11 +1170,10 @@ asmlinkage long sys_setresuid(uid_t ruid
 	}
 	current->fsuid = current->euid;
 	if (suid != (uid_t) -1)
 		current->suid = suid;
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
 					 LSM_SETID_RES);
 }
@@ -1227,11 +1221,10 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (rgid != (gid_t) -1)
 		current->gid = rgid;
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 
-	proc_id_connector(current, PROC_EVENT_GID);
 	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
@@ -1268,11 +1261,10 @@ asmlinkage long sys_setfsuid(uid_t uid)
 			smp_wmb();
 		}
 		current->fsuid = uid;
 	}
 
-	proc_id_connector(current, PROC_EVENT_UID);
 	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
 
 	return old_fsuid;
@@ -1295,11 +1287,10 @@ asmlinkage long sys_setfsgid(gid_t gid)
 		if (gid != old_fsgid) {
 			current->mm->dumpable = suid_dumpable;
 			smp_wmb();
 		}
 		current->fsgid = gid;
-		proc_id_connector(current, PROC_EVENT_GID);
 		notify_task_watchers(WATCH_TASK_GID, 0, current);
 	}
 	return old_fsgid;
 }
Index: linux-2.6.19-rc2-mm2/fs/exec.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/exec.c
+++
linux-2.6.19-rc2-mm2/fs/exec.c @@ -1086,11 +1086,10 @@ int search_binary_handler(struct linux_b fput(bprm->file); bprm->file = NULL; current->did_exec = 1; notify_task_watchers(WATCH_TASK_EXEC, 0, current); - proc_exec_connector(current); return retval; } read_lock(&binfmt_lock); put_binfmt(fmt); if (retval != -ENOEXEC || bprm->mm == NULL) Index: linux-2.6.19-rc2-mm2/include/linux/cn_proc.h =================================================================== --- linux-2.6.19-rc2-mm2.orig/include/linux/cn_proc.h +++ linux-2.6.19-rc2-mm2/include/linux/cn_proc.h @@ -95,27 +95,6 @@ struct proc_event { __u32 exit_code, exit_signal; } exit; } event_data; }; -#ifdef __KERNEL__ -#ifdef CONFIG_PROC_EVENTS -void proc_fork_connector(struct task_struct *task); -void proc_exec_connector(struct task_struct *task); -void proc_id_connector(struct task_struct *task, int which_id); -void proc_exit_connector(struct task_struct *task); -#else -static inline void proc_fork_connector(struct task_struct *task) -{} - -static inline void proc_exec_connector(struct task_struct *task) -{} - -static inline void proc_id_connector(struct task_struct *task, - int which_id) -{} - -static inline void proc_exit_connector(struct task_struct *task) -{} -#endif /* CONFIG_PROC_EVENTS */ -#endif /* __KERNEL__ */ #endif /* CN_PROC_H */ -- |
From: Matt H. <mat...@us...> - 2006-11-03 04:28:02
|
Register a task watcher for lockdep instead of hooking into copy_process().

Signed-off-by: Matt Helsley <mat...@us...>

---
 kernel/fork.c    |    5 -----
 kernel/lockdep.c |    9 +++++++++
 2 files changed, 9 insertions(+), 5 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone
                          Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean      17808.2   18092.3   18215.5   18183.6   18310.8   18342.8
Dev       302.333   317.786   303.385   280.608   281.378   294.009
Err (%)   1.69772   1.75647   1.66553   1.5432    1.53668   1.60285

Fork
                          Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean      17821.8   18025.1   18112.5   18226     18217.4   18318
Dev       316.497   310.195   291.372   297.166   364.908   293.89
Err (%)   1.7759    1.7209    1.60868   1.63045   2.00307   1.60438

Kernbench: Elapsed: 124.333s User: 439.787s System: 46.491s CPU: 390.7%
439.67user 46.42system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.46system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.75user 46.65system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.43system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.43system 2:04.56elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.51system 2:04.45elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.86user 46.64system 2:04.69elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.44system 2:04.05elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.87user 46.48system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.45system 2:03.91elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,15 +1052,10 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_LOCKDEP
-	p->lockdep_depth = 0; /* no locks held yet */
-	p->curr_chain_key = 0;
-	p->lockdep_recursion = 0;
-#endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif

Index: linux-2.6.19-rc2-mm2/kernel/lockdep.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/lockdep.c
+++ linux-2.6.19-rc2-mm2/kernel/lockdep.c
@@ -2556,10 +2556,19 @@ void __init lockdep_init(void)
 		INIT_LIST_HEAD(chainhash_table + i);
 
 	lockdep_initialized = 1;
 }
 
+static int init_task_lockdep(unsigned long clone_flags, struct task_struct *p)
+{
+	p->lockdep_depth = 0; /* no locks held yet */
+	p->curr_chain_key = 0;
+	p->lockdep_recursion = 0;
+	return 0;
+}
+task_watcher_func(init, init_task_lockdep);
+
 void __init lockdep_info(void)
 {
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
 	printk("... MAX_LOCKDEP_SUBCLASSES: %lu\n", MAX_LOCKDEP_SUBCLASSES);

--
From: Matt H. <mat...@us...> - 2006-11-03 04:28:04
|
Associate function calls with significant events in a task's lifetime, much like we handle kernel and module init/exit functions. This creates a table for each of the following events in the task_watchers_table ELF section:

WATCH_TASK_INIT   at the beginning of a fork/clone system call, when the new
                  task struct first becomes available.
WATCH_TASK_CLONE  just before returning successfully from a fork/clone.
WATCH_TASK_EXEC   just before successfully returning from the exec system call.
WATCH_TASK_UID    every time a task's real or effective user id changes.
WATCH_TASK_GID    every time a task's real or effective group id changes.
WATCH_TASK_EXIT   at the beginning of do_exit(), when a task is exiting for
                  any reason.
WATCH_TASK_FREE   before critical task structures like the mm_struct become
                  inaccessible and the task is subsequently freed.

The next patch will add a debugfs interface for measuring fork and exit rates, which can be used to calculate the overhead of the task watcher infrastructure. Subsequent patches make use of task watchers to simplify fork, exit, and many of the system calls that set [er][ug]ids.

Signed-off-by: Matt Helsley <mat...@us...>
Cc: Andrew Morton <ak...@os...>
Cc: Jes Sorensen <je...@sg...>
Cc: Chandra S. Seetharaman <sek...@us...>
Cc: Christoph Hellwig <hc...@ls...>
Cc: Al Viro <vi...@ze...>
Cc: Steve Grubb <sg...@re...>
Cc: lin...@re...
Cc: Paul Jackson <pj...@sg...>

---
 fs/exec.c                         |    3 +++
 include/asm-generic/vmlinux.lds.h |   19 +++++++++++++++++++
 include/linux/task_watchers.h     |   31 +++++++++++++++++++++++++++++++
 kernel/Makefile                   |    2 +-
 kernel/exit.c                     |    3 +++
 kernel/fork.c                     |   15 +++++++++++----
 kernel/sys.c                      |    9 +++++++++
 kernel/task_watchers.c            |   37 +++++++++++++++++++++++++++++++++++++
 8 files changed, 114 insertions(+), 5 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone
                          Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean      18058.4   18323.3   18465.9   18439.5   18574.5   18566.3
Dev       325.705   306.322   316.464   291.979   287.531   281.275
Err (%)   1.80362   1.67176   1.71378   1.58345   1.54799   1.51498

Fork
                          Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean      18074     18199.8   18399.7   18482.5   18504.6   18565.5
Dev       331.876   315.515   302.402   309.314   300.937   309.168
Err (%)   1.83621   1.73361   1.64351   1.67356   1.62628   1.66528

Kernbench: Elapsed: 124.353s User: 439.935s System: 46.334s CPU: 390.4%
440.61user 46.24system 2:04.35elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
440.27user 46.21system 2:04.81elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
440.78user 46.70system 2:04.39elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.91user 46.35system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.28system 2:04.39elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.67user 46.27system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.63user 46.29system 2:04.01elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.49user 46.48system 2:04.67elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.63user 46.25system 2:04.34elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.56user 46.27system 2:04.13elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/sys.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/sys.c
+++ linux-2.6.19-rc2-mm2/kernel/sys.c
@@ -28,10 +28,11 @@
 #include <linux/tty.h>
 #include <linux/signal.h>
 #include <linux/cn_proc.h>
 #include <linux/getcpu.h>
 #include <linux/seccomp.h>
+#include <linux/task_watchers.h>
 #include <linux/compat.h>
 #include <linux/syscalls.h>
 #include <linux/kprobes.h>
@@ -958,10 +959,11 @@ asmlinkage long sys_setregid(gid_t rgid,
 	current->fsgid = new_egid;
 	current->egid = new_egid;
 	current->gid = new_rgid;
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 /*
  * setgid() is implemented like SysV w/ SAVED_IDS
@@ -993,10 +995,11 @@ asmlinkage long sys_setgid(gid_t gid)
 	else
 		return -EPERM;
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 static int set_user(uid_t new_ruid, int dumpclear)
 {
@@ -1081,10 +1084,11 @@ asmlinkage long sys_setreuid(uid_t ruid,
 		current->suid = current->euid;
 	current->fsuid = current->euid;
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
 					 LSM_SETID_RE);
 }
@@ -1128,10 +1132,11 @@ asmlinkage long sys_setuid(uid_t uid)
 	current->fsuid = current->euid = uid;
 	current->suid = new_suid;
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
 					 LSM_SETID_ID);
 }
@@ -1176,10 +1181,11 @@ asmlinkage long sys_setresuid(uid_t ruid
 	if (suid != (uid_t) -1)
 		current->suid = suid;
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	return security_task_post_setuid(old_ruid, old_euid, old_suid,
 					 LSM_SETID_RES);
 }
 
 asmlinkage long sys_getresuid(uid_t __user *ruid, uid_t __user *euid, uid_t __user *suid)
@@ -1228,10 +1234,11 @@ asmlinkage long sys_setresgid(gid_t rgid
 	if (sgid != (gid_t) -1)
 		current->sgid = sgid;
 	key_fsgid_changed(current);
 	proc_id_connector(current, PROC_EVENT_GID);
+	notify_task_watchers(WATCH_TASK_GID, 0, current);
 	return 0;
 }
 
 asmlinkage long sys_getresgid(gid_t __user *rgid, gid_t __user *egid, gid_t __user *sgid)
 {
@@ -1269,10 +1276,11 @@ asmlinkage long sys_setfsuid(uid_t uid)
 		current->fsuid = uid;
 	}
 
 	key_fsuid_changed(current);
 	proc_id_connector(current, PROC_EVENT_UID);
+	notify_task_watchers(WATCH_TASK_UID, 0, current);
 
 	security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS);
 
 	return old_fsuid;
 }
@@ -1296,10 +1304,11 @@ asmlinkage long sys_setfsgid(gid_t gid)
 			smp_wmb();
 		}
 		current->fsgid = gid;
 		key_fsgid_changed(current);
 		proc_id_connector(current, PROC_EVENT_GID);
+		notify_task_watchers(WATCH_TASK_GID, 0, current);
 	}
 	return old_fsgid;
 }
 
 asmlinkage long sys_times(struct tms __user * tbuf)

Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -40,10 +40,11 @@
 #include <linux/compat.h>
 #include <linux/pipe_fs_i.h>
 #include <linux/audit.h> /* for audit_free() */
 #include <linux/resource.h>
 #include <linux/blkdev.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include <asm/pgtable.h>
 #include <asm/mmu_context.h>
@@ -885,10 +886,11 @@ fastcall NORET_TYPE void do_exit(long co
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
 	}
 
 	tsk->flags |= PF_EXITING;
+	notify_task_watchers(WATCH_TASK_EXIT, code, tsk);
 
 	if (unlikely(in_atomic()))
 		printk(KERN_INFO "note: %s[%d] exited with preempt_count %d\n",
 				current->comm, current->pid,
 				preempt_count());
@@ -916,10 +918,11 @@ fastcall NORET_TYPE void do_exit(long co
 		audit_free(tsk);
 	taskstats_exit_send(tsk, tidstats, group_dead, mycpu);
 	taskstats_exit_free(tidstats);
 
 	exit_mm(tsk);
+	notify_task_watchers(WATCH_TASK_FREE, code, tsk);
 
 	if (group_dead)
		acct_process();
 	exit_sem(tsk);
 	__exit_files(tsk);

Index: linux-2.6.19-rc2-mm2/fs/exec.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/fs/exec.c
+++ linux-2.6.19-rc2-mm2/fs/exec.c
@@ -48,10 +48,11 @@
 #include <linux/syscalls.h>
 #include <linux/rmap.h>
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/audit.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
 
 #ifdef CONFIG_KMOD
@@ -1083,10 +1084,12 @@ int search_binary_handler(struct linux_b
 			allow_write_access(bprm->file);
 			if (bprm->file)
 				fput(bprm->file);
 			bprm->file = NULL;
 			current->did_exec = 1;
+			notify_task_watchers(WATCH_TASK_EXEC, 0,
+					     current);
 			proc_exec_connector(current);
 			return retval;
 		}
 		read_lock(&binfmt_lock);
 		put_binfmt(fmt);

Index: linux-2.6.19-rc2-mm2/include/linux/task_watchers.h
===================================================================
--- /dev/null
+++ linux-2.6.19-rc2-mm2/include/linux/task_watchers.h
@@ -0,0 +1,31 @@
+#ifndef _TASK_WATCHERS_H
+#define _TASK_WATCHERS_H
+#include <linux/sched.h>
+
+#define WATCH_TASK_INIT		0
+#define WATCH_TASK_CLONE	1
+#define WATCH_TASK_EXEC		2
+#define WATCH_TASK_UID		3
+#define WATCH_TASK_GID		4
+#define WATCH_TASK_EXIT		5
+#define WATCH_TASK_FREE		6
+#define NUM_WATCH_TASK_EVENTS	7
+
+#ifndef MODULE
+typedef int (*task_watcher_fn)(unsigned long, struct task_struct*);
+
+/*
+ * Watch for events occurring within a task and call the supplied function
+ * when (and only when) the given event happens.
+ * Only non-modular kernel code may register functions as task watchers.
+ */
+#define task_watcher_func(ev, fn) \
+static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
+	__attribute__ ((__section__ (".task_watchers." #ev))) = fn
+#else
+#error "task_watcher_func() macro may not be used in modules."
+#endif
+
+extern int notify_task_watchers(unsigned int ev_idx, unsigned long val,
+				struct task_struct *tsk);
+#endif /* _TASK_WATCHERS_H */

Index: linux-2.6.19-rc2-mm2/kernel/task_watchers.c
===================================================================
--- /dev/null
+++ linux-2.6.19-rc2-mm2/kernel/task_watchers.c
@@ -0,0 +1,37 @@
+#include <linux/task_watchers.h>
+
+/* Defined in include/asm-generic/vmlinux.lds.h */
+extern const task_watcher_fn __start_task_watchers_init[],
+	__start_task_watchers_clone[], __start_task_watchers_exec[],
+	__start_task_watchers_uid[], __start_task_watchers_gid[],
+	__start_task_watchers_exit[], __start_task_watchers_free[],
+	__stop_task_watchers_free[];
+
+/*
+ * Table of ptrs to the first watcher func for each WATCH_TASK_* event
+ */
+static const task_watcher_fn *twtable[] = {
+	__start_task_watchers_init,
+	__start_task_watchers_clone,
+	__start_task_watchers_exec,
+	__start_task_watchers_uid,
+	__start_task_watchers_gid,
+	__start_task_watchers_exit,
+	__start_task_watchers_free,
+	__stop_task_watchers_free,
+};
+
+int notify_task_watchers(unsigned int ev, unsigned long val,
+			 struct task_struct *tsk)
+{
+	const task_watcher_fn *tw_call;
+	int ret_err = 0, err;
+
+	/* Call all of the watchers; report the first error */
+	for (tw_call = twtable[ev]; tw_call < twtable[ev + 1]; tw_call++) {
+		err = (*tw_call)(val, tsk);
+		if (unlikely(err < 0 && ret_err == 0))
+			ret_err = err;
+	}
+	return ret_err;
+}

Index: linux-2.6.19-rc2-mm2/kernel/Makefile
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/Makefile
+++ linux-2.6.19-rc2-mm2/kernel/Makefile
@@ -6,11 +6,11 @@
 obj-y     = sched.o fork.o exec_domain.o exit.o itimer.o time.o softirq.o resource.o \
 	    sysctl.o capability.o ptrace.o timer.o user.o \
 	    signal.o sys.o kmod.o workqueue.o pid.o \
 	    rcupdate.o extable.o params.o posix-timers.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
-	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o
+	    hrtimer.o rwsem.o latency.o nsproxy.o srcu.o task_watchers.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_LOCKDEP) += lockdep.o

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -46,10 +46,11 @@
 #include <linux/tsacct_kern.h>
 #include <linux/cn_proc.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
+#include <linux/task_watchers.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
 #include <asm/mmu_context.h>
@@ -1045,10 +1046,18 @@ static struct task_struct *copy_process(
 	do_posix_clock_monotonic_gettime(&p->start_time);
 	p->security = NULL;
 	p->io_context = NULL;
 	p->io_wait = NULL;
 	p->audit_context = NULL;
+
+	p->tgid = p->pid;
+	if (clone_flags & CLONE_THREAD)
+		p->tgid = current->tgid;
+
+	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
+	if (retval < 0)
+		goto bad_fork_cleanup_delays_binfmt;
 	cpuset_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_copy(p->mempolicy);
 	if (IS_ERR(p->mempolicy)) {
 		retval = PTR_ERR(p->mempolicy);
@@ -1084,14 +1093,10 @@ static struct task_struct *copy_process(
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
-	p->tgid = p->pid;
-	if (clone_flags & CLONE_THREAD)
-		p->tgid = current->tgid;
-
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	if ((retval = audit_alloc(p)))
 		goto bad_fork_cleanup_security;
 	/* copy all the process information */
@@ -1248,10 +1253,11 @@ static struct task_struct *copy_process(
 	}
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
 	write_unlock_irq(&tasklist_lock);
+	notify_task_watchers(WATCH_TASK_CLONE, clone_flags, p);
 	proc_fork_connector(p);
 	return p;
 
bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
@@ -1280,10 +1286,11 @@ bad_fork_cleanup_policy:
bad_fork_cleanup_cpuset:
 #endif
 	cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
+	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
bad_fork_cleanup_put_domain:
 	module_put(task_thread_info(p)->exec_domain->module);
bad_fork_cleanup_count:

Index: linux-2.6.19-rc2-mm2/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/asm-generic/vmlinux.lds.h
+++ linux-2.6.19-rc2-mm2/include/asm-generic/vmlinux.lds.h
@@ -42,10 +42,29 @@
 		VMLINUX_SYMBOL(__start_rio_route_ops) = .;	\
 		*(.rio_route_ops)				\
 		VMLINUX_SYMBOL(__end_rio_route_ops) = .;	\
 	}							\
 								\
+	.task_watchers_table : AT(ADDR(.task_watchers_table) - LOAD_OFFSET) { \
+		*(.task_watchers_table)				\
+		VMLINUX_SYMBOL(__start_task_watchers_init) = .;	\
+		*(.task_watchers.init)				\
+		VMLINUX_SYMBOL(__start_task_watchers_clone) = .; \
+		*(.task_watchers.clone)				\
+		VMLINUX_SYMBOL(__start_task_watchers_exec) = .;	\
+		*(.task_watchers.exec)				\
+		VMLINUX_SYMBOL(__start_task_watchers_uid) = .;	\
+		*(.task_watchers.uid)				\
+		VMLINUX_SYMBOL(__start_task_watchers_gid) = .;	\
+		*(.task_watchers.gid)				\
+		VMLINUX_SYMBOL(__start_task_watchers_exit) = .;	\
+		*(.task_watchers.exit)				\
+		VMLINUX_SYMBOL(__start_task_watchers_free) = .;	\
+		*(.task_watchers.free)				\
+		VMLINUX_SYMBOL(__stop_task_watchers_free) = .;	\
+	}							\
+								\
 	/* Kernel symbol table: Normal symbols */		\
 	__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start___ksymtab) = .;		\
 		*(__ksymtab)					\
 		VMLINUX_SYMBOL(__stop___ksymtab) = .;		\
 	}							\

--
From: Matt H. <mat...@us...> - 2006-11-03 04:28:03
|
Register a task watcher for cpusets instead of hooking into copy_process() and do_exit() directly.

Signed-off-by: Matt Helsley <mat...@us...>
Cc: Paul Jackson <pj...@sg...>

---
 include/linux/cpuset.h |    4 ----
 kernel/cpuset.c        |    7 +++++--
 kernel/exit.c          |    2 --
 kernel/fork.c          |    6 +-----
 4 files changed, 6 insertions(+), 13 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone
                          Number of Children Cloned
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean      18023.8   18243.8   18485.1   18422.9   18469.4   18505.1
Dev       317.163   297.266   298.965   288.518   294.607   290.491
Err (%)   1.75969   1.6294    1.61733   1.56608   1.59511   1.56979

Fork
                          Number of Children Forked
          5000      7500      10000     12500     15000     17500
---------------------------------------------------------------------------------------
Mean      17950.9   18149.7   18283     18409.3   18414.1   18450.3
Dev       310.206   300.925   297.458   290.673   298.75    301.009
Err (%)   1.72808   1.65802   1.62696   1.57895   1.6224    1.63146

Kernbench: Elapsed: 124.248s User: 439.83s System: 46.258s CPU: 390.7%
439.80user 46.26system 2:04.53elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.20system 2:04.29elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.42system 2:04.37elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.88user 46.16system 2:04.36elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.21system 2:03.72elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.93user 46.21system 2:03.90elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.88user 46.25system 2:04.67elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.79user 46.38system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.90user 46.25system 2:04.09elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.77user 46.24system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k

Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -28,11 +28,10 @@
 #include <linux/mman.h>
 #include <linux/fs.h>
 #include <linux/nsproxy.h>
 #include <linux/capability.h>
 #include <linux/cpu.h>
-#include <linux/cpuset.h>
 #include <linux/security.h>
 #include <linux/swap.h>
 #include <linux/syscalls.h>
 #include <linux/jiffies.h>
 #include <linux/futex.h>
@@ -1053,17 +1052,16 @@ static struct task_struct *copy_process(
 		p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-	cpuset_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_copy(p->mempolicy);
 	if (IS_ERR(p->mempolicy)) {
 		retval = PTR_ERR(p->mempolicy);
 		p->mempolicy = NULL;
-		goto bad_fork_cleanup_cpuset;
+		goto bad_fork_cleanup_delays_binfmt;
 	}
 	mpol_fix_fork_child_flag(p);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
@@ -1272,13 +1270,11 @@ bad_fork_cleanup_files:
bad_fork_cleanup_security:
 	security_task_free(p);
bad_fork_cleanup_policy:
 #ifdef CONFIG_NUMA
 	mpol_free(p->mempolicy);
-bad_fork_cleanup_cpuset:
 #endif
-	cpuset_exit(p);
bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
 	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);

Index: linux-2.6.19-rc2-mm2/kernel/cpuset.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/cpuset.c
+++ linux-2.6.19-rc2-mm2/kernel/cpuset.c
@@ -47,10 +47,11 @@
 #include <linux/stat.h>
 #include <linux/string.h>
 #include <linux/time.h>
 #include <linux/backing-dev.h>
 #include <linux/sort.h>
+#include <linux/task_watchers.h>
 
 #include <asm/uaccess.h>
 #include <asm/atomic.h>
 #include <linux/mutex.h>
@@ -2172,17 +2173,18 @@ void __init cpuset_init_smp(void)
  *
  * At the point that cpuset_fork() is called, 'current' is the parent
  * task, and the passed argument 'child' points to the child task.
  **/
-void cpuset_fork(struct task_struct *child)
+static int cpuset_fork(unsigned long clone_flags, struct task_struct *child)
 {
 	task_lock(current);
 	child->cpuset = current->cpuset;
 	atomic_inc(&child->cpuset->count);
 	task_unlock(current);
+	return 0;
 }
+task_watcher_func(init, cpuset_fork);
 
 /**
  * cpuset_exit - detach cpuset from exiting task
  * @tsk: pointer to task_struct of exiting process
  *
@@ -2239,11 +2241,11 @@ void cpuset_fork(struct task_struct *chi
  * to NULL here, and check in cpuset_update_task_memory_state()
  * for a NULL pointer.  This hack avoids that NULL check, for no
  * cost (other than this way too long comment ;).
  **/
-void cpuset_exit(struct task_struct *tsk)
+static int cpuset_exit(unsigned long exit_code, struct task_struct *tsk)
 {
 	struct cpuset *cs;
 
 	cs = tsk->cpuset;
 	tsk->cpuset = &top_cpuset; /* the_top_cpuset_hack - see above */
@@ -2258,10 +2260,11 @@ void cpuset_exit(struct task_struct *tsk
 		cpuset_release_agent(pathbuf);
 	} else {
 		atomic_dec(&cs->count);
 	}
+	return 0;
 }
+task_watcher_func(free, cpuset_exit);
 
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
 * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
 *

Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -28,11 +28,10 @@
 #include <linux/mount.h>
 #include <linux/proc_fs.h>
 #include <linux/mempolicy.h>
 #include <linux/taskstats_kern.h>
 #include <linux/delayacct.h>
-#include <linux/cpuset.h>
 #include <linux/syscalls.h>
 #include <linux/signal.h>
 #include <linux/posix-timers.h>
 #include <linux/cn_proc.h>
 #include <linux/mutex.h>
@@ -920,11 +919,10 @@ fastcall NORET_TYPE void do_exit(long co
 	if (group_dead)
 		acct_process();
 	__exit_files(tsk);
 	__exit_fs(tsk);
 	exit_thread();
-	cpuset_exit(tsk);
 	exit_keys(tsk);
 
 	if (group_dead && tsk->signal->leader)
 		disassociate_ctty(1);

Index: linux-2.6.19-rc2-mm2/include/linux/cpuset.h
===================================================================
--- linux-2.6.19-rc2-mm2.orig/include/linux/cpuset.h
+++ linux-2.6.19-rc2-mm2/include/linux/cpuset.h
@@ -17,12 +17,10 @@ extern int number_of_cpusets; /* How many cpusets are defined in system? */
 
 extern int cpuset_init_early(void);
 extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
-extern void cpuset_fork(struct task_struct *p);
-extern void cpuset_exit(struct task_struct *p);
 extern cpumask_t cpuset_cpus_allowed(struct task_struct *p);
 extern nodemask_t cpuset_mems_allowed(struct task_struct *p);
 #define cpuset_current_mems_allowed (current->mems_allowed)
 void cpuset_init_current_mems_allowed(void);
 void cpuset_update_task_memory_state(void);
@@ -69,12 +67,10 @@ extern void cpuset_track_online_nodes(vo
 
 #else /* !CONFIG_CPUSETS */
 
 static inline int cpuset_init_early(void) { return 0; }
 static inline int cpuset_init(void) { return 0; }
 static inline void cpuset_init_smp(void) {}
-static inline void cpuset_fork(struct task_struct *p) {}
-static inline void cpuset_exit(struct task_struct *p) {}
 static inline cpumask_t cpuset_cpus_allowed(struct task_struct *p)
 {
 	return cpu_possible_map;
 }

--
From: Matt H. <mat...@us...> - 2006-11-03 04:28:04
|
Register a NUMA mempolicy task watcher instead of hooking into copy_process() and do_exit() directly. Signed-off-by: Matt Helsley <mat...@us...> --- kernel/exit.c | 4 ---- kernel/fork.c | 15 +-------------- mm/mempolicy.c | 24 ++++++++++++++++++++++++ 3 files changed, 25 insertions(+), 18 deletions(-) Benchmark results: System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel Clone Number of Children Cloned 5000 7500 10000 12500 15000 17500 --------------------------------------------------------------------------------------- Mean 17836.3 18085.2 18220.4 18225 18319 18339 Dev 302.801 314.617 303.079 293.46 287.267 294.819 Err (%) 1.69767 1.73963 1.6634 1.6102 1.56814 1.60761 Fork Number of Children Forked 5000 7500 10000 12500 15000 17500 --------------------------------------------------------------------------------------- Mean 17896.2 17990 18100.6 18242.3 18244 18346.9 Dev 301.64 285.698 295.646 304.361 299.472 287.153 Err (%) 1.6855 1.58809 1.63335 1.66844 1.64148 1.56513 Kernbench: Elapsed: 124.532s User: 439.732s System: 46.497s CPU: 389.9% 439.71user 46.48system 2:04.24elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k 439.79user 46.42system 2:05.10elapsed 388%CPU (0avgtext+0avgdata 0maxresident)k 439.74user 46.44system 2:04.60elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k 439.75user 46.64system 2:04.74elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k 439.61user 46.45system 2:05.36elapsed 387%CPU (0avgtext+0avgdata 0maxresident)k 439.60user 46.43system 2:04.33elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k 439.77user 46.47system 2:04.34elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k 439.87user 46.45system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k 439.76user 46.71system 2:04.58elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k 439.72user 46.48system 2:03.93elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k Index: linux-2.6.19-rc2-mm2/mm/mempolicy.c 
===================================================================
--- linux-2.6.19-rc2-mm2.orig/mm/mempolicy.c
+++ linux-2.6.19-rc2-mm2/mm/mempolicy.c
@@ -87,10 +87,11 @@
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
 #include <linux/migrate.h>
 #include <linux/rmap.h>
 #include <linux/security.h>
+#include <linux/task_watchers.h>
 
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
 
 /* Internal flags */
@@ -1333,10 +1334,34 @@ struct mempolicy *__mpol_copy(struct mem
 		}
 	}
 	return new;
 }
 
+static int init_task_mempolicy(unsigned long clone_flags,
+			       struct task_struct *tsk)
+{
+	tsk->mempolicy = mpol_copy(tsk->mempolicy);
+	if (IS_ERR(tsk->mempolicy)) {
+		int retval;
+
+		retval = PTR_ERR(tsk->mempolicy);
+		tsk->mempolicy = NULL;
+		return retval;
+	}
+	mpol_fix_fork_child_flag(tsk);
+	return 0;
+}
+task_watcher_func(init, init_task_mempolicy);
+
+static int free_task_mempolicy(unsigned long ignored, struct task_struct *tsk)
+{
+	mpol_free(tsk->mempolicy);
+	tsk->mempolicy = NULL;
+	return 0;
+}
+task_watcher_func(free, free_task_mempolicy);
+
 /* Slow path of a mempolicy comparison */
 int __mpol_equal(struct mempolicy *a, struct mempolicy *b)
 {
 	if (!a || !b)
 		return 0;
Index: linux-2.6.19-rc2-mm2/kernel/fork.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c
+++ linux-2.6.19-rc2-mm2/kernel/fork.c
@@ -1052,19 +1052,10 @@ static struct task_struct *copy_process(
 	p->tgid = current->tgid;
 
 	retval = notify_task_watchers(WATCH_TASK_INIT, clone_flags, p);
 	if (retval < 0)
 		goto bad_fork_cleanup_delays_binfmt;
-#ifdef CONFIG_NUMA
-	p->mempolicy = mpol_copy(p->mempolicy);
-	if (IS_ERR(p->mempolicy)) {
-		retval = PTR_ERR(p->mempolicy);
-		p->mempolicy = NULL;
-		goto bad_fork_cleanup_delays_binfmt;
-	}
-	mpol_fix_fork_child_flag(p);
-#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
 	p->hardirqs_enabled = 1;
 #else
@@ -1091,11 +1082,11 @@ static struct task_struct *copy_process(
 #ifdef CONFIG_DEBUG_MUTEXES
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
 	if ((retval = security_task_alloc(p)))
-		goto bad_fork_cleanup_policy;
+		goto bad_fork_cleanup_delays_binfmt;
 	/* copy all the process information */
 	if ((retval = copy_files(clone_flags, p)))
 		goto bad_fork_cleanup_security;
 	if ((retval = copy_fs(clone_flags, p)))
 		goto bad_fork_cleanup_files;
@@ -1267,14 +1258,10 @@ bad_fork_cleanup_fs:
 	exit_fs(p); /* blocking */
 bad_fork_cleanup_files:
 	exit_files(p); /* blocking */
 bad_fork_cleanup_security:
 	security_task_free(p);
-bad_fork_cleanup_policy:
-#ifdef CONFIG_NUMA
-	mpol_free(p->mempolicy);
-#endif
 bad_fork_cleanup_delays_binfmt:
 	delayacct_tsk_free(p);
 	notify_task_watchers(WATCH_TASK_FREE, 0, p);
 	if (p->binfmt)
 		module_put(p->binfmt->module);
Index: linux-2.6.19-rc2-mm2/kernel/exit.c
===================================================================
--- linux-2.6.19-rc2-mm2.orig/kernel/exit.c
+++ linux-2.6.19-rc2-mm2/kernel/exit.c
@@ -932,14 +932,10 @@ fastcall NORET_TYPE void do_exit(long co
 	tsk->exit_code = code;
 	proc_exit_connector(tsk);
 	exit_notify(tsk);
 	exit_task_namespaces(tsk);
-#ifdef CONFIG_NUMA
-	mpol_free(tsk->mempolicy);
-	tsk->mempolicy = NULL;
-#endif
 	/*
 	 * This must happen late, after the PID is not
 	 * hashed anymore:
 	 */
 	if (unlikely(!list_empty(&tsk->pi_state_list)))
--
|
From: Matt H. <mat...@us...> - 2006-11-03 04:28:03
|
Make the keyring code use a task watcher to initialize and free per-task data.

NOTE: We can't make copy_thread_group_keys() in copy_signal() a task watcher because it needs the task's signal field (struct signal_struct).

Signed-off-by: Matt Helsley <mat...@us...>
Cc: David Howells <dho...@re...>

---
 include/linux/key.h          |    8 --------
 kernel/exit.c                |    2 --
 kernel/fork.c                |    6 +-----
 kernel/sys.c                 |    8 --------
 security/keys/process_keys.c |   19 ++++++++++++-------
 5 files changed, 13 insertions(+), 30 deletions(-)

Benchmark results:
System: 4 1.7GHz ppc64 (Power 4+) processors, 30968600MB RAM, 2.6.19-rc2-mm2 kernel

Clone
                      Number of Children Cloned
            5000     7500    10000    12500    15000    17500
---------------------------------------------------------------
Mean     17746.8  17923.4  18079.1  18128.9  18182.7  18140.9
Dev      305.931  297.937  287.602  289.916  290.541  278.494
Err (%)  1.72387  1.66228   1.5908   1.5992  1.59789  1.53517

Fork
                      Number of Children Forked
            5000     7500    10000    12500    15000    17500
---------------------------------------------------------------
Mean     17678.6  17872.6  17975.1  18072.5  18166.1  18167.7
Dev      311.175  279.804  293.091  296.378   293.13  292.623
Err (%)  1.76017  1.56555  1.63054  1.63993  1.61361  1.61068

Kernbench:
Elapsed: 124.357s User: 439.753s System: 46.582s CPU: 390.6%

439.90user 46.56system 2:04.09elapsed 392%CPU (0avgtext+0avgdata 0maxresident)k
439.71user 46.48system 2:04.23elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.82user 46.71system 2:04.77elapsed 389%CPU (0avgtext+0avgdata 0maxresident)k
439.67user 46.53system 2:04.31elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.80user 46.55system 2:04.10elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.76user 46.54system 2:04.11elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.85user 46.79system 2:04.17elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
439.65user 46.50system 2:04.63elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k
439.57user 46.55system 2:04.62elapsed 390%CPU
(0avgtext+0avgdata 0maxresident)k 439.80user 46.61system 2:04.54elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k Index: linux-2.6.19-rc2-mm2/include/linux/key.h =================================================================== --- linux-2.6.19-rc2-mm2.orig/include/linux/key.h +++ linux-2.6.19-rc2-mm2/include/linux/key.h @@ -335,18 +335,14 @@ extern void keyring_replace_payload(stru */ extern struct key root_user_keyring, root_session_keyring; extern int alloc_uid_keyring(struct user_struct *user, struct task_struct *ctx); extern void switch_uid_keyring(struct user_struct *new_user); -extern int copy_keys(unsigned long clone_flags, struct task_struct *tsk); extern int copy_thread_group_keys(struct task_struct *tsk); -extern void exit_keys(struct task_struct *tsk); extern void exit_thread_group_keys(struct signal_struct *tg); extern int suid_keys(struct task_struct *tsk); extern int exec_keys(struct task_struct *tsk); -extern void key_fsuid_changed(struct task_struct *tsk); -extern void key_fsgid_changed(struct task_struct *tsk); extern void key_init(void); #define __install_session_keyring(tsk, keyring) \ ({ \ struct key *old_session = tsk->signal->session_keyring; \ @@ -365,18 +361,14 @@ extern void key_init(void); #define key_ref_to_ptr(k) ({ NULL; }) #define is_key_possessed(k) 0 #define alloc_uid_keyring(u,c) 0 #define switch_uid_keyring(u) do { } while(0) #define __install_session_keyring(t, k) ({ NULL; }) -#define copy_keys(f,t) 0 #define copy_thread_group_keys(t) 0 -#define exit_keys(t) do { } while(0) #define exit_thread_group_keys(tg) do { } while(0) #define suid_keys(t) do { } while(0) #define exec_keys(t) do { } while(0) -#define key_fsuid_changed(t) do { } while(0) -#define key_fsgid_changed(t) do { } while(0) #define key_init() do { } while(0) /* Initial keyrings */ extern struct key root_user_keyring; extern struct key root_session_keyring; Index: linux-2.6.19-rc2-mm2/kernel/fork.c =================================================================== 
--- linux-2.6.19-rc2-mm2.orig/kernel/fork.c +++ linux-2.6.19-rc2-mm2/kernel/fork.c @@ -1070,14 +1070,12 @@ static struct task_struct *copy_process( goto bad_fork_cleanup_fs; if ((retval = copy_signal(clone_flags, p))) goto bad_fork_cleanup_sighand; if ((retval = copy_mm(clone_flags, p))) goto bad_fork_cleanup_signal; - if ((retval = copy_keys(clone_flags, p))) - goto bad_fork_cleanup_mm; if ((retval = copy_namespaces(clone_flags, p))) - goto bad_fork_cleanup_keys; + goto bad_fork_cleanup_mm; retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs); if (retval) goto bad_fork_cleanup_namespaces; p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL; @@ -1219,12 +1217,10 @@ static struct task_struct *copy_process( proc_fork_connector(p); return p; bad_fork_cleanup_namespaces: exit_task_namespaces(p); -bad_fork_cleanup_keys: - exit_keys(p); bad_fork_cleanup_mm: if (p->mm) mmput(p->mm); bad_fork_cleanup_signal: cleanup_signal(p); Index: linux-2.6.19-rc2-mm2/security/keys/process_keys.c =================================================================== --- linux-2.6.19-rc2-mm2.orig/security/keys/process_keys.c +++ linux-2.6.19-rc2-mm2/security/keys/process_keys.c @@ -15,10 +15,11 @@ #include <linux/slab.h> #include <linux/keyctl.h> #include <linux/fs.h> #include <linux/err.h> #include <linux/mutex.h> +#include <linux/task_watchers.h> #include <asm/uaccess.h> #include "internal.h" /* session keyring create vs join semaphore */ static DEFINE_MUTEX(key_session_mutex); @@ -276,11 +277,11 @@ int copy_thread_group_keys(struct task_s /*****************************************************************************/ /* * copy the keys for fork */ -int copy_keys(unsigned long clone_flags, struct task_struct *tsk) +static int copy_keys(unsigned long clone_flags, struct task_struct *tsk) { key_check(tsk->thread_keyring); key_check(tsk->request_key_auth); /* no thread keyring yet */ @@ -290,10 +291,11 @@ int copy_keys(unsigned long clone_flags, 
key_get(tsk->request_key_auth); return 0; } /* end copy_keys() */ +task_watcher_func(init, copy_keys); /*****************************************************************************/ /* * dispose of thread group keys upon thread group destruction */ @@ -306,16 +308,17 @@ void exit_thread_group_keys(struct signa /*****************************************************************************/ /* * dispose of per-thread keys upon thread exit */ -void exit_keys(struct task_struct *tsk) +static int exit_keys(unsigned long exit_code, struct task_struct *tsk) { key_put(tsk->thread_keyring); key_put(tsk->request_key_auth); - + return 0; } /* end exit_keys() */ +task_watcher_func(free, exit_keys); /*****************************************************************************/ /* * deal with execve() */ @@ -356,35 +359,37 @@ int suid_keys(struct task_struct *tsk) /*****************************************************************************/ /* * the filesystem user ID changed */ -void key_fsuid_changed(struct task_struct *tsk) +static int key_fsuid_changed(unsigned long ignored, struct task_struct *tsk) { /* update the ownership of the thread keyring */ if (tsk->thread_keyring) { down_write(&tsk->thread_keyring->sem); tsk->thread_keyring->uid = tsk->fsuid; up_write(&tsk->thread_keyring->sem); } - + return 0; } /* end key_fsuid_changed() */ +task_watcher_func(uid, key_fsuid_changed); /*****************************************************************************/ /* * the filesystem group ID changed */ -void key_fsgid_changed(struct task_struct *tsk) +static int key_fsgid_changed(unsigned long ignored, struct task_struct *tsk) { /* update the ownership of the thread keyring */ if (tsk->thread_keyring) { down_write(&tsk->thread_keyring->sem); tsk->thread_keyring->gid = tsk->fsgid; up_write(&tsk->thread_keyring->sem); } - + return 0; } /* end key_fsgid_changed() */ +task_watcher_func(gid, key_fsgid_changed); 
/*****************************************************************************/ /* * search the process keyrings for the first matching key * - we use the supplied match function to see if the description (or other Index: linux-2.6.19-rc2-mm2/kernel/exit.c =================================================================== --- linux-2.6.19-rc2-mm2.orig/kernel/exit.c +++ linux-2.6.19-rc2-mm2/kernel/exit.c @@ -12,11 +12,10 @@ #include <linux/capability.h> #include <linux/completion.h> #include <linux/personality.h> #include <linux/tty.h> #include <linux/mnt_namespace.h> -#include <linux/key.h> #include <linux/security.h> #include <linux/cpu.h> #include <linux/acct.h> #include <linux/tsacct_kern.h> #include <linux/file.h> @@ -919,11 +918,10 @@ fastcall NORET_TYPE void do_exit(long co if (group_dead) acct_process(); __exit_files(tsk); __exit_fs(tsk); exit_thread(); - exit_keys(tsk); if (group_dead && tsk->signal->leader) disassociate_ctty(1); module_put(task_thread_info(tsk)->exec_domain->module); Index: linux-2.6.19-rc2-mm2/kernel/sys.c =================================================================== --- linux-2.6.19-rc2-mm2.orig/kernel/sys.c +++ linux-2.6.19-rc2-mm2/kernel/sys.c @@ -957,11 +957,10 @@ asmlinkage long sys_setregid(gid_t rgid, (egid != (gid_t) -1 && egid != old_rgid)) current->sgid = new_egid; current->fsgid = new_egid; current->egid = new_egid; current->gid = new_rgid; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } @@ -993,11 +992,10 @@ asmlinkage long sys_setgid(gid_t gid) current->egid = current->fsgid = gid; } else return -EPERM; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } @@ -1082,11 +1080,10 @@ asmlinkage long sys_setreuid(uid_t ruid, if (ruid != (uid_t) -1 || (euid != (uid_t) -1 && euid != old_ruid)) current->suid = current->euid; current->fsuid = current->euid; - 
key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RE); } @@ -1130,11 +1127,10 @@ asmlinkage long sys_setuid(uid_t uid) smp_wmb(); } current->fsuid = current->euid = uid; current->suid = new_suid; - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_ID); } @@ -1179,11 +1175,10 @@ asmlinkage long sys_setresuid(uid_t ruid } current->fsuid = current->euid; if (suid != (uid_t) -1) current->suid = suid; - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); return security_task_post_setuid(old_ruid, old_euid, old_suid, LSM_SETID_RES); } @@ -1232,11 +1227,10 @@ asmlinkage long sys_setresgid(gid_t rgid if (rgid != (gid_t) -1) current->gid = rgid; if (sgid != (gid_t) -1) current->sgid = sgid; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); return 0; } @@ -1274,11 +1268,10 @@ asmlinkage long sys_setfsuid(uid_t uid) smp_wmb(); } current->fsuid = uid; } - key_fsuid_changed(current); proc_id_connector(current, PROC_EVENT_UID); notify_task_watchers(WATCH_TASK_UID, 0, current); security_task_post_setuid(old_fsuid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS); @@ -1302,11 +1295,10 @@ asmlinkage long sys_setfsgid(gid_t gid) if (gid != old_fsgid) { current->mm->dumpable = suid_dumpable; smp_wmb(); } current->fsgid = gid; - key_fsgid_changed(current); proc_id_connector(current, PROC_EVENT_GID); notify_task_watchers(WATCH_TASK_GID, 0, current); } return old_fsgid; } -- |
From: Paul J. <pj...@sg...> - 2006-11-03 08:57:59
|
Matt wrote:
> Task watchers is primarily useful to existing kernel code as a means of making
> the code in fork and exit more readable.

I don't get it. The benchmark data isn't explained in plain English anywhere that I could find, so I am just guessing. But looking at the last (17500) column of the fork results, after applying patch 1/9, I see a number of 18565, and looking at that same column in patch 9/9, I see a number of 18142.

I guess that means a drop of ((18565 - 18142) / 18565) == 2% in the fork rate, to make the code "more readable".

And I'm not even sure it makes it more readable. Looks to me like another layer of apparatus, which is one more thing to figure out before a reader understands what is going on.

I'd gladly put in a few long days to improve the fork rate 2%, and I am grateful to those who have already done so - whoever they are.

Somewhere I must have missed the memo explaining why this patch is a good idea - sorry.

--
I won't rest till it's the best ... Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.925.600.0401
|
From: Matt H. <mat...@us...> - 2006-11-03 22:55:14
|
On Fri, 2006-11-03 at 00:57 -0800, Paul Jackson wrote:
> Matt wrote:
> > Task watchers is primarily useful to existing kernel code as a means of making
> > the code in fork and exit more readable.
>
> I don't get it. The benchmark data isn't explained in plain English

Sorry -- there were no units given for the per-patch fork and clone data. The units there are tasks created per second. The kernbench units are in place and should be fairly self-explanatory, I think.

Here's what I did:

	Measure the time it takes to fork N times. Retry 100 times.
	Try different N.
	Try clone instead of fork to see how different the results can be.
	Then run kernbench.
	Do the above after applying each patch.
	Then compare to the previous patch (or unpatched source).
	Run statistics on the numbers.

> what it means, that I could find, so I am just guessing. But looking
> at the last (17500) column of the fork results, after applying patch
> 1/9, I see a number of 18565, and looking at that same column in patch
> 9/9, I see a number of 18142.
>
> I guess that means a drop of ((18565 - 18142) / 18565) == 2% in the fork
> rate, to make the code "more readable".

Well, it's a worst-case scenario. Without the patches I've seen the fork rate intermittently (once every 300 samples) drop to 16k forks/sec -- a much bigger drop than 2%. I also ran the tests on Andrew's hotfix patches for rc2-mm2 and got similar differences even though the patches don't change the fork path. And finally, don't forget to compare that to the error -- about +/-1.6%. So on an absolute worst-case workload we could have a drop anywhere from 0.4 to 3.6%.

To get a better idea of the normal impact of these patches I think you have to look at benchmarks more like kernbench, since it's not comprised entirely of fork calls. There the measurements are easily within the error margins with or without the patches. Unfortunately the differences I get always seem to be right around the size of the error.
I can't seem to get a benchmark to have an error of 1% or less. I'm open to suggestions of different benchmarks or how to obtain tighter bounds on the measurements (e.g. /proc knobs to fiddle with).

> And I'm not even sure it makes it more readable. Looks to me like another
> layer of apparatus, which is one more thing to figure out before a reader
> understands what is going on.

It's nice to see a module's init function with the rest of the module and not cluttering up the kernel's module loading code. The use, benefits, disadvantages, and even the implementation of task watchers are similar. I could rename it (task_init(), task_exit(), etc.) to make the similarity more apparent.

> I'd gladly put in a few long days to improve the fork rate 2%, and I am
> grateful to those who have already done so - whoever they are.

I'm open to suggestions on how to improve the performance. :)

> Somewhere I must have missed the memo explaining why this patch is a
> good idea - sorry.

Well, it should make things look cleaner. It's also intended to be useful in new code like containers and resource management -- pieces many people don't want to pay attention to in those paths.

Cheers,
	-Matt Helsley
|
From: Daniel W. <dw...@mv...> - 2006-11-03 15:22:48
|
On Thu, 2006-11-02 at 20:22 -0800, Matt Helsley wrote:
> +/*
> + * Watch for events occurring within a task and call the supplied function
> + * when (and only when) the given event happens.
> + * Only non-modular kernel code may register functions as task_watchers.
> + */
> +#define task_watcher_func(ev, fn) \
> +static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
> +	__attribute__ ((__section__ (".task_watchers." #ev))) = fn
> +#else
> +#error "task_watcher() macro may not be used in modules."
> +#endif

You should make this TASK_WATCHER_FUNC() or even just TASK_WATCHER(). It looks a little goofy in the code that uses it.

Looking at it now, could you do something like

	static int __task_watcher_init
	audit_alloc(unsigned long val, struct task_struct *tsk)

instead of a macro? Might be a little less invasive.

Daniel
|
From: Matt H. <mat...@us...> - 2006-11-04 00:43:27
|
On Fri, 2006-11-03 at 08:22 -0500, Daniel Walker wrote:
> On Thu, 2006-11-02 at 20:22 -0800, Matt Helsley wrote:
> > +/*
> > + * Watch for events occurring within a task and call the supplied function
> > + * when (and only when) the given event happens.
> > + * Only non-modular kernel code may register functions as task_watchers.
> > + */
> > +#define task_watcher_func(ev, fn) \
> > +static task_watcher_fn __task_watcher_##ev##_##fn __attribute_used__ \
> > +	__attribute__ ((__section__ (".task_watchers." #ev))) = fn
> > +#else
> > +#error "task_watcher() macro may not be used in modules."
> > +#endif
>
> You should make this TASK_WATCHER_FUNC() or even just TASK_WATCHER(). It
> looks a little goofy in the code that uses it.

I can certainly change this. In my defense I didn't capitalize it because very similar macros in init.h were not capitalized. For example:

#define core_initcall(fn)		__define_initcall("1",fn)
#define postcore_initcall(fn)		__define_initcall("2",fn)
#define arch_initcall(fn)		__define_initcall("3",fn)
#define subsys_initcall(fn)		__define_initcall("4",fn)
#define fs_initcall(fn)			__define_initcall("5",fn)
#define device_initcall(fn)		__define_initcall("6",fn)
#define late_initcall(fn)		__define_initcall("7",fn)

setup_param, early_param, module_init, etc. do not use all-caps. And I'm sure that's not all. All of these declare variables and assign them attributes and values.

> Looking at it now could you do something like,
>
>	static int __task_watcher_init
>	audit_alloc(unsigned long val, struct task_struct *tsk)
>
> Instead of a macro? Might be a little less invasive.

I like your suggestion. However, I don't see how such a macro could be made to replace the current macro.

I need to be able to call every init function during task initialization. The current macro creates and initializes a function pointer in an array in the special ELF section.
This allows the notify_task_watchers function to traverse the array and make calls to the init functions.

I use the name of the function and event to name and initialize the function pointer. I don't see any way to get the name of the function without taking a parameter. This also means it would have to be initialized after the function was declared or defined.

I considered placing the function code in the ELF section. However I don't know of any gcc or linker features that would allow me to iterate over all of the functions in an ELF section and call them from fork, exec, exit, etc. I've even looked through the docs and googled.

I considered doing symbol lookups. Part of the problem is knowing the names I need to look up. Furthermore, I think doing symbol lookups for each call would be a lot slower. I could create a dynamically-allocated array and put the lookup results there. However that's more code and more memory...

However, your suggestion could put all of the functions near each other. That locality could improve performance. So I'll try adding __task_watcher_<event> macros, but I can't see a way to make them work as you suggested.

Cheers,
	-Matt Helsley
|
From: Daniel W. <dw...@mv...> - 2006-11-04 01:13:37
|
On Fri, 2006-11-03 at 16:43 -0800, Matt Helsley wrote:
> I can certainly change this. In my defense I didn't capitalize it
> because very similar macros in init.h were not capitalized. For example:
>
> #define core_initcall(fn)		__define_initcall("1",fn)
> #define postcore_initcall(fn)		__define_initcall("2",fn)
> #define arch_initcall(fn)		__define_initcall("3",fn)
> #define subsys_initcall(fn)		__define_initcall("4",fn)
> #define fs_initcall(fn)		__define_initcall("5",fn)
> #define device_initcall(fn)		__define_initcall("6",fn)
> #define late_initcall(fn)		__define_initcall("7",fn)
>
> setup_param, early_param, module_init, etc. do not use all-caps. And I'm
> sure that's not all.

True .. It's not mandatory. The reason that I mentioned it is because it looked like a function was being called outside a function block, which looks odd to me. I think I overlook the initcall functions because I see them so often I know what they are.

> All of these declare variables and assign them attributes and values.
>
> > Looking at it now could you do something like,
> >
> >	static int __task_watcher_init
> >	audit_alloc(unsigned long val, struct task_struct *tsk)
> >
> > Instead of a macro? Might be a little less invasive.
>
> I like your suggestion. However, I don't see how such a macro could be
> made to replace the current macro.
>
> I need to be able to call every init function during task
> initialization. The current macro creates and initializes a function
> pointer in an array in the special ELF section. This allows the
> notify_task_watchers function to traverse the array and make calls to
> the init functions.

You get an "A" for research. I didn't notice you actually declare a variable inside the macro. I thought it was only setting a section attribute. You're right, I don't see how you could call the functions in the section without the variable declared. (Besides, that's exactly how the initcalls work.)

Daniel
|
From: Matt H. <mat...@us...> - 2006-11-05 00:12:54
|
On Fri, 2006-11-03 at 17:13 -0800, Daniel Walker wrote:
> On Fri, 2006-11-03 at 16:43 -0800, Matt Helsley wrote:
>
> > I can certainly change this. In my defense I didn't capitalize it
> > because very similar macros in init.h were not capitalized. For example:
> >
> > #define core_initcall(fn)		__define_initcall("1",fn)
> > #define postcore_initcall(fn)		__define_initcall("2",fn)
> > #define arch_initcall(fn)		__define_initcall("3",fn)
> > #define subsys_initcall(fn)		__define_initcall("4",fn)
> > #define fs_initcall(fn)		__define_initcall("5",fn)
> > #define device_initcall(fn)		__define_initcall("6",fn)
> > #define late_initcall(fn)		__define_initcall("7",fn)
> >
> > setup_param, early_param, module_init, etc. do not use all-caps. And I'm
> > sure that's not all.
>
> True .. It's not mandatory. The reason that I mentioned it is because it
> looked like a function was being called outside a function block, which
> looks odd to me. I think I overlook the initcall functions because I see
> them so often I know what they are.

This is a good point -- it does look odd. I'm considering:

	DEFINE_TASK_INITCALL(audit_alloc);

With others like:

	DEFINE_TASK_EXITCALL()
	DEFINE_TASK_CLONECALL()

etc. That resembles other macros which create variables. Though I'm not sure this pattern is appropriate, because these variables should not be used by name. Seems that no matter what, something about it is going to be unusual. :)

> > All of these declare variables and assign them attributes and values.
> >
> > > Looking at it now could you do something like,
> > >
> > >	static int __task_watcher_init
> > >	audit_alloc(unsigned long val, struct task_struct *tsk)
> > >
> > > Instead of a macro? Might be a little less invasive.
> >
> > I like your suggestion. However, I don't see how such a macro could be
> > made to replace the current macro.
> >
> > I need to be able to call every init function during task
> > initialization.
> > The current macro creates and initializes a function
> > pointer in an array in the special ELF section. This allows the
> > notify_task_watchers function to traverse the array and make calls to
> > the init functions.
>
> You get an "A" for research. I didn't notice you actually declare a

Thanks!

<snip>

Cheers,
	-Matt Helsley
|