From: Kaigai K. <ka...@ak...> - 2005-01-27 12:46:53
|
[1/3] linux-2.6.11-rc2-mm1-pagg.patch
This patch ports linux-2.6.10-pagg.patch-4 to 2.6.11-rc2-mm1. The original PAGG patch does not apply cleanly to the -mm kernel, so I fixed it up.

--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>

diff -rpNU3 linux-2.6.11-rc2-mm1/Documentation/pagg.txt linux-2.6.11-rc2-mm1.pagg/Documentation/pagg.txt --- linux-2.6.11-rc2-mm1/Documentation/pagg.txt 1970-01-01 09:00:00.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/Documentation/pagg.txt 2005-01-25 15:13:18.000000000 +0900 @@ -0,0 +1,32 @@ +Linux Process Aggregates (PAGG) +------------------------------- + +The process aggregates infrastructure, or PAGG, provides a generalized +mechanism for providing arbitrary process groups in Linux. PAGG consists +of a series of functions for registering and unregistering support +for new types of process aggregation containers with the kernel. +This is similar to the support currently provided within Linux that +allows for dynamic support of filesystems, block and character devices, +symbol tables, network devices, serial devices, and execution domains. +This implementation of PAGG provides developers the basic hooks necessary +to implement kernel modules for specific process containers, such as +the job container. + +The do_fork function in the kernel was altered to support PAGG. If a +process is attached to any PAGG containers and subsequently forks a +child process, the child process will also be attached to the same PAGG +containers. The PAGG containers involved during the fork are notified +that a new process has been attached. The notification is accomplished +via a callback function provided by the PAGG module. + +The do_exit function in the kernel has also been altered. If a process +is attached to any PAGG containers and that process is exiting, the PAGG +containers are notified that a process has detached from the container. +The notification is accomplished via a callback function provided by +the PAGG module. 
+ +The sys_execve function has been modified to support an optional callout +that can be run when a process in a pagg list does an exec. It can be +used, for example, by other kernel modules that wish to do advanced CPU +placement on multi-processor systems (just one example). + diff -rpNU3 linux-2.6.11-rc2-mm1/fs/exec.c linux-2.6.11-rc2-mm1.pagg/fs/exec.c --- linux-2.6.11-rc2-mm1/fs/exec.c 2005-01-25 14:56:17.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/fs/exec.c 2005-01-25 15:13:18.000000000 +0900 @@ -49,6 +49,7 @@ #include <linux/rmap.h> #include <linux/acct.h> #include <linux/ltt-events.h> +#include <linux/pagg.h> #include <asm/uaccess.h> #include <asm/mmu_context.h> @@ -1192,6 +1193,7 @@ int do_execve(char * filename, retval = search_binary_handler(bprm,regs); if (retval >= 0) { free_arg_pages(bprm); + pagg_exec(current); /* execve success */ security_bprm_free(bprm); diff -rpNU3 linux-2.6.11-rc2-mm1/include/linux/init_task.h linux-2.6.11-rc2-mm1.pagg/include/linux/init_task.h --- linux-2.6.11-rc2-mm1/include/linux/init_task.h 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/include/linux/init_task.h 2005-01-25 15:37:43.000000000 +0900 @@ -2,6 +2,7 @@ #define _LINUX__INIT_TASK_H #include <linux/file.h> +#include <linux/pagg.h> #define INIT_FILES \ { \ @@ -112,6 +113,7 @@ extern struct group_info init_groups; .switch_lock = SPIN_LOCK_UNLOCKED, \ .journal_info = NULL, \ .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ + INIT_TASK_PAGG(tsk) \ .private_pages = LIST_HEAD_INIT(tsk.private_pages), \ .private_pages_count = 0, \ } diff -rpNU3 linux-2.6.11-rc2-mm1/include/linux/pagg.h linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h --- linux-2.6.11-rc2-mm1/include/linux/pagg.h 1970-01-01 09:00:00.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h 2005-01-25 15:13:18.000000000 +0900 @@ -0,0 +1,223 @@ +/* + * PAGG (Process Aggregates) interface + * + * + * Copyright (c) 2000-2002, 2004 Silicon Graphics, Inc. All Rights Reserved. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + * + * For further information regarding this notice, see: + * + * http://oss.sgi.com/projects/GenInfo/NoticeExplan + */ + +/* + * Data structure definitions and function prototypes used to implement + * process aggregates (paggs). + * + * Paggs provides a generalized way to implement process groupings or + * containers. Modules use these functions to register with the kernel as + * providers of process aggregation containers. The pagg data structures + * define the callback functions and data access pointers back into the + * pagg modules. + */ + +#ifndef _LINUX_PAGG_H +#define _LINUX_PAGG_H + +#include <linux/sched.h> + +#ifdef CONFIG_PAGG + +#define PAGG_NAMELN 32 /* Max chars in PAGG module name */ + + +/** + * INIT_PAGG_LIST - used to initialize a pagg_list structure after declaration + * @_l: Task struct to init the pagg_list and semaphore in + * + */ +#define INIT_PAGG_LIST(_l) \ +do { \ + INIT_LIST_HEAD(&(_l)->pagg_list); \ + init_rwsem(&(_l)->pagg_sem); \ +} while(0) + + +/* + * Used by task_struct to manage list of pagg attachments for the process. 
+ * Each pagg provides the link between the process and the + * correct pagg container. + * + * STRUCT MEMBERS: + * hook: Reference to pagg module structure. That struct + * holds the name key and function pointers. + * data: Opaque data pointer - defined by pagg modules. + * entry: List pointers + */ +struct pagg { + struct pagg_hook *hook; + void *data; + struct list_head entry; +}; + +/* + * Used by pagg modules to define the callback functions into the + * module. + * + * STRUCT MEMBERS: + * name: The name of the pagg container type provided by + * the module. This will be set by the pagg module. + * attach: Function pointer to function used when attaching + * a process to the pagg container referenced by + * this struct. + * Return codes from the attach function pointer have + * These meanings: + * <0 Error which is propagated back to copy_process so + * the fork fails. + * =0 success, attach to same container as parent + * >0 success, but don't attach to a container + * + * detach: Function pointer to function used when detaching + * a process to the pagg container referenced by + * this struct. + * init: Function pointer to initialization function. This + * function is used when the module is loaded to attach + * existing processes to a default container as defined by + * the pagg module. This is optional and may be set to + * NULL if it is not needed by the pagg module. + * + * Note: The return values are managed the same way as in + * attach above. Except, of course, an error doesn't + * result in a fork failure. + * + * Note: The implementation of pagg_hook_register causes + * us to evaluate some tasks more than once in some cases. + * See the comments in pagg_hook_register for why. + * Therefore, if the init function pointer returns >0, + * which means that it doesn't want a pagg association, + * that init function must be prepared to possibly look at + * the same "skipped" task more than once. + * + * data: Opaque data pointer - defined by pagg modules. 
+ * module: Pointer to kernel module struct. Used to increment & + * decrement the use count for the module. + * entry: List pointers + * exec: Function pointer to function used when a process + * in the pagg container exec's a new process. This + * is optional and may be set to NULL if it is not + * needed by the pagg module. + * refcnt: Keep track of user count of the pagg hook + */ +struct pagg_hook { + struct module *module; + char *name; /* Name Key - restricted to 32 characters */ + void *data; /* Opaque module specific data */ + struct list_head entry; /* List pointers */ + atomic_t refcnt; /* usage counter */ + int (*init)(struct task_struct *, struct pagg *); + int (*attach)(struct task_struct *, struct pagg *, void*); + void (*detach)(struct task_struct *, struct pagg *); + void (*exec)(struct task_struct *, struct pagg *); +}; + + +/* Kernel service functions for providing PAGG support */ +extern struct pagg *pagg_get(struct task_struct *task, char *key); +extern struct pagg *pagg_alloc(struct task_struct *task, + struct pagg_hook *pt); +extern void pagg_free(struct pagg *pagg); +extern int pagg_hook_register(struct pagg_hook *pt_new); +extern int pagg_hook_unregister(struct pagg_hook *pt_old); +extern int __pagg_attach(struct task_struct *to_task, + struct task_struct *from_task); +extern void __pagg_detach(struct task_struct *task); +extern int __pagg_exec(struct task_struct *task); + +/** + * pagg_attach - child inherits attachment to pagg containers of its parent + * @child: child task - to inherit + * @parent: parenet task - child inherits pagg containers from this parent + * + * function used when a child process must inherit attachment to pagg + * containers from the parent. Return code is propagated as a fork fail. 
+ * + */ +static inline int pagg_attach(struct task_struct *child, + struct task_struct *parent) +{ + INIT_PAGG_LIST(child); + if (!list_empty(&parent->pagg_list)) + return __pagg_attach(child, parent); + + return 0; +} + + +/** + * pagg_detach - Detach a process from a pagg container it is a member of + * @task: The task the pagg will be detached from + * + */ +static inline void pagg_detach(struct task_struct *task) +{ + if (!list_empty(&task->pagg_list)) + __pagg_detach(task); +} + +/** + * pagg_exec - Used when a process exec's + * @task: The process doing the exec + * + */ +static inline void pagg_exec(struct task_struct *task) +{ + if (!list_empty(&task->pagg_list)) + __pagg_exec(task); +} + +/** + * INIT_TASK_PAGG - Used in INIT_TASK to set the head and sem of pagg_list + * @tsk: The task work with + * + * Marco Used in INIT_TASK to set the head and sem of pagg_list. + * If CONFIG_PAGG is off, it is defined as an empty macro below. + * + */ +#define INIT_TASK_PAGG(tsk) \ + .pagg_list = LIST_HEAD_INIT(tsk.pagg_list), \ + .pagg_sem = __RWSEM_INITIALIZER(tsk.pagg_sem), + +#else /* CONFIG_PAGG */ + +/* + * Replacement macros used when PAGG (Process Aggregates) support is not + * compiled into the kernel. 
+ */ +#define INIT_TASK_PAGG(tsk) +#define INIT_PAGG_LIST(l) do { } while(0) +#define pagg_attach(ct, pt) ({ 0; }) +#define pagg_detach(t) do { } while(0) +#define pagg_exec(t) do { } while(0) + +#endif /* CONFIG_PAGG */ + +#endif /* _LINUX_PAGG_H */ diff -rpNU3 linux-2.6.11-rc2-mm1/include/linux/sched.h linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h --- linux-2.6.11-rc2-mm1/include/linux/sched.h 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h 2005-01-25 15:36:35.000000000 +0900 @@ -729,6 +729,12 @@ struct task_struct { int cpuset_mems_generation; #endif +#ifdef CONFIG_PAGG +/* List of pagg (process aggregate) attachments */ + struct list_head pagg_list; + struct rw_semaphore pagg_sem; +#endif + struct list_head private_pages; /* per-process private pages */ int private_pages_count; }; diff -rpNU3 linux-2.6.11-rc2-mm1/init/Kconfig linux-2.6.11-rc2-mm1.pagg/init/Kconfig --- linux-2.6.11-rc2-mm1/init/Kconfig 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/init/Kconfig 2005-01-25 15:13:18.000000000 +0900 @@ -138,6 +138,14 @@ config BSD_PROCESS_ACCT_V3 for processing it. A preliminary version of these tools is available at <http://www.physik3.uni-rostock.de/tim/kernel/utils/acct/>. +config PAGG + bool "Support for process aggregates (PAGGs)" + help + Say Y here if you will be loading modules which provide support + for process aggregate containers. Examples of such modules include the + Linux Jobs module and the Linux Array Sessions module. If you will not + be using such modules, say N. 
+ config SYSCTL bool "Sysctl support" ---help--- diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/Makefile linux-2.6.11-rc2-mm1.pagg/kernel/Makefile --- linux-2.6.11-rc2-mm1/kernel/Makefile 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/Makefile 2005-01-25 15:13:18.000000000 +0900 @@ -21,6 +21,7 @@ obj-$(CONFIG_KEXEC) += kexec.o obj-$(CONFIG_LTT) += ltt-core.o obj-$(CONFIG_COMPAT) += compat.o obj-$(CONFIG_CPUSETS) += cpuset.o +obj-$(CONFIG_PAGG) += pagg.o obj-$(CONFIG_IKCONFIG) += configs.o obj-$(CONFIG_IKCONFIG_PROC) += configs.o obj-$(CONFIG_STOP_MACHINE) += stop_machine.o diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/exit.c linux-2.6.11-rc2-mm1.pagg/kernel/exit.c --- linux-2.6.11-rc2-mm1/kernel/exit.c 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/exit.c 2005-01-25 15:13:18.000000000 +0900 @@ -29,6 +29,7 @@ #include <linux/cpuset.h> #include <linux/perfctr.h> #include <linux/syscalls.h> +#include <linux/pagg.h> #include <asm/uaccess.h> #include <asm/unistd.h> @@ -837,6 +838,9 @@ fastcall NORET_TYPE void do_exit(long co module_put(tsk->binfmt->module); tsk->exit_code = code; + + pagg_detach(tsk); + exit_notify(tsk); #ifdef CONFIG_NUMA mpol_free(tsk->mempolicy); diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/fork.c linux-2.6.11-rc2-mm1.pagg/kernel/fork.c --- linux-2.6.11-rc2-mm1/kernel/fork.c 2005-01-25 14:56:18.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/fork.c 2005-01-25 15:13:18.000000000 +0900 @@ -42,6 +42,7 @@ #include <linux/rmap.h> #include <linux/acct.h> #include <linux/ltt-events.h> +#include <linux/pagg.h> #include <asm/pgtable.h> #include <asm/pgalloc.h> @@ -131,6 +132,9 @@ void __init fork_init(unsigned long memp init_task.signal->rlim[RLIMIT_NPROC].rlim_cur = max_threads/2; init_task.signal->rlim[RLIMIT_NPROC].rlim_max = max_threads/2; + + /* Initialize the pagg list in pid 0 before it can clone itself. 
*/ + INIT_PAGG_LIST(current); } static struct task_struct *dup_task_struct(struct task_struct *orig) @@ -981,6 +985,15 @@ static task_t *copy_process(unsigned lon sched_fork(p); /* + * call pagg modules to properly attach new process to the same + * process aggregate containers as the parent process. Fail the fork + * on error. + */ + retval = pagg_attach(p, current); + if (retval) + goto bad_fork_cleanup_namespace; + + /* * Ok, make it visible to the rest of the system. * We dont wake it up yet. */ @@ -1087,6 +1100,7 @@ fork_out: return p; bad_fork_cleanup_namespace: + pagg_detach(p); exit_namespace(p); bad_fork_cleanup_keys: exit_keys(p); diff -rpNU3 linux-2.6.11-rc2-mm1/kernel/pagg.c linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c --- linux-2.6.11-rc2-mm1/kernel/pagg.c 1970-01-01 09:00:00.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c 2005-01-25 15:13:18.000000000 +0900 @@ -0,0 +1,496 @@ +/* + * PAGG (Process Aggregates) interface + * + * + * Copyright (c) 2000-2004 Silicon Graphics, Inc. All Rights Reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + * + * Contact information: Silicon Graphics, Inc., 1500 Crittenden Lane, + * Mountain View, CA 94043, or: + * + * http://www.sgi.com + */ + +#include <linux/config.h> +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/module.h> +#include <linux/pagg.h> +#include <asm/semaphore.h> + +/* list of pagg hook entries that reference the "module" implementations */ +static LIST_HEAD(pagg_hook_list); +static DECLARE_RWSEM(pagg_hook_list_sem); + + +/** + * pagg_get - get a pagg given a search key + * @task: We examine the pagg_list from the given task + * @key: Key name of pagg we wish to retrieve + * + * Given a pagg_list list structure, this function will return + * a pointer to the pagg struct that matches the search + * key. If the key is not found, the function will return NULL. + * + * The caller should hold at least a read lock on the pagg_list + * for task using down_read(&task->pagg_list.sem). + * + */ +struct pagg * +pagg_get(struct task_struct *task, char *key) +{ + struct pagg *pagg; + + list_for_each_entry(pagg, &task->pagg_list, entry) { + if (!strcmp(pagg->hook->name,key)) + return pagg; + } + return NULL; +} + + +/** + * pagg_alloc - Insert a new pagg in to the pagg_list for a task + * @task: Task we want to insert the pagg in to + * @pagg_hook: Pagg hook to associate with the new pagg + * + * Given a task and a pagg hook, this function will allocate + * a new pagg structure, initialize the settings, and insert the pagg into + * the pagg_list for the task. + * + * The caller for this function should hold at least a read lock on the + * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be + * removed. If this function was called from the pagg module (usually the + * case), then the caller need not hold this lock. 
The caller should hold + * a write lock on for the tasks pagg_sem. This can be locked using + * down_write(&task->pagg_sem) + * + */ +struct pagg * +pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook) +{ + struct pagg *pagg; + + pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL); + if (!pagg) + return NULL; + + pagg->hook = pagg_hook; + pagg->data = NULL; + atomic_inc(&pagg_hook->refcnt); /* Increase hook's reference count */ + list_add_tail(&pagg->entry, &task->pagg_list); + return pagg; +} + + +/** + * pagg_free - Delete pagg from the list and free its memory + * @pagg: The pagg to free + * + * This function will ensure the pagg is deleted form + * the list of pagg entries for the task. Finally, the memory for the + * pagg is discarded. + * + * The caller of this function should hold a write lock on the pagg_sem + * for the task. This can be locked using down_write(&task->pagg_sem). + * + * Prior to calling pagg_free, the pagg should have been detached from the + * pagg container represented by this pagg. That is usually done using + * p->hook->detach(task, pagg); + * + */ +void +pagg_free(struct pagg *pagg) +{ + atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */ + list_del(&pagg->entry); + kfree(pagg); +} + + +/** + * get_pagg_hook - Get the pagg hook matching the requested name + * @key: The name of the pagg hook to get + * + * Given a pagg hook name key, this function will return a pointer + * to the pagg_hook struct that matches the name. + * + * You should hold either the write or read lock for pagg_hook_list_sem + * before using this function. This will ensure that the pagg_hook_list + * does not change while iterating through the list entries. 
+ * + */ +static struct pagg_hook * +get_pagg_hook(char *key) +{ + struct pagg_hook *pagg_hook; + + list_for_each_entry(pagg_hook, &pagg_hook_list, entry) { + if (!strcmp(pagg_hook->name, key)) { + return pagg_hook; + } + } + return NULL; +} + +/** + * remove_client_paggs_from_all_tasks - Remove all paggs associated with hook + * @php: Pagg hook associated with paggs to purge + * + * Given a pagg hook, this function will remove all paggs associated with that + * pagg hook from all tasks calling the provided function on each pagg. + * + * If there is a detach function associated with the pagg, it is called + * before the pagg is freed. + * + * This is meant to be used by pagg_hook_register and pagg_hook_unregister + * + */ +static void +remove_client_paggs_from_all_tasks(struct pagg_hook *php) +{ + if (php == NULL) + return; + + /* Because of internal race conditions we can't gaurantee + * getting every task in just one pass so we just keep going + * until there are no tasks with paggs from this hook attached. + * The inefficiency of this should be tempered by the fact that this + * happens at most once for each registered client. + */ + while (atomic_read(&php->refcnt) != 0) { + struct task_struct *g = NULL, *p = NULL; + + read_lock(&tasklist_lock); + do_each_thread(g, p) { + struct pagg *paggp; + int task_exited; + + get_task_struct(p); + read_unlock(&tasklist_lock); + down_write(&p->pagg_sem); + paggp = pagg_get(p, php->name); + if (paggp != NULL) { + (void)php->detach(p, paggp); + pagg_free(paggp); + } + up_write(&p->pagg_sem); + read_lock(&tasklist_lock); + + /* If a PAGG got removed from the list while we're going through + * each process, the tasks list for the process would be empty. In + * that case, break out of this for_each_thread so we can do it + * again. 
*/ + task_exited = list_empty(&p->sibling); + put_task_struct(p); + if (task_exited) + goto endloop; + } while_each_thread(g, p); + endloop: + read_unlock(&tasklist_lock); + } +} + +/** + * pagg_hook_register - Register a new pagg hook and enter it the list + * @pagg_hook_new: The new pagg hook to register + * + * Used to register a new pagg hook and enter it into the pagg_hook_list. + * The service name for a pagg hook is restricted to 32 characters. + * + * If an "init()" function is supplied in the hook being registered then a + * pagg will be attached to all existing tasks and the supplied "init()" + * function will be applied to it. If any call to the supplied "init()" + * function returns a non zero result the registration will be aborted. As + * part of the abort process, all paggs belonging to the new client will be + * removed from all tasks and the supplied "detach()" function will be + * called on them. + * + * If a memory error is encountered, the pagg hook is unregistered and any + * tasks that have been attached to the initial pagg container are detached + * from that container. 
+ * + */ +int +pagg_hook_register(struct pagg_hook *pagg_hook_new) +{ + struct pagg_hook *pagg_hook = NULL; + + /* Add new pagg module to access list */ + if (!pagg_hook_new) + return -EINVAL; /* error */ + if (!list_empty(&pagg_hook_new->entry)) + return -EINVAL; /* error */ + if (pagg_hook_new->name == NULL || strlen(pagg_hook_new->name) > PAGG_NAMELN) + return -EINVAL; /* error */ + if (!pagg_hook_new->attach || !pagg_hook_new->detach) + return -EINVAL; /* error */ + + /* Try to insert new hook entry into the pagg hook list */ + down_write(&pagg_hook_list_sem); + + pagg_hook = get_pagg_hook(pagg_hook_new->name); + + if (pagg_hook) { + up_write(&pagg_hook_list_sem); + printk(KERN_WARNING "Attempt to register duplicate" + " PAGG support (name=%s)\n", pagg_hook_new->name); + return -EBUSY; + } + + /* Okay, we can insert into the pagg hook list */ + list_add_tail(&pagg_hook_new->entry, &pagg_hook_list); + /* set the ref count to zero */ + atomic_set(&pagg_hook_new->refcnt, 0); + + /* Now we can call the initializer function (if present) for each task */ + if (pagg_hook_new->init != NULL) { + struct task_struct *g = NULL, *p = NULL; + int init_result = 0; + + /* Because of internal race conditions we can't guarantee + * getting every task in just one pass so we just keep going + * until we don't find any unitialized tasks. The inefficiency + * of this should be tempered by the fact that this happens + * at most once for each registered client. 
+ */ + read_lock(&tasklist_lock); + repeat: + do_each_thread(g, p) { + struct pagg *paggp; + int task_exited; + + get_task_struct(p); + read_unlock(&tasklist_lock); + down_write(&p->pagg_sem); + paggp = pagg_get(p, pagg_hook_new->name); + if (!paggp && !(p->flags & PF_EXITING)) { + paggp = pagg_alloc(p, pagg_hook_new); + if (paggp != NULL) { + init_result = pagg_hook_new->init(p, paggp); + + /* Success, but init function pointer doesn't want grouping */ + if (init_result > 0) + pagg_free(paggp); + } + else + init_result = -ENOMEM; + } + up_write(&p->pagg_sem); + read_lock(&tasklist_lock); + /* Like in remove_client_paggs_from_all_tasks, if the task + * disappeared on us while we were going through the + * for_each_thread loop, we need to start over with that loop. + * That's why we have the list_empty here */ + task_exited = list_empty(&p->sibling); + put_task_struct(p); + if (init_result < 0) + goto endloop; + if (task_exited) + goto repeat; + } while_each_thread(g, p); + endloop: + read_unlock(&tasklist_lock); + + /* + * if anything went wrong during initialisation abandon the + * registration process + */ + if (init_result < 0) { + remove_client_paggs_from_all_tasks(pagg_hook_new); + list_del_init(&pagg_hook_new->entry); + up_write(&pagg_hook_list_sem); + + printk(KERN_WARNING "Registering PAGG support for" + " (name=%s) failed\n", pagg_hook_new->name); + + return init_result; /* hook init function error result */ + } + } + + up_write(&pagg_hook_list_sem); + + printk(KERN_INFO "Registering PAGG support for (name=%s)\n", + pagg_hook_new->name); + + return 0; /* success */ + +} + +/** + * pagg_hook_unregister - Unregister pagg hook and remove it from the list + * @pagg_hook_old: The hook to unregister and remove + * + * Used to unregister pagg hooks and remove them from the pagg_hook_list. + * Once the pagg hook entry in the pagg_hook_list is found, paggs associated + * with the hook (if any) will have their detach function called and will + * be detached. 
+ * + */ +int +pagg_hook_unregister(struct pagg_hook *pagg_hook_old) +{ + struct pagg_hook *pagg_hook; + + /* Check the validity of the arguments */ + if (!pagg_hook_old) + return -EINVAL; /* error */ + if (list_empty(&pagg_hook_old->entry)) + return -EINVAL; /* error */ + if (pagg_hook_old->name == NULL) + return -EINVAL; /* error */ + + down_write(&pagg_hook_list_sem); + + pagg_hook = get_pagg_hook(pagg_hook_old->name); + + if (pagg_hook && pagg_hook == pagg_hook_old) { + remove_client_paggs_from_all_tasks(pagg_hook); + list_del_init(&pagg_hook->entry); + up_write(&pagg_hook_list_sem); + + printk(KERN_INFO "Unregistering PAGG support for" + " (name=%s)\n", pagg_hook_old->name); + + return 0; /* success */ + } + + up_write(&pagg_hook_list_sem); + + printk(KERN_WARNING "Attempt to unregister PAGG support (name=%s)" + " failed - not found\n", pagg_hook_old->name); + + return -EINVAL; /* error */ +} + + +/** + * __pagg_attach - Attach a new task to the same containers of its parent + * @to_task: The child task that will inherit the parent's containers + * @from_task: The parent task + * + * Used to attach a new task to the same pagg containers to which it's parent + * is attached. + * + * The "from" argument is the parent task. The "to" argument is the child + * task. + * + * See the attach decription in linux/include/linux/pagg.h for details on + * how to handle return codes from the attach function pointer. 
+ * + */ +int +__pagg_attach(struct task_struct *to_task, struct task_struct *from_task) +{ + struct pagg *from_pagg; + int ret; + + /* lock the parents pagg_list we are copying from */ + down_read(&from_task->pagg_sem); /* read lock the pagg list */ + + list_for_each_entry(from_pagg, &from_task->pagg_list, entry) { + struct pagg *to_pagg = NULL; + + to_pagg = pagg_alloc(to_task, from_pagg->hook); + if (!to_pagg) { + ret=-ENOMEM; + goto error_return; + } + ret = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data); + + if (ret < 0) { + /* Propagates to copy_process as a fork failure */ + goto error_return; + } + else if (ret > 0) { + /* Success, but attach function pointer doesn't want grouping */ + pagg_free(to_pagg); + } + } + + up_read(&from_task->pagg_sem); /* unlock the pagg list */ + + return 0; /* success */ + + error_return: + /* + * Clean up all the pagg attachments made on behalf of the new + * task. Set new task pagg ptr to NULL for return. + */ + up_read(&from_task->pagg_sem); /* unlock the pagg list */ + __pagg_detach(to_task); + return ret; /* failure */ +} + +/** + * __pagg_detach - Detach a task from all pagg containers it is attached to + * @task: Task to detach from pagg containers + * + * Used to detach a task from all pagg containers to which it is attached. + * + */ +void +__pagg_detach(struct task_struct *task) +{ + struct pagg *pagg; + struct pagg *paggtmp; + + /* Remove ref. to paggs from task immediately */ + down_write(&task->pagg_sem); /* write lock pagg list */ + + list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) { + pagg->hook->detach(task, pagg); + pagg_free(pagg); + } + + up_write(&task->pagg_sem); /* write unlock the pagg list */ + + return; /* 0 = success, else return last code for failure */ +} + + +/** + * __pagg_exec - Execute callback when a process in a container execs + * @task: We go through the pagg list in the given task + * + * Used to when a process that is in a pagg container does an exec. 
+ * + * The "from" argument is the task. The "name" argument is the name + * of the process being exec'ed. + * + */ +int +__pagg_exec(struct task_struct *task) +{ + struct pagg *pagg; + + down_read(&task->pagg_sem); /* lock the pagg list */ + + list_for_each_entry(pagg, &task->pagg_list, entry) { + if (pagg->hook->exec) /* conditional because it's optional */ + pagg->hook->exec(task, pagg); + } + + up_read(&task->pagg_sem); /* unlock the pagg list */ + return 0; +} + + +EXPORT_SYMBOL(pagg_get); +EXPORT_SYMBOL(pagg_alloc); +EXPORT_SYMBOL(pagg_free); +EXPORT_SYMBOL(pagg_hook_register); +EXPORT_SYMBOL(pagg_hook_unregister); |
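The return-code convention that __pagg_attach() applies to each hook's attach callback (negative fails the fork, zero keeps the attachment, positive succeeds without attaching) can be illustrated outside the kernel. The following is only a user-space sketch: every `model_*` name, the singly linked list, and the three sample callbacks are hypothetical stand-ins invented for illustration. They are not part of the patch and do no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct model_pagg;

/* Hypothetical stand-in for struct pagg_hook: a name key plus attach(). */
struct model_hook {
    const char *name;
    /* <0: fail the fork, 0: keep the attachment, >0: succeed, do not attach */
    int (*attach)(struct model_pagg *to, void *parent_data);
};

/* Hypothetical stand-in for struct pagg (plain pointer instead of list_head). */
struct model_pagg {
    const struct model_hook *hook;
    void *data;
    struct model_pagg *next;
};

struct model_task { struct model_pagg *paggs; };

/* Look up an attachment by hook name, as pagg_get() does. */
struct model_pagg *model_get(const struct model_task *t, const char *key)
{
    for (struct model_pagg *p = t->paggs; p; p = p->next)
        if (strcmp(p->hook->name, key) == 0)
            return p;
    return NULL;
}

/* Allocate an attachment and push it at the head of the task's list. */
struct model_pagg *model_alloc(struct model_task *t, const struct model_hook *h)
{
    struct model_pagg *p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->hook = h;
    p->next = t->paggs;
    t->paggs = p;
    return p;
}

/* Drop every attachment of a task (the model has no detach callback). */
void model_detach_all(struct model_task *t)
{
    while (t->paggs) {
        struct model_pagg *p = t->paggs;
        t->paggs = p->next;
        free(p);
    }
}

/* Child inherits the parent's attachments, honoring attach() return codes. */
int model_attach(struct model_task *child, const struct model_task *parent)
{
    child->paggs = NULL;
    for (const struct model_pagg *from = parent->paggs; from; from = from->next) {
        struct model_pagg *to = model_alloc(child, from->hook);
        int ret = to ? from->hook->attach(to, from->data) : -ENOMEM;
        if (ret < 0) {               /* error: undo everything, fail the fork */
            model_detach_all(child);
            return ret;
        }
        if (ret > 0) {               /* success, but hook declines grouping */
            child->paggs = to->next; /* 'to' is the list head here */
            free(to);
        }
    }
    return 0;
}

/* Sample callbacks, one per return-code class. */
int attach_inherit(struct model_pagg *to, void *d) { to->data = d; return 0; }
int attach_skip(struct model_pagg *to, void *d) { (void)to; (void)d; return 1; }
int attach_fail(struct model_pagg *to, void *d) { (void)to; (void)d; return -EPERM; }

/* Runs the three cases; returns 0 when the model behaves as described. */
int model_fork_example(void)
{
    static const struct model_hook job = { "job", attach_inherit };
    static const struct model_hook skip = { "skip", attach_skip };
    static const struct model_hook bad = { "bad", attach_fail };
    struct model_task parent = { NULL }, child = { NULL };
    int value = 7, rc = 0;
    struct model_pagg *pj = model_alloc(&parent, &job);
    struct model_pagg *ps = model_alloc(&parent, &skip);

    if (!pj || !ps) {
        model_detach_all(&parent);
        return 1;
    }
    pj->data = &value;
    if (model_attach(&child, &parent) != 0)
        rc = 2;
    else if (!model_get(&child, "job") || model_get(&child, "job")->data != &value)
        rc = 3;
    else if (model_get(&child, "skip") != NULL)
        rc = 4;
    model_detach_all(&child);
    if (rc == 0) {
        if (!model_alloc(&parent, &bad))
            rc = 5;
        else if (model_attach(&child, &parent) != -EPERM || child.paggs != NULL)
            rc = 6;
    }
    model_detach_all(&child);
    model_detach_all(&parent);
    return rc;
}
```

Calling `model_fork_example()` from a small `main()` exercises all three outcomes. In the patch itself the same decisions are made while holding the parent's pagg_sem for reading, and cleanup goes through each hook's detach callback, which the model leaves out.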
From: Kaigai K. <ka...@ak...> - 2005-01-27 12:40:12
|
Hi,

Erik Jacobson wrote:
> Are any of you using PAGG in open source projects?

Currently, Job (and CSA) is the only known PAGG user. But I think the PAGG framework can serve as a general-purpose fork()/exit() event handling mechanism. CpuSet, for example, is a natural candidate.

> One of the reasons PAGG has had trouble being accepted is because we can't
> point to enough open source users. Here at SGI, we have a few different
> open source packages making use of it. However, only one PAGG user so far
> has gone through community review (Job).
>
> We think we might be able to improve our case for including PAGG in the
> kernel if other open source projects are using PAGG.

Indeed, I tried to implement CpuSet on top of PAGG, and doing so required some modifications to PAGG.

[1/3] linux-2.6.11-rc2-mm1-pagg.patch
This patch ports linux-2.6.10-pagg.patch-4 to 2.6.11-rc2-mm1. The original PAGG patch does not apply cleanly to the -mm kernel, so I fixed it up.

[2/3] linux-2.6.11-rc2-mm1-pagg_on_RCU
When we call pagg_get(), we must hold the task->pagg_sem read semaphore. This makes it difficult to reference a PAGG object in interrupt context or while holding any kind of spinlock. This patch makes it possible to reference a PAGG object without holding that semaphore. (The CpuSet patch needs lockless references.)

[3/3] linux-2.6.11-rc2-mm1-CpuSet_by_PAGG.patch
We can use PAGG as a general-purpose fork()/exit() event handling framework. I think some features, such as CpuSet, fit the PAGG framework well.

We want to be able to use Job (and CSA) or CpuSet without feature-specific patches, so it is important to get the PAGG framework into the stock kernel.

Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>
|
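For readers weighing PAGG as a general-purpose event-hook framework, the checks that pagg_hook_register() in patch [1/3] performs before accepting a hook can be modeled in user space. This is only a sketch under stated assumptions: the `model_*` names, the fixed-size table (the kernel uses a linked list under pagg_hook_list_sem), the -ENOSPC case, and the no-op callbacks are all hypothetical, and the optional init() pass over existing tasks is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NAMELN    32  /* mirrors PAGG_NAMELN in the patch */
#define MODEL_MAX_HOOKS 8   /* model-only limit; the kernel list is unbounded */

/* Hypothetical stand-in for struct pagg_hook, reduced to what is checked. */
struct model_hook {
    const char *name;
    int  (*attach)(void *task);  /* required */
    void (*detach)(void *task);  /* required */
    void (*exec)(void *task);    /* optional, may be NULL */
};

static const struct model_hook *hook_table[MODEL_MAX_HOOKS];
static size_t hook_count;

/* Accept a hook only if it is well formed and its name is not yet taken. */
int model_hook_register(const struct model_hook *h)
{
    if (!h || !h->name || strlen(h->name) > MODEL_NAMELN)
        return -EINVAL;
    if (!h->attach || !h->detach)
        return -EINVAL;
    for (size_t i = 0; i < hook_count; i++)
        if (strcmp(hook_table[i]->name, h->name) == 0)
            return -EBUSY;       /* duplicate name, as in the patch */
    if (hook_count == MODEL_MAX_HOOKS)
        return -ENOSPC;          /* artifact of the fixed table, not in the patch */
    hook_table[hook_count++] = h;
    return 0;
}

/* Remove a hook; it must be the very object that was registered. */
int model_hook_unregister(const struct model_hook *h)
{
    if (!h || !h->name)
        return -EINVAL;
    for (size_t i = 0; i < hook_count; i++) {
        if (hook_table[i] == h) {
            hook_table[i] = hook_table[--hook_count];
            return 0;
        }
    }
    return -EINVAL;              /* not found */
}

/* No-op callbacks so the example hooks pass validation. */
int noop_attach(void *task) { (void)task; return 0; }
void noop_detach(void *task) { (void)task; }

/* Walks through accept, duplicate, malformed, and removal cases. */
int model_registry_example(void)
{
    static const struct model_hook job = { "job", noop_attach, noop_detach, NULL };
    static const struct model_hook dup = { "job", noop_attach, noop_detach, NULL };
    static const struct model_hook broken = { "broken", NULL, noop_detach, NULL };

    if (model_hook_register(&job) != 0)
        return 1;
    if (model_hook_register(&dup) != -EBUSY)
        return 2;
    if (model_hook_register(&broken) != -EINVAL)
        return 3;
    if (model_hook_unregister(&dup) != -EINVAL)
        return 4;
    if (model_hook_unregister(&job) != 0)
        return 5;
    return 0;
}
```

A new PAGG client such as the CpuSet port would need to supply at least attach and detach callbacks and a unique name of at most 32 characters; exec (and init, which the model omits) remain optional.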
From: Kaigai K. <ka...@ak...> - 2005-01-27 12:47:40
|
[2/3] linux-2.6.11-rc2-mm1-pagg_on_RCU
When we call pagg_get(), we must hold the task->pagg_sem read semaphore. This makes it difficult to reference a PAGG object in interrupt context or while holding any kind of spinlock. This patch makes it possible to reference a PAGG object without holding that semaphore; readers only need rcu_read_lock(). (The CpuSet patch needs lockless references.)

Notice:
- task_struct->pagg_sem was replaced by pagg_lock (spinlock_t).
- We must call pagg_get() under rcu_read_lock(); the existence of the returned PAGG object is guaranteed until rcu_read_unlock().
- We must call pagg_alloc() and pagg_free() under spin_lock(&task->pagg_lock) so that updates are serialized.

--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>

diff -rpNU3 linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/pagg.h --- linux-2.6.11-rc2-mm1.pagg/include/linux/pagg.h 2005-01-27 17:02:10.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/pagg.h 2005-01-27 17:09:40.000000000 +0900 @@ -44,6 +44,7 @@ #define _LINUX_PAGG_H #include <linux/sched.h> +#include <linux/rcupdate.h> #ifdef CONFIG_PAGG @@ -57,8 +58,8 @@ */ #define INIT_PAGG_LIST(_l) \ do { \ - INIT_LIST_HEAD(&(_l)->pagg_list); \ - init_rwsem(&(_l)->pagg_sem); \ + INIT_LIST_HEAD(&(_l)->pagg_list); \ + spin_lock_init(&(_l)->pagg_lock); \ } while(0) @@ -74,9 +75,10 @@ do { \ * entry: List pointers */ struct pagg { - struct pagg_hook *hook; - void *data; - struct list_head entry; + struct pagg_hook *hook; + void *data; + struct list_head entry; + struct rcu_head rhead; }; /* @@ -147,52 +149,10 @@ extern struct pagg *pagg_alloc(struct ta extern void pagg_free(struct pagg *pagg); extern int pagg_hook_register(struct pagg_hook *pt_new); extern int pagg_hook_unregister(struct pagg_hook *pt_old); -extern int __pagg_attach(struct task_struct *to_task, +extern int pagg_attach(struct task_struct *to_task, struct task_struct *from_task); -extern void __pagg_detach(struct task_struct *task); 
-extern int __pagg_exec(struct task_struct *task); - -/** - * pagg_attach - child inherits attachment to pagg containers of its parent - * @child: child task - to inherit - * @parent: parenet task - child inherits pagg containers from this parent - * - * function used when a child process must inherit attachment to pagg - * containers from the parent. Return code is propagated as a fork fail. - * - */ -static inline int pagg_attach(struct task_struct *child, - struct task_struct *parent) -{ - INIT_PAGG_LIST(child); - if (!list_empty(&parent->pagg_list)) - return __pagg_attach(child, parent); - - return 0; -} - - -/** - * pagg_detach - Detach a process from a pagg container it is a member of - * @task: The task the pagg will be detached from - * - */ -static inline void pagg_detach(struct task_struct *task) -{ - if (!list_empty(&task->pagg_list)) - __pagg_detach(task); -} - -/** - * pagg_exec - Used when a process exec's - * @task: The process doing the exec - * - */ -static inline void pagg_exec(struct task_struct *task) -{ - if (!list_empty(&task->pagg_list)) - __pagg_exec(task); -} +extern void pagg_detach(struct task_struct *task); +extern int pagg_exec(struct task_struct *task); /** * INIT_TASK_PAGG - Used in INIT_TASK to set the head and sem of pagg_list @@ -204,7 +164,7 @@ static inline void pagg_exec(struct task */ #define INIT_TASK_PAGG(tsk) \ .pagg_list = LIST_HEAD_INIT(tsk.pagg_list), \ - .pagg_sem = __RWSEM_INITIALIZER(tsk.pagg_sem), + .pagg_lock = SPIN_LOCK_UNLOCKED, #else /* CONFIG_PAGG */ diff -rpNU3 linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/sched.h --- linux-2.6.11-rc2-mm1.pagg/include/linux/sched.h 2005-01-27 17:02:10.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg.rcu/include/linux/sched.h 2005-01-27 17:08:46.000000000 +0900 @@ -732,7 +732,7 @@ struct task_struct { #ifdef CONFIG_PAGG /* List of pagg (process aggregate) attachments */ struct list_head pagg_list; - struct rw_semaphore pagg_sem; + 
spinlock_t pagg_lock; #endif struct list_head private_pages; /* per-process private pages */ diff -rpNU3 linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c linux-2.6.11-rc2-mm1.pagg.rcu/kernel/pagg.c --- linux-2.6.11-rc2-mm1.pagg/kernel/pagg.c 2005-01-27 17:02:10.000000000 +0900 +++ linux-2.6.11-rc2-mm1.pagg.rcu/kernel/pagg.c 2005-01-27 17:35:53.000000000 +0900 @@ -45,16 +45,15 @@ static DECLARE_RWSEM(pagg_hook_list_sem) * a pointer to the pagg struct that matches the search * key. If the key is not found, the function will return NULL. * - * The caller should hold at least a read lock on the pagg_list - * for task using down_read(&task->pagg_list.sem). - * + * The caller must be under the rcu_read_lock(), and the existance + * of the object which is returned is guaranteed by rcu_read_unlock(). */ struct pagg * pagg_get(struct task_struct *task, char *key) { struct pagg *pagg; - list_for_each_entry(pagg, &task->pagg_list, entry) { + list_for_each_entry_rcu(pagg, &task->pagg_list, entry) { if (!strcmp(pagg->hook->name,key)) return pagg; } @@ -74,24 +73,36 @@ pagg_get(struct task_struct *task, char * The caller for this function should hold at least a read lock on the * pagg_hook_list_sem - or ensure that the pagg hook entry cannot be * removed. If this function was called from the pagg module (usually the - * case), then the caller need not hold this lock. The caller should hold - * a write lock on for the tasks pagg_sem. This can be locked using - * down_write(&task->pagg_sem) + * case), then the caller need not hold this lock. The caller must hold + * a spin lock on for the tasks pagg_lock. 
This can be locked using + * spin_lock(&task->pagg_lock) * */ -struct pagg * -pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook) +static struct pagg * +__pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook) { struct pagg *pagg; pagg = kmalloc(sizeof(struct pagg), GFP_KERNEL); if (!pagg) return NULL; - pagg->hook = pagg_hook; pagg->data = NULL; + INIT_LIST_HEAD(&pagg->entry); atomic_inc(&pagg_hook->refcnt); /* Increase hook's reference count */ - list_add_tail(&pagg->entry, &task->pagg_list); + + return pagg; +} + +struct pagg * +pagg_alloc(struct task_struct *task, struct pagg_hook *pagg_hook) +{ + struct pagg *pagg; + + pagg = __pagg_alloc(task, pagg_hook); + if (!pagg) + return NULL; + list_add_tail_rcu(&pagg->entry, &task->pagg_list); return pagg; } @@ -100,24 +111,37 @@ pagg_alloc(struct task_struct *task, str * pagg_free - Delete pagg from the list and free its memory * @pagg: The pagg to free * - * This function will ensure the pagg is deleted form + * This function will ensure the pagg is deleted from * the list of pagg entries for the task. Finally, the memory for the * pagg is discarded. * - * The caller of this function should hold a write lock on the pagg_sem - * for the task. This can be locked using down_write(&task->pagg_sem). + * The caller of this function must hold a spin lock on the pagg_list + * for the task. This can be locked using spin_lock(&task->pagg_list). * * Prior to calling pagg_free, the pagg should have been detached from the * pagg container represented by this pagg. 
That is usually done using * p->hook->detach(task, pagg); * */ +static void +rcu_pagg_free(struct rcu_head *rhead) +{ + struct pagg *pg = container_of(rhead, struct pagg, rhead); + kfree(pg); +} + +static void +__pagg_free(struct pagg *pagg) +{ + atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */ + call_rcu(&pagg->rhead, rcu_pagg_free); +} + void pagg_free(struct pagg *pagg) { - atomic_dec(&pagg->hook->refcnt); /* decr the reference count on the hook */ - list_del(&pagg->entry); - kfree(pagg); + list_del_rcu(&pagg->entry); + __pagg_free(pagg); } @@ -181,13 +205,20 @@ remove_client_paggs_from_all_tasks(struc get_task_struct(p); read_unlock(&tasklist_lock); - down_write(&p->pagg_sem); + + rcu_read_lock(); + spin_lock(&p->pagg_lock); paggp = pagg_get(p, php->name); if (paggp != NULL) { + list_del_rcu(&paggp->entry); + spin_unlock(&p->pagg_lock); (void)php->detach(p, paggp); - pagg_free(paggp); + __pagg_free(paggp); + } else { + spin_unlock(&p->pagg_lock); } - up_write(&p->pagg_sem); + rcu_read_unlock(); + read_lock(&tasklist_lock); /* If a PAGG got removed from the list while we're going through @@ -275,21 +306,24 @@ pagg_hook_register(struct pagg_hook *pag get_task_struct(p); read_unlock(&tasklist_lock); - down_write(&p->pagg_sem); + + spin_lock(&p->pagg_lock); paggp = pagg_get(p, pagg_hook_new->name); if (!paggp && !(p->flags & PF_EXITING)) { - paggp = pagg_alloc(p, pagg_hook_new); + paggp = __pagg_alloc(p, pagg_hook_new); if (paggp != NULL) { init_result = pagg_hook_new->init(p, paggp); - - /* Success, but init function pointer doesn't want grouping */ - if (init_result > 0) - pagg_free(paggp); - } - else + if (init_result == 0) { + list_add_tail_rcu(&paggp->entry, &p->pagg_list); + } else { + __pagg_free(paggp); + } + } else { init_result = -ENOMEM; + } } - up_write(&p->pagg_sem); + spin_unlock(&p->pagg_lock); + read_lock(&tasklist_lock); /* Like in remove_client_paggs_from_all_tasks, if the task * disappeared on us while we were going 
through the @@ -388,41 +422,46 @@ pagg_hook_unregister(struct pagg_hook *p * The "from" argument is the parent task. The "to" argument is the child * task. * - * See the attach decription in linux/include/linux/pagg.h for details on - * how to handle return codes from the attach function pointer. - * + * The child task must not be referenced yet. */ int -__pagg_attach(struct task_struct *to_task, struct task_struct *from_task) +pagg_attach(struct task_struct *to_task, struct task_struct *from_task) { struct pagg *from_pagg; int ret; - /* lock the parents pagg_list we are copying from */ - down_read(&from_task->pagg_sem); /* read lock the pagg list */ + INIT_PAGG_LIST(to_task); + + rcu_read_lock(); + if (list_empty(&from_task->pagg_list)) { + rcu_read_unlock(); + return 0; + } + + /* lock the parents pagg_list we are copying from */ list_for_each_entry(from_pagg, &from_task->pagg_list, entry) { struct pagg *to_pagg = NULL; - to_pagg = pagg_alloc(to_task, from_pagg->hook); + to_pagg = __pagg_alloc(to_task, from_pagg->hook); if (!to_pagg) { ret=-ENOMEM; goto error_return; } ret = to_pagg->hook->attach(to_task, to_pagg, from_pagg->data); - - if (ret < 0) { + if (likely(ret==0)) { + /* Success, and PAGG will be chained */ + list_add_tail_rcu(&to_pagg->entry, &to_task->pagg_list); + } else if (ret < 0) { /* Propagates to copy_process as a fork failure */ goto error_return; - } - else if (ret > 0) { + } else { /* Success, but attach function pointer doesn't want grouping */ - pagg_free(to_pagg); + __pagg_free(to_pagg); } } - up_read(&from_task->pagg_sem); /* unlock the pagg list */ - + rcu_read_unlock(); return 0; /* success */ error_return: @@ -430,8 +469,8 @@ __pagg_attach(struct task_struct *to_tas * Clean up all the pagg attachments made on behalf of the new * task. Set new task pagg ptr to NULL for return. 
*/ - up_read(&from_task->pagg_sem); /* unlock the pagg list */ - __pagg_detach(to_task); + rcu_read_unlock(); + pagg_detach(to_task); return ret; /* failure */ } @@ -443,21 +482,28 @@ __pagg_attach(struct task_struct *to_tas * */ void -__pagg_detach(struct task_struct *task) +pagg_detach(struct task_struct *task) { struct pagg *pagg; struct pagg *paggtmp; - /* Remove ref. to paggs from task immediately */ - down_write(&task->pagg_sem); /* write lock pagg list */ + rcu_read_lock(); + if (list_empty(&task->pagg_list)) + goto out; + spin_lock(&task->pagg_lock); list_for_each_entry_safe(pagg, paggtmp, &task->pagg_list, entry) { - pagg->hook->detach(task, pagg); - pagg_free(pagg); - } + list_del_rcu(&pagg->entry); + spin_unlock(&task->pagg_lock); - up_write(&task->pagg_sem); /* write unlock the pagg list */ + pagg->hook->detach(task, pagg); + __pagg_free(pagg); + spin_lock(&task->pagg_lock); + } + spin_unlock(&task->pagg_lock); +out: + rcu_read_unlock(); return; /* 0 = success, else return last code for failure */ } @@ -473,18 +519,20 @@ __pagg_detach(struct task_struct *task) * */ int -__pagg_exec(struct task_struct *task) +pagg_exec(struct task_struct *task) { struct pagg *pagg; - down_read(&task->pagg_sem); /* lock the pagg list */ + rcu_read_lock(); + if (list_empty(&task->pagg_list)) + goto out; - list_for_each_entry(pagg, &task->pagg_list, entry) { + list_for_each_entry_rcu(pagg, &task->pagg_list, entry) { if (pagg->hook->exec) /* conditional because it's optional */ pagg->hook->exec(task, pagg); } - - up_read(&task->pagg_sem); /* unlock the pagg list */ + out: + rcu_read_unlock(); return 0; } |
From: Kaigai K. <ka...@ak...> - 2005-01-27 12:48:23
|
[3/3] linux-2.6.11-rc2-mm1-CpuSet_by_PAGG.patch We can use PAGG as the fork()/exit() event handling framework for generic purposes. Some functions, like as CpuSet, fit the PAGG framework, I think. -- Linux Promotion Center, NEC KaiGai Kohei <ka...@ak...> |
From: Paul J. <pj...@sg...> - 2005-01-27 16:18:47
|
Kaigai Kohei wrote:
> Indeed, I tried to include the CpuSet into PAGG.

Could you describe more what CpuSet patch this is that you are including in PAGG?

I have a cpuset patch in Andrew Morton's *-mm patch series for several months now, but I have not thought that it was a good candidate customer of PAGG, for the main reason that my cpuset patch requires other kernel changes, in the kernel memory allocator, and in the other calls that manipulate scheduling (sched_setaffinity) and memory placement (mbind, set_mempolicy), as well as in the /proc file system. See the added kernel files include/linux/cpuset.h and kernel/cpuset.c, for the central portions of the cpuset patch in any *-mm release of the last few months.

My understanding of PAGG is that it is especially useful in supporting loadable modules that require to construct some grouping of the tasks on a system, and that require to take some actions on key task events such as fork and exit. Since the cpuset's that I know require several additional specialized hooks not provided by PAGG, I have concluded that PAGG is not a valuable base for cpusets. I have also concluded that cpuset's is not a potential loadable module -- too many kernel hooks required.

Are you referring to these cpusets, or about some other facility by that name?

If you are referring to these same cpusets, then what benefit do you consider that PAGG provides to these cpusets?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401
|
From: Kaigai K. <ka...@ak...> - 2005-01-28 12:42:30
|
Hi, Paul. Thanks for your comments.

I also understand that CpuSet requires more kernel changes, such as sched_setaffinity() and so on, than PAGG provides. But that is not my main point.

The purpose of these patches is to restrain the growth in the number of hook functions in fork() and exit(). I used PAGG for this, as a common event handling framework.

Currently CpuSet is the most widely known example, and Job+CSA and CKRM also require a fork()/exit() event handling mechanism. (CKRM uses ckrm_cb_fork(), PAGG uses pagg_attach().) Of course, the advanced features above can't implement all of their own functions without some kernel modifications.

I have little motivation to implement CpuSet (or Job+CSA, CKRM) as a kernel-loadable module. The main motivation is that those advanced features use a common fork()/exit() event handling framework, and it will make to restrain the unregulated hook functions in fork()/exit().

I merely chose PAGG as a common event handling framework. But what I really wanted is a Common fork()/exit() event handling framework. It may be called PAGG, or not.

Thanks.

Paul Jackson wrote:
> I have a cpuset patch in Andrew Morton's *-mm patch series for several
> months now, but I have not thought that it was a good candidate customer
> of PAGG, for the main reason that my cpuset patch requires other kernel
> changes, in the kernel memory allocator, and in the other calls that
> manipulate scheduling (sched_setaffinity) and memory placement (mbind,
> set_mempolicy), as well as in the /proc file system. See the added
> kernel files include/linux/cpuset.h and kernel/cpuset.c, for the central
> portions of the cpuset patch in any *-mm release of the last few months.
>
> My understanding of PAGG is that it is especially useful in supporting
> loadable modules that require to construct some grouping of the tasks on
> a system, and that require to take some actions on key task events such
> as fork and exit. Since the cpuset's that I know require several
> additional specialized hooks not provided by PAGG, I have concluded that
> PAGG is not a valuable base for cpusets. I have also concluded that
> cpuset's is not a potential loadable module -- too many kernel hooks
> required.
>
> Are you referring to these cpusets, or about some other facility by
> that name?
>
> If you are referring to these same cpusets, then what benefit do you
> consider that PAGG provides to these cpusets?

--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>
|
From: Paul J. <pj...@sg...> - 2005-01-28 13:09:21
|
Thank-you for your informative response.

Kaigai wrote:
> But what I really wanted is a Common fork()/exit() event handling framework.

Could you expand on this a bit? Especially since you acknowledge that loadable modules are not particularly essential to your work, I am curious as to what else you find valuable in such a fork/exit framework.

> it will make to restrain the unregulated hook functions in fork()/exit().

I will confess to not quite making sense of this statement - sorry.

Thanks for your reply so far.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401
|
From: Kaigai K. <ka...@ak...> - 2005-01-31 09:40:18
|
Thanks for your comments.

>>But what I really wanted is a Common fork()/exit() event handling framework.
>
> Could you expand on this a bit? Especially since you acknowledge that loadable
> modules are not particularly essential to your work, I am curious as to what
> else you find valuable in such a fork/exit framework.

If we could implement some advanced features (CpuSet, CSA+Job, CKRM, etc.) as kernel loadable modules, I also think that would be best. But using hooks in fork()/exit() is better than patching fork.c or exit.c for each feature, even for a feature that can't be implemented as a kernel loadable module, because we need not modify kernel/fork.c or kernel/exit.c directly.

For example, we must individually add cpuset_fork() for CpuSet, pagg_attach() for PAGG (CSA+Job), and ckrm_cb_fork() for CKRM to kernel/fork.c when we try to use those advanced features. In this case, we need to patch three points in kernel/fork.c.

But if we have a common-purpose hook in kernel/fork.c, those advanced features do not need to modify kernel/fork.c directly. They only have to register their own event handler with the fork hook.

In short, my motivation is to integrate the hooks plugged randomly into kernel/fork.c and so on.

Thanks.

--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>
|
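The idea of a single registration point in the fork path, as opposed to one hard-coded call per feature, can be sketched in plain userspace C. This is only an illustrative model under stated assumptions: `struct task`, `register_fork_handler()`, `notify_fork()` and the two demo handlers are hypothetical names invented for this sketch, and none of them are part of PAGG, CKRM, cpusets, or the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these names do not exist in the kernel. */
struct task {
    int pid;
    int cpuset_id;   /* stands in for per-task cpuset state */
    int ckrm_class;  /* stands in for per-task CKRM state */
};

/* A handler returns 0 on success or a negative value to fail the fork. */
typedef int (*fork_handler_t)(struct task *parent, struct task *child);

#define MAX_FORK_HANDLERS 8

static fork_handler_t fork_handlers[MAX_FORK_HANDLERS];
static size_t nr_fork_handlers;

/* Each feature registers once instead of patching the fork path itself. */
int register_fork_handler(fork_handler_t fn)
{
    if (fn == NULL || nr_fork_handlers == MAX_FORK_HANDLERS)
        return -1;
    fork_handlers[nr_fork_handlers++] = fn;
    return 0;
}

/* The single call site that would sit in the fork path. */
int notify_fork(struct task *parent, struct task *child)
{
    size_t i;

    assert(parent != NULL && child != NULL);
    for (i = 0; i < nr_fork_handlers; i++) {
        int ret = fork_handlers[i](parent, child);
        if (ret < 0)
            return ret;  /* first failure is propagated to the caller */
    }
    return 0;
}

/* Demo handler: the child inherits the parent's cpuset id. */
int demo_cpuset_fork(struct task *parent, struct task *child)
{
    child->cpuset_id = parent->cpuset_id;
    return 0;
}

/* Demo handler: the child inherits the parent's CKRM class. */
int demo_ckrm_fork(struct task *parent, struct task *child)
{
    child->ckrm_class = parent->ckrm_class;
    return 0;
}
```

In this model, adding a third feature means one more `register_fork_handler()` call and no change to `notify_fork()`. The price is an indirect call per handler, which is the cost side of the trade-off discussed in this thread.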
From: Paul J. <pj...@sg...> - 2005-01-31 11:08:30
|
Thank-you, Kaigai Kohei, for taking the time to explain your motivation for preferring PAGG to hook fork/exit.

In your initial post a few days ago, you wrote:
> We want to use Job(and CSA) or CpuSet without specific patches.
> And, it's so important to adopt the PAGG framework into the stock kernel.

I agree that it would be nice if cpusets didn't require a specific kernel patch. It would make the job of getting cpusets accepted into mainstream Linux much easier.

However I think that there is no way that you can use cpusets without the specific cpuset patch. Even with your patches to allow the cpuset fork/exit hooks to be done using PAGG, you still have the other, more specialized, cpuset hooks to consider.

> (CpuSet-patch needs lockless references.)

I am a little surprised that the fork/exit cpuset hooks must be lockless. Are you talking about the cpuset patch that is in Andrew Morton's *-mm kernels of the last few months, or some other CpuSet patch? The cpuset_fork() just does an atomic_inc() of a reference count, so doesn't care what locks are held when it is called. But the cpuset_exit() code can grab the cpuset semaphore if the last task using a cpuset exits, when one needs to consider invoking notify_on_release. I thought it was ok to nest semaphores inside semaphores (so long as you respect an order, so that you can't deadlock), so I don't understand why you needed to replace that pagg semaphore with an rcu section.

In your patch 3 of 3, you wrote:
> Some functions, like as CpuSet, fit the PAGG framework, I think.

I am sure that my colleagues at SGI who are supporting PAGG hope that you are right.

However I still don't see it.

Call-by-string-name dynamically evaluated invocations are not necessarily better or worse than simple, hard coded, directly linked function calls. They _are_ more expensive, by far, and more complex and obscure, which impairs the ease of both reading and debugging code. They _have_ to provide some balancing benefit to be justified. If something can be made entirely a loadable module, requiring no specific patches (to use your nice phrase) then that might be such a benefit.

Until you can dynamically plug each of the following hooks:

    int cpuset_init(void);
    void cpuset_init_smp(void);
    void cpuset_fork(struct task_struct *p);
    void cpuset_exit(struct task_struct *p);
    const cpumask_t cpuset_cpus_allowed(const struct task_struct *p);
    void cpuset_init_current_mems_allowed(void);
    void cpuset_update_current_mems_allowed(void);
    void cpuset_restrict_to_mems_allowed(unsigned long *nodes);
    int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl);
    int cpuset_zone_allowed(struct zone *z);
    struct file_operations proc_cpuset_operations;
    char *cpuset_task_status_allowed(struct task_struct *task, char *buffer);

I think that you will require a cpuset specific patch.

Am I missing something ??

Aside -- if you do value cpusets, please put in a good word for them with Andrew Morton, on lkml perhaps. He will _not_ further the advance of cpusets unless others outside SGI ask for them eagerly.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401
|
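The cost difference between a string-keyed lookup in the style of pagg_get() and a directly linked reference can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the PAGG code: `struct hook`, `struct attachment`, `lookup_by_name()` and `lookup_direct()` are hypothetical names, and the `compares` counter exists only to make the number of strcmp() calls visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model of a per-task attachment list. */
struct hook {
    const char *name;
};

struct attachment {
    struct hook *hook;
    void *data;
    struct attachment *next;
};

struct task {
    struct attachment *attachments; /* string-keyed, PAGG-like list */
    void *cpuset;                   /* hard-coded field, direct access */
};

/*
 * Models a pagg_get()-style lookup: a linear walk with one strcmp()
 * per attachment. *compares is incremented once per comparison.
 */
void *lookup_by_name(const struct task *t, const char *key, int *compares)
{
    const struct attachment *a;

    assert(t != NULL && key != NULL && compares != NULL);
    for (a = t->attachments; a != NULL; a = a->next) {
        (*compares)++;
        if (strcmp(a->hook->name, key) == 0)
            return a->data;
    }
    return NULL;
}

/* Models a hard-coded, directly linked reference: one pointer load. */
void *lookup_direct(const struct task *t)
{
    return t->cpuset;
}
```

With three attachments and the wanted one last in the list, the named lookup performs three string comparisons to return what the direct version reads from one field. Whether that overhead matters depends on how hot the calling path is, for example a page allocation path versus fork.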
From: Kaigai K. <ka...@ak...> - 2005-02-01 13:06:19
|
Hello, Paul. Thank you for your fast and attentive response. Excuse me for less explanation. I wondered why every advanced feature duplicates its own hooks in fork()/exit() for those event handling. Thus, my main motivation and attention are about fork()/exit() event handling. Because of this, I didn't intend to implement all of the CpuSet without a specific patch. About PAGG implementation: >> (CpuSet-patch needs lockless references.) > > > I am a little surprised that the fork/exit cpuset hooks must > be lockless. I'm not talking about another CpuSet patch. When CpuSet patch is on PAGG as I modified, CpuSet functionality need refer a PAGG object related to CpuSet in some points. pagg_get() require that caller must hold pagg_sem semaphore, though we can't hold a semaphore from an interruption context or a critical section wedged between spinlock(). Thus, CpuSet-patch needs lockless references (to PAGG object). __alloc_pages() has possibility to be called from a section which cannot block. Thus, I replaced PAGG semaphore by RCU. I agree that Call-by-string-name dynamically evaluated invocations are expensive and not good as you said. (1) It should be possible to refer a PAGG object from some critical sections. (2) It should be light-weight to refer a PAGG object for each customer. IMO, these should be fixed for PAGG to be widely-supported. I want a comment by Erik Jacobson. > Aside -- if you do value cpusets, please put in a good word for them > with Andrew Morton, on lkml perhaps. He will _not_ further the advance > of cpusets unless others outside SGI ask for them eagerly. Indeed, I think CpuSet and PAGG(+Job/CSA) are so worthwhile solution. I'll try to move the discussion into LKML. But we might start from the improvement of PAGG before it. Thanks. -- Linux Promotion Center, NEC KaiGai Kohei <ka...@ak...> |
From: Paul J. <pj...@sg...> - 2005-02-01 16:13:45
|
Kaigai wrote: > Hello, Paul. > Thank you for your fast and attentive response. Thank-you for starting this good discussion. > Excuse me for less explanation. No problem. Please excuse my many questions. > I wondered why every advanced feature duplicates its own hooks > in fork()/exit() for those event handling. Sometimes the simple mechanism is the best. If dozens of features require a special subroutine to be called from the same place, such as fork, exec or init (kernel boot), then perhaps dozens of subroutine calls, all in a row, is best. Just because a way of doing things requires a minimum of compute cycles, and is so simple that even a beginning programmer can understand it almost immediately, doesn't make it inferior. > I'm not talking about another CpuSet patch. Good - I was just checking. Thanks. > When CpuSet patch is on PAGG as I modified, CpuSet functionality > need refer a PAGG object related to CpuSet in some points. > pagg_get() require that caller must hold pagg_sem semaphore, > though we can't hold a semaphore from an interruption context or > a critical section wedged between spinlock(). > Thus, CpuSet-patch needs lockless references (to PAGG object). > > __alloc_pages() has possibility to be called from a section which > cannot block. Thus, I replaced PAGG semaphore by RCU. Are you saying that pagg_sem can't block? I thought semaphores could block. I am missing something here. What is the sequence of events that would lead to trying to hold a semaphore from interrupt context (before your lockless changes)? > > Aside -- if you do value cpusets, please put in a good word for them > > with Andrew Morton, on lkml perhaps. He will _not_ further the advance > > of cpusets unless others outside SGI ask for them eagerly. > > Indeed, I think CpuSet and PAGG(+Job/CSA) are so worthwhile solution. > I'll try to move the discussion into LKML. > But we might start from the improvement of PAGG before it. 
I think that there is a good chance that later this month, February 2005, the subject of the cpuset patch will again become active on lkml. I will be pleased and grateful for any support you might provide to encourage the acceptance of cpusets. Thank-you. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401 |
From: Kaigai K. <ka...@ak...> - 2005-02-03 13:42:15
|
Hello, Paul. Thanks for your comments.

>>I wondered why every advanced feature duplicates its own hooks
>>in fork()/exit() for those event handling.
>
>
> Sometimes the simple mechanism is the best. If dozens of features
> require a special subroutine to be called from the same place, such as
> fork, exec or init (kernel boot), then perhaps dozens of subroutine
> calls, all in a row, is best.
>
> Just because a way of doing things requires a minimum of compute cycles,
> and is so simple that even a beginning programmer can understand it
> almost immediately, doesn't make it inferior.

Hmm..., I also think 'simple is best' is true.

>>__alloc_pages() has possibility to be called from a section which
>>cannot block. Thus, I replaced PAGG semaphore by RCU.
>
> Are you saying that pagg_sem can't block? I thought semaphores
> could block.

No, a semaphore can always block, and pagg_sem is no exception. (Who does need a never-blocking semaphore?)
When I modified the CpuSet feature to use PAGG, I noticed that we need to call pagg_get() from some critical sections. Thus, I modified PAGG to replace task->pagg_sem with RCU. pagg_get() does not block in the RCU-based PAGG implementation.

> I am missing something here. What is the sequence of events that
> would lead to trying to hold a semaphore from interrupt context
> (before your lockless changes)?

For example, pid_array_load() in kernel/cpuset.c is relevant.
(This is Paul Jackson's original implementation.)

    int pid_array_load()
    {
            read_lock(&tasklist_lock);       ---------
            do_each_thread(g, p) {               ^
                    if (p->cpuset == cs) {       |
                            :                 Critical
                    }                         Section
            } while_each_thread(g, p);           |
    array_full:                                  v
            read_unlock(&tasklist_lock);     ---------
    }

In this case, if CpuSet were a PAGG customer, we would call pagg_call() to refer to the PAGG object related to each task under read_lock(&tasklist_lock). Thus, pagg_get() must not block.

>>Indeed, I think CpuSet and PAGG(+Job/CSA) are so worthwhile solution.
>>I'll try to move the discussion into LKML.
>>But we might start from the improvement of PAGG before it.
>
> I think that there is a good chance that later this month, February 2005,
> the subject of the cpuset patch will again become active on lkml.
> I will be pleased and grateful for any support you might provide to
> encourage the acceptance of cpusets.

Today, my colleague also praised CpuSet. :)
It would be good if we could use CpuSet on the stock kernel.

Thank you.
--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>
|
From: Erik J. <er...@su...> - 2005-02-02 15:07:10
|
> I agree that Call-by-string-name dynamically evaluated invocations are > expensive and not good as you said. > (1) It should be possible to refer a PAGG object from some critical > sections. (2) It should be light-weight to refer a PAGG object for > each customer. IMO, these should be fixed for PAGG to be widely-supported. > I want a comment by Erik Jacobson. Sorry for the delay. I was on vacation. For (1) - can you give some examples? For (2) - If you look at the exec hook __pagg_exec function, we go through the list of paggs for the task that reached the hook, and we execute the associated function pointer (if it is assigned in the pagg hook). When we execute the associated exec function pointer, we pass a reference to the pagg in question. pagg_get I suppose is a tiny bit expensive but it only gets bad when there are lots of pagg associations for a given task. I assume this is your concern for (2) though, right? If we were to change this, what would you suggest? Recall that there is a data pointer in the pagg structure that kernel module "customers" can store stuff in on a per-task basis. One could envision look-up tables or something, but that seems only a little less expensive and more complicated. We're of course open to ideas and suggestions. Thank you. -- Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota |
From: Kaigai K. <ka...@ak...> - 2005-02-03 13:45:03
|
Hi, Erik. Thanks for your comments.

Erik Jacobson wrote:
>>I agree that Call-by-string-name dynamically evaluated invocations are
>>expensive and not good as you said.
>>(1) It should be possible to refer a PAGG object from some critical
>>sections. (2) It should be light-weight to refer a PAGG object for
>>each customer. IMO, these should be fixed for PAGG to be widely-supported.
>>I want a comment by Erik Jacobson.
>
>
> Sorry for the delay. I was on vacation.
>
> For (1) - can you give some examples?

For example, I want to refer to the PAGG object of each task when we scan the task list under read_lock(&tasklist_lock). Using a semaphore for exclusion isn't appropriate at such times. We should also avoid any kind of spinlock as much as we can, because it's expensive. I think RCU is an appropriate lockless alternative in this case.

# RCU's bad point is that current PAGG customers would be forced to modify
# their own implementations.

> For (2) - If you look at the exec hook __pagg_exec function, we go through
> the list of paggs for the task that reached the hook, and we execute the
> associated function pointer (if it is assigned in the pagg hook). When we
> execute the associated exec function pointer, we pass a reference to the pagg
> in question.

That is right while we are processing the hook function. But we can also call pagg_get() to refer to the PAGG object in other places. I think the issue in (2) is the string comparison in pagg_get() and similar functions. Generally, a string comparison is more expensive than an integer one. Would it be possible for the PAGG engine to associate a unique integer key with each PAGG customer (such as Job) when pagg_hook_register() is called, so that we can call pagg_get() with a task_struct and this integer key instead of the customer's identifying string (such as "job" or "cpuset")?

> pagg_get I suppose is a tiny bit expensive but it only gets bad when there
> are lots of pagg associations for a given task. I assume this is your
> concern for (2) though, right?

That's right also, I think. But is it unavoidable?

> If we were to change this, what would you suggest? Recall that there is
> a data pointer in the pagg structure that kernel module "customers" can
> store stuff in on a per-task basis. One could envision look-up tables
> or something, but that seems only a little less expensive and more
> complicated.

My proposal for issue (1) is that pagg_sem should be replaced by RCU.
For issue (2), would it be possible to eliminate strcmp() by the following method?

1) pagg_hook_register() returns a unique integer key associated with the registered PAGG customer.
2) That unique integer key is used as the key for finding the PAGG object.

    struct pagg *
    pagg_get(struct task_struct *task, int key)
    {
            struct pagg *pagg;

            list_for_each_entry(pagg, &task->pagg_list, entry) {
                    if (key == pagg->key)
                            return pagg;
            }
            return NULL;
    }

I expect that PAGG and PAGG customers such as Job will be merged into the upstream kernel.

Thanks.
--
Linux Promotion Center, NEC
KaiGai Kohei <ka...@ak...>
|
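The difference between a name-based and a key-based lookup, as discussed in the message above, can be sketched in user space. This is only an illustration under stated assumptions: `struct pagg` and `struct task` here are simplified stand-ins, not the kernel's structures, and `pagg_key_alloc()`, `pagg_attach_to()`, `pagg_get_by_name()` and `pagg_get_by_key()` are hypothetical names that do not appear in the PAGG patch. Both lookups still walk the per-task list; only the per-entry comparison changes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; not the kernel's PAGG structures. */
struct pagg {
    int key;            /* integer key assumed to be assigned at registration */
    const char *name;   /* client name, e.g. "job" or "cpuset" */
    void *data;         /* per-task private data owned by the client */
    struct pagg *next;  /* next attachment on the same task */
};

struct task {
    struct pagg *pagg_list; /* head of this task's attachments */
};

static int next_pagg_key = 1;

/* Hand out a new unique key (stands in for a key-returning registration call). */
int pagg_key_alloc(void)
{
    return next_pagg_key++;
}

/* Push an attachment onto the front of the task's list. */
void pagg_attach_to(struct task *task, struct pagg *pagg)
{
    assert(task != NULL && pagg != NULL);
    pagg->next = task->pagg_list;
    task->pagg_list = pagg;
}

/* Lookup by name: one strcmp() per attachment visited. */
struct pagg *pagg_get_by_name(const struct task *task, const char *name)
{
    for (struct pagg *p = task->pagg_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Lookup by key: one integer comparison per attachment visited. */
struct pagg *pagg_get_by_key(const struct task *task, int key)
{
    for (struct pagg *p = task->pagg_list; p != NULL; p = p->next)
        if (p->key == key)
            return p;
    return NULL;
}
```

A client would keep the key it received at registration and pass it on every lookup. The list walk remains linear in the number of attachments per task, which is the cost Erik describes; the key only removes the string comparison.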
From: Guillaume T. <gui...@bu...> - 2005-01-28 13:29:24
|
On Fri, 2005-01-28 at 21:41 +0900, Kaigai Kohei wrote:
> But my main subject is not this point. The purpose of those patches is
> to restrain incrementation of hook functions in fork() or exit().
> I used PAGG for this, as a common event handling framework.

I agree with this point. It seems that several applications need hook functions in fork() and/or exit(). I can give examples such as CSA, ELSA, CKRM, CpuSet, LSM or Dprobes. Thus, if I need a hook in fork() for my accounting application (ELSA, for example) and I don't want to add my own hook, PAGG is a solution. AFAIU, I can't use LSM hooks because it's a security framework, I can't use Dprobes because it's a debugging framework, and the hooks used by CpuSet and CKRM don't allow any registration.

There was also another project called kernelhooks (the former GKHI I think) but I don't know if it's still maintained...

Best,
Guillaume
|
From: Guillaume T. <gui...@bu...> - 2005-01-31 10:31:24
|
On Mon, 2005-01-31 at 18:39 +0900, Kaigai Kohei wrote:
> For example, we must append individually cpuset_fork() for CpuSet,
> pagg_attach() for PAGG(CSA+Job), ckrm_cb_fork() for CKRM in kernel/fork.c
> when we try to use those advanced features.
> In this case, we need to patch into three points in kernel/fork.c.
> But if we have a common purpose hook in kernel/fork.c, those advanced
> features does not need to modify kernel/fork.c directly.
> They have only to register their own event handler for the fork-hook.

Thus in this case, the interesting aspect of PAGG is not the "container" aspect but the "hook manager" aspect, right?

PAGG has only a callback for "exec" and not for "fork". So, if you want to use PAGG as a common hook in "fork" for several applications, we need to add a new hook (like pagg_fork in kernel/fork.c:do_fork()).

Is it possible to split PAGG into two pieces:
 1. The container manager part
 2. The hook manager part

Thanks,
Guillaume
|
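The "hook manager" idea in the message above, where each feature registers a handler instead of patching kernel/fork.c itself, can be sketched as a small callback table. This is a user-space illustration only: `fork_hook_register()`, `run_fork_hooks()`, the `demo_*` handlers and the simplified `struct task` are hypothetical names invented for the example, and the sketch deliberately omits locking, unregistration and module lifetime, all of which a real kernel facility would need.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FORK_HOOKS 8

/* Hypothetical stand-in for the kernel's task struct. */
struct task {
    int pid;
    int group;  /* example per-task state that a handler may copy */
};

typedef void (*fork_hook_fn)(struct task *parent, struct task *child);

static fork_hook_fn fork_hooks[MAX_FORK_HOOKS];
static size_t nr_fork_hooks;

/* Counts how many times demo_count_fork() has run. */
int demo_fork_count;

/* Register a handler; returns 0 on success, -1 if fn is NULL or the table is full. */
int fork_hook_register(fork_hook_fn fn)
{
    if (fn == NULL || nr_fork_hooks == MAX_FORK_HOOKS)
        return -1;
    fork_hooks[nr_fork_hooks++] = fn;
    return 0;
}

/* The single call site that a fork path would need: run every handler in order. */
void run_fork_hooks(struct task *parent, struct task *child)
{
    assert(parent != NULL && child != NULL);
    for (size_t i = 0; i < nr_fork_hooks; i++)
        fork_hooks[i](parent, child);
}

/* Example handler: the child inherits the parent's group. */
void demo_inherit_group(struct task *parent, struct task *child)
{
    child->group = parent->group;
}

/* Example handler: count fork events, as an accounting feature might. */
void demo_count_fork(struct task *parent, struct task *child)
{
    (void)parent;
    (void)child;
    demo_fork_count++;
}
```

With such a table, the fork path carries one `run_fork_hooks()` call regardless of how many features are registered. The trade-off Paul raises elsewhere in the thread still applies: an indirect call per handler is more expensive and harder to follow than a row of direct calls.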
From: Paul J. <pj...@sg...> - 2005-01-31 11:35:38
|
Guillaume writes:
> Is it possible to split PAGG into two pieces:
> 1. The container manager part
> 2. The hook manager part

But the "container manager" is simply the projection of the hook manager onto the set of tasks <grin>.

In other, less obfuscated terms, I mean that two tasks are in the same "container" iff they have the same hooks.

So with the current implementation, I doubt they split. We have two ways of looking at one mechanism, not two mechanisms. Or at least, that's my understanding of PAGG (I could easily be mistaken here - beware).

I could see some refactoring being of benefit here, however. I'd be tempted to consider, if these were my projects:

 1) Implementing the 'container manager' using a simple integer id field in the task struct, some sort of "job id" (jid), with associated getjid system call, similar to gettid and getpid, and "Jid: <jid>" /proc/*/status field.

 2) Continuing to work with the others doing system accounting data collection to integrate CSA, as I see others already doing.

 3) For any resource management aspects, work with CKRM.

The hook manager is an implementation intended to support loadable kernel modules doing some of this work. I will be surprised if these can be done this way, and do not share the enthusiasm of my SGI colleagues for this mechanism.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373, 1.925.600.0401
|
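Item 1 of the list above, a plain integer job id in the task struct, might look roughly like the following user-space sketch. Everything here is an assumption made for illustration: the simplified `struct task`, the helper names, the choice to inherit the jid at fork, and the tab-separated layout of the status line are not taken from any posted patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's task struct. */
struct task {
    int pid;
    int jid;    /* job id; tasks with equal jid share a container */
};

/* Assumed policy: a child starts in its parent's job. */
void task_fork_inherit_jid(const struct task *parent, struct task *child)
{
    assert(parent != NULL && child != NULL);
    child->jid = parent->jid;
}

/* User-space analogue of the proposed getjid call. */
int task_getjid(const struct task *t)
{
    return t->jid;
}

/* Container membership reduces to an integer comparison. */
int same_container(const struct task *a, const struct task *b)
{
    return a->jid == b->jid;
}

/* Format a "Jid:" status line; returns what snprintf() returns. */
int format_jid_status(const struct task *t, char *buf, size_t len)
{
    return snprintf(buf, len, "Jid:\t%d\n", t->jid);
}
```

Under this scheme, membership tests need no list walk and no name lookup, which is the simplicity argument made in the message. What the sketch does not provide is any per-container callback, so accounting or resource management would have to be wired in separately, as items 2 and 3 suggest.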