From: Dipankar S. <dip...@in...> - 2003-02-04 06:43:42
On Mon, Feb 03, 2003 at 06:36:38PM -0800, John Hawkes wrote:
> Folks,
>
> One thing we're seeing here on systems with lots of CPUs (e.g., 32 or
> 64) and workloads that produce significant contention on some key locks
> (e.g., tasklist_lock in 2.4.19, or one of the runqueue spinlocks, or ...
> pick one) is the unfortunate effect of a spin_lock_irq() or
> write_lock_irq(): disable interrupts, then contend for the lock. If the
> lock is highly contended and we encounter high wait-times, then
> interrupts are disabled for the entire wait-time, which can be a
> distressingly significant period of time.

John,

You need to backport some of the tasklist_lock-abuse-prevention work in
2.5. The one that comes to the top of my mind is proc_read_super(), which
walked the entire tasklist to count the number of tasks. That was fixed
in 2.5 using a per-cpu task counter. If get_pid_list() is a problem, you
could try out the patch attached below (you need RCU).

I am interested in looking at other potential problems with a large
tasklist. Could you publish your lockmeter results? Some of these
problems may have been solved in 2.5.

Thanks
Dipankar

diff -urN linux-2.5.59-base/fs/proc/base.c linux-2.5.59-tasks_rcu/fs/proc/base.c
--- linux-2.5.59-base/fs/proc/base.c	2003-01-17 07:52:19.000000000 +0530
+++ linux-2.5.59-tasks_rcu/fs/proc/base.c	2003-01-22 11:21:05.000000000 +0530
@@ -31,6 +31,7 @@
 #include <linux/kallsyms.h>
 #include <linux/mount.h>
 #include <linux/security.h>
+#include <linux/rcupdate.h>
 
 /*
  * For hysterical raisins we keep the same inumbers as in the old procfs.
@@ -1160,7 +1161,7 @@
 	int nr_pids = 0;
 
 	index--;
-	read_lock(&tasklist_lock);
+	rcu_read_lock();
 	for_each_process(p) {
 		int pid = p->pid;
 		if (!pid)
@@ -1172,7 +1173,7 @@
 		if (nr_pids >= PROC_MAXPIDS)
 			break;
 	}
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return nr_pids;
 }
diff -urN linux-2.5.59-base/include/linux/sched.h linux-2.5.59-tasks_rcu/include/linux/sched.h
--- linux-2.5.59-base/include/linux/sched.h	2003-01-17 07:51:38.000000000 +0530
+++ linux-2.5.59-tasks_rcu/include/linux/sched.h	2003-02-01 00:31:17.000000000 +0530
@@ -28,6 +28,7 @@
 #include <linux/completion.h>
 #include <linux/pid.h>
 #include <linux/percpu.h>
+#include <linux/rcupdate.h>
 
 struct exec_domain;
 
@@ -400,12 +401,17 @@
 	struct backing_dev_info *backing_dev_info;
 
 	unsigned long ptrace_message;
+	struct rcu_head rcu;
 };
 
 extern void __put_task_struct(struct task_struct *tsk);
 #define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+static inline void put_task_struct(struct task_struct *tsk)
+{
+	if (atomic_dec_and_test(&tsk->usage))
+		call_rcu(&tsk->rcu, (void (*)(void *))__put_task_struct, tsk);
+}
 
 /*
  * Per process flags
@@ -607,13 +613,13 @@
 
 #define REMOVE_LINKS(p) do {					\
 	if (thread_group_leader(p))				\
-		list_del_init(&(p)->tasks);			\
+		list_del_rcu(&(p)->tasks);			\
 	remove_parent(p);					\
 	} while (0)
 
 #define SET_LINKS(p) do {					\
 	if (thread_group_leader(p))				\
-		list_add_tail(&(p)->tasks,&init_task.tasks);	\
+		list_add_tail_rcu(&(p)->tasks,&init_task.tasks);\
 	add_parent(p, (p)->parent);				\
 	} while (0)
 
@@ -621,7 +627,7 @@
 #define prev_task(p)	list_entry((p)->tasks.prev, struct task_struct, tasks)
 
 #define for_each_process(p) \
-	for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+	for (p = &init_task ; (p = next_task(p)),({ read_barrier_depends(); 0;}),p != &init_task ; )
 
 /*
  * Careful: do_each_thread/while_each_thread is a double loop so