On Mon, Feb 03, 2003 at 06:36:38PM -0800, John Hawkes wrote:
> Folks,
>
> One thing we're seeing here on systems with lots of CPUs (e.g., 32 or
> 64) and workloads that produce significant contention on some key locks
> (e.g., tasklist_lock in 2.4.19, or one of the runqueue spinlocks, or ...
> pick one) is the unfortunate effect of a spin_lock_irq() or
> write_lock_irq(): disable interrupts, then contend for the lock. If the
> lock is highly contended and we encounter high wait-times, then
> interrupts are disabled for the entire wait-time, which can be a
> distressingly significant period of time.
John,
You need to backport some of the tasklist_lock-abuse-prevention work
from 2.5. The one that comes to mind first is proc_read_super(),
which walked the entire tasklist to count the number of tasks. That was
fixed in 2.5 using a per-cpu task counter. If get_pid_list() is a
problem, you could try out the patch attached below (you need RCU).
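
For readers unfamiliar with the per-cpu counter idea mentioned above, here is a minimal userspace sketch in C. It is not the 2.5 code: the names (NR_CPUS_SKETCH, process_counts_sketch, task_created, task_exited, nr_processes_sketch) and the 64-byte padding are assumptions made for illustration, and it ignores the preemption and concurrency handling a kernel would need. The point is only that the count is maintained incrementally, so reading it never requires walking the tasklist or taking tasklist_lock.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* hypothetical fixed CPU count for this sketch */

/* One counter per CPU, padded so neighbours do not share a cache line
 * (64 bytes is an assumed cache line size). */
struct percpu_count {
	long count;
	char pad[64 - sizeof(long)];
};

static struct percpu_count process_counts_sketch[NR_CPUS_SKETCH];

/* Called on the CPU that creates a task: touches only that CPU's slot. */
static void task_created(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	process_counts_sketch[cpu].count++;
}

/* A task may exit on a different CPU than the one that created it, so an
 * individual slot can go negative. Only the sum is meaningful. */
static void task_exited(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	process_counts_sketch[cpu].count--;
}

/* Reader: sum the per-CPU slots instead of walking every task. */
static long nr_processes_sketch(void)
{
	long total = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += process_counts_sketch[cpu].count;
	return total;
}
```

The read cost here is proportional to the number of CPUs rather than the number of tasks, which is the property that matters on a box with thousands of processes.
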
I am interested in looking at other potential problems with a large
tasklist.
Could you publish your lockmeter results? Some of these problems may
have been solved in 2.5.
Thanks
Dipankar
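
The patch below relies on one idea: once readers walk the task list without tasklist_lock, the final put_task_struct() can no longer free the task immediately, because a lockless reader may still be looking at it. The free is instead queued and run after a grace period. The following is a single-threaded userspace sketch of that deferral only. All names (struct deferred, defer_call, put_task, grace_period_elapsed, freed_count) are hypothetical, the reference count is a plain int rather than an atomic, and the "grace period" is simulated by an explicit call rather than detected as real RCU does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stands in for struct rcu_head: a queued callback and its argument. */
struct deferred {
	struct deferred *next;
	void (*func)(void *);
	void *arg;
};

struct task {
	int usage;            /* plain int here; the kernel uses atomic_t */
	int pid;
	struct deferred rcu;
};

static struct deferred *pending;  /* callbacks waiting for a grace period */
static int freed_count;

/* Queue func(arg) to run later, mirroring the three-argument call_rcu()
 * used in the patch. */
static void defer_call(struct deferred *head, void (*func)(void *), void *arg)
{
	head->func = func;
	head->arg = arg;
	head->next = pending;
	pending = head;
}

static void free_task(void *arg)
{
	free(arg);
	freed_count++;
}

/* Dropping the last reference queues the free instead of doing it now. */
static void put_task(struct task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		defer_call(&t->rcu, free_task, t);
}

/* Simulated end of a grace period: no reader can still hold a pointer,
 * so the queued callbacks may run. */
static void grace_period_elapsed(void)
{
	struct deferred *d = pending;

	pending = NULL;
	while (d) {
		struct deferred *next = d->next;  /* read before d is freed */
		d->func(d->arg);
		d = next;
	}
}
```

In this sketch the task's memory stays valid between the last put_task() and grace_period_elapsed(), which is the window a lockless list walker depends on.
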
diff -urN linux-2.5.59-base/fs/proc/base.c linux-2.5.59-tasks_rcu/fs/proc/base.c
--- linux-2.5.59-base/fs/proc/base.c 2003-01-17 07:52:19.000000000 +0530
+++ linux-2.5.59-tasks_rcu/fs/proc/base.c 2003-01-22 11:21:05.000000000 +0530
@@ -31,6 +31,7 @@
#include <linux/kallsyms.h>
#include <linux/mount.h>
#include <linux/security.h>
+#include <linux/rcupdate.h>
/*
* For hysterical raisins we keep the same inumbers as in the old procfs.
@@ -1160,7 +1161,7 @@
int nr_pids = 0;
index--;
- read_lock(&tasklist_lock);
+ rcu_read_lock();
for_each_process(p) {
int pid = p->pid;
if (!pid)
@@ -1172,7 +1173,7 @@
if (nr_pids >= PROC_MAXPIDS)
break;
}
- read_unlock(&tasklist_lock);
+ rcu_read_unlock();
return nr_pids;
}
diff -urN linux-2.5.59-base/include/linux/sched.h linux-2.5.59-tasks_rcu/include/linux/sched.h
--- linux-2.5.59-base/include/linux/sched.h 2003-01-17 07:51:38.000000000 +0530
+++ linux-2.5.59-tasks_rcu/include/linux/sched.h 2003-02-01 00:31:17.000000000 +0530
@@ -28,6 +28,7 @@
#include <linux/completion.h>
#include <linux/pid.h>
#include <linux/percpu.h>
+#include <linux/rcupdate.h>
struct exec_domain;
@@ -400,12 +401,17 @@
struct backing_dev_info *backing_dev_info;
unsigned long ptrace_message;
+ struct rcu_head rcu;
};
extern void __put_task_struct(struct task_struct *tsk);
#define get_task_struct(tsk) do { atomic_inc(&(tsk)->usage); } while(0)
-#define put_task_struct(tsk) \
-do { if (atomic_dec_and_test(&(tsk)->usage)) __put_task_struct(tsk); } while(0)
+static inline void put_task_struct(struct task_struct *tsk)
+{
+ if (atomic_dec_and_test(&tsk->usage))
+ call_rcu(&tsk->rcu, (void (*)(void *))__put_task_struct, tsk);
+}
+
/*
* Per process flags
@@ -607,13 +613,13 @@
#define REMOVE_LINKS(p) do { \
if (thread_group_leader(p)) \
- list_del_init(&(p)->tasks); \
+ list_del_rcu(&(p)->tasks); \
remove_parent(p); \
} while (0)
#define SET_LINKS(p) do { \
if (thread_group_leader(p)) \
- list_add_tail(&(p)->tasks,&init_task.tasks); \
+ list_add_tail_rcu(&(p)->tasks,&init_task.tasks); \
add_parent(p, (p)->parent); \
} while (0)
@@ -621,7 +627,7 @@
#define prev_task(p) list_entry((p)->tasks.prev, struct task_struct, tasks)
#define for_each_process(p) \
- for (p = &init_task ; (p = next_task(p)) != &init_task ; )
+ for (p = &init_task ; (p = next_task(p)),({ read_barrier_depends(); 0;}),p != &init_task ; )
/*
* Careful: do_each_thread/while_each_thread is a double loop so