From: Miklos S. <mi...@sz...> - 2009-03-20 09:44:49
Ingo,

I tested this one, and I think it makes sense in any case as an
optimization.  It should also be good for -stable kernels.

Does it look OK?

Thanks,
Miklos

----
From: Miklos Szeredi <msz...@su...>

This patch fixes bug #12208:

  http://bugzilla.kernel.org/show_bug.cgi?id=12208

Don't preempt tasks in preempt_schedule() if they are already in the
process of going to sleep.  Otherwise the task would wake up only to go
to sleep again.

Due to the way wait_task_inactive() works, this can also drastically
slow down ptrace:

 - task A is ptracing task B
 - task B stops on a trace event
 - task A is woken up and preempts task B
 - task A calls ptrace on task B, which does ptrace_check_attach()
 - this calls wait_task_inactive(), which sees that task B is still on
   the runq
 - task A goes to sleep for a jiffy
 - ...

Since UML does lots of the above sequences, those jiffies quickly add up
to make it slow as hell.

Signed-off-by: Miklos Szeredi <msz...@su...>
CC: st...@ke...
---
 kernel/sched.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux.git/kernel/sched.c
===================================================================
--- linux.git.orig/kernel/sched.c	2009-03-20 09:40:47.000000000 +0100
+++ linux.git/kernel/sched.c	2009-03-20 10:28:56.000000000 +0100
@@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
 	if (likely(ti->preempt_count || irqs_disabled()))
 		return;
 
+	/* No point in preempting if we are just about to go to sleep. */
+	if (current->state != TASK_RUNNING)
+		return;
+
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		schedule();
From: Peter Z. <pe...@in...> - 2009-03-20 10:03:43
On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> Ingo,
>
> I tested this one, and I think it makes sense in any case as an
> optimization.  It should also be good for -stable kernels.
>
> Does it look OK?

The idea is good, but there is a risk of preemption latencies here.
Some code paths aren't real quick between setting ->state !=
TASK_RUNNING and calling schedule().

[ Both quick: as in O(1) and few instructions ]

So if we're going to do this, we'd need to audit all such code paths --
and there be lots.

The first line of attack for this problem is making wait_task_inactive()
suck less, which shouldn't be too hard; that unconditional 1 jiffy sleep
is simply retarded.

> Index: linux.git/kernel/sched.c
> ===================================================================
> --- linux.git.orig/kernel/sched.c	2009-03-20 09:40:47.000000000 +0100
> +++ linux.git/kernel/sched.c	2009-03-20 10:28:56.000000000 +0100
> @@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
>  	if (likely(ti->preempt_count || irqs_disabled()))
>  		return;
>  
> +	/* No point in preempting if we are just about to go to sleep. */
> +	if (current->state != TASK_RUNNING)
> +		return;
> +
>  	do {
>  		add_preempt_count(PREEMPT_ACTIVE);
>  		schedule();
From: Miklos S. <mi...@sz...> - 2009-03-20 10:38:12
On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> > Ingo,
> >
> > I tested this one, and I think it makes sense in any case as an
> > optimization.  It should also be good for -stable kernels.
> >
> > Does it look OK?
>
> The idea is good, but there is a risk of preemption latencies here.
> Some code paths aren't real quick between setting ->state !=
> TASK_RUNNING and calling schedule().
>
> [ Both quick: as in O(1) and few instructions ]
>
> So if we're going to do this, we'd need to audit all such code paths --
> and there be lots.

Oh, yes.

In a random sample the most common pattern is something like this:

	spin_lock(&some_lock);
	/* do something */
	set_task_state(TASK_SOMESLEEP);
	/* do something more */
	spin_unlock(&some_lock);
	schedule();
	...

which should only be positively impacted by the change.  But I can
imagine rare cases where it's more complex.

> The first line of attack for this problem is making wait_task_inactive()
> suck less, which shouldn't be too hard; that unconditional 1 jiffy sleep
> is simply retarded.

I completely agree.  However, I'd like to have a non-invasive solution
that can go into current and stable kernels so UML users don't need to
suffer any more.

Thanks,
Miklos

> > Index: linux.git/kernel/sched.c
> > ===================================================================
> > --- linux.git.orig/kernel/sched.c	2009-03-20 09:40:47.000000000 +0100
> > +++ linux.git/kernel/sched.c	2009-03-20 10:28:56.000000000 +0100
> > @@ -4632,6 +4632,10 @@ asmlinkage void __sched preempt_schedule
> >  	if (likely(ti->preempt_count || irqs_disabled()))
> >  		return;
> >  
> > +	/* No point in preempting if we are just about to go to sleep. */
> > +	if (current->state != TASK_RUNNING)
> > +		return;
> > +
> >  	do {
> >  		add_preempt_count(PREEMPT_ACTIVE);
> >  		schedule();
From: Ingo M. <mi...@el...> - 2009-03-20 10:53:37
* Miklos Szeredi <mi...@sz...> wrote:

> On Fri, 20 Mar 2009, Peter Zijlstra wrote:
> > On Fri, 2009-03-20 at 10:43 +0100, Miklos Szeredi wrote:
> > > Ingo,
> > >
> > > I tested this one, and I think it makes sense in any case as an
> > > optimization.  It should also be good for -stable kernels.
> > >
> > > Does it look OK?
> >
> > The idea is good, but there is a risk of preemption latencies here.
> > Some code paths aren't real quick between setting ->state !=
> > TASK_RUNNING and calling schedule().
> >
> > [ Both quick: as in O(1) and few instructions ]
> >
> > So if we're going to do this, we'd need to audit all such code
> > paths -- and there be lots.
>
> Oh, yes.
>
> In a random sample the most common pattern is something like this:
>
> 	spin_lock(&some_lock);
> 	/* do something */
> 	set_task_state(TASK_SOMESLEEP);
> 	/* do something more */
> 	spin_unlock(&some_lock);
> 	schedule();
> 	...
>
> which should only be positively impacted by the change.  But I can
> imagine rare cases where it's more complex.

I'd suggest spin_unlock_no_resched() and task_unlock_no_resched()
instead of open-coding preempt-disable sequences.

> > The first line of attack for this problem is making
> > wait_task_inactive() suck less, which shouldn't be too hard;
> > that unconditional 1 jiffy sleep is simply retarded.
>
> I completely agree.  However, I'd like to have a non-invasive
> solution that can go into current and stable kernels so UML users
> don't need to suffer any more.

Agreed.  task_unlock_no_resched() should do that, I think.

	Ingo
From: Miklos S. <mi...@sz...> - 2009-03-20 11:26:40
On Fri, 20 Mar 2009, Ingo Molnar wrote:
> > > The first line of attack for this problem is making
> > > wait_task_inactive() suck less, which shouldn't be too hard;
> > > that unconditional 1 jiffy sleep is simply retarded.
> >
> > I completely agree.  However, I'd like to have a non-invasive
> > solution that can go into current and stable kernels so UML users
> > don't need to suffer any more.
>
> Agreed.  task_unlock_no_resched() should do that, I think.

I don't see how that would help.  ptrace_stop() specifically would need
read_unlock_no_resched().  But I'm reluctant to add more spinlock
functions with all their variants...

Thanks,
Miklos
From: Ingo M. <mi...@el...> - 2009-03-20 11:40:11
* Miklos Szeredi <mi...@sz...> wrote:

> On Fri, 20 Mar 2009, Ingo Molnar wrote:
> > > > The first line of attack for this problem is making
> > > > wait_task_inactive() suck less, which shouldn't be too hard;
> > > > that unconditional 1 jiffy sleep is simply retarded.
> > >
> > > I completely agree.  However, I'd like to have a non-invasive
> > > solution that can go into current and stable kernels so UML users
> > > don't need to suffer any more.
> >
> > Agreed.  task_unlock_no_resched() should do that, I think.
>
> I don't see how that would help.

It more clearly expresses the need there, and we already have
_no_resched API variants (we add them on an as-needed basis).

Doing:

	preempt_disable();
	read_lock();
	...
	read_unlock();
	preempt_enable_no_resched();

really just open-codes read_unlock_no_resched() and uglifies the code.

> ptrace_stop() specifically would need read_unlock_no_resched().
> But I'm reluctant to add more spinlock functions with all their
> variants...

If you worry about backportability, we can certainly add the easy fix
too, if it's followed by the more involved fix(es).

	Ingo