From: John H. <ha...@en...> - 2001-01-23 19:21:54
From: "Andrew Morton" <an...@uo...>

...[snip]...

> Applying timepegs, plus schedule-timer.patch (attached) reveals that
> vanilla schedule() takes 32 microseconds with 100 tasks on the
> runqueue, and 4 usecs with an empty runqueue.

...[snip]...

> runqueue length    microseconds (500MHz PII)
> ...
>  64                 25
> 128                 44
>
> Seems surprisingly slow?

What greatly exacerbates the problem is when the global runqueue_lock is held for this span of schedule() time and the desired context-switch rate is high. On a two-cpu [and not very fast] i386 I've seen context-switch rates of 10-20,000/second. This obviously wastes lots of cpu cycles in the spinlock waits. An 8p SMP is going to scale even worse. And the same or larger NUMA machine with a global runqueue_lock exhibits distinctly unfair spinlock contention (i.e., starvation of the cpus "farthest" from the runqueue_lock's physical memory).

I believe the only effective solution for [large] NUMA systems is to reduce the contention on the global runqueue_lock by using multiple runqueues with individual spinlocks. Using prioritized runqueues (and the same global runqueue_lock) helps because it reduces the "hold" times, but if you add enough cpus, you'll eventually be back in the same high-contention, high-waste situation.

John Hawkes
ha...@en...