Re: [Lse-tech] [patch] sched-domain cleanups, sched-2.6.5-rc2-mm2-A3

* Andi Kleen <ak...@su...> wrote:

> It doesn't do load balance in wake_up_forked_process() and is
> relatively non aggressive in balancing later. This leads to the
> multithreaded OpenMP STREAM running its children first on the same node
> as the original process and allocating memory there. Then later they
> run on a different node when the balancing finally happens, but
> generate cross traffic to the old node, instead of using the memory
> bandwidth of their local nodes.
> 
> The difference is very visible: even the 4-thread STREAM only sees the
> bandwidth of a single node. With a more aggressive scheduler you get 4
> times as much.
> 
> Admittedly it's a bit of a stupid benchmark, but it seems to be
> representative of a lot of HPC codes.

There's no way the scheduler can figure out the scheduling and memory
use patterns of the new tasks in advance.

but userspace could give hints - e.g. a syscall that triggers a
rebalancing: sys_sched_load_balance(). This way userspace notifies the
scheduler that it is on 'zero ground' and that the scheduler can move it
to the least loaded cpu/node.

a variant of this is already possible: userspace can use
sched_setaffinity() to load-balance manually - but sched_load_balance()
would be automatic.

	Ingo
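
For reference, the manual variant mentioned above could look roughly like
the sketch below. It is only an illustration, not code from this thread:
the helper names nth_allowed_cpu() and pin_self_to_cpu() are made up, and
it assumes Linux with the glibc wrappers for sched_getaffinity() and
sched_setaffinity(). Each worker thread would pin itself to "its" CPU
before first touching its memory, so that the pages end up on the local
node.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>

/*
 * Hypothetical helper: return the n-th CPU (wrapping around) that the
 * calling thread is currently allowed to run on, or -1 on error.
 */
static int nth_allowed_cpu(int n)
{
	cpu_set_t mask;
	int count, seen = 0;

	if (n < 0 || sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	count = CPU_COUNT(&mask);
	if (count == 0)
		return -1;
	n %= count;

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &mask))
			continue;
		if (seen++ == n)
			return cpu;
	}
	return -1;
}

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error.
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t mask;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(mask), &mask);
}
```

Worker i would call pin_self_to_cpu(nth_allowed_cpu(i)) at startup. Note
that this spreads threads over CPUs without looking at load or at which
node a CPU belongs to; that bookkeeping is exactly what stays manual in
this approach.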