From: Mike K. <kr...@us...> - 2002-02-26 19:04:03
|
Erich, I'm glad to see you are also exploring NUMA scheduling. The more the merrier. On Tue, Feb 26, 2002 at 11:33:14AM +0100, Erich Focht wrote: > > Well, maybe my description was a little bit misleading. My approach is not > balancing much more aggressively, the difference is actually minimal, > e.g. for 1ms ticks: > > Mike's approach: > - idle CPU : load_balance() every 1ms (only within local node) > balance_cpu_sets() every 2ms (balance across nodes) > - busy CPU : load_balance() every 250ms > balance_cpu_sets() every 500ms > - schedule() : load_balance() if idle (only within local node) > > Erich's approach: > - idle CPU : load_balance() every 1ms (first try balancing the local > node, if already balanced (no CPU exceeds the > current load by >25%) try to find a remote > node with larger load than on the current one > (>25% again)). > - busy CPU : load_balance() every 250ms (same comment as above) > - schedule() : load_balance() if idle (same comment as above). > > So the functional difference is not really that big here, I am also trying > to balance locally first. If that fails (no imbalance), I try > globally. The factor of 2 in the times is not so relevant, I think, and > also I don't consider my approach significantly more aggressive. My factor of 'two' is really a 'distance' factor. My thoughts were along the lines that node rebalancing would occur at different rates based on the distance between nodes. On the Sequent NUMA-Q machines with really high remote memory latencies, you might want to be less agressive than on machines with lower latencies. I was playing with the idea that you would discover distances during topology discovery, and the rate of rebalancing would somehow correspond to these distances. > More significant is the difference in the data used for the balance > decision: > > Mike: calculate load of a particular cpu set in the corresponding > load_balance() call. > Advantage: cheap (if spinlocks don't hurt there) > Disadvantage: for busy CPUs it can be really old (250ms) > > Erich: calculate load when needed, at the load_balance() call, but not > more than needed (normally only local node data, global data if needed, > all lockless). > Advantage: fresh, lockless > Disadvantage: sometimes slower (when balancing across nodes) > > As Mike has mainly the cache affinity in mind, it doesn't really matter > where a task is scheduled as long as it stays there long enough and the > nodes are well balanced. A wrong scheduling decision (based on old > data) will be fixed sooner or later (after x*250ms or so). Agreed. I also played with the idea of keeping a load average over time, which seems to be something we may want. However, I couldn't think of an efficient way to accomplish this, and my first attempts showed little promise. Perhaps, I will investigate this more. -- Mike |