From: Erich F. <ef...@hp...> - 2004-03-24 12:04:50
Hello Xavier,

On Tuesday 23 March 2004 19:24, Xavier Bru wrote:
> Running a single process on a 16-way NUMA machine (4 nodes of 4 CPUs), we
> sometimes see the process migrating across nodes when the machine is not
> loaded. It seems that on a machine where most CPUs are idle, the
> balance_node() routine finds a load imbalance (for example: 2 active
> processes on one node and 0 on another), and a process is migrated
> across nodes, with the major problem of broken memory affinity.

As Rick mentioned in his reply, in some cases you might actually want an idle node to steal single tasks. The time-averaged loads should help to avoid the situation you mention, but the loads are computed by taking into account all runnable processes. It might be better to consider only processes which ran longer or have more memory, as Rick suggested. OTOH this gives a minimum unbalanced run time, which you might again want to avoid.

My main idea behind a NUMA scheduler was to provide a mechanism for a process to return to its node. This isn't there in 2.6, and the current NUMA scheduler is pretty "crippled", IMHO. As sched_domains provide more flexibility regarding HT, NUMA, SMT and whatnot, and Andrew Morton seems willing to accept it, I'm not too motivated to improve the current NUMA scheduler.

Looking at your patch: by just skipping all nodes with nr_running <= nr_cpus you don't update this_rq()->prev_node_load[i], so some decisions might be wrong later. So maybe you want to move the "continue" lower in the loop. Otherwise it's fine, IF you want this behaviour...

> to fix the problem. Note that in this case processes still migrate
> between CPUs of the same node.

??? That shouldn't happen, either. Is it due to short-running processes on the same CPU? The time averaging should catch that, too...

Andrew suggests in a reply to change try_to_wake_up()...
Might work, but you still have kernel threads pinned to each CPU which you cannot move; they just have to wake up there. It would be nice to find out what exactly happens and which task pushes the numatest away. It would again be better to not count particular tasks in the load.

Regards,
Erich