From: Xavier B. <xav...@bu...> - 2004-03-23 18:26:36
Attachments:
Bull-sched-040329
Hello Erich.

We are hitting a problem with the NUMA scheduler:

Running a single process on a 16-way NUMA machine (4 nodes of 4 CPUs), we
sometimes see the process migrating across nodes when the machine is not
loaded. It seems that on a machine where most CPUs are idle, the
balance_node() routine finds a load imbalance (for example: 2 active
processes on one node and 0 on another), and a process is migrated across
nodes, with the major problem of broken memory affinity.

Running the numatest with only one process, in the (good) case we have:

    initial CPU = 7
    cpu          18493  12
    cpu0             0   0
    cpu1             0   0
    cpu2             0   0
    cpu3             0   0
    cpu4             0   0
    cpu5             0   0
    cpu6             0   0
    cpu7         18493  12
    cpu8             0   0
    cpu9             0   0
    cpu10            0   0
    cpu11            0   0
    cpu12            0   0
    cpu13            0   0
    cpu14            0   0
    cpu15            0   0
    current_cpu      7

    real    0m18.073s
    user    0m18.060s
    sys     0m0.013s

but in the bad one (cross-node migration):

    initial CPU = 8
    cpu          30271  13
    cpu0             0   0
    cpu1         26902   1
    cpu2             0   0
    cpu3             0   0
    cpu4             0   0
    cpu5             0   0
    cpu6             0   0
    cpu7             0   0
    cpu8             4   1
    cpu9           191  10
    cpu10         3174   1
    cpu11            0   0
    cpu12            0   0
    cpu13            0   0
    cpu14            0   0
    cpu15            0   0
    current_cpu      1

    real    0m29.577s
    user    0m29.562s
    sys     0m0.015s

The following patch, which makes find_busiest_node() ignore nodes where the
number of active processes is <= the number of CPUs in the node, seems to
fix the problem. Note that in this case processes still migrate between
CPUs of the same node. Should that be ameliorated?

Thanks in advance.

Xavier

--
Sincères salutations.
_____________________________________________________________________
 Xavier BRU                 BULL ISD/R&D/INTEL    office: FREC B1-422
 tel : +33 (0)4 76 29 77 45               http://www-frec.bull.fr
 fax : +33 (0)4 76 29 77 70               mailto:Xav...@bu...
 addr: BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
_____________________________________________________________________
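The patch itself is in the attachment rather than inlined, but the idea, skipping any node whose runnable-task count does not exceed its CPU count when looking for the busiest node, can be sketched in plain C. The harness below is a hypothetical stand-alone mock, not the 2.6 kernel code; the array names merely echo the kernel's node_nr_running bookkeeping.

```c
#define NR_NODES 4

/*
 * Stand-alone sketch (not kernel source) of the proposed fix: when
 * picking a node to steal from, ignore nodes that can run all their
 * tasks locally, since stealing from them only breaks memory affinity.
 * Returns the index of the busiest oversubscribed remote node, or -1
 * if no cross-node migration is worthwhile.
 */
static int find_busiest_node(const int nr_running[NR_NODES],
                             const int nr_cpus[NR_NODES],
                             int this_node)
{
        int i, busiest = -1, max_load = 0;

        for (i = 0; i < NR_NODES; i++) {
                if (i == this_node)
                        continue;
                /* The proposed check: node not oversubscribed, skip it. */
                if (nr_running[i] <= nr_cpus[i])
                        continue;
                if (nr_running[i] > max_load) {
                        max_load = nr_running[i];
                        busiest = i;
                }
        }
        return busiest;
}
```

With the single-process case from the mail (2 runnable tasks on a 4-CPU node, everything else idle) this returns -1, so balance_node() would leave the task on the node holding its memory.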
From: Martin J. B. <mb...@ar...> - 2004-03-23 22:10:44
> Running a single process on a 16-way NUMA machine (4 nodes of 4 CPUs), we
> sometimes see the process migrating across nodes when the machine is not
> loaded. It seems that on a machine where most CPUs are idle, the
> balance_node() routine finds a load imbalance (for example: 2 active
> processes on one node and 0 on another), and a process is migrated across
> nodes, with the major problem of broken memory affinity.
...
> The following patch, which makes find_busiest_node() ignore nodes where
> the number of active processes is <= the number of CPUs in the node,
> seems to fix the problem. Note that in this case processes still migrate
> between CPUs of the same node.

That sounds like the right thing to do, though I haven't checked the
specifics of the patch. The confusing thing is that I thought we fixed
that exact bug ages ago ... maybe I lost the patch.

Can you recheck with Nick's sched_domains code on either the -mjb or -mm
tree? That's the way we're intending to go anyway, so if it's fixed there,
I'm not too bothered.

M.
From: Erich F. <ef...@hp...> - 2004-03-24 12:04:50
Hello Xavier,

On Tuesday 23 March 2004 19:24, Xavier Bru wrote:
> Running a single process on a 16-way NUMA machine (4 nodes of 4 CPUs), we
> sometimes see the process migrating across nodes when the machine is not
> loaded. It seems that on a machine where most CPUs are idle, the
> balance_node() routine finds a load imbalance (for example: 2 active
> processes on one node and 0 on another), and a process is migrated across
> nodes, with the major problem of broken memory affinity.

As Rick mentioned in his reply, in some cases you might actually want an
idle node to steal single tasks. The time-averaged loads should actually
help to avoid the situation you mention, but the loads are computed by
taking into account all runnable processes. It might be better to consider
only processes which ran longer or have more memory, as Rick suggested.
OTOH this gives a minimum unbalanced run-time, which you again might want
to avoid.

My main idea behind a NUMA scheduler was to provide a mechanism for a
process to return to its node. This isn't there in 2.6, and the current
NUMA scheduler is pretty "crippled", IMHO. As the sched_domains approach
provides more flexibility regarding HT, NUMA, SMT and what not, and Andrew
Morton seems willing to accept it, I'm not too motivated to improve the
current NUMA scheduler.

Looking at your patch: by just skipping all nodes with
nr_running <= nr_cpus, you don't update this_rq()->prev_node_load[i], so
some decisions might be wrong later. So maybe you want to move the
"continue" lower in the loop. Otherwise it's fine, IF you want this
behaviour...

> to fix the problem. Note that in this case processes still migrate
> between cpus of the same node.

??? That shouldn't happen, either. Is it due to short-running processes
on the same CPU? The time-averaging should catch that, too...

Andrew suggests in a reply to change try_to_wake_up()... Might work, but
you still have kernel threads pinned to each CPU which you cannot move;
they just have to wake up there.

It would be nice to find out what exactly happens and which task pushes
the numatest away. It would again be better to not count particular tasks
in the load.

Regards,
Erich
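Erich's caveat about the placement of the "continue" can be made concrete. The 2.6-era find_busiest_node() keeps a decayed per-node load history in this_rq()->prev_node_load[] (roughly: half the previous value plus the current runnable count); skipping a node before that history is written leaves stale data for later balancing decisions. The following is a hypothetical stand-alone mock, not the kernel source, showing the "update history first, then skip" ordering:

```c
#define NR_NODES 4

/* Mock of the per-runqueue history kept in this_rq()->prev_node_load[]. */
static int prev_node_load[NR_NODES];

static int find_busiest_node(const int nr_running[NR_NODES],
                             const int nr_cpus[NR_NODES],
                             int this_node)
{
        int i, load, busiest = -1, max_load = 0;

        for (i = 0; i < NR_NODES; i++) {
                /*
                 * Decayed load average, written for EVERY node, so the
                 * history stays fresh even for nodes skipped below.
                 */
                load = (prev_node_load[i] >> 1) + nr_running[i];
                prev_node_load[i] = load;

                if (i == this_node)
                        continue;
                /* Xavier's skip, moved below the history update. */
                if (nr_running[i] <= nr_cpus[i])
                        continue;
                if (load > max_load) {
                        max_load = load;
                        busiest = i;
                }
        }
        return busiest;
}
```

The decay constants are illustrative; the point is only the ordering of the history update relative to the early "continue".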
From: Xavier B. <xav...@bu...> - 2004-03-24 16:37:42
[2]kdb> bt
Stack traceback for pid 0
0xe0000001fff10000     0     1     0     1   2  R 0xe0000001fff106b0 *swapper
0xa000000100086b20 pull_task
        args (0xe000002000044620, 0xe000002000044678, 0xe0000020fe440000, 0xe000000004d24620, 0x2)
        kernel 0xa000000100086b20 0xa000000100086e80
0xa000000100087540 load_balance+0x680
        args (0xe000000004d24620, 0xe0000020fe440000, 0xe000002000044e68, 0xe0000020fe440682, 0xe000002000044e68)
        kernel 0xa000000100086ec0 0xa000000100087980
0xa000000100087ab0 balance_node+0x130
        args (0xe000000004d24620, 0x1, 0xe0000001fff17c60, 0xa000000100087b90, 0x38a)
        kernel 0xa000000100087980 0xa000000100087ae0
0xa000000100087b90 rebalance_tick+0xb0
        args (0xe000000004d24620, 0x1, 0x1003d2687, 0x2, 0xa0000001000aa480)
        kernel 0xa000000100087ae0 0xa000000100087da0
0xa0000001000aa480 update_process_times+0x60
        args (0x0, 0x1, 0xa000000100057dd0, 0x205, 0x8000)
        kernel 0xa0000001000aa420 0xa0000001000aa4a0
0xa000000100057dd0 smp_do_timer+0x90
        args (0xe0000001fff17c70, 0xa00000010003c9e0, 0x48c, 0x48c)
        kernel 0xa000000100057d40 0xa000000100057e00
0xa00000010003c9e0 timer_interrupt+0x280
        args (0x4b81b4d1944, 0xa0000001008e4580, 0xe0000001fff17c70, 0xffffffffffff0038, 0xa0000001008e4584)
[2]more>
        kernel 0xa00000010003c760 0xa00000010003cc20
0xa0000001000151c0 handle_IRQ_event+0xa0
        args (0xef, 0xe0000001fff17c70, 0xa00000010090f8f8, 0x20000001, 0x0)
        kernel 0xa000000100015120 0xa000000100015240
0xa000000100015aa0 do_IRQ+0x120
        args (0xef, 0xe0000001fff17c70, 0xa0000001008e3790, 0xa0000001008e3780, 0xa0000001008e3788)
        kernel 0xa000000100015980 0xa000000100015d00
0xa0000001000178a0 ia64_handle_irq+0xa0
        args (0x0, 0xe0000001fff17c70, 0x0, 0xfd, 0xa0000001000119a0)
        kernel 0xa000000100017800 0xa0000001000179a0
0xa0000001000119a0 ia64_leave_kernel
        args (0x0, 0xe0000001fff17c70)
        kernel 0xa0000001000119a0 0xa000000100011c00
0xa000000100019360 cpu_idle+0x100
        args (0xa00000010087ac28, 0x0, 0xa000000100adddd0, 0xa000000100addc70, 0xa000000100adc098)
        kernel 0xa000000100019260 0xa000000100019440
0xa00000010089d830 start_secondary+0x50
        args (0xa000000100008590, 0x60, 0x400000)
        kernel 0xa00000010089d7e0 0xa00000010089d860
0xa000000100008590 _start+0x270
        args (0xa000000100008590, 0x60, 0x400000, 0xa00000010087ac28, 0x0)
        kernel 0xa000000100008320 0xa000000100008320

[2]kdb> ps R
Task Addr            Pid   Parent Tgid  [*] cpu State Thread             Command
0xe0000000048c8000     0     0     0     1   0  R 0xe0000000048c86b0  swapper
Error: no saved data for this cpu
0xe0000001fff38000     0     1     0     1   1  R 0xe0000001fff386b0  swapper
Error: no saved data for this cpu
0xe0000001fff10000     0     1     0     1   2  R 0xe0000001fff106b0 *swapper
0xe0000001814d8000     0     1     0     1   3  R 0xe0000001814d86b0  swapper
Error: no saved data for this cpu
0xe0000001814b0000     0     1     0     1   4  R 0xe0000001814b06b0  swapper
Error: no saved data for this cpu
0xe000000181498000     0     1     0     1   5  R 0xe0000001814986b0  swapper
Error: no saved data for this cpu
0xe0000001ffef0000     0     1     0     1   6  R 0xe0000001ffef06b0  swapper
Error: no saved data for this cpu
0xe0000001ffeb0000     0     1     0     1   8  R 0xe0000001ffeb06b0  swapper
Error: no saved data for this cpu
0xe0000001ffe98000     0     1     0     1   9  R 0xe0000001ffe986b0  swapper
Error: no saved data for this cpu
0xe0000001ffe70000     0     1     0     1  10  R 0xe0000001ffe706b0  swapper
Error: no saved data for this cpu
0xe0000001ffe58000     0     1     0     1  11  R 0xe0000001ffe586b0  swapper
Error: no saved data for this cpu
[2]more>
0xe0000001ffe30000     0     1     0     1  12  R 0xe0000001ffe306b0  swapper
Error: no saved data for this cpu
0xe0000001ffe18000     0     1     0     1  13  R 0xe0000001ffe186b0  swapper
Error: no saved data for this cpu
0xe0000001ffe00000     0     1     0     1  14  R 0xe0000001ffe006b0  swapper
Error: no saved data for this cpu
0xe0000001815d8000     0     1     0     1  15  R 0xe0000001815d86b0  swapper
Error: no saved data for this cpu
0xe0000020fe440000  4233  4114  4233     0   7  R 0xe0000020fe4406b0  mytest

[2]kdb> md node_nr_running
0xa000000100b05a80 00000000 00000002 00000000 00000000   ................
0xa000000100b05a90 00000000 00000000 00000000 00000000   ................
0xa000000100b05aa0 00000000 00000000 00000000 00000000   ................
0xa000000100b05ab0 00000000 00000000 00000000 00000000   ................
0xa000000100b05ac0 00000000 00000000 00000000 00000000   ................
0xa000000100b05ad0 00000000 00000000 00000000 00000000   ................
0xa000000100b05ae0 00000000 00000000 00000000 00000000   ................
0xa000000100b05af0 00000000 00000000 00000000 00000000   ................
[2]kdb>
From: Paul J. <pj...@sg...> - 2004-03-24 19:04:44
Xavier,

Where are you getting the printouts that look like:

    initial CPU = 2
    cpu     18491  16
    cpu0    17125   2
    cpu1      441   0
    cpu2      700  14
    cpu3      225   0
    ...
    current_cpu 0

We have something in our SGI 2.4 kernels (/proc/<pid>/cpu) that displays
this sort of per-cpu usage, but I don't see anything in the 2.6 kernels
that seems to do this.

--
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj...@sg...> 1.650.933.1373
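The format quoted above (one line per CPU: a name followed by two counters) is simple enough to post-process. Here is a small C helper as a sketch; note the field meanings (user and system ticks) are inferred from context in this thread, not from the patch itself, so check the actual patch before relying on them.

```c
#include <stdio.h>

/*
 * Parse one line of the per-process, per-CPU times output quoted in
 * this thread, e.g. "cpu0 17125 2".  The two counters are ASSUMED to
 * be user and system ticks; verify against the real patch.
 * Returns 1 on success, 0 on a malformed line.
 */
static int parse_cpu_line(const char *line, char name[16],
                          long *user, long *sys)
{
        return sscanf(line, "%15s %ld %ld", name, user, sys) == 3;
}
```

A script looping such a parser over the output makes it easy to diff the good and bad numatest runs shown earlier in the thread.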
From: Xavier B. <xav...@bu...> - 2004-03-25 08:19:45
Paul Jackson a écrit :
> Where are you getting the printouts that look like:
>
>     initial CPU = 2
>     cpu     18491  16
>     cpu0    17125   2
>     cpu1      441   0
>     cpu2      700  14
>     cpu3      225   0
>     ...
>     current_cpu 0
>
> We have something in our SGI 2.4 kernels (/proc/<pid>/cpu) that
> displays this sort of per-cpu usage, but I don't see anything
> in the 2.6 kernels that seems to do this.

Hello Paul,

This is just the per-cpu times patch that Erich Focht provided some
(long :-) ) time ago with his NUMA affinity benchmark.

Xavier

--
Sincères salutations.
_____________________________________________________________________
 Xavier BRU                 BULL ISD/R&D/INTEL    office: FREC B1-422
 tel : +33 (0)4 76 29 77 45               http://www-frec.bull.fr
 fax : +33 (0)4 76 29 77 70               mailto:Xav...@bu...
 addr: BULL, 1 rue de Provence, BP 208, 38432 Echirolles Cedex, FRANCE
_____________________________________________________________________
From: Erich F. <ef...@hp...> - 2004-03-26 08:31:32
Attachments:
cputimes_stat-2.6.0t1.patch
Hi Paul,

On Wednesday 24 March 2004 20:03, Paul Jackson wrote:
> Where are you getting the printouts that look like:
>
>     initial CPU = 2
>     cpu     18491  16
>     cpu0    17125   2
>     cpu1      441   0
>     cpu2      700  14
>     cpu3      225   0
>     ...
>     current_cpu 0
>
> We have something in our SGI 2.4 kernels (/proc/<pid>/cpu) that
> displays this sort of per-cpu usage, but I don't see anything
> in the 2.6 kernels that seems to do this.

It's probably the attached patch. Sorry, I'm travelling and couldn't
rediff against a current version...
From: Paul J. <pj...@sg...> - 2004-03-26 08:37:57
> It's probably the attached patch

Excellent - thank you. Do you expect this patch to end up in the
mainstream kernel sometime?

--
                          I won't rest till it's the best ...
                          Programmer, Linux Scalability
                          Paul Jackson <pj...@sg...> 1.650.933.1373
From: <sh...@ft...> - 2004-03-28 10:06:58
Hi,

Very nice patch. Andrew, would you consider adding this one?

--Shai

-----Original Message-----
From: lse...@li... [mailto:lse...@li...] On Behalf Of Erich Focht
Sent: Friday, March 26, 2004 00:31
To: Paul Jackson; Xavier Bru
Cc: ric...@us...; mb...@ar...; lse...@li...; Erik Jacobson
Subject: Re: [Lse-tech] Re: NUMA scheduler issue

Hi Paul,

On Wednesday 24 March 2004 20:03, Paul Jackson wrote:
> Where are you getting the printouts that look like:
>
>     initial CPU = 2
>     cpu     18491  16
>     cpu0    17125   2
>     cpu1      441   0
>     cpu2      700  14
>     cpu3      225   0
>     ...
>     current_cpu 0
>
> We have something in our SGI 2.4 kernels (/proc/<pid>/cpu) that
> displays this sort of per-cpu usage, but I don't see anything
> in the 2.6 kernels that seems to do this.

It's probably the attached patch. Sorry, I'm travelling and couldn't
rediff against a current version...