From: Hubertus F. <fr...@us...> - 2001-02-09 20:28:18
John,

Regarding your message and my previous message: I am also looking into making
the PROC_CHANGE_PENALTY a function of the pool. So within a cpu-pool the
PROC_CHANGE_PENALTY is set to "A", and to CPUs outside the pool the
PROC_CHANGE_PENALTY is set to "Y*A + B". CPU sets and different
PROC_CHANGE_PENALTY values are extremely easy to code in our current MQ
scheduler.

Have you managed to get it running on a 32-way SGI machine? If so, you might
want to try the prototype of the cpu-pools we posted on the LSE site.

Hubertus Franke
Enterprise Linux Group (Mgr), Linux Technology Center (Member Scalability),
OS-PIC (Chair)
email: fr...@us...
(w) 914-945-2003   (fax) 914-945-4425   TL: 862-2003


"John Hawkes" <ha...@en...>@lists.sourceforge.net on 02/09/2001 03:10:49 PM
Sent by: lse...@li...
To: "Mike Kravetz" <mkr...@se...>, <lse...@li...>
cc:
Subject: Re: [Lse-tech] cpus_allowed in multi-queue scheduler

> Each CPU specific runqueue data structure has a
> field which contains the maximum 'non-affinity goodness'
> value of all schedulable tasks on that runqueue. Therefore,
> when we 'take a quick look' we are really only looking at
> the task with the maximum 'non-affinity goodness' on a remote
> CPU's runqueue.

Another wrinkle: specific hardware implementations may have gradations of
"non-affinity goodness", rather than a binary presumption that a process that
previously executed on cpuA will prefer to execute again on cpuA, but if not
cpuA then any other CPU is equally less good.

Suppose we have a NUMA machine consisting of two nodes, and each node contains
main memory, *two* CPUs (which we'll name cpuA and cpuB for node1, and cpuC
and cpuD for node2), and perhaps even a shared L3 cache for that node's main
memory. Suppose processX last executed on cpuA. If it re-executes on cpuA,
then we have some potential for having L1 and L2 cache blocks waiting for it,
and we expect optimum performance. And if it re-executes on cpuB, then we have
some potential for having L3 cache blocks waiting, and our performance is
almost as good as on cpuA. Re-executing on another node -- on cpuC or cpuD --
would be definitely inferior to cpuB.

Moreover, a NUMA machine that has a hierarchy of memory access penalties,
depending upon how "far away" you are from the previous-execution node's
memory, will have an even more complex "goodness" calculation.

Thus, what we need is to abstract the "goodness" calculation to allow for
architecture-specific differences.

John Hawkes
ha...@en...

_______________________________________________
Lse-tech mailing list
Lse...@li...
http://lists.sourceforge.net/lists/listinfo/lse-tech
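
For readers following the thread, a minimal sketch of the pool-aware penalty
Hubertus describes above ("A" inside the cpu-pool, "Y*A + B" outside it).
The helper pool_of() and the three constants are hypothetical, not identifiers
from the posted cpu-pools prototype; the stock 2.4 goodness() calculation
simply adds a flat PROC_CHANGE_PENALTY when a task stays on its previous CPU.

    /*
     * Illustrative sketch only: a pool-dependent migration penalty.
     * pool_of() and the constants below are assumed for the example.
     */

    #define POOL_PENALTY_A   15  /* cost of moving within the same cpu-pool */
    #define POOL_PENALTY_Y    2  /* multiplier applied when leaving the pool */
    #define POOL_PENALTY_B   10  /* additive cost for leaving the pool */

    /* hypothetical helper: map a logical CPU number to its cpu-pool id */
    extern int pool_of(int cpu);

    /*
     * Penalty charged when considering moving a task that last ran on
     * prev_cpu onto this_cpu; a larger value makes the scheduler more
     * reluctant to move it.
     */
    static inline int proc_change_penalty(int prev_cpu, int this_cpu)
    {
            if (pool_of(prev_cpu) == pool_of(this_cpu))
                    return POOL_PENALTY_A;                /* "A"       */
            return POOL_PENALTY_Y * POOL_PENALTY_A
                    + POOL_PENALTY_B;                     /* "Y*A + B" */
    }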
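
Similarly, a hedged sketch of the graded, architecture-specific affinity bonus
John is asking for: the previous CPU is best, another CPU on the same node
(sharing L3 and local memory) is nearly as good, and a remote node gets no
bonus. cpu_to_node() and the bonus values are assumptions for illustration,
not an interface taken from the 2.4 scheduler or the MQ patches.

    /*
     * Illustrative sketch: a graded, per-architecture affinity bonus in
     * place of the binary same-CPU check.
     */

    #define AFFINITY_SAME_CPU   20  /* L1/L2 likely still warm */
    #define AFFINITY_SAME_NODE  12  /* shared L3 / local memory: almost as good */
    #define AFFINITY_OTHER_NODE  0  /* remote node: no cache or memory advantage */

    /* hypothetical topology helper: map a logical CPU to its NUMA node */
    extern int cpu_to_node(int cpu);

    /*
     * Bonus added to a task's goodness when it last ran on prev_cpu and is
     * being considered for candidate_cpu.  An architecture with a deeper
     * memory hierarchy could add further levels here.
     */
    static inline int affinity_bonus(int prev_cpu, int candidate_cpu)
    {
            if (candidate_cpu == prev_cpu)
                    return AFFINITY_SAME_CPU;
            if (cpu_to_node(candidate_cpu) == cpu_to_node(prev_cpu))
                    return AFFINITY_SAME_NODE;
            return AFFINITY_OTHER_NODE;
    }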