Rick Lindsley wrote:
>So let me try a diagram. Each of these groups of numbers represent a
>cpu_group, and the labels to the left are individual sched_domains.
>
>SD1 01234567
>SD2SD3 0123 4567
>SD4SD7 01 23 45 67
>SD8SD15 0 1 2 3 4 5 6 7
>
>Currently, we assume each cpu has a power of 1, so each cpu group in
>domains SD8SD15 would have a power of 1, each cpu group in SD4SD7
>would have a power of 2, each of SD2 and SD3 would have a power of 4,
>and collectively, all CPUs as represented in SD1 would have a power of 8.
>Of course, we don't really make use of this assumption but this just
>enumerates our assumption that all nodes, all cpus are created equal.
>
>
Well we used to sum up the number of CPUs in each group, so it
wasn't quite that bad. We assumed all CPUs are created equal.
>Your new power code would assign each cpu group a static power other
>than this, making SMT pairs, for instance, 1.2 instead of 2. In the
>case of four siblings, 1.4 instead of 4. Correct? In the example above,
>SD2 and SD3 would have a power rating of 2.4, and SD1 would have a power
>rating of 4*1.2 or 4.8, right?
>
>
Right.
>With your current code, we only consult the power ratings if we've already
>decided that we are currently "balanced enough".
>
Well we do work out the per group loads by dividing with the power
rating instead of cpusinthegroup too.
> I'd go one step further
>and say that manipulating for power only makes sense if you have an idle
>processor somewhere. If all processors are busy, then short of some
>qualityofprocess assessment, how can you improve power? (You could
>improve fairness, I suppose, but that would require lots more stats and
>history than we have here.) If one set of procs is slower than another,
>won't that make itself apparent by a longer queue developing there? (or
>shorter queues forming somewhere else?) and it being loadbalanced
>by the existing algorithm? Seems to me we only need to make power
>decisions when we want to consider an idle processor stealing a task (a
>possibly *running* task) from another processor because this processor
>is faster/stronger/better.
>
>
Yeah, probably we could change that test to:
if (*imbalance <= SCHED_LOAD_SCALE / 2
&& this_load < SCHED_LOAD_SCALE)
Either way, if the calculation should be done in such a way that
if your CPUs are not idle, then it wouldn't predict a performance
increase.
No doubt there is room for improvement, but hopefully it is now
at a "good enough" stage...
