Rick Lindsley wrote:
>So let me try a diagram. Each of these groups of numbers represents a
>cpu_group, and the labels to the left are individual sched_domains.
>SD2-SD3    0123      4567
>SD4-SD7    01  23    45  67
>SD8-SD15   0 1 2 3   4 5 6 7
>Currently, we assume each cpu has a power of 1, so each cpu group in
>domains SD8-SD15 would have a power of 1, each cpu group in SD4-SD7
>would have a power of 2, each of SD2 and SD3 would have a power of 4,
>and collectively, all CPUs as represented in SD1 would have a power of 8.
>Of course, we don't really make use of this assumption but this just
>enumerates our assumption that all nodes, all cpus are created equal.
Well, we used to sum up the number of CPUs in each group, so it
wasn't quite that bad. We assumed all CPUs were created equal.
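To make the old assumption concrete, here is a minimal sketch of "power equals the number of CPUs in the group". The cpu_mask_t type and the equal_cpu_power() name are hypothetical stand-ins for illustration, not the actual scheduler code.

```c
#include <assert.h>

/* Hypothetical stand-in for a cpu_group: one bit per CPU it spans. */
typedef unsigned int cpu_mask_t;

/*
 * Group power under the "all CPUs are created equal" assumption:
 * simply the number of CPUs set in the mask.
 */
static unsigned int equal_cpu_power(cpu_mask_t mask)
{
	unsigned int power = 0;

	while (mask) {
		power += mask & 1u;	/* count this CPU if its bit is set */
		mask >>= 1;
	}
	return power;
}
```

With the 8-CPU layout in the diagram, a single-CPU group (mask 0x01) gets a power of 1, a pair (0x03) gets 2, a group of four (0x0f) gets 4, and the full set (0xff) gets 8.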
>Your new power code would assign each cpu group a static power other
>than this, making SMT pairs, for instance, 1.2 instead of 2. In the
>case of four siblings, 1.4 instead of 4. Correct? In the example above,
>SD2 and SD3 would have a power rating of 2.4, and SD1 would have a power
>rating of 4*1.2 or 4.8, right?
>With your current code, we only consult the power ratings if we've already
>decided that we are currently "balanced enough".
Well, we also work out the per-group loads by dividing by the power
rating instead of by the number of CPUs in the group.
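A rough sketch of that division, in fixed point. The value 128 for SCHED_LOAD_SCALE and the scaled_group_load() helper are assumptions made for illustration; the 1.2 rating for an SMT pair is the figure from the quoted text above.

```c
#include <assert.h>

/* Assumed fixed-point unit: the power of one full CPU. */
#define SCHED_LOAD_SCALE 128UL

/*
 * Hypothetical helper: normalise a group's raw load by its power
 * rating, so that groups of different capacity can be compared.
 */
static unsigned long scaled_group_load(unsigned long raw_load,
				       unsigned long cpu_power)
{
	assert(cpu_power > 0);	/* a group with zero power makes no sense */
	return raw_load * SCHED_LOAD_SCALE / cpu_power;
}
```

Take an SMT pair running two tasks, so a raw load of 2 * SCHED_LOAD_SCALE = 256. Dividing by cpus-in-the-group (power 256) gives a load of 128, as if each sibling were a full CPU. Dividing by a power of 1.2 (about 153 in these units) gives 214, so the pair looks noticeably busier than two real CPUs carrying the same tasks.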
> I'd go one step further
>and say that manipulating for power only makes sense if you have an idle
>processor somewhere. If all processors are busy, then short of some
>quality-of-process assessment, how can you improve power? (You could
>improve fairness, I suppose, but that would require lots more stats and
>history than we have here.) If one set of procs is slower than another,
>won't that make itself apparent by a longer queue developing there? (or
>shorter queues forming somewhere else?) and it being load-balanced
>by the existing algorithm? Seems to me we only need to make power
>decisions when we want to consider an idle processor stealing a task (a
>possibly *running* task) from another processor because this processor
Yeah, probably we could change that test to:
	if (*imbalance <= SCHED_LOAD_SCALE / 2
			&& this_load < SCHED_LOAD_SCALE)
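Pulled out as a standalone predicate, the proposed test might look like the sketch below. The function name is hypothetical, imbalance is passed by value rather than through a pointer, and SCHED_LOAD_SCALE is assumed to be 128 (one task's worth of load).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed fixed-point unit: the load of one running task. */
#define SCHED_LOAD_SCALE 128UL

/*
 * Hypothetical wrapper around the proposed test: only go on to the
 * power-based calculation when the imbalance is small AND this CPU
 * carries less than one task's worth of load.
 */
static bool should_consider_power(unsigned long imbalance,
				  unsigned long this_load)
{
	return imbalance <= SCHED_LOAD_SCALE / 2
		&& this_load < SCHED_LOAD_SCALE;
}
```

So an idle CPU (this_load of 0) with an imbalance of 64 passes, while a CPU already running a full task (this_load of 128) does not, whatever the imbalance.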
Either way, the calculation should be done in such a way that
if your CPUs are not idle, then it wouldn't predict a performance
No doubt there is room for improvement, but hopefully it is now
at a "good enough" stage...