From: Nick Piggin <piggin@cy...>  20040221 01:45:35

Rick Lindsley wrote: >So let me try a diagram. Each of these groups of numbers represent a >cpu_group, and the labels to the left are individual sched_domains. > >SD1 01234567 >SD2SD3 0123 4567 >SD4SD7 01 23 45 67 >SD8SD15 0 1 2 3 4 5 6 7 > >Currently, we assume each cpu has a power of 1, so each cpu group in >domains SD8SD15 would have a power of 1, each cpu group in SD4SD7 >would have a power of 2, each of SD2 and SD3 would have a power of 4, >and collectively, all CPUs as represented in SD1 would have a power of 8. >Of course, we don't really make use of this assumption but this just >enumerates our assumption that all nodes, all cpus are created equal. > > Well we used to sum up the number of CPUs in each group, so it wasn't quite that bad. We assumed all CPUs are created equal. >Your new power code would assign each cpu group a static power other >than this, making SMT pairs, for instance, 1.2 instead of 2. In the >case of four siblings, 1.4 instead of 4. Correct? In the example above, >SD2 and SD3 would have a power rating of 2.4, and SD1 would have a power >rating of 4*1.2 or 4.8, right? > > Right. >With your current code, we only consult the power ratings if we've already >decided that we are currently "balanced enough". > Well we do work out the per group loads by dividing with the power rating instead of cpusinthegroup too. > I'd go one step further >and say that manipulating for power only makes sense if you have an idle >processor somewhere. If all processors are busy, then short of some >qualityofprocess assessment, how can you improve power? (You could >improve fairness, I suppose, but that would require lots more stats and >history than we have here.) If one set of procs is slower than another, >won't that make itself apparent by a longer queue developing there? (or >shorter queues forming somewhere else?) and it being loadbalanced >by the existing algorithm? Seems to me we only need to make power >decisions when we want to consider an idle processor stealing a task (a >possibly *running* task) from another processor because this processor >is faster/stronger/better. > > Yeah, probably we could change that test to: if (*imbalance <= SCHED_LOAD_SCALE / 2 && this_load < SCHED_LOAD_SCALE) Either way, if the calculation should be done in such a way that if your CPUs are not idle, then it wouldn't predict a performance increase. No doubt there is room for improvement, but hopefully it is now at a "good enough" stage... 