From: Erich F. <ef...@es...> - 2002-10-27 23:33:06
On Sunday 27 October 2002 19:16, Martin J. Bligh wrote:
> > OK, I went to your latest patches (just 1 and 2). And they worked!
> > You've fixed the performance degradation problems for kernel compile
> > (now a 14% improvement in systime), that core set works without
> > further futzing about or crashing, with or without TSC, on either
> > version of gcc ... congrats!
>
> So I have a slight correction to make to the above ;-) Your patches
> do work just fine, no crashes any more. HOWEVER ... turns out I only
> had the first patch installed, not both. Silly mistake, but turns out
> to be very interesting.
>
> So your second patch is the balance on exec stuff ... I've looked at
> it, and think it's going to be very expensive to do in practice, at
> least the simplistic "recalc everything on every exec" approach. It
> does benefit the low end schedbench results, but not the high end
> ones, and you can see the cost of your second patch in the system
> times of the kernbench.

This is interesting, indeed. As you might have seen from the tests I
posted on LKML, I could not see that effect on our IA64 NUMA machine.
Which raises the question: is it expensive to recalculate the load when
doing an exec (which I should also see), or is the strategy of equally
distributing the jobs across the nodes bad for certain
load+architecture combinations?

As I'm not seeing the effect, maybe you could do the following
experiment: in sched_best_node() keep only the "while" loop at the
beginning. This leads to a cheap selection of the next node, just a
simple round robin.

Regarding the schedbench results: are they averages over multiple runs?
The numa_test needs to be repeated a few times to get statistically
meaningful results.

Thanks,
Erich

> In summary, I think I like the first patch alone better than the
> combination, but will have a play at making a cross between the two.
> As I have very little context about the scheduler, would appreciate
> any help anyone would like to volunteer ;-)
>
> Corrected results are:
>
> Kernbench:
>                          Elapsed       User     System        CPU
> 2.5.44-mm4               19.676s   192.794s    42.678s    1197.4%
> 2.5.44-mm4-hbaum         19.422s   189.828s    40.204s    1196.2%
> 2.5.44-mm4-focht-1        19.46s   189.838s    37.938s      1171%
> 2.5.44-mm4-focht-12       20.32s       190s      44.4s    1153.6%
>
> Schedbench 4:
>                          Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.44-mm4                 32.45      49.47     129.86       0.82
> 2.5.44-mm4-hbaum           31.31      43.85     125.29       0.84
> 2.5.44-mm4-focht-1         38.61      45.15     154.48       1.06
> 2.5.44-mm4-focht-12        23.23      38.87      92.99       0.85
>
> Schedbench 8:
>                          Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.44-mm4                 39.90      61.48     319.26       2.79
> 2.5.44-mm4-hbaum           32.63      46.56     261.10       1.99
> 2.5.44-mm4-focht-1         37.76      61.09     302.17       2.55
> 2.5.44-mm4-focht-12        28.40      34.43     227.25       2.09
>
> Schedbench 16:
>                          Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.44-mm4                 62.99      93.59    1008.01       5.11
> 2.5.44-mm4-hbaum           49.78      76.71     796.68       4.43
> 2.5.44-mm4-focht-1         51.69      60.23     827.20       4.95
> 2.5.44-mm4-focht-12        51.24      60.86     820.08       4.23
>
> Schedbench 32:
>                          Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.44-mm4                 88.13     194.53    2820.54      11.52
> 2.5.44-mm4-hbaum           54.67     147.30    1749.77       7.91
> 2.5.44-mm4-focht-1         56.71     123.62    1815.12       7.92
> 2.5.44-mm4-focht-12        55.69     118.85    1782.25       7.28
>
> Schedbench 64:
>                          Elapsed  TotalUser   TotalSys    AvgUser
> 2.5.44-mm4                159.92     653.79   10235.93      25.16
> 2.5.44-mm4-hbaum           65.20     300.58    4173.26      16.82
> 2.5.44-mm4-focht-1         55.60     232.36    3558.98      17.61
> 2.5.44-mm4-focht-12        56.03     234.45    3586.46      15.76