Erich Focht wrote:
> Hi Nick,
> On Tuesday 30 March 2004 11:05, Nick Piggin wrote:
>>I'm with Martin here, we are just about to merge all this
>>sched-domains stuff. So we should at least wait until after
>>that. And of course, *nothing* gets changed without at least
>>one benchmark that shows it improves something. So far
>>nobody has come up to the plate with that.
> I thought you're talking the whole time about STREAM. That is THE
> benchmark which shows you an impact of balancing at fork. At it is a
> VERY relevant benchmark. Though you shouldn't run it on historical
> machines like NUMAQ, no compute center in the western world will buy
> NUMAQs for high performance... Andy typically runs STREAM on all CPUs
> of a machine. Try on N/2 and N/4 and so on, you'll see the impact.
Well yeah, but the immediate problem was that sched-domains was
*much* worse than 2.6's numasched, neither of which balance on
fork/clone. I didn't want to obscure the issue by implementing
balance on fork/clone until we worked out exactly the problem.
Anyway, once sched-domains goes in, you can basically do whatever
you like without impacting anyone else...
>>There are other things, like java, ervers, etc that use threads.
> I'm just saying that you should have the choice. The default should be
> as before, balance at exec().
Yeah well that is a very sane thing to do ;)
>>The point is that we have never had this before, and nobody
>>(until now) has been asking for it. And there are as yet no
> ?? Sorry, I'm having balance at fork since 2001 in the NEC IA64 NUMA
> kernels and users use it intensively with OpenMP. Advertised it a lot,
> asked for it, atlked about it at the last OLS. Only IA64 was
> considered rare big iron. I understand that the issue gets hotter if
> the problem hurts on AMD64...
Sorry I hadn't realised. I guess because you are happy with
your own stuff you don't make too much noise about it on the
list lately. I apologise.
I wonder though, why don't you just teach OpenMP to use
affinities as well? Surely that is better than relying on the
behaviour of the scheduler, even if it does balance on clone.
>>convincing benchmarks that even show best case improvements. And
>>it could very easily have some bad cases.
> Again: I'm talking about having the choice. The user decides. Nothing
> protects you against user stupidity, but if they just have the choice
> of poor automatic initial scheduling, it's not enough. And: having the
> fork/clone initial balancing policy means: you don't need to make your
> code complicated and unportable by playing with setaffinity (which is
> just plainly unusable when you share the machine with other users).
If you do it by hand, you know exactly what is going to happen,
and you can turn off the balance-on-clone flags and you don't
incur the hit of pulling in remote cachelines from every CPU at
clone time to do balancing. Surely an HPC application wouldn't
mind doing that? (I guess they probably don't call clone a lot
>>And finally, HPC
>>applications are the very ones that should be using CPU
>>affinities because they are usually tuned quite tightly to the
> There are companies mainly selling NUMA machines for HPC (SGI?), so
> this is not a niche market. Clusters of big NUMA machines are not
> unusual, and they're typically not used for databases but for HPC
> apps. Unfortunately proprietary UNIX is still considered to have
> better features than Linux for such configurations.
Well, SGI should be doing tests soon and tuning the scheduler
to their liking. Hopefully others will too, so we'll see what
>>Let's just make sure we don't change defaults without any
> No reason? Aaarghh... >;-)
Sorry I mean evidence. I'm sure with a properly tuned
implementation, you could get really good speedups in lots
of places... I just want to *see* them. All I have seen so
far is Andi getting a bit better performance on something
where he can get *much* better performance by making a
trivial tweak instead.
I really don't have the software or hardware to test this
at all so I just have to sit and watch.