From: Benjamin K. <ben...@na...> - 2008-07-22 15:22:26
Check out attached... I've been doing some MPI profiling on my 4-socket, dual-core per node Opteron cluster. I've been curious for a while about "multilevel domain decomposition" for this class of architectures - e.g.

(1) partition into the number of nodes
(2) partition each subdomain into the number of processors per node

Since the on-node communication is cheaper than off-node communication, it would seem there is performance to gain here (especially in terms of latency). What do y'all think?

The mvapich-1 intra-node latency numbers are really impressive!

-Ben
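
P.S. To make steps (1) and (2) concrete, here is a rough sketch in plain Python (no MPI) of a two-level split of a 1D range of cells. Everything in it is an assumption for illustration only: the function names, the 1D layout, the 1000 cells, and the node count of 4 are hypothetical. Only the 8 cores per node (4 sockets x dual-core) comes from the cluster described above. A real mesh would go through a graph partitioner rather than a contiguous split.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def two_level_partition(n_cells, n_nodes, cores_per_node):
    """Step (1): split cells across nodes. Step (2): split each node's chunk across its cores.

    Returns a dict mapping (node, core) -> (lo, hi) cell range.
    Assumes n_cells >= n_nodes * cores_per_node so that no chunk is empty.
    """
    layout = {}
    for node, (nlo, nhi) in enumerate(split_range(0, n_cells, n_nodes)):
        for core, (clo, chi) in enumerate(split_range(nlo, nhi, cores_per_node)):
            layout[(node, core)] = (clo, chi)
    return layout


def count_interfaces(layout):
    """Count interfaces between neighbouring 1D subdomains, split into on-node and off-node."""
    owners = sorted(layout, key=lambda key: layout[key][0])
    on_node = off_node = 0
    for left, right in zip(owners, owners[1:]):
        if left[0] == right[0]:
            on_node += 1
        else:
            off_node += 1
    return on_node, off_node


# Hypothetical sizes: 1000 cells, 4 nodes, 8 cores per node (4 sockets x dual-core).
layout = two_level_partition(n_cells=1000, n_nodes=4, cores_per_node=8)
on_node, off_node = count_interfaces(layout)
print("on-node interfaces:", on_node, "off-node interfaces:", off_node)
```

In this toy 1D case the hierarchical split keeps most neighbour exchanges inside a node. Whether that carries over to a real 3D mesh depends on the partitioner and on how ranks get mapped to cores.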