On Tue, 2011-08-30 at 12:34 -0600, John Peterson wrote:
> On Tue, Aug 30, 2011 at 12:23 PM, robert <robert.bodner@...> wrote:
> >> 32 nodes or 32 cores? I don't know the details of your cluster so it
> >> may be obvious, but make sure you aren't accidentally running too many
> >> MPI processes on a given node.
> > As far as I understand it, it is:
> > 1 node = 4 cores
> > 4 GB/node
> This doesn't match the output of the top command you posted below.
> The total memory given there is 31 985 140 kilobytes = 30.5034065 GB.
> Does the cluster you are on have a public information web page? That
> would probably help clear things up...
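(For reference, the conversion: 31 985 140 kB / 1024 / 1024 ≈ 30.5 GB, so the machine top ran on appears to have ~32 GB in total rather than 4 GB.)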
> > For testing and learning I only used a partition of 32 nodes.
> > I have just changed to 128 nodes, but this doesn't change anything.
> > If I am running into swap and I use --enable-parmesh, this wouldn't
> > change much (since I have one copy of the mesh per MPI process), right?
> The idea would be to run fewer processes per node. For example, you
> could run 1 MPI process on each of 128 different nodes; then each of
> the individual processes would have access to the full amount of RAM
> on the node. The method for doing this is again cluster-dependent; I
> don't know if it's possible on your particular cluster.
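How to do this is indeed launcher-specific; as a sketch (the exact flags are an assumption about our MPI installation, which I haven't verified here), with Open MPI one would try something like

    mpirun -npernode 1 -np 128 ./foo

or, with MPICH's hydra launcher, mpiexec -ppn 1 -np 128 ./foo, to place one rank per node.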
It is possible to run 1, 2 or 4 processes per node. If I run 2 or 4 processes I get:
Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 107754020 bytes
Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 107754020 bytes
Error! ...
For 1 process it works, but very, very slowly.
> > top - 20:19:21 up 35 days, 8:55, 51 users, load average: 0.01, 0.29, 0.45
> > Tasks: 399 total, 1 running, 397 sleeping, 1 stopped, 0 zombie
> > Cpu(s): 0.0%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Mem:  31985140k total, 31158420k used,   826720k free,   274980k buffers
> > Swap:  8393952k total,      160k used,  8393792k free, 16572876k cached
> >
> >  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
> > 2955 bodner 16  0  3392 1932 1244 R    1  0.0 0:00.69 top
> > 6602 bodner 15  0 14296 3248 1864 S    0  0.0 0:10.11 sshd
> > 2829 bodner 15  0 19604 3892 3092 S    0  0.0 0:00.17 mpirun
> > The last one is the process of interest.
> Actually none of these are interesting... we would need to see the
> actual processes that mpirun spawned. That is, if you ran something
> like
>
> mpirun -np 4 ./foo
>
> you would need to look for the four instances of "foo" in the top
> output and see how much CPU/memory they are consuming.
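For reference, assuming the executable is called "foo" as in your example and the ranks run under my user, something like

    ps -u bodner -o pid,pcpu,pmem,rss,comm | grep foo

(or pressing 'u' inside top and entering the username) should list just the spawned processes together with their per-process memory (RES/RSS).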