From: Roy Stogner <roy@st...>  2008-06-24 18:50:30

On Tue, 24 Jun 2008, Derek Gaston wrote:

> Using Roy's workaround so that partitioning doesn't happen with
> ParallelMesh I've been able to run some pretty big problems today, and
> I thought I would share some numbers. All I'm doing is solving pure
> diffusion with a Dirichlet BC and a forcing function in 3d on
> hexes....

First order elements?

> but I'm doing it completely matrix free using the
> NonlinearSystem class.

Thanks; this is interesting stuff. All the numbers I've taken down
have been with matrix and preconditioner included, on smaller problems.

> First thing to note is that the #MB/proc is a _range_. This range is
> taken from me watching "top" on one of the compute nodes. The memory
> usage of each process _oscillates_ between the two numbers listed
> about every 5 seconds. Based on watching using "xosview" I believe
> that the high numbers occur during communication steps,

Define "communication steps" - I assume you're not talking about
synching up ghost DoFs during a solve? Are you doing mesh refinement
at each oscillation?

If that's it, then there are two likely culprits, but I don't know if
they're sufficient explanation:

We still use global error vectors; these need to be (optionally) moved
to a mapvector sort of structure just like the mesh node and element
vectors were. A single-precision float per element would be 40MB on
each node for 10 million elements.

We use serialized DoF vectors in System::project_vector(). This is
even worse because of the double precision; each vector must cost 80MB
for 10 million DoFs.

But even that adds up to only 120MB; if you're seeing ~500MB usage
jumps, I'm not sure where the rest is coming from. I don't suppose
you can pin down exactly where the memory's being allocated?

Roy
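[A rough sketch of the arithmetic behind those per-node estimates, as
standalone C++ with assumed element/DoF counts (10 million of each, as
in the problem above); this is not actual libMesh code, just the
back-of-the-envelope calculation spelled out.]

// Back-of-the-envelope per-node memory cost of the two serialized
// structures mentioned above, assuming 10 million elements and
// 10 million DoFs (comparable counts for first-order hexes).
#include <cstddef>
#include <iostream>

int main()
{
  const std::size_t n_elem = 10000000; // ~10 million elements
  const std::size_t n_dofs = 10000000; // ~10 million DoFs

  // Global error vector: one single-precision float per element,
  // duplicated on every node while it remains a serial vector.
  const double error_vec_mb = n_elem * sizeof(float) / 1.e6;

  // Serialized DoF vector used during projection: one double per DoF,
  // again duplicated on every node.
  const double dof_vec_mb = n_dofs * sizeof(double) / 1.e6;

  std::cout << "error vector: " << error_vec_mb << " MB per node\n"
            << "DoF vector:   " << dof_vec_mb   << " MB per node\n"
            << "total:        " << error_vec_mb + dof_vec_mb
            << " MB per node\n";
  // Prints roughly 40 MB + 80 MB = 120 MB per node, which is why the
  // ~500MB oscillations can't be fully explained by these two alone.
}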