From: Roy Stogner <roy@st...>  2008-06-24 18:50:30

On Tue, 24 Jun 2008, Derek Gaston wrote:

> Using Roy's workaround so that partitioning doesn't happen with
> ParallelMesh I've been able to run some pretty big problems today, and
> I thought I would share some numbers. All I'm doing is solving pure
> diffusion with a Dirichlet BC and a forcing function in 3d on
> hexes....

First order elements?

> but I'm doing it completely matrix free using the
> NonlinearSystem class.

Thanks; this is interesting stuff. All the numbers I've taken down
have been with matrix and preconditioner included, on smaller problems.

> First thing to note is that the #MB/proc is a _range_. This range is
> taken from me watching "top" on one of the compute nodes. The memory
> usage of each process _oscillates_ between the two numbers listed
> about every 5 seconds. Based on watching using "xosview" I believe
> that the high numbers occur during communication steps,

Define "communication steps" - I assume you're not talking about
synching up ghost DoFs during a solve? Are you doing mesh refinement
at each oscillation?

If that's it, then there are two likely culprits, but I don't know if
they're sufficient explanation:

We still use global error vectors; these need to be (optionally) moved
to a mapvector sort of structure just like the mesh node and element
vectors were. A single-precision float per element would be 40MB on
each node for 10 million elements.

We use serialized DoF vectors in System::project_vector(). This is
even worse because of the double precision; each vector must cost 80MB
for 10 million DoFs.

But even that adds up to only 120MB; if you're seeing ~500MB usage
jumps, I'm not sure where the rest is coming from. I don't suppose
you can pin down exactly where the memory's being allocated?

Roy
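[A rough sketch of the arithmetic behind those per-node estimates, as
standalone C++ with assumed element/DoF counts (10 million of each, as
in the problem above); this is not actual libMesh code, just the
back-of-the-envelope calculation spelled out.]

// Back-of-the-envelope per-node memory cost of the two serialized
// structures mentioned above, assuming 10 million elements and
// 10 million DoFs (comparable counts for first-order hexes).
#include <cstddef>
#include <iostream>

int main()
{
  const std::size_t n_elem = 10000000; // ~10 million elements
  const std::size_t n_dofs = 10000000; // ~10 million DoFs

  // Global error vector: one single-precision float per element,
  // duplicated on every node while it remains a serial vector.
  const double error_vec_mb = n_elem * sizeof(float) / 1.e6;

  // Serialized DoF vector used during projection: one double per DoF,
  // again duplicated on every node.
  const double dof_vec_mb = n_dofs * sizeof(double) / 1.e6;

  std::cout << "error vector: " << error_vec_mb << " MB per node\n"
            << "DoF vector:   " << dof_vec_mb   << " MB per node\n"
            << "total:        " << error_vec_mb + dof_vec_mb
            << " MB per node\n";
  // Prints roughly 40 MB + 80 MB = 120 MB per node, which is why the
  // ~500MB oscillations can't be fully explained by these two alone.
}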