From: Derek Gaston <friedmud@gm...>  2008-06-24 18:30:37

Hey guys,

Using Roy's workaround so that partitioning doesn't happen with ParallelMesh, I've been able to run some pretty big problems today, and I thought I would share some numbers. All I'm doing is solving pure diffusion with a Dirichlet BC and a forcing function in 3D on hexes.... but I'm doing it completely matrix-free using the NonlinearSystem class.

The point of these runs is to do some parallel scaling tests. From previous runs with SerialMesh I knew that I needed more than 2 million DOFs to see any kind of good parallel scaling over 128 procs. Wanting to get good scaling up to 1024 procs, I did some quick calculations (using some small problems on my desktop with 8GB of RAM) and figured I could fit 80 million DOFs on 128 procs (each proc has 2GB of RAM) using ParallelMesh.... it turns out I was pretty far off! In fact, I was so far off that I ended up bringing down half of our supercomputer as it started swapping like crazy!

After rebooting a few nodes, I'm now running a slightly smaller problem at 10 million DOFs. I have it running on 64, 128, and 256 processors (the 512 and 1024 jobs are still in the queue). Since there is no matrix involved, this gives me an interesting opportunity to look at memory scaling with ParallelMesh. Here is how much memory each proc is using:

#CPU : #MB/proc
 256 : 200-700
 128 : 350-700
  64 : 450-800

The first thing to note is that #MB/proc is a _range_. I took it from watching "top" on one of the compute nodes: the memory usage of each process _oscillates_ between the two numbers listed about every 5 seconds. Based on watching "xosview", I believe the high numbers occur during communication steps, while the low numbers occur during "assembly" (residual computation). At this point these are just guesses, but I've been watching these things for a few weeks now and kind of have a feel for what's going on.

The second thing to realize is that the upper number (700, 700, 800) stays about the same.
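One rough way to read the table, sketched below under a simple assumed model: treat each process's memory as a fixed part that does not shrink as processes are added, plus a shared part split evenly over p processes, i.e. m(p) ≈ fixed + shared/p, and fit that to the three points by least squares. The model, the names `observed` and `fit_fixed_plus_shared`, and the 8-bytes-per-entry figure for a double-precision vector are illustrative assumptions, not anything from libMesh; the inputs are just the eyeballed `top` ranges listed above.

```python
# Rough per-process memory figures (MB) eyeballed from `top`, as listed above.
# Each entry is the (low, high) end of the observed oscillation.
observed = {
    64: (450, 800),
    128: (350, 700),
    256: (200, 700),
}


def fit_fixed_plus_shared(procs, mem_mb):
    """Least-squares fit of mem = fixed + shared / p (an assumed model).

    `fixed` is the per-process cost that does not shrink as processes are
    added; `shared` is a total cost divided evenly over p processes.
    """
    xs = [1.0 / p for p in procs]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(mem_mb) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, mem_mb))
    shared = sxy / sxx
    fixed = y_mean - shared * x_mean
    return fixed, shared


procs = sorted(observed)
low_fixed, low_shared = fit_fixed_plus_shared(procs, [observed[p][0] for p in procs])
high_fixed, high_shared = fit_fixed_plus_shared(procs, [observed[p][1] for p in procs])

# For scale: one fully serialized vector of 10 million doubles, held on every process.
dofs = 10_000_000
serialized_vector_mb = dofs * 8 / 1e6

for label, fixed, shared in [("low", low_fixed, low_shared), ("high", high_fixed, high_shared)]:
    # Extrapolation of the fitted model to 1024 procs (not a measurement).
    print(f"{label}: ~{fixed:.0f} MB fixed + {shared:.0f} MB/p "
          f"-> model predicts ~{fixed + shared / 1024:.0f} MB/proc at 1024 procs")
print(f"one serialized 10M-entry double vector: {serialized_vector_mb:.0f} MB per process")
```

With these numbers the fit puts about 150 MB of the low readings and about 650 MB of the high readings in the fixed, non-shrinking part, so the model predicts almost no drop in the peak at 1024 procs. Three eyeballed points are far too few to trust the coefficients; the fit only restates quantitatively that the peak barely moves as procs are added.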
That flat upper number is somewhat of a bummer, since it means that just adding more procs isn't going to let me run an appreciably larger problem. I'm guessing that we have some serialized vectors that at this point are contributing much more memory than the mesh is...

Anyway, I just thought I would share some data with everyone. This is by no means a request for anything to happen... nor is it a bug report or anything of the sort. The fact of the matter is that I will be able to run something like 30 million DOFs on this computer without a problem.... and that is freaking sweet! That will more than satisfy my goals this year. But... I am going to keep an eye out for places where some fat might be trimmed along the way... and maybe we can get the memory usage to scale a bit better...

Derek