On Tue, Oct 29, 2013 at 2:48 PM, John Peterson <jwpeterson@gmail.com> wrote:

I just checked out the hash immediately prior to the latest Metis/Parmetis refresh (git co 5771c42933), ran the same tests again, and got basically the same results on the 200^3 case.

So I don't think the metis/parmetis refresh introduced any new memory bugs...

Just for the hell of it, I also tried some other problem sizes, and in going from 1 core to 2 cores (Metis off to Metis on) the memory usage per core always increases (to within the accuracy of Activity Monitor) by a factor between 1.5 and 1.9:

100^3: 300 MB  -> 500 MB  per core (1.67X)
150^3: 975 MB  -> 1700 MB per core (1.75X)
175^3: 1.5 GB  -> 2.8 GB  per core (1.87X)
200^3: 2.22 GB -> 4 GB    per core (1.80X)
225^3: 3.15 GB -> 4.75 GB per core (1.5X)

Using a more accurate memory logger evened these numbers out quite a bit: turning on Metis costs a nearly universal 1.9X increase in peak memory per core:

150^3
-----
3864660 / 1001284 / 2 = 1.9298 (per core)

175^3
-----
6118764 / 1570976 / 2 = 1.9474 (per core)

200^3 
-----
9070180 / 2333592 / 2 = 1.9434 (per core)

(Each line shows the _total_ peak memory across 2 procs, divided by the peak memory for 1 proc, divided by 2 to give the per-core increase.)

We have a more fine-grained memory checker tool here that I'm going to try in a bit, and I'm also going to try the same tests with ParallelMesh/Parmetis.

The numbers are a bit better when using ParallelMesh with Parmetis (rather than Metis) as the partitioner, but not great: peak memory per core increases by about 1.45X when using the partitioner.

150^3
-----
4147908 / 1433204 / 2 = 1.4470 (per core)

175^3
-----
6483244 / 2258816 / 2 = 1.4350 (per core)

200^3
-----
9783764 / 3356264 / 2 = 1.4575 (per core)



So, in summary, if you use Metis/Parmetis, don't assume that because the Mesh alone takes up 2 GB on 1 processor you can safely run the same problem in, say, 8 GB on 4 procs.

In reality, you are looking at about 1.9 * 2 GB/proc * 4 procs = 15.2 GB for SerialMesh, or 1.45 * 2 * 4 = 11.6 GB for ParallelMesh...



Ben, it looks like we currently base our partitioning algorithm choice solely on the number of partitions...  Do you recall if PartGraphKway is any more memory efficient than the PartGraphRecursive algorithm?  If so, perhaps we could base our algorithm choice on the size of the mesh requested as well as the number of partitions... I might experiment with this a bit as well.

Testing the PartGraphKway algorithm now, will report back with results...

--
John