That graph is pretty awesome, thanks! I'm gonna have to digest it, but I think there could be a little room for improvement with a different data structure, assuming I'm right to read the steep ramp up to "global_index_map end" as map construction.
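One possible reading of "a different data structure" (an assumption, not necessarily what is meant here): if global_index_map is a std::map that is filled once and then only queried, a sorted std::vector of key/value pairs searched with std::lower_bound avoids allocating a separate tree node, with its own child and parent pointers, for every entry. SortedVectorMap is a hypothetical name used only for illustration, and the sketch assumes keys are unique.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical map-like container backed by one contiguous sorted vector,
// instead of one heap-allocated tree node per entry as in std::map.
template <typename Key, typename Value>
class SortedVectorMap
{
  using Entry = std::pair<Key, Value>;

public:
  // Append an entry; keys are assumed unique. Call sort() before lookups.
  void insert(const Key& k, const Value& v)
  {
    data_.emplace_back(k, v);
    sorted_ = false;
  }

  // Sort once by key after all inserts are done.
  void sort()
  {
    std::sort(data_.begin(), data_.end(),
              [](const Entry& a, const Entry& b) { return a.first < b.first; });
    sorted_ = true;
  }

  // Binary search by key; returns nullptr if the key is absent.
  const Value* find(const Key& k) const
  {
    assert(sorted_ && "call sort() before find()");
    auto it = std::lower_bound(
        data_.begin(), data_.end(), k,
        [](const Entry& e, const Key& key) { return e.first < key; });
    if (it == data_.end() || it->first != k)
      return nullptr;
    return &it->second;
  }

  std::size_t size() const { return data_.size(); }

private:
  std::vector<Entry> data_;
  bool sorted_ = true;  // an empty container is trivially sorted
};
```

The trade-off is that inserts are batched: entries are appended, sorted once, and only then looked up. That fits a build-once, query-many index map, but not one that is modified while it is being read.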

-Ben


On Oct 30, 2013, at 6:55 PM, "John Peterson" <jwpeterson@gmail.com> wrote:




On Wed, Oct 30, 2013 at 3:27 PM, John Peterson <jwpeterson@gmail.com> wrote:



On Wed, Oct 30, 2013 at 3:09 PM, Kirk, Benjamin (JSC-EG311) <benjamin.kirk@nasa.gov> wrote:
Yeah, before I get too carried away I should probably just try running the existing code path twice: once as-is, and once with the underlying Metis call commented out, making the partitioner a big, expensive no-op.

Actually, John, if you have a chance, could you rerun one of the cases you have data for with the call to Metis commented out? Hopefully the memory usage will drop, confirming that Metis is the issue.

It should suffice to comment out the Metis call and add a

std::fill (part.begin(), part.end(), 0);

instead, provided it's this simple stand-alone case where the mesh is not used!
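For concreteness, a minimal sketch of the no-op experiment described above, assuming part is a std::vector<int> holding one processor id per element. partition_without_metis is a hypothetical name, and the Metis call appears only as a comment. Everything that builds the graph stays in place, so any memory difference between the two runs should be attributable to Metis itself.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the step that normally calls Metis. The caller
// still builds the graph arrays exactly as before; only the call that fills
// in 'part' (one processor id per element) is replaced.
void partition_without_metis(std::vector<int>& part)
{
  // METIS_PartGraphRecursive(...);   // real call, commented out for the test

  // Put every element on processor 0 instead. This is only safe in the
  // stand-alone case where the resulting partitioning is never used.
  std::fill(part.begin(), part.end(), 0);
}
```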

Yep, I can certainly do that, but I think this is already verified by the difference in memory usage between the Centroid/Linear/SFC partitioners and Metis that I posted in one of my earlier emails this week.

Here's a link to a plot of total memory usage (across 2 procs) for the 200^3 case, annotated at different points in the simulation:


The plot didn't quite include all the annotations I was expecting, but I do have some more precise numbers:

1. Before/after building global_index_map: 6653660 K - 5615440 K = 1038220 K, or about 0.99 GB total, i.e. roughly half a GB per core.

2. Begin/end of the call to Metis: 7628896 K - 7460828 K = 168068 K, or about 0.16 GB. We actually have slightly _more_ memory free when Metis finishes (plus or minus sampling error), so I don't think there are any major leaks in Metis.

3. The ramp between "global_index_map end" and "graph alloc" is when the graph is filled in and when the entries of vwgt, which was allocated earlier, are finally touched. Could it be that the OS only gives vwgt actual physical memory at this point? I would have expected to recover more memory when the graph is deallocated, which happens just before the call to PartGraphRecursive (you can see a slight dip there)...
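The guess in item 3, that vwgt only gets physical memory once its entries are written, matches how demand paging typically behaves on Linux: a large allocation reserves address space, and its pages only count toward resident memory on first touch. The sketch below is a Linux-only illustration, not libMesh code. sample_first_touch and FirstTouchSample are hypothetical names, resident set size is read from /proc/self/statm and reported in K like the numbers above, and a raw malloc-style allocation is assumed. Whether this applies to vwgt depends on how it is allocated: a std::vector<int> v(n) zero-fills its entries at construction and so touches its pages right away.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <fstream>
#include <unistd.h>

// Resident set size of this process in KB (Linux-specific: the second field
// of /proc/self/statm is the number of resident pages).
long resident_kb()
{
  long total_pages = 0, resident_pages = 0;
  std::ifstream statm("/proc/self/statm");
  statm >> total_pages >> resident_pages;
  return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

struct FirstTouchSample
{
  long before_kb;       // RSS before the allocation
  long after_alloc_kb;  // RSS right after malloc, nothing written yet
  long after_touch_kb;  // RSS after every entry has been written once
};

// Allocate n ints as a raw work array (hypothetical stand-in for vwgt),
// then write every entry once, sampling RSS at each stage.
FirstTouchSample sample_first_touch(std::size_t n)
{
  FirstTouchSample s{};
  s.before_kb = resident_kb();

  int* raw = static_cast<int*>(std::malloc(n * sizeof(int)));
  assert(raw != nullptr);
  s.after_alloc_kb = resident_kb();

  // Writing through a volatile pointer keeps the compiler from dropping
  // or reordering the stores that actually fault the pages in.
  volatile int* vwgt = raw;
  for (std::size_t i = 0; i < n; ++i)
    vwgt[i] = 1;
  s.after_touch_kb = resident_kb();

  std::free(raw);
  return s;
}
```

On a typical Linux system, after_touch_kb should exceed after_alloc_kb by roughly the size of the array, while the allocation alone barely moves RSS. A related effect may bear on the small dip at deallocation: memory released with free() is not necessarily handed back to the OS by the allocator, so RSS need not fall by the full size of the graph.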

I'll have to try and instrument it a bit more carefully tomorrow.

--
John