On Fri, Nov 1, 2013 at 7:46 PM, John Peterson <jwpeterson@gmail.com> wrote:

I guess the next thing I'll look at is 

400.a/b: Wraps the creation of the 'xadj', 'adjncy', and 'graph' objects.  vwgt is also written to during this time, so it must finally be allocated.  Note that 'graph' has been deallocated by the time we reach 400.b.

but I really don't see any obvious path for achieving memory optimization here, unless it's possible to build the 'adjncy' array directly without the intermediate 'graph' object.

Here's something else I found interesting:  'graph' is a vector<vector> with a ton of "rows" (n_active_elem) but relatively few "columns" (no more than n_neighbors per element, so 6 for hexes).  I compared the amount of storage required for a 1D vector with 48M elements to a vector<vector> with 8M rows, each row having 6 entries.  For the 1D case, we expect to use about 48 * 10^6 * 4 / 1024 = 187,500 K, which the linked graph confirms.  For the vector<vector> case, which stores the same amount of actual data, the memory overhead is pretty huge (2.36X) to store all those small vectors.  Instead of .18 Gb you are using .42 Gb...


Ben, I think this is pretty similar to the optimization you did fairly recently in the DofMap...

Finally, note that we call adjncy.reserve(graph_size); where graph_size=48M.  This was most likely done for speed, but it means that the full memory for both 'graph' and 'adjncy' are required at the same time.  We could instead try letting the size of adjncy grow as it's filled in, and check to see if it slows down the code appreciably... but I would expect this to be a much smaller memory savings than somehow refactoring 'graph' would be.