From: Roy S. <ro...@st...> - 2008-09-02 01:59:12
On Mon, 1 Sep 2008, Tim Kroeger wrote:

> Since I didn't get any reply yet, I am not sure whether you got the
> mail below.  On the other hand, perhaps you just didn't answer because
> you found there was nothing to say.

Actually, neither was the case - I didn't answer because I had very little time to check email over the last week, and a proper answer to yours required a little time to look over things.

> Anyway, the thing remains a major problem for me, in particular
> because the computation keeps running out of memory for the number of
> elements I really want to use -- no matter how many CPUs I am using.
> (I am not completely sure whether this is really due to the serial
> vector in EquationSystems::reinit(), but I have no other idea what the
> reason could be.)

We instantiate a couple of other serial vectors in a typical code (the current_local_solution and its value at the previous timestep) - these don't have the CPU time scalability issues that the System::project_vector temporary does, because only O(N/Nproc) of their entries are regularly accessed, but they have the same memory scalability problems.

Typically the biggest problem for memory scalability is the SerialMesh - a coefficient or two per degree of freedom is still less than the many pointers per element and per node that our unstructured mesh class requires.  Unfortunately, ParallelMesh probably won't work for you yet - it's not well tested in general, and it definitely has remaining bugs in certain adaptive coarsening cases.

> Unfortunately, I feel not able to restructure that projection method
> myself, since I am not familiar with that part of libMesh.  However,
> if you give me some advice (where to look for similar code etc.), I
> might try it.

For fixing the runtime scalability of project_vector, I'm actually working on the easiest improvement right now: we'll keep creating the serial vector, but localize to it with a properly built send_list instead of doing a global localization.
This is the same thing we do with the other serial vectors, and it should be sufficient.  We want O(N/Nproc) allocation eventually, but O(N/Nproc) communication is more pressing.  I'll let you know when I commit that to SVN - because your code seems to be hitting this bottleneck most strongly, I'd appreciate it if you would help with benchmarking/debugging.

Note to Ben: this is going to be a sufficiently complex change that I don't think it should go into 0.6.3.  So depending on whether that mesh I/O problem I found was a regression or just a corrupted xdr file, I think we should either backport the fix or re-label the 0.6.3-rc1 as 0.6.3 final.

Tim: although I wouldn't recommend digging into System::project_vector or into ParallelMesh, the one remaining obstacle to O(N/Nproc) memory scalability in libMesh is those serial vectors.  We currently allocate a global vector and then only fill the parts of it that correspond to local and ghost dofs - simply because we don't have the kind of "SparseVector" data structure that would be necessary to do that efficiently.  Ben tells me that PETSc has something reasonable available (using for internal storage a single block for local coefficients plus a sparse structure for ghost coefficients), but we'd need a libMesh interface to it (while maintaining compatibility with LASPACK, Trilinos, and our internal vector formats!).  If you're looking to volunteer for something, this is the one place where our scalability really needs improvement but where nobody's currently working on it.

> I hope this information is sufficient for you.  If not, please let
> me know what else you need.

I could use some more fine-grained perf logging.  You don't need to create your own PerfLog object; using the global log with START_LOG/STOP_LOG would be fine.
While your current test establishes that the combination of distribute_dofs, create_dof_constraints, and prolong_vectors is responsible for your poor performance, I'd like to verify that System::project_vector is the problem and that the localization in particular is what's not scaling.  In particular, try wrapping a log around lines 79 through 107 of system_projection.C and make sure that the scalability failure is there.

---
Roy