From: Tim K. <tim...@ce...> - 2009-01-14 14:51:30
Dear libMesh team,

Is there any chance that the memory scaling of System::current_local_solution will be improved in the near future? In my application, I have a large number of systems and a large number of cells, and the fact that System::current_local_solution is always a serial vector seems to destroy the memory scalability completely. The cluster I'm computing on has 8 CPUs per node, but I cannot use more than two (or perhaps three) of them, since the nodes run out of memory otherwise.

I don't think that the (serial) grid itself is the decisive factor in my application. (I watched the memory consumption, and it remains small during the grid creation procedure and increases drastically when the systems are created and initialized.)

I think that adding the required functionality to NumericVector and PetscVector would not be too complicated (PETSc's VecCreateGhost() seems to do the trick). I can try to do this part myself, that is, to add a constructor that additionally takes a list of the ghost indices that we want to store (and implement everything on the PETSc side). The other side is to make the System class use that new constructor (in the PETSc case at least) and, in particular, to determine which indices are actually required, and to do the correct thing when the grid is refined/coarsened. I don't feel able to implement that part because I'm not familiar enough with the internals of libMesh.

Let me know what you guys think.

Best Regards,

Tim

--
Dr. Tim Kroeger
tim...@me...                          Phone +49-421-218-7710
tim...@ce...                          Fax   +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany
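For reference, the VecCreateGhost() usage Tim has in mind looks roughly like the following sketch (the helper function and its arguments are illustrative, not libMesh code). The owned entries are stored contiguously; the listed ghost entries are appended to the local storage and filled later by explicit scatters.

    #include <petscvec.h>
    #include <vector>

    // Illustrative helper: create a vector that owns n_local of n_global
    // entries and additionally stores local copies of ghost_indices.
    Vec create_ghosted_vector (MPI_Comm comm,
                               PetscInt n_local,
                               PetscInt n_global,
                               const std::vector<PetscInt> & ghost_indices)
    {
      Vec v;
      VecCreateGhost (comm, n_local, n_global,
                      (PetscInt)ghost_indices.size(),
                      ghost_indices.empty() ? PETSC_NULL : &ghost_indices[0],
                      &v);
      return v;
    }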
From: Roy S. <roy...@ic...> - 2009-01-14 15:16:20
On Wed, 14 Jan 2009, Tim Kroeger wrote:

> I don't think that the (serial) grid itself is the decisive factor in
> my application. (I watched the memory consumption, and it remains
> small during the grid creation procedure and increases drastically
> when the systems are created and initialized.)

Keep in mind, the serial mesh isn't the only thing that goes on the SerialMesh. We store our degree of freedom indexing on the Node and Elem objects themselves, and so on a SerialMesh those don't scale well either. And like current_local_solution, those get allocated and initialized during system initialization.

> I think that adding the required functionality to NumericVector and
> PetscVector would not be too complicated (PETSc's VecCreateGhost()
> seems to do the trick). I can try to do this part myself, that is,
> to add a constructor that additionally takes a list of the ghost
> indices that we want to store (and implement everything on the PETSc
> side). The other side is to make the System class use that new
> constructor (in the PETSc case at least) and, in particular, to
> determine which indices are actually required, and to do the correct
> thing when the grid is refined/coarsened. I don't feel able to
> implement that part because I'm not familiar enough with the
> internals of libMesh.

Yeah, I can do that easily enough. Ben's already done the hard work of properly creating the "send_list" of ghost DoFs; I don't think I'll have any problem figuring out when to plug that into a constructor or reinitialization function.

---
Roy
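The "send_list" Roy mentions is exactly the list of DoF indices a processor needs but does not own, so the "other side" could be as small as this sketch (the ghosted init() overload shown is the proposed interface, not an API that exists yet; get_dof_map(), get_send_list(), n_dofs() and n_local_dofs() are existing libMesh calls):

    // Inside System initialization (sketch):
    const std::vector<unsigned int> & send_list =
      this->get_dof_map().get_send_list();

    // Proposed overload: a parallel vector that also stores local
    // copies of the DoFs listed in send_list.
    current_local_solution->init (this->n_dofs(),       // global size
                                  this->n_local_dofs(), // owned size
                                  send_list,            // ghost indices
                                  false);               // zero the entries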
From: Tim K. <tim...@ce...> - 2009-01-14 15:21:23
Dear Roy,

On Wed, 14 Jan 2009, Roy Stogner wrote:

> On Wed, 14 Jan 2009, Tim Kroeger wrote:
>
>> I don't think that the (serial) grid itself is the decisive factor in
>> my application. (I watched the memory consumption, and it remains
>> small during the grid creation procedure and increases drastically
>> when the systems are created and initialized.)
>
> Keep in mind, the serial mesh isn't the only thing that goes on the
> SerialMesh. We store our degree of freedom indexing on the Node and
> Elem objects themselves, and so on a SerialMesh those don't scale well
> either. And like current_local_solution, those get allocated and
> initialized during system initialization.

Okay, I didn't know that.

>> I think that adding the required functionality to NumericVector and
>> PetscVector would not be too complicated (PETSc's VecCreateGhost()
>> seems to do the trick). [...]
>
> Yeah, I can do that easily enough. Ben's already done the hard work
> of properly creating the "send_list" of ghost DoFs; I don't think I'll
> have any problem figuring out when to plug that into a constructor or
> reinitialization function.

What do you mean by "Yeah"? Would you like me to do something (NumericVector, PetscVector), or do you mean that you will do everything yourselves? In any case, is there any idea of when this will be usable?

Best Regards,

Tim

--
Dr. Tim Kroeger
tim...@me...                          Phone +49-421-218-7710
tim...@ce...                          Fax   +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany
From: Kirk, B. (JSC-EG) <Ben...@na...> - 2009-01-14 15:30:00
>>> I don't think that the (serial) grid itself is the decisive factor
>>> in my application. (I watched the memory consumption, and it
>>> remains small during the grid creation procedure and increases
>>> drastically when the systems are created and initialized.)
>>
>> Keep in mind, the serial mesh isn't the only thing that goes on the
>> SerialMesh. We store our degree of freedom indexing on the Node and
>> Elem objects themselves, and so on a SerialMesh those don't scale
>> well either. And like current_local_solution, those get allocated
>> and initialized during system initialization.
>
> Okay, I didn't know that.

The memory usage spike is almost certainly happening when the degree of freedom indices are allocated and stored in the DofObject. We've whittled down the memory usage of that class a few times over the years, and I owe it one more major refactoring to try to reduce memory consumption further. John and I were talking about this a little while back... Right now we essentially build a 2D matrix with variable row lengths to store the dof index data; we should instead pack it into a contiguous array.

-Ben
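The packing Ben describes amounts to replacing one allocation per variable (a ragged 2D array) with a single buffer plus offsets, CSR-style. A sketch under that assumption (the struct and its names are hypothetical, not the actual DofObject layout):

    #include <vector>

    // Hypothetical packed storage: the dof indices for all variables
    // live in one contiguous array; offsets[v]..offsets[v+1] bracket
    // the entries belonging to variable v.
    struct PackedDofIndices
    {
      std::vector<unsigned int> offsets; // size = n_vars + 1
      std::vector<unsigned int> indices; // all dof indices, back to back

      unsigned int n_dofs (unsigned int var) const
      { return offsets[var+1] - offsets[var]; }

      unsigned int dof_index (unsigned int var, unsigned int i) const
      { return indices[offsets[var] + i]; }
    };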
From: Roy S. <roy...@ic...> - 2009-01-14 15:52:31
On Wed, 14 Jan 2009, Tim Kroeger wrote:

>>> I think that adding the required functionality to NumericVector and
>>> PetscVector would not be too complicated (PETSc's VecCreateGhost()
>>> seems to do the trick). [...]
>>
>> Yeah, I can do that easily enough. Ben's already done the hard work
>> of properly creating the "send_list" of ghost DoFs; I don't think
>> I'll have any problem figuring out when to plug that into a
>> constructor or reinitialization function.
>
> What do you mean by "Yeah"? Would you like me to do something
> (NumericVector, PetscVector), or do you mean that you will do
> everything yourselves?

By "that" I just meant the stuff you've referred to as "the other side"; I don't have time to figure out the right PETSc APIs for part-contiguous, part-sparse vectors. I'll write "dummy" implementations for the Laspack and Trilinos vectors too - not to get the same performance out of them yet, but just to make sure everything still compiles after the change.

> In any case, is there any idea of when this will be usable?

Depends how fast you get the PETSc implementation working. ;-) But if you put in that part of the work, I'll do my best not to hold you up waiting for the rest. This won't be a high priority on my day job yet, but for a real chance at finishing something that's been on the libMesh TODO list for years, I wouldn't mind working a couple of nights or a weekend.

---
Roy
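The "dummy" fallback Roy describes only needs to keep the new interface compiling for the serial backends; something like this sketch (the ghosted signature is hypothetical, shown here for Laspack):

    // Sketch: accept the ghost list so the interface is uniform across
    // backends, but ignore it -- Laspack vectors are serial, so every
    // entry is local anyway.
    template <typename T>
    void LaspackVector<T>::init (const unsigned int n,
                                 const unsigned int n_local,
                                 const std::vector<unsigned int> & /*ghost*/,
                                 const bool fast)
    {
      this->init (n, n_local, fast);
    }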
From: Tim K. <tim...@ce...> - 2009-01-14 17:18:04
Dear Roy,

On Wed, 14 Jan 2009, Roy Stogner wrote:

> By "that" I just meant the stuff you've referred to as "the other
> side"; I don't have time to figure out the right PETSc APIs for
> part-contiguous, part-sparse vectors.

Okay, so I will start on the NumericVector and PetscVector work now. I will keep you informed.

Now, the first problem I encounter -- although it's not really a problem -- is that PETSc provides the possibility of automatically communicating any changes to vector components to the other processors that have those indices as ghost indices. While, in the long term, using that functionality would help to get rid of the duplicated solution representation, I think that for now I should *not* use it, since the current construction of current_local_solution does not expect that. Let me know if you have a different opinion.

Best Regards,

Tim

--
Dr. Tim Kroeger
tim...@me...                          Phone +49-421-218-7710
tim...@ce...                          Fax   +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany
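The PETSc mechanism in question is the explicit ghost update scatter; nothing happens on individual set() calls. For reference, a minimal sketch (v is a ghosted Vec as created above):

    // Push owned values out to the ghost copies, in one batch:
    VecGhostUpdateBegin (v, INSERT_VALUES, SCATTER_FORWARD);
    VecGhostUpdateEnd   (v, INSERT_VALUES, SCATTER_FORWARD);

    // The reverse direction accumulates ghost contributions back
    // into the owning processors' entries:
    VecGhostUpdateBegin (v, ADD_VALUES, SCATTER_REVERSE);
    VecGhostUpdateEnd   (v, ADD_VALUES, SCATTER_REVERSE);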
From: Roy S. <roy...@ic...> - 2009-01-14 17:34:08
On Wed, 14 Jan 2009, Tim Kroeger wrote:

> Now, the first problem I encounter -- although it's not really a
> problem -- is that PETSc provides the possibility of automatically
> communicating any changes to vector components to the other
> processors that have those indices as ghost indices. While, in the
> long term, using that functionality would help to get rid of the
> duplicated solution representation, I think that for now I should
> *not* use it, since the current construction of
> current_local_solution does not expect that.

Interesting possibility, but even in the long run I don't think we'd want to use it unless it's quite efficient. I can't see how they'd accomplish that without requiring every set() to perform tests, every ghost dof set() to do a non-blocking message send, and every get() to test for a message receipt. Synchronizing data only in large batches, when we know it's necessary, isn't quite as intuitive but is probably faster.

Plus there's the Trilinos question - if we assume NumericVectors have such functionality, are we going to be able to get it easily from every implementation without adding it ourselves?

---
Roy
From: Derek G. <fri...@gm...> - 2009-01-14 17:53:08
I'm pretty sure that Trilinos doesn't have this capability. I'm with Roy in thinking that it would be better to code this once ourselves, so that it will work for all NumericVectors.

Derek
From: Tim K. <tim...@ce...> - 2009-01-15 07:49:51
Dear Roy,

On Wed, 14 Jan 2009, Roy Stogner wrote:

> On Wed, 14 Jan 2009, Tim Kroeger wrote:
>
>> Now, the first problem I encounter -- although it's not really a
>> problem -- is that PETSc provides the possibility of automatically
>> communicating any changes to vector components to the other
>> processors that have those indices as ghost indices. [...]
>
> Interesting possibility, but even in the long run I don't think we'd
> want to use it unless it's quite efficient. I can't see how they'd
> accomplish that without requiring every set() to perform tests, every
> ghost dof set() to do a non-blocking message send, and every get() to
> test for a message receipt. Synchronizing data only in large batches,
> when we know it's necessary, isn't quite as intuitive but is probably
> faster.

My statement was misleading: the communication is not done automatically after each set() operation; it is (like ours) done in blocks, on request. There are two advantages I see over the current implementation. First, the necessity of having two vectors would be removed. Second, communication is reduced by transferring only those values that have actually changed (it seems to me that PETSc manages a list of them). But still, I think we should not use that, at least for the moment.

But there is another problem (things turn out to be more difficult than I thought): In the ghosted case, PETSc does not provide the appropriate global-to-local mapping that would be required for e.g. NumericVector::operator()(unsigned int). I asked about this on the petsc-users list. Jed Brown commented that the natural thing would be for libMesh's DofMap to work on local dof numbers. (A local-to-global mapping does seem to be provided by PETSc.)

My idea would be that PetscVector creates the global-to-local mapping (for the ghost indices only) itself and stores it, e.g. as a std::map<unsigned int, unsigned int>. This should still save a lot of memory compared with the serial vector version. Do you agree?

Best Regards,

Tim

--
Dr. Tim Kroeger
tim...@me...                          Phone +49-421-218-7710
tim...@ce...                          Fax   +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany
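In code, Tim's idea might look like this sketch (the member names are hypothetical): the vector keeps its own map from global ghost indices to their positions in the appended ghost block, and operator() consults it for non-owned entries via PETSc's local form.

    // Hypothetical PetscVector members:
    //   Vec _vec;                   the ghosted PETSc vector
    //   unsigned int _first, _last; locally owned range [_first, _last)
    //   std::map<unsigned int, unsigned int> _global_to_local;
    //                               ghost dof -> offset in ghost block

    Real operator() (const unsigned int i) const
    {
      PetscScalar value;
      Vec local_form;

      // The local form is a sequential vector: owned entries first,
      // ghost entries appended after them.
      VecGhostGetLocalForm (_vec, &local_form);

      PetscInt idx;
      if (i >= _first && i < _last)
        idx = i - _first;                      // owned entry
      else
        {
          std::map<unsigned int, unsigned int>::const_iterator
            it = _global_to_local.find(i);
          libmesh_assert (it != _global_to_local.end());
          idx = (_last - _first) + it->second; // ghost entry
        }

      VecGetValues (local_form, 1, &idx, &value);
      VecGhostRestoreLocalForm (_vec, &local_form);

      return static_cast<Real>(value);
    }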
From: Roy S. <roy...@ic...> - 2009-01-15 14:00:06
On Thu, 15 Jan 2009, Tim Kroeger wrote:

> But there is another problem (things turn out to be more difficult
> than I thought): In the ghosted case, PETSc does not provide the
> appropriate global-to-local mapping that would be required for e.g.
> NumericVector::operator()(unsigned int). I asked about this on the
> petsc-users list. Jed Brown commented that the natural thing would
> be for libMesh's DofMap to work on local dof numbers.

Giving identical degrees of freedom different ids on different processors would not be natural for us, especially considering the changes we'd be forced to make: to the DofMap (which would now need to talk to our numeric interfaces for the first time!), to our non-PETSc numeric interfaces, to code that stores non-DoF data associated with particular degrees of freedom... I sympathize with their desire for contiguous local dof numbers, but I think this would be too big a change for us right now.

> (A local-to-global mapping does seem to be provided by PETSc.)

Interesting - like a sparsity pattern for vectors. Is it shared between multiple vectors?

> My idea would be that PetscVector creates the global-to-local mapping
> (for the ghost indices only) itself and stores it, e.g. as a
> std::map<unsigned int, unsigned int>. This should still save a lot
> of memory compared with the serial vector version.

Sounds like the best we can do. Maybe typedef the container to make it easier for us to play with map vs. hash_map performance.

---
Roy
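The typedef Roy suggests is a one-line indirection, roughly like this sketch (hash_map here names the pre-C++11 GNU extension, as the alternative to benchmark; the typedef name is made up):

    // One place to switch the ghost-lookup container when profiling:
    typedef std::map<unsigned int, unsigned int> GlobalToLocalMap;
    // Alternative to try later, where available:
    // typedef __gnu_cxx::hash_map<unsigned int, unsigned int>
    //   GlobalToLocalMap;

    GlobalToLocalMap _global_to_local; // ghost dofs only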
From: Tim K. <tim...@ce...> - 2009-01-15 16:13:02
On Thu, 15 Jan 2009, Roy Stogner wrote:

>> (A local-to-global mapping does seem to be provided by PETSc.)
>
> Interesting - like a sparsity pattern for vectors. Is it shared
> between multiple vectors?

I think so, but I'm not sure.

>> My idea would be that PetscVector creates the global-to-local mapping
>> (for the ghost indices only) itself and stores it, e.g. as a
>> std::map<unsigned int, unsigned int>. This should still save a lot
>> of memory compared with the serial vector version.
>
> Sounds like the best we can do. Maybe typedef the container to make
> it easier for us to play with map vs. hash_map performance.

Okay. (I haven't done much yet, though.)

Best Regards,

Tim

--
Dr. Tim Kroeger
tim...@me...                          Phone +49-421-218-7710
tim...@ce...                          Fax   +49-421-218-4236
Fraunhofer MEVIS, Institute for Medical Image Computing
Universitaetsallee 29, 28359 Bremen, Germany