From: Roy S. <roy...@ic...> - 2007-02-24 20:18:39
|
I'm never going to find time to implement this myself, but I thought I'd throw it out there in case anyone else is interested:

With reference counting off, DofObjects in libMesh currently have:

  DofObject *old_dof_object
  int _id
  short int _processor_id
  char _n_systems
  (probably 1 byte of alignment padding here)
  unsigned char *_n_vars
  unsigned char **_n_comp
  unsigned char **_dof_ids

That's 24 bytes on a 32-bit system, 40 bytes on an LP64 system.

But the pointers are all pointing to things allocated on the heap... does anyone know how much memory management overhead we get with g++ and the default new()?

Even with zero overhead, we still end up with at least an additional byte and two pointers per system, plus a byte and an int per variable... and then the old_dof_object doubles that. For Vikram's code, as an example, with 3 systems and up to 5 variables on each node, that's something like 152 bytes on a 32-bit system or 232 bytes on an LP64 system.

So here's my thought. What if we have this for each DofObject:

  int _id
  short int _processor_id
  char _dof_object_type
  char _old_dof_object_type
  int _per_type_id
  int _old_per_type_id

And this for each DofMap:

  std::vector<int> dof_renumbering(n_dofs)

That's 16 bytes per DofObject, plus 4 bytes per variable (totaling up to 36 bytes per node in Vikram's code), with no allocator overhead.

The _dof_object_type tells the DofMap what kind of object it is: Elem, vertex, hanging vertex/edge, hanging vertex/face, edge, etc.; likewise for the _old_dof_object_type. The _per_type_id numbers each type of DofObject from 0 to (e.g.) n_vertices; likewise for _old_per_type_id.

The "raw" dof numbering then goes in order by object type: all the vertices first, all the edges next, all the faces third, etc. With isotropic refinement, there should only be a dozen or two types of objects.
Because n_comp is the same for DofObjects of the same _dof_object_type, to get the dof numbering for a variable on a DofObject you just add offsets. For the v variable on an edge in a u/v/p system, for example, the raw global DoF index is:

  raw_global_index = n_u_dofs + n_v_vertices*n_v_dofs_per_vertex
                   + _per_type_id*n_v_dofs_per_edge + edge_local_index

And the global DoF index is dof_renumbering[raw_global_index] (the renumbering is necessary most importantly to keep the DoFs on each processor contiguous).

This is probably way too much work to save what may add up to only a hundred megs of RAM... but in my experience Mesh objects are taking more memory than they should, and the first suspect IMHO is the DofObject class.
---
Roy |
From: Derek G. <fri...@gm...> - 2007-02-25 17:10:47
|
The idea definitely sounds promising, but I will caution against unnecessary complication in the name of memory optimization. In Sierra no Mesh object (node, element, face, etc.) knows its own type (MeshObj is essentially a typeless container)... therefore the code itself must keep track of which type is which and sort it out when necessary. At first this sounds like a great idea (MeshObjs are really light... and really general) until you actually start implementing in code... and you end up dynamic-casting all over the place to figure out what is what. It makes things that should be relatively simple _much_ more difficult (like seeing if a point lies within an element... I'll give you an example of this on Monday).

This is mostly a subsystem change, but you have to consider that if you want to bring outside FE developers in to help... the more of these things you do, the harder it is for someone to get in there and change things. Keeping subsystems logically similar to how someone new to the code would think about them is advantageous in the long run. (But of course the whole DofObject thing is fairly complicated now anyway, so a little added complication might not hurt. ;-)

Anyway, what you are proposing isn't going that far... I'm just trying to illustrate some of the pitfalls of "over-optimization". Essentially, if you were to take memory optimization to the max... you would end up with Sierra... and I think _none_ of us wants that!

For me, the most worthwhile optimization endeavor is parallel mesh. All day long I talk about how great libMesh is to all my coworkers at Sandia... and when I show them code snippets they go "wow" because of how simple it is to get seemingly complicated things done (things that would take a month to do in Sierra can be done in 10 minutes in libMesh). But then the discussion _always_ comes back to: "But does it do parallel mesh?"... and then I have to digress....
Derek

On 2/24/07, Roy Stogner <roy...@ic...> wrote:
> [snip: full quote of Roy's proposal above]
|
From: John P. <pet...@cf...> - 2007-02-26 16:35:22
|
Derek Gaston writes:

> For me, the most worthwhile optimization endeavor is parallel mesh.
> All day long I talk about how great libMesh is to all my coworkers at
> Sandia... and when I show them code snippets they go "wow" because of
> how simple it is to get seemingly complicated things done (things that
> would take a month to do in Sierra can be done in 10 minutes in
> libMesh). But then the discussion _always_ comes back to: "But does
> it do parallel mesh?"... and then I have to digress....

I agree. They aren't putting any fewer cores on CPUs these days, and I believe we are heading toward a semi-crisis in terms of the scalability of libMesh on clusters with nodes that have 2 or 4 dual-core CPUs and only 8GB (or less) of system memory. However, if we start seeing more 16GB server nodes, I don't think I'll be worrying nearly as much.

-John |
From: Roy S. <roy...@ic...> - 2007-02-25 17:41:10
|
On Sun, 25 Feb 2007, Derek Gaston wrote:

> This is mostly a subsystem change, but you have to consider that if
> you want to bring outside FE developers in to help.... the more of
> these things you do the harder it is for someone to get in there and
> change things.

That's a worry, yes. The biggest drawback of my idea is that it would require a new enum of DofObject types and a new method like Elem::dof_object_type(node_num), so it would give people one more thing to worry about when adding new geometric elements. Granted, we're not planning on adding more geometric elements any time soon, but we'll have to add a few eventually if we ever want to support p>2 on tets and prisms, or even p>1 on pyramids.

But I really think there's a lot of room to work within DofObject and DofMap. I've already made two big feature additions (AMR on non-Lagrange elements, and p refinement) and one slight optimization (storing dof index ranges as first,count pairs rather than as whole lists), and neither required any changes to the API or any DoF-related changes to other code. I don't think this change would affect any code outside of the dof_map and dof_object files either. It's an incredibly well-isolated subsystem, especially considering how low-level it is.

> Keeping subsystems logically similar to how someone
> new to the code would think about it is advantageous in the long run.
> (but of course the whole dofobject thing is fairly complicated now
> anyway, so a little added complication might not hurt ;-)

Bah; someone new to the code shouldn't be futzing around in the most low-level systems anyway. I probably only got away with it because Ben wasn't watching the CVS logs closely enough. ;-)

The complication should all be hidden, too: the DofObject and DofMap APIs wouldn't have to change, just the underlying implementation.

> Anyway, what you are proposing isn't going that far... I'm just trying
> to illustrate some of the pitfalls of "over optimization".
Oh, I understand being wary about it.

> Essentially, if you were to take memory optimization to the max....
> you would end up with Sierra... and I think _none_ of us wants that!

You know these mailing list messages get archived and Google-indexed where managers can read them, right? ;-)

> For me, the most worthwhile optimization endeavor is parallel mesh.

Agreed. Tweaking DofObject would probably save dozens of megabytes on medium-sized problems, but parallelizing the mesh would save gigabytes on large problems. That's another endeavor I wish I had the time (and the MPI skills) for. Ben was sounding more motivated about it the last time he was in Austin, though; perhaps once he's got his defense out of the way we can goad him into action.
---
Roy |
From: Roy S. <roy...@ic...> - 2007-02-25 17:53:50
|
On Sun, 25 Feb 2007, Roy Stogner wrote:

> I've already made two big feature additions (AMR on
> non-Lagrange elements, and p refinement)

You know, I forgot to think about how my optimization idea would interact with p refinement. That's another few bytes per object, plus it would make the global indexing calculations take about ten times longer. Consider also the 28 bytes per node of geometry and Node::key data, and my idea is sounding less and less worthwhile.
---
Roy |