From: Roy S. <roy...@ic...> - 2007-07-12 22:52:55
Just to throw out some ideas:

After EquationSystems::init() is called, all of the Systems' DofMap
objects have had their chance to walk all over the Mesh, and then in
most cases every processor should stick to its local and ghost
elements until it's time to do adaptive refinement - am I right about
that?  If so, then codes which aren't using AMR/C (and aren't using
MeshFunction, etc.) could call Mesh::parallelize() at this time, and
all parallelize() would have to do is delete a lot of Elem and Node
objects.

Even for codes which are using adaptive refinement, would it be a good
stopgap solution to temporarily serialize the Mesh during
EquationSystems::reinit() calls?  The sequence would go something
like:

1. Delete the System matrices (which are invalid anyway, since we're
   changing the mesh), making room for a serial Mesh.
2. Serialize the Mesh (which might not be much harder than the code in
   Mesh::read; there are just lots of tricks, like remembering to copy
   old_dof_objects).
3. Repartition the mesh, call System::project_vector, etc.
4. Parallelize the Mesh.

Of course, this won't work as is (the MeshRefinement flagging schemes
assume a serial Mesh, for instance), but it might be a good place to
start without creating a huge new ParallelMesh class or breaking
existing code.

If anyone has a better name suggestion for Mesh::parallelize()
(perhaps Mesh::raze()?) or any thoughts to add, let me know.
---
Roy
From: John P. <pet...@cf...> - 2007-07-12 22:59:58
Roy Stogner writes:
 > Even for codes which are using adaptive refinement, would it be a good
 > stopgap solution to temporarily serialize the Mesh during
 > EquationSystems::reinit() calls?
 > [...]
 > If anyone has a better name suggestion for Mesh::parallelize()
 > (perhaps Mesh::raze()?) or any thoughts to add, let me know.

I think this definitely takes us in the right direction, though I don't
doubt there will be several gotchas even with this first step.  We should
also think about meshes which can't fit on a single processor...

Such a mesh would be ungainly (how would we store it, multiple data
files?), but you can certainly build_square() something which is too
large for a single CPU, and it would be cool if that eventually worked
in parallel without ever having to store the whole thing on one CPU.

-John
From: Roy S. <roy...@ic...> - 2007-07-13 13:52:53
On Thu, 12 Jul 2007, John Peterson wrote:

 > I think this definitely takes us in the right direction, though I don't
 > doubt there will be several gotchas even with this first step.  We should
 > also think about meshes which can't fit on a single processor...

Absolutely.

I think Ben's got the right idea about storage (even though it makes
Mesh::read() tricky, since we'd have to make multiple passes to avoid
reading in too many elements/nodes at once).  What worries me about
meshes too big to fit on one node is repartitioning.  Will Parmetis
work that way?

The space-filling-curve repartitioners might be ideal for use in
parallel; putting all the elements in some 1D order is easy to do in
parallel, and then it's easy to negotiate which processors get which
elements.
---
Roy
From: John P. <pet...@cf...> - 2007-07-13 14:37:55
Roy Stogner writes:
 > Absolutely.

Oh, and I forgot to mention: even if your mesh *is* small enough to fit
on 1 CPU (i.e. you can successfully open the mesh file and read it all
in), it may be too big for 16 copies to fit, supposing you are running
on some kind of quad-quad box.  I don't think fewer cores are coming
back in style any time soon.

 > I think Ben's got the right idea about storage (even though it makes
 > Mesh::read() tricky, since we'd have to make multiple passes to avoid
 > reading in too many elements/nodes at once).  What worries me about
 > meshes too big to fit on one node is repartitioning.  Will Parmetis
 > work that way?  The space-filling-curve repartitioners might be ideal
 > for use in parallel; putting all the elements in some 1D order is easy
 > to do in parallel, and then it's easy to negotiate which processors
 > get which elements.

I believe Parmetis is designed to handle partitioning in parallel (based
on the name only; I haven't tried it).  So it should work if we read in
a chunk of elements, send that out to a processor, and repeat until
we've read them all in.  Then a "real" partitioner could partition them
correctly, incurring additional communication overhead of course.

We also implemented a parallel sort algorithm some years back (I have
this code somewhere, I'm sure) which would probably allow the
space-filling-curve partitioning you suggest.

-John
From: Derek G. <fri...@gm...> - 2007-07-13 15:49:24
Let me give a bit of insight from how we do mesh reading/writing in
parallel in Sierra...

What we do is partition the mesh _offline_ first.  Then, when we start
the program, each processor reads only its own mesh file.  This is
advantageous because you can find some large-memory machine to do the
partitioning on... and then run on a cluster where each node has a
small amount of memory.

As for writing... it's the same.  Each node writes out its portion of
the solution... then, after the simulation is finished, you can bring
those solutions over to a large-memory machine and "concat" them all
together.  Again, this means that the machine you solve on never has
to have enough memory to hold the entire mesh.

Now, of course, this definitely has some drawbacks... all the pre/post
processing steps are kind of cumbersome (they can be done
automatically if you ask for it, but there is no way to do it
automatically if your mesh won't fit on the machine), but it does give
some food for thought.

That said, I do like the idea of processor 0 "dishing out"
processor-sized chunks at a time.  But I'm not sure how it would
decide what a processor-sized chunk is... or how it would figure out
what a good pattern for dishing out the chunks would be.  This is the
job of Metis/Parmetis... so either we need to use Metis offline
_first_ to get an original splitting (like Sierra), or we're going to
have to dish out all of the chunks and then call Parmetis to do a
better partitioning... and then do a lot of crazy communication to get
the mesh situated correctly.  Both ways have their drawbacks.

BTW: Parmetis is exactly what it says it is... _Par_metis.  It is
supposed to work in parallel on parallel meshes...

I also like the idea of a ParallelMesh class that derives from Mesh.
I don't, however, like the look of either of:

 > if (EquationSystems::MeshType == ParallelMesh)
 >   ...
 >
 > Or else a partial specialization for
 > EquationSystems<ParallelMesh>::reinit();

I would like to hope that we could design the ParallelMesh class so
that EquationSystems would _not_ have to know what kind of mesh is
being used.  I believe this should be a major design goal.  It still
might mean some changes to EquationSystems, but at the end of the day
EquationSystems should just take a Mesh object and not care about
anything else.

BTW - I'm _REALLY_ glad to see this discussion being had... for
reasons I can't divulge at the moment...

Derek
From: Roy S. <roy...@ic...> - 2007-07-13 16:03:08
On Fri, 13 Jul 2007, Derek Gaston wrote:

 > That said, I do like the idea of processor 0 "dishing out" processor
 > sized chunks at a time.  But, I'm not sure how it would decide what
 > a processor sized chunk is...

A "processor sized chunk" is just the total number of elements divided
by the number of processors (rounding up).

 > or how it would figure out what a good pattern for dishing out the
 > chunks would be.

Probably nothing more complicated than "the first elements we see in
the file go to processor n, the next go to n-1, etc."

 > This is the job of metis/parmetis... so either we need to use Metis
 > offline _first_ to get an original splitting (like Sierra), or we're
 > going to have to dish out all of the chunks and then call Parmetis
 > to do a better partitioning... and then do a lot of crazy
 > communication to get the mesh situated correctly.  Both ways have
 > their drawbacks.

I vote for "crazy communication".  In the long run we want to handle
adaptivity efficiently on parallel meshes, and that's going to require
repartitioning on the fly anyway.  In the short run, I just prefer
writing complicated library code once over walking through a
complicated workflow for every application.

 > BTW: Parmetis is exactly what it says it is... _Par_metis.  It is
 > supposed to work in parallel on parallel meshes...

Well, libMesh is supposed to work in parallel too, and that's mostly
true, but clearly there's parallel and then there's *parallel*. ;-)
Good to hear confirmation, though.
---
Roy
From: John P. <pet...@cf...> - 2007-07-13 16:09:21
Derek Gaston writes:
 > I would like to hope that we could design the ParallelMesh class so
 > that EquationSystems would _not_ have to know what kind of mesh is
 > being used.  I believe this should be a major design goal.

Agreed.  If we start templating on Mesh type, I think pretty soon the
whole library would become templated on Mesh type.  This is exactly
what happens if you start templating on space dimension, as I have
heard some FE libraries do ;)

Before it got to that, I would rather have a polymorphic solution
(e.g. for EquationSystems):

          ESBase
         ^      ^
        /        \
      ES          ParallelES

(In case my ASCII art doesn't come across, I just mean an abstract
base class of EquationSystems where the implementation determines what
happens in the parallel case.)

But the best solution is the one Derek mentioned: the Mesh interface
is such that code using it doesn't know whether the Mesh is parallel
or not.  It's definitely possible that we won't be able to do this in
all instances, however.  Consider the MeshTools, for instance:
build_square(MeshBase&, ...) would *ideally* not need to be changed
*at all* for parallel Mesh generation, but somehow I don't see that
happening.  The add_elem() interface would have to somehow "ignore/not
add" the elements that would eventually belong to other CPUs.

 > BTW - I'm _REALLY_ glad to see this discussion being had... for
 > reasons I can't divulge at the moment...

Well, you could, but then you would have to kill all of us D:

-John