Let me give a bit of insight into how we do mesh reading/writing in
parallel in Sierra....
What we do is partition the mesh _offline_ first. Then, when we
start the program, each processor reads only its own mesh file. This is
advantageous because you can find some large-memory machine to do the
partitioning on... and then run on a cluster where each node has a
small amount of memory.
As for writing... it's the same. Each node writes out its portion of
the solution... then, after the simulation is finished, you can bring
those solutions over to a large-memory machine and "concat" them all
together. Again, this means that the machine you solve on never has
to have enough memory to hold the entire mesh.
Now, of course, this definitely has some drawbacks... all the
pre-/post-processing steps are somewhat cumbersome (they can be done
automatically if you ask for it, but there is no way to do that
automatically if your mesh won't fit on the machine), but it does give
some food for thought.
That said, I do like the idea of processor 0 "dishing out" processor-
sized chunks at a time. But I'm not sure how it would decide what a
processor-sized chunk is... or how it would figure out a good pattern
for dishing out the chunks. That is the job of
Metis/ParMETIS... so either we need to use Metis offline _first_ to
get an initial splitting (like Sierra), or we're going to have to
dish out all of the chunks and then call ParMETIS to do a better
partitioning... and then do a lot of crazy communication to get the
mesh situated correctly. Both ways have their drawbacks.
BTW: Parmetis is exactly what it says it is... _Par_metis. It is
supposed to work in parallel on parallel meshes...
I also like the idea of a ParallelMesh class that derives from Mesh.
I don't, however, like the look of either of:
> if (EquationSystems::MeshType == ParallelMesh)
> Or else a partial specialization for
I would like to hope that we could design the ParallelMesh class so
that EquationSystems would _not_ have to know what kind of mesh is
being used. I believe this should be a major design goal. It might
still mean some changes to EquationSystems, but at the end of the day
EquationSystems should just take a Mesh object and not care what kind
it actually is.
BTW - I'm _REALLY_ glad to see this discussion being had... for
reasons I can't divulge at the moment.....
On 7/13/07, John Peterson <peterson@...> wrote:
> Roy Stogner writes:
> > On Thu, 12 Jul 2007, John Peterson wrote:
> > > I think this definitely takes us in the right direction, though I don't
> > > doubt there will be several gotchas even with this first step. We should
> > > also think about meshes which can't fit on a single processor...
> > >
> > > Such a mesh would be ungainly (how would we store it, multiple
> > > data files?) but you can certainly build_square() something which
> > > is too large for a single CPU, and it would be cool if that eventually
> > > worked in parallel w/o ever having to store the whole thing on 1 CPU.
> > Absolutely.
> Oh, and I forgot to mention, even if your mesh *is* small enough to fit
> on 1 CPU (i.e. you can successfully open the mesh file and read it all in)
> it may be too big for 16 copies to fit, supposing you are running on
> some kind of quad-quad box. I don't think fewer cores are coming back
> in style any time soon.
> > I think Ben's got the right idea about storage (even though it makes
> > Mesh::read() tricky, since we'd have to make multiple passes to avoid
> > reading in too many elements/nodes at once). What worries me about
> > meshes too big to fit on one node is repartitioning. Will Parmetis
> > work that way? The space-filling-curve repartitioners might be ideal
> > for use in parallel; putting all the elements in some 1D order is easy
> > to do in parallel, and then it's easy to negotiate which processors
> > get which elements.
> I believe parmetis is designed to handle partitioning in parallel (based
> on the name only, I haven't tried it). So it should work if we read in
> a chunk of elements, send that out to a processor, and repeat until we've
> read them all in. Then a "real" partitioner could partition them correctly,
> incurring additional communication overhead of course.
> We also implemented a parallel sort algorithm some years back (I have this
> code somewhere, I'm sure) which would probably allow the space-filling curve
> partitioning you suggest.
> Libmesh-devel mailing list