Thread: [Libmesh-devel] parallelization again !!

Hi all, 

I am currently taking a course in parallel algorithms and I am
considering working on some parts of libmesh as part of the course
project.

Regarding Bill Barth's email, is there a plan for that?

First, I'll summarize my understanding of the current status of the code.
1- The unstructured mesh is read by the master processor using the
Mesh::read function, and then all the mesh node coordinates are packed
into a std::vector<Real>. This vector is broadcast to all the
processors. Similarly, an element vector that contains element types
and connectivity information is broadcast to all processors. At this
point every processor has a copy of the whole mesh.

2- After that, a preparation step is done, which includes renumbering
the nodes, finding neighbor information and finally partitioning the
mesh using the METIS library. The result from METIS is assigned to each
element using a function called elem->set_processor_id(), which sets a
private data member of the element class to the number of the owning
processor.

3- The classical steps of the finite element solution, building the
stiffness matrix of each element and then mapping the local element
stiffness matrix into the global stiffness matrix, are done on each
processor using a local element iterator which uses the processor id as
a predicate.

4- Either the LASPack or the PETSc library, which hides most of the
parallelization details of the iterative solver, is called.

5- The resulting solution is broadcast to all the processors, so that
every processor has the solution for the whole mesh, which is already
stored on each processor.

6- If adaptive mesh refinement is used, the results are analyzed using
an error estimation technique and a refinement or coarsening flag is
assigned to each mesh element.

7- In the current implementation, the error estimation is done on the
mesh elements owned by each processor and the results are broadcast to
all the processors.

8- The mesh refinement and coarsening is done for the whole mesh on each
processor, which is another fault in the current implementation.
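
As a rough sketch of steps 1 and 3 above (this is not actual libmesh
code; Point3, Element, pack_coordinates, unpack_coordinates and
local_elements are names I made up for illustration, and double stands
in for Real), the coordinates can be flattened into one buffer suitable
for a single broadcast, and the local element loop is just a filter on
the processor id:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not libmesh types.
struct Point3 { double x, y, z; };
struct Element {
  std::vector<std::size_t> nodes;  // connectivity (node indices)
  unsigned int processor_id;       // owner assigned by the partitioner
};

// Step 1: flatten node coordinates into one contiguous buffer that a
// single broadcast could ship to every processor.
std::vector<double> pack_coordinates(const std::vector<Point3>& nodes) {
  std::vector<double> buf;
  buf.reserve(3 * nodes.size());
  for (const Point3& p : nodes) {
    buf.push_back(p.x);
    buf.push_back(p.y);
    buf.push_back(p.z);
  }
  return buf;
}

// Inverse of pack_coordinates, run on the receiving side.
std::vector<Point3> unpack_coordinates(const std::vector<double>& buf) {
  assert(buf.size() % 3 == 0);  // three coordinates per node
  std::vector<Point3> nodes;
  nodes.reserve(buf.size() / 3);
  for (std::size_t i = 0; i < buf.size(); i += 3)
    nodes.push_back(Point3{buf[i], buf[i + 1], buf[i + 2]});
  return nodes;
}

// Step 3: indices of the elements owned by `rank`, i.e. the processor
// id used as a predicate over the full (replicated) element list.
std::vector<std::size_t> local_elements(const std::vector<Element>& elems,
                                        unsigned int rank) {
  std::vector<std::size_t> out;
  for (std::size_t e = 0; e < elems.size(); ++e)
    if (elems[e].processor_id == rank) out.push_back(e);
  return out;
}
```

Note that every processor still holds the full element list here; only
the assembly loop is restricted to the locally owned elements.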

I think true parallelization will face many challenges...

First, dealing with nodes and edges on the boundary of each partition,
which need to be present on all the processors that share that
boundary.

Second, the error estimator needs some information from across the
boundary, which is hard to get unless overlapping boundaries between
processors are implemented or element-by-element messaging across the
partition boundaries is done.

Third, during mesh adaptation, non-conforming nodes may result on the
partition boundaries.
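
To make the first challenge a bit more concrete, here is a minimal
sketch of how the nodes on partition boundaries could be identified,
assuming each element knows its connectivity and its processor id. The
Element struct and the interface_nodes function are hypothetical names
for illustration, not libmesh API:

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical stand-in for illustration; not a libmesh type.
struct Element {
  std::vector<std::size_t> nodes;  // connectivity (node indices)
  unsigned int processor_id;       // owner assigned by the partitioner
};

// Returns, for every node touched by elements of more than one
// processor, the set of processors that share it.
std::map<std::size_t, std::set<unsigned int>>
interface_nodes(const std::vector<Element>& elems) {
  // Collect, per node, the processors of all elements that use it.
  std::map<std::size_t, std::set<unsigned int>> touching;
  for (const Element& e : elems)
    for (std::size_t n : e.nodes)
      touching[n].insert(e.processor_id);

  // Keep only the nodes that sit on a partition boundary.
  std::map<std::size_t, std::set<unsigned int>> shared;
  for (const auto& entry : touching)
    if (entry.second.size() > 1) shared.insert(entry);
  return shared;
}
```

Each node in the returned map would have to be present on every
processor listed for it.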

I am currently thinking about a sort of overlapping partitioning, where
each processor stores only the elements which belong to its partition,
plus all neighboring elements on the boundary. These extra elements
should still retain their processor id and will be called remote
copies. No local calculation is done for remote copies, but after each
round of calculation each element that is a neighbor to a remote copy
should send its calculated results to the processor that owns that
neighbor.
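
A minimal sketch of this overlapping scheme, assuming the owner of
every element and the element neighbor lists are already known (as
they are after the preparation step). GhostLayer and build_ghost_layer
are hypothetical names, and the actual message passing is left out:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical result type for illustration; not a libmesh type.
struct GhostLayer {
  // Elements stored on this processor but owned by another one.
  std::set<std::size_t> remote_copies;
  // Destination processor -> locally owned elements whose results must
  // be sent there after each round of calculation.
  std::map<unsigned int, std::set<std::size_t>> send_lists;
};

// owner[e]     : processor id of element e
// neighbors[e] : indices of the elements adjacent to element e
// rank         : the processor building its overlap layer
GhostLayer build_ghost_layer(
    const std::vector<unsigned int>& owner,
    const std::vector<std::vector<std::size_t>>& neighbors,
    unsigned int rank) {
  assert(owner.size() == neighbors.size());
  GhostLayer layer;
  for (std::size_t e = 0; e < owner.size(); ++e) {
    if (owner[e] != rank) continue;       // only look at owned elements
    for (std::size_t n : neighbors[e]) {
      if (owner[n] == rank) continue;     // interior neighbor, no overlap
      layer.remote_copies.insert(n);      // keep a copy of the neighbor
      layer.send_lists[owner[n]].insert(e);  // its owner needs our result
    }
  }
  return layer;
}
```

For a chain of four elements owned by processors 0, 0, 1, 1, processor
0 would keep element 2 as a remote copy and send the results of element
1 to processor 1.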

One of the issues I am thinking about now is the parent information when
the children are distributed across partition boundaries. Another
important issue is the migration of mesh elements across the boundaries
for dynamic load balancing.

Feedback, current status, comments ... are all needed :)

Regards,

Ahmed
