Re: [Gfs-users] Gerris parallel performance
From: Stephane P. <s.p...@ni...> - 2011-03-28 19:25:59
Hi Rob,

Thanks for your detailed problem report. I have been aware that large
numbers of boxes can cause performance problems, but your report adds
useful detail. There are two ways to improve the situation: a
short-term goal of fixing the performance issues you describe, and a
longer-term goal of removing the requirement for a large number of
boxes altogether.

> 1) Each time the domain is split, the Poisson problem gets stiffer as
> a result of fewer coarse grids being available. This produces a very
> stiff problem when the domain is split many times.

Indeed. In the limit of "one cell per box", there is no multigrid
acceleration (a small illustration is appended below).

> 2) The overhead involved in calls to gfs_domain_foreach(...) gets very
> large. This is due to the fact that the boxes are sorted and the
> "compare_boxes" function gets called a lot. I profiled this for a
> domain split 5 times (32,768 octrees) and calls to the compare_boxes
> function took up over 30% of the total simulation time. For cases with
> more than 5 splits, this gets much worse. I observe this behavior both
> with debug and optimized builds of Gerris.

This is very useful. The box sorting is only there to make the box
traversal (and hence the whole algorithm) deterministic, so in many
ways it is not essential. I will see what I can do (a sketch of one
possible workaround is appended below).

> 3) I get tons of warnings indicating that the maximum MPI tag value
> has been exceeded:
>
> Gfs-WARNING **: PE 1 (rehlin64): GfsBoundaryMpi id (22855) is larger
> than the maximum MPI tag value allowed on this system (5461)

Hmm, 5461 is a small maximum tag value. Which MPI implementation are
you using? What happens is that MPI needs to tag the messages for each
boundary of your many GfsBoxes and is thus running out of tags (a
snippet for checking the limit on your system is appended below).

> 1) Is the use of Hypre and algebraic multigrid the ultimate solution
> to the Poisson stiffness problem, or are there ways to get around this
> using the native Gerris Poisson solver?

Hypre should help; however, the correct solution is to reduce the
number of boxes. This could be done by allowing neighbouring boxes to
be of different sizes (i.e. differing by a factor of two). This is the
long-term solution I was referring to.

> 2) Can the box sorting be optimized, or only done once and stored?
> Could the traversals proceed instead on a domain of unsorted boxes
> (probably not, but thought I'd ask)? Is this an inherent performance
> bottleneck of the current method?

As I said, the box sorting is nice but not essential at all, so we
should be able to work around this one.

> 3) Should I be getting the above warnings? Do the warnings indicate
> that tagged MPI communications will not behave as expected?

Hmm, good question. I would think this should cause problems, yes.

> I apologize for the length of the post, but I am in need of some
> general advice on the use of Gerris in the manor described above.

No worries. The information you provided was very useful. And good on
you for living in a manor... ;-)

cheers

Stephane
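
To make point 1) concrete: at a fixed effective resolution, each domain
split multiplies the number of boxes by eight (in 3D) but removes one
octree level per box, and the native multigrid can only coarsen within
a box. A minimal standalone illustration of this trade-off (the 256^3
effective resolution is assumed for the example, not a figure from the
report):

    #include <stdio.h>

    /* Illustrates point 1): each domain split multiplies the number of
     * boxes by 8 (in 3D) and removes one level of per-box coarsening.
     * The 8-level starting point corresponds to an assumed 256^3
     * effective resolution in a single box. */
    int main (void)
    {
      int total_levels = 8; /* octree depth of a single-box 256^3 domain */
      for (int split = 0; split <= total_levels; split++)
        printf ("splits: %d  boxes: %8d  multigrid levels per box: %d\n",
                split, 1 << (3*split), total_levels - split);
      return 0;
    }

At five splits (the 32,768-octree case profiled above) this leaves only
three coarsening levels per box, and at eight splits (one cell per box)
none at all, which is the limit mentioned in the reply.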
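
On point 2), one short-term workaround is to sort the box list once,
cache it, and re-sort only when boxes are added or removed. The sketch
below shows the idea in GLib style; the Domain and Box structures and
all names are hypothetical placeholders for illustration, not the
actual Gerris API:

    #include <glib.h>

    /* Hypothetical stand-ins for the real Gerris structures. */
    typedef struct { guint id; } Box;

    typedef struct {
      GSList * boxes;    /* boxes in insertion order */
      GSList * sorted;   /* cached sorted view of the same boxes */
      gboolean dirty;    /* TRUE whenever boxes have been added/removed */
    } Domain;

    static gint compare_boxes (gconstpointer a, gconstpointer b)
    {
      guint ia = ((const Box *) a)->id, ib = ((const Box *) b)->id;
      return ia < ib ? -1 : (ia > ib ? 1 : 0);
    }

    /* Deterministic traversal which pays the O(n log n) sorting cost
     * only when the box topology has changed, not on every call. */
    static void domain_foreach_box (Domain * domain, GFunc func, gpointer data)
    {
      if (domain->dirty) {
        g_slist_free (domain->sorted);
        domain->sorted = g_slist_sort (g_slist_copy (domain->boxes),
                                       compare_boxes);
        domain->dirty = FALSE;
      }
      g_slist_foreach (domain->sorted, func, data);
    }

    static void print_box (gpointer data, gpointer user_data)
    {
      g_print ("box %u\n", ((Box *) data)->id);
    }

    int main (void)
    {
      Box b[3] = { { 3 }, { 1 }, { 2 } };
      Domain domain = { NULL, NULL, TRUE };
      for (int i = 0; i < 3; i++)
        domain.boxes = g_slist_prepend (domain.boxes, &b[i]);
      domain_foreach_box (&domain, print_box, NULL); /* prints 1, 2, 3 */
      g_slist_free (domain.boxes);
      g_slist_free (domain.sorted);
      return 0;
    }

Since the sorting only exists to make the traversal deterministic, an
alternative would be to keep the list permanently ordered by inserting
new boxes with g_slist_insert_sorted(), so no full sort is ever needed.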
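
On point 3), the tag ceiling the warning refers to can be queried
portably through the predefined MPI_TAG_UB attribute (the MPI standard
requires it to be at least 32767, so a reported limit of 5461 is
unusually low). A minimal sketch:

    #include <mpi.h>
    #include <stdio.h>

    int main (int argc, char * argv[])
    {
      int * tag_ub, flag, rank;

      MPI_Init (&argc, &argv);
      MPI_Comm_rank (MPI_COMM_WORLD, &rank);
      /* MPI_TAG_UB is a predefined attribute holding the largest tag
       * value supported by this implementation (at least 32767 per the
       * MPI standard). */
      MPI_Comm_get_attr (MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
      if (rank == 0) {
        if (flag)
          printf ("maximum MPI tag value: %d\n", *tag_ub);
        else
          printf ("MPI_TAG_UB attribute not set\n");
      }
      MPI_Finalize ();
      return 0;
    }

If GfsBoundaryMpi ids exceed this value, distinct boundaries may no
longer map to distinct tags, which is consistent with the expectation
above that the warnings indicate real problems.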