From: Jed B. <je...@59...> - 2009-05-31 17:52:33
I'm using libmesh to provide a reference implementation of Q_2 elements for Poisson, and Q_2-Q_1 for Stokes.  If possible, I would also like to compare with third-order elements, but HIERARCHIC appears to produce insanely ill-conditioned matrices.  For example,

  ./ex4-opt -d 3 -n 4 -f HIERARCHIC -o THIRD -ksp_monitor -ksp_converged_reason -pc_type lu -pc_factor_mat_solver_package superlu

does not converge.  All of petsc, umfpack, mumps, and spooles are unsuccessful as well.  Apparently there is no third-order Lagrange implementation, so perhaps I'm out of luck here.

For the Stokes solver, I modified ex11 for 3D, but no Schur complement preconditioners are available (I could plug it into the new factorization version of PCFieldSplit, but don't have time at the moment).  Since BoomerAMG [1] and ML fail when naively given indefinite matrices, the libmesh Stokes implementation isn't really getting a fair shake (I probably just won't include it).  I know some people on this list solve (Navier-)Stokes, so I'm curious if anyone has an algorithmically scalable implementation they'd like to compare.

[1] BoomerAMG does a great job of handling the penalty boundary conditions, so the residual initially drops by a large factor and then stagnates (watch with -ksp_monitor_true_residual) on a nontrivial problem size.

Heads up on a change in petsc-dev: the arguments to MatGetSubMatrix used to reflect implementation details for MPIAIJ/MPIBAIJ and were fundamentally unscalable (though fine for the sizes we've seen so far).  We decided to drop the 'csize' argument and take a parallel index set for 'iscol' (instead of the serial gathered one used through 3.0.0).  The current implementations still do the gather, so there is still a scalability issue when taking submatrices with a number of columns comparable to single-node memory, but at least the interface is now scalable (I take submatrices of MatShell, so this is good).  Speak up if you run into this issue; it can be made scalable, but it's not a high priority at the moment.  (A before-and-after sketch of the calling sequence follows at the end of this message.)

It looks like PetscMatrix::_get_submatrix() will work correctly in parallel if you drop the csize argument (PETSC_DECIDE in your current source).  The arguments become the rows and columns desired in the *local* part of the new submatrix.  (Yes, the local part can contain rows and columns that were formerly remote.)

Jed
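For concreteness, a minimal sketch of the calling-sequence change described above, consistent with the patch later in this thread.  The helper name GetVelocityBlock and the use of MAT_INITIAL_MATRIX are illustrative assumptions, not code from Jed's source; the matrix A and the index sets isrow/iscol are assumed to be created by the caller, and in petsc-dev both ISs are parallel and name the local rows and columns of the new submatrix:

  #include <petscmat.h>

  /* Hypothetical helper extracting a submatrix B from A. */
  PetscErrorCode GetVelocityBlock(Mat A, IS isrow, IS iscol, Mat *B)
  {
    PetscErrorCode ierr;
    /* Through PETSc 3.0.0, iscol was a serial, gathered IS and a 'csize'
       hint was required:
       ierr = MatGetSubMatrix(A, isrow, iscol, PETSC_DECIDE,
                              MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);      */
    /* petsc-dev: 'csize' is gone and iscol is a parallel IS: */
    ierr = MatGetSubMatrix(A, isrow, iscol, MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);
    return 0;
  }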
From: John P. <jwp...@gm...> - 2009-05-31 21:49:40
On Sun, May 31, 2009 at 12:52 PM, Jed Brown <je...@59...> wrote:
> I'm using libmesh to provide a reference implementation of Q_2 elements
> for Poisson, and Q_2-Q_1 for Stokes.  If possible, I would also like to
> compare with third-order elements, but HIERARCHIC appears to produce
> insanely ill-conditioned matrices.  For example,
>
> ./ex4-opt -d 3 -n 4 -f HIERARCHIC -o THIRD -ksp_monitor -ksp_converged_reason -pc_type lu -pc_factor_mat_solver_package superlu
>
> does not converge.  All of petsc, umfpack, mumps, and spooles are
> unsuccessful as well.  Apparently there is no third-order Lagrange
> implementation, so perhaps I'm out of luck here.
>
> For the Stokes solver, I modified ex11 for 3D, but no Schur complement
> preconditioners are available (I could plug it into the new
> factorization version of PCFieldSplit, but don't have time at the
> moment).  Since BoomerAMG [1] and ML fail when naively given indefinite
> matrices, the libmesh Stokes implementation isn't really getting a fair
> shake (I probably just won't include it).  I know some people on this
> list solve (Navier-)Stokes, so I'm curious if anyone has an
> algorithmically scalable implementation they'd like to compare.

Did you modify the boundary conditions from ex11 to account for a
non-interpolatory basis?  Looks like we aren't doing the full
L2-projection in ex11... the BCs in ex4 are about what you'd want.

Also, did you pin a value of the pressure (see e.g. ex13)?  This should
not be strictly necessary, but sometimes it helps in practice to do it.
(A sketch of the usual penalty trick follows below.)

> Heads up on a change in petsc-dev: the arguments to MatGetSubMatrix used
> to reflect implementation details for MPIAIJ/MPIBAIJ and were
> fundamentally unscalable (though fine for the sizes we've seen so far).
> We decided to drop the 'csize' argument and take a parallel index set
> for 'iscol' (instead of the serial gathered one used through 3.0.0).
> The current implementations still do the gather, so there is still a
> scalability issue when taking submatrices with a number of columns
> comparable to single-node memory, but at least the interface is now
> scalable (I take submatrices of MatShell, so this is good).  Speak up if
> you run into this issue; it can be made scalable, but it's not a high
> priority at the moment.

Thanks for the info!  If something is in petsc-dev, does that mean it
will go into a patch-level change, or will it take effect only when
3.0.1 is released?

--
John
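A minimal sketch of the pressure-pinning John suggests, in the spirit of ex13's penalty boundary conditions.  The element matrix/vector names Ke and Fe, the local dof index p_dof, and the penalty value are all assumptions for illustration, not ex13's actual code:

  // Hypothetical element-assembly fragment: on the element that owns the
  // pinned node, add a large penalty so that p = p_value at one pressure dof.
  const Real penalty = 1.e10;
  const Real p_value = 0.;                  // pin the pressure to zero
  Ke(p_dof, p_dof) += penalty;              // diagonal penalty term
  Fe(p_dof)        += penalty * p_value;    // matching rhs contribution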
From: Jed B. <je...@59...> - 2009-05-31 22:23:37
John Peterson wrote:
> On Sun, May 31, 2009 at 12:52 PM, Jed Brown <je...@59...> wrote:
>> I'm using libmesh to provide a reference implementation of Q_2 elements
>> for Poisson, and Q_2-Q_1 for Stokes.  If possible, I would also like to
>> compare with third-order elements, but HIERARCHIC appears to produce
>> insanely ill-conditioned matrices.  For example,
>>
>> ./ex4-opt -d 3 -n 4 -f HIERARCHIC -o THIRD -ksp_monitor -ksp_converged_reason -pc_type lu -pc_factor_mat_solver_package superlu
>>
>> does not converge.  All of petsc, umfpack, mumps, and spooles are
>> unsuccessful as well.  Apparently there is no third-order Lagrange
>> implementation, so perhaps I'm out of luck here.
>>
>> For the Stokes solver, I modified ex11 for 3D, but no Schur complement
>> preconditioners are available (I could plug it into the new
>> factorization version of PCFieldSplit, but don't have time at the
>> moment).  Since BoomerAMG [1] and ML fail when naively given indefinite
>> matrices, the libmesh Stokes implementation isn't really getting a fair
>> shake (I probably just won't include it).  I know some people on this
>> list solve (Navier-)Stokes, so I'm curious if anyone has an
>> algorithmically scalable implementation they'd like to compare.
>
> Did you modify the boundary conditions from ex11 to account for a
> non-interpolatory basis?  Looks like we aren't doing the full
> L2-projection in ex11... the BCs in ex4 are about what you'd want.

The issue with HIERARCHIC elements occurs for ex4; I never bothered
trying for Stokes.  Skipping the L^2 projection just means solving a
slightly different problem (and it won't affect the existence of a
solution for the lid-driven cavity).  In other words, I didn't change
this, but I would be surprised if it made a difference.

> Also, did you pin a value of the pressure (see e.g. ex13)?  This
> should not be strictly necessary, but sometimes it helps in practice
> to do it.

It shouldn't make any difference as long as you remove the null space
and you're not trying to use a direct solver.  The issue with applying
normal preconditioners to indefinite problems is pretty well known.

Interestingly, libmesh's ordering is remarkably good for ILU.  For
example, run

  ./ex11-opt -ksp_monitor_true_residual -ksp_rtol 1e-12 -pc_type ilu -pc_factor_mat_ordering_type rcm

This performs ILU(0) with an ordering computed by reverse Cuthill-McKee.
It takes care of the penalty and then stagnates, but with the natural
ordering it converges completely.  If you play with various orderings,
you'll find that it's really easy to make ILU fail.

General support for "block factorization" preconditioners is coming soon
in PCFieldSplit.  For incompressible flow, you would provide an IS for
all the velocity degrees of freedom, and we'll be able to offer a full
suite of preconditioners based on block factorization with various
choices for the Schur complement (e.g. variants of BFBt).  (A rough
sketch of the setup follows at the end of this message.)

>> Heads up on a change in petsc-dev: the arguments to MatGetSubMatrix used
>> to reflect implementation details for MPIAIJ/MPIBAIJ and were
>> fundamentally unscalable (though fine for the sizes we've seen so far).
>> We decided to drop the 'csize' argument and take a parallel index set
>> for 'iscol' (instead of the serial gathered one used through 3.0.0).
>> The current implementations still do the gather, so there is still a
>> scalability issue when taking submatrices with a number of columns
>> comparable to single-node memory, but at least the interface is now
>> scalable (I take submatrices of MatShell, so this is good).  Speak up if
>> you run into this issue; it can be made scalable, but it's not a high
>> priority at the moment.
>
> Thanks for the info!  If something is in petsc-dev, does that mean it
> will go into a patch-level change, or will it take effect only when
> 3.0.1 is released?

Just 3.0.1 (I don't know when that will be).  Since I build against
petsc-dev, I made this one-line change locally.

Jed
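A rough sketch of providing that velocity IS to PCFieldSplit.  The named-split form of PCFieldSplitSetIS shown here is from later PETSc releases; the interface was still settling in petsc-dev at the time, so treat the exact signature, the helper name, and the split names "u"/"p" as assumptions:

  #include <petscksp.h>

  /* Hypothetical helper: attach velocity/pressure splits to a KSP's PC.
     is_u and is_p are assumed to be parallel ISs over the velocity and
     pressure dofs of the global system. */
  PetscErrorCode SetupStokesSplits(KSP ksp, IS is_u, IS is_p)
  {
    PC             pc;
    PetscErrorCode ierr;
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
    ierr = PCFieldSplitSetIS(pc, "u", is_u);CHKERRQ(ierr);  /* velocity dofs */
    ierr = PCFieldSplitSetIS(pc, "p", is_p);CHKERRQ(ierr);  /* pressure dofs */
    /* Then pick a block factorization at run time, e.g.
       -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type lower */
    return 0;
  }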
From: Jed B. <je...@59...> - 2009-11-26 17:58:12
---
 src/numerics/petsc_matrix.C |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/numerics/petsc_matrix.C b/src/numerics/petsc_matrix.C
index 5ce8d30..27bf370 100644
--- a/src/numerics/petsc_matrix.C
+++ b/src/numerics/petsc_matrix.C
@@ -390,12 +390,20 @@ void PetscMatrix<T>::_get_submatrix(SparseMatrix<T>& submatrix,
                          &iscol); CHKERRABORT(libMesh::COMM_WORLD,ierr);
 
   // Extract submatrix
+#if !PETSC_VERSION_LESS_THAN(3,0,1) || !PETSC_VERSION_RELEASE
+  ierr = MatGetSubMatrix(_mat,
+                         isrow,
+                         iscol,
+                         (reuse_submatrix ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX),
+                         &(petsc_submatrix->_mat)); CHKERRABORT(libMesh::COMM_WORLD,ierr);
+#else
   ierr = MatGetSubMatrix(_mat,
                          isrow,
                          iscol,
                          PETSC_DECIDE,
                          (reuse_submatrix ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX),
                          &(petsc_submatrix->_mat)); CHKERRABORT(libMesh::COMM_WORLD,ierr);
+#endif
 
   // Specify that the new submatrix is initialized and close it.
   petsc_submatrix->_is_initialized = true;
-- 
1.6.5.3
From: Roy S. <roy...@ic...> - 2009-11-26 19:51:01
On Thu, 26 Nov 2009, Jed Brown wrote:

>  src/numerics/petsc_matrix.C |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)

Thanks!  It's in svn now.

Say, as long as I'm writing you, a PETSc question just came up
yesterday: in our method PetscVector<T>::operator= (const
PetscVector<T>& v) we don't currently set this->_is_closed, which needs
to be fixed.  I presume the fix is "this->_is_closed = v._is_closed;",
but I'm not certain enough about PETSc APIs.  Is it safe to assume that
if we do a VecCopy on an assembled vector, the result is assembled, and
if we do a VecCopy on an unassembled vector, the result still needs a
VecAssemblyBegin/VecAssemblyEnd?

Thanks,
---
Roy
From: Jed B. <je...@59...> - 2009-11-26 21:01:45
On Thu, 26 Nov 2009 13:50:23 -0600 (CST), Roy Stogner <roy...@ic...> wrote:

> Is it safe to assume that if we do a VecCopy on an assembled vector,
> the result is assembled

Yes.

> and if we do a VecCopy on an unassembled vector

Don't do this.

VecSetValues works as follows:

* values that are owned by this process get inserted immediately

* values that are not owned get put in the "stash"; the stash is a very
  private thing and *not* copied by VecCopy

If you can guarantee that you have not set any remote values, then you
aren't strictly [1] required to call VecAssemblyBegin/VecAssemblyEnd [2],
but you will get wrong results if you have set remote values and don't
assemble before using the vector.

PETSc almost never uses VecSetValues internally; it is just offered as a
convenient way to set remote values if the application really needs it
(a lot of examples use it, but it's mostly gratuitous).  If you are
using VecSetValues in a performance-sensitive place, it would be better
to VecGetArray() and index it directly.  Get a "local vector" (including
unowned "ghost" values) if you need to set remote values (as in FEM
function evaluation) and scatter the result.  (A sketch of both
approaches follows below.)

Jed

[1] This could change in a subsequent release; the API says you have to
assemble.  It is unlikely to change for performance reasons, but it
could change to help with debugging.  Actually, I'm adding that check to
petsc-dev now: you will get an error if you try to copy an unassembled
vector.  (Vectors are assembled by default; you can only get an
unassembled Vec by calling VecSetValues/VecSetValuesBlocked.)

[2] That call would be practically free anyway.
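A minimal sketch contrasting the two code paths Jed describes.  The helper name and the fill pattern are illustrative; the Vec x is assumed to be a parallel vector created elsewhere:

  #include <petscvec.h>

  /* Hypothetical helper showing direct array access vs. VecSetValues. */
  PetscErrorCode FillVector(Vec x)
  {
    PetscInt       rstart, rend, i, row;
    PetscScalar   *a, v;
    PetscErrorCode ierr;

    /* Direct array access: touches only locally owned entries, never the
       stash, so no assembly is needed afterwards. */
    ierr = VecGetOwnershipRange(x, &rstart, &rend);CHKERRQ(ierr);
    ierr = VecGetArray(x, &a);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) a[i-rstart] = (PetscScalar)i;
    ierr = VecRestoreArray(x, &a);CHKERRQ(ierr);

    /* VecSetValues: row 0 may be owned by another process; the value then
       sits in the stash until the assembly pair moves it to its owner. */
    row  = 0;
    v    = 1.0;
    ierr = VecSetValues(x, 1, &row, &v, INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
    return 0;
  }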
From: Roy S. <roy...@ic...> - 2009-11-27 03:32:49
On Thu, 26 Nov 2009, Jed Brown wrote:

> On Thu, 26 Nov 2009 13:50:23 -0600 (CST), Roy Stogner <roy...@ic...> wrote:
>
>> Is it safe to assume that if we do a VecCopy on an assembled vector,
>> the result is assembled
>
> Yes.
>
>> and if we do a VecCopy on an unassembled vector
>
> Don't do this.
>
> VecSetValues works as follows:
>
> * values that are owned by this process get inserted immediately
>
> * values that are not owned get put in the "stash"; the stash is a very
>   private thing and *not* copied by VecCopy

Ah!  Thanks.  So I'll change operator= to assert that we're assembled
already, as well as to reflect that in the target vector (a sketch
follows below).

> PETSc almost never uses VecSetValues internally; it is just offered as a
> convenient way to set remote values if the application really needs it
> (a lot of examples use it, but it's mostly gratuitous).  If you are
> using VecSetValues in a performance-sensitive place, it would be better
> to VecGetArray() and index it directly.

Oh, I believe it.  I'm actually a bit surprised that the libMesh use of
VecSetValues doesn't seem to hurt us a lot.  It was never noticeable in
the oprofile results on my own codes, but that may just be because it
was getting dwarfed by an expensive transcendental Jacobian and a more
expensive nonlinear solve.  We'll see if it comes up in any of the new
optimization work on other apps.
---
Roy
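A sketch of the fix Roy describes, not the actual libMesh commit; the member name _vec, the closed() accessor, and the libmesh_assert macro are guesses at libMesh internals:

  template <typename T>
  PetscVector<T>&
  PetscVector<T>::operator= (const PetscVector<T>& v)
  {
    // Per Jed's advice: refuse to copy an unassembled vector, since the
    // source's stash would be silently lost by VecCopy.
    libmesh_assert (v.closed());

    int ierr = VecCopy (v._vec, _vec);
    CHKERRABORT(libMesh::COMM_WORLD, ierr);

    // VecCopy of an assembled vector yields an assembled vector.
    this->_is_closed = true;

    return *this;
  }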