From: Jed B. <je...@59...> - 2009-05-31 17:52:33
I'm using libmesh to provide a reference implementation of Q_2 elements for Poisson, and Q_2-Q_1 for Stokes.  If possible, I would also like to compare with third-order elements, but HIERARCHIC appears to produce insanely ill-conditioned matrices.  For example,

  ./ex4-opt -d 3 -n 4 -f HIERARCHIC -o THIRD -ksp_monitor -ksp_converged_reason -pc_type lu -pc_factor_mat_solver_package superlu

does not converge.  All of petsc, umfpack, mumps, and spooles are unsuccessful as well.  Apparently there is no third-order Lagrange implementation, so perhaps I'm out of luck here.

For the Stokes solver, I modified ex11 for 3D, but no Schur complement preconditioners are available (I could plug it into the new factorization version of PCFieldSplit, but don't have time at the moment).  Since BoomerAMG [1] and ML fail when naively given indefinite matrices, the libmesh Stokes implementation isn't really getting a fair shake (I probably just won't include it).  I know some people on this list solve (Navier-)Stokes, so I'm curious if anyone has an algorithmically scalable implementation they'd like to compare.

[1] BoomerAMG does a great job of handling the penalty boundary conditions, so the residual initially drops by a large factor and then stagnates (watch with -ksp_monitor_true_residual) on a nontrivial problem size.

Heads up on a change in petsc-dev: the arguments to MatGetSubMatrix used to reflect implementation details for MPIAIJ/MPIBAIJ and were fundamentally unscalable (though fine for the sizes we've seen so far).  We decided to drop the 'csize' argument and take a parallel index set for 'iscol' (instead of the serial gathered one used through 3.0.0).  The current implementations still do the gather, so there is still a scalability issue when taking submatrices with a number of columns comparable to single-node memory, but at least the interface is now scalable (I take submatrices of MatShell, so this is good).  Speak up if you run into this issue; it can be made scalable, but it's not a high priority at the moment.  (A before-and-after sketch of the calling sequence follows at the end of this message.)

It looks like PetscMatrix::_get_submatrix() will work correctly in parallel if you drop the csize argument (PETSC_DECIDE in your current source).  The arguments become the rows and columns desired in the *local* part of the new submatrix.  (Yes, the local part can contain rows and columns that were formerly remote.)

Jed
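For concreteness, a minimal sketch of the calling-sequence change described above, consistent with the patch later in this thread.  The helper name GetVelocityBlock and the use of MAT_INITIAL_MATRIX are illustrative assumptions, not code from Jed's source; the matrix A and the index sets isrow/iscol are assumed to be created by the caller, and in petsc-dev both ISs are parallel and name the local rows and columns of the new submatrix:

  #include <petscmat.h>

  /* Hypothetical helper extracting a submatrix B from A. */
  PetscErrorCode GetVelocityBlock(Mat A, IS isrow, IS iscol, Mat *B)
  {
    PetscErrorCode ierr;
    /* Through PETSc 3.0.0, iscol was a serial, gathered IS and a 'csize'
       hint was required:
       ierr = MatGetSubMatrix(A, isrow, iscol, PETSC_DECIDE,
                              MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);      */
    /* petsc-dev: 'csize' is gone and iscol is a parallel IS: */
    ierr = MatGetSubMatrix(A, isrow, iscol, MAT_INITIAL_MATRIX, B);CHKERRQ(ierr);
    return 0;
  }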
From: John P. <jwp...@gm...> - 2009-05-31 21:49:40
On Sun, May 31, 2009 at 12:52 PM, Jed Brown <je...@59...> wrote:
> I'm using libmesh to provide a reference implementation of Q_2 elements
> for Poisson, and Q_2-Q_1 for Stokes.  If possible, I would also like to
> compare with third-order elements, but HIERARCHIC appears to produce
> insanely ill-conditioned matrices.  For example,
>
> ./ex4-opt -d 3 -n 4 -f HIERARCHIC -o THIRD -ksp_monitor -ksp_converged_reason -pc_type lu -pc_factor_mat_solver_package superlu
>
> does not converge.  All of petsc, umfpack, mumps, and spooles are
> unsuccessful as well.  Apparently there is no third-order Lagrange
> implementation, so perhaps I'm out of luck here.
>
> For the Stokes solver, I modified ex11 for 3D, but no Schur complement
> preconditioners are available (I could plug it into the new
> factorization version of PCFieldSplit, but don't have time at the
> moment).  Since BoomerAMG [1] and ML fail when naively given indefinite
> matrices, the libmesh Stokes implementation isn't really getting a fair
> shake (I probably just won't include it).  I know some people on this
> list solve (Navier-)Stokes, so I'm curious if anyone has an
> algorithmically scalable implementation they'd like to compare.

Did you modify the boundary conditions from ex11 to account for a
non-interpolatory basis?  Looks like we aren't doing the full
L2-projection in ex11... the BCs in ex4 are about what you'd want.

Also, did you pin a value of the pressure (see e.g. ex13)?  This should
not be strictly necessary, but sometimes it helps in practice to do it.
(A sketch of the usual penalty trick follows below.)

> Heads up on a change in petsc-dev: the arguments to MatGetSubMatrix used
> to reflect implementation details for MPIAIJ/MPIBAIJ and were
> fundamentally unscalable (though fine for the sizes we've seen so far).
> We decided to drop the 'csize' argument and take a parallel index set
> for 'iscol' (instead of the serial gathered one used through 3.0.0).
> The current implementations still do the gather, so there is still a
> scalability issue when taking submatrices with a number of columns
> comparable to single-node memory, but at least the interface is now
> scalable (I take submatrices of MatShell, so this is good).  Speak up if
> you run into this issue; it can be made scalable, but it's not a high
> priority at the moment.

Thanks for the info!  If something is in petsc-dev, does that mean it
will go into a patch-level change, or will it take effect only when
3.0.1 is released?

--
John
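A minimal sketch of the pressure-pinning John suggests, in the spirit of ex13's penalty boundary conditions.  The element matrix/vector names Ke and Fe, the local dof index p_dof, and the penalty value are all assumptions for illustration, not ex13's actual code:

  // Hypothetical element-assembly fragment: on the element that owns the
  // pinned node, add a large penalty so that p = p_value at one pressure dof.
  const Real penalty = 1.e10;
  const Real p_value = 0.;                  // pin the pressure to zero
  Ke(p_dof, p_dof) += penalty;              // diagonal penalty term
  Fe(p_dof)        += penalty * p_value;    // matching rhs contribution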
From: Jed B. <je...@59...> - 2009-05-31 22:23:37
John Peterson wrote:
> On Sun, May 31, 2009 at 12:52 PM, Jed Brown <je...@59...> wrote:
>> I'm using libmesh to provide a reference implementation of Q_2 elements
>> for Poisson, and Q_2-Q_1 for Stokes.  If possible, I would also like to
>> compare with third-order elements, but HIERARCHIC appears to produce
>> insanely ill-conditioned matrices.  For example,
>>
>> ./ex4-opt -d 3 -n 4 -f HIERARCHIC -o THIRD -ksp_monitor -ksp_converged_reason -pc_type lu -pc_factor_mat_solver_package superlu
>>
>> does not converge.  All of petsc, umfpack, mumps, and spooles are
>> unsuccessful as well.  Apparently there is no third-order Lagrange
>> implementation, so perhaps I'm out of luck here.
>>
>> For the Stokes solver, I modified ex11 for 3D, but no Schur complement
>> preconditioners are available (I could plug it into the new
>> factorization version of PCFieldSplit, but don't have time at the
>> moment).  Since BoomerAMG [1] and ML fail when naively given indefinite
>> matrices, the libmesh Stokes implementation isn't really getting a fair
>> shake (I probably just won't include it).  I know some people on this
>> list solve (Navier-)Stokes, so I'm curious if anyone has an
>> algorithmically scalable implementation they'd like to compare.
>
> Did you modify the boundary conditions from ex11 to account for a
> non-interpolatory basis?  Looks like we aren't doing the full
> L2-projection in ex11... the BCs in ex4 are about what you'd want.

The issue with HIERARCHIC elements occurs for ex4; I never bothered
trying for Stokes.  Skipping the L^2 projection just means solving a
slightly different problem (and it won't affect the existence of a
solution for the lid-driven cavity).  In other words, I didn't change
this, but I would be surprised if it made a difference.

> Also, did you pin a value of the pressure (see e.g. ex13)?  This
> should not be strictly necessary, but sometimes it helps in practice
> to do it.

It shouldn't make any difference as long as you remove the null space
and you're not trying to use a direct solver.  The issue with applying
normal preconditioners to indefinite problems is pretty well known.

Interestingly, libmesh's ordering is remarkably good for ILU.  For
example, run

  ./ex11-opt -ksp_monitor_true_residual -ksp_rtol 1e-12 -pc_type ilu -pc_factor_mat_ordering_type rcm

This performs ILU(0) with an ordering computed by reverse Cuthill-McKee.
It takes care of the penalty and then stagnates, but with the natural
ordering it converges completely.  If you play with various orderings,
you'll find that it's really easy to make ILU fail.

General support for "block factorization" preconditioners is coming soon
in PCFieldSplit.  For incompressible flow, you would provide an IS for
all the velocity degrees of freedom, and we'll be able to offer a full
suite of preconditioners based on block factorization with various
choices for the Schur complement (e.g. variants of BFBt).  (A rough
sketch of the setup follows at the end of this message.)

>> Heads up on a change in petsc-dev: the arguments to MatGetSubMatrix used
>> to reflect implementation details for MPIAIJ/MPIBAIJ and were
>> fundamentally unscalable (though fine for the sizes we've seen so far).
>> We decided to drop the 'csize' argument and take a parallel index set
>> for 'iscol' (instead of the serial gathered one used through 3.0.0).
>> The current implementations still do the gather, so there is still a
>> scalability issue when taking submatrices with a number of columns
>> comparable to single-node memory, but at least the interface is now
>> scalable (I take submatrices of MatShell, so this is good).  Speak up if
>> you run into this issue; it can be made scalable, but it's not a high
>> priority at the moment.
>
> Thanks for the info!  If something is in petsc-dev, does that mean it
> will go into a patch-level change, or will it take effect only when
> 3.0.1 is released?

Just 3.0.1 (I don't know when that will be).  Since I build against
petsc-dev, I made this one-line change locally.

Jed
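A rough sketch of providing that velocity IS to PCFieldSplit.  The named-split form of PCFieldSplitSetIS shown here is from later PETSc releases; the interface was still settling in petsc-dev at the time, so treat the exact signature, the helper name, and the split names "u"/"p" as assumptions:

  #include <petscksp.h>

  /* Hypothetical helper: attach velocity/pressure splits to a KSP's PC.
     is_u and is_p are assumed to be parallel ISs over the velocity and
     pressure dofs of the global system. */
  PetscErrorCode SetupStokesSplits(KSP ksp, IS is_u, IS is_p)
  {
    PC             pc;
    PetscErrorCode ierr;
    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
    ierr = PCFieldSplitSetIS(pc, "u", is_u);CHKERRQ(ierr);  /* velocity dofs */
    ierr = PCFieldSplitSetIS(pc, "p", is_p);CHKERRQ(ierr);  /* pressure dofs */
    /* Then pick a block factorization at run time, e.g.
       -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type lower */
    return 0;
  }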
From: Jed B. <je...@59...> - 2009-11-26 17:58:12
---
 src/numerics/petsc_matrix.C |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/src/numerics/petsc_matrix.C b/src/numerics/petsc_matrix.C
index 5ce8d30..27bf370 100644
--- a/src/numerics/petsc_matrix.C
+++ b/src/numerics/petsc_matrix.C
@@ -390,12 +390,20 @@ void PetscMatrix<T>::_get_submatrix(SparseMatrix<T>& submatrix,
                          &iscol); CHKERRABORT(libMesh::COMM_WORLD,ierr);
 
   // Extract submatrix
+#if !PETSC_VERSION_LESS_THAN(3,0,1) || !PETSC_VERSION_RELEASE
+  ierr = MatGetSubMatrix(_mat,
+                         isrow,
+                         iscol,
+                         (reuse_submatrix ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX),
+                         &(petsc_submatrix->_mat)); CHKERRABORT(libMesh::COMM_WORLD,ierr);
+#else
   ierr = MatGetSubMatrix(_mat,
                          isrow,
                          iscol,
                          PETSC_DECIDE,
                          (reuse_submatrix ? MAT_REUSE_MATRIX : MAT_INITIAL_MATRIX),
                          &(petsc_submatrix->_mat)); CHKERRABORT(libMesh::COMM_WORLD,ierr);
+#endif
 
   // Specify that the new submatrix is initialized and close it.
   petsc_submatrix->_is_initialized = true;
-- 
1.6.5.3
From: Roy S. <roy...@ic...> - 2009-11-26 19:51:01
On Thu, 26 Nov 2009, Jed Brown wrote:

>  src/numerics/petsc_matrix.C |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)

Thanks!  It's in svn now.

Say, as long as I'm writing you, a PETSc question just came up
yesterday: in our method PetscVector<T>::operator= (const
PetscVector<T>& v) we don't currently set this->_is_closed, which needs
to be fixed.  I presume the fix is "this->_is_closed = v._is_closed;",
but I'm not certain enough about PETSc APIs.  Is it safe to assume that
if we do a VecCopy on an assembled vector, the result is assembled, and
if we do a VecCopy on an unassembled vector, the result still needs a
VecAssemblyBegin/VecAssemblyEnd?

Thanks,
---
Roy
From: Jed B. <je...@59...> - 2009-11-26 21:01:45
On Thu, 26 Nov 2009 13:50:23 -0600 (CST), Roy Stogner <roy...@ic...> wrote:

> Is it safe to assume that if we do a VecCopy on an assembled vector,
> the result is assembled

Yes.

> and if we do a VecCopy on an unassembled vector

Don't do this.

VecSetValues works as follows:

* values that are owned by this process get inserted immediately

* values that are not owned get put in the "stash"; the stash is a very
  private thing and *not* copied by VecCopy

If you can guarantee that you have not set any remote values, then you
aren't strictly [1] required to call VecAssemblyBegin/VecAssemblyEnd [2],
but you will get wrong results if you have set remote values and don't
assemble before using the vector.

PETSc almost never uses VecSetValues internally; it is just offered as a
convenient way to set remote values if the application really needs it
(a lot of examples use it, but it's mostly gratuitous).  If you are
using VecSetValues in a performance-sensitive place, it would be better
to VecGetArray() and index it directly.  Get a "local vector" (including
unowned "ghost" values) if you need to set remote values (as in FEM
function evaluation) and scatter the result.  (A sketch of both
approaches follows below.)

Jed

[1] This could change in a subsequent release; the API says you have to
assemble.  It is unlikely to change for performance reasons, but it
could change to help with debugging.  Actually, I'm adding that check to
petsc-dev now: you will get an error if you try to copy an unassembled
vector.  (Vectors are assembled by default; you can only get an
unassembled Vec by calling VecSetValues/VecSetValuesBlocked.)

[2] That call would be practically free anyway.
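A minimal sketch contrasting the two code paths Jed describes.  The helper name and the fill pattern are illustrative; the Vec x is assumed to be a parallel vector created elsewhere:

  #include <petscvec.h>

  /* Hypothetical helper showing direct array access vs. VecSetValues. */
  PetscErrorCode FillVector(Vec x)
  {
    PetscInt       rstart, rend, i, row;
    PetscScalar   *a, v;
    PetscErrorCode ierr;

    /* Direct array access: touches only locally owned entries, never the
       stash, so no assembly is needed afterwards. */
    ierr = VecGetOwnershipRange(x, &rstart, &rend);CHKERRQ(ierr);
    ierr = VecGetArray(x, &a);CHKERRQ(ierr);
    for (i = rstart; i < rend; i++) a[i-rstart] = (PetscScalar)i;
    ierr = VecRestoreArray(x, &a);CHKERRQ(ierr);

    /* VecSetValues: row 0 may be owned by another process; the value then
       sits in the stash until the assembly pair moves it to its owner. */
    row  = 0;
    v    = 1.0;
    ierr = VecSetValues(x, 1, &row, &v, INSERT_VALUES);CHKERRQ(ierr);
    ierr = VecAssemblyBegin(x);CHKERRQ(ierr);
    ierr = VecAssemblyEnd(x);CHKERRQ(ierr);
    return 0;
  }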
From: Roy S. <roy...@ic...> - 2009-11-27 03:32:49
On Thu, 26 Nov 2009, Jed Brown wrote:

> On Thu, 26 Nov 2009 13:50:23 -0600 (CST), Roy Stogner <roy...@ic...> wrote:
>
>> Is it safe to assume that if we do a VecCopy on an assembled vector,
>> the result is assembled
>
> Yes.
>
>> and if we do a VecCopy on an unassembled vector
>
> Don't do this.
>
> VecSetValues works as follows:
>
> * values that are owned by this process get inserted immediately
>
> * values that are not owned get put in the "stash"; the stash is a very
>   private thing and *not* copied by VecCopy

Ah!  Thanks.  So I'll change operator= to assert that we're assembled
already, as well as to reflect that in the target vector (a sketch
follows below).

> PETSc almost never uses VecSetValues internally; it is just offered as a
> convenient way to set remote values if the application really needs it
> (a lot of examples use it, but it's mostly gratuitous).  If you are
> using VecSetValues in a performance-sensitive place, it would be better
> to VecGetArray() and index it directly.

Oh, I believe it.  I'm actually a bit surprised that the libMesh use of
VecSetValues doesn't seem to hurt us a lot.  It was never noticeable in
the oprofile results on my own codes, but that may just be because it
was getting dwarfed by an expensive transcendental Jacobian and a more
expensive nonlinear solve.  We'll see if it comes up in any of the new
optimization work on other apps.
---
Roy
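A sketch of the fix Roy describes, not the actual libMesh commit; the member name _vec, the closed() accessor, and the libmesh_assert macro are guesses at libMesh internals:

  template <typename T>
  PetscVector<T>&
  PetscVector<T>::operator= (const PetscVector<T>& v)
  {
    // Per Jed's advice: refuse to copy an unassembled vector, since the
    // source's stash would be silently lost by VecCopy.
    libmesh_assert (v.closed());

    int ierr = VecCopy (v._vec, _vec);
    CHKERRABORT(libMesh::COMM_WORLD, ierr);

    // VecCopy of an assembled vector yields an assembled vector.
    this->_is_closed = true;

    return *this;
  }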