From: lorenzo a. b. <lor...@gm...> - 2013-05-31 14:11:01
Thanks for the info. I guess that the savings I was reporting were also due to the DofObject refactoring. Do you store one DofObject per VariableGroup? I guess that in my usage pattern one DofObject per element would be the most efficient strategy. Would it be hard to do such a thing? I think that to use add_block_matrix the local matrices should have contiguous VariableGroup dofs, right? The problem is that my local matrices currently have contiguous dofs for each variable. Also, how do I get the blocked_dof_indices?

I'm going to try with -log_summary to understand whether the PETSc matrix has the right structure and why a block size of 12 instead of 3 doesn't lead to memory savings.

Thanks for the help.
Lorenzo

On Fri, May 31, 2013 at 3:15 PM, Kirk, Benjamin (JSC-EG311) <ben...@na...> wrote:
>
> On May 31, 2013, at 4:04 AM, Lorenzo Alessio Botti <lor...@gm...> wrote:
>
> > I don't know if it is the "correct" way to do it but the memory footprint has not changed.
> > I'm also interested in knowing if the memory savings take place on the libMesh side, on the PETSc side, or on both.
>
> That's what I would have done, for what it's worth.
>
> The memory savings with this most recent patch is purely on the PETSc side. Specifically:
>
> - Back in December, when we introduced "VariableGroups", there was an appreciable memory savings in the DofObject from reducing the number of variable indices we store. The forward plan is to also optimize the DofMap for the case of VariableGroups, where sparsity pattern generation could be more efficient. However, this is not done yet; we still generate the sparsity pattern and constraints at the per-DOF level.
>
> - This most recent change boils down to just this, in petsc_matrix.C:
>
> #ifdef LIBMESH_ENABLE_BLOCKED_STORAGE
>       if (blocksize > 1)
>         {
>           // specified blocksize, bs > 1.
>           // double-check sizes.
>           libmesh_assert_equal_to (m_local  % blocksize, 0);
>           libmesh_assert_equal_to (n_local  % blocksize, 0);
>           libmesh_assert_equal_to (m_global % blocksize, 0);
>           libmesh_assert_equal_to (n_global % blocksize, 0);
>
>           ierr = MatSetType (_mat, MATBAIJ); // automatically chooses seqbaij or mpibaij
>           LIBMESH_CHKERRABORT(ierr);
>           ierr = MatSetBlockSize (_mat, blocksize);
>           LIBMESH_CHKERRABORT(ierr);
>
>           // transform the per-entry n_nz and n_oz arrays into their block counterparts.
>           std::vector<numeric_index_type> b_n_nz, b_n_oz;
>
>           transform_preallocation_arrays (blocksize,
>                                           n_nz, n_oz,
>                                           b_n_nz, b_n_oz);
>
>           ierr = MatSeqBAIJSetPreallocation (_mat, blocksize, 0,
>                                              (PetscInt*)(b_n_nz.empty() ? NULL : &b_n_nz[0]));
>           LIBMESH_CHKERRABORT(ierr);
>
>           ierr = MatMPIBAIJSetPreallocation (_mat, blocksize,
>                                              0,
>                                              (PetscInt*)(b_n_nz.empty() ? NULL : &b_n_nz[0]),
>                                              0,
>                                              (PetscInt*)(b_n_oz.empty() ? NULL : &b_n_oz[0]));
>           LIBMESH_CHKERRABORT(ierr);
>         }
>       else
> #endif
>
> The function transform_preallocation_arrays() is defined in that file and simply takes the usual per-dof n_nz, n_oz libMesh-computed arrays and transforms them into their per-block counterparts.
>
> So you see, the most recent memory savings is almost entirely on the PETSc side. Try running with the -log_summary command line option too…
>
> For a sparse matrix, PETSc must store its graph as well. Without blocked DOFs, the size of the graph is the same as the number of nonzeros in the matrix. With blocked DOFs, each graph entry represents an NB*NB dense matrix. So there is a memory savings there for sure, but at some point you'll still be dominated by storing the coefficients. Maybe that's already happening in your case?
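To make the block transformation concrete, here is a minimal sketch of what a routine like transform_preallocation_arrays() could do, assuming every scalar row within a block has the same nonzero count and that the counts divide evenly by the block size (illustrative only, not the actual petsc_matrix.C source):

#include <vector>
#include <cassert>

typedef unsigned int numeric_index_type; // stand-in for the libMesh typedef

// Collapse per-dof preallocation counts into per-block counts. Block row b
// covers scalar rows [b*blocksize, (b+1)*blocksize), and every scalar row in
// a block is assumed to couple to the same set of blocks, so one
// representative row per block suffices.
void transform_preallocation_arrays_sketch
  (const numeric_index_type blocksize,
   const std::vector<numeric_index_type> &n_nz,  // per-dof on-processor nonzeros
   const std::vector<numeric_index_type> &n_oz,  // per-dof off-processor nonzeros
   std::vector<numeric_index_type> &b_n_nz,      // per-block-row on-processor nonzero blocks
   std::vector<numeric_index_type> &b_n_oz)      // per-block-row off-processor nonzero blocks
{
  assert (n_nz.size() == n_oz.size());
  assert (n_nz.size() % blocksize == 0);

  const numeric_index_type n_blocks = n_nz.size() / blocksize;
  b_n_nz.resize (n_blocks);
  b_n_oz.resize (n_blocks);

  for (numeric_index_type b = 0; b != n_blocks; ++b)
    {
      // Nonzeros in one representative scalar row, divided by the block
      // width, give the number of nonzero blocks in that block row.
      b_n_nz[b] = n_nz[b*blocksize] / blocksize;
      b_n_oz[b] = n_oz[b*blocksize] / blocksize;
    }
}

The graph arithmetic then follows directly: scalar (AIJ) storage keeps one column index per nonzero, i.e. nnz integers, while blocked (BAIJ) storage with block size NB keeps only nnz/NB^2 block column indices; the nnz coefficient values themselves are stored either way. A bigger block therefore shrinks the graph but never the coefficients, which is why the coefficients eventually dominate.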
> The other thing this allows is more efficient element matrix insertion. Rather than traversing the graph at the scalar value level, it can be done at the block index level. This is accessed via libMesh using
>
>   jacobian->add_block_matrix (dRdUe, blocked_dof_indices);
>
> I haven't advertised this much yet…
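One more note on the layout question above: if, as the question assumes, a blocked insert expects the element matrix ordered block-by-block (all variables of the first block together, then the next block, and so on), then a variable-major element matrix, with all dofs of variable 0 first, needs a permutation before insertion. A self-contained sketch of that reordering, with illustrative names only (this is not a libMesh routine):

#include <cstddef>
#include <vector>

// Reorder a square, row-major element matrix from variable-major ordering
// (all dofs of var 0, then var 1, ...) to block-major ordering
// (vars 0..n_vars-1 for node 0, then node 1, ...), the layout a blocked
// insertion wants. Hypothetical helper for illustration.
std::vector<double> to_block_major (const std::vector<double> &Ke,
                                    std::size_t n_vars,
                                    std::size_t n_nodes)
{
  const std::size_t n = n_vars * n_nodes;
  std::vector<double> Kb (n * n);

  // variable-major index of (var v, node a): v*n_nodes + a
  // block-major    index of (var v, node a): a*n_vars  + v
  for (std::size_t vi = 0; vi != n_vars; ++vi)
    for (std::size_t ai = 0; ai != n_nodes; ++ai)
      for (std::size_t vj = 0; vj != n_vars; ++vj)
        for (std::size_t aj = 0; aj != n_nodes; ++aj)
          Kb[(ai*n_vars + vi)*n + (aj*n_vars + vj)] =
            Ke[(vi*n_nodes + ai)*n + (vj*n_nodes + aj)];

  return Kb;
}

With the element matrix in block-major order, blocked_dof_indices would then hold one index per block rather than one per scalar dof, so the insertion traverses the graph at block granularity as described above.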