On May 31, 2013, at 4:04 AM, Lorenzo Alessio Botti <lorenzoalessiobotti@...> wrote:
> I don't know if it is the "correct" way to do it but the memory footprint has not changed.
> I'm also interested in knowing if the memory savings take place on the libMesh side or on the PETSc side, or on both sides.
That's what I would have done, for what it's worth.
The memory savings with this most recent patch are purely on the PETSc side. Specifically:
Back in December, when we introduced "VariableGroups," there was an appreciable memory savings in the DofObject from reducing the number of variable indices we store. The forward plan is to also optimize the DofMap for the VariableGroups case, where sparsity pattern generation could be more efficient. However, that is not done yet: we still generate the sparsity pattern and constraints at the per-DOF level.
 This most recent change boils down to just this, in petsc_matrix.C:
#ifdef LIBMESH_ENABLE_BLOCKED_STORAGE
  if (blocksize > 1)
    {
      // specified blocksize, bs > 1.
      // double check sizes.
      libmesh_assert_equal_to (m_local  % blocksize, 0);
      libmesh_assert_equal_to (n_local  % blocksize, 0);
      libmesh_assert_equal_to (m_global % blocksize, 0);
      libmesh_assert_equal_to (n_global % blocksize, 0);

      ierr = MatSetType (_mat, MATBAIJ); // Automatically chooses seqbaij or mpibaij
      LIBMESH_CHKERRABORT(ierr);
      ierr = MatSetBlockSize (_mat, blocksize);
      LIBMESH_CHKERRABORT(ierr);

      // transform the per-entry n_nz and n_oz arrays into their block counterparts.
      std::vector<numeric_index_type> b_n_nz, b_n_oz;
      transform_preallocation_arrays (blocksize,
                                      n_nz, n_oz,
                                      b_n_nz, b_n_oz);

      ierr = MatSeqBAIJSetPreallocation (_mat, blocksize, 0,
                                         (PetscInt*)(b_n_nz.empty() ? NULL : &b_n_nz[0]));
      LIBMESH_CHKERRABORT(ierr);

      ierr = MatMPIBAIJSetPreallocation (_mat, blocksize,
                                         0, (PetscInt*)(b_n_nz.empty() ? NULL : &b_n_nz[0]),
                                         0, (PetscInt*)(b_n_oz.empty() ? NULL : &b_n_oz[0]));
      LIBMESH_CHKERRABORT(ierr);
    }
  else
#endif
The function transform_preallocation_arrays() is defined in that file and simply takes the usual per-DOF n_nz, n_oz arrays computed by libMesh and transforms them into their per-block counterparts.
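In case it helps, this is roughly what that transformation amounts to (a sketch of the idea, not a verbatim copy of the source; it assumes every row within a block has the same coupling, and it uses the libMesh/PETSc types already available in petsc_matrix.C):

void transform_preallocation_arrays (const PetscInt blocksize,
                                     const std::vector<numeric_index_type> &n_nz,
                                     const std::vector<numeric_index_type> &n_oz,
                                     std::vector<numeric_index_type> &b_n_nz,
                                     std::vector<numeric_index_type> &b_n_oz)
{
  // the per-DOF counts must come in whole blocks
  libmesh_assert_equal_to (n_nz.size(), n_oz.size());
  libmesh_assert_equal_to (n_nz.size() % blocksize, 0);

  b_n_nz.clear();  b_n_nz.reserve (n_nz.size()/blocksize);
  b_n_oz.clear();  b_n_oz.reserve (n_oz.size()/blocksize);

  // one entry per block row, counting blocks rather than scalar nonzeros
  for (std::size_t nn = 0; nn < n_nz.size(); nn += blocksize)
    {
      b_n_nz.push_back (n_nz[nn]/blocksize);
      b_n_oz.push_back (n_oz[nn]/blocksize);
    }
}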
So you see, the most recent memory savings are almost entirely on the PETSc side. Try running with the -log_summary command-line option too…
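(Something along the lines of "mpirun -np 4 ./your_app -log_summary", with your_app standing in for your executable; among other things, the summary reports memory per object type, so you can compare the matrix memory between the AIJ and BAIJ runs.)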
For a sparse matrix, PETSc must store its graph as well. Without blocked DOFs, the size of the graph is the same as the number of nonzeros in the matrix. With blocked DOFs, each graph entry represents an NB*NB dense block. So there is a memory savings there for sure, but at some point you'll still be dominated by storing the coefficients. Maybe that's already happening in your case?
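To put rough, made-up numbers on it: with NB = 3 and 10^6 scalar nonzeros, AIJ keeps about 10^6 column indices for the graph, while BAIJ keeps one index per 3x3 block, roughly 1.1e5, about a 9x reduction in graph storage. The 10^6 coefficient values are stored either way, which is why the savings eventually saturate.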
The other thing this allows for is more efficient element matrix insertion: rather than traversing the graph at the scalar-value level, it can be done at the block-index level. This is accessed via libMesh using
jacobian->add_block_matrix (dRdUe, blocked_dof_indices);
I haven't advertised this much yet…
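For anyone who wants to try it, here is a minimal sketch of how that call might sit in an assembly routine (the surrounding setup is hypothetical; dRdUe is the element Jacobian and blocked_dof_indices holds the element's block indices):

// Hypothetical element-assembly fragment.  The key point is that
// blocked_dof_indices holds one index per block (e.g. per node when the
// VariableGroup's variables all live on the same nodes), not per scalar DOF.
DenseMatrix<Number> dRdUe;                            // element Jacobian, (n_blocks*NB) x (n_blocks*NB)
std::vector<numeric_index_type> blocked_dof_indices;  // block indices for this element

// for each local element:
//   resize and fill dRdUe, gather blocked_dof_indices, then insert the
//   whole element matrix block-by-block instead of entry-by-entry:
jacobian->add_block_matrix (dRdUe, blocked_dof_indices);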
