From: lorenzo a. b. <lor...@gm...> - 2013-05-31 14:11:01
Thanks for the info. I guess that the savings I was reporting were also due to the DofObject refactoring. Do you store one DofObject per VariableGroup? I guess that in my usage pattern one DofObject per element would be the most efficient strategy. Would it be hard to do such a thing? I think that to use add_block_matrix the local matrices should have contiguous VariableGroup dofs, right? The problem is that my local matrices currently have contiguous dofs for each variable. Also, how do I get the blocked_dof_indices?

I'm going to try with -log_summary to understand whether the PETSc matrix has the right structure and why a block size of 12 instead of 3 doesn't lead to memory savings.

Thanks for the help.
Lorenzo

On Fri, May 31, 2013 at 3:15 PM, Kirk, Benjamin (JSC-EG311) <ben...@na...> wrote:
>
> On May 31, 2013, at 4:04 AM, Lorenzo Alessio Botti <lor...@gm...> wrote:
>
> > I don't know if it is the "correct" way to do it but the memory footprint has not changed.
> > I'm also interested in knowing if the memory savings take place on the libMesh side, on the PETSc side, or on both.
>
> That's what I would have done, for what it's worth.
>
> The memory savings with this most recent patch is purely on the PETSc side. Specifically:
>
> - Back in December, when we introduced "VariableGroups", there was an appreciable memory savings in the DofObject from reducing the number of variable indices we store. The forward plan is to also optimize the DofMap for the case of VariableGroups, where sparsity pattern generation could be more efficient. However, this is not done yet; we still generate the sparsity pattern and constraints at the per-DOF level.
>
> - This most recent change boils down to just this, in petsc_matrix.C:
>
> #ifdef LIBMESH_ENABLE_BLOCKED_STORAGE
>       if (blocksize > 1)
>         {
>           // specified blocksize, bs > 1.
>           // double-check sizes.
>           libmesh_assert_equal_to (m_local  % blocksize, 0);
>           libmesh_assert_equal_to (n_local  % blocksize, 0);
>           libmesh_assert_equal_to (m_global % blocksize, 0);
>           libmesh_assert_equal_to (n_global % blocksize, 0);
>
>           ierr = MatSetType (_mat, MATBAIJ); // automatically chooses seqbaij or mpibaij
>           LIBMESH_CHKERRABORT(ierr);
>           ierr = MatSetBlockSize (_mat, blocksize);
>           LIBMESH_CHKERRABORT(ierr);
>
>           // transform the per-entry n_nz and n_oz arrays into their block counterparts.
>           std::vector<numeric_index_type> b_n_nz, b_n_oz;
>
>           transform_preallocation_arrays (blocksize,
>                                           n_nz, n_oz,
>                                           b_n_nz, b_n_oz);
>
>           ierr = MatSeqBAIJSetPreallocation (_mat, blocksize, 0,
>                                              (PetscInt*)(b_n_nz.empty() ? NULL : &b_n_nz[0]));
>           LIBMESH_CHKERRABORT(ierr);
>
>           ierr = MatMPIBAIJSetPreallocation (_mat, blocksize,
>                                              0,
>                                              (PetscInt*)(b_n_nz.empty() ? NULL : &b_n_nz[0]),
>                                              0,
>                                              (PetscInt*)(b_n_oz.empty() ? NULL : &b_n_oz[0]));
>           LIBMESH_CHKERRABORT(ierr);
>         }
>       else
> #endif
>
> The function transform_preallocation_arrays() is defined in that file and simply takes the usual per-dof n_nz, n_oz libMesh-computed arrays and transforms them into their per-block counterparts.
>
> So you see, the most recent memory savings is almost entirely on the PETSc side. Try running with the -log_summary command line option too…
>
> For a sparse matrix, PETSc must store its graph as well. Without blocked DOFs, the size of the graph is the same as the number of nonzeros in the matrix. With blocked DOFs, each graph entry represents an NB*NB dense matrix. So there is a memory savings there for sure, but at some point you'll still be dominated by storing the coefficients. Maybe that's already happening in your case?
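To make the block transformation concrete, here is a minimal sketch of what a routine like transform_preallocation_arrays() could do, assuming every scalar row within a block has the same nonzero count and that the counts divide evenly by the block size (illustrative only, not the actual petsc_matrix.C source):

#include <vector>
#include <cassert>

typedef unsigned int numeric_index_type; // stand-in for the libMesh typedef

// Collapse per-dof preallocation counts into per-block counts. Block row b
// covers scalar rows [b*blocksize, (b+1)*blocksize), and every scalar row in
// a block is assumed to couple to the same set of blocks, so one
// representative row per block suffices.
void transform_preallocation_arrays_sketch
  (const numeric_index_type blocksize,
   const std::vector<numeric_index_type> &n_nz,  // per-dof on-processor nonzeros
   const std::vector<numeric_index_type> &n_oz,  // per-dof off-processor nonzeros
   std::vector<numeric_index_type> &b_n_nz,      // per-block-row on-processor nonzero blocks
   std::vector<numeric_index_type> &b_n_oz)      // per-block-row off-processor nonzero blocks
{
  assert (n_nz.size() == n_oz.size());
  assert (n_nz.size() % blocksize == 0);

  const numeric_index_type n_blocks = n_nz.size() / blocksize;
  b_n_nz.resize (n_blocks);
  b_n_oz.resize (n_blocks);

  for (numeric_index_type b = 0; b != n_blocks; ++b)
    {
      // Nonzeros in one representative scalar row, divided by the block
      // width, give the number of nonzero blocks in that block row.
      b_n_nz[b] = n_nz[b*blocksize] / blocksize;
      b_n_oz[b] = n_oz[b*blocksize] / blocksize;
    }
}

The graph arithmetic then follows directly: scalar (AIJ) storage keeps one column index per nonzero, i.e. nnz integers, while blocked (BAIJ) storage with block size NB keeps only nnz/NB^2 block column indices; the nnz coefficient values themselves are stored either way. A bigger block therefore shrinks the graph but never the coefficients, which is why the coefficients eventually dominate.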
> The other thing this allows is more efficient element matrix insertion. Rather than traversing the graph at the scalar value level, it can be done at the block index level. This is accessed via libMesh using
>
>   jacobian->add_block_matrix (dRdUe, blocked_dof_indices);
>
> I haven't advertised this much yet…
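One more note on the layout question above: if, as the question assumes, a blocked insert expects the element matrix ordered block-by-block (all variables of the first block together, then the next block, and so on), then a variable-major element matrix, with all dofs of variable 0 first, needs a permutation before insertion. A self-contained sketch of that reordering, with illustrative names only (this is not a libMesh routine):

#include <cstddef>
#include <vector>

// Reorder a square, row-major element matrix from variable-major ordering
// (all dofs of var 0, then var 1, ...) to block-major ordering
// (vars 0..n_vars-1 for node 0, then node 1, ...), the layout a blocked
// insertion wants. Hypothetical helper for illustration.
std::vector<double> to_block_major (const std::vector<double> &Ke,
                                    std::size_t n_vars,
                                    std::size_t n_nodes)
{
  const std::size_t n = n_vars * n_nodes;
  std::vector<double> Kb (n * n);

  // variable-major index of (var v, node a): v*n_nodes + a
  // block-major    index of (var v, node a): a*n_vars  + v
  for (std::size_t vi = 0; vi != n_vars; ++vi)
    for (std::size_t ai = 0; ai != n_nodes; ++ai)
      for (std::size_t vj = 0; vj != n_vars; ++vj)
        for (std::size_t aj = 0; aj != n_nodes; ++aj)
          Kb[(ai*n_vars + vi)*n + (aj*n_vars + vj)] =
            Ke[(vi*n_nodes + ai)*n + (vj*n_nodes + aj)];

  return Kb;
}

With the element matrix in block-major order, blocked_dof_indices would then hold one index per block rather than one per scalar dof, so the insertion traverses the graph at block granularity as described above.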