From: David X. <dx...@my...> - 2006-07-22 23:30:50
|
Hi All,

I was trying to assemble the stiffness and mass matrices on a dense mesh with 531441 nodes (40x40x40, HEX27, 3rd, HERMITE), and I ran into an "Out of memory" problem at the line:

equation_systems.init();

Here's the error message:

[0]PETSC ERROR: PetscMallocAlign() line 62 in src/sys/memory/mal.c
[0]PETSC ERROR: Out of memory. This could be due to allocating
[0]PETSC ERROR: too large an object or bleeding by not properly
[0]PETSC ERROR: destroying unneeded objects.
[0]PETSC ERROR: Memory allocated 1380430780 Memory used by process 886095872
[0]PETSC ERROR: Try running with -malloc_dump or -malloc_log for info.
[0]PETSC ERROR: Memory requested 907039528!
[0]PETSC ERROR: PetscTrMallocDefault() line 191 in src/sys/memory/mtr.c
[0]PETSC ERROR: MatSeqAIJSetPreallocation_SeqAIJ() line 2735 in src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: MatCreateSeqAIJ() line 2621 in src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: User provided function() line 137 in unknowndirectory/src/numerics/petsc_matrix.C
[unset]: aborting job:
application called MPI_Abort(comm=0x84000000, 1) - process 0

My question is: is it possible to assemble the matrices without having to initialize the equation system? My goal is just to output the assembled system matrices to files; I don't have to solve them inside libMesh.

Thanks!
David |
From: Roy S. <roy...@ic...> - 2006-07-22 23:53:53
|
On Sat, 22 Jul 2006, David Xu wrote:

> I was trying to assemble the stiffness and mass matrices on a dense mesh
> with 531441 nodes (40x40x40, HEX27, 3rd, HERMITE)

Can I suggest trying HEX8? The HERMITE elements are unique among our
higher order elements in that all their degrees of freedom are
topologically associated with mesh vertices, so unless you need
quadratic mapping functions you don't need HEX27 elements. Those nodes
aren't responsible for most of your memory use (the system matrix is),
but every megabyte helps.

> and I ran into an "Out of memory" problem at the line:
>
> equation_systems.init();
>
> [PETSc error log snipped]
>
> My question is: is it possible to assemble the matrices without having
> to initialize the equation system?

I'm afraid not. You could initialize a finite element object and
evaluate the element matrices, but putting them into the system matrix
requires that matrix and the degree of freedom structures be
initialized, and those two things are probably what's sucking up all
your memory.

> My goal is just to output the assembled system matrices to files and
> I don't have to solve them inside libMesh.

The system matrix should have 551368 degrees of freedom, most of which
couple to 216 others. With 8-byte coefficients that's a hundred megs of
RAM, and with sparsity pattern overhead it's probably two hundred
megs... but nine hundred MB seems excessive. Are you solving for more
than one scalar, using a system like a generalized EigenSystem that
builds more than one matrix, using complex-valued variables, or
anything else that might bump up the RAM requirements?

I'd appreciate it if you've got debugging tools that can give you a
memory breakdown by object type and could send us such output. It
sounds like either we or PETSc might need to do a little more
optimization.

To work around your immediate problem, however: can you output the
element matrices instead of the system matrix, and assemble them
outside of libMesh? It sounds like you're using a structured mesh,
which can require much less overhead than the unstructured mesh class
in libMesh.

---
Roy |
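The 551368 figure above can be checked with a few lines of arithmetic (a sketch assuming 8 Hermite degrees of freedom per vertex, i.e. value, three first derivatives, three mixed second derivatives, and one mixed third derivative; the per-vertex count is an assumption, not stated in the thread):

```python
# DoF count for cubic HERMITE variables on a 40x40x40 hex mesh.
# Assumption: each of the 41^3 vertices carries 8 DoFs (value, 3 first
# derivatives, 3 mixed second derivatives, 1 mixed third derivative).
nx = ny = nz = 40
vertices = (nx + 1) * (ny + 1) * (nz + 1)   # 68921 vertices on a 41^3 grid
dofs = vertices * 8
print(dofs)                                 # 551368
```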
From: David X. <dx...@my...> - 2006-07-23 00:15:52
|
On 7/22/06, Roy Stogner <roy...@ic...> wrote:
>
> On Sat, 22 Jul 2006, David Xu wrote:
>
> > I was trying to assemble the stiffness and mass matrices on a dense mesh
> > with 531441 nodes (40x40x40, HEX27, 3rd, HERMITE)
>
> Can I suggest trying HEX8? The HERMITE elements are unique among our
> higher order elements in that all their degrees of freedom are
> topologically associated with mesh vertices, so unless you need
> quadratic mapping functions you don't need HEX27 elements. Those
> nodes aren't responsible for most of your memory use (the system matrix
> is), but every megabyte helps.

Are you saying HEX27 and HEX8 will give the same level of accuracy in the solutions if I don't need quadratic mapping functions?

> > My question is: is it possible to assemble the matrices without having
> > to initialize the equation system?
>
> I'm afraid not. You could initialize a finite element object and
> evaluate the element matrices, but putting them into the system matrix
> requires that matrix and the degree of freedom structures be
> initialized, and those two things are probably what's sucking up all
> your memory.

I see.

> > My goal is just to output the assembled system matrices to files and
> > I don't have to solve them inside libMesh.
>
> The system matrix should have 551368 degrees of freedom, most of which
> couple to 216 others. With 8-byte coefficients that's a hundred megs
> of RAM, and with sparsity pattern overhead it's probably two hundred
> megs... but nine hundred MB seems excessive. Are you solving for more
> than one scalar, using a system like a generalized EigenSystem that
> builds more than one matrix, using complex-valued variables, or
> anything else that might bump up the RAM requirements?

It's for solving 2 system matrices in a real-valued generalized eigenvalue problem.

I went back and tried (30x30x30, HEX27, 3rd order HERMITE). This time it didn't blow out the memory, but it did take significantly longer to assemble the matrices than (30x30x30, HEX27, 2nd order LAGRANGE). Maybe HERMITE is the problem?

> I'd appreciate it if you've got debugging tools that can give you a
> memory breakdown by object type and could send us such output.
> It sounds like either we or PETSc might need to do a little more
> optimization.

I don't have any debugging tools, and I have to admit that I'm a below-average C++ user.

> To work around your immediate problem, however: can you output the
> element matrices instead of the system matrix, and assemble them
> outside of libMesh? It sounds like you're using a structured mesh,
> which can require much less overhead than the unstructured mesh class
> in libMesh.

How do I assemble the element matrices outside of libMesh? Do you know any existing code/program that can do that? So that would be: output each element matrix to a file, and a program should be able to read in all the element matrices from the file and assemble them into system matrices. I'm definitely interested if this is doable.

Thanks,
David |
From: Roy S. <roy...@ic...> - 2006-07-23 01:22:43
|
On Sat, 22 Jul 2006, David Xu wrote:

> Are you saying HEX27 and HEX8 will give the same level of accuracy in the
> solutions if I don't need quadratic mapping functions?

Yes.

> It's for solving 2 system matrices in a real-valued generalized
> eigenvalue problem.

Okay, then the simplest workaround is clear: only build one matrix at a
time. As long as you're just writing them both out to files anyway,
there's no reason you need them both in RAM simultaneously.

> I went back and tried (30x30x30, HEX27, 3rd order HERMITE). This
> time it didn't blow out the memory, but it did take significantly
> longer to assemble the matrices than (30x30x30, HEX27, 2nd order
> LAGRANGE). Maybe HERMITE is the problem?

Almost certainly it is. None of our finite element classes are as
optimized as they should be, but I think the Hermite elements may be
worse than average.

Keep in mind, too, that even if they were equally optimized, the
Hermite assembly would be more expensive. If you're using the default
quadrature order (which is designed for nonlinear problems and may be
gross overkill for you) then I think quadratic hexes will be
calculating at 27 points and cubic hexes will be calculating at 64.

> How to assemble the element matrices outside of libmesh?

Basically you'd almost do what libMesh does: create a big empty matrix
of the appropriate sparsity pattern, then loop through all the element
matrices and add their entries after looking up the global index for
each local degree of freedom.

The only difference is that because you know you've got a uniform
grid, you could do that local->global lookup with a few equations
instead of the big data structures that general unstructured grids
require.

> Do you know any existing code/program that can do that? So that
> would be: output each element matrix to a file, and a program should
> be able to read in all the element matrices from the file and
> assemble them into system matrices. I'm definitely interested if
> this is doable.

It's definitely doable, but I don't know of any existing code to do
it. I could script it up in Matlab pretty easily, but the Matlab
sparse matrix format sucks and so I wouldn't want to work with the
result.

---
Roy |
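Roy's point about replacing the DoF-map data structures with a few equations can be sketched like this (a hypothetical lexicographic vertex numbering with consecutive per-vertex DoFs; libMesh's internal ordering may differ):

```python
# Local->global DoF lookup on a uniform nx x ny x nz hex grid.
# Assumption: vertices are numbered lexicographically and each vertex's
# ndof DoFs are stored consecutively -- a hypothetical convention.

def global_dof(i, j, k, component, ny, nz, ndof=8):
    """Global index of DoF `component` at vertex (i, j, k)."""
    vertex = (i * (ny + 1) + j) * (nz + 1) + k
    return vertex * ndof + component

def element_dofs(ei, ej, ek, ny, nz, ndof=8):
    """All global DoF indices of hex element (ei, ej, ek), whose eight
    vertices are the (ei+di, ej+dj, ek+dk) corners."""
    return [global_dof(ei + di, ej + dj, ek + dk, c, ny, nz, ndof)
            for di in (0, 1) for dj in (0, 1) for dk in (0, 1)
            for c in range(ndof)]
```

With 8 Hermite DoFs per vertex, each hex element then has 64 local DoFs, consistent with the quadrature discussion above.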
From: David X. <dx...@my...> - 2006-07-23 02:31:40
|
On 7/22/06, Roy Stogner <roy...@ic...> wrote:
>
> On Sat, 22 Jul 2006, David Xu wrote:
>
> > Are you saying HEX27 and HEX8 will give the same level of accuracy in
> > the solutions if I don't need quadratic mapping functions?
>
> Yes.

Just curious: does the same rule apply to other types of element? What about Tet, Tri, Quad, and Prism? So, is the level of solution accuracy independent of the number of nodes within the same type of element? What about the difference between different element types in terms of the effect on the quality of the solutions?

> > It's for solving 2 system matrices in a real-valued generalized
> > eigenvalue problem.
>
> Okay, then the simplest workaround is clear: only build one matrix at
> a time. As long as you're just writing them both out to files
> anyway, there's no reason you need them both in RAM simultaneously.

Yes, that's a great idea.

> > I went back and tried (30x30x30, HEX27, 3rd order HERMITE). This
> > time it didn't blow out the memory, but it did take significantly
> > longer to assemble the matrices than (30x30x30, HEX27, 2nd order
> > LAGRANGE). Maybe HERMITE is the problem?
>
> Almost certainly it is. None of our finite element classes are as
> optimized as they should be, but I think the Hermite elements may be
> worse than average.
>
> Keep in mind, too, that even if they were equally optimized, the
> Hermite assembly would be more expensive. If you're using the default
> quadrature order (which is designed for nonlinear problems and may be
> gross overkill for you) then I think quadratic hexes will be
> calculating at 27 points and cubic hexes will be calculating at 64.

That explains why the output file size from HERMITE is much larger than LAGRANGE. Does that mean that even though the matrix dimension is the same, HERMITE produces more entries in the matrix, so it's less sparse?

> > How to assemble the element matrices outside of libmesh?
>
> Basically you'd almost do what libMesh does: create a big empty matrix
> of the appropriate sparsity pattern, then loop through all the element
> matrices and add their entries after looking up the global index for
> each local degree of freedom.
>
> The only difference is that because you know you've got a uniform
> grid, you could do that local->global lookup with a few equations
> instead of the big data structures that general unstructured grids
> require.
>
> > Do you know any existing code/program that can do that?
>
> It's definitely doable, but I don't know of any existing code to do
> it. I could script it up in Matlab pretty easily, but the Matlab
> sparse matrix format sucks and so I wouldn't want to work with the
> result.

I might try it using python/numpy/scipy. Thanks for the great tips!

David |
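For the python/numpy/scipy route mentioned above, a minimal assembly sketch might look like the following. The per-element record format (global DoF indices plus a dense local matrix) is an assumption about what one would dump from libMesh, not an existing file format:

```python
# Assemble a global sparse matrix from element matrices, as sketched in
# the thread.  Assumes each element record supplies its global DoF
# indices and its dense local matrix (a hypothetical dump format).
import numpy as np
import scipy.sparse as sp

def assemble(element_records, n_dofs):
    """element_records: iterable of (dofs, Ke) pairs, where `dofs` is a
    length-m integer array and Ke is the m x m dense element matrix."""
    rows, cols, vals = [], [], []
    for dofs, Ke in element_records:
        m = len(dofs)
        # Scatter the local matrix into global (row, col, value) triplets.
        rows.extend(np.repeat(dofs, m))
        cols.extend(np.tile(dofs, m))
        vals.extend(Ke.ravel())
    # Duplicate (row, col) triplets are summed on conversion from COO
    # format, which is exactly the finite element "add to global" step.
    return sp.coo_matrix((vals, (rows, cols)), shape=(n_dofs, n_dofs)).tocsr()

# Toy usage: two 1D linear elements sharing node 1 of a 3-node mesh.
Ke = np.array([[1.0, -1.0], [-1.0, 1.0]])
K = assemble([(np.array([0, 1]), Ke), (np.array([1, 2]), Ke)], 3)
print(K.toarray())   # middle diagonal entry is 2.0: contributions summed
```

The COO-with-duplicates idiom avoids building an explicit sparsity pattern up front, at the cost of holding all triplets in memory during assembly.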
From: Roy S. <roy...@ic...> - 2006-07-23 03:02:49
|
On Sat, 22 Jul 2006, David Xu wrote:

> Just curious: does the same rule apply to other types of element? What
> about Tet, Tri, Quad, and Prism? So, is the level of solution accuracy
> independent of the number of nodes within the same type of element?

The level of solution accuracy is independent of the number of
geometric nodes... but libMesh reuses geometric nodes to store degrees
of freedom that have the same topological connectivity, so you usually
still need second-order nodes even if you aren't fitting a second-order
geometry. As far as I know, the HERMITE elements and the two
discontinuous elements are the only way to get better than linear
approximations on linear geometric elements. If you try to use finite
elements on geometric elements that don't support them, however, you
won't just get reduced accuracy; your code will exit with an error.

> What about the difference between different element types in terms
> of the effect on the quality of the solutions?

You can get better solutions (better conditioned matrices, at least)
from quadratic elements if you use a mesh smoother that takes advantage
of them. Mostly, though, you only need higher order geometric elements
to better fit curved domain boundaries.

>> Keep in mind, too, that even if they were equally optimized, the
>> Hermite assembly would be more expensive. If you're using the default
>> quadrature order (which is designed for nonlinear problems and may be
>> gross overkill for you) then I think quadratic hexes will be
>> calculating at 27 points and cubic hexes will be calculating at 64.
>
> That explains why the output file size from HERMITE is much larger
> than LAGRANGE.

No, it doesn't. I'm talking about quadrature points here, and the size
of your final matrix is (with few exceptions) independent of the
quadrature rule you use to calculate it. I can see why that's
confusing, though: by coincidence the number of quadrature points is
the same as the number of local DoFs for both elements here.

Of course, it's not just the quadrature rule that's important. Having
64 local DoFs instead of 27 also increases calculation time. Finally,
on uniform meshes Hermite cube DoFs usually couple to 216 DoFs rather
than 27, 45, or 125, which is probably what's increasing your output
file size.

> Does that mean that even though the matrix dimension is the same,
> HERMITE produces more entries in the matrix, so it's less sparse?

Yes. Increasing polynomial order requires more bandwidth, so does
increasing continuity, and going from quadratic Lagrange to cubic
Hermite does both at once.

---
Roy |
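The counts quoted in this exchange all fall out of tensor products in 3D. Here is the arithmetic (my reading of where the numbers come from; in particular, the derivation of 216 from a 3x3x3 vertex patch is an assumption):

```python
# Tensor-product bookkeeping behind the counts in this thread
# (editor's arithmetic; the 216 derivation is an assumption).

# Gauss quadrature in 3D: p points per direction -> p**3 points total.
quadratic_hex_qpoints = 3 ** 3   # 27 points on a quadratic hex
cubic_hex_qpoints = 4 ** 3       # 64 points on a cubic hex

# On a uniform grid, a vertex basis function overlaps the 2x2x2
# surrounding elements, i.e. a 3x3x3 patch of vertices.  With 8 Hermite
# DoFs per vertex, each DoF couples to 27 * 8 = 216 DoFs:
hermite_bandwidth = 3 ** 3 * 8

# Compare quadratic Lagrange (5**3 = 125 nodes in the same patch) and
# linear Lagrange (3**3 = 27 vertices).
print(quadratic_hex_qpoints, cubic_hex_qpoints, hermite_bandwidth)
```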