From: Jens L. E. <jle...@gm...> - 2013-06-18 18:40:55
Hi all,

I'm solving a large 3D linear elasticity (steady) problem. My mesh has
8.8 million nodes, so since this is a vector-valued problem there are
around 26 million unknowns.

The code is basically systems_of_equations_ex6 (except for the mesh
generation part, and I also do not compute stresses).

I have configured PETSc with --download-ml, and I am running my program
with

  mpirun -np 20 ./elasticity-opt -ksp_type cg -pc_type gamg
    -pc_gamg_agg_nsmooths 1 -ksp_monitor -ksp_converged_reason -log_summary

that is, CG as the iterative solver and AMG as the preconditioner.

The problem is the huge amount of memory that this solve requires --- I
have 128 GB of memory, and I run out! Due to the large problem size I was
of course expecting significant memory consumption, but not this bad. I
did try ParallelMesh, but that did not change things.

Am I doing something obviously wrong here?

Thanks,
Jens
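A rough back-of-the-envelope estimate for the assembled matrix alone,
assuming trilinear HEX8 elements as in systems_of_equations_ex6 (so each
row couples to roughly 27 nodes x 3 components, i.e. about 81 nonzeros):

  26e6 rows x ~81 nonzeros/row x (8 B value + 4 B column index) ~ 25 GB

in PETSc's AIJ format, before counting the right-hand side and solution
vectors, libMesh's own mesh and DOF data, and whatever the preconditioner
builds on top. A smoothed-aggregation AMG hierarchy can easily add another
matrix-sized chunk of memory or more, which is consistent with 128 GB not
being enough.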
From: John P. <jwp...@gm...> - 2013-06-18 18:46:22
On Tue, Jun 18, 2013 at 12:41 PM, Jens Lohne Eftang <jle...@gm...> wrote:
> The problem is the huge amount of memory that this solve requires --- I
> have 128 GB of memory, and I run out! Due to the large problem size I was
> of course expecting significant memory consumption, but not this bad. I
> did try ParallelMesh, but that did not change things.
>
> Am I doing something obviously wrong here?

Do you still run out of memory if you run without GAMG?

There could be some GAMG options that control memory consumption, I
don't know too much about it.

--
John
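A couple of GAMG knobs worth a try here (a sketch, not a tested recipe for
this problem): plain rather than smoothed aggregation usually builds
sparser coarse-level operators, at the cost of some extra iterations, e.g.

  mpirun -np 20 ./elasticity-opt -ksp_type cg -pc_type gamg
    -pc_gamg_agg_nsmooths 0 -ksp_monitor -ksp_converged_reason -log_summary

and -pc_gamg_threshold (which drops weak connections from the aggregation
graph) and -pc_gamg_coarse_eq_limit (the size of the coarsest problem) also
affect how large the hierarchy gets.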
From: Cody P. <cod...@gm...> - 2013-06-18 18:48:18
On Tue, Jun 18, 2013 at 12:41 PM, Jens Lohne Eftang <jle...@gm...> wrote:
> The problem is the huge amount of memory that this solve requires --- I
> have 128 GB of memory, and I run out! Due to the large problem size I was
> of course expecting significant memory consumption, but not this bad. I
> did try ParallelMesh, but that did not change things.

Is that total memory across all 20 processes, per node, or per process?
128 GB total is not unreasonable for a problem of this size. Typically
you would spread this out over several nodes, though, so it could run.
One way to drastically reduce total memory consumption would be to
implement threading for your Jacobian and residual callbacks.

Cody
From: Jens L. E. <jle...@gm...> - 2013-06-18 19:19:09
On 06/18/2013 02:48 PM, Cody Permann wrote:
> Is that total memory across all 20 processes, per node, or per process?
> 128 GB total is not unreasonable for a problem of this size. Typically
> you would spread this out over several nodes, though, so it could run.
> One way to drastically reduce total memory consumption would be to
> implement threading for your Jacobian and residual callbacks.

Hmm, ok... the memory consumption I reported was total across all 20
processes. I'll look into some solver options and see if I can get it
under my 128 GB threshold.

Thanks!
From: Jens L. E. <jle...@gm...> - 2013-06-18 18:58:47
On 06/18/2013 02:45 PM, John Peterson wrote:
> Do you still run out of memory if you run without GAMG?
>
> There could be some GAMG options that control memory consumption, I
> don't know too much about it.

I am able to solve the problem with -pc_type bjacobi and -sub_pc_type
icc, but that still uses a lot of memory, around 60 GB. And that also
required more than 5000 CG iterations, which is why I moved to AMG.

Jens
From: John P. <jwp...@gm...> - 2013-06-18 19:05:45
On Tue, Jun 18, 2013 at 12:58 PM, Jens Lohne Eftang <jle...@gm...> wrote:
> I am able to solve the problem with -pc_type bjacobi and -sub_pc_type
> icc, but that still uses a lot of memory, around 60 GB. And that also
> required more than 5000 CG iterations, which is why I moved to AMG.

OK, switching from bjacobi -> GAMG caused memory consumption to more
than double?! I'd definitely look into the GAMG options...

Another possibility is that you could build PETSc with Hypre and run with:

  -pc_type hypre -pc_hypre_type boomeramg
  -pc_hypre_boomeramg_strong_threshold 0.7

--
John
From: Jens L. E. <jle...@gm...> - 2013-06-24 13:18:25
On 06/18/2013 03:05 PM, John Peterson wrote:
> Another possibility is that you could build PETSc with Hypre and run with:
>
>   -pc_type hypre -pc_hypre_type boomeramg
>   -pc_hypre_boomeramg_strong_threshold 0.7

Thanks. This works, but the number of CG iterations still doesn't seem
quite right (more than 1000 to get to rtol 1e-4). Do you know if I
somehow have to tell hypre that I am solving a vector problem? (I'm
guessing that otherwise it would not be able to exploit the elliptic
structure.)

Jens
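Two things usually matter for AMG on elasticity (a sketch under
assumptions, not something tested on this problem): BoomerAMG's "number of
functions" is, at least in recent PETSc versions, taken from the matrix
block size, so getting a genuine block size of 3 on the matrix (e.g. via
node-major DOF ordering) is the usual way to tell hypre about the vector
structure. For GAMG, the standard extra step is to attach the six
rigid-body modes as a near-nullspace so the coarse spaces can represent
them; a minimal PETSc-level sketch, assuming you can get at the underlying
Mat and a Vec of interlaced (x,y,z) nodal coordinates:

  /* Sketch only: attach rigid-body modes so -pc_type gamg can use them.
     "A" is the assembled system matrix; "coords" holds the nodal
     coordinates laid out like the solution vector. */
  MatNullSpace nearnull;
  MatNullSpaceCreateRigidBody(coords, &nearnull);  /* 6 rigid-body modes in 3D */
  MatSetNearNullSpace(A, nearnull);
  MatNullSpaceDestroy(&nearnull);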
From: Jens L. E. <jle...@gm...> - 2013-06-24 18:51:50
On 06/24/2013 09:18 AM, Jens Lohne Eftang wrote:
> Do you know if I somehow have to tell hypre that I am solving a vector
> problem? (I'm guessing that otherwise it would not be able to exploit
> the elliptic structure.)

So I'm now doing this:

  mpirun -np 8 ./elasticity-opt --node_major_dofs -ksp_type cg
    -pc_type fieldsplit -pc_fieldsplit_block_size 3
    -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg
    -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7
    -ksp_converged_reason -pc_fieldsplit_0 0,1,2 -ksp_atol 1e-6
    -log_summary -ksp_monitor -pc_fieldsplit_type symmetric_multiplicative

where the idea is to split the system into three blocks, one per
displacement field, and then apply AMG within each block. This reduced
the number of iterations from 1000+ to 71.

Memory consumption is still an issue though, as this requires close to
128 GB, so if anyone has experience with large problems like this and
has ideas on how to reduce the memory footprint, that would be
appreciated.

One question about ParallelMesh: will this work by just changing Mesh
to ParallelMesh?

Thanks,
Jens
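In block form (just the intent of the options above), node-major DOF
ordering groups the unknowns by component, so the operator splits as

      [ A_xx  A_xy  A_xz ]
  A = [ A_yx  A_yy  A_yz ]
      [ A_zx  A_zy  A_zz ]

and -pc_fieldsplit_type symmetric_multiplicative does a forward and then a
backward sweep over the three diagonal blocks (a symmetric block
Gauss-Seidel over the components, rather than plain block Jacobi), with
BoomerAMG applied to each A_ii. Each diagonal block looks like a scalar
elliptic operator, which is presumably why the iteration count drops so
much compared to handing hypre the fully coupled system.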
From: Cody P. <cod...@gm...> - 2013-06-25 01:10:30
On Jun 24, 2013, at 12:51 PM, Jens Lohne Eftang <jle...@gm...> wrote:
> One question about ParallelMesh: will this work by just changing Mesh
> to ParallelMesh?

Actually, this is a configure-time option, --enable-parmesh. If you
haven't explicitly used SerialMesh in your code, you should be good to go.

Cody
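For reference, that means reconfiguring and rebuilding libMesh with
something like

  ./configure --enable-parmesh
  make && make install

(exact flags depend on how libMesh was configured originally). With
ParallelMesh as the default Mesh type, each processor can keep only its
own elements plus ghosts instead of a full copy of the 8.8-million-node
mesh, which for a mesh this size should be a noticeable part of the
per-process memory.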
From: Jens L. E. <jle...@gm...> - 2013-07-18 21:07:54
Hi all,

So apparently I was somehow lucky when the below worked. If I use a
different mesh, for example as in ex6 under systems_of_equations, the
options below do not work. I get the error:

  [0]PETSC ERROR: Nonconforming object sizes!
  [0]PETSC ERROR: Index set does not match blocks!

I've tried different PETSc versions (including the most recent) and
there's no difference. It does not seem to matter whether the mesh is
libMesh-generated or not (the mesh for which the options below work is a
Cubit-generated Exodus mesh, but I also have Cubit-generated Exodus
meshes for which it doesn't work; I've not been able to identify what
causes the error).

Any ideas?

Best,
Jens

On 06/24/2013 09:10 PM, Cody Permann wrote:
> Actually, this is a configure-time option, --enable-parmesh. If you
> haven't explicitly used SerialMesh in your code, you should be good to go.
>
>> So I'm now doing this:
>>
>>   mpirun -np 8 ./elasticity-opt --node_major_dofs -ksp_type cg
>>     -pc_type fieldsplit -pc_fieldsplit_block_size 3
>>     -fieldsplit_pc_type hypre -fieldsplit_pc_hypre_type boomeramg
>>     -fieldsplit_pc_hypre_boomeramg_strong_threshold 0.7
>>     -ksp_converged_reason -pc_fieldsplit_0 0,1,2 -ksp_atol 1e-6
>>     -log_summary -ksp_monitor -pc_fieldsplit_type symmetric_multiplicative
From: John P. <jwp...@gm...> - 2013-07-18 21:28:52
On Thu, Jul 18, 2013 at 3:08 PM, Jens Lohne Eftang <jle...@gm...> wrote:
> I get the error:
>
>   [0]PETSC ERROR: Nonconforming object sizes!
>   [0]PETSC ERROR: Index set does not match blocks!
>
> I've tried different PETSc versions (including the most recent) and
> there's no difference.

Sounds like it's related to the new block size stuff that Ben added
recently?

You can configure with --disable-blocked-storage and maybe get back to
the way the code worked previously?

--
John
From: Jens L. E. <jle...@gm...> - 2013-07-18 22:34:14
On 07/18/2013 05:28 PM, John Peterson wrote:
> Sounds like it's related to the new block size stuff that Ben added
> recently?
>
> You can configure with --disable-blocked-storage and maybe get back to
> the way the code worked previously?

Thanks! That was indeed the issue.

Does this mean I can't use e.g. MUMPS as my solver package?
From: John P. <jwp...@gm...> - 2013-07-18 22:39:23
On Thu, Jul 18, 2013 at 4:35 PM, Jens Lohne Eftang <jle...@gm...> wrote:
> Thanks! That was indeed the issue.
>
> Does this mean I can't use e.g. MUMPS as my solver package?

No, I don't think the blocksize optimization being off has anything to
do with MUMPS.

--
John
From: Kirk, B. (JSC-EG311) <ben...@na...> - 2013-07-18 22:46:38
On Jul 18, 2013, at 3:38 PM, John Peterson <jwp...@gm...> wrote:
> No, I don't think the blocksize optimization being off has anything to
> do with MUMPS.

Also, as of 0.9.2 the default configuration will leave the blocked DOF
support off until we (I?) resolve these types of issues!

-Ben