From: Salazar De T. M. <sal...@ll...> - 2017-11-11 21:14:19
Hello,

I have an optimization problem for which, at a certain iteration, PETSc fails to build the preconditioner. I would like to catch that error and write my EquationSystems to disk so I can rerun the PETSc solve at that optimization iteration from the start. I cannot simply rerun my optimization and write to disk at the iteration I know is going to fail: in parallel, the optimization differs slightly every time I run it, so the failing iteration keeps changing. Maybe something along these lines:

    try {
      system.solve();
    }
    catch (...) {
      EquationSystems & es = system.get_equation_systems();
      es.write("eq_output.xdr");
    }

Thanks
Miguel
From: Jed B. <je...@je...> - 2017-11-11 21:43:42
What is your error message? If the error is raised in PETSc, there is usually a setting to make it return without raising an error (instead setting a "diverged" reason).
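My reading of the setting Jed refers to (the option name is an assumption, not stated in the thread): by default a PETSc KSP solve returns cleanly with a negative KSPConvergedReason; generating a hard PETSc error on non-convergence is opt-in, e.g.

```shell
# Assumed option, with the "Elasticity_" prefix used later in this thread.
# When set, KSPSolve raises a PETSc error instead of returning a diverged
# reason; leaving it off gives the return-without-error behavior.
-Elasticity_ksp_error_if_not_converged
```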
From: Salazar De T. M. <sal...@ll...> - 2017-11-12 19:20:50
The PETSc options for the system are as follows:

    -Elasticity_ksp_monitor_true_residual
    -Elasticity_ksp_converged_reason
    -Elasticity_ksp_type cg
    -Elasticity_log_view
    -Elasticity_mg_levels_esteig_ksp_type cg
    -Elasticity_mg_levels_ksp_chebyshev_esteig_steps 50
    -Elasticity_mg_levels_ksp_type chebyshev
    -Elasticity_mg_levels_pc_type sor
    -Elasticity_pc_type gamg
    -Elasticity_pc_gamg_verbose 7
    -Elasticity_pc_gamg_type agg
    -Elasticity_pc_gamg_agg_nsmooths 1
    -Elasticity_pc_gamg_threshold 0.001
    -Elasticity_snes_linesearch_type basic
    -Elasticity_snes_atol 1e-6
    -Elasticity_ksp_atol 1e-7
    -Elasticity_ksp_rtol 1e-9
    -Elasticity_ksp_norm_type unpreconditioned

I pass the rigid body modes to the solver.
From: Salazar De T. M. <sal...@ll...> - 2017-11-12 19:15:14
The error is:

    Linear Elasticity_ solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
    PCSETUP_FAILED due to FACTOR_NOERROR

The program actually continues. Is there any way I can catch this divergence and do some operation, like writing to disk?

Thanks
Miguel
From: Jed B. <je...@je...> - 2017-11-12 20:11:41
"Salazar De Troya, Miguel" <sal...@ll...> writes:

> Linear Elasticity_ solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0

This is PETSc output due to -Elasticity_ksp_converged_reason. It doesn't set an error.

> PCSETUP_FAILED due to FACTOR_NOERROR

Is there really no other output? Always send all the output.
From: Salazar De T. M. <sal...@ll...> - 2017-11-13 00:07:39
No, there does not seem to be more output. The program keeps running, but after the first time I see this error, all subsequent solves return a "nan" residual norm and the same preconditioner error pops up. Is there any way to obtain more output? This is the output in the context of the KSPSolve.

First solve, where the preconditioner fails:

    Residual norms for Elasticity_ solve.
    0 KSP unpreconditioned resid norm 1.421591570438e+02 true resid norm 1.421591570438e+02 ||r(i)||/||b|| 1.000000000000e+00
    Linear Elasticity_ solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
    PCSETUP_FAILED due to FACTOR_NOERROR

Following solves (the optimization keeps running) all look like this:

    Residual norms for Elasticity_ solve.
    0 KSP unpreconditioned resid norm 1.450166975748e+02 true resid norm -nan ||r(i)||/||b|| -nan
    Linear Elasticity_ solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
    PCSETUP_FAILED due to FACTOR_NOERROR

Is there any way I can catch this "PCSETUP_FAILED due to FACTOR_NOERROR"?

Miguel
From: Jed B. <je...@je...> - 2017-11-13 04:03:42
"Salazar De Troya, Miguel" <sal...@ll...> writes:

> PCSETUP_FAILED due to FACTOR_NOERROR

What code is printing this last line? The PC isn't of type Factor, so I don't see why it would be printed.

You could try running with -info to get more detailed information from PETSc. Or run in a debugger.

The libMesh/user code on the outside probably shouldn't naively continue after the first error. It could, for example, recompute the operators with a smaller time step and retry the solve.
From: John P. <jwp...@gm...> - 2017-11-13 15:01:24
I'm pretty sure we don't throw an exception when a solve fails, so there's not going to be anything for you to catch here. For a LinearImplicitSystem, the usual approach is to inspect the value of

    system.get_linear_solver()->get_converged_reason();

and then decide what to do (retry the last time step, etc.) from there.

When you say "optimization problem," are you actually using the TAO-based OptimizationSystem class that is in libMesh?

-- John
From: Jed B. <je...@je...> - 2017-11-15 02:51:53
"Salazar De Troya, Miguel" <sal...@ll...> writes:

> I ran the code with -info and got a quite lengthy output that I am attaching. For some reason, now the error is (last line of the attached output):
>
> Linear Elasticity_ solve did not converge due to DIVERGED_PCSETUP_FAILED iterations 0
> PCSETUP_FAILED due to SUBPC_ERROR
>
> Not sure why, because I am not specifying any -sub_pc in my options. Which other messages should I look into?

I'm still not sure what triggers the error, and with the non-determinism you seem to have just observed, I would start by ruling out memory corruption. Can you reproduce with a smaller problem size and/or fewer processes? Can you run with a memory checker (valgrind is easy but slow; the alternative is to compile your whole stack with gcc -fsanitize=address)? If those are clean, it would be faster to share code to reproduce versus trying to operate a debugging session via email. Can you share code to reproduce, either on this list or to pet...@mc... (private list with PETSc developers)?