From: John Peterson <jwpeterson@gm...> - 2015-02-09 22:24:45

On Mon, Feb 9, 2015 at 2:53 PM, Paul T. Bauman <ptbauman@...> wrote:
> So it looks like gdb_backtrace() makes a system() call to print the stack
> trace info from gdb. Does the system() call not inherit the environment
> from that which launched the libMesh-based application?
>
> I ask because I've noticed that this will hang if I have a local version
> of gdb loaded in my environment, e.g. I've compiled a newer GDB than what's
> on my system, made a module, and loaded that module such that it's at the
> front of my $PATH. If I don't have that newer gdb version loaded,
> everything is fine -- the stack trace prints as it's supposed to and all is
> well.
>
> So, if I'm understanding the issue correctly, would there be interest in
> me adding a --with-gdb configure option that would call a user-supplied gdb
> version?
>
> My motivation is that this hang can be really annoying because if it's
> hanging during a make check, I have to manually kill the process; ctrl-c is
> insufficient.

gdb_backtrace() isn't the most robust thing in the world... we have also had some issues with it on our clusters here. There are better ways of launching external programs than by using the system() command. I saw some examples using execvp with fork, but they were more complicated, so I went with the simplest solution.

I think the gdb_backtrace() capability should probably be disabled by default... while it can be useful for providing line numbers where the backtrace() call does not, the potential benefit does not outweigh a hung code during a crash, in my opinion. That said, the ability to specify which gdb to use would also be a useful enhancement, especially if it fixes the problem for you...

--
John
From: Paul T. Bauman <ptbauman@gm...> - 2015-02-09 21:53:10

So it looks like gdb_backtrace() makes a system() call to print the stack trace info from gdb. Does the system() call not inherit the environment from that which launched the libMesh-based application?

I ask because I've noticed that this will hang if I have a local version of gdb loaded in my environment, e.g. I've compiled a newer GDB than what's on my system, made a module, and loaded that module such that it's at the front of my $PATH. If I don't have that newer gdb version loaded, everything is fine -- the stack trace prints as it's supposed to and all is well.

So, if I'm understanding the issue correctly, would there be interest in me adding a --with-gdb configure option that would call a user-supplied gdb version?

My motivation is that this hang can be really annoying because if it's hanging during a make check, I have to manually kill the process; ctrl-c is insufficient.

Thanks,

Paul
From: Jed Brown <jed@je...> - 2015-01-23 02:40:27

Roy Stogner <roystgnr@...> writes:

>> Interesting. Can you include the full unpreconditioned output? And
>> please compare with and without -ksp_gmres_modifiedgramschmidt.
>
> From "./main-opt -ksp_monitor_true_residual -ksp_norm_type unpreconditioned":
>
>   0 KSP unpreconditioned resid norm 1.137340546775e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP unpreconditioned resid norm 1.851221771429e-02 true resid norm 1.850940946785e-02 ||r(i)||/||b|| 1.627428963148e-01
>   2 KSP unpreconditioned resid norm 8.058732139350e-04 true resid norm 8.487837216519e-04 ||r(i)||/||b|| 7.462881052281e-03
>   3 KSP unpreconditioned resid norm 1.410067346760e-04 true resid norm 3.750759001371e-04 ||r(i)||/||b|| 3.297832836441e-03
>   4 KSP unpreconditioned resid norm 3.036297998627e-06 true resid norm 3.296862054294e-04 ||r(i)||/||b|| 2.898746609924e-03
>   5 KSP unpreconditioned resid norm 1.628004374290e-07 true resid norm 4.048903454605e-04 ||r(i)||/||b|| 3.559974596954e-03
>   6 KSP unpreconditioned resid norm 6.839537794328e-09 true resid norm 2.859920502301e-04 ||r(i)||/||b|| 2.514568315013e-03
>   7 KSP unpreconditioned resid norm 3.580827398197e-10 true resid norm 3.424888565589e-04 ||r(i)||/||b|| 3.011313168514e-03
>   8 KSP unpreconditioned resid norm 1.371032175862e-11 true resid norm 3.818671006476e-04 ||r(i)||/||b|| 3.357544068312e-03
>   9 KSP unpreconditioned resid norm 1.130911552385e-12 true resid norm 2.475276237056e-04 ||r(i)||/||b|| 2.176372102511e-03
>  10 KSP unpreconditioned resid norm 4.683505266751e-14 true resid norm 3.820485023471e-04 ||r(i)||/||b|| 3.359139032108e-03
>
> And after adding "-ksp_gmres_modifiedgramschmidt" too:
>
>   0 KSP unpreconditioned resid norm 1.137340546775e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP unpreconditioned resid norm 1.851221771429e-02 true resid norm 1.850940946785e-02 ||r(i)||/||b|| 1.627428963148e-01
>   2 KSP unpreconditioned resid norm 8.058732139350e-04 true resid norm 8.290754235563e-04 ||r(i)||/||b|| 7.289596998076e-03
>   3 KSP unpreconditioned resid norm 1.409277416992e-04 true resid norm 3.366737464418e-04 ||r(i)||/||b|| 2.960184154134e-03
>   4 KSP unpreconditioned resid norm 3.016318237424e-06 true resid norm 3.904608063041e-04 ||r(i)||/||b|| 3.433103720879e-03
>   5 KSP unpreconditioned resid norm 1.684322222043e-07 true resid norm 1.987083874964e-04 ||r(i)||/||b|| 1.747131833643e-03
>   6 KSP unpreconditioned resid norm 7.043704865544e-09 true resid norm 3.021936047142e-04 ||r(i)||/||b|| 2.657019531846e-03
>   7 KSP unpreconditioned resid norm 3.538690849000e-10 true resid norm 3.937474047247e-04 ||r(i)||/||b|| 3.462000944582e-03
>   8 KSP unpreconditioned resid norm 2.298664686412e-11 true resid norm 2.516665836236e-04 ||r(i)||/||b|| 2.212763664649e-03
>   9 KSP unpreconditioned resid norm 8.918329988448e-13 true resid norm 3.883206450876e-04 ||r(i)||/||b|| 3.414286479003e-03
>  10 KSP unpreconditioned resid norm 3.074816936477e-14 true resid norm 2.987382118533e-04 ||r(i)||/||b|| 2.626638192935e-03

This seems to be a peculiar linear algebra phenomenon, seemingly due to a bad shift strategy for the indefinite matrix (see below). FGMRES, GCR, and BiCG manage anyway, while GMRES and BiCGStab do not.

>> How big is this matrix? Can you write it with "-ksp_view_mat binary"
>> and send it to me (or post somewhere)?
>
> Not even 100x100; it's a very-well-simplified version of the real
> problem.
>
> http://users.ices.utexas.edu/~roystgnr/binaryoutput

src/ksp/ksp/examples/tutorials$ ./ex10 -f binaryoutput -rhs
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#ZeroPivot
[0]PETSC ERROR: Zero pivot row 4 value 0 tolerance 2.22045e-14
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.5.2-939-geeb4e01  GIT Date: 2014-11-05 15:13:20 -0600
[0]PETSC ERROR: ./ex10 on a mpichg named batura by jed Fri Jan 9 11:43:38 2015
[0]PETSC ERROR: Configure options --download-chaco --download-ctetgen --download-generator --download-hypre --download-ml --download-sundials --download-superlu --download-superlu_dist --download-triangle --with-c2html --with-exodusii --with-hdf5 --with-lgrind --with-metis --with-mpi-dir=/home/jed/usr/ccache/mpich/ --with-netcdf --with-parmetis --with-sowing --with-suitesparse --with-x PETSC_ARCH=mpichg COPTFLAGS="-Og -g"
[0]PETSC ERROR: #1 MatPivotCheck_none() line 634 in /home/jed/petsc/include/petsc-private/matimpl.h
[0]PETSC ERROR: #2 MatPivotCheck() line 653 in /home/jed/petsc/include/petsc-private/matimpl.h
[0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1424 in /home/jed/petsc/src/mat/impls/aij/seq/inode.c
[0]PETSC ERROR: #4 MatLUFactorNumeric() line 2930 in /home/jed/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #5 PCSetUp_ILU() line 232 in /home/jed/petsc/src/ksp/pc/impls/factor/ilu/ilu.c
[0]PETSC ERROR: #6 PCSetUp() line 902 in /home/jed/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #7 KSPSetUp() line 306 in /home/jed/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #8 main() line 312 in /home/jed/petsc/src/ksp/ksp/examples/tutorials/ex10.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -f binaryoutput
[0]PETSC ERROR: -malloc_test
[0]PETSC ERROR: -rhs
[0]PETSC ERROR: ----------------End of Error Message ------- send entire error message to petsc-maint@...

$ ./ex10 -f binaryoutput -rhs -pc_type lu
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Zero pivot in LU factorization: http://www.mcs.anl.gov/petsc/documentation/faq.html#ZeroPivot
[0]PETSC ERROR: Zero pivot row 12 value 0 tolerance 2.22045e-14
[0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.5.2-939-geeb4e01  GIT Date: 2014-11-05 15:13:20 -0600
[0]PETSC ERROR: ./ex10 on a mpichg named batura by jed Fri Jan 9 11:44:29 2015
[0]PETSC ERROR: Configure options --download-chaco --download-ctetgen --download-generator --download-hypre --download-ml --download-sundials --download-superlu --download-superlu_dist --download-triangle --with-c2html --with-exodusii --with-hdf5 --with-lgrind --with-metis --with-mpi-dir=/home/jed/usr/ccache/mpich/ --with-netcdf --with-parmetis --with-sowing --with-suitesparse --with-x PETSC_ARCH=mpichg COPTFLAGS="-Og -g"
[0]PETSC ERROR: #1 MatPivotCheck_none() line 634 in /home/jed/petsc/include/petsc-private/matimpl.h
[0]PETSC ERROR: #2 MatPivotCheck() line 653 in /home/jed/petsc/include/petsc-private/matimpl.h
[0]PETSC ERROR: #3 MatLUFactorNumeric_SeqAIJ_Inode() line 1424 in /home/jed/petsc/src/mat/impls/aij/seq/inode.c
[0]PETSC ERROR: #4 MatLUFactorNumeric() line 2930 in /home/jed/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #5 PCSetUp_LU() line 152 in /home/jed/petsc/src/ksp/pc/impls/factor/lu/lu.c
[0]PETSC ERROR: #6 PCSetUp() line 902 in /home/jed/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #7 KSPSetUp() line 306 in /home/jed/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #8 main() line 312 in /home/jed/petsc/src/ksp/ksp/examples/tutorials/ex10.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -f binaryoutput
[0]PETSC ERROR: -malloc_test
[0]PETSC ERROR: -pc_type lu
[0]PETSC ERROR: -rhs
[0]PETSC ERROR: ----------------End of Error Message ------- send entire error message to petsc-maint@...

A "positive definite" shift works, but converges very slowly.
$ ./ex10 -f binaryoutput -rhs -ksp_converged_reason -ksp_monitor_true_residual -ksp_rtol 1e-10 -pc_factor_shift_type positive_definite
  0 KSP preconditioned resid norm 5.126361879870e+01 true resid norm 9.899494936612e+00 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.742253255361e+01 true resid norm 9.945020941036e+00 ||r(i)||/||b|| 1.004598820921e+00
  2 KSP preconditioned resid norm 1.656435081546e+01 true resid norm 1.017851736308e+01 ||r(i)||/||b|| 1.028185521408e+00
  3 KSP preconditioned resid norm 1.381919181644e+01 true resid norm 1.317793534482e+01 ||r(i)||/||b|| 1.331172492052e+00
[...]
 92 KSP preconditioned resid norm 1.209846782924e-08 true resid norm 7.516470758142e-09 ||r(i)||/||b|| 7.592782062390e-10
 93 KSP preconditioned resid norm 1.010690565657e-08 true resid norm 5.111314393418e-09 ||r(i)||/||b|| 5.163207240517e-10
 94 KSP preconditioned resid norm 6.900556652393e-09 true resid norm 3.274216395272e-09 ||r(i)||/||b|| 3.307458023098e-10
 95 KSP preconditioned resid norm 4.665959835776e-09 true resid norm 3.435941544160e-09 ||r(i)||/||b|| 3.470825093766e-10
Linear solve converged due to CONVERGED_RTOL iterations 95
Number of iterations = 95
Residual norm 3.43594e-09

The "nonzero" shift does not:

$ ./ex10 -f binaryoutput -rhs -ksp_converged_reason -ksp_monitor_true_residual -ksp_rtol 1e-10 -pc_factor_shift_type nonzero
  0 KSP preconditioned resid norm 7.686671891138e+01 true resid norm 9.899494936612e+00 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.125291740842e+01 true resid norm 2.677042946539e+00 ||r(i)||/||b|| 2.704221744322e-01
  2 KSP preconditioned resid norm 4.169562114751e-01 true resid norm 1.053544866606e-01 ||r(i)||/||b|| 1.064241027802e-02
  3 KSP preconditioned resid norm 3.884939867076e-02 true resid norm 2.597689725056e-02 ||r(i)||/||b|| 2.624062885722e-03
  4 KSP preconditioned resid norm 1.870022102589e-03 true resid norm 2.366328914521e-02 ||r(i)||/||b|| 2.390353174251e-03
  5 KSP preconditioned resid norm 1.006640708018e-04 true resid norm 2.362168570871e-02 ||r(i)||/||b|| 2.386150592527e-03
  6 KSP preconditioned resid norm 6.695456376917e-06 true resid norm 2.362181124027e-02 ||r(i)||/||b|| 2.386163273129e-03
  7 KSP preconditioned resid norm 2.568374375496e-07 true resid norm 2.362165514116e-02 ||r(i)||/||b|| 2.386147504737e-03
  8 KSP preconditioned resid norm 1.798370685705e-08 true resid norm 2.362165522405e-02 ||r(i)||/||b|| 2.386147513111e-03
  9 KSP preconditioned resid norm 8.573160013296e-10 true resid norm 2.362165478290e-02 ||r(i)||/||b|| 2.386147468548e-03
Linear solve converged due to CONVERGED_RTOL iterations 9
Number of iterations = 9
Residual norm 0.0236217

Same story with "inblocks", but convergence appears to be fast if you use FGMRES, GCR, BiCG, or CGS.

Now back to the failing case. I'm finding that if I write out the explicit preconditioned operator, then read that in with ex10 and execute with no preconditioner, I get nice convergence with either right- or left-preconditioned GMRES. (See the reproducible test case below.) Now that's weird, and I don't have an explanation. I've had this email half-written for a while and have tinkered with the code trying to explain it, but haven't had an extended block of time, so I figure I should post this in case anyone else has ideas. The next thing I have in mind is to create the same RHS vector (after preconditioning) and compare the iterates, which should be identical (though the true residual won't be, because the operator used to compute the true residual is different).
$ ./ex10 -f binaryoutput -rhs -ksp_converged_reason -ksp_monitor_true_residual -ksp_rtol 1e-16 -pc_factor_shift_type nonzero -ksp_norm_type preconditioned -ksp_type gmres -ksp_view_preconditioned_operator_explicit binary:pciluleft
  0 KSP preconditioned resid norm 7.686671891138e+01 true resid norm 9.899494936612e+00 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.125291740842e+01 true resid norm 2.677042946539e+00 ||r(i)||/||b|| 2.704221744322e-01
  2 KSP preconditioned resid norm 4.169562114751e-01 true resid norm 1.053544866606e-01 ||r(i)||/||b|| 1.064241027802e-02
  3 KSP preconditioned resid norm 3.884939867076e-02 true resid norm 2.597689725056e-02 ||r(i)||/||b|| 2.624062885722e-03
  4 KSP preconditioned resid norm 1.870022102589e-03 true resid norm 2.366328914521e-02 ||r(i)||/||b|| 2.390353174251e-03
  5 KSP preconditioned resid norm 1.006640708018e-04 true resid norm 2.362168570871e-02 ||r(i)||/||b|| 2.386150592527e-03
  6 KSP preconditioned resid norm 6.695456376917e-06 true resid norm 2.362181124027e-02 ||r(i)||/||b|| 2.386163273129e-03
  7 KSP preconditioned resid norm 2.568374375496e-07 true resid norm 2.362165514116e-02 ||r(i)||/||b|| 2.386147504737e-03
  8 KSP preconditioned resid norm 1.798370685705e-08 true resid norm 2.362165522405e-02 ||r(i)||/||b|| 2.386147513111e-03
  9 KSP preconditioned resid norm 8.573160013296e-10 true resid norm 2.362165478290e-02 ||r(i)||/||b|| 2.386147468548e-03
 10 KSP preconditioned resid norm 4.225844037925e-11 true resid norm 2.362165479751e-02 ||r(i)||/||b|| 2.386147470024e-03
 11 KSP preconditioned resid norm 1.874852209366e-12 true resid norm 2.362165479842e-02 ||r(i)||/||b|| 2.386147470116e-03
 12 KSP preconditioned resid norm 6.799500125012e-14 true resid norm 2.362165479837e-02 ||r(i)||/||b|| 2.386147470110e-03
 13 KSP preconditioned resid norm 7.912052053829e-15 true resid norm 2.362165479837e-02 ||r(i)||/||b|| 2.386147470110e-03
 14 KSP preconditioned resid norm 5.551931114634e-15 true resid norm 2.362165479837e-02 ||r(i)||/||b|| 2.386147470110e-03
Linear solve converged due to CONVERGED_RTOL iterations 14
Number of iterations = 14
Residual norm 0.0236217

$ ./ex10 -f pciluleft -rhs -ksp_converged_reason -ksp_monitor_true_residual -ksp_rtol 1e-16 -pc_type none -ksp_norm_type preconditioned -ksp_type gmres
  0 KSP preconditioned resid norm 9.899494936612e+00 true resid norm 9.899494936612e+00 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.184329742133e+00 true resid norm 2.184329742133e+00 ||r(i)||/||b|| 2.206506247157e-01
  2 KSP preconditioned resid norm 1.849697585610e-01 true resid norm 1.849697585610e-01 ||r(i)||/||b|| 1.868476722756e-02
  3 KSP preconditioned resid norm 7.233163029549e-03 true resid norm 7.233163029549e-03 ||r(i)||/||b|| 7.306598039460e-04
  4 KSP preconditioned resid norm 3.819911952092e-04 true resid norm 3.819911952090e-04 ||r(i)||/||b|| 3.858693778369e-05
  5 KSP preconditioned resid norm 1.940407608184e-05 true resid norm 1.940407608155e-05 ||r(i)||/||b|| 1.960107682846e-06
  6 KSP preconditioned resid norm 1.446966001367e-06 true resid norm 1.446966000978e-06 ||r(i)||/||b|| 1.461656387769e-07
  7 KSP preconditioned resid norm 4.828587794298e-08 true resid norm 4.828587816553e-08 ||r(i)||/||b|| 4.877610269485e-09
  8 KSP preconditioned resid norm 3.454073475625e-09 true resid norm 3.454073609875e-09 ||r(i)||/||b|| 3.489141246086e-10
  9 KSP preconditioned resid norm 2.149824726703e-10 true resid norm 2.149827018270e-10 ||r(i)||/||b|| 2.171653232853e-11
 10 KSP preconditioned resid norm 9.143524324851e-12 true resid norm 9.143698087696e-12 ||r(i)||/||b|| 9.236529889904e-13
 11 KSP preconditioned resid norm 1.841531740366e-13 true resid norm 1.838833676326e-13 ||r(i)||/||b|| 1.857502517149e-14
 12 KSP preconditioned resid norm 9.365474613992e-15 true resid norm 9.795791624425e-15 ||r(i)||/||b|| 9.895243835316e-16
 13 KSP preconditioned resid norm 3.696199042068e-15 true resid norm 4.825320989910e-15 ||r(i)||/||b|| 4.874310276239e-16
 14 KSP preconditioned resid norm 2.770629172379e-15 true resid norm 5.677355330545e-15 ||r(i)||/||b|| 5.734994933477e-16
 15 KSP preconditioned resid norm 2.303066887546e-15 true resid norm 5.583218758734e-15 ||r(i)||/||b|| 5.639902635926e-16
 16 KSP preconditioned resid norm 2.012645615368e-15 true resid norm 5.463831520276e-15 ||r(i)||/||b|| 5.519303313211e-16
 17 KSP preconditioned resid norm 1.810217685689e-15 true resid norm 4.363896801037e-15 ||r(i)||/||b|| 4.408201457731e-16
 18 KSP preconditioned resid norm 1.658676365430e-15 true resid norm 5.310534564585e-15 ||r(i)||/||b|| 5.364450003348e-16
 19 KSP preconditioned resid norm 1.539766535244e-15 true resid norm 5.279107563524e-15 ||r(i)||/||b|| 5.332703938259e-16
 20 KSP preconditioned resid norm 1.443237688687e-15 true resid norm 5.579906258437e-15 ||r(i)||/||b|| 5.636556505323e-16
 21 KSP preconditioned resid norm 1.362848936052e-15 true resid norm 4.342661167469e-15 ||r(i)||/||b|| 4.386750228446e-16
 22 KSP preconditioned resid norm 1.294551799862e-15 true resid norm 5.338314359525e-15 ||r(i)||/||b|| 5.392511833894e-16
 23 KSP preconditioned resid norm 1.235590344831e-15 true resid norm 5.297753476103e-15 ||r(i)||/||b|| 5.351539154296e-16
 24 KSP preconditioned resid norm 1.184014779259e-15 true resid norm 5.576591790512e-15 ||r(i)||/||b|| 5.633208387115e-16
 25 KSP preconditioned resid norm 1.138401595244e-15 true resid norm 4.325597599836e-15 ||r(i)||/||b|| 4.369513422184e-16
 26 KSP preconditioned resid norm 1.097683968914e-15 true resid norm 5.338314359525e-15 ||r(i)||/||b|| 5.392511833894e-16
 27 KSP preconditioned resid norm 1.061044509051e-15 true resid norm 5.260395559362e-15 ||r(i)||/||b|| 5.313801959641e-16
 28 KSP preconditioned resid norm 1.027844958802e-15 true resid norm 5.482973286448e-15 ||r(i)||/||b|| 5.538639417018e-16
 29 KSP preconditioned resid norm 9.975786885717e-16 true resid norm 4.325597599836e-15 ||r(i)||/||b|| 4.369513422184e-16
 30 KSP preconditioned resid norm 9.698377326135e-16 true resid norm 5.338314359525e-15 ||r(i)||/||b|| 5.392511833894e-16
Linear solve converged due to CONVERGED_RTOL iterations 30
Number of iterations = 30
Residual norm < 1.e-12
From: Roy Stogner <roystgnr@ic...> - 2015-01-12 22:35:30

On Mon, 12 Jan 2015, Paul T. Bauman wrote:

> On Mon, Jan 12, 2015 at 11:32 AM, Paul T. Bauman <ptbauman@...> wrote:
>> I'm going to try with OpenMPI and see if this is replicated or not.
>
> The introduction/ex4 and indeed make check all pass in dbg/devel/opt with
> LIBMESH_RUN="mpiexec -np 2" using OpenMPI 1.8.4. :/

Want to hear something even weirder? introduction_ex4 works fine with MPICH when using --enable-complex.

This ought to be such a simple problem to debug, too:

mpirun -np 2 ./example-devel -d 1 -n 2 -o FIRST

generates just 2 elements, 3 nodes. It hangs in devel and opt modes with MPICH, inside that (empty) send_receive_packed_range of constrained nodes.

---
Roy
From: Paul T. Bauman <ptbauman@gm...> - 2015-01-12 16:32:14

Running ./example-dbg -d 2 -n 15 -ksp_monitor

PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0
PARMETIS ERROR: The sum of tpwgts for constraint #0 is not 1.0

Mesh Information:
  mesh_dimension()=2
  spatial_dimension()=3
  n_nodes()=961
    n_local_nodes()=513
  n_elem()=225
    n_local_elem()=114
    n_active_elem()=225
  n_subdomains()=1
  n_partitions()=2
  n_processors()=2
  n_threads()=1
  processor_id()=0

EquationSystems
  n_systems()=1
  System #0, "Poisson"
    Type "LinearImplicit"
    Variables="u"
    Finite Element Types="LAGRANGE", "JACOBI_20_00"
    Infinite Element Mapping="CARTESIAN"
    Approximation Orders="SECOND", "THIRD"
    n_dofs()=961
    n_local_dofs()=513
    n_constrained_dofs()=120
    n_local_constrained_dofs()=62
    n_vectors()=1
    n_matrices()=1
    DofMap Sparsity
      Average  On-Processor Bandwidth <= 14.7336
      Average Off-Processor Bandwidth <= 0.720083
      Maximum  On-Processor Bandwidth <= 26
      Maximum Off-Processor Bandwidth <= 15
    DofMap Constraints
      Number of DoF Constraints = 120
      Number of Heterogenous Constraints = 118
      Average DoF Constraint Length = 0
      Number of Node Constraints = 0

Mesh Information:
  mesh_dimension()=2
  spatial_dimension()=3
  n_nodes()=961
    n_local_nodes()=513
  n_elem()=225
    n_local_elem()=114
    n_active_elem()=225
  n_subdomains()=1
  n_partitions()=2
  n_processors()=2
  n_threads()=1
  processor_id()=0

| Processor id:   0
| Num Processors: 2
| Time:           Mon Jan 12 11:22:46 2015
| OS:             Linux
| HostName:       fry.eng.buffalo.edu
| OS Release:     2.6.32-504.3.3.el6.x86_64
| OS Version:     #1 SMP Fri Dec 12 16:05:43 EST 2014
| Machine:        x86_64
| Username:       pbauman
| Configuration:  ../../libMesh/configure
|   '--prefix=/fry1/data/users/pbauman/software/libs/libmesh/master'
|   '--enable-everything'
|   '--with-metis=PETSc'
|   '--enable-parmesh'
|   'CXX=g++'
|   'CC=gcc'
|   'FC=gfortran'
|   'F77=gfortran'
|   'PETSC_DIR=/fry1/data/users/pbauman/software/libs/petsc/petsc-3.5.2'
|   'PETSC_ARCH=gcc4.8.2mpich3.0.4openblas0.2.9.rc1cxxopt'
|   'VTK_INCLUDE=/fry1/data/users/pbauman/software/libs/vtk/5.10.1/gcc/4.8.2/include/vtk-5.10'
|   'VTK_DIR=/fry1/data/users/pbauman/software/libs/vtk/5.10.1/gcc/4.8.2'

Matrix Assembly Performance: Alive time=0.032835, Active time=0.025345

  Event             nCalls  Total Time  Avg Time  Total Time  Avg Time  % of Active Time
                            w/o Sub     w/o Sub   With Sub    With Sub  w/o S    With S
  Fe                   114  0.0012      0.000010  0.0012      0.000010    4.65     4.65
  Ke                   114  0.0068      0.000059  0.0068      0.000059   26.71    26.71
  elem init            114  0.0167      0.000147  0.0167      0.000147   66.06    66.06
  matrix insertion     114  0.0007      0.000006  0.0007      0.000006    2.59     2.59
  Totals:              456  0.0253                                       100.00

  0 KSP Residual norm 6.281555363893e+00
  1 KSP Residual norm 1.441578986136e+00
  2 KSP Residual norm 8.657661935565e-01
  3 KSP Residual norm 6.298483075757e-01
  4 KSP Residual norm 4.584797396291e-01
  5 KSP Residual norm 3.672004215548e-01
  6 KSP Residual norm 2.854933103510e-01
  7 KSP Residual norm 2.207387889462e-01
  8 KSP Residual norm 1.507528183709e-01
  9 KSP Residual norm 7.609993007381e-02
 10 KSP Residual norm 3.870496969236e-02
 11 KSP Residual norm 1.694470333105e-02
 12 KSP Residual norm 1.056205718874e-02
 13 KSP Residual norm 7.962944594936e-03
 14 KSP Residual norm 6.247891445290e-03
 15 KSP Residual norm 5.142557174263e-03
 16 KSP Residual norm 4.064335778374e-03
 17 KSP Residual norm 2.986839198661e-03
 18 KSP Residual norm 2.212248591674e-03
 19 KSP Residual norm 1.711006399209e-03
 20 KSP Residual norm 1.423591127897e-03
 21 KSP Residual norm 1.228658042495e-03
 22 KSP Residual norm 1.021344123622e-03
 23 KSP Residual norm 8.969047514369e-04
 24 KSP Residual norm 7.995662891498e-04
 25 KSP Residual norm 6.652930701941e-04
 26 KSP Residual norm 5.304353279729e-04
 27 KSP Residual norm 4.030242592245e-04
 28 KSP Residual norm 2.730993578525e-04
 29 KSP Residual norm 1.645810058886e-04
 30 KSP Residual norm 1.103538427762e-04
 31 KSP Residual norm 8.610454940155e-05
 32 KSP Residual norm 6.498404359506e-05
 33 KSP Residual norm 4.556983417930e-05
 34 KSP Residual norm 3.231154166980e-05
 35 KSP Residual norm 2.577303633035e-05
 36 KSP Residual norm 2.113582798346e-05
 37 KSP Residual norm 1.752446809300e-05
 38 KSP Residual norm 1.484672477927e-05
 39 KSP Residual norm 1.302629184089e-05
 40 KSP Residual norm 1.172471062290e-05
 41 KSP Residual norm 1.006971964980e-05
 42 KSP Residual norm 8.230366349966e-06
 43 KSP Residual norm 6.171634470124e-06
 44 KSP Residual norm 4.117409795018e-06
 45 KSP Residual norm 2.611987673662e-06
 46 KSP Residual norm 1.745591647676e-06
 47 KSP Residual norm 1.190676355730e-06
 48 KSP Residual norm 9.028453572394e-07
 49 KSP Residual norm 7.224248002419e-07
 50 KSP Residual norm 5.339195753159e-07
 51 KSP Residual norm 3.748508543720e-07
 52 KSP Residual norm 2.677517396278e-07
 53 KSP Residual norm 1.840889093657e-07
 54 KSP Residual norm 1.207493202252e-07
 55 KSP Residual norm 7.670458622698e-08
 56 KSP Residual norm 4.744749826056e-08
 57 KSP Residual norm 2.888924255267e-08
 58 KSP Residual norm 1.855668558305e-08
 59 KSP Residual norm 1.225261973163e-08
 60 KSP Residual norm 7.274693375029e-09
 61 KSP Residual norm 5.678840027999e-09
 62 KSP Residual norm 4.272621003973e-09
 63 KSP Residual norm 3.297977189053e-09
 64 KSP Residual norm 2.507083460763e-09
 65 KSP Residual norm 1.834020100457e-09
 66 KSP Residual norm 1.337490896343e-09
 67 KSP Residual norm 1.047810076369e-09
 68 KSP Residual norm 7.828493744498e-10
 69 KSP Residual norm 5.516829101298e-10
 70 KSP Residual norm 3.616568657065e-10
 71 KSP Residual norm 2.576313931394e-10
 72 KSP Residual norm 2.089089652561e-10
 73 KSP Residual norm 1.754670989786e-10
 74 KSP Residual norm 1.516055264043e-10
 75 KSP Residual norm 1.314547789622e-10
 76 KSP Residual norm 1.151571443845e-10
 77 KSP Residual norm 1.002170334358e-10
 78 KSP Residual norm 8.399470485732e-11
 79 KSP Residual norm 6.720961716070e-11
 80 KSP Residual norm 4.385649839608e-11
 81 KSP Residual norm 2.932111979815e-11
 82 KSP Residual norm 1.155116888517e-11 (sic)
 82 KSP Residual norm 1.808693464627e-11
 83 KSP Residual norm 1.155116888517e-11
 84 KSP Residual norm 7.787886447474e-12
 85 KSP Residual norm 5.539837062396e-12

Warning: This MeshOutput subclass only supports meshes which have been serialized!
Warning: This MeshOutput subclass only supports meshes which have been serialized!

Reference count information:
  N7libMesh10FEAbstractE:        Creations: 4     Destructions: 4
  N7libMesh10Parameters5ValueE:  Creations: 2     Destructions: 2
  N7libMesh12LinearSolverIdEE:   Creations: 1     Destructions: 1
  N7libMesh12SparseMatrixIdEE:   Creations: 1     Destructions: 1
  N7libMesh13NumericVectorIdEE:  Creations: 5     Destructions: 5
  N7libMesh15EquationSystemsE:   Creations: 1     Destructions: 1
  N7libMesh4ElemE:               Creations: 2632  Destructions: 2632
  N7libMesh4NodeE:               Creations: 1333  Destructions: 1333
  N7libMesh5QBaseE:              Creations: 5     Destructions: 5
  N7libMesh6DofMapE:             Creations: 1     Destructions: 1
  N7libMesh6SystemE:             Creations: 1     Destructions: 1
  N7libMesh9DofObjectE:          Creations: 3965  Destructions: 3965

libMesh Performance: Alive time=1.62748, Active time=1.36299

  Event                                  nCalls  Total Time  Avg Time  Total Time  Avg Time  % of Active Time
                                                 w/o Sub     w/o Sub   With Sub    With Sub  w/o S   With S
  DofMap
    add_neighbors_to_send_list()              1  0.0050      0.005017  0.0073      0.007321   0.37    0.54
    build_constraint_matrix_and_vector()    114  0.0018      0.000016  0.0018      0.000016   0.13    0.13
    build_sparsity()                          1  0.0313      0.031347  0.0378      0.037773   2.30    2.77
    create_dof_constraints()                  1  0.0072      0.007207  0.0170      0.017019   0.53    1.25
    distribute_dofs()                         1  0.1193      0.119323  0.6241      0.624148   8.75   45.79
    dof_indices()                           524  0.0257      0.000049  0.0257      0.000049   1.89    1.89
    hetero_cnstrn_elem_mat_vec()            114  0.0011      0.000010  0.0011      0.000010   0.08    0.08
    prepare_send_list()                       1  0.0006      0.000619  0.0006      0.000619   0.05    0.05
    reinit()                                  1  0.0148      0.014765  0.0148      0.014765   1.08    1.08
  EquationSystems
    build_solution_vector()                   1  0.0019      0.001909  0.0077      0.007734   0.14    0.57
  ExodusII_IO
    write_nodal_data()                        1  0.0057      0.005673  0.0060      0.005965   0.42    0.44
  FE
    compute_shape_functions()               144  0.0058      0.000040  0.0058      0.000040   0.43    0.43
    init_shape_functions()                   31  0.0002      0.000007  0.0002      0.000007   0.02    0.02
    inverse_map()                            90  0.0009      0.000010  0.0009      0.000010   0.07    0.07
  FEMap
    compute_affine_map()                    144  0.0025      0.000018  0.0025      0.000018   0.19    0.19
    compute_face_map()                       30  0.0009      0.000028  0.0019      0.000063   0.06    0.14
    init_face_shape_functions()              30  0.0002      0.000007  0.0002      0.000007   0.02    0.02
    init_reference_to_physical_map()         31  0.0009      0.000029  0.0009      0.000029   0.07    0.07
  Mesh
    find_neighbors()                          2  0.0289      0.014471  0.0317      0.015828   2.12    2.32
  MeshCommunication
    (all)gather()                             1  0.0438      0.043773  0.0580      0.057992   3.21    4.25
    compute_hilbert_indices()                 4  0.0049      0.001213  0.0049      0.001213   0.36    0.36
    delete_remote_elements()                  3  0.0104      0.003473  0.0119      0.003965   0.76    0.87
    find_global_indices()                     4  0.0106      0.002642  0.0217      0.005420   0.78    1.59
    parallel_sort()                           4  0.0042      0.001053  0.0054      0.001358   0.31    0.40
  MeshOutput
    write_equation_systems()                  1  0.0132      0.013242  0.1180      0.117971   0.97    8.66
  MeshTools::Generation
    build_cube()                              1  0.0141      0.014134  0.0141      0.014134   1.04    1.04
  Parallel
    allgather()                              24  0.0013      0.000052  0.0018      0.000074   0.09    0.13
    broadcast()                               1  0.0000      0.000009  0.0000      0.000009   0.00    0.00
    max(bool)                              8310  0.0544      0.000007  0.0544      0.000007   3.99    3.99
    max(scalar)                           19698  0.1212      0.000006  0.1212      0.000006   8.90    8.90
    max(vector)                            3763  0.0506      0.000013  0.1410      0.000037   3.71   10.34
    max(vector<bool>)                         3  0.0007      0.000224  0.0008      0.000252   0.05    0.06
    min(bool)                              8158  0.0525      0.000006  0.0525      0.000006   3.85    3.85
    min(scalar)                           61241  0.3972      0.000006  0.3972      0.000006  29.14   29.14
    min(vector)                            3763  0.0518      0.000014  0.1450      0.000039   3.80   10.64
    probe()                                  67  0.0016      0.000024  0.0016      0.000024   0.12    0.12
    receive()                                67  0.0006      0.000009  0.0023      0.000035   0.04    0.17
    send()                                   67  0.0003      0.000004  0.0003      0.000004   0.02    0.02
    send_receive()                           98  0.0011      0.000012  0.0042      0.000043   0.08    0.31
    sum()                                    52  0.0008      0.000016  0.0016      0.000030   0.06    0.11
  Parallel::Request
    wait()                                   70  0.0002      0.000003  0.0002      0.000003   0.01    0.01
  ParallelMesh
    renumber_nodes_and_elements()             2  0.0722      0.036124  0.2182      0.109118   5.30   16.01
  ParmetisPartitioner
    repartition()                             1  0.1673      0.167309  0.1818      0.181825  12.28   13.34
  Partitioner
    set_node_processor_ids()                  1  0.0097      0.009723  0.0265      0.026458   0.71    1.94
    set_parent_processor_ids()                1  0.0007      0.000657  0.0007      0.000657   0.05    0.05
  PetscLinearSolver
    solve()                                   1  0.0069      0.006875  0.0069      0.006875   0.50    0.50
  System
    assemble()                                1  0.0158      0.015771  0.0331      0.033131   1.16    2.43
  Totals:                                106669  1.3630                                     100.00
From: Roy Stogner <roystgnr@ic...> - 2015-01-12 15:31:11

On Mon, 12 Jan 2015, Paul T. Bauman wrote:

> I should've asked: any special compile options? I just did
> --enable-everything and --with-metis=PETSc and ran introduction/ex4
> as is. Was this ParallelMesh?

Yes, that's probably critical:

  --enable-everything --enable-parmesh

The failure seems to be within a ParallelMesh-only code path.
--
Roy
From: Paul T. Bauman <ptbauman@gm...> - 2015-01-12 15:08:20

I should've asked: any special compile options? I just did
--enable-everything and --with-metis=PETSc and ran introduction/ex4 as is.
Was this ParallelMesh?

On Mon, Jan 12, 2015 at 10:05 AM, Paul T. Bauman <ptbauman@...> wrote:

> I pulled the latest master and ran on my workstation with no problem.
> Perhaps a conflicting MPI setup?
>
> On Sat, Jan 10, 2015 at 12:32 AM, Paul T. Bauman <ptbauman@...> wrote:
>
>> I've been using mpich 3.1.x for months. Mostly the 0.9.4 branch. I'll
>> give master a swing this weekend.
>>
>>> On Jan 9, 2015, at 6:06 PM, Roy Stogner <roystgnr@...> wrote:
>>>
>>> I'm seeing the most astonishing bug:
>>>
>>> In devel mode, in introduction_ex4 of all places, miscommunications
>>> are causing parallel runs to hang. Running through gdb I can actually
>>> watch as one process sends "0" and another receives "1".
>>>
>>> I haven't yet ruled out a bug on our part (this is in
>>> send_receive_packed_range code that does asynchronous I/O; maybe there
>>> was a "1" already in the send queue?) but the same code (as well as
>>> all other examples) is fine in dbg and opt modes, as well as with
>>> openmpi. (I'm building mvapich2 to try that now)
>>>
>>> I haven't got any idea how it could be a bug on their part either; in
>>> particular I see the same bug with MPICH2 1.4.1, 1.5, and MPICH 3.1.3.
>>>
>>> Switching compilers from GCC 4.8 to Intel 13.1 doesn't help either.
>>>
>>> So now I'm starting to wonder if there's merely some new MPI header or
>>> linker conflict on the system I'm trying all this on. Are there any
>>> other mpich/mpich2 users out there who are using the libMesh git
>>> master?
>>> --
>>> Roy
>>>
>>> ----------------------------------------------------------------------
>>> Dive into the World of Parallel Programming! The Go Parallel Website,
>>> sponsored by Intel and developed in partnership with Slashdot Media,
>>> is your hub for all things parallel software development, from weekly
>>> thought leadership blogs to news, videos, case studies, tutorials and
>>> more. Take a look and join the conversation now.
>>> http://goparallel.sourceforge.net
>>> _______________________________________________
>>> Libmesh-devel mailing list
>>> Libmesh-devel@...
>>> https://lists.sourceforge.net/lists/listinfo/libmesh-devel
From: Paul T. Bauman <ptbauman@gm...> - 2015-01-12 15:05:35

I pulled the latest master and ran on my workstation with no problem.
Perhaps a conflicting MPI setup?

On Sat, Jan 10, 2015 at 12:32 AM, Paul T. Bauman <ptbauman@...> wrote:

> I've been using mpich 3.1.x for months. Mostly the 0.9.4 branch. I'll
> give master a swing this weekend.
>
>> On Jan 9, 2015, at 6:06 PM, Roy Stogner <roystgnr@...> wrote:
>>
>> I'm seeing the most astonishing bug:
>>
>> In devel mode, in introduction_ex4 of all places, miscommunications
>> are causing parallel runs to hang. Running through gdb I can actually
>> watch as one process sends "0" and another receives "1".
>>
>> I haven't yet ruled out a bug on our part (this is in
>> send_receive_packed_range code that does asynchronous I/O; maybe there
>> was a "1" already in the send queue?) but the same code (as well as
>> all other examples) is fine in dbg and opt modes, as well as with
>> openmpi. (I'm building mvapich2 to try that now)
>>
>> I haven't got any idea how it could be a bug on their part either; in
>> particular I see the same bug with MPICH2 1.4.1, 1.5, and MPICH 3.1.3.
>>
>> Switching compilers from GCC 4.8 to Intel 13.1 doesn't help either.
>>
>> So now I'm starting to wonder if there's merely some new MPI header or
>> linker conflict on the system I'm trying all this on. Are there any
>> other mpich/mpich2 users out there who are using the libMesh git
>> master?
>> --
>> Roy
From: Paul T. Bauman <ptbauman@gm...> - 2015-01-10 05:32:14

I've been using mpich 3.1.x for months. Mostly the 0.9.4 branch. I'll
give master a swing this weekend.

> On Jan 9, 2015, at 6:06 PM, Roy Stogner <roystgnr@...> wrote:
>
> I'm seeing the most astonishing bug:
>
> In devel mode, in introduction_ex4 of all places, miscommunications
> are causing parallel runs to hang. Running through gdb I can actually
> watch as one process sends "0" and another receives "1".
>
> I haven't yet ruled out a bug on our part (this is in
> send_receive_packed_range code that does asynchronous I/O; maybe there
> was a "1" already in the send queue?) but the same code (as well as
> all other examples) is fine in dbg and opt modes, as well as with
> openmpi. (I'm building mvapich2 to try that now)
>
> I haven't got any idea how it could be a bug on their part either; in
> particular I see the same bug with MPICH2 1.4.1, 1.5, and MPICH 3.1.3.
>
> Switching compilers from GCC 4.8 to Intel 13.1 doesn't help either.
>
> So now I'm starting to wonder if there's merely some new MPI header or
> linker conflict on the system I'm trying all this on. Are there any
> other mpich/mpich2 users out there who are using the libMesh git
> master?
> --
> Roy
From: Roy Stogner <roystgnr@ic...> - 2015-01-10 00:26:14

I'm seeing the most astonishing bug:

In devel mode, in introduction_ex4 of all places, miscommunications are
causing parallel runs to hang. Running through gdb I can actually watch
as one process sends "0" and another receives "1".

I haven't yet ruled out a bug on our part (this is in
send_receive_packed_range code that does asynchronous I/O; maybe there
was a "1" already in the send queue?) but the same code (as well as all
other examples) is fine in dbg and opt modes, as well as with openmpi.
(I'm building mvapich2 to try that now)

I haven't got any idea how it could be a bug on their part either; in
particular I see the same bug with MPICH2 1.4.1, 1.5, and MPICH 3.1.3.

Switching compilers from GCC 4.8 to Intel 13.1 doesn't help either.

So now I'm starting to wonder if there's merely some new MPI header or
linker conflict on the system I'm trying all this on. Are there any
other mpich/mpich2 users out there who are using the libMesh git master?
--
Roy
From: Derek Gaston <friedmud@gm...> - 2015-01-09 23:16:38

I seriously feel like I'm going crazy here. I commented out that "if"
statement in TreeNode::find_element() but it still isn't helping.

What's happening is that I have a particular setup that is _unstable_...
i.e. I can run the same code with the same inputs and every once in a
while it fails. Also: this only happens in parallel (2 MPI processes).
By every once in a while... I mean I have to run the code 4-5 _hundred_
times before it fails...

Here's the deal: PointLocator (retrieved from the Mesh object) is NOT
acting the same way every time. I put a print statement inside the
"if (!allowed_subdomains..." statement in TreeNode::find_element(). I
also printed out the point I'm searching for and whether or not the PL
found the point (and what element it found).

Here's what it looks like when it works:

  Point: (x,y,z)=( 0.3, 0.5, 0)
  searching: 0 searching: 1 searching: 2 searching: 3 searching: 4
  searching: 5 searching: 6 searching: 7 searching: 8 searching: 9
  searching: 10 searching: 11 searching: 12 searching: 13 searching: 14
  searching: 15 searching: 16 searching: 17 searching: 18 searching: 19
  searching: 20 searching: 21 searching: 22 searching: 23 searching: 24
  searching: 25 searching: 26 searching: 27 searching: 28 searching: 29
  searching: 30 searching: 31 searching: 32 searching: 33 searching: 34
  searching: 35 searching: 36 searching: 37 searching: 38 searching: 39
  searching: 40 searching: 41 searching: 42
  found elem: 42

and when it fails it looks like this:

  Point: (x,y,z)=( 0.3, 0.5, 0)
  searching: 2 searching: 50 searching: 51 searching: 52
  0 didn't find it!

As you can see, it definitely took a different (weird) path down the
"bins"... and then for some reason just didn't search anything else.
The PL is not completely broken though, because it is able to find other
points in the same run, like so:

  Point: (x,y,z)=( 0.8, 0.5, 0)
  searching: 2 searching: 50 searching: 51 searching: 52 searching: 53
  searching: 54 searching: 55 searching: 56 searching: 9 searching: 10
  searching: 1 searching: 4 searching: 21 searching: 22 searching: 23
  searching: 24 searching: 14 searching: 15 searching: 16 searching: 3
  searching: 17 searching: 18 searching: 19 searching: 20 searching: 5
  searching: 6 searching: 7 searching: 8 searching: 33 searching: 34
  searching: 35 searching: 36 searching: 25 searching: 26 searching: 27
  searching: 28 searching: 29 searching: 30 searching: 31 searching: 32
  searching: 37 searching: 38 searching: 39 searching: 40 searching: 41
  searching: 42 searching: 43 searching: 44
  found elem! 47

Anyone can run this same problem pretty easily (if you have MOOSE, just
update it and rebuild moose_test). You can see a failed run here:
https://www.moosebuild.org/view_result/10982

You can run the test over and over in a loop like I am, like so:

  while ./run_tests --re=line_value_sampler.test -p 2 ; do :; done

That will stop once it fails. I've been able to get it to fail on both
OSX and Linux with both GCC and Clang... so there is a real issue here...
but I seriously can't figure it out... (Also: valgrind doesn't show
anything.)

Any ideas?

Derek

On Fri Jan 09 2015 at 3:26:37 PM Derek Gaston <friedmud@...> wrote:

> I'm investigating an issue with PointLocators... and so I was digging
> through the logic in TreeNode... and came across some weirdness.
>
> There appears to be a logic mismatch between TreeNode::find_element()
> and TreeNode::find_element_in_children().
>
> I can see what the _intention_ was - but I don't think the code
> actually does what the comments say it is doing.
>
> The problem is that find_element() duplicates some of the checks
> already being done in find_element_in_children()... specifically the
> bounding box checks. Because of this... even though the comments in
> find_element_in_children() ultimately say that every element in the
> mesh is searched... that is NOT true!
>
> Here's what's actually happening in find_element_in_children():
>
> 1. Active children whose bounding box contains the point are searched.
> 2. If that fails then all active children are _tried_
>    - HOWEVER: the bounding box check is _repeated_ in find_element()...
>      meaning that the elements won't be searched because the bounding
>      box check will fail (otherwise these children would have been
>      searched in step 1)
>    - THEREFORE: even though find_element_in_children() _tries_ to do an
>      exhaustive search... it really just loops over all other Tree
>      nodes and bails out at the bounding box check in find_element()...
>      NEVER actually testing an element!
> 3. Returns NULL
>
> What this means is that if the bounding boxes have floating point
> tolerance issues (which they do)... you can end up with a situation
> where a point "slips through the cracks" between the bounding box
> checks and the Elem::contains_point() checks...
>
> My proposal to fix this? Well... I think the logic could be changed
> quite a bit in find_element_in_children()... it is trying to do too
> much (it should just recurse instead of doing checks... but I
> understand that it's trying to speed things up by recursing favorably
> into children whose bounding boxes contain the point first).
>
> BUT - a simple fix that doesn't change too much logic is simply to
> remove the "if (this->bounds_point(p) || this->contains_ifems)" line
> from find_element(). If you made it into find_element() and that node
> is active then it means we really _do_ want to search the elements in
> that node... regardless of what the bounding box says!
>
> Finally: I think that something that could speed things up and make
> things more robust is to use a "fuzzy" bounding box. There's no reason
> why they have to be so rigid. We should use floating point fuzzy
> comparisons to see if we lie in the bounding boxes. The bounding boxes
> are simply there to speed up getting down to a set of elements that's
> in the right area... and having floating point tolerance issues keep us
> from traversing down into the right set of elements doesn't make
> sense...
>
> Let me know if I've missed something...
>
> Derek
From: Derek Gaston <friedmud@gm...> - 2015-01-09 20:26:45

I'm investigating an issue with PointLocators... and so I was digging
through the logic in TreeNode... and came across some weirdness.

There appears to be a logic mismatch between TreeNode::find_element() and
TreeNode::find_element_in_children().

I can see what the _intention_ was - but I don't think the code actually
does what the comments say it is doing.

The problem is that find_element() duplicates some of the checks already
being done in find_element_in_children()... specifically the bounding box
checks. Because of this... even though the comments in
find_element_in_children() ultimately say that every element in the mesh
is searched... that is NOT true!

Here's what's actually happening in find_element_in_children():

1. Active children whose bounding box contains the point are searched.
2. If that fails then all active children are _tried_
   - HOWEVER: the bounding box check is _repeated_ in find_element()...
     meaning that the elements won't be searched because the bounding box
     check will fail (otherwise these children would have been searched
     in step 1)
   - THEREFORE: even though find_element_in_children() _tries_ to do an
     exhaustive search... it really just loops over all other Tree nodes
     and bails out at the bounding box check in find_element()... NEVER
     actually testing an element!
3. Returns NULL

What this means is that if the bounding boxes have floating point
tolerance issues (which they do)... you can end up with a situation where
a point "slips through the cracks" between the bounding box checks and
the Elem::contains_point() checks...

My proposal to fix this? Well... I think the logic could be changed quite
a bit in find_element_in_children()... it is trying to do too much (it
should just recurse instead of doing checks... but I understand that it's
trying to speed things up by recursing favorably into children whose
bounding boxes contain the point first).

BUT - a simple fix that doesn't change too much logic is simply to remove
the "if (this->bounds_point(p) || this->contains_ifems)" line from
find_element(). If you made it into find_element() and that node is
active then it means we really _do_ want to search the elements in that
node... regardless of what the bounding box says!

Finally: I think that something that could speed things up and make
things more robust is to use a "fuzzy" bounding box. There's no reason
why they have to be so rigid. We should use floating point fuzzy
comparisons to see if we lie in the bounding boxes. The bounding boxes
are simply there to speed up getting down to a set of elements that's in
the right area... and having floating point tolerance issues keep us from
traversing down into the right set of elements doesn't make sense...

Let me know if I've missed something...

Derek
From: Roy Stogner <roystgnr@ic...> - 2015-01-09 14:20:44

On Thu, 8 Jan 2015, Jed Brown wrote:

> Roy Stogner <roystgnr@...> writes:
>
>> By "neither seems to do much"?
>>
>> I mean that if I set either "-ksp_pc_side right" or "-ksp_norm_type
>> unpreconditioned", then instead of ending up falsely "converged" at
>>
>>   10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
>>
>> I end up at
>>
>>   10 KSP unpreconditioned resid norm 4.683505266751e-14 true resid norm 3.820485023471e-04 ||r(i)||/||b|| 3.359139032108e-03
>
> Interesting. Can you include the full unpreconditioned output? And
> please compare with and without -ksp_gmres_modifiedgramschmidt.

From "./main-opt -ksp_monitor_true_residual -ksp_norm_type unpreconditioned":

   0 KSP unpreconditioned resid norm 1.137340546775e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
   1 KSP unpreconditioned resid norm 1.851221771429e-02 true resid norm 1.850940946785e-02 ||r(i)||/||b|| 1.627428963148e-01
   2 KSP unpreconditioned resid norm 8.058732139350e-04 true resid norm 8.487837216519e-04 ||r(i)||/||b|| 7.462881052281e-03
   3 KSP unpreconditioned resid norm 1.410067346760e-04 true resid norm 3.750759001371e-04 ||r(i)||/||b|| 3.297832836441e-03
   4 KSP unpreconditioned resid norm 3.036297998627e-06 true resid norm 3.296862054294e-04 ||r(i)||/||b|| 2.898746609924e-03
   5 KSP unpreconditioned resid norm 1.628004374290e-07 true resid norm 4.048903454605e-04 ||r(i)||/||b|| 3.559974596954e-03
   6 KSP unpreconditioned resid norm 6.839537794328e-09 true resid norm 2.859920502301e-04 ||r(i)||/||b|| 2.514568315013e-03
   7 KSP unpreconditioned resid norm 3.580827398197e-10 true resid norm 3.424888565589e-04 ||r(i)||/||b|| 3.011313168514e-03
   8 KSP unpreconditioned resid norm 1.371032175862e-11 true resid norm 3.818671006476e-04 ||r(i)||/||b|| 3.357544068312e-03
   9 KSP unpreconditioned resid norm 1.130911552385e-12 true resid norm 2.475276237056e-04 ||r(i)||/||b|| 2.176372102511e-03
  10 KSP unpreconditioned resid norm 4.683505266751e-14 true resid norm 3.820485023471e-04 ||r(i)||/||b|| 3.359139032108e-03

And after adding "-ksp_gmres_modifiedgramschmidt" too:

   0 KSP unpreconditioned resid norm 1.137340546775e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
   1 KSP unpreconditioned resid norm 1.851221771429e-02 true resid norm 1.850940946785e-02 ||r(i)||/||b|| 1.627428963148e-01
   2 KSP unpreconditioned resid norm 8.058732139350e-04 true resid norm 8.290754235563e-04 ||r(i)||/||b|| 7.289596998076e-03
   3 KSP unpreconditioned resid norm 1.409277416992e-04 true resid norm 3.366737464418e-04 ||r(i)||/||b|| 2.960184154134e-03
   4 KSP unpreconditioned resid norm 3.016318237424e-06 true resid norm 3.904608063041e-04 ||r(i)||/||b|| 3.433103720879e-03
   5 KSP unpreconditioned resid norm 1.684322222043e-07 true resid norm 1.987083874964e-04 ||r(i)||/||b|| 1.747131833643e-03
   6 KSP unpreconditioned resid norm 7.043704865544e-09 true resid norm 3.021936047142e-04 ||r(i)||/||b|| 2.657019531846e-03
   7 KSP unpreconditioned resid norm 3.538690849000e-10 true resid norm 3.937474047247e-04 ||r(i)||/||b|| 3.462000944582e-03
   8 KSP unpreconditioned resid norm 2.298664686412e-11 true resid norm 2.516665836236e-04 ||r(i)||/||b|| 2.212763664649e-03
   9 KSP unpreconditioned resid norm 8.918329988448e-13 true resid norm 3.883206450876e-04 ||r(i)||/||b|| 3.414286479003e-03
  10 KSP unpreconditioned resid norm 3.074816936477e-14 true resid norm 2.987382118533e-04 ||r(i)||/||b|| 2.626638192935e-03

> How big is this matrix? Can you write it with "-ksp_view_mat binary"
> and send it to me (or post somewhere)?

Not even 100x100; it's a very-well-simplified version of the real problem.

http://users.ices.utexas.edu/~roystgnr/binaryoutput

> Dammit, these lists of methods are always wrong. We give useful error
> messages if it's not supported. I'll fix/delete these doc problems.

Thanks! In my head this thread has now been upgraded to "useful bug
report", a big step up from "wasting Jed's time".
--
Roy
From: David Knezevic <david.knezevic@ak...> - 2015-01-08 23:03:42

That works well, thanks Barry, thanks Paul. I've made a PR for this:
https://github.com/libMesh/libmesh/pull/426

David
<http://www.akselos.com>

On Wed, Jan 7, 2015 at 11:55 PM, Barry Smith <bsmith@...> wrote:

> If you want to skip the new factorization you can simply call
> KSPSetReusePreconditioner() appropriately in the #else case below
>
> Barry
>
>> On Jan 7, 2015, at 10:16 PM, David Knezevic <david.knezevic@...>
>> wrote:
>>
>> I notice that in PetscLinearSolver<T>::solve we have:
>>
>>   #if PETSC_RELEASE_LESS_THAN(3,5,0)
>>     ierr = KSPSetOperators(_ksp, submat, subprecond,
>>                            this->same_preconditioner ?
>>                            SAME_PRECONDITIONER : DIFFERENT_NONZERO_PATTERN);
>>   #else
>>     ierr = KSPSetOperators(_ksp, submat, subprecond);
>>   #endif
>>
>> I use the this->same_preconditioner flag with a direct solver to get
>> good performance for a sequence of solves with the same matrix and
>> different rhs (by reusing the LU factorization). But it looks like this
>> flag has no effect in PETSc 3.5? Does anyone know the right way to
>> reuse the LU factorization with PETSc 3.5?
>>
>> Thanks,
>> David
From: Jed Brown <jed@je...> - 2015-01-08 22:50:14

Roy Stogner <roystgnr@...> writes:

> By "neither seems to do much"?
>
> I mean that if I set either "-ksp_pc_side right" or "-ksp_norm_type
> unpreconditioned", then instead of ending up falsely "converged" at
>
>   10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
>
> I end up at
>
>   10 KSP unpreconditioned resid norm 4.683505266751e-14 true resid norm 3.820485023471e-04 ||r(i)||/||b|| 3.359139032108e-03

Interesting. Can you include the full unpreconditioned output? And
please compare with and without -ksp_gmres_modifiedgramschmidt.

>>> It's odd that incomplete LU would be singular but incomplete Cholesky
>>> would work fine, too, isn't it?
>>
>> Where do you see ILU being singular while ICC is not?
>
> Right here, assuming the singular preconditioner theory is correct.
> With the default (or with explicitly set) "-pc_type ilu" I see the
> above; with "-pc_type icc" I end up at
>
>   92 KSP preconditioned resid norm 1.322044991650e-13 true resid norm 8.345374951602e-14 ||r(i)||/||b|| 7.337621942051e-13
>
> or
>
>   93 KSP unpreconditioned resid norm 6.339459949689e-14 true resid norm 6.339403586772e-14 ||r(i)||/||b|| 5.573883393805e-13
>
> depending on whether I set "-ksp_norm_type unpreconditioned"
>
>> What do you intend ICC to mean if the matrix is nonsymmetric?
>
> Mu. The matrix is symmetric. Loading it up in Octave I get
> norm(K) = 2.0000, norm(K-K') = 1.1395e-16

How big is this matrix? Can you write it with "-ksp_view_mat binary"
and send it to me (or post somewhere)?

>> What KSP are you using here?
>
> The default (GMRES, IIRC and based on the unchanged results when I set
> it explicitly). I briefly tried BCGS when I got the apparently-mistaken
> impression from
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetNormType.html
> that -ksp_norm_type wouldn't work with GMRES.

Dammit, these lists of methods are always wrong. We give useful error
messages if it's not supported. I'll fix/delete these doc problems.
From: Roy Stogner <roystgnr@ic...> - 2015-01-08 22:39:13

On Thu, 8 Jan 2015, Jed Brown wrote:

> Roy Stogner <roystgnr@...> writes:
>
>> On Thu, 8 Jan 2015, Jed Brown wrote:
>>
>>>> 10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
>>>> number of iterations to solve adjoint: 10, final residual of adjoint solve: 1.58046e-13
>>>
>>> This is usually caused by a singular preconditioner.
>>> http://mid.mail-archive.com/87y6iyzl73.fsf@...
>>
>> Thanks! Is there a workaround that avoids false positives in such
>> cases? I've tried (with PETSc 3.5.2) "-ksp_pc_side right" and
>> "-ksp_type bcgs -ksp_norm_type unpreconditioned", but neither seems to
>> do much.
>
> What do you mean?

By "neither seems to do much"? I mean that if I set either "-ksp_pc_side
right" or "-ksp_norm_type unpreconditioned", then instead of ending up
falsely "converged" at

  10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03

I end up at

  10 KSP unpreconditioned resid norm 4.683505266751e-14 true resid norm 3.820485023471e-04 ||r(i)||/||b|| 3.359139032108e-03

>> It's odd that incomplete LU would be singular but incomplete Cholesky
>> would work fine, too, isn't it?
>
> Where do you see ILU being singular while ICC is not?

Right here, assuming the singular preconditioner theory is correct.
With the default (or with explicitly set) "-pc_type ilu" I see the
above; with "-pc_type icc" I end up at

  92 KSP preconditioned resid norm 1.322044991650e-13 true resid norm 8.345374951602e-14 ||r(i)||/||b|| 7.337621942051e-13

or

  93 KSP unpreconditioned resid norm 6.339459949689e-14 true resid norm 6.339403586772e-14 ||r(i)||/||b|| 5.573883393805e-13

depending on whether I set "-ksp_norm_type unpreconditioned"

> What do you intend ICC to mean if the matrix is nonsymmetric?

Mu. The matrix is symmetric. Loading it up in Octave I get
norm(K) = 2.0000, norm(K-K') = 1.1395e-16

> What KSP are you using here?

The default (GMRES, IIRC and based on the unchanged results when I set
it explicitly). I briefly tried BCGS when I got the apparently-mistaken
impression from
http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/KSP/KSPSetNormType.html
that -ksp_norm_type wouldn't work with GMRES.
--
Roy
From: Jed Brown <jed@je...> - 2015-01-08 22:24:13

Roy Stogner <roystgnr@...> writes:

> On Thu, 8 Jan 2015, Jed Brown wrote:
>
>>> 10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
>>> number of iterations to solve adjoint: 10, final residual of adjoint solve: 1.58046e-13
>>
>> This is usually caused by a singular preconditioner.
>> http://mid.mail-archive.com/87y6iyzl73.fsf@...
>
> Thanks! Is there a workaround that avoids false positives in such
> cases? I've tried (with PETSc 3.5.2) "-ksp_pc_side right" and
> "-ksp_type bcgs -ksp_norm_type unpreconditioned", but neither seems to
> do much.

What do you mean? We use the preconditioned norm by default mostly
because some applications use penalty boundary conditions or poor
scaling, so that the unpreconditioned residual drops by many orders of
magnitude without solving the problem. If you know your formulation is
well-behaved and instead don't trust your preconditioner, then you
should use the unpreconditioned norm. Unfortunately, it costs a lot
more in general to compute both norms.

> It's odd that incomplete LU would be singular but incomplete Cholesky
> would work fine, too, isn't it?

Where do you see ILU being singular while ICC is not? What do you
intend ICC to mean if the matrix is nonsymmetric?

>>> It still seems like we've got to be doing something wrong in
>>> PetscLinearSolver::adjoint_solve(). I haven't tried this myself, but
>>> I'm told that setting up the transpose linearized problem here and
>>> using a straight solve() works fine.
>>
>> What solver configuration is being used? Can you show
>> -ksp_monitor_true_residual output for the original forward solve and
>> the explicitly transposed solve?
>
> Okay, I take back the above. When I try this myself, doing a *linear*
> forward solve from a zero initial guess, the behavior is nearly the
> same as for the linear adjoint solve. (I am a little disturbed by
> "nearly the same" for a symmetric matrix.)

Are you sure the matrix is exactly symmetric?
What KSP are you using here?

> Adjoint solve:
>
>    0 KSP preconditioned resid norm 5.893300515271e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
>    1 KSP preconditioned resid norm 1.323095860276e-01 true resid norm 1.854948903571e-02 ||r(i)||/||b|| 1.630952935627e-01
>    2 KSP preconditioned resid norm 2.957976209084e-03 true resid norm 9.076848423484e-04 ||r(i)||/||b|| 7.980765698734e-03
>    3 KSP preconditioned resid norm 4.630654713249e-04 true resid norm 3.394455698647e-04 ||r(i)||/||b|| 2.984555248884e-03
>    4 KSP preconditioned resid norm 1.042927953583e-05 true resid norm 3.082072165862e-04 ||r(i)||/||b|| 2.709893861255e-03
>    5 KSP preconditioned resid norm 7.161971654746e-07 true resid norm 3.084038310662e-04 ||r(i)||/||b|| 2.711622582529e-03
>    6 KSP preconditioned resid norm 3.126650757674e-08 true resid norm 3.084036219553e-04 ||r(i)||/||b|| 2.711620743934e-03
>    7 KSP preconditioned resid norm 1.102895578517e-09 true resid norm 3.084021853448e-04 ||r(i)||/||b|| 2.711608112620e-03
>    8 KSP preconditioned resid norm 5.540930099957e-11 true resid norm 3.084021840854e-04 ||r(i)||/||b|| 2.711608101548e-03
>    9 KSP preconditioned resid norm 2.937449992531e-12 true resid norm 3.084021825944e-04 ||r(i)||/||b|| 2.711608088437e-03
>   10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
>
> Linear forward solve:
>
>    0 KSP preconditioned resid norm 5.893175435351e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
>    1 KSP preconditioned resid norm 1.323059758093e-01 true resid norm 1.854739741057e-02 ||r(i)||/||b|| 1.630769030715e-01
>    2 KSP preconditioned resid norm 2.926513590091e-03 true resid norm 9.072530459119e-04 ||r(i)||/||b|| 7.976969153912e-03
>    3 KSP preconditioned resid norm 4.631266360016e-04 true resid norm 3.185176705725e-04 ||r(i)||/||b|| 2.800547922745e-03
>    4 KSP preconditioned resid norm 1.075300967442e-05 true resid norm 2.824015720242e-04 ||r(i)||/||b|| 2.482999246135e-03
>    5 KSP preconditioned resid norm 5.982748177679e-07 true resid norm 2.825507661757e-04 ||r(i)||/||b|| 2.484311026956e-03
>    6 KSP preconditioned resid norm 2.632996665496e-08 true resid norm 2.825588496190e-04 ||r(i)||/||b|| 2.484382100156e-03
>    7 KSP preconditioned resid norm 2.222034839878e-09 true resid norm 2.825586756678e-04 ||r(i)||/||b|| 2.484380570700e-03
>    8 KSP preconditioned resid norm 1.814426836084e-10 true resid norm 2.825586967601e-04 ||r(i)||/||b|| 2.484380756153e-03
>    9 KSP preconditioned resid norm 2.844498948759e-12 true resid norm 2.825586987199e-04 ||r(i)||/||b|| 2.484380773384e-03
>   10 KSP preconditioned resid norm 8.360209016600e-14 true resid norm 2.825586987717e-04 ||r(i)||/||b|| 2.484380773840e-03
>
> *Repeated* linear forward solves (using the solution of one as initial
> iterate for the next) work, so perhaps NewtonSolver or PetscDiffSolver
> was what was making up for the bad preconditioning.
> --
> Roy
From: Roy Stogner <roystgnr@ic...> - 2015-01-08 22:14:54

On Thu, 8 Jan 2015, Jed Brown wrote:

>> 10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
>> number of iterations to solve adjoint: 10, final residual of adjoint solve: 1.58046e-13
>
> This is usually caused by a singular preconditioner.
> http://mid.mail-archive.com/87y6iyzl73.fsf@...

Thanks! Is there a workaround that avoids false positives in such
cases? I've tried (with PETSc 3.5.2) "-ksp_pc_side right" and
"-ksp_type bcgs -ksp_norm_type unpreconditioned", but neither seems to
do much.

It's odd that incomplete LU would be singular but incomplete Cholesky
would work fine, too, isn't it? On the other hand, this is nothing like
a diagonally-dominant matrix, so I wouldn't be surprised to see either
of those screwing up in interesting ways.

>> It still seems like we've got to be doing something wrong in
>> PetscLinearSolver::adjoint_solve(). I haven't tried this myself, but
>> I'm told that setting up the transpose linearized problem here and
>> using a straight solve() works fine.
>
> What solver configuration is being used? Can you show
> -ksp_monitor_true_residual output for the original forward solve and
> the explicitly transposed solve?

Okay, I take back the above. When I try this myself, doing a *linear*
forward solve from a zero initial guess, the behavior is nearly the same
as for the linear adjoint solve.
(I am a little disturbed by "nearly the same" for a symmetric matrix)

Adjoint solve:

  0 KSP preconditioned resid norm 5.893300515271e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.323095860276e-01 true resid norm 1.854948903571e-02 ||r(i)||/||b|| 1.630952935627e-01
  2 KSP preconditioned resid norm 2.957976209084e-03 true resid norm 9.076848423484e-04 ||r(i)||/||b|| 7.980765698734e-03
  3 KSP preconditioned resid norm 4.630654713249e-04 true resid norm 3.394455698647e-04 ||r(i)||/||b|| 2.984555248884e-03
  4 KSP preconditioned resid norm 1.042927953583e-05 true resid norm 3.082072165862e-04 ||r(i)||/||b|| 2.709893861255e-03
  5 KSP preconditioned resid norm 7.161971654746e-07 true resid norm 3.084038310662e-04 ||r(i)||/||b|| 2.711622582529e-03
  6 KSP preconditioned resid norm 3.126650757674e-08 true resid norm 3.084036219553e-04 ||r(i)||/||b|| 2.711620743934e-03
  7 KSP preconditioned resid norm 1.102895578517e-09 true resid norm 3.084021853448e-04 ||r(i)||/||b|| 2.711608112620e-03
  8 KSP preconditioned resid norm 5.540930099957e-11 true resid norm 3.084021840854e-04 ||r(i)||/||b|| 2.711608101548e-03
  9 KSP preconditioned resid norm 2.937449992531e-12 true resid norm 3.084021825944e-04 ||r(i)||/||b|| 2.711608088437e-03
 10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03

Linear forward solve:

  0 KSP preconditioned resid norm 5.893175435351e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.323059758093e-01 true resid norm 1.854739741057e-02 ||r(i)||/||b|| 1.630769030715e-01
  2 KSP preconditioned resid norm 2.926513590091e-03 true resid norm 9.072530459119e-04 ||r(i)||/||b|| 7.976969153912e-03
  3 KSP preconditioned resid norm 4.631266360016e-04 true resid norm 3.185176705725e-04 ||r(i)||/||b|| 2.800547922745e-03
  4 KSP preconditioned resid norm 1.075300967442e-05 true resid norm 2.824015720242e-04 ||r(i)||/||b|| 2.482999246135e-03
  5 KSP preconditioned resid norm 5.982748177679e-07 true resid norm 2.825507661757e-04 ||r(i)||/||b|| 2.484311026956e-03
  6 KSP preconditioned resid norm 2.632996665496e-08 true resid norm 2.825588496190e-04 ||r(i)||/||b|| 2.484382100156e-03
  7 KSP preconditioned resid norm 2.222034839878e-09 true resid norm 2.825586756678e-04 ||r(i)||/||b|| 2.484380570700e-03
  8 KSP preconditioned resid norm 1.814426836084e-10 true resid norm 2.825586967601e-04 ||r(i)||/||b|| 2.484380756153e-03
  9 KSP preconditioned resid norm 2.844498948759e-12 true resid norm 2.825586987199e-04 ||r(i)||/||b|| 2.484380773384e-03
 10 KSP preconditioned resid norm 8.360209016600e-14 true resid norm 2.825586987717e-04 ||r(i)||/||b|| 2.484380773840e-03

*repeated* linear forward solves (using the solution of one as initial
iterate for the next) work, so perhaps NewtonSolver or PetscDiffSolver
was what was making up for the bad preconditioning.
---
Roy
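The monitoring and workaround options discussed in this exchange are ordinary PETSc runtime options; a sketch of how they would be passed (the executable name `./app` is a placeholder, and whether any of these helps for a given problem is exactly what is in question above):

```shell
# Print both the preconditioned and the true (unpreconditioned) residual
# norm at every Krylov iteration, so discrepancies like the one above show up:
./app -ksp_monitor_true_residual

# Workarounds Roy reports trying with PETSc 3.5.2:
./app -ksp_pc_side right                            # right preconditioning: the KSP convergence test then sees the true residual
./app -ksp_type bcgs -ksp_norm_type unpreconditioned  # BiCGStab, converging on the unpreconditioned norm
```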
From: Jed Brown <jed@je...> - 2015-01-08 22:09:34

John Peterson <jwpeterson@...> writes:

> Are you definitely allowed to pass the same matrix for both "A" and
> the preconditioner when you call KSPSolveTranspose()?

Yes, at least with KSPs capable of transpose solves.

Can you send -ksp_view along with the output requested in my last email?
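For context, the usage Jed is confirming looks roughly like the following. This is a minimal sketch, not libMesh's actual adjoint_solve() code: the function name is invented, error checking is trimmed, and it assumes the PETSc 3.5+ two-matrix KSPSetOperators() signature. Building it requires a PETSc installation.

```c
#include <petscksp.h>

/* Sketch: solve A^T x = b, passing the same Mat as both the operator
 * and the preconditioning matrix, as discussed above. */
PetscErrorCode solve_transpose_sketch(Mat A, Vec b, Vec x)
{
  KSP ksp;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp); CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A); CHKERRQ(ierr);   /* same matrix for "A" and the PC */
  ierr = KSPSetFromOptions(ksp); CHKERRQ(ierr);       /* honors -ksp_... and -pc_... options */
  ierr = KSPSolveTranspose(ksp, b, x); CHKERRQ(ierr); /* transpose solve; only some KSP types support this */
  ierr = KSPDestroy(&ksp); CHKERRQ(ierr);
  return 0;
}
```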
From: John Peterson <jwpeterson@gm...> - 2015-01-08 21:54:43

On Thu, Jan 8, 2015 at 1:31 PM, Roy Stogner <roystgnr@...> wrote:
>
> Copying discussion from libmesh-users, both because it looks like a
> library rather than a user-level problem and because I'm hoping one
> of our PETSc expert lurkers will chime in.
>
> Summary: running a libMesh adjoint_solve() on a particular coupled
> multiphysics system on a small mesh claims to solve the adjoint system
> down past 1e-10 tolerance, but manually evaluating the residual
> magnitude afterwards shows more like 1e-4.  Turning off
> preconditioning eliminates the discrepancy.
>
> ---------- Forwarded message ----------
> Date: Wed, 7 Jan 2015 17:06:16 -0600 (CST)
> From: Roy Stogner <roystgnr@...>
> To: libmesh-users@...
> Subject: Re: [Libmesh-users] Adjoint Solve
>
> On Wed, 7 Jan 2015, Roy Stogner wrote:
>
>> Was using PETSc without any good solver packages built in, but
>> "-mat_view ::ascii_matlab" and octave were enough to catch the red
>> herring; the matrix is fine.
>
> The matrix is fine, but the preconditioners aren't.  If I run with
> "-pc_type jacobi" (how does this not NaN with your matrix with zeros
> on the diagonal?) or just plain "-pc_type none", then everything's
> fine:
>
> ~ *~*~*~*~*~*~*~*~ adjoint solve start ~*~*~*~*~*~*~*~*~
>
> number of iterations to solve adjoint: 87
> final residual of adjoint solve: 8.65357e-14
>
> ~ *~*~*~*~*~*~*~*~ adjoint solve end ~*~*~*~*~*~*~*~*~
>
> ---- herp derp ----
>
> adjoint system residual (discrete L2): 8.65326e-14
> adjoint system residual (L2, all): 2.62094e-14
> adjoint system residual (L2, 0): 2.06412e-14
> adjoint system residual (L2, 1): 1.61515e-14
>
> Are we doing something wrong with the preconditioner code in
> PetscLinearSolver::adjoint_solve()?  My first thought: maybe the

Are you definitely allowed to pass the same matrix for both "A" and
the preconditioner when you call KSPSolveTranspose()?

--
John
From: Jed Brown <jed@je...> - 2015-01-08 21:40:11

Roy Stogner <roystgnr@...> writes:

> On Thu, 8 Jan 2015, Jed Brown wrote:
>
>> Can you provide output with -ksp_monitor_true_residual?  Are you
>> comparing a preconditioned residual to an unpreconditioned residual?
>
> Yes, and yes:
>
>   0 KSP preconditioned resid norm 5.893300515271e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
>   1 KSP preconditioned resid norm 1.323095860276e-01 true resid norm 1.854948903571e-02 ||r(i)||/||b|| 1.630952935627e-01
>   2 KSP preconditioned resid norm 2.957976209084e-03 true resid norm 9.076848423484e-04 ||r(i)||/||b|| 7.980765698734e-03
>   3 KSP preconditioned resid norm 4.630654713249e-04 true resid norm 3.394455698647e-04 ||r(i)||/||b|| 2.984555248884e-03
>   4 KSP preconditioned resid norm 1.042927953583e-05 true resid norm 3.082072165862e-04 ||r(i)||/||b|| 2.709893861255e-03
>   5 KSP preconditioned resid norm 7.161971654746e-07 true resid norm 3.084038310662e-04 ||r(i)||/||b|| 2.711622582529e-03
>   6 KSP preconditioned resid norm 3.126650757674e-08 true resid norm 3.084036219553e-04 ||r(i)||/||b|| 2.711620743934e-03
>   7 KSP preconditioned resid norm 1.102895578517e-09 true resid norm 3.084021853448e-04 ||r(i)||/||b|| 2.711608112620e-03
>   8 KSP preconditioned resid norm 5.540930099957e-11 true resid norm 3.084021840854e-04 ||r(i)||/||b|| 2.711608101548e-03
>   9 KSP preconditioned resid norm 2.937449992531e-12 true resid norm 3.084021825944e-04 ||r(i)||/||b|| 2.711608088437e-03
>  10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
> number of iterations to solve adjoint: 10 final residual of adjoint solve: 1.58046e-13

This is usually caused by a singular preconditioner.
http://mid.mail-archive.com/87y6iyzl73.fsf@...

> It still seems like we've got to be doing something wrong in
> PetscLinearSolver::adjoint_solve().  I haven't tried this myself, but
> I'm told that setting up the transpose linearized problem here and
> using a straight solve() works fine.

What solver configuration is being used?
Can you show -ksp_monitor_true_residual output for the original
forward solve and the explicitly transposed solve?

>> Note that penalty boundary conditions render unpreconditioned residuals
>> nearly useless.
>
> Agreed; fortunately she's using DirichletBoundary constraints.  For
> this tiny boiled down version of the problem the norm of the matrix is
> only 32.
> ---
> Roy
From: Roy Stogner <roystgnr@ic...> - 2015-01-08 20:51:37

On Thu, 8 Jan 2015, Jed Brown wrote:

> Can you provide output with -ksp_monitor_true_residual?  Are you
> comparing a preconditioned residual to an unpreconditioned residual?

Yes, and yes:

  0 KSP preconditioned resid norm 5.893300515271e-01 true resid norm 1.137340546775e-01 ||r(i)||/||b|| 1.000000000000e+00
  1 KSP preconditioned resid norm 1.323095860276e-01 true resid norm 1.854948903571e-02 ||r(i)||/||b|| 1.630952935627e-01
  2 KSP preconditioned resid norm 2.957976209084e-03 true resid norm 9.076848423484e-04 ||r(i)||/||b|| 7.980765698734e-03
  3 KSP preconditioned resid norm 4.630654713249e-04 true resid norm 3.394455698647e-04 ||r(i)||/||b|| 2.984555248884e-03
  4 KSP preconditioned resid norm 1.042927953583e-05 true resid norm 3.082072165862e-04 ||r(i)||/||b|| 2.709893861255e-03
  5 KSP preconditioned resid norm 7.161971654746e-07 true resid norm 3.084038310662e-04 ||r(i)||/||b|| 2.711622582529e-03
  6 KSP preconditioned resid norm 3.126650757674e-08 true resid norm 3.084036219553e-04 ||r(i)||/||b|| 2.711620743934e-03
  7 KSP preconditioned resid norm 1.102895578517e-09 true resid norm 3.084021853448e-04 ||r(i)||/||b|| 2.711608112620e-03
  8 KSP preconditioned resid norm 5.540930099957e-11 true resid norm 3.084021840854e-04 ||r(i)||/||b|| 2.711608101548e-03
  9 KSP preconditioned resid norm 2.937449992531e-12 true resid norm 3.084021825944e-04 ||r(i)||/||b|| 2.711608088437e-03
 10 KSP preconditioned resid norm 1.580464771307e-13 true resid norm 3.084021824157e-04 ||r(i)||/||b|| 2.711608086867e-03
number of iterations to solve adjoint: 10 final residual of adjoint solve: 1.58046e-13

It still seems like we've got to be doing something wrong in
PetscLinearSolver::adjoint_solve().  I haven't tried this myself, but
I'm told that setting up the transpose linearized problem here and
using a straight solve() works fine.

> Note that penalty boundary conditions render unpreconditioned residuals
> nearly useless.

Agreed; fortunately she's using DirichletBoundary constraints.  For
this tiny boiled down version of the problem the norm of the matrix is
only 32.
---
Roy
From: Jed Brown <jed@je...> - 2015-01-08 20:40:58

Roy Stogner <roystgnr@...> writes:

> Copying discussion from libmesh-users, both because it looks like a
> library rather than a user-level problem and because I'm hoping one
> of our PETSc expert lurkers will chime in.
>
> Summary: running a libMesh adjoint_solve() on a particular coupled
> multiphysics system on a small mesh claims to solve the adjoint system
> down past 1e-10 tolerance, but manually evaluating the residual
> magnitude afterwards shows more like 1e-4.  Turning off
> preconditioning eliminates the discrepancy.

Can you provide output with -ksp_monitor_true_residual?  Are you
comparing a preconditioned residual to an unpreconditioned residual?
Note that penalty boundary conditions render unpreconditioned residuals
nearly useless.

> ---------- Forwarded message ----------
> Date: Wed, 7 Jan 2015 17:06:16 -0600 (CST)
> From: Roy Stogner <roystgnr@...>
> To: libmesh-users@...
> Subject: Re: [Libmesh-users] Adjoint Solve
>
> On Wed, 7 Jan 2015, Roy Stogner wrote:
>
>> Was using PETSc without any good solver packages built in, but
>> "-mat_view ::ascii_matlab" and octave were enough to catch the red
>> herring; the matrix is fine.
>
> The matrix is fine, but the preconditioners aren't.  If I run with
> "-pc_type jacobi" (how does this not NaN with your matrix with zeros
> on the diagonal?) or just plain "-pc_type none", then everything's
> fine:
>
> ~ *~*~*~*~*~*~*~*~ adjoint solve start ~*~*~*~*~*~*~*~*~
>
> number of iterations to solve adjoint: 87
> final residual of adjoint solve: 8.65357e-14
>
> ~ *~*~*~*~*~*~*~*~ adjoint solve end ~*~*~*~*~*~*~*~*~
>
> ---- herp derp ----
>
> adjoint system residual (discrete L2): 8.65326e-14
> adjoint system residual (L2, all): 2.62094e-14
> adjoint system residual (L2, 0): 2.06412e-14
> adjoint system residual (L2, 1): 1.61515e-14
>
> Are we doing something wrong with the preconditioner code in
> PetscLinearSolver::adjoint_solve()?  My first thought: maybe the
> preconditioner isn't getting transposed properly, so asymmetric
> preconditioners like ILU show inconsistent behavior?  You've boiled
> down the problem to a symmetric matrix, but IIRC ILU application still
> won't be symmetric.  If we *tell* PETSc to precondition for a
> symmetric matrix, with "-pc_type icc", then everything still works
> fine...
> ---
> Roy
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming! The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net
> _______________________________________________
> Libmesh-devel mailing list
> Libmesh-devel@...
> https://lists.sourceforge.net/lists/listinfo/libmesh-devel
From: Roy Stogner <roystgnr@ic...> - 2015-01-08 20:31:13

Copying discussion from libmesh-users, both because it looks like a
library rather than a user-level problem and because I'm hoping one
of our PETSc expert lurkers will chime in.

Summary: running a libMesh adjoint_solve() on a particular coupled
multiphysics system on a small mesh claims to solve the adjoint system
down past 1e-10 tolerance, but manually evaluating the residual
magnitude afterwards shows more like 1e-4.  Turning off
preconditioning eliminates the discrepancy.

---------- Forwarded message ----------
Date: Wed, 7 Jan 2015 17:06:16 -0600 (CST)
From: Roy Stogner <roystgnr@...>
To: libmesh-users@...
Subject: Re: [Libmesh-users] Adjoint Solve

On Wed, 7 Jan 2015, Roy Stogner wrote:

> Was using PETSc without any good solver packages built in, but
> "-mat_view ::ascii_matlab" and octave were enough to catch the red
> herring; the matrix is fine.

The matrix is fine, but the preconditioners aren't.  If I run with
"-pc_type jacobi" (how does this not NaN with your matrix with zeros
on the diagonal?) or just plain "-pc_type none", then everything's
fine:

~ *~*~*~*~*~*~*~*~ adjoint solve start ~*~*~*~*~*~*~*~*~

number of iterations to solve adjoint: 87
final residual of adjoint solve: 8.65357e-14

~ *~*~*~*~*~*~*~*~ adjoint solve end ~*~*~*~*~*~*~*~*~

---- herp derp ----

adjoint system residual (discrete L2): 8.65326e-14
adjoint system residual (L2, all): 2.62094e-14
adjoint system residual (L2, 0): 2.06412e-14
adjoint system residual (L2, 1): 1.61515e-14

Are we doing something wrong with the preconditioner code in
PetscLinearSolver::adjoint_solve()?  My first thought: maybe the
preconditioner isn't getting transposed properly, so asymmetric
preconditioners like ILU show inconsistent behavior?  You've boiled
down the problem to a symmetric matrix, but IIRC ILU application still
won't be symmetric.  If we *tell* PETSc to precondition for a
symmetric matrix, with "-pc_type icc", then everything still works
fine...
---
Roy
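The preconditioner variations Roy compares above map onto standard PETSc runtime options; a sketch, with the executable name `./app` as a placeholder and the observed behavior from this thread summarized in comments:

```shell
./app -pc_type none    # no preconditioning: reported and actual residuals agree
./app -pc_type jacobi  # diagonal scaling: also agrees, per the output above
./app -pc_type icc     # incomplete Cholesky (symmetric application): works here
./app -pc_type ilu     # incomplete LU (asymmetric application): shows the discrepancy
```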
From: Barry Smith <bsmith@mc...> - 2015-01-08 04:55:27

If you want to skip the new factorization you can simply call
KSPSetReusePreconditioner() appropriately in the #else case below.

Barry

> On Jan 7, 2015, at 10:16 PM, David Knezevic <david.knezevic@...> wrote:
>
> I notice that in PetscLinearSolver<T>::solve we have:
>
> #if PETSC_RELEASE_LESS_THAN(3,5,0)
>   ierr = KSPSetOperators(_ksp, submat, subprecond,
>                          this->same_preconditioner ? SAME_PRECONDITIONER : DIFFERENT_NONZERO_PATTERN);
> #else
>   ierr = KSPSetOperators(_ksp, submat, subprecond);
> #endif
>
> I use the this->same_preconditioner flag with a direct solver to get
> good performance for a sequence of solves with the same matrix and
> different rhs (by reusing the LU factorization).  But it looks like
> this flag has no effect in PETSc 3.5?  Does anyone know the right way
> to reuse the LU factorization with PETSc 3.5?
>
> Thanks,
> David
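Barry's suggestion for the #else branch could look roughly like the following. This is a sketch, not the actual libMesh patch: the function wrapper is invented, error checking is trimmed, and it assumes the PETSc 3.5+ API (a PETSc build is required to compile it).

```c
#include <petscksp.h>

/* Sketch: in PETSc >= 3.5, KSPSetOperators() no longer takes a
 * MatStructure flag; preconditioner reuse is requested separately. */
static PetscErrorCode set_operators_sketch(KSP ksp, Mat submat, Mat subprecond,
                                           PetscBool same_preconditioner)
{
  PetscErrorCode ierr;
  ierr = KSPSetOperators(ksp, submat, subprecond); CHKERRQ(ierr);
  /* Replaces the old SAME_PRECONDITIONER behavior: keep the already-built
   * PC (e.g. an LU factorization) across subsequent solves until this is
   * reset to PETSC_FALSE. */
  ierr = KSPSetReusePreconditioner(ksp, same_preconditioner); CHKERRQ(ierr);
  return 0;
}
```

This keeps the existing factorization for repeated solves with the same matrix and different right-hand sides, which is the use case David describes.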