From: Roy Stogner <roystgnr@ic...>  2011-12-15 03:38:52

We're hitting this with my plain standard --enable-everything build, too:

...

Basis dimension: 19
Performing RB solves on training set
Maximum (absolute) error bound is 0.199127

Performing truth solve at parameter:
mu[0] = 0.5
mu[1] = 1

Enriching the RB space
Updating RB matrices

Basis dimension: 20
Performing RB solves on training set
Maximum (absolute) error bound is 0.198364

Maximum number of basis functions reached: Nmax = 20.
Perform one more Greedy iteration for error bounds.
Performing truth solve at parameter:
mu[0] = 1
mu[1] = 0.5

Enriching the RB space
Updating RB matrices

Basis dimension: 20
Performing RB solves on training set
Maximum (absolute) error bound is 0.198364

Extra Greedy iteration finished.
In RBEvaluation::write_offline_data_to_files, directory eim_data already exists, overwriting contents.
Assertion `theta_q_f[i] != NULL' failed.
[0] src/reduced_basis/rb_theta_expansion.C, line 89, compiled Dec 14 2011 at 20:58:16
terminate called after throwing an instance of 'libMesh::LogicError'
what(): Error in libMesh internal logic
make[2]: *** [run] Aborted
make[2]: Leaving directory `/workspace/buildbot/slave/libmeshtrunk/build/examples/reduced_basis/reduced_basis_ex4'

From: David Knezevic <dknezevic@se...>  2011-12-15 13:37:02

Hmm, I can't seem to reproduce this error...?
From: Roy Stogner <roystgnr@ic...>  2011-12-15 14:36:10

I can't seem to reproduce it easily, myself! BuildBot is showing a failure every time with my default build:

loadmodules intel tbb mpich2/1.2.1 mklpecos petsc slepc trilinos glpk vtk && ./configure --enable-everything

but it's showing success with literally every other build I've configured it to try. They're all being run with "LIBMESH_RUN='mpirun -np 2'", even.

The failing build and the success build seem to differ starting early:

FAILURE:

Basis dimension: 5
Performing RB solves on training set
Maximum (absolute) error bound is 0.946406

Performing truth solve at parameter:
mu[0] = 0
mu[1] = 1

SUCCESS:

Basis dimension: 5
Performing RB solves on training set
Maximum (absolute) error bound is 0.946406

Performing truth solve at parameter:
mu[0] = 1
mu[1] = 0

but I've no idea what causes the difference.

Roy

From: John Peterson <jwpeterson@gm...>  2011-12-15 15:35:37

On Thu, Dec 15, 2011 at 7:35 AM, Roy Stogner <roystgnr@...> wrote:
>
> but I've no idea what causes the difference.

Hmm... just a thought: the RB stuff uses some random number generation stuff.

Perhaps this could explain different greedy parameter selection order on different systems, but not outright failure?

John

From: David Knezevic <dknezevic@se...>  2011-12-15 15:57:23

John's right that there are random numbers in general, but in that example the training sets are not randomly generated.

But the way the algorithm chooses the next parameter is by finding the one with the maximum error bound, and as you can see this problem has the same error bound at two different parameter values, so rounding error would determine which one you end up with. So I think the "early" difference is not surprising.

I don't see where the NULL pointer is coming from in the failure case, though...

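The tie-breaking effect David describes can be illustrated with a minimal Python sketch. This is not libMesh code: the training set, the TERMS table, and all function names are hypothetical, chosen so that two parameters have the same exact error bound (0.6) and only the floating-point summation order differs between the two imagined "builds".

```python
# Hypothetical data: per-parameter terms whose exact sum is the same (0.6)
# for the two symmetric parameters; only the order of the terms differs.
TERMS = {
    (0.0, 1.0): [0.1, 0.2, 0.3],
    (1.0, 0.0): [0.3, 0.2, 0.1],
    (0.5, 0.5): [0.1, 0.1, 0.1],
}
TRAINING_SET = list(TERMS)


def accumulate(values):
    """Plain left-to-right floating-point sum (no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def bound_forward(mu):
    """Error bound as one build might compute it: terms summed in order."""
    return accumulate(TERMS[mu])


def bound_reverse(mu):
    """Same bound with the opposite summation order (e.g. another compiler)."""
    return accumulate(reversed(TERMS[mu]))


def greedy_pick(training_set, error_bound):
    """Return the parameter with the largest error bound; first maximum wins."""
    best_mu, best_val = None, float("-inf")
    for mu in training_set:
        val = error_bound(mu)
        if val > best_val:
            best_mu, best_val = mu, val
    return best_mu


if __name__ == "__main__":
    for name, bound in (("forward", bound_forward), ("reverse", bound_reverse)):
        mu = greedy_pick(TRAINING_SET, bound)
        # Both bounds look identical when printed with six significant digits.
        print(f"{name}: bound = {bound(mu):.6g}, picked mu = {mu}")
```

With these made-up numbers the two summation orders give bounds that differ only in the last bit, so the two "builds" select different parameters even though the printed maximum error bound is the same.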
From: Roy Stogner <roystgnr@ic...>  2011-12-15 16:13:21

On Thu, 15 Dec 2011, John Peterson wrote:

> On Thu, Dec 15, 2011 at 7:35 AM, Roy Stogner <roystgnr@...> wrote:
>>
>> but I've no idea what causes the difference.
>
> Hmm... just a thought: the RB stuff uses some random number generation stuff.
>
> Perhaps this could explain different greedy parameter selection order
> on different systems, but not outright failure?

That could explain why we're triggering failure in some cases but not in others, though.

Roy

From: Cody Permann <codypermann@gm...>  2011-12-15 16:51:51

Random number generation has caused us many issues in MOOSE as well. Long ago we bundled in a free platform-independent random number generator, which resolved all of these issues. I don't know if we'd want to go to those extremes in the libMesh library, but it has worked well for us.

Cody

Sent from my evil iPhone

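Cody does not say which generator MOOSE bundles, so the following is only a sketch of the general idea: a generator written purely in integer arithmetic produces the same stream on every platform for a given seed. The class name PortableLCG and its methods are hypothetical, and the multiplier and increment are the 64-bit linear congruential constants Knuth gives for MMIX, used here as a stand-in. Since David notes that the training sets in this example are not randomly generated, this would not necessarily have changed the failure discussed in this thread.

```python
class PortableLCG:
    """Illustrative 64-bit linear congruential generator (not MOOSE's code).

    Uses only integer arithmetic masked to 64 bits, so the sequence for a
    given seed does not depend on the platform or the system libraries.
    """

    MULTIPLIER = 6364136223846793005  # Knuth's MMIX LCG multiplier
    INCREMENT = 1442695040888963407   # Knuth's MMIX LCG increment
    MASK = (1 << 64) - 1

    def __init__(self, seed):
        self.state = seed & self.MASK

    def next_u64(self):
        """Advance the state and return it as an unsigned 64-bit integer."""
        self.state = (self.state * self.MULTIPLIER + self.INCREMENT) & self.MASK
        return self.state

    def uniform(self):
        """Return a float in [0, 1) built from the top 53 bits of the state."""
        return (self.next_u64() >> 11) / float(1 << 53)


if __name__ == "__main__":
    rng = PortableLCG(seed=2011)
    # Hypothetical use: draw a two-component training parameter in [0, 1)^2.
    print([rng.uniform() for _ in range(2)])
```

A plain LCG like this is a weak generator statistically; the point of the sketch is only reproducibility, which is the property Cody credits with resolving the cross-platform issues.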
From: Roy Stogner <roystgnr@ic...>  2012-01-25 20:59:36

This bug has gone away, and I reluctantly have to ask: did it actually get identified and fixed, or did it just randomly stop manifesting when other changes were made? ;)

Roy

From: David Knezevic <dknezevic@se...>  2012-01-25 21:16:13

OK, interesting. It wasn't identified and fixed as far as I know (I wasn't able to reproduce the bug on my system)... sounds like it randomly stopped manifesting!

Dave