From: Cody P. <cod...@gm...> - 2014-07-07 17:59:26
Devs,

We updated the MOOSE users to a new revision of libMesh last week and picked up a Valgrind error in the process. John spent a good portion of the day on Thursday just trying to reproduce it on one of our development Linux boxes, without success. Towards the end of the day he logged into one of the build boxes (hpcbuild5), where we are actually seeing the error, and was finally able to replicate it on that box with his own build of libMesh+MOOSE. What differs between the two boxes is still under investigation, and I hope to be able to reproduce the error on the rest of our boxes. I took his progress one step further by reproducing the error using pure libMesh.

As of now, I can grab a fresh copy of libMesh, configure and build with the normal set of options that we use for MOOSE, and reproduce the error with "adaptivity example 5". I'll paste the error here in case anyone knows what has changed in this code without me rolling up my sleeves. I'll continue to dig for more answers. In fact, I just noticed that we are still using the default-comm-world option on our build boxes, which isn't right.

Thanks,
Cody

Here's my configure from config.log, which admittedly is a huge mess:

$ ../configure --with-methods=opt oprof dbg --prefix=/home/moosetest/permcj/moose/scripts/../libmesh/installed --enable-silent-rules --enable-openmp --enable-shared --disable-static --enable-unique-id --disable-warnings --enable-default-comm-world --disable-cxx11 CXX=mpicxx CC=mpicc FC=mpif90 F77=mpif77 METHODS=opt oprof dbg PETSC_DIR=/opt/moose/petsc/mpich_petsc-3.4.3/gcc-opt --disable-netcdf-4 --disable-testsets CXX=mpicxx CC=mpicc F77=mpif77 FC=mpif90 CPPFLAGS= LIBS= --no-create --no-recursion

Valgrind Error:
[moosetest][~/permcj/moose/libmesh/installed/examples/adaptivity/ex5]> valgrind --leak-check=full --track-origins=yes ./example-opt -init_timestep 0 -n_timesteps 5
==28408== Memcheck, a memory error detector
==28408== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==28408== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==28408== Command: ./example-opt -init_timestep 0 -n_timesteps 5
==28408==
Usage:
    ./example-opt -init_timestep 0
OR
    ./example-opt -read_solution -init_timestep 26

Running: ./example-opt -init_timestep 0 -n_timesteps 5

Mesh Information:
  mesh_dimension()=2
  spatial_dimension()=3
  n_nodes()=4225
    n_local_nodes()=4225
  n_elem()=5460
    n_local_elem()=5460
    n_active_elem()=4096
  n_subdomains()=1
  n_partitions()=1
  n_processors()=1
  n_threads()=1
  processor_id()=0

==28408== Conditional jump or move depends on uninitialised value(s)
==28408==    at 0x72B1D30: rml::internal::isLargeObject(void*) (frontend.cpp:2163)
==28408==    by 0x72B2A14: scalable_free (frontend.cpp:2564)
==28408==    by 0x55FECD7: std::_Rb_tree<unsigned int, std::pair<unsigned int const, double>, std::_Select1st<std::pair<unsigned int const, double> >, std::less<unsigned int>, libMesh::Threads::scalable_allocator<std::pair<unsigned int const, double> > >::_M_erase_aux(std::_Rb_tree_const_iterator<std::pair<unsigned int const, double> >) (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x55FED58: std::_Rb_tree<unsigned int, std::pair<unsigned int const, double>, std::_Select1st<std::pair<unsigned int const, double> >, std::less<unsigned int>, libMesh::Threads::scalable_allocator<std::pair<unsigned int const, double> > >::erase(unsigned int const&) (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x55F6F51: libMesh::DofMap::process_constraints(libMesh::MeshBase&) (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B859AF: libMesh::System::init_data() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B6B8F2: libMesh::ImplicitSystem::init_data() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B734D8: libMesh::LinearImplicitSystem::init_data() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5BBF794: libMesh::TransientSystem<libMesh::LinearImplicitSystem>::init_data() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B85C38: libMesh::System::init() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B493ED: libMesh::EquationSystems::init() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x40DB8D: main (in /home/moosetest/permcj/moose/libmesh/installed/examples/adaptivity/ex5/example-opt)
==28408==  Uninitialised value was created by a stack allocation
==28408==    at 0x5707D00: libMesh::FEGenericBase<double>::compute_periodic_constraints(libMesh::DofConstraints&, libMesh::DofMap&, libMesh::PeriodicBoundaries const&, libMesh::MeshBase const&, libMesh::PointLocatorBase const*, unsigned int, libMesh::Elem const*) (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==
==28408== Conditional jump or move depends on uninitialised value(s)
==28408==    at 0x72B1D30: rml::internal::isLargeObject(void*) (frontend.cpp:2163)
==28408==    by 0x72B2A14: scalable_free (frontend.cpp:2564)
==28408==    by 0x56187AA: libMesh::SparsityPattern::Build::~Build() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x55CB575: libMesh::DofMap::compute_sparsity(libMesh::MeshBase const&) (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B68824: libMesh::ImplicitSystem::init_matrices() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B734D8: libMesh::LinearImplicitSystem::init_data() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5BBF794: libMesh::TransientSystem<libMesh::LinearImplicitSystem>::init_data() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B85C38: libMesh::System::init() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x5B493ED: libMesh::EquationSystems::init() (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==    by 0x40DB8D: main (in /home/moosetest/permcj/moose/libmesh/installed/examples/adaptivity/ex5/example-opt)
==28408==  Uninitialised value was created by a stack allocation
==28408==    at 0x5707D00: libMesh::FEGenericBase<double>::compute_periodic_constraints(libMesh::DofConstraints&, libMesh::DofMap&, libMesh::PeriodicBoundaries const&, libMesh::MeshBase const&, libMesh::PointLocatorBase const*, unsigned int, libMesh::Elem const*) (in /home/moosetest/permcj/moose/libmesh/installed/lib/libmesh_opt.so.0.0.0)
==28408==
Initial H1 norm = 1.74175
From: Cody P. <cod...@gm...> - 2014-07-07 19:41:30
UPDATE: I cleaned up my configure; the error still exists on at least two of our build boxes in opt, oprof and dbg modes (that's a good sign - repeatable). Jason is going to rebuild one of the build boxes itself with a clean software stack and try once more. We are still not able to make it fail on our developer test box (called "rod"). I am unable to see any bugs in the compute_periodic_constraints method, so this is still open:

../configure --with-methods=opt oprof dbg --prefix=/home/moosetest/permcj/moose/scripts/../libmesh/installed --enable-silent-rules --enable-unique-id --disable-warnings --disable-cxx11 --enable-openmp --enable-shared --disable-static
From: Cody P. <cod...@gm...> - 2014-07-08 16:21:03
UPDATE: With our testing we have determined that the culprit is --disable-cxx11.

If you take a clean Linux box, configure libMesh with that option and run adaptivity example 5 under Valgrind, you will see an error similar to the one in the first message of this thread. I suppose that it really could be a compiler bug, since I still can't see what is wrong with that method. That means I have a workaround for running our Valgrind tests, but since we only recently added that option, I can't easily find if or when the problem was introduced - yikes!

Cody
From: Kirk, B. (JSC-EG311) <ben...@na...> - 2014-07-08 16:23:55
> On Jul 8, 2014, at 11:21 AM, "Cody Permann" <cod...@gm...> wrote:
>
> I suppose that it really could be a compiler bug since I still can't
> see what is wrong with that method.

Thanks for the update. I wonder if that's the case, and if so, maybe it could be replicated with a small example using one of the offending containers?

What gcc version is on the affected box?

-Ben
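A small standalone test along the lines suggested above might look like the following. This is only a sketch under stated assumptions: `MallocAllocator` is a hypothetical stand-in for `libMesh::Threads::scalable_allocator` that forwards to malloc/free rather than to TBB, and `fill_and_erase` is an illustrative name, not libMesh code. It exercises the same container shape and the erase-by-key path that appear in the trace, and it is written to compile both with and without C++11 so the two modes can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <new>
#include <utility>

// Hypothetical stand-in for libMesh::Threads::scalable_allocator.
// It forwards to malloc/free; the real class forwards to TBB.
template <typename T>
class MallocAllocator
{
public:
  typedef T value_type;
  typedef T * pointer;
  typedef const T * const_pointer;
  typedef T & reference;
  typedef const T & const_reference;
  typedef std::size_t size_type;
  typedef std::ptrdiff_t difference_type;

  template <typename U>
  struct rebind { typedef MallocAllocator<U> other; };

  MallocAllocator() {}
  MallocAllocator(const MallocAllocator &) {}
  template <typename U>
  MallocAllocator(const MallocAllocator<U> &) {}

  pointer address(reference x) const { return &x; }
  const_pointer address(const_reference x) const { return &x; }

  pointer allocate(size_type n, const void * = 0)
  {
    void * p = std::malloc(n * sizeof(T));
    if (!p)
      throw std::bad_alloc();
    return static_cast<pointer>(p);
  }

  void deallocate(pointer p, size_type) { std::free(p); }

  size_type max_size() const { return static_cast<size_type>(-1) / sizeof(T); }

  // C++98-style construct/destroy; newer standard libraries fall back
  // to their own defaults when these do not match the call.
  void construct(pointer p, const T & value) { new (static_cast<void *>(p)) T(value); }
  void destroy(pointer p) { p->~T(); }
};

template <typename T, typename U>
bool operator==(const MallocAllocator<T> &, const MallocAllocator<U> &) { return true; }

template <typename T, typename U>
bool operator!=(const MallocAllocator<T> &, const MallocAllocator<U> &) { return false; }

// Same key/value/comparator shape as the container in the Valgrind trace.
typedef std::pair<const unsigned int, double> Entry;
typedef std::map<unsigned int, double, std::less<unsigned int>,
                 MallocAllocator<Entry> > CustomRow;
typedef std::map<unsigned int, double> DefaultRow;

// Insert n entries, erase every even key by key, return what is left.
template <typename Map>
std::size_t fill_and_erase(unsigned int n)
{
  Map row;
  for (unsigned int i = 0; i < n; ++i)
    row[i] = 0.5 * i;
  assert(row.size() == n);

  for (unsigned int i = 0; i < n; i += 2)
    row.erase(i); // erase by key, as in the trace

  return row.size();
}
```

Calling `fill_and_erase<CustomRow>(10)` and `fill_and_erase<DefaultRow>(10)` from a small `main`, building once with `-std=c++98` and once with `-std=c++0x`, and running each binary under Valgrind would indicate whether the container and allocator combination alone is enough to trigger the report. Swapping in the real TBB-backed allocator would be the next step if it is not.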
From: John P. <jwp...@gm...> - 2014-07-08 16:27:13
On Tue, Jul 8, 2014 at 10:20 AM, Cody Permann <cod...@gm...> wrote:
> UPDATE: With our testing we have determined that the culprit is
> --disable-cxx11
>
> If you take a clean Linux box, configure libMesh with that option and run
> adaptivity example 5 with Valgrind you will see an error similar to the
> first message in this thread. I suppose that it really could be a compiler
> bug since I still can't see what is wrong with that method. That means that
> I have a workaround for running our Valgrind tests but since we only
> recently added that option, I can't easily find if or when that problem was
> introduced - yikes!

Wait, what? --disable-cxx11 is the same behavior we've had since forever, so it can't be causing a *new* valgrind error.

It is interesting/weird that enabling cxx11 would make it go away though...

--
John
From: Roy S. <roy...@ic...> - 2014-07-08 17:20:40
On Tue, 8 Jul 2014, John Peterson wrote:

> Wait, what? --disable-cxx11 is the same behavior we've had since
> forever, so it can't be causing a *new* valgrind error.

No, sadly it's not what our behavior used to be, just what our behavior *should* have been. I think I had a long-open issue about this - we'd been sticking -std=c++0x in our CXXFLAGS for g++ via compiler.m4.

Still *very* strange. I wouldn't have been too shocked to see a compile-time error from turning off C++11 support, but I can't imagine what would have caused a runtime error.

Googling... there are some libstdc++ ABI changes in STL containers. Are we inadvertently passing lists/pairs/sets/etc. between C++98 and C++11 builds?

Or... perhaps --disable-cxx11 is being strict enough to turn off unordered_foo, and we have some bug that only manifests in the case of an ordered set/map/multimap?
---
Roy
From: John P. <jwp...@gm...> - 2014-07-08 17:37:50
> On Jul 8, 2014, at 12:20 PM, Roy Stogner <roy...@ic...> wrote:
>
>> On Tue, 8 Jul 2014, John Peterson wrote:
>>
>> Wait, what? --disable-cxx11 is the same behavior we've had since
>> forever, so it can't be causing a *new* valgrind error.
>
> No, sadly it's not what our behavior used to be, just what our
> behavior *should* have been. I think I had a long-open issue about
> this - we'd been sticking -std=c++0x in our CXXFLAGS for g++ via
> compiler.m4

Oh yeah, I removed those manual flags in the patches adding automatic detection. Good catch!
From: Cody P. <cod...@gm...> - 2014-07-08 17:47:20
John, just so you are up to speed: Jason has reproduced this result on "rod", "cone" and "hpcbuild4" in addition to the original failure on "hpcbuild5". You just need to add that new flag, so it's definitely popping up on multiple systems now. Roy's theory is interesting, but I'm surprised that we aren't seeing this error in other places in the library if that's the case. It only shows up when you turn on periodic boundaries and might be related to the constraint function, as the trace shows, but I just can't see it!

We're in the middle of a MOOSE workshop, so it might be a few days before we can dig much deeper.

Cody
From: John P. <jwp...@gm...> - 2014-07-08 17:52:17
> On Jul 8, 2014, at 12:47 PM, Cody Permann <cod...@gm...> wrote:
>
> John, just so you are up to speed Jason has reproduced this result on
> "rod", "cone" and "hpcbuild4" in addition to the original failure on
> "hpcbuild5", you just need to add that new flag so it's definitely
> popping up on multiple systems now. Roy's theory is interesting but I'm
> surprised that we aren't seeing this error in other places in the
> library if that's the case. It only shows up when you turn on periodic
> boundaries and might be related to the constraint function as it shows,
> but I just can't see it!

Ok... I think those are all GCC 4.6 installations? I might try a 4.7.2 build on one of those Linux boxes and see if it has the same issue, depending on how fast the network connection from here is...
From: John P. <jwp...@gm...> - 2014-07-08 18:46:00
On Tue, Jul 8, 2014 at 11:52 AM, John Peterson <jwp...@gm...> wrote:
>
> On Jul 8, 2014, at 12:47 PM, Cody Permann <cod...@gm...> wrote:
>
> John, just so you are up to speed Jason has reproduced this result on "rod",
> "cone" and "hpcbuild4" in addition to the original failure on "hpcbuild5",
> you just need to add that new flag so it's definitely popping up on multiple
> systems now. Roy's theory is interesting but I'm surprised that we aren't
> seeing this error in other places in the library if that's the case. It
> only shows up when you turn on periodic boundaries and might be related to
> the constraint function as it shows, but I just can't see it!

Hmmm... are we absolutely sure that DofMap::process_constraints() isn't invalidating an iterator somehow through its various calls to erase()? I doubt it, but I'd need to look at the code a bit more to be sure.

Another thought: is it possible that the DofConstraints, DofConstraintValueMap, and AdjointDofConstraintValues classes, which derive from std::map, are being deleted through a pointer to std::map somehow? Again, I doubt it...

--
John
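The two hypotheses above can each be looked at in isolation. The following is a minimal sketch, assuming a plain `std::map<unsigned int, double>` in place of the libMesh constraint containers; `Row`, `drop_small_entries` and `map_destructor_is_virtual` are hypothetical names, not libMesh code, and nothing here shows that process_constraints() actually gets either point wrong. The first function uses the erase-while-iterating idiom that is valid under both C++98 and C++11 (erasing a map element invalidates only iterators to that element). The second needs the C++11 `<type_traits>` header and simply confirms that std::map has no virtual destructor, which is why deleting a derived class through a pointer to std::map would be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <type_traits>

// Hypothetical stand-in for a constraint row: dof index -> coefficient.
typedef std::map<unsigned int, double> Row;

// Erase every entry whose magnitude is below tol while iterating.
// The iterator is advanced before the erased node is destroyed, so it
// never refers to a removed element. Returns the number of entries erased.
std::size_t drop_small_entries(Row & row, double tol)
{
  assert(tol >= 0.0);

  std::size_t removed = 0;
  for (Row::iterator it = row.begin(); it != row.end(); /* advanced below */)
  {
    if (it->second < tol && it->second > -tol)
    {
      row.erase(it++); // post-increment: 'it' moves on, the old position is erased
      ++removed;
    }
    else
      ++it;
  }
  return removed;
}

// Reports whether std::map declares a virtual destructor. It does not,
// so a class derived from it must never be deleted through a Row*.
bool map_destructor_is_virtual()
{
  return std::has_virtual_destructor<Row>::value;
}
```

A loop that instead calls `row.erase(it)` and then increments `it` would be the kind of mistake the first hypothesis has in mind. For the second hypothesis, searching the code for places where a pointer or reference to the std::map base is passed to `delete` would be the corresponding check.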
From: John P. <jwp...@gm...> - 2014-07-08 18:50:26
|
On Tue, Jul 8, 2014 at 12:45 PM, John Peterson <jwp...@gm...> wrote:
> On Tue, Jul 8, 2014 at 11:52 AM, John Peterson <jwp...@gm...> wrote:
>>
>> On Jul 8, 2014, at 12:47 PM, Cody Permann <cod...@gm...> wrote:
>>
>> John, just so you are up to speed Jason has reproduced this result on "rod",
>> "cone" and "hpcbuild4" in addition to the original failure on "hpcbuild5",
>> you just need to add that new flag so it's definitely popping up on multiple
>> systems now. Roy's theory is interesting but I'm surprised that we aren't
>> seeing this error in other places in the library if that's the case. It
>> only shows up when you turn on periodic boundaries and might be related to
>> the constraint function as it shows, but I just can't see it!
>
> Hmmm... are we absolutely sure that DofMap::process_constraints()
> isn't invalidating an iterator somehow through its various calls to
> erase()? I doubt it, but I'd need to look at the code a bit more to
> be sure.
>
> Another thought: is it possible that the DofConstraints,
> DofConstraintValueMap, and AdjointDofConstraintValues classes, which
> derive from std::map, are being deleted through a pointer to std::map
> somehow? Again, I doubt it...

Is it possible to compile/run the code without the "Threads::scalable_allocator" argument? Possibly something in tbb::scalable_allocator is acting up on us...

--
John
|
From: Derek G. <fri...@gm...> - 2014-07-09 00:32:08
|
Have you tried disabling TBB? That might show something interesting.

Could it be that TBB is compiled with C++11 support... so passing an object from it down into a C++98 compiled STL might not work out?

I don't remember how we get TBB... Jason: did _we_ compile TBB ourselves?

Derek

On Tue, Jul 8, 2014 at 12:49 PM, John Peterson <jwp...@gm...> wrote:
> On Tue, Jul 8, 2014 at 12:45 PM, John Peterson <jwp...@gm...> wrote:
> > On Tue, Jul 8, 2014 at 11:52 AM, John Peterson <jwp...@gm...> wrote:
> >>
> >> On Jul 8, 2014, at 12:47 PM, Cody Permann <cod...@gm...> wrote:
> >>
> >> John, just so you are up to speed Jason has reproduced this result on "rod",
> >> "cone" and "hpcbuild4" in addition to the original failure on "hpcbuild5",
> >> you just need to add that new flag so it's definitely popping up on multiple
> >> systems now. Roy's theory is interesting but I'm surprised that we aren't
> >> seeing this error in other places in the library if that's the case. It
> >> only shows up when you turn on periodic boundaries and might be related to
> >> the constraint function as it shows, but I just can't see it!
> >
> > Hmmm... are we absolutely sure that DofMap::process_constraints()
> > isn't invalidating an iterator somehow through its various calls to
> > erase()? I doubt it, but I'd need to look at the code a bit more to
> > be sure.
> >
> > Another thought: is it possible that the DofConstraints,
> > DofConstraintValueMap, and AdjointDofConstraintValues classes, which
> > derive from std::map, are being deleted through a pointer to std::map
> > somehow? Again, I doubt it...
>
> Is it possible to compile/run the code without the
> "Threads::scalable_allocator" argument? Possibly something in
> tbb::scalable_allocator is acting up on us...
>
> --
> John
>
|
From: John P. <jwp...@gm...> - 2014-07-16 19:25:51
|
On Tue, Jul 8, 2014 at 6:32 PM, Derek Gaston <fri...@gm...> wrote:
> Have you tried disabling TBB? That might show something interesting.

So, I can confirm that configuring libmesh with

--disable-tbb --disable-cxx11

gets rid of the valgrind error. Since all the valgrind errors seem to have this "scalable_free" function

==11839== by 0x8915A14: scalable_free (frontend.cpp:2564)

at their root, I'm currently blaming the combination of

.) tbb::scalable_allocator
.) GCC4.6
.) no -std=c++0x

for this issue.

The next thing I'll try is re-enabling tbb, but removing the scalable_allocator template parameters from dof_map.h and sparsity_pattern.h, and making sure we're still valgrind-clean.

Assuming this works, what is the correct long-term fix? scalable_allocator seems to be an optimization, so the simplest approach is to just remove it... I wouldn't be opposed to e.g. enabling it only when C++11 is enabled, although that is quite a hack, and only shows that we do not know what the real problem is.

--
John
|
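The experiment described above (keeping TBB but dropping the allocator template parameter) amounts to changing the fourth template argument of the affected std::map types. The following is only a sketch of that idea, not the actual declarations in dof_map.h or sparsity_pattern.h: the SKETCH_USE_TBB_ALLOCATOR macro, the ValueMap typedef, and both helper functions are hypothetical names chosen for illustration. tbb::scalable_allocator and its header are the real TBB ones.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical compile-time switch (not a real libMesh configure macro).
// Define it to route node allocations through TBB's scalable allocator;
// leave it undefined to fall back to the default std::allocator.
#ifdef SKETCH_USE_TBB_ALLOCATOR
#  include <tbb/scalable_allocator.h>
#  define SKETCH_ALLOCATOR tbb::scalable_allocator
#else
#  define SKETCH_ALLOCATOR std::allocator
#endif

// Hypothetical stand-in for a constraint-value map (dof id -> value).
typedef std::pair<const unsigned int, double> Entry;
typedef std::map<unsigned int,
                 double,
                 std::less<unsigned int>,
                 SKETCH_ALLOCATOR<Entry> > ValueMap;

// Insert a (dof, value) pair, asserting the dof was not already present.
void add_value(ValueMap & m, unsigned int dof, double value)
{
  const bool inserted = m.insert(std::make_pair(dof, value)).second;
  assert(inserted);
  (void) inserted; // silence unused-variable warnings when NDEBUG is set
}

// Sum all stored values; behaves identically with either allocator.
double sum_values(const ValueMap & m)
{
  double total = 0.0;
  for (ValueMap::const_iterator it = m.begin(); it != m.end(); ++it)
    total += it->second;
  return total;
}
```

Because the allocator is part of the map's type, code that only goes through the typedef compiles unchanged either way, which makes it straightforward to build both variants and compare their Valgrind output.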
From: Kirk, B. (JSC-EG311) <ben...@na...> - 2014-07-16 19:33:30
|
On Jul 16, 2014, at 2:25 PM, John Peterson <jwp...@gm...> wrote:

> The next thing I'll try is re-enabling tbb, but removing the
> scalable_allocator template parameters from dof_map.h and
> sparsity_pattern.h, and making sure we're still valgrind-clean.
> Assuming this works, what is the correct long-term fix?
> scalable_allocator seems to be an optimization, so the simplest
> approach is to just remove it... I wouldn't be opposed to e.g.
> enabling it only when C++11 is enabled, although that is quite a hack,
> and only shows that we do not know what the real problem is.

IIRC the scalable allocator is almost a prerequisite to actually see a speedup with TBB here. Otherwise there's so much per-thread allocation contention things can actually slow down, so I think we need to use it when possible. Might be worth profiling on a more recent system though rather than relying on my increasingly hazy memory.

Alternatively, does enabling C++11 give us any other allocator we can stick in there?

Also, just found this:

http://valgrind.org/docs/manual/dist.news.html

looks like bug #317318

317318 Support for Threading Building Blocks "scalable_malloc"

should be fixed in valgrind 3.9.0. What version are you running?
|
From: John P. <jwp...@gm...> - 2014-07-16 19:39:48
|
On Wed, Jul 16, 2014 at 1:33 PM, Kirk, Benjamin (JSC-EG311) <ben...@na...> wrote:
> On Jul 16, 2014, at 2:25 PM, John Peterson <jwp...@gm...> wrote:
>
>> The next thing I'll try is re-enabling tbb, but removing the
>> scalable_allocator template parameters from dof_map.h and
>> sparsity_pattern.h, and making sure we're still valgrind-clean.
>> Assuming this works, what is the correct long-term fix?
>> scalable_allocator seems to be an optimization, so the simplest
>> approach is to just remove it... I wouldn't be opposed to e.g.
>> enabling it only when C++11 is enabled, although that is quite a hack,
>> and only shows that we do not know what the real problem is.
>
> IIRC the scalable allocator is almost a prerequisite to actually see a speedup with TBB here. Otherwise there's so much per-thread allocation contention things can actually slow down, so I think we need to use it when possible. Might be worth profiling on a more recent system though rather than relying on my increasingly hazy memory.

Good to know, we should certainly try to keep it in the code!

> Alternatively, does enabling C++11 give us any other allocator we can stick in there?
>
> Also, just found this:
>
> http://valgrind.org/docs/manual/dist.news.html
> looks like bug #317318
>
> 317318 Support for Threading Building Blocks "scalable_malloc"

Oh yeah, forgot about google ... :P Also found this little hint:

http://kate-editor.org/2013/07/29/intel-threading-building-blocks-scalable-allocator-valgrind/

> should be fixed in valgrind 3.9.0. What version are you running?

3.7.0, I could try and see if there is a newer binary available for the system I'm on.

--
John
|
From: John P. <jwp...@gm...> - 2014-07-16 20:32:23
|
On Wed, Jul 16, 2014 at 1:39 PM, John Peterson <jwp...@gm...> wrote:
> On Wed, Jul 16, 2014 at 1:33 PM, Kirk, Benjamin (JSC-EG311)
> <ben...@na...> wrote:
>>
>> should be fixed in valgrind 3.9.0. What version are you running?
>
> 3.7.0, I could try and see if there is a newer binary available for
> the system I'm on.

FYI, built 3.9.0 from source, same error as in 3.7.0.

--
John
|
From: John P. <jwp...@gm...> - 2014-07-16 20:59:10
|
On Wed, Jul 16, 2014 at 2:31 PM, John Peterson <jwp...@gm...> wrote:
> On Wed, Jul 16, 2014 at 1:39 PM, John Peterson <jwp...@gm...> wrote:
>> On Wed, Jul 16, 2014 at 1:33 PM, Kirk, Benjamin (JSC-EG311)
>> <ben...@na...> wrote:
>>>
>>> should be fixed in valgrind 3.9.0. What version are you running?
>>
>> 3.7.0, I could try and see if there is a newer binary available for
>> the system I'm on.
>
> FYI, built 3.9.0 from source, same error as in 3.7.0.

Just pushed the commit below. It doesn't fix the valgrind error, but definitely seems wrong. Actually not sure how it compiled...

https://github.com/libMesh/libmesh/commit/49a181db4742e4b4458e024d9e1fa152278cfeca

--
John
|
From: John P. <jwp...@gm...> - 2014-07-16 21:02:42
|
On Wed, Jul 16, 2014 at 2:58 PM, John Peterson <jwp...@gm...> wrote:
> On Wed, Jul 16, 2014 at 2:31 PM, John Peterson <jwp...@gm...> wrote:
>> On Wed, Jul 16, 2014 at 1:39 PM, John Peterson <jwp...@gm...> wrote:
>>> On Wed, Jul 16, 2014 at 1:33 PM, Kirk, Benjamin (JSC-EG311)
>>> <ben...@na...> wrote:
>>>>
>>>> should be fixed in valgrind 3.9.0. What version are you running?
>>>
>>> 3.7.0, I could try and see if there is a newer binary available for
>>> the system I'm on.
>>
>> FYI, built 3.9.0 from source, same error as in 3.7.0.
>
> Just pushed the commit below. It doesn't fix the valgrind error, but
> definitely seems wrong. Actually not sure how it compiled...
>
> https://github.com/libMesh/libmesh/commit/49a181db4742e4b4458e024d9e1fa152278cfeca

Looks like that bug was introduced here:

https://github.com/libMesh/libmesh/commit/64715adf2018d4b78df559c9c7dd8e2c2d00e55d

--
John
|
From: John P. <jwp...@gm...> - 2014-07-16 21:53:07
|
On Wed, Jul 16, 2014 at 3:02 PM, John Peterson <jwp...@gm...> wrote:
> On Wed, Jul 16, 2014 at 2:58 PM, John Peterson <jwp...@gm...> wrote:
>> On Wed, Jul 16, 2014 at 2:31 PM, John Peterson <jwp...@gm...> wrote:
>>> On Wed, Jul 16, 2014 at 1:39 PM, John Peterson <jwp...@gm...> wrote:
>>>> On Wed, Jul 16, 2014 at 1:33 PM, Kirk, Benjamin (JSC-EG311)
>>>> <ben...@na...> wrote:
>>>>>
>>>>> should be fixed in valgrind 3.9.0. What version are you running?
>>>>
>>>> 3.7.0, I could try and see if there is a newer binary available for
>>>> the system I'm on.
>>>
>>> FYI, built 3.9.0 from source, same error as in 3.7.0.
>>
>> Just pushed the commit below. It doesn't fix the valgrind error, but
>> definitely seems wrong. Actually not sure how it compiled...
>>
>> https://github.com/libMesh/libmesh/commit/49a181db4742e4b4458e024d9e1fa152278cfeca
>
> Looks like that bug was introduced here:
>
> https://github.com/libMesh/libmesh/commit/64715adf2018d4b78df559c9c7dd8e2c2d00e55d

Also confirmed that keeping TBB enabled but deleting the scalable allocator from the few classes that use them also fixes the valgrind errors.

So... if we have to have the scalable allocators for speed and the only *actual* errors they cause are in valgrind (we've never seen any real segfaults caused by them as far as I know), I'm leaning toward just adding them to our suppression file.

--
John
|
From: Derek G. <fri...@gm...> - 2014-07-17 16:55:06
|
I vote for suppression as well.

On Wed, Jul 16, 2014 at 3:52 PM, John Peterson <jwp...@gm...> wrote:
> On Wed, Jul 16, 2014 at 3:02 PM, John Peterson <jwp...@gm...> wrote:
> > On Wed, Jul 16, 2014 at 2:58 PM, John Peterson <jwp...@gm...> wrote:
> >> On Wed, Jul 16, 2014 at 2:31 PM, John Peterson <jwp...@gm...> wrote:
> >>> On Wed, Jul 16, 2014 at 1:39 PM, John Peterson <jwp...@gm...> wrote:
> >>>> On Wed, Jul 16, 2014 at 1:33 PM, Kirk, Benjamin (JSC-EG311)
> >>>> <ben...@na...> wrote:
> >>>>>
> >>>>> should be fixed in valgrind 3.9.0. What version are you running?
> >>>>
> >>>> 3.7.0, I could try and see if there is a newer binary available for
> >>>> the system I'm on.
> >>>
> >>> FYI, built 3.9.0 from source, same error as in 3.7.0.
> >>
> >> Just pushed the commit below. It doesn't fix the valgrind error, but
> >> definitely seems wrong. Actually not sure how it compiled...
> >>
> >> https://github.com/libMesh/libmesh/commit/49a181db4742e4b4458e024d9e1fa152278cfeca
> >
> > Looks like that bug was introduced here:
> >
> > https://github.com/libMesh/libmesh/commit/64715adf2018d4b78df559c9c7dd8e2c2d00e55d
>
> Also confirmed that keeping TBB enabled but deleting the scalable
> allocator from the few classes that use them also fixes the valgrind
> errors.
>
> So... if we have to have the scalable allocators for speed and the
> only *actual* errors they cause are in valgrind (we've never seen any
> real segfaults caused by them as far as I know), I'm leaning toward
> just adding them to our suppression file.
>
> --
> John
>
|
From: John P. <jwp...@gm...> - 2014-07-17 17:07:16
|
On Thu, Jul 17, 2014 at 10:54 AM, Derek Gaston <fri...@gm...> wrote:
> I vote for suppression as well.

Cool, do you mind merging my MOOSE PR #3546 that does this?

--
John
|