From: Derek G. <fri...@gm...> - 2013-04-10 02:27:49
Another data point... job starts fine on half the procs....

Derek

On Tue, Apr 9, 2013 at 8:26 PM, Derek Gaston <fri...@gm...> wrote:

> Is there any way to disable the hilbert stuff for now? With serial mesh
> can we just take the numbering from the node numbering?
>
>
> On Tue, Apr 9, 2013 at 8:21 PM, Derek Gaston <fri...@gm...> wrote:
>
>> serial
>>
>>
>> On Tue, Apr 9, 2013 at 8:21 PM, Kirk, Benjamin (JSC-EG311) <
>> ben...@na...> wrote:
>>
>>> Serial or parallel mesh?
>>>
>>>
>>>
>>> On Apr 9, 2013, at 9:16 PM, "Kirk, Benjamin (JSC-EG311)" <
>>> ben...@na...> wrote:
>>>
>>> > Hmm - I'll look through that section of code tomorrow morning and see
>>> > if there could possibly be any mismatched send/receives or anything.
>>> >
>>> > -Ben
>>> >
>>> > On Apr 9, 2013, at 8:48 PM, "Derek Gaston" <fri...@gm...> wrote:
>>> >
>>> >> Hey guys,
>>> >>
>>> >> I've got a fairly large job (>3500 procs) that is hanging while
>>> >> trying to setup the mesh. The procs are in 2 separate places. ~Half of
>>> >> them are here:
>>> >>
>>> >> #35 0x00002b1746aea7f0 in libMesh::Parallel::Communicator::send_receive<Hilbert::HilbertIndices> () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #36 0x00002b1746afc17b in void libMesh::MeshCommunication::find_global_indices<libMesh::MeshBase::element_iterator>(libMesh::MeshTools::BoundingBox const&, libMesh::MeshBase::element_iterator const&, libMesh::MeshBase::element_iterator const&, std::vector<unsigned int, std::allocator<unsigned int> >&) const () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #37 0x00002b1746c78528 in libMesh::Partitioner::partition_unpartitioned_elements(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #38 0x00002b1746c7b3e5 in libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #39 0x00002b1746ad1d32 in libMesh::MeshBase::prepare_for_use(bool) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >>
>>> >>
>>> >> And the other ~half are here:
>>> >>
>>> >> #6 0x00002ba5c95c8b90 in libMesh::Parallel::Communicator::send_receive<unsigned int> () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #7 0x00002ba5c95da2e2 in void libMesh::MeshCommunication::find_global_indices<libMesh::MeshBase::element_iterator>(libMesh::MeshTools::BoundingBox const&, libMesh::MeshBase::element_iterator const&, libMesh::MeshBase::element_iterator const&, std::vector<unsigned int, std::allocator<unsigned int> >&) const () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #8 0x00002ba5c9756528 in libMesh::Partitioner::partition_unpartitioned_elements(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >> #9 0x00002ba5c97593e5 in libMesh::Partitioner::partition(libMesh::MeshBase&, unsigned int) () from /home/gastdr/projects/fission/libmesh/lib/libmesh_oprof.so.0
>>> >>
>>> >>
>>> >>
>>> >> Obviously they are in slightly different spots...
>>> >>
>>> >>
>>> >> Any ideas on what's going on here or where to start looking?
>>> >>
>>> >> I was intermittently getting weird errors around this point from
>>> >> mvapich so I've tried to switch to OpenMPI.... and it's hanging up here.
>>> >>
>>> >> The mesh itself isn't enormous... it's only about 1 million nodes or
>>> >> so. We've definitely done more than this before.
>>> >>
>>> >> Thanks in advance for any advice!
>>> >>
>>> >> Derek
>>> >>
>>> >> _______________________________________________
>>> >> Libmesh-devel mailing list
>>> >> Lib...@li...
>>> >> https://lists.sourceforge.net/lists/listinfo/libmesh-devel
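
For anyone following the thread: the two backtraces would be consistent with the "mismatched send/receives" idea, since about half the ranks are blocked in the send_receive of Hilbert indices and the other half in the later send_receive of unsigned int, all inside find_global_indices. The snippet below is only a toy model of that failure mode in plain Python, with no MPI and no libMesh code. The function name `find_stuck_ranks`, the tags "hilbert" and "uint", and both schedules are made up for illustration, and it does not claim to be the actual cause of this hang. It treats each rank as a queue of blocking pairwise exchanges and reports which ranks can never complete.

```python
from collections import deque


def find_stuck_ranks(schedules):
    """Return {rank: (partner, tag)} for every rank that can never finish.

    schedules[r] is the ordered list of (partner, tag) exchanges that rank r
    performs. An exchange completes only when both ranks are at the head of
    their queues, name each other as partner, and use the same tag.
    """
    queues = [deque(ops) for ops in schedules]
    progressed = True
    while progressed:
        progressed = False
        for rank, queue in enumerate(queues):
            if not queue:
                continue
            partner, tag = queue[0]
            if partner == rank:
                # An exchange with oneself always completes.
                queue.popleft()
                progressed = True
                continue
            other = queues[partner]
            if other and other[0] == (rank, tag):
                # Both sides agree on partner and tag: the exchange completes.
                queue.popleft()
                other.popleft()
                progressed = True
    return {rank: queue[0] for rank, queue in enumerate(queues) if queue}


# Hypothetical 4-rank job where every rank does both exchanges in order.
matched = [
    [(1, "hilbert"), (1, "uint")],
    [(0, "hilbert"), (0, "uint")],
    [(3, "hilbert"), (3, "uint")],
    [(2, "hilbert"), (2, "uint")],
]

# Same job, but ranks 1 and 3 wrongly skip the first exchange.
mismatched = [
    [(1, "hilbert"), (1, "uint")],
    [(0, "uint")],
    [(3, "hilbert"), (3, "uint")],
    [(2, "uint")],
]

print(find_stuck_ranks(matched))
print(find_stuck_ranks(mismatched))
```

In the mismatched schedule, ranks 0 and 2 stay blocked in the "hilbert" exchange while ranks 1 and 3 stay blocked in the "uint" exchange, which is the same half-and-half pattern as the backtraces. In the real job, comparing which partner each rank believes it is exchanging with at the two call sites might be a reasonable place to start looking.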