From: Salazar De T. M. <sal...@ll...> - 2018-03-20 14:12:57
this->make_node_proc_ids_parallel_consistent(mesh) did not work. I get the exact same error. Is this strange?

Miguel

________________________________
From: Roy Stogner <roy...@ic...>
Sent: Monday, March 19, 2018 12:28:26 PM
To: Salazar De Troya, Miguel
Cc: lib...@li...
Subject: Re: [Libmesh-users] Assertion `min_id == node->processor_id()' failed.;

On Mon, 19 Mar 2018, Salazar De Troya, Miguel wrote:

> I found a slight difference between the trace files:
>
> The traceout_8_142118.txt contains
>
> libMesh::MeshTools::libmesh_assert_parallel_consistent_procids<libMesh::Node> (mesh=...) at src/mesh/mesh_tools.C:1608
>
> whereas traceout_57_85461.txt and traceout_11_104555.txt :
>
> libMesh::MeshTools::libmesh_assert_parallel_consistent_procids<libMesh::Node> (mesh=...) at src/mesh/mesh_tools.C:1609
>
> Not sure if this helps.

No; I'm afraid that's expected from that stack trace: processors who
think the node should be on processor 57 are screaming that 57 doesn't
match the minimum proc_id of 11, but processors who think it should be
on processor 11 are screaming that 11 doesn't match the maximum
proc_id of 57.

> #7 0x00002aaaaebe174e in libMesh::MeshTools::libmesh_assert_parallel_consistent_procids<libMesh::Node> (mesh=...) at src/mesh/mesh_tools.C:1608
> #8 0x00002aaaaeba931e in libMesh::MeshTools::correct_node_proc_ids (mesh=...) at src/mesh/mesh_tools.C:1844
> #9 0x00002aaaae69a0ce in libMesh::MeshCommunication::make_new_nodes_parallel_consistent (this=0x2320a, mesh=...) at src/mesh/mesh_communication.C:1776
> #10 0x00002aaaaea95919 in libMesh::MeshRefinement::_refine_elements (this=0x2320a) at src/mesh/mesh_refinement.C:1601
> #11 0x00002aaaaea6a4d1 in libMesh::MeshRefinement::refine_and_coarsen_elements (this=0x2320a) at src/mesh/mesh_refinement.C:578
> #12 0x00002aaab9d69dcd in OptiProblem::solve (this=0x7fffffffabd8) at /g/g92/miguel/code/topsm/src/opti_problem.C:370
> #13 0x00000000004371b8 in main (argc=4, argv=0x7fffffffb798) at /g/g92/miguel/code/topsm/test/3D_stress_constraint/linear_stress_opti.C:196
>
> Are there other things I can do to debug this?

One possible fix you could try first: in mesh_communication.C:1767,
where it says

    this->make_new_node_proc_ids_parallel_consistent(mesh);

Try changing it to

    this->make_node_proc_ids_parallel_consistent(mesh);

It could be that you're in some corner case I didn't imagine, which
causes a processor to fail to identify and correct a new
potentially-inconsistent processor_id, and if so then maybe telling
the code to sync up *all* node processor_id() values will fix that.

Let me know whether or not that works?

This is a frighteningly tricky part of the code; you can gawk at the
current state of my failed attempts to improve load balancing of
processor ids in https://github.com/libMesh/libmesh/pull/1621 in fact.

The good news about that PR is it has me digging into corner cases
here myself, so hopefully when I'm finished it will fix your code too
if my suggested fix above doesn't.

The bad news is that there's also a chance of me immediately
re-*breaking* your code even if my suggested fix above works - if you
wouldn't mind, I'll let you know when the PR is ready so you can run
your own tests, just in case they catch something that our own CI
misses.
---
Roy
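
For readers following the thread, here is a minimal, MPI-free sketch of why ranks that disagree about a node's owner trip different assertions. It is not the libMesh implementation. It assumes, based only on the explanation above, that the check compares each rank's processor_id for the node against the global minimum and then the global maximum. The names ProcId, Check and check_owner_consistency are hypothetical, and the vector stands in for values that would really live on separate MPI ranks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a processor id; not the libMesh typedef.
using ProcId = unsigned int;

// Which comparison (if any) a given rank would fail.
enum class Check { Consistent, MinMismatch, MaxMismatch };

// beliefs[i] is the owner that the i-th participating rank has recorded
// for one shared node. In a real parallel run the minimum and maximum
// would come from reductions across ranks; here they are computed
// serially over the vector (assumption made for illustration only).
std::vector<Check> check_owner_consistency(const std::vector<ProcId>& beliefs)
{
  assert(!beliefs.empty());

  const ProcId min_id = *std::min_element(beliefs.begin(), beliefs.end());
  const ProcId max_id = *std::max_element(beliefs.begin(), beliefs.end());

  std::vector<Check> results;
  results.reserve(beliefs.size());

  for (ProcId mine : beliefs)
    {
      if (mine != min_id)
        results.push_back(Check::MinMismatch);   // fails the "min" comparison first
      else if (mine != max_id)
        results.push_back(Check::MaxMismatch);   // passes "min", fails "max"
      else
        results.push_back(Check::Consistent);    // every rank agrees
    }

  return results;
}
```

Calling check_owner_consistency({11, 57, 11}) reports a max mismatch for the ranks that recorded 11 and a min mismatch for the rank that recorded 57, so the same underlying disagreement shows up at two different assertions depending on which rank wrote the trace file.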