From: 蔡园武 <yua...@gm...> - 2012-08-27 09:19:36
Hi, guys,

I have a problem using MPI. The behaviour of my code becomes weird.
The following is part of my code where the problem is:

    // First output mesh and nodal data
    const MeshBase& mesh = es.get_mesh();
    std::cout << '[' << libMesh::processor_id() << "] "
              << "writing es..." << std::endl;
    GMVIO(mesh).write_equation_systems(file_name, es);

    // Read in gmv file, and do something
    std::cout << '[' << libMesh::processor_id() << "] "
              << "open file " << file_name << std::endl;
    std::ifstream fin(file_name.c_str());
    if (!fin)
    {
      std::cerr << '[' << libMesh::processor_id() << "] "
                << "ERROR: Can't open file " << file_name << std::endl;
      libmesh_error();
    }

And the output of this part:

    [0] writing es...
    [1] ERROR: Can't open file Iter_0.gmv
    [1] src/output.C, line 127, compiled Aug 27 2012 at 01:52:52
    terminate called after throwing an instance of 'libMesh::LogicError'
      what():  Error in libMesh internal logic
    [ubuntu:05720] *** Process received signal ***
    [ubuntu:05720] Signal: Aborted (6)
    [ubuntu:05720] Signal code: (-6)
    [ubuntu:05720] [ 0] [0x8f240c]
    [ubuntu:05720] [ 1] [0x8f2416]
    [ubuntu:05720] [ 2] /lib/i386-linux-gnu/libc.so.6(gsignal+0x4f) [0x1edfc8f]
    [ubuntu:05720] [ 3] /lib/i386-linux-gnu/libc.so.6(abort+0x175) [0x1ee32b5]
    [ubuntu:05720] [ 4] /usr/lib/i386-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x14d) [0x6ee94ed]
    [ubuntu:05720] [ 5] /home/cyw/libmesh-0.7.3.1/libmesh/lib/i686-pc-linux-gnu_opt/libmesh.so(_ZN7libMesh25libmesh_terminate_handlerEv+0x17) [0xda4087]
    [ubuntu:05720] [ 6] /usr/lib/i386-linux-gnu/libstdc++.so.6(+0xaf283) [0x6ee7283]
    [ubuntu:05720] [ 7] /usr/lib/i386-linux-gnu/libstdc++.so.6(+0xaf2bf) [0x6ee72bf]
    [ubuntu:05720] [ 8] /usr/lib/i386-linux-gnu/libstdc++.so.6(+0xaf40e) [0x6ee740e]
    [ubuntu:05720] [ 9] ./main-opt(_Z10gmv_outputRKSsRN7libMesh15EquationSystemsERKb+0x3033) [0x8147e33]
    [ubuntu:05720] [10] ./main-opt(_Z21topology_optimizationR6GetPot+0x43d7) [0x813d8e7]
    [ubuntu:05720] [11] ./main-opt(main+0xc32) [0x8062c12]
    [ubuntu:05720] [12] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x1ecb113]
    [ubuntu:05720] [13] ./main-opt() [0x8063411]
    [ubuntu:05720] *** End of error message ***
    [0] open file Iter_0.gmv

It seems that the processor_id 0 is writing equation_systems to file
"Iter_0.gmv", and at the same time, processor_id 1 is trying to open
the file "Iter_0.gmv". Is it correct?

I don't know much about MPI. Could someone explain this error, and how
to solve it?

Thanks!

--
Cai Yuanwu 蔡园武
Dept. of Engineering Mechanics,
Dalian University of Technology,
Dalian 116024, China
From: Roy S. <roy...@ic...> - 2012-08-27 14:35:25
On Mon, 27 Aug 2012, 蔡园武 wrote:

> It seems that the processor_id 0 is writing equation_systems to file
> "Iter_0.gmv", and at the same time, processor_id 1 is trying to open
> the file "Iter_0.gmv". Is it correct?

That would be my first guess.

> I don't know much about MPI. Could someone explain this error,

For serialized file formats, we gather data to processor_id 0 and then
other processors are allowed to continue while 0 writes.

> and how to solve it?

Try a Parallel::barrier() call after the GMVIO?
---
Roy
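
[Editor's sketch] The write-then-barrier-then-read ordering suggested above can be illustrated without MPI or libMesh. The following is a minimal sketch under stated assumptions: threads stand in for MPI ranks, and `SimpleBarrier` and `ranks_that_opened` are hypothetical names invented for this illustration, not libMesh APIs. `SimpleBarrier` only plays the role that Parallel::barrier() plays in the real code.

```cpp
#include <condition_variable>
#include <cstdio>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for Parallel::barrier(): blocks each caller until
// `count` threads have arrived. Single-use, which is all this sketch needs.
class SimpleBarrier {
public:
  explicit SimpleBarrier(int count) : remaining_(count) {}

  void wait() {
    std::unique_lock<std::mutex> lock(mutex_);
    if (--remaining_ == 0) {
      cv_.notify_all();  // the last arrival releases everyone
    } else {
      cv_.wait(lock, [this] { return remaining_ == 0; });
    }
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  int remaining_;
};

// Runs `n_ranks` threads that mimic the pattern from the thread: "rank" 0
// writes the file, then every rank tries to open it. Returns how many
// ranks opened the file successfully.
int ranks_that_opened(int n_ranks, const std::string& file_name) {
  std::remove(file_name.c_str());  // start without a stale file
  SimpleBarrier barrier(n_ranks);
  std::vector<int> opened(n_ranks, 0);
  std::vector<std::thread> ranks;

  for (int rank = 0; rank < n_ranks; ++rank) {
    ranks.emplace_back([&, rank] {
      if (rank == 0) {
        std::ofstream fout(file_name.c_str());
        fout << "mesh and nodal data\n";
      }  // fout is flushed and closed here, before the barrier

      barrier.wait();  // nobody reads until rank 0 has finished writing

      std::ifstream fin(file_name.c_str());
      opened[rank] = fin ? 1 : 0;
    });
  }

  for (std::thread& t : ranks) t.join();
  std::remove(file_name.c_str());  // clean up the scratch file

  int total = 0;
  for (int flag : opened) total += flag;
  return total;
}
```

Calling `ranks_that_opened(4, "scratch.gmv")` should report that all four stand-in ranks opened the file, because none of them can pass `barrier.wait()` before rank 0 has closed it. Without the `barrier.wait()` line the result would depend on thread timing, which is the same kind of race as in the reported output.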
From: 蔡园武 <yua...@gm...> - 2012-08-29 02:44:47
2012/8/27 Roy Stogner <roy...@ic...>:
>
> On Mon, 27 Aug 2012, 蔡园武 wrote:
>
>> It seems that the processor_id 0 is writing equation_systems to file
>> "Iter_0.gmv", and at the same time, processor_id 1 is trying to open
>> the file "Iter_0.gmv". Is it correct?
>
> That would be my first guess.
>
>> I don't know much about MPI. Could someone explain this error,
>
> For serialized file formats, we gather data to processor_id 0 and then
> other processors are allowed to continue while 0 writes.
>
>> and how to solve it?
>
> Try a Parallel::barrier() call after the GMVIO?
> ---
> Roy

I found the definition of Parallel::barrier:

    void libMesh::Parallel::barrier ( const Communicator & comm = Communicator_World ) [inline]

and the explanation:

    Pause execution until all processors reach a certain point.

But I don't know what should be 'a certain point'?
What can I pass as variable 'Communicator_World'?

Thanks!

--
Cai Yuanwu 蔡园武
Dept. of Engineering Mechanics,
Dalian University of Technology,
Dalian 116024, China
From: Roy S. <roy...@ic...> - 2012-08-29 04:56:19
On Wed, 29 Aug 2012, 蔡园武 wrote:

> I found the definition of Parallel::barrier:
> void libMesh::Parallel::barrier ( const Communicator & comm =
> Communicator_World ) [inline]
> and the explanation:
> Pause execution until all processors reach a certain point.

> But I don't know what should be 'a certain point'?

The barrier() call. No processor gets to continue past the barrier
until all other processors have reached the barrier.

> What can I pass as variable 'Communicator_World'?

That's a C++ argument-with-a-default-value. You don't pass anything
to it unless you want to use a non-default value. Just use

    Parallel::barrier();

and you're done.
---
Roy
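
[Editor's sketch] The default-argument point can be shown in isolation. This is a minimal sketch, assuming nothing about libMesh internals: the `Communicator` struct, the `Communicator_World` object, and the function `barrier_target` below are hypothetical stand-ins that only copy the shape of the quoted declaration.

```cpp
#include <string>

// Hypothetical stand-ins; these are NOT libMesh's real types.
struct Communicator {
  std::string name;
};
const Communicator Communicator_World{"world"};

// Same shape as the quoted declaration: the parameter has a default value,
// so callers may omit it. The function just reports which communicator it
// received, to make the default visible.
inline std::string barrier_target(const Communicator& comm = Communicator_World) {
  return comm.name;
}
```

Here `barrier_target()` is equivalent to `barrier_target(Communicator_World)`, while `barrier_target(Communicator{"sub"})` overrides the default. This is why a bare Parallel::barrier(); call needs no argument.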
From: 蔡园武 <yua...@gm...> - 2012-08-29 06:03:53
Thanks for your detailed explanation. I tried your method and my code
works correctly now!

2012/8/29 Roy Stogner <roy...@ic...>:
>
> On Wed, 29 Aug 2012, 蔡园武 wrote:
>
>> I found the definition of Parallel::barrier:
>> void libMesh::Parallel::barrier ( const Communicator & comm =
>> Communicator_World ) [inline]
>> and the explanation:
>> Pause execution until all processors reach a certain point.
>
>> But I don't know what should be 'a certain point'?
>
> The barrier() call. No processor gets to continue past the barrier
> until all other processors have reached the barrier.
>
>> What can I pass as variable 'Communicator_World'?
>
> That's a C++ argument-with-a-default-value. You don't pass anything
> to it unless you want to use a non-default value. Just use
>
>     Parallel::barrier();
>
> and you're done.
> ---
> Roy

--
Cai Yuanwu 蔡园武
Dept. of Engineering Mechanics,
Dalian University of Technology,
Dalian 116024, China