From: Joshua J. L. <jl...@um...> - 2018-01-22 23:02:57
Hi Dr. Ferrer,

It seems like you are running out of memory on your cards. What is the box size? Also, you might want to use an odd number of MPI processes, because one will always become the master and the rest will be slaves.

Sincerely,
Joshua Lobo

On Mon, Jan 22, 2018 at 11:46 AM, Montserrat Fabrega Ferrer <mf...@ib...> wrote:
> Hi,
>
> I am trying to run a Relion auto-refine in Scipion v1.1 (2017-06-14)
> Balbino. The Relion version is 2.0.3. However, I get the error I copy
> below. Does anybody have any suggestion that would help?
>
> Thank you very much in advance,
>
> Montserrat Fabrega
>
> 00001: RUNNING PROTOCOL -----------------
> 00002: PID: 10237
> 00003: Scipion: v1.1 (2017-06-14) Balbino
> 00004: currentDir: /gpfs/projects/irb12/irb12336/ScipionUserData/projects/Titan
> 00005: workingDir: Runs/011639_ProtRelionRefine3D
> 00006: runMode: Continue
> 00007: MPI: 4
> 00008: threads: 4
> 00009: len(steps) 3 len(prevSteps) 0
> 00010: Starting at step: 1
> 00011: Running steps
> 00012: STARTED: convertInputStep, step 1
> 00013: 2018-01-22 00:13:40.523885
> 00014: Converting set from 'Runs/011589_ProtUserSubSet/particles.sqlite'
> into 'Runs/011639_ProtRelionRefine3D/input_particles.star'
> 00015: FINISHED: convertInputStep, step 1
> 00016: 2018-01-22 00:13:45.845200
> 00017: STARTED: runRelionStep, step 2
> 00018: 2018-01-22 00:13:45.860738
> 00019: srun `which relion_refine_mpi` --gpu --low_resol_join_halves 40
> --pool 3 --auto_local_healpix_order 4 --angpix 1.04
> --dont_combine_weights_via_disc --ref Runs/011639_ProtRelionRefine3D/tmp/proposedVolume00003.mrc
> --scale --offset_range 5.0 --ini_high 60.0 --offset_step 2.0
> --healpix_order 2 --auto_refine --ctf --oversampling 1
> --split_random_halves --o Runs/011639_ProtRelionRefine3D/extra/relion
> --i Runs/011639_ProtRelionRefine3D/input_particles.star --zero_mask
> --norm --firstiter_cc --sym c12 --flatten_solvent --particle_diameter
> 228.8 --j 4
> 00020: === RELION MPI setup ===
> 00021: + Number of MPI processes = 4
> 00022: + Number of threads per MPI process = 4
> 00023: + Total number of threads therefore = 16
> 00024: + Master (0) runs on host = nvb36
> 00025: + Slave 1 runs on host = nvb36
> 00026: + Slave 2 runs on host = nvb36
> 00027: + Slave 3 runs on host = nvb36
> 00028: =================
> 00029: uniqueHost nvb36 has 3 ranks.
> 00030: GPU-ids not specified for this rank, threads will automatically
> be mapped to available devices.
> 00031: Thread 0 on slave 1 mapped to device 0
> 00032: Thread 1 on slave 1 mapped to device 0
> 00033: Thread 2 on slave 1 mapped to device 0
> 00034: Thread 3 on slave 1 mapped to device 1
> 00035: GPU-ids not specified for this rank, threads will automatically
> be mapped to available devices.
> 00036: Thread 0 on slave 2 mapped to device 1
> 00037: Thread 1 on slave 2 mapped to device 1
> 00038: Thread 2 on slave 2 mapped to device 2
> 00039: Thread 3 on slave 2 mapped to device 2
> 00040: GPU-ids not specified for this rank, threads will automatically
> be mapped to available devices.
> 00041: Thread 0 on slave 3 mapped to device 2
> 00042: Thread 1 on slave 3 mapped to device 3
> 00043: Thread 2 on slave 3 mapped to device 3
> 00044: Thread 3 on slave 3 mapped to device 3
> 00045: Device 1 on nvb36 is split between 2 slaves
> 00046: Device 2 on nvb36 is split between 2 slaves
> 00047: [nvb36:10305] *** Process received signal ***
> 00048: [nvb36:10305] Signal: Segmentation fault (11)
> 00049: [nvb36:10305] Signal code: Address not mapped (1)
> 00050: [nvb36:10305] Failing at address: 0x2802b08
> 00051: [nvb36:10305] [ 0] /lib64/libpthread.so.0() [0x358740f790]
> 00052: [nvb36:10305] [ 1] /opt/mpi/bullxmpi/1.2.9.1/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0x26)
> [0x2ac8aeb94046]
> 00053: [nvb36:10305] [ 2] /apps/RELION/2.0.3/lib/librelion_lib.so(_ZN14MlOptimiserMpi10initialiseEv+0x115f)
> [0x2ac8a7491f0f]
> 00054: [nvb36:10305] [ 3] /apps/RELION/2.0.3/bin/relion_refine_mpi(main+0x218)
> [0x4052c8]
> 00055: [nvb36:10305] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd)
> [0x3586c1ed5d]
> 00056: [nvb36:10305] [ 5] /apps/RELION/2.0.3/bin/relion_refine_mpi()
> [0x404fe9]
> 00057: [nvb36:10305] *** End of error message ***
> 00058: srun: error: nvb36: task 0: Segmentation fault
>
> _______________________________________________
> scipion-users mailing list
> sci...@li...
> https://lists.sourceforge.net/lists/listinfo/scipion-users
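
To illustrate the point about the number of MPI processes, here is a small Python sketch of the thread-to-device mapping reported in log lines 00031 to 00046. The mapping rule is an assumption inferred from that log only (rank 0 is the master and does no GPU work, and the slave threads are spread evenly over the devices in order); it is not taken from the RELION source. The function names are made up for this example. It only shows why 4 MPI processes leave two cards shared between slaves while 5 would give each slave its own card, and does not by itself explain the segmentation fault.

```python
from collections import defaultdict


def map_threads_to_devices(n_mpi, n_threads, n_devices):
    """Return {device: set of slave ranks using it}.

    Assumed rule (inferred from the log, not from RELION's code):
    rank 0 is the master, and the threads of slaves 1..n_mpi-1 are
    numbered consecutively and spread evenly over the devices.
    """
    n_slaves = n_mpi - 1
    total_threads = n_slaves * n_threads
    slaves_on_device = defaultdict(set)
    for slave in range(1, n_slaves + 1):
        for thread in range(n_threads):
            global_index = (slave - 1) * n_threads + thread
            device = global_index * n_devices // total_threads
            slaves_on_device[device].add(slave)
    return dict(slaves_on_device)


def shared_devices(mapping):
    """List the devices that are used by more than one slave."""
    return sorted(dev for dev, slaves in mapping.items() if len(slaves) > 1)


# Setup from the log: MPI = 4 (1 master + 3 slaves), 4 threads, 4 GPUs.
logged = map_threads_to_devices(n_mpi=4, n_threads=4, n_devices=4)
print(shared_devices(logged))  # [1, 2], as in log lines 00045 and 00046

# Odd number of MPI processes: 1 master + 4 slaves on the same 4 GPUs.
odd = map_threads_to_devices(n_mpi=5, n_threads=4, n_devices=4)
print(shared_devices(odd))  # []
```

With an odd number of MPI processes the number of slaves is even, so under this assumed rule each of the four slaves gets a whole card to itself. Whether the cards then have enough memory still depends on the box size.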