From: Grigory S. <sha...@gm...> - 2021-07-08 09:32:45
Hello Dmitry,

this is a bit difficult to debug. Can you check whether your raw movies are accessible? You can also make a test run with 1 movie / a few particles, without MPI and without threads, to see the errors more clearly, since MPI might obscure them.

Best regards,
Grigory

--------------------------------------------------------------------------------
Grigory Sharov, Ph.D.

MRC Laboratory of Molecular Biology,
Francis Crick Avenue,
Cambridge Biomedical Campus,
Cambridge CB2 0QH, UK.
tel. +44 (0) 1223 267228
e-mail: gs...@mr...

On Thu, Jul 8, 2021 at 8:16 AM Dmitry Semchonok <Sem...@gm...> wrote:

> Dear colleagues,
>
> Could you please advise -
>
> I am receiving an error during Bayesian polishing:
>
> First I got this error:
>
> 1.44/7.13 hrs ............~~(,_,">
> --------------------------------------------------------------------------
> mpirun noticed that process rank 6 with PID 114780 on node dataanalysisserver1 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 197, in run
>     self._run()
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 248, in _run
>     resultFiles = self._runFunc()
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 244, in _runFunc
>     return self._func(*self._args)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/relion/protocols/protocol_bayesian_polishing.py", line 351, in trainOrPolishStep
>     self.runJob(self._getProgram('relion_motion_refine'), args)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 1388, in runJob
>     self._stepsExecutor.runJob(self._log, program, arguments, **kwargs)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", line 65, in runJob
>     process.runJob(log, programName, params,
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/utils/process.py", line 52, in runJob
>     return runCommand(command, env, cwd)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/utils/process.py", line 67, in runCommand
>     check_call(command, shell=True, stdout=sys.stdout, stderr=sys.stderr,
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/subprocess.py", line 364, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command ' mpirun -np 8 `which
> relion_motion_refine_mpi` --i
> Runs/064772_ProtRelionBayesianPolishing/input_particles.star --o
> Runs/064772_ProtRelionBayesianPolishing/extra --f
> Runs/064668_ProtRelionPostprocess/extra/postprocess.star --angpix_ref
> 0.96120 --corr_mic
> Runs/064772_ProtRelionBayesianPolishing/input_corrected_micrographs.star
> --first_frame 4 --last_frame 33 --s_vel 0.096 --s_div 1830.000 --s_acc
> 2.500 --bfac_minfreq 20.000 --bfac_maxfreq -1.000 --combine_frames --j 6 '
> returned non-zero exit status 139.
> Protocol failed: Command ' mpirun -np 8 `which
> relion_motion_refine_mpi` --i
> Runs/064772_ProtRelionBayesianPolishing/input_particles.star --o
> Runs/064772_ProtRelionBayesianPolishing/extra --f
> Runs/064668_ProtRelionPostprocess/extra/postprocess.star --angpix_ref
> 0.96120 --corr_mic
> Runs/064772_ProtRelionBayesianPolishing/input_corrected_micrographs.star
> --first_frame 4 --last_frame 33 --s_vel 0.096 --s_div 1830.000 --s_acc
> 2.500 --bfac_minfreq 20.000 --bfac_maxfreq -1.000 --combine_frames --j 6 '
> returned non-zero exit status 139.
> FAILED: trainOrPolishStep, step 2, time 2021-07-07 19:03:26.495010
> *** Last status is failed
>
> ================================================================================
>
> After resubmission - this one -
>
> 0.02/7.27 hrs ~~(,_,">
> 0.03/6.81 hrs ~~(,_,">
> 0.04/6.70 hrs ~~(,_,">
> 0.05/6.38 hrs ~~(,_,">
> --------------------------------------------------------------------------
> mpirun noticed that process rank 3 with PID 173927 on node dataanalysisserver1 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 197, in run
>     self._run()
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 248, in _run
>     resultFiles = self._runFunc()
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 244, in _runFunc
>     return self._func(*self._args)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/relion/protocols/protocol_bayesian_polishing.py", line 351, in trainOrPolishStep
>     self.runJob(self._getProgram('relion_motion_refine'), args)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 1388, in runJob
>     self._stepsExecutor.runJob(self._log, program, arguments, **kwargs)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", line 65, in runJob
>     process.runJob(log, programName, params,
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/utils/process.py", line 52, in runJob
>     return runCommand(command, env, cwd)
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/utils/process.py", line 67, in runCommand
>     check_call(command, shell=True, stdout=sys.stdout, stderr=sys.stderr,
>   File "/home/user/Data/Software/miniconda/envs/scipion3/lib/python3.8/subprocess.py", line 364, in check_call
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command ' mpirun -np 6 `which
> relion_motion_refine_mpi` --i
> Runs/064772_ProtRelionBayesianPolishing/input_particles.star --o
> Runs/064772_ProtRelionBayesianPolishing/extra --f
> Runs/064668_ProtRelionPostprocess/extra/postprocess.star --angpix_ref
> 0.96120 --corr_mic
> Runs/064772_ProtRelionBayesianPolishing/input_corrected_micrographs.star
> --first_frame 4 --last_frame 33 --s_vel 0.096 --s_div 1830.000 --s_acc
> 2.500 --bfac_minfreq 20.000 --bfac_maxfreq -1.000 --combine_frames --j 4 '
> returned non-zero exit status 139.
> Protocol failed: Command ' mpirun -np 6 `which
> relion_motion_refine_mpi` --i
> Runs/064772_ProtRelionBayesianPolishing/input_particles.star --o
> Runs/064772_ProtRelionBayesianPolishing/extra --f
> Runs/064668_ProtRelionPostprocess/extra/postprocess.star --angpix_ref
> 0.96120 --corr_mic
> Runs/064772_ProtRelionBayesianPolishing/input_corrected_micrographs.star
> --first_frame 4 --last_frame 33 --s_vel 0.096 --s_div 1830.000 --s_acc
> 2.500 --bfac_minfreq 20.000 --bfac_maxfreq -1.000 --combine_frames --j 4 '
> returned non-zero exit status 139.
> FAILED: trainOrPolishStep, step 2, time 2021-07-08 06:47:13.642052
> *** Last status is failed
> ------------------- PROTOCOL FAILED (DONE 2/3)
>
> Thanks in advance
>
> Sincerely,
> Dmitry
>
> _______________________________________________
> scipion-users mailing list
> sci...@li...
> https://lists.sourceforge.net/lists/listinfo/scipion-users
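
As a rough illustration of the checks suggested in the reply (raw movies accessible, then a test run without MPI and without threads), here is a minimal Python sketch. The helper names `decode_exit_status`, `to_serial_command` and `unreadable_paths` are hypothetical and are not part of Scipion or RELION. The sketch only reuses program names and flags that appear in the log, and it assumes the non-MPI binary is called `relion_motion_refine`, as the `_getProgram('relion_motion_refine')` call in the traceback suggests.

```python
import os
import re
import signal


def decode_exit_status(status):
    """Return the signal name for a shell-style exit status above 128, else None.

    By shell convention, a status of 128 + N means the process was killed
    by signal N.
    """
    if status > 128:
        return signal.Signals(status - 128).name
    return None


def to_serial_command(mpi_command):
    """Rewrite an 'mpirun -np N `which prog_mpi` ... --j M' command line
    into a single-process, single-thread call (hypothetical helper)."""
    cmd = re.sub(r"mpirun\s+-np\s+\d+\s+", "", mpi_command.strip())
    # Assumption: the serial binary is the MPI name without the '_mpi' suffix.
    cmd = re.sub(r"`which\s+(\S+?)_mpi`", r"\1", cmd)
    cmd = re.sub(r"--j\s+\d+", "--j 1", cmd)
    return cmd


def unreadable_paths(paths):
    """Return the subset of paths the current user cannot read."""
    return [p for p in paths if not os.access(p, os.R_OK)]


if __name__ == "__main__":
    print(decode_exit_status(139))
    print(to_serial_command(
        "mpirun -np 8 `which relion_motion_refine_mpi` --i input_particles.star --j 6"
    ))
```

Exit status 139 is 128 + 11, which matches the "signal 11 (Segmentation fault)" line that mpirun prints. Reducing the input to 1 movie and a few particles would still require editing the input STAR files by hand, which this sketch does not attempt.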