From: Huabin Z. <Hua...@UT...> - 2022-04-15 15:36:02
Dear all,

I recently installed Scipion on our workstation, which is configured with an AMD 16-core processor, an RTX 3090 (CUDA 11.2) and Ubuntu 20.04. After the installation, I ran the Xmipp tests at medium and large scale without problems. However, I got the following errors when running particle alignment in TomoFlow. Could you please tell me what the likely culprit is and how to fix it? Thank you so much!

Run.stdout
File 'Runs/000201_FlexProtSubtomogramAveraging/logs/run.stdout' doesn't exist
Logging configured. STDOUT --> Runs/000201_FlexProtSubtomogramAveraging/logs/run.stdout , STDERR --> Runs/000201_FlexProtSubtomogramAveraging/logs/run.stderr
RUNNING PROTOCOL -----------------
Protocol starts
Hostname: alienware.dhcp.swmed.org
PID: 76953
pyworkflow: 3.0.19
plugin: continuousflex
plugin v: 3.1.4
currentDir: /home/huabin/ScipionUserData/projects/test
workingDir: Runs/000201_FlexProtSubtomogramAveraging
runMode: Restart
MPI: 5
threads: 1
Starting at step: 1
Running steps
STARTED: convertInputStep, step 1, time 2022-04-15 09:39:54.033455
FINISHED: convertInputStep, step 1, time 2022-04-15 09:39:54.076001
STARTED: doAlignmentStep, step 2, time 2022-04-15 09:39:54.080785
tempdir is Runs/000201_FlexProtSubtomogramAveraging/tmp
imgFn is Runs/000201_FlexProtSubtomogramAveraging/extra/volumes.xmd
frm_freq is 0.25
frm_maxshift is 10
max_itr is 10
iter_result is Runs/000201_FlexProtSubtomogramAveraging/extra/result.xmd
reference is /home/huabin/software/scipion/test/rec.mrc
mpirun -np 5 `which xmipp_mpi_volumeset_align` -i Runs/000201_FlexProtSubtomogramAveraging/extra/volumes.xmd -o Runs/000201_FlexProtSubtomogramAveraging/extra/params_itr_1.xmd --odir Runs/000201_FlexProtSubtomogramAveraging/tmp --resume --ref /home/huabin/software/scipion/test/rec.mrc --frm_parameters 0.250000 10 --tilt_values -60 60
Input File: Runs/000201_FlexProtSubtomogramAveraging/extra/volumes.xmd
Output File: Runs/000201_FlexProtSubtomogramAveraging/extra/params_itr_1.xmd
Protocol failed: Command ' mpirun -np 5 `which xmipp_mpi_volumeset_align` -i Runs/000201_FlexProtSubtomogramAveraging/extra/volumes.xmd -o Runs/000201_FlexProtSubtomogramAveraging/extra/params_itr_1.xmd --odir Runs/000201_FlexProtSubtomogramAveraging/tmp --resume --ref /home/huabin/software/scipion/test/rec.mrc --frm_parameters 0.250000 10 --tilt_values -60 60 ' returned non-zero exit status 139.
FAILED: doAlignmentStep, step 2, time 2022-04-15 09:39:54.731311
*** Last status is failed
------------------- PROTOCOL FAILED (DONE 2/3)

Run.stderr
File 'Runs/000201_FlexProtSubtomogramAveraging/logs/run.stderr' doesn't exist
Invalid MIT-MAGIC-COOKIE-1 key
0000/???? sec. ------------------------------------------------------------
Segmentation fault (core dumped)
[alienware:77029] *** Process received signal ***
[alienware:77029] Signal: Segmentation fault (11)
[alienware:77029] Signal code: Address not mapped (1)
[alienware:77029] Failing at address: 0xc0
[alienware:77029] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x430c0)[0x7f86ae5460c0]
[alienware:77029] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x63ac4)[0x7f86ae566ac4]
[alienware:77029] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__isoc99_fscanf+0x9d)[0x7f86ae56626d]
[alienware:77029] [ 3] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign14computeFitnessEv+0x331)[0x7f86adf32c51]
[alienware:77029] [ 4] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign12processImageERK8FileNameS2_RK5MDRowRS3_+0x63)[0x7f86adf32e93]
[alienware:77029] [ 5] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN20XmippMetadataProgram3runEv+0x467)[0x7f86ad089ab7]
[alienware:77029] [ 6] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN12XmippProgram6tryRunEv+0x2f)[0x7f86ad490aff]
[alienware:77029] [ 7] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbe22)[0x560db2ba9e22]
[alienware:77029] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f86ae5270b3]
[alienware:77029] [ 9] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbebe)[0x560db2ba9ebe]
[alienware:77029] *** End of error message ***
Segmentation fault (core dumped)
[alienware:77030] *** Process received signal ***
[alienware:77030] Signal: Segmentation fault (11)
[alienware:77030] Signal code: Address not mapped (1)
[alienware:77030] Failing at address: 0xc0
[alienware:77030] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x430c0)[0x7f8cb06aa0c0]
[alienware:77030] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x63ac4)[0x7f8cb06caac4]
[alienware:77030] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__isoc99_fscanf+0x9d)[0x7f8cb06ca26d]
[alienware:77030] [ 3] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign14computeFitnessEv+0x331)[0x7f8cb0096c51]
[alienware:77030] [ 4] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign12processImageERK8FileNameS2_RK5MDRowRS3_+0x63)[0x7f8cb0096e93]
[alienware:77030] [ 5] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN20XmippMetadataProgram3runEv+0x467)[0x7f8caf1edab7]
[alienware:77030] [ 6] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN12XmippProgram6tryRunEv+0x2f)[0x7f8caf5f4aff]
[alienware:77030] [ 7] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbe22)[0x55807f863e22]
[alienware:77030] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f8cb068b0b3]
[alienware:77030] [ 9] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbebe)[0x55807f863ebe]
[alienware:77030] *** End of error message ***
Segmentation fault (core dumped)
[alienware:77031] *** Process received signal ***
[alienware:77031] Signal: Segmentation fault (11)
[alienware:77031] Signal code: Address not mapped (1)
[alienware:77031] Failing at address: 0xc0
[alienware:77031] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x430c0)[0x7f09b053b0c0]
[alienware:77031] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x63ac4)[0x7f09b055bac4]
[alienware:77031] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__isoc99_fscanf+0x9d)[0x7f09b055b26d]
[alienware:77031] [ 3] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign14computeFitnessEv+0x331)[0x7f09aff27c51]
[alienware:77031] [ 4] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign12processImageERK8FileNameS2_RK5MDRowRS3_+0x63)[0x7f09aff27e93]
[alienware:77031] [ 5] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN20XmippMetadataProgram3runEv+0x467)[0x7f09af07eab7]
[alienware:77031] [ 6] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN12XmippProgram6tryRunEv+0x2f)[0x7f09af485aff]
[alienware:77031] [ 7] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbe22)[0x56074112de22]
[alienware:77031] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f09b051c0b3]
[alienware:77031] [ 9] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbebe)[0x56074112debe]
[alienware:77031] *** End of error message ***
Segmentation fault (core dumped)
[alienware:77028] *** Process received signal ***
[alienware:77028] Signal: Segmentation fault (11)
[alienware:77028] Signal code: Address not mapped (1)
[alienware:77028] Failing at address: 0xc0
[alienware:77028] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x430c0)[0x7fdd913e40c0]
[alienware:77028] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x63ac4)[0x7fdd91404ac4]
[alienware:77028] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__isoc99_fscanf+0x9d)[0x7fdd9140426d]
[alienware:77028] [ 3] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign14computeFitnessEv+0x331)[0x7fdd90dd0c51]
[alienware:77028] [ 4] /home/huabin/software/scipion/software/em/xmipp/lib/libXmipp.so(_ZN18ProgVolumeSetAlign12processImageERK8FileNameS2_RK5MDRowRS3_+0x63)[0x7fdd90dd0e93]
[alienware:77028] [ 5] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN20XmippMetadataProgram3runEv+0x467)[0x7fdd8ff27ab7]
[alienware:77028] [ 6] /home/huabin/software/scipion/software/em/xmipp/lib/libXmippCore.so(_ZN12XmippProgram6tryRunEv+0x2f)[0x7fdd9032eaff]
[alienware:77028] [ 7] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbe22)[0x56336bac0e22]
[alienware:77028] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7fdd913c50b3]
[alienware:77028] [ 9] /home/huabin/software/scipion/software/em/xmipp/bin/xmipp_mpi_volumeset_align(+0xbebe)[0x56336bac0ebe]
[alienware:77028] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node alienware exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 201, in run
    self._run()
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 252, in _run
    resultFiles = self._runFunc()
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 248, in _runFunc
    return self._func(*self._args)
  File "/home/huabin/software/scipion/scipion-em-continuousflex/continuousflex/protocols/protocol_subtomogram_averaging.py", line 238, in doAlignmentStep
    self.runJob("xmipp_volumeset_align", args % locals(),
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 1441, in runJob
    self._stepsExecutor.runJob(self._log, program, arguments, **kwargs)
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", line 65, in runJob
    process.runJob(log, programName, params,
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/utils/process.py", line 52, in runJob
    return runCommand(command, env, cwd)
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/site-packages/pyworkflow/utils/process.py", line 67, in runCommand
    check_call(command, shell=True, stdout=sys.stdout, stderr=sys.stderr,
  File "/home/huabin/anaconda3/envs/scipion3/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command ' mpirun -np 5 `which xmipp_mpi_volumeset_align` -i Runs/000201_FlexProtSubtomogramAveraging/extra/volumes.xmd -o Runs/000201_FlexProtSubtomogramAveraging/extra/params_itr_1.xmd --odir Runs/000201_FlexProtSubtomogramAveraging/tmp --resume --ref /home/huabin/software/scipion/test/rec.mrc --frm_parameters 0.250000 10 --tilt_values -60 60 ' returned non-zero exit status 139.
Protocol failed: Command ' mpirun -np 5 `which xmipp_mpi_volumeset_align` -i Runs/000201_FlexProtSubtomogramAveraging/extra/volumes.xmd -o Runs/000201_FlexProtSubtomogramAveraging/extra/params_itr_1.xmd --odir Runs/000201_FlexProtSubtomogramAveraging/tmp --resume --ref /home/huabin/software/scipion/test/rec.mrc --frm_parameters 0.250000 10 --tilt_values -60 60 ' returned non-zero exit status 139.

I also tried running it without MPI, and the error is essentially the same.

Thank you!

Huabin Zhou, Ph.D.
Postdoctoral Researcher, Michael Rosen Lab
Department of Biophysics
UT Southwestern Medical Center
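Note on the exit status: the traceback shows pyworkflow running the command through a shell (check_call with shell=True). Under the usual POSIX shell convention, a child killed by signal N is reported as exit status 128 + N, so the 139 in the log corresponds to signal 11 (SIGSEGV), which agrees with the "Segmentation fault (11)" lines in run.stderr. A minimal Python sketch of that decoding, assuming the 128 + N convention (describe_exit_status is a hypothetical helper, not part of Scipion or pyworkflow):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a process exit status (hypothetical helper, not part of pyworkflow).

    Assumes the POSIX shell convention: a child killed by signal N is reported
    as 128 + N. Python's subprocess reports the same situation as a negative
    returncode (-N) when no shell is involved, so both forms are handled.
    """
    if status < 0:
        sig = -status          # subprocess without a shell: returncode == -N
    elif status > 128:
        sig = status - 128     # through a shell: exit status == 128 + N
    else:
        return f"exited with code {status}"
    try:
        name = signal.Signals(sig).name
    except ValueError:
        name = "unknown signal"
    return f"terminated by signal {sig} ({name})"


# The status reported by pyworkflow in the log above.
print(describe_exit_status(139))  # terminated by signal 11 (SIGSEGV)
```

A program may also legitimately exit with a code above 128, so this decoding is a heuristic; in this case it is corroborated by the per-rank segfault reports in run.stderr.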