From: Manoel P. <Man...@un...> - 2018-06-21 12:18:31
Dear users,

I am just starting to use Scipion for CL2D classification of a small set of manually picked objects. I get an error when the software starts the second step of the protocol. Can you help me understand the problem? The error from run.stdout is below:

00001: RUNNING PROTOCOL -----------------
00002: PID: 9060
00003: Scipion: v1.1 (2017-06-14) Balbino
00004: currentDir: /data/prouteau/Mano_newdata_frames2-16_DW/TOROID-Sides
00005: workingDir: Runs/000400_XmippProtCL2D
00006: runMode: Continue
00007: MPI: 4
00008: threads: 1
00009: len(steps) 13 len(prevSteps) 0
00010: Starting at step: 1
00011: Running steps
00012: STARTED: convertInputStep, step 1
00013: 2018-06-20 15:03:42.471496
00014: FINISHED: convertInputStep, step 1
00015: 2018-06-20 15:03:43.256208
00016: STARTED: runJob, step 2
00017: 2018-06-20 15:03:43.312482
00018: mpirun -np 4 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4
00019: --------------------------------------------------------------------------
00020: The following command line options and corresponding MCA parameter have
00021: been deprecated and replaced as follows:
00022:
00023: Command line options:
00024: Deprecated: --bynode, -bynode
00025: Replacement: --map-by node
00026:
00027: Equivalent MCA parameter:
00028: Deprecated: rmaps_base_bynode
00029: Replacement: rmaps_base_mapping_policy=node
00030:
00031: The deprecated forms *will* disappear in a future version of Open MPI.
00032: Please update to the new syntax.
00033: --------------------------------------------------------------------------
00034: Input images: Runs/000400_XmippProtCL2D/tmp/input_particles.xmd
00035: Output root: level
00036: Output dir: Runs/000400_XmippProtCL2D/extra
00037: Iterations: 10
00038: CodesSel0:
00039: Codes0: 4
00040: Codes: 15
00041: Neighbours: 4
00042: Minimum node size: 20
00043: Use Correlation: 1
00044: Classical Multiref: 1
00045: Classical Split: 0
00046: Maximum shift: 10
00047: Classify all images: 0
00048: Normalize images: 1
00049: Mirror images: 1
00050: Align images: 1
00051: Initializing ...
00052: 0/ 0 sec. ............................................................
00053: Quantizing with 4 codes...
00054: Iteration 1 ...
00055: 13/ 25 sec. ...............................RUNNING PROTOCOL -----------------
00056: PID: 9099
00057: Scipion: v1.1 (2017-06-14) Balbino
00058: currentDir: /data/prouteau/Mano_newdata_frames2-16_DW/TOROID-Sides
00059: workingDir: Runs/000400_XmippProtCL2D
00060: runMode: Continue
00061: MPI: 32
00062: threads: 1
00063: len(steps) 13 len(prevSteps) 13
00064: Starting at step: 2
00065: Running steps
00066: STARTED: runJob, step 2
00067: 2018-06-20 15:04:06.958333
00068: mpirun -np 32 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4
00069: --------------------------------------------------------------------------
00070: The following command line options and corresponding MCA parameter have
00071: been deprecated and replaced as follows:
00072:
00073: Command line options:
00074: Deprecated: --bynode, -bynode
00075: Replacement: --map-by node
00076:
00077: Equivalent MCA parameter:
00078: Deprecated: rmaps_base_bynode
00079: Replacement: rmaps_base_mapping_policy=node
00080:
00081: The deprecated forms *will* disappear in a future version of Open MPI.
00082: Please update to the new syntax.
00083: --------------------------------------------------------------------------
00084: Input images: Runs/000400_XmippProtCL2D/tmp/input_particles.xmd
00085: Output root: level
00086: Output dir: Runs/000400_XmippProtCL2D/extra
00087: Iterations: 10
00088: CodesSel0:
00089: Codes0: 4
00090: Codes: 15
00091: Neighbours: 4
00092: Minimum node size: 20
00093: Use Correlation: 1
00094: Classical Multiref: 1
00095: Classical Split: 0
00096: Maximum shift: 10
00097: Classify all images: 0
00098: Normalize images: 1
00099: Mirror images: 1
00100: Align images: 1
00101: Initializing ...
00102: 0/ 0 sec. ............................................................
00103: Quantizing with 4 codes...
00104: Iteration 1 ...
00105: 10/ 10 sec. ............................................................
00106:
00107: Average correlation with input vectors=0.0310552
00108: Number of assignment changes=0
00109: Iteration 2 ...
00110: 10/ 10 sec. ............................................................
00111:
00112: Average correlation with input vectors=0.0882044
00113: Number of assignment changes=324
00114: Iteration 3 ...
00115: 10/ 10 sec. ............................................................
00116:
00117: Average correlation with input vectors=0.107101
00118: Number of assignment changes=378
00119: Iteration 4 ...
00120: 9/ 9 sec. ............................................................
00121:
00122: Average correlation with input vectors=0.122994
00123: Number of assignment changes=225
00124: Iteration 5 ...
00125: 10/ 10 sec. ............................................................
00126:
00127: Average correlation with input vectors=0.119519
00128: Number of assignment changes=290
00129: Iteration 6 ...
00130: 9/ 9 sec. ............................................................
00131:
00132: Average correlation with input vectors=0.127653
00133: Number of assignment changes=233
00134: Iteration 7 ...
00135: 10/ 10 sec. ............................................................
00136:
00137: Average correlation with input vectors=0.127296
00138: Number of assignment changes=223
00139: Iteration 8 ...
00140: 9/ 9 sec. ............................................................
00141:
00142: Average correlation with input vectors=0.129356
00143: Number of assignment changes=236
00144: Iteration 9 ...
00145: 10/ 10 sec. ............................................................
00146:
00147: Average correlation with input vectors=0.143878
00148: Number of assignment changes=126
00149: Iteration 10 ...
00150: 9/ 9 sec. ............................................................
00151:
00152: Average correlation with input vectors=0.138916
00153: Number of assignment changes=187
00154: Spliting nodes ...
00155: Currently there are 5 nodes
00156: Currently there are 6 nodes
00157: Currently there are 7 nodes
00158: Currently there are 8 nodes
00159: Quantizing with 8 codes...
00160: Iteration 1 ...
00161: 28/ 28 sec. ............................................................
00162:
00163: Average correlation with input vectors=0.139535
00164: Number of assignment changes=0
00165: Iteration 2 ...
00166: 26/ 26 sec. ............................................................
00167:
00168: Average correlation with input vectors=0.153304
00169: Number of assignment changes=181
00170: Iteration 3 ...
00171: 26/ 26 sec. ............................................................
00172:
00173: Average correlation with input vectors=0.159167
00174: Number of assignment changes=265
00175: Iteration 4 ...
00176: 25/ 25 sec. ............................................................
00177:
00178: Average correlation with input vectors=0.151184
00179: Number of assignment changes=424
00180: Iteration 5 ...
00181: 25/ 25 sec. ............................................................
00182:
00183: Average correlation with input vectors=0.155143
00184: Number of assignment changes=177
00185: Iteration 6 ...
00186: 23/ 23 sec. ............................................................
00187:
00188: Average correlation with input vectors=0.147184
00189: Number of assignment changes=263
00190: Iteration 7 ...
00191: 27/ 27 sec. ............................................................
00192:
00193: Average correlation with input vectors=0.159538
00194: Number of assignment changes=119
00195: Iteration 8 ...
00196: 25/ 25 sec. ............................................................
00197:
00198: Average correlation with input vectors=0.160486
00199: Number of assignment changes=139
00200: Iteration 9 ...
00201: 26/ 26 sec. ............................................................
00202:
00203: Average correlation with input vectors=0.164716
00204: Number of assignment changes=120
00205: Iteration 10 ...
00206: 27/ 27 sec. ............................................................
00207:
00208: Average correlation with input vectors=0.162771
00209: Number of assignment changes=130
00210: Spliting nodes ...
00211: Currently there are 9 nodes
00212: Currently there are 10 nodes
00213: Currently there are 11 nodes
00214: Currently there are 12 nodes
00215: --------------------------------------------------------------------------
00216: mpirun noticed that process rank 11 with PID 9147 on node smaug exited on signal 9 (Killed).
00217: --------------------------------------------------------------------------
00218: Traceback (most recent call last):
00219: File "/opt/scipion/pyworkflow/protocol/protocol.py", line 182, in run
00220: self._run()
00221: File "/opt/scipion/pyworkflow/protocol/protocol.py", line 228, in _run
00222: resultFiles = self._runFunc()
00223: File "/opt/scipion/pyworkflow/protocol/protocol.py", line 224, in _runFunc
00224: return self._func(*self._args)
00225: File "/opt/scipion/pyworkflow/protocol/protocol.py", line 1077, in runJob
00226: self._stepsExecutor.runJob(self._log, program, arguments, **kwargs)
00227: File "/opt/scipion/pyworkflow/protocol/executor.py", line 56, in runJob
00228: env=env, cwd=cwd)
00229: File "/opt/scipion/pyworkflow/utils/process.py", line 51, in runJob
00230: return runCommand(command, env, cwd)
00231: File "/opt/scipion/pyworkflow/utils/process.py", line 65, in runCommand
00232: check_call(command, shell=True, stdout=sys.stdout, stderr=sys.stderr, env=env, cwd=cwd)
00233: File "/opt/scipion/software/lib/python2.7/subprocess.py", line 540, in check_call
00234: raise CalledProcessError(retcode, cmd)
00235: CalledProcessError: Command 'mpirun -np 32 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4' returned non-zero exit status 137
00236: Protocol failed: Command 'mpirun -np 32 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4' returned non-zero exit status 137
00237: FAILED: runJob, step 2
00238: 2018-06-20 15:31:45.758171
00239: ------------------- PROTOCOL FAILED (DONE 2/13)

Thanks in advance for your help,

Cheers,
Manoël Prouteau, Ph.D.
Scientific Collaborator
Department of Molecular Biology
Sciences III - University of Geneva
Quai Ernest Ansermet, 30
1211 Geneve 04
Switzerland
(+41) 022 379 61 18
man...@un...
http://www.unige.ch