From: Carlos O. S. <co...@cn...> - 2018-06-21 13:40:37
Dear Manoel,

From the stdout there is no obvious reason why the process stopped. The only error reported is that it was killed. On some machines there is a limit on how long a process may run, and anything longer has to be submitted through a queue. I don't know whether that applies here.

Kind regards, Carlos Oscar

On 21/06/2018 14:18, Manoel Prouteau wrote:
> Dear users,
>
> I have just started using Scipion for CL2D classification of a small set of manually picked objects.
>
> I get an error when the software starts the second step of the command. Can you help me understand the problem?
>
> You can find the error in the run.stdout here:
>
> 00001: RUNNING PROTOCOL -----------------
> 00002: PID: 9060
> 00003: Scipion: v1.1 (2017-06-14) Balbino
> 00004: currentDir: /data/prouteau/Mano_newdata_frames2-16_DW/TOROID-Sides
> 00005: workingDir: Runs/000400_XmippProtCL2D
> 00006: runMode: Continue
> 00007: MPI: 4
> 00008: threads: 1
> 00009: len(steps) 13 len(prevSteps) 0
> 00010: Starting at step: 1
> 00011: Running steps
> 00012: STARTED: convertInputStep, step 1
> 00013: 2018-06-20 15:03:42.471496
> 00014: FINISHED: convertInputStep, step 1
> 00015: 2018-06-20 15:03:43.256208
> 00016: STARTED: runJob, step 2
> 00017: 2018-06-20 15:03:43.312482
> 00018: mpirun -np 4 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4
> 00019: --------------------------------------------------------------------------
> 00020: The following command line options and corresponding MCA parameter have
> 00021: been deprecated and replaced as follows:
> 00022:
> 00023: Command line options:
> 00024: Deprecated: --bynode, -bynode
> 00025: Replacement: --map-by node
> 00026:
> 00027: Equivalent MCA parameter:
> 00028: Deprecated: rmaps_base_bynode
> 00029: Replacement: rmaps_base_mapping_policy=node
> 00030:
> 00031: The deprecated forms *will* disappear in a future version of Open MPI.
> 00032: Please update to the new syntax.
> 00033: --------------------------------------------------------------------------
> 00034: Input images: Runs/000400_XmippProtCL2D/tmp/input_particles.xmd
> 00035: Output root: level
> 00036: Output dir: Runs/000400_XmippProtCL2D/extra
> 00037: Iterations: 10
> 00038: CodesSel0:
> 00039: Codes0: 4
> 00040: Codes: 15
> 00041: Neighbours: 4
> 00042: Minimum node size: 20
> 00043: Use Correlation: 1
> 00044: Classical Multiref: 1
> 00045: Classical Split: 0
> 00046: Maximum shift: 10
> 00047: Classify all images: 0
> 00048: Normalize images: 1
> 00049: Mirror images: 1
> 00050: Align images: 1
> 00051: Initializing ...
> 00052: 0/ 0 sec. ............................................................
> 00053: Quantizing with 4 codes...
> 00054: Iteration 1 ...
> 00055: 13/ 25 sec. ...............................RUNNING PROTOCOL -----------------
> 00056: PID: 9099
> 00057: Scipion: v1.1 (2017-06-14) Balbino
> 00058: currentDir: /data/prouteau/Mano_newdata_frames2-16_DW/TOROID-Sides
> 00059: workingDir: Runs/000400_XmippProtCL2D
> 00060: runMode: Continue
> 00061: MPI: 32
> 00062: threads: 1
> 00063: len(steps) 13 len(prevSteps) 13
> 00064: Starting at step: 2
> 00065: Running steps
> 00066: STARTED: runJob, step 2
> 00067: 2018-06-20 15:04:06.958333
> 00068: mpirun -np 32 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4
> 00069: --------------------------------------------------------------------------
> 00070: The following command line options and corresponding MCA parameter have
> 00071: been deprecated and replaced as follows:
> 00072:
> 00073: Command line options:
> 00074: Deprecated: --bynode, -bynode
> 00075: Replacement: --map-by node
> 00076:
> 00077: Equivalent MCA parameter:
> 00078: Deprecated: rmaps_base_bynode
> 00079: Replacement: rmaps_base_mapping_policy=node
> 00080:
> 00081: The deprecated forms *will* disappear in a future version of Open MPI.
> 00082: Please update to the new syntax.
> 00083: --------------------------------------------------------------------------
> 00084: Input images: Runs/000400_XmippProtCL2D/tmp/input_particles.xmd
> 00085: Output root: level
> 00086: Output dir: Runs/000400_XmippProtCL2D/extra
> 00087: Iterations: 10
> 00088: CodesSel0:
> 00089: Codes0: 4
> 00090: Codes: 15
> 00091: Neighbours: 4
> 00092: Minimum node size: 20
> 00093: Use Correlation: 1
> 00094: Classical Multiref: 1
> 00095: Classical Split: 0
> 00096: Maximum shift: 10
> 00097: Classify all images: 0
> 00098: Normalize images: 1
> 00099: Mirror images: 1
> 00100: Align images: 1
> 00101: Initializing ...
> 00102: 0/ 0 sec. ............................................................
> 00103: Quantizing with 4 codes...
> 00104: Iteration 1 ...
> 00105: 10/ 10 sec. ............................................................
> 00106:
> 00107: Average correlation with input vectors=0.0310552
> 00108: Number of assignment changes=0
> 00109: Iteration 2 ...
> 00110: 10/ 10 sec. ............................................................
> 00111:
> 00112: Average correlation with input vectors=0.0882044
> 00113: Number of assignment changes=324
> 00114: Iteration 3 ...
> 00115: 10/ 10 sec. ............................................................
> 00116:
> 00117: Average correlation with input vectors=0.107101
> 00118: Number of assignment changes=378
> 00119: Iteration 4 ...
> 00120: 9/ 9 sec. ............................................................
> 00121:
> 00122: Average correlation with input vectors=0.122994
> 00123: Number of assignment changes=225
> 00124: Iteration 5 ...
> 00125: 10/ 10 sec. ............................................................
> 00126:
> 00127: Average correlation with input vectors=0.119519
> 00128: Number of assignment changes=290
> 00129: Iteration 6 ...
> 00130: 9/ 9 sec. ............................................................
> 00131:
> 00132: Average correlation with input vectors=0.127653
> 00133: Number of assignment changes=233
> 00134: Iteration 7 ...
> 00135: 10/ 10 sec. ............................................................
> 00136:
> 00137: Average correlation with input vectors=0.127296
> 00138: Number of assignment changes=223
> 00139: Iteration 8 ...
> 00140: 9/ 9 sec. ............................................................
> 00141:
> 00142: Average correlation with input vectors=0.129356
> 00143: Number of assignment changes=236
> 00144: Iteration 9 ...
> 00145: 10/ 10 sec. ............................................................
> 00146:
> 00147: Average correlation with input vectors=0.143878
> 00148: Number of assignment changes=126
> 00149: Iteration 10 ...
> 00150: 9/ 9 sec. ............................................................
> 00151:
> 00152: Average correlation with input vectors=0.138916
> 00153: Number of assignment changes=187
> 00154: Spliting nodes ...
> 00155: Currently there are 5 nodes
> 00156: Currently there are 6 nodes
> 00157: Currently there are 7 nodes
> 00158: Currently there are 8 nodes
> 00159: Quantizing with 8 codes...
> 00160: Iteration 1 ...
> 00161: 28/ 28 sec. ............................................................
> 00162:
> 00163: Average correlation with input vectors=0.139535
> 00164: Number of assignment changes=0
> 00165: Iteration 2 ...
> 00166: 26/ 26 sec. ............................................................
> 00167:
> 00168: Average correlation with input vectors=0.153304
> 00169: Number of assignment changes=181
> 00170: Iteration 3 ...
> 00171: 26/ 26 sec. ............................................................
> 00172:
> 00173: Average correlation with input vectors=0.159167
> 00174: Number of assignment changes=265
> 00175: Iteration 4 ...
> 00176: 25/ 25 sec. ............................................................
> 00177:
> 00178: Average correlation with input vectors=0.151184
> 00179: Number of assignment changes=424
> 00180: Iteration 5 ...
> 00181: 25/ 25 sec. ............................................................
> 00182:
> 00183: Average correlation with input vectors=0.155143
> 00184: Number of assignment changes=177
> 00185: Iteration 6 ...
> 00186: 23/ 23 sec. ............................................................
> 00187:
> 00188: Average correlation with input vectors=0.147184
> 00189: Number of assignment changes=263
> 00190: Iteration 7 ...
> 00191: 27/ 27 sec. ............................................................
> 00192:
> 00193: Average correlation with input vectors=0.159538
> 00194: Number of assignment changes=119
> 00195: Iteration 8 ...
> 00196: 25/ 25 sec. ............................................................
> 00197:
> 00198: Average correlation with input vectors=0.160486
> 00199: Number of assignment changes=139
> 00200: Iteration 9 ...
> 00201: 26/ 26 sec. ............................................................
> 00202:
> 00203: Average correlation with input vectors=0.164716
> 00204: Number of assignment changes=120
> 00205: Iteration 10 ...
> 00206: 27/ 27 sec. ............................................................
> 00207:
> 00208: Average correlation with input vectors=0.162771
> 00209: Number of assignment changes=130
> 00210: Spliting nodes ...
> 00211: Currently there are 9 nodes
> 00212: Currently there are 10 nodes
> 00213: Currently there are 11 nodes
> 00214: Currently there are 12 nodes
> 00215: --------------------------------------------------------------------------
> 00216: mpirun noticed that process rank 11 with PID 9147 on node smaug exited on signal 9 (Killed).
> 00217: --------------------------------------------------------------------------
> 00218: Traceback (most recent call last):
> 00219:   File "/opt/scipion/pyworkflow/protocol/protocol.py", line 182, in run
> 00220:     self._run()
> 00221:   File "/opt/scipion/pyworkflow/protocol/protocol.py", line 228, in _run
> 00222:     resultFiles = self._runFunc()
> 00223:   File "/opt/scipion/pyworkflow/protocol/protocol.py", line 224, in _runFunc
> 00224:     return self._func(*self._args)
> 00225:   File "/opt/scipion/pyworkflow/protocol/protocol.py", line 1077, in runJob
> 00226:     self._stepsExecutor.runJob(self._log, program, arguments, **kwargs)
> 00227:   File "/opt/scipion/pyworkflow/protocol/executor.py", line 56, in runJob
> 00228:     env=env, cwd=cwd)
> 00229:   File "/opt/scipion/pyworkflow/utils/process.py", line 51, in runJob
> 00230:     return runCommand(command, env, cwd)
> 00231:   File "/opt/scipion/pyworkflow/utils/process.py", line 65, in runCommand
> 00232:     check_call(command, shell=True, stdout=sys.stdout, stderr=sys.stderr, env=env, cwd=cwd)
> 00233:   File "/opt/scipion/software/lib/python2.7/subprocess.py", line 540, in check_call
> 00234:     raise CalledProcessError(retcode, cmd)
> 00235: CalledProcessError: Command 'mpirun -np 32 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4' returned non-zero exit status 137
> 00236: Protocol failed: Command 'mpirun -np 32 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/000400_XmippProtCL2D/tmp/input_particles.xmd --odir Runs/000400_XmippProtCL2D/extra --oroot level --nref 15 --iter 10 --distance correlation --classicalMultiref --nref0 4' returned non-zero exit status 137
> 00237: FAILED: runJob, step 2
> 00238: 2018-06-20 15:31:45.758171
> 00239: ------------------- PROTOCOL FAILED (DONE 2/13)
>
> Thanks in advance for your help,
>
> Cheers,
>
> Manoël Prouteau, Ph.D.
> Scientific Collaborator
> Department of Molecular Biology
> Sciences III - University of Geneva
> Quai Ernest Ansermet, 30
> 1211 Geneve 04
> Switzerland
> (+41) 022 379 61 18
> man...@un...
> http://www.unige.ch

--
Carlos Oscar Sánchez Sorzano
e-mail: co...@cn...
Biocomputing unit, http://i2pc.es/coss
National Center of Biotechnology (CSIC)
c/Darwin, 3
Campus Universidad Autónoma (Cantoblanco)
28049 MADRID (SPAIN)
Tlf: 34-91-585 4510
Fax: 34-91-585 4506
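
A note for anyone reading this thread later: the two numbers in the log, signal 9 and exit status 137, describe the same event, because shells conventionally report 128 + N when a command dies from signal N. The sketch below decodes such a status and also reads the per-process CPU-time limit, which is one possible form of the time limit mentioned in the reply. It assumes Python 3 on a POSIX system (the thread itself shows Python 2.7 inside Scipion), and `decode_exit_status` and `cpu_time_limit` are hypothetical helper names, not part of Scipion or Xmipp.

```python
import resource
import signal


def decode_exit_status(status):
    """Return (signal_number, signal_name) if a shell-style exit status
    encodes death by signal (128 + N); return None otherwise."""
    if status <= 128:
        return None
    signum = status - 128
    try:
        return signum, signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform.
        return signum, "unknown"


def cpu_time_limit():
    """Return the (soft, hard) per-process CPU-time limits in seconds.
    None stands for "unlimited"."""
    def as_seconds(value):
        return None if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
    return as_seconds(soft), as_seconds(hard)


if __name__ == "__main__":
    # Status taken from the failed run in the log.
    print(decode_exit_status(137))  # (9, 'SIGKILL') on Linux
    print(cpu_time_limit())
```

An unlimited CPU-time result would not rule out a site policy, since a wall-clock limit can be enforced by other means, and the log alone does not show what sent the signal.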