From: Valerie B. <val...@ib...> - 2017-01-18 10:06:13
Dear all,

I have installed the latest Scipion version on a Linux machine running Ubuntu 16.04.1 LTS (GNU/Linux 4.4.0-59-generic x86_64). The install tests ran OK, but I have a recurrent problem with CL2D: at first it was looking for libmpi.so.1, so I created a symbolic link:

lrwxrwxrwx 1 root root 12 Jan 17 17:25 libmpi.so.1 -> libmpi.so.12

Now it fails with the message below (a short sketch of the link and a linkage check is appended after the log). Can you help me fix this, please?

Best regards,
Valerie

00001: RUNNING PROTOCOL -----------------
00002: Scipion: v1.0.1
00003: currentDir: /home/biou/ScipionUserData/projects/EX_LMNG
00004: workingDir: Runs/001509_XmippProtCL2D
00005: runMode: Restart
00006: MPI: 2
00007: threads: 1
00008: Starting at step: 1
00009: Running steps
00010: STARTED: convertInputStep, step 1
00011: 2017-01-18 10:28:06.919867
00012: FINISHED: convertInputStep, step 1
00013: 2017-01-18 10:28:14.324241
00014: STARTED: runJob, step 2
00015: 2017-01-18 10:28:14.436293
00016: mpirun -np 2 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/001509_XmippProtCL2D/extra/images.xmd --odir Runs/001509_XmippProtCL2D/extra --oroot level --nref 20 --iter 10 --distance correlation --classicalMultiref --nref0 4
00017: --------------------------------------------------------------------------
00018: The following command line options and corresponding MCA parameter have
00019: been deprecated and replaced as follows:
00020:
00021: Command line options:
00022: Deprecated: --bynode, -bynode
00023: Replacement: --map-by node
00024:
00025: Equivalent MCA parameter:
00026: Deprecated: rmaps_base_bynode
00027: Replacement: rmaps_base_mapping_policy=node
00028:
00029: The deprecated forms *will* disappear in a future version of Open MPI.
00030: Please update to the new syntax.
00031: --------------------------------------------------------------------------
00032: --------------------------------------------------------------------------
00033: A requested component was not found, or was unable to be opened. This
00034: means that this component is either not installed or is unable to be
00035: used on your system (e.g., sometimes this means that shared libraries
00036: that the component requires are unable to be found/loaded). Note that
00037: Open MPI stopped checking at the first component that it did not find.
00038:
00039: Host: vblinux
00040: Framework: ess
00041: Component: pmi
00042: --------------------------------------------------------------------------
00043: --------------------------------------------------------------------------
00044: A requested component was not found, or was unable to be opened. This
00045: means that this component is either not installed or is unable to be
00046: used on your system (e.g., sometimes this means that shared libraries
00047: that the component requires are unable to be found/loaded). Note that
00048: Open MPI stopped checking at the first component that it did not find.
00049:
00050: Host: vblinux
00051: Framework: ess
00052: Component: pmi
00053: --------------------------------------------------------------------------
00054: [vblinux:02124] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
00055: [vblinux:02123] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 129
00056: --------------------------------------------------------------------------
00057: It looks like orte_init failed for some reason; your parallel process is
00058: likely to abort. There are many reasons that a parallel process can
00059: fail during orte_init; some of which are due to configuration or
00060: environment problems. This failure appears to be an internal failure;
00061: here's some additional information (which may only be relevant to an
00062: Open MPI developer):
00063:
00064: orte_ess_base_open failed
00065: --> Returned value Not found (-13) instead of ORTE_SUCCESS
00066: --------------------------------------------------------------------------
00067: --------------------------------------------------------------------------
00068: It looks like orte_init failed for some reason; your parallel process is
00069: likely to abort. There are many reasons that a parallel process can
00070: fail during orte_init; some of which are due to configuration or
00071: environment problems. This failure appears to be an internal failure;
00072: here's some additional information (which may only be relevant to an
00073: Open MPI developer):
00074:
00075: orte_ess_base_open failed
00076: --> Returned value Not found (-13) instead of ORTE_SUCCESS
00077: --------------------------------------------------------------------------
00078: *** An error occurred in MPI_Init
00079: *** on a NULL communicator
00080: *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
00081: *** and potentially your MPI job)
00082: *** An error occurred in MPI_Init
00083: *** on a NULL communicator
00084: *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
00085: *** and potentially your MPI job)
00086: --------------------------------------------------------------------------
00087: It looks like MPI_INIT failed for some reason; your parallel process is
00088: likely to abort. There are many reasons that a parallel process can
00089: fail during MPI_INIT; some of which are due to configuration or environment
00090: problems. This failure appears to be an internal failure; here's some
00091: additional information (which may only be relevant to an Open MPI
00092: developer):
00093:
00094: ompi_mpi_init: ompi_rte_init failed
00095: --> Returned "Not found" (-13) instead of "Success" (0)
00096: --------------------------------------------------------------------------
00097: --------------------------------------------------------------------------
00098: It looks like MPI_INIT failed for some reason; your parallel process is
00099: likely to abort. There are many reasons that a parallel process can
00100: fail during MPI_INIT; some of which are due to configuration or environment
00101: problems. This failure appears to be an internal failure; here's some
00102: additional information (which may only be relevant to an Open MPI
00103: developer):
00104:
00105: ompi_mpi_init: ompi_rte_init failed
00106: --> Returned "Not found" (-13) instead of "Success" (0)
00107: --------------------------------------------------------------------------
00108: [vblinux:2124] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
00109: [vblinux:2123] Local abort before MPI_INIT completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
00110: -------------------------------------------------------
00111: Primary job terminated normally, but 1 process returned
00112: a non-zero exit code.. Per user-direction, the job has been aborted.
00113: -------------------------------------------------------
00114: --------------------------------------------------------------------------
00115: mpirun detected that one or more processes exited with non-zero status, thus causing
00116: the job to be terminated. The first process to do so was:
00117:
00118: Process name: [[63772,1],0]
00119: Exit code: 1
00120: --------------------------------------------------------------------------
00121: Traceback (most recent call last):
00122: File "/usr/local/scipion/pyworkflow/protocol/protocol.py", line 167, in run
00123: self._run()
00124: File "/usr/local/scipion/pyworkflow/protocol/protocol.py", line 211, in _run
00125: resultFiles = self._runFunc()
00126: File "/usr/local/scipion/pyworkflow/protocol/protocol.py", line 207, in _runFunc
00127: return self._func(*self._args)
00128: File "/usr/local/scipion/pyworkflow/protocol/protocol.py", line 960, in runJob
00129: self._stepsExecutor.runJob(self._log, program, arguments, **kwargs)
00130: File "/usr/local/scipion/pyworkflow/protocol/executor.py", line 56, in runJob
00131: env=env, cwd=cwd)
00132: File "/usr/local/scipion/pyworkflow/utils/process.py", line 51, in runJob
00133: return runCommand(command, env, cwd)
00134: File "/usr/local/scipion/pyworkflow/utils/process.py", line 65, in runCommand
00135: check_call(command, shell=True, stdout=sys.stdout, stderr=sys.stderr, env=env, cwd=cwd)
00136: File "/usr/local/scipion/software/lib/python2.7/subprocess.py", line 540, in check_call
00137: raise CalledProcessError(retcode, cmd)
00138: CalledProcessError: Command 'mpirun -np 2 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/001509_XmippProtCL2D/extra/images.xmd --odir Runs/001509_XmippProtCL2D/extra --oroot level --nref 20 --iter 10 --distance correlation --classicalMultiref --nref0 4' returned non-zero exit status 1
00139: Protocol failed: Command 'mpirun -np 2 -bynode `which xmipp_mpi_classify_CL2D` -i Runs/001509_XmippProtCL2D/extra/images.xmd --odir Runs/001509_XmippProtCL2D/extra --oroot level --nref 20 --iter 10 --distance correlation --classicalMultiref --nref0 4' returned non-zero exit status 1
00140: FAILED: runJob, step 2
00141: 2017-01-18 10:28:14.966673
00142: Cleaning temporarly files....
00143: ------------------- PROTOCOL FAILED (DONE 2/13)
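P.S. In case it helps, here is roughly how the link was created, plus a quick check of which Open MPI the Xmipp binary actually expects. This is only a sketch: the library directory below is an assumption for a standard Ubuntu 16.04 layout, not output copied from the machine.

    # Which Open MPI runtime is on the PATH, and which libmpi soname does the binary request?
    mpirun --version
    ldd "$(which xmipp_mpi_classify_CL2D)" | grep libmpi

    # The workaround that was applied: point the missing soname at the installed library.
    # (/usr/lib/x86_64-linux-gnu is a guess for where libmpi.so.12 lives on this system.)
    cd /usr/lib/x86_64-linux-gnu
    sudo ln -s libmpi.so.12 libmpi.so.1

If ldd shows the binary requesting libmpi.so.1 while only libmpi.so.12 is installed, the Xmipp binaries and the system mpirun were probably built against different Open MPI versions, and pointing the old soname at the newer library may be why ORTE cannot find its ess/pmi component during MPI_Init.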