From: Pablo C. <pc...@cn...> - 2021-01-22 15:13:51
I believe it should be:

GPUs: 0 1 0 1
MPI: 3

I also think there is a bug in this case that we missed when migrating to Python 3:

https://github.com/scipion-em/scipion-pyworkflow/blob/devel/pyworkflow/protocol/executor.py#L175

    chunk = nGpu / nThreads

should be

    chunk = int(nGpu / nThreads)

I have to test this and release a fix.

On 22/1/21 14:34, Hoover, David (NIH/CIT) [E] via scipion-users wrote:
> Hi all,
>
> I am trying to launch a MotionCorr job using Scipion across multiple
> nodes with multiple GPUs per node.
>
> If I leave GPU IDs blank, it attempts to run the tasks, but fails
> because the GPU IDs are missing (the command has -Gpu %(GPU)s).
>
> If I set GPU IDs to "0", it finishes some of the tasks (where the
> command includes -Gpu 0.0), but it fails on other tasks with
> non-integer values for the GPU (e.g. -gpu 2.5).
>
> If I set GPU IDs to "0 1" (there are 2 GPUs per node), the entire job
> fails with this error:
>
> Traceback (most recent call last):
>   File "/usr/local/apps/scipion/3.0.6/anaconda/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_mpirun.py", line 54, in <module>
>     runProtocolMainMPI(projectPath, dbPath, protId, comm)
>   File "/usr/local/apps/scipion/3.0.6/anaconda/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 2229, in runProtocolMainMPI
>     executor = MPIStepExecutor(hostConfig, protocol.numberOfMpi.get() - 1,
>   File "/usr/local/apps/scipion/3.0.6/anaconda/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", line 342, in __init__
>     ThreadStepExecutor.__init__(self, hostConfig, nMPI, **kwargs)
>   File "/usr/local/apps/scipion/3.0.6/anaconda/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/executor.py", line 177, in __init__
>     self.gpuDict[node] = list(self.gpuList[i*chunk:(i+1)*chunk])
> TypeError: slice indices must be integers or None or have an __index__ method
>
> How exactly should one designate the GPU IDs for such a situation?
> For example, 2 nodes, each with 2 GPU?
>
> David

--
Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
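A minimal sketch of the chunking logic under discussion (simplified from `ThreadStepExecutor.__init__` in the traceback; the standalone function name and signature here are hypothetical). In Python 2, `nGpu / nThreads` was floor division; after the Python 3 migration it returns a float, so the slice indices in `gpuList[i*chunk:(i+1)*chunk]` raise exactly the `TypeError` shown above.

```python
def assign_gpus(gpuList, nThreads):
    """Split gpuList into one chunk of GPU ids per worker thread/MPI rank."""
    nGpu = len(gpuList)
    # Buggy under Python 3:  chunk = nGpu / nThreads  -> e.g. 2.0 (float)
    # Float slice indices then raise:
    #   TypeError: slice indices must be integers or None or have an __index__ method
    chunk = int(nGpu / nThreads)  # proposed fix; equivalently nGpu // nThreads
    gpuDict = {}
    for node in range(nThreads):
        gpuDict[node] = list(gpuList[node * chunk:(node + 1) * chunk])
    return gpuDict

# With "GPUs: 0 1 0 1" and 2 worker ranks, each rank gets a pair of GPU ids:
print(assign_gpus([0, 1, 0, 1], 2))  # {0: [0, 1], 1: [0, 1]}
```

The follow-up question of how the per-rank ids map onto physical GPUs on each node is a separate concern handled by the scheduler/protocol; this sketch only shows why integer truncation of `chunk` is required.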