From: Lugmayr, W. <w.l...@uk...> - 2020-09-16 14:00:59
hi,

ad GPU) I will use nvidia-smi for testing.

ad hosts.conf) For the first-time setup it is good that the file is generated. When calling "scipion config --update" it would be more practical if you just validated it in memory and did not write the new file back without the #SBATCH comments. Or write it back including the # lines. I think it would be enough to print the validation error messages to the screen, so that fixing/adapting the hosts.conf template can be an iterative process. I assumed you are validating because of the message:

    All the expected sections and options found in /beegfs/cssb/software/em/scipion/3.0/devel/config/hosts.conf

Thanks & cheers,
Wolfgang

From: "Pablo Conesa" <pc...@cn...>
To: "Mailing list for Scipion users" <sci...@li...>
Sent: Monday, 14 September, 2020 18:40:10
Subject: Re: [scipion-users] scipion3, own special scipion conda - cluster submission problems/question

Hi Wolfgang, thanks for the feedback.

Not sure if this will help your purposes... The number of GPUs goes into the template via the GPU_COUNT variable, so you can have the number of GPUs in the template. Not sure how you can make use of this, since I barely have experience in administering/configuring queue systems.

Regarding the config... are you running scipion config --update for the scipion.conf and you don't want the hosts.conf to be replaced? Maybe we can skip the hosts.conf when making an update?

On 14/9/20 11:37, Lugmayr, Wolfgang wrote:

Hi,

thanks for your help, I think this was my fault. On the cluster we have CPU nodes without CUDA libs, and GPU nodes. It seems that I sent the xmipp_cuda_movie_alignment_correlation job to a CPU node, where it died immediately without any log message. I have to improve my scipion3 hosts.conf template. For relion it's easy: my script checks whether --gpu is on or off. But in scipion3 I cannot determine at the cluster submission script level whether it's a CPU or a GPU job.
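[Editor's note] The GPU_COUNT template variable mentioned above could, in principle, drive the CPU-vs-GPU routing that Wolfgang asks about. A minimal sketch in plain Python (the helper name and the partition names "cpu"/"gpu" are hypothetical, site-specific examples, not part of Scipion):

```python
def sbatch_header(gpu_count):
    """Return SLURM #SBATCH header lines, choosing a partition based on
    the number of GPUs the protocol requested. The partition names and
    the --gres syntax are illustrative and must match your site setup."""
    if gpu_count > 0:
        return [
            "#SBATCH -p gpu",
            "#SBATCH --gres=gpu:%d" % gpu_count,
        ]
    return ["#SBATCH -p cpu"]

print("\n".join(sbatch_header(2)))
print("\n".join(sbatch_header(0)))
```

A wrapper like this would let one hosts.conf template serve both node types, instead of maintaining separate CPU and GPU templates.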
Just to make clear: my python command line came from the template placeholder %_(JOB_COMMAND)s, and the settings mentioned below were correct; the failure was due to an mpi4py error:

    subprocess.CalledProcessError: Command 'mpirun -np 3 -bynode `which python3` "-m" "scipion" "runprotocol" "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_mpirun.py" "/home/lugmayr/ScipionUserData/projects/Scipion3_TestSuite" "Runs/000146_XmippProtMovieCorr/logs/run.db" "146"' returned non-zero exit status 1.

pip install mpi4py solved it (conda install would also have given me mpich, so I used pip).

Thanks for your help.
Cheers,
Wolfgang

P.S. FYI: each time I call scipion3 config --update, all my comment lines are removed from the hosts.conf file. But these contain useful SLURM settings like:

    #SBATCH -p %_(JOB_QUEUE)s

I copy my version over the generated one afterwards, but maybe there is a better solution?

From: "Pablo Conesa" <pc...@cn...>
To: "Mailing list for Scipion users" <sci...@li...>
Sent: Friday, 11 September, 2020 18:06:18
Subject: Re: [scipion-users] scipion3, own special scipion conda - cluster submission problems/question

Let me see if I'm getting it. Are you trying to avoid the launching script (scipion3) and instead want to run a protocol "manually"?
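[Editor's note] The %_(JOB_QUEUE)s and %_(JOB_COMMAND)s placeholders above are filled in when Scipion renders the host template into a submission script. The general idea can be illustrated with plain Python mapping substitution — note this is a sketch of the mechanism, not Scipion's actual rendering code; the "%_(" → "%(" rewrite step and the example values are assumptions:

```python
# Hypothetical illustration of rendering a hosts.conf-style template.
template = """#!/bin/bash
#SBATCH -p %_(JOB_QUEUE)s
#SBATCH -J %_(JOB_NAME)s
%_(JOB_COMMAND)s
"""

def render(tpl, values):
    # Turn the %_(NAME)s placeholders into standard Python %(NAME)s
    # mapping keys, then substitute from the dict.
    return tpl.replace("%_(", "%(") % values

script = render(template, {
    "JOB_QUEUE": "gpu",
    "JOB_NAME": "xmipp-moviecorr",
    "JOB_COMMAND": "python3 -m scipion runprotocol ...",
})
print(script)
```

Seen this way, any extra #SBATCH comment lines in the template are inert text that survives rendering — which is why losing them on config --update is painful.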
If so, you need to replicate what the launching script does. Here is what I've got in the launching script, highlighting the key lines:

    #!/usr/bin/env python
    # Scipion launcher
    import os
    import sys
    from os.path import dirname, abspath, join, basename

    # Set SCIPION_HOME to the location of this file
    scipionHome = dirname(abspath(__file__))

    os.environ["SCIPION_TESTS_CMD"] = basename(__file__) + " tests"
    os.environ["LD_LIBRARY_PATH"] = ":".join([os.environ.get("LD_LIBRARY_PATH", ""),
                                              join(scipionHome, "software", "lib")])
    os.environ["PYTHONPATH"] = ":".join([os.environ.get("PYTHONPATH", ""),
                                         join(scipionHome, "software", "bindings")])

    cmd = ""
    if len(sys.argv) > 1 and sys.argv[1] == 'git':
        for repo in ['scipion-app', 'scipion-pyworkflow', 'scipion-em']:
            cmd += ("(cd "+join(scipionHome, repo)+" ; echo ' > in "+repo+":' ;"
                    " git "+' '.join(sys.argv[2:])+" ; echo) ; ")
    else:
        # Activate the environment
        cmd = '. /home/pablo/software/scipion/.scipion3env/bin/activate && '
        cmd += "python -m scipion %s" % " ".join(sys.argv[1:])

    # Set SCIPION_HOME
    os.environ["SCIPION_HOME"] = scipionHome

    exit(os.WEXITSTATUS(os.system(cmd)))

A short explanation:

LD_LIBRARY_PATH and PYTHONPATH are set to allow Scipion to use the xmipp file-format conversions, metadata utilities and its viewer.

SCIPION_HOME is mandatory, since the pyworkflow pip package can no longer determine the Scipion home on its own, as it could for scipion2.

python -m scipion runs the entry-point commands, but in your case you can skip this.

On 11/9/20 16:44, Lugmayr, Wolfgang wrote:

Hi,

I have recently re-installed scipion3 with its own conda environment, mainly because the plugins could overwrite my conda installation, e.g.
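[Editor's note] For a batch job that bypasses the launcher, the same environment setup can be replicated before invoking pw_protocol_run.py directly. A minimal sketch (the scipion_home path is the one from this thread and must be adapted; the final subprocess call is shown only as a comment since its arguments are job-specific):

```python
import os

# Replicate the launcher's environment setup for a direct protocol run.
scipion_home = "/beegfs/cssb/software/em/scipion/3.0"

env = dict(os.environ)
env["SCIPION_HOME"] = scipion_home  # mandatory, see Pablo's explanation
env["LD_LIBRARY_PATH"] = ":".join(
    [env.get("LD_LIBRARY_PATH", ""),
     os.path.join(scipion_home, "software", "lib")])
env["PYTHONPATH"] = ":".join(
    [env.get("PYTHONPATH", ""),
     os.path.join(scipion_home, "software", "bindings")])

# The protocol could then be launched with this environment, e.g.:
# subprocess.run(["python3", "<site-packages>/pyworkflow/apps/pw_protocol_run.py",
#                 projectPath, dbPath, protId], env=env)
```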
mine is cryolo-1.7 and the scipion plugin creates a cryolo-1.7.2. My new scipion3 conda now looks like:

    base             * /beegfs/cssb/software/em/scipion/3.0/anaconda3
    .scipion3env       /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env
    cryolo-1.7.2       /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/cryolo-1.7.2
    cryoloCPU-1.7.2    /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/cryoloCPU-1.7.2
    xmipp_DLTK_v0.3    /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/xmipp_DLTK_v0.3
    xmipp_MicCleaner   /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/xmipp_MicCleaner

If I run everything locally via scipion3, the task runs fine. The task script for the cluster contains:

    source /beegfs/cssb/software/em/scipion/3.0/anaconda3/etc/profile.d/conda.sh
    conda activate .scipion3env
    which python3
    ################################
    python3 /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py "/home/lugmayr/ScipionUserData/projects/Scipion3_TestSuite" "Runs/000146_XmippProtMovieCorr/logs/run.db" 146

Now I get the following error:

    ### jobid : 5452483 ###
    /beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/bin/python3
    Traceback (most recent call last):
      File "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py", line 39, in <module>
        runProtocolMain(projPath, dbPath, protId)
      File "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 2167, in runProtocolMain
        protocol = getProtocolFromDb(projectPath, protDbPath, protId, chdir=True)
      File "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 2249, in getProtocolFromDb
        project.load(dbPath=os.path.join(projectPath, protDbPath), chdir=chdir,
      File "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/project/project.py", line 244, in load
        self._loadDb(dbPath)
      File "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/project/project.py", line 302, in _loadDb
        self.mapper = self.createMapper(absDbPath)
      File "/beegfs/cssb/software/em/scipion/3.0/anaconda3/envs/.scipion3env/lib/python3.8/site-packages/pyworkflow/project/project.py", line 211, in createMapper
        classesDict.update(self._domain.getMapperDict())
    AttributeError: 'NoneType' object has no attribute 'getMapperDict'

Before, when my base conda included all the other scipion conda envs, the cluster submission worked fine. I have set auto-activation of my base to off in my .condarc. I have to use the python3 from the .scipion3env, otherwise the pyworkflow package would not be available. What else could I change for submission?

Cheers,
Wolfgang

--
Pablo Conesa - Madrid Scipion (http://scipion.i2pc.es/) team

_______________________________________________
scipion-users mailing list
sci...@li...
https://lists.sourceforge.net/lists/listinfo/scipion-users