From: Pablo C. <pc...@cn...> - 2022-11-04 08:12:42
Hi Helder! I have no experience with Singularity, but I guess you have a cluster of Singularity nodes? The compute nodes should have Scipion installed in the same way (same paths) as the "login node". I guess the challenge here is to make Slurm "talk" to the other Singularity nodes?

Regarding the error:

    /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
    uniq: gpu01: No such file or directory
    python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory

I only understand the last line (I am not sure whether the first two are a consequence of the third). That path looks correct, provided you have a "Scipion virtualenv" installation at /opt.
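If pw_protocol_run.py only exists inside the image, one option is to make the queue template launch the job command through the container, so the compute node resolves that path inside the image rather than on the host. This is only a minimal sketch: it assumes your hosts.conf follows the SUBMIT_TEMPLATE layout from the host-configuration docs Mohamad linked (the %_(...) placeholders come from there), and the image path and bind path below are assumptions about your setup:

    # Tail of SUBMIT_TEMPLATE in hosts.conf (sketch only)
    #SBATCH --job-name=%_(JOB_NAME)s
    #SBATCH --output=%_(JOB_NAME)s.out
    #SBATCH --error=%_(JOB_NAME)s.err
    # Run the Scipion job command inside the container on the compute node,
    # so /opt/.scipion3/... is looked up inside the image, not on the host.
    singularity exec --nv --bind /path/to/your/scipion/projects \
        /opt/scipion.sif %_(JOB_COMMAND)s

With something like this, sbatch still runs on the host, but the actual protocol process runs inside the container on the allocated node.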
On 3/11/22 19:36, helder veras wrote:
> Hi Mohamad!
>
> Thank you for your reply! Yes, I'm using the host configuration file
> to launch Slurm.
> Let me provide more details about my issue:
>
> I built a Singularity container that has Scipion and all the required
> dependencies and programs installed. This container works fine: I
> tested it on a desktop machine and on an HPC node without the queue
> option, and programs inside Scipion are correctly executed.
> To be able to launch Scipion using the queue option with Slurm, I had
> to bind the slurm/munge paths into the container and export some paths
> (just as presented in
> https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:usage_of_slurm_within_a_singularity_container),
> and I also added the slurm user to the container. With that, Scipion
> was able to see the queue (which I configured in the hosts.conf file)
> and successfully submit the job. The problem is that the sbatch script
> calls the pw_protocol_run.py that is inside the container, which
> raises this error in the .err file:
>
> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No
> such file or directory
> uniq: gpu01: No such file or directory
> python3: can't open file
> '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py':
> [Errno 2] No such file or directory
>
> I think the problem is that Slurm is trying to execute a script that
> is only available inside the container.
>
> Best,
>
> Helder Ribeiro
>
> ------------------------------------------------------------------------
> *From:* Mohamad HARASTANI <moh...@so...>
> *Sent:* Thursday, 3 November 2022 09:29
> *To:* Mailing list for Scipion users
> <sci...@li...>
> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>
> Hello Helder,
>
> Have you taken a look at the host configuration here
> (https://scipion-em.github.io/docs/release-3.0.0/docs/scipion-modes/host-configuration.html)?
>
> Best of luck,
> Mohamad
>
> ------------------------------------------------------------------------
> *From: *"Helder Veras Ribeiro Filho" <hel...@ln...>
> *To: *sci...@li...
> *Sent: *Wednesday, November 2, 2022 5:07:53 PM
> *Subject: *[scipion-users] scipion - singularity - HPC
>
> Hello scipion group!
>
> I'm trying to launch Scipion from a Singularity container on an HPC
> system with Slurm as the scheduler. The container works fine and I'm
> able to execute Scipion routines correctly without using a queue. The
> problem appears when I try to send Scipion jobs to the queue from the
> Scipion interface, so I suppose it is a slurm/singularity
> configuration problem.
> Could anyone who has succeeded in sending jobs to a queue from a
> Singularity-launched Scipion help me with some tips?
>
> Best,
>
> Helder
>
> *Helder Veras Ribeiro Filho, PhD*
> Brazilian Biosciences National Laboratory - LNBio
> Brazilian Center for Research in Energy and Materials - CNPEM
> 10,000 Giuseppe Maximo Scolfaro St.
> Campinas, SP - Brazil 13083-100
> +55 (19) 3512-1255
>
> Disclaimer: This email and its attachments may contain confidential
> and/or privileged information. Observe its content carefully and
> consider consulting the sender before copying, disclosing or
> distributing it. If you have received this email by mistake, please
> notify the sender and delete it immediately.
>
> _______________________________________________
> scipion-users mailing list
> sci...@li...
> https://lists.sourceforge.net/lists/listinfo/scipion-users

--
Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
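P.S. For completeness, what you describe from the GWDG page is roughly the kind of wrapper sketched below for the submitting side. Treat it as a sketch only: the exact bind paths, the launcher name (scipion3) and the image name are assumptions that depend on your cluster and image, and you may additionally need to bind the Slurm/munge libraries that the client tools link against:

    # Make the host's Slurm client tools, its config and the munge socket
    # visible inside the container, so sbatch/squeue called from within the
    # container reach the host's Slurm controller.
    export SINGULARITY_BIND="/etc/slurm,/run/munge,/usr/bin/sbatch,/usr/bin/squeue,/usr/bin/scancel,/usr/bin/sinfo"

    # Launch Scipion from inside the container as usual.
    singularity exec --nv scipion.sif scipion3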