From: Laura <lde...@cn...> - 2022-11-10 12:04:06
Hi Helder, in our group we have no experience with singularity, but I have prepared a similar installation with docker-slurm. The Dockerfiles are here <https://github.com/I2PC/scipion-docker/>, one for the master and one for the worker. Please have a look at the master-image/Dockerfile and the hosts.conf file; maybe they give you some hints. Our installation works fine.

Best regards,
Laura

On 10/11/22 12:33, helder veras wrote:
> Hi Pablo!
>
> Thank you for the suggestion!
>
> I tested executing the protocols command on the login node (the node where I am currently running the GUI) and also executing the sbatch file that calls the protocols command and runs on the computing node. Both returned the relion plugins as expected.
> There are two things I noted during the Scipion installation in the singularity container that may be related to this problem:
>
> 1. Since I am installing Scipion inside the container, I did not use a conda environment. Is a conda environment necessary for the correct installation of Scipion and its plugins?
> 2. After installation, I noted some errors related to scipion-pyworkflow and scipion-em: the modules were not found when executing the protocols, so I installed them directly with pip inside the container. Do you think that could be a problem?
>
> I have attached the singularity definition file (a text file); as you can see, it contains the commands I used to install Scipion and the plugins inside the container.
>
> Best,
>
> Helder
> ------------------------------------------------------------------------
> *From:* Pablo Conesa <pc...@cn...>
> *Sent:* Thursday, 10 November 2022 03:38
> *To:* sci...@li...
> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>
> This could happen if you have a different installation on one of the nodes, where you are missing some plugins.
>
> I.e., you have the relion plugin on the "login node", but it is not present on the computing node?
>
> One way to check is to compare the output of:
>
> scipion3 protocols
>
> from both machines. Are they equal?
>
> On 8/11/22 21:06, helder veras wrote:
>> Hi Pablo!
>>
>> Thank you for your thoughts and comments!
>>
>> I modified the hosts.conf file to execute singularity from the sbatch scripts. It seems the previous errors were solved, but now I get this one:
>>
>> *run.stdout:*
>> RUNNING PROTOCOL -----------------
>> Protocol starts
>> Hostname: gpu01.cnpem.local
>> PID: 2044741
>> pyworkflow: 3.0.27
>> plugin: orphan
>> currentDir: /home/helder.ribeiro/ScipionUserData/projects/scipion_teste
>> workingDir: Runs/000641_ProtRelionClassify2D
>> runMode: Restart
>> MPI: 1
>> threads: 1
>> 'NoneType' object has no attribute 'Plugin' LegacyProtocol installation couldn't be validated. Possible cause could be a configuration issue. Try to run scipion config.
>> Protocol has validation errors:
>> 'NoneType' object has no attribute 'Plugin' LegacyProtocol installation couldn't be validated. Possible cause could be a configuration issue. Try to run scipion config.
>> ------------------- PROTOCOL FAILED (DONE 0/0)
>>
>> Do you have any idea what could be causing this error?
>>
>> Best,
>>
>> Helder
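As an illustration of "executing singularity from the sbatch script": one option is to wrap the job command of the hosts.conf submit template in a singularity exec call, so the protocol runs against the Scipion installation inside the image. This is an untested sketch; the partition, GPU request, bind mounts and image path are placeholders, and it assumes the usual %_(JOB_COMMAND)s placeholder that the submit template expands to the protocol command (see the host-configuration page linked later in this thread).

    #!/bin/bash
    # Untested sketch of a hosts.conf submit template; partition, GPU request,
    # bind mounts and image path are placeholders for your site.
    #SBATCH --ntasks=1
    #SBATCH --partition=gpu
    #SBATCH --gres=gpu:1
    # Run the protocol command inside the container, so pw_protocol_run.py is
    # resolved from the Scipion installation that lives in the image.
    singularity exec --nv \
        --bind /home --bind /scratch \
        /path/to/scipion.sif \
        %_(JOB_COMMAND)s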
>> ------------------------------------------------------------------------
>> *From:* Pablo Conesa <pc...@cn...>
>> *Sent:* Friday, 4 November 2022 03:12
>> *To:* sci...@li...
>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>
>> Hi Helder!
>>
>> I have no experience with singularity, but I guess you have a cluster of singularity nodes?
>>
>> The nodes should have Scipion installed in the same way (same paths) as the "login node".
>>
>> I guess the challenge here is to make slurm "talk to the other singularity nodes"?
>>
>> Regarding the error:
>>
>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
>> uniq: gpu01: No such file or directory
>> python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory
>>
>> I only understand the last line (not sure if the first two are a consequence of the third one). That path looks correct, provided you have a "Scipion virtualenv" installation at /opt.
>>
>> On 3/11/22 19:36, helder veras wrote:
>>> Hi Mohamad!
>>>
>>> Thank you for your reply! Yes, I'm using the host configuration file to launch slurm.
>>> Let me provide more details about my issue:
>>>
>>> I built a singularity container that has Scipion and all the required dependencies and programs installed. This container works fine; I tested it on a desktop machine and on an HPC node without the queue option. Programs inside Scipion are correctly executed and everything works fine.
>>> To be able to launch Scipion using the queue option with slurm, I had to bind the slurm/munge paths into the container and export some paths (just as described in https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:usage_of_slurm_within_a_singularity_container), and I also added the slurm user to the container. By doing this, Scipion was able to see the queue (which I configured in the hosts.conf file) and successfully submit the job to the queue. The problem is that the sbatch script calls the pw_protocol_run.py that is inside the container, which raises this error in the .err file:
>>>
>>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
>>> uniq: gpu01: No such file or directory
>>> python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory
>>>
>>> I think the problem is that slurm is trying to execute a script that is only available inside the container.
>>>
>>> Best,
>>>
>>> Helder Ribeiro
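For reference, the binds described above could look roughly like the sketch below. Nothing here is specific to this container: every path is a placeholder and depends on how slurm and munge are installed on the host (see the GWDG page referenced above).

    # Untested sketch: launching Scipion so that sbatch can be called from
    # inside the container. Adjust the bind list to your host.
    export SINGULARITY_BIND="/etc/slurm,/run/munge,/usr/lib64/slurm,/var/spool/slurmd"
    # The slurm client tools (sbatch, squeue, scancel) and their libraries must
    # also be visible inside the container, either installed in the image or
    # bind-mounted from the host.
    singularity exec --nv /path/to/scipion.sif scipion3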
>>> ------------------------------------------------------------------------
>>> *From:* Mohamad HARASTANI <moh...@so...>
>>> *Sent:* Thursday, 3 November 2022 09:29
>>> *To:* Mailing list for Scipion users <sci...@li...>
>>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>>
>>> Hello Helder,
>>>
>>> Have you taken a look at the host configuration here
>>> (https://scipion-em.github.io/docs/release-3.0.0/docs/scipion-modes/host-configuration.html)?
>>>
>>> Best of luck,
>>> Mohamad
>>>
>>> ------------------------------------------------------------------------
>>> *From:* "Helder Veras Ribeiro Filho" <hel...@ln...>
>>> *To:* sci...@li...
>>> *Sent:* Wednesday, November 2, 2022 5:07:53 PM
>>> *Subject:* [scipion-users] scipion - singularity - HPC
>>>
>>> Hello scipion group!
>>>
>>> I'm trying to launch Scipion from a singularity container on an HPC system with slurm as the scheduler. The container works fine and I'm able to execute Scipion routines correctly without using a queue. The problem appears when I try to send Scipion jobs to the queue from the Scipion interface. I suppose it is a slurm/singularity configuration problem.
>>> Could anyone who has succeeded in sending jobs to the queue from a singularity-launched Scipion help me with some tips?
>>>
>>> Best,
>>>
>>> Helder
>>>
>>> *Helder Veras Ribeiro Filho, PhD*
>>> Brazilian Biosciences National Laboratory - LNBio
>>> Brazilian Center for Research in Energy and Materials - CNPEM
>>> 10,000 Giuseppe Maximo Scolfaro St.
>>> Campinas, SP - Brazil 13083-100
>>> +55 (19) 3512-1255
>> --
>> Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
> --
> Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*
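Finally, Pablo's suggestion above of comparing the installed plugins on both sides can be run through the same container image, roughly like this (untested; the image path and partition name are placeholders):

    # On the login node, using the container image:
    singularity exec /path/to/scipion.sif scipion3 protocols > protocols_login.txt
    # On a computing node, submitted through slurm with the same image:
    srun --partition=gpu singularity exec /path/to/scipion.sif scipion3 protocols > protocols_node.txt
    # Any difference points to a plugin missing on one side:
    diff protocols_login.txt protocols_node.txt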