From: helder v. <hel...@ho...> - 2022-11-10 12:17:16
Hi Laura! Great!! Thank you! I'll take a look at these Dockerfiles! It'll be very helpful!

________________________________
From: Laura <lde...@cn...>
Sent: Thursday, 10 November 2022 07:03
To: Mailing list for Scipion users <sci...@li...>; helder veras <hel...@ho...>
Subject: Re: [scipion-users] scipion - singularity - HPC

Hi Helder,

in our group we have no experience with Singularity, but I have prepared a similar installation with Docker and Slurm. The Dockerfiles are here <https://github.com/I2PC/scipion-docker/>, one for the master and one for the worker. Please have a look at master-image/Dockerfile and the hosts.conf file; maybe it gives you some hint. Our installation works fine.

best regards
Laura

On 10/11/22 12:33, helder veras wrote:

Hi Pablo!

Thank you for the suggestion! I tested executing the protocols command on the login node (the node where I'm currently running the GUI) and also via the sbatch file that calls the protocols command and runs on the computing node. Both returned the Relion plugins as expected.

There are two things I noted during the Scipion installation in the Singularity container that may be related to this problem:

1. Since I'm installing Scipion inside the container, I didn't use a conda environment. Is a conda environment necessary for the correct installation of Scipion and its plugins?
2. After installation, I noted some errors related to scipion-pyworkflow and scipion-em: the modules were not found when executing the protocols, so I installed them directly with pip inside the container. Do you think that could be a problem?

I've attached the Singularity definition file (a text file); it contains the commands I used to install Scipion and the plugins inside the container.

Best,
Helder

________________________________
From: Pablo Conesa <pc...@cn...>
Sent: Thursday, 10 November 2022 03:38
To: sci...@li...
Subject: Re: [scipion-users] scipion - singularity - HPC

This could happen if you have a different installation on one of the nodes and are missing some plugins there. I.e.: you have the Relion plugin on the "login node", but it is not present on the computing node? One way to check is to compare the output of

    scipion3 protocols

from both machines. Are they equal?

On 8/11/22 21:06, helder veras wrote:

Hi Pablo!

Thank you for your thoughts and comments! I modified the hosts.conf file to execute Singularity from the sbatch scripts. The previous errors seem to be solved, but now I get this one in run.stdout:

    RUNNING PROTOCOL -----------------
    Protocol starts
    Hostname: gpu01.cnpem.local
    PID: 2044741
    pyworkflow: 3.0.27
    plugin: orphan
    currentDir: /home/helder.ribeiro/ScipionUserData/projects/scipion_teste
    workingDir: Runs/000641_ProtRelionClassify2D
    runMode: Restart
    MPI: 1
    threads: 1
    'NoneType' object has no attribute 'Plugin'
    LegacyProtocol installation couldn't be validated. Possible cause could be a configuration issue. Try to run scipion config.
    Protocol has validation errors:
    'NoneType' object has no attribute 'Plugin'
    LegacyProtocol installation couldn't be validated. Possible cause could be a configuration issue. Try to run scipion config.
    ------------------- PROTOCOL FAILED (DONE 0/0)

Do you have any idea what could be causing this error?

Best,
Helder
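A minimal sketch of the comparison Pablo suggests above, under the assumption that the same container image is available on both nodes; the image path is a placeholder, and gpu01 is the compute node seen in the log:

    # On the login node, list the protocols through the same container image:
    singularity exec /path/to/scipion.sif scipion3 protocols | sort > protocols_login.txt

    # On the compute node (gpu01 above), via a Slurm step:
    srun -w gpu01 singularity exec /path/to/scipion.sif scipion3 protocols | sort > protocols_gpu01.txt

    # Any differing line points at a plugin installed on one side only:
    diff protocols_login.txt protocols_gpu01.txt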
________________________________
From: Pablo Conesa <pc...@cn...>
Sent: Friday, 4 November 2022 03:12
To: sci...@li...
Subject: Re: [scipion-users] scipion - singularity - HPC

Hi Helder! I have no experience with Singularity, but I guess you have a cluster of Singularity nodes? The nodes should have Scipion installed in the same way (same paths) as the "login node". I guess the challenge here is to make Slurm "talk to the other Singularity nodes"?

Regarding the error:

    /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
    uniq: gpu01: No such file or directory
    python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory

I only understand the last line (not sure if the first two are a consequence of the third). That path looks correct, provided you have a "Scipion virtualenv" installation at /opt.

On 3/11/22 19:36, helder veras wrote:

Hi Mohamad!

Thank you for your reply! Yes, I'm using the host configuration file to launch Slurm. Let me provide more details about my issue: I built a Singularity container that has Scipion and all the required dependencies and programs installed. The container works fine; I tested it on a desktop machine and on an HPC node without the queue option, and programs inside Scipion are correctly executed.

To be able to launch Scipion using the queue option with Slurm, I had to bind the slurm/munge paths into the container and export some paths, just as presented in https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:usage_of_slurm_within_a_singularity_container (I also added the slurm user to the container). By doing this, Scipion was able to see the queue (which I configured in the hosts.conf file) and successfully submit the job to the queue. The problem is that the sbatch script calls the pw_protocol_run.py that is inside the container, which raises this error in the .err file:

    /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
    uniq: gpu01: No such file or directory
    python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory

I think the problem is that Slurm is trying to execute a script that is only available inside the container.

Best,
Helder Ribeiro

________________________________
From: Mohamad HARASTANI <moh...@so...>
Sent: Thursday, 3 November 2022 09:29
To: Mailing list for Scipion users <sci...@li...>
Subject: Re: [scipion-users] scipion - singularity - HPC

Hello Helder,

Have you taken a look at the host configuration here (https://scipion-em.github.io/docs/release-3.0.0/docs/scipion-modes/host-configuration.html)?

Best of luck,
Mohamad
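One way around the problem Helder describes, where the host's python3 is asked for a file that only exists inside the container, is to route the command that the queued job script runs through singularity exec. A rough sketch of such a wrapper, assuming the job template in hosts.conf can be pointed at it; the image path and bind list are placeholders, and the slurm/munge binds follow the GWDG recipe linked above:

    #!/bin/bash
    # run_in_container.sh -- hypothetical wrapper: whatever the queued job script
    # would run on the compute node is executed inside the Scipion container, so
    # /opt/.scipion3/... is resolved inside the image rather than on the host.
    # Image path and bind list are placeholders and site-specific.
    IMG=/path/to/scipion.sif

    # --nv exposes the node's GPUs; the binds make munge, the Slurm configuration
    # and the shared project directory visible inside the container.
    exec singularity exec --nv \
        --bind /var/run/munge,/etc/slurm \
        --bind "$HOME/ScipionUserData" \
        "$IMG" "$@"

The generated job script would then call something like "run_in_container.sh python3 /opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py ..." instead of invoking python3 directly on the host.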
________________________________
From: "Helder Veras Ribeiro Filho" <hel...@ln...>
To: sci...@li...
Sent: Wednesday, November 2, 2022 5:07:53 PM
Subject: [scipion-users] scipion - singularity - HPC

Hello Scipion group!

I'm trying to launch Scipion from a Singularity container on an HPC system with Slurm as the scheduler. The container works fine and I'm able to execute Scipion routines correctly without using a queue. The problem is when I try to send Scipion jobs to the queue from the Scipion interface. I suppose it is a Slurm/Singularity configuration problem. Could anyone who has succeeded in sending jobs to the queue from a Singularity-launched Scipion help me with some tips?

Best,
Helder

Helder Veras Ribeiro Filho, PhD
Brazilian Biosciences National Laboratory - LNBio
Brazilian Center for Research in Energy and Materials - CNPEM
10,000 Giuseppe Maximo Scolfaro St.
Campinas, SP - Brazil 13083-100
+55 (19) 3512-1255
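Independent of Scipion's own configuration, a hand-submitted job like the sketch below can help separate scheduler-side problems from hosts.conf problems; the file name, partition name and image path are hypothetical:

    #!/bin/bash
    # check_container.sbatch -- hypothetical smoke test; the partition name and
    # image path are placeholders to adapt to the site.
    #SBATCH --job-name=scipion_container_check
    #SBATCH --partition=gpu
    #SBATCH --output=check_%j.out

    # If this prints the protocol list, Slurm can start the container on the
    # compute node and Scipion inside it is usable; remaining problems are then
    # most likely in hosts.conf or in how the generated job script calls python.
    singularity exec /path/to/scipion.sif scipion3 protocols

Submit it by hand with "sbatch check_container.sbatch" and inspect check_<jobid>.out.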