From: Pablo C. <pc...@cn...> - 2022-11-10 08:39:02
This could happen if you have a different node installation where some
plugins are missing, i.e. you have the relion plugin on the "login node"
but it is not present on the computing node. One way to check is to
compare the output of "scipion3 protocols" from both machines. Are they
equal?

On 8/11/22 21:06, helder veras wrote:
> Hi Pablo!
>
> Thank you for your thoughts and comments!
>
> I tried modifying the hosts.conf file to execute singularity from the
> sbatch scripts. It seems the previous errors were solved, but now I
> got this one:
>
> run.stdout:
>
> RUNNING PROTOCOL -----------------
> Protocol starts
> Hostname: gpu01.cnpem.local
> PID: 2044741
> pyworkflow: 3.0.27
> plugin: orphan
> currentDir: /home/helder.ribeiro/ScipionUserData/projects/scipion_teste
> workingDir: Runs/000641_ProtRelionClassify2D
> runMode: Restart
> MPI: 1
> threads: 1
> 'NoneType' object has no attribute 'Plugin' LegacyProtocol
> installation couldn't be validated. Possible cause could be a
> configuration issue. Try to run scipion config.
> Protocol has validation errors:
> 'NoneType' object has no attribute 'Plugin' LegacyProtocol
> installation couldn't be validated. Possible cause could be a
> configuration issue. Try to run scipion config.
> ------------------- PROTOCOL FAILED (DONE 0/0)
>
> Do you have any idea what could be causing this error?
>
> Best,
>
> Helder
>
> ------------------------------------------------------------------------
> From: Pablo Conesa <pc...@cn...>
> Sent: Friday, 4 November 2022 03:12
> To: sci...@li...
> Subject: Re: [scipion-users] scipion - singularity - HPC
>
> Hi Helder!
>
> I have no experience with singularity, but I guess you have a cluster
> of singularity nodes?
>
> Nodes should have Scipion installed as well, in the same way (same
> paths) as the "login node".
>
> I guess the challenge here is to make slurm "talk to the other
> singularity nodes"?
>
> Regarding the error:
>
> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such
> file or directory
> uniq: gpu01: No such file or directory
> python3: can't open file
> '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py':
> [Errno 2] No such file or directory
>
> I only understand the last line (not sure if the first two are a
> consequence of the third one). That path looks correct, provided you
> have a "Scipion virtualenv" installation at /opt.
>
> On 3/11/22 19:36, helder veras wrote:
>> Hi Mohamad!
>>
>> Thank you for your reply! Yes, I'm using the host configuration file
>> to launch slurm. Let me provide more details about my issue:
>>
>> I built a singularity container that has Scipion and all the required
>> dependencies and programs installed. This container works fine: I
>> tested it on a desktop machine and on an HPC node without the queue
>> option, and programs inside Scipion are executed correctly.
>> To be able to launch Scipion using the queue option with slurm, I had
>> to bind the slurm/munge paths into the container and export some
>> paths, just as presented in
>> https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:usage_of_slurm_within_a_singularity_container
>> (I also included the slurm user in the container).
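(For illustration only: a launcher along the lines of the GWDG page above
might look roughly like the sketch below. The image path and the bind list
are assumptions, not values from this thread, and the exact set of paths to
expose for the slurm client and munge is site-specific.)

#!/bin/bash
# Illustrative sketch: run Scipion inside the container while exposing the
# host's slurm configuration and munge socket so jobs can be submitted from
# within. All paths are placeholders; the slurm client tools and libraries
# must also be reachable inside the image (installed or bind-mounted).
IMG=/opt/containers/scipion3.sif   # hypothetical image location

singularity exec --nv \
    --bind /etc/slurm \
    --bind /run/munge \
    --bind "$HOME/ScipionUserData" \
    "$IMG" scipion3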
>> By doing this, Scipion was able to see the queue (which I changed in
>> the hosts.conf file) and successfully launch the job to the queue.
>> The problem is that the sbatch script calls the pw_protocol_run.py
>> that is inside the container, which raises this error in the .err
>> file:
>>
>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No
>> such file or directory
>> uniq: gpu01: No such file or directory
>> python3: can't open file
>> '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py':
>> [Errno 2] No such file or directory
>>
>> I think the problem is that slurm is trying to execute a script that
>> is only available inside the container.
>>
>> Best,
>>
>> Helder Ribeiro
>>
>> ------------------------------------------------------------------------
>> From: Mohamad HARASTANI <moh...@so...>
>> Sent: Thursday, 3 November 2022 09:29
>> To: Mailing list for Scipion users <sci...@li...>
>> Subject: Re: [scipion-users] scipion - singularity - HPC
>>
>> Hello Helder,
>>
>> Have you taken a look at the host configuration here
>> (https://scipion-em.github.io/docs/release-3.0.0/docs/scipion-modes/host-configuration.html)?
>>
>> Best of luck,
>> Mohamad
>>
>> ------------------------------------------------------------------------
>> From: "Helder Veras Ribeiro Filho" <hel...@ln...>
>> To: sci...@li...
>> Sent: Wednesday, November 2, 2022 5:07:53 PM
>> Subject: [scipion-users] scipion - singularity - HPC
>>
>> Hello scipion group!
>>
>> I'm trying to launch Scipion from a singularity container on an HPC
>> system with slurm as the scheduler. The container works fine and I'm
>> able to execute Scipion routines correctly without using a queue. The
>> problem is when I try to send Scipion jobs to the queue from the
>> Scipion interface. I suppose it is a slurm/singularity configuration
>> problem.
>> Could anyone who has succeeded in sending jobs to the queue from a
>> singularity-launched Scipion help me with some tips?
>>
>> Best,
>>
>> Helder
>>
>> Helder Veras Ribeiro Filho, PhD
>> Brazilian Biosciences National Laboratory - LNBio
>> Brazilian Center for Research in Energy and Materials - CNPEM
>> 10,000 Giuseppe Maximo Scolfaro St.
>> Campinas, SP - Brazil 13083-100
>> +55 (19) 3512-1255
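(For illustration, a job script that wraps the protocol command in
"singularity exec", which is essentially what the hosts.conf change
described above aims at, might look roughly like the following. The
resource directives, image path and bind list are placeholders rather than
values from this thread; in a real setup the wrapping would go into the job
submission template in hosts.conf, as documented on the host-configuration
page linked above.)

#!/bin/bash
# Placeholder resource directives; real values would come from hosts.conf
# and the queue dialog in the Scipion GUI.
#SBATCH --job-name=scipion-protocol
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --ntasks=1

# Run the Scipion protocol command inside the container, so that
# /opt/.scipion3/.../pw_protocol_run.py is resolved inside the image
# rather than on the host filesystem. Image path and binds are placeholders.
IMG=/opt/containers/scipion3.sif

singularity exec --nv \
    --bind "$HOME/ScipionUserData" \
    "$IMG" \
    python3 /opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py "$@"
# "$@": the protocol arguments that Scipion appends to the generated command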
--
Pablo Conesa - Madrid Scipion <http://scipion.i2pc.es> team