From: Pablo C. <pc...@cn...> - 2022-11-10 12:37:03
It may be related... here are some comments.

When you run:

    python3 -m scipioninstaller -noXmipp -noAsk /opt

the installer creates an environment for Scipion (a conda environment if conda is found, otherwise a virtualenv; the log should give you a hint). We do not want to use the system's python. So my guess is that there is a Scipion environment, and the later installp command should be fine.

What does not make sense is:

    pip install scipion-pyworkflow scipion-em

This goes to your system python and probably leaves an "empty" Scipion core installation on the system. That is most likely what brings the "Legacy" error: your call on the GPU node is ending up in that "empty" installation on the system's python.

So, how should the call end up in the right environment on the node when it goes through slurm? I think our calls always use an absolute path to the python of the environment, which is meant to "define" the environment to use. Does this work for a virtualenv environment? Not sure. Also, how environment variables are passed from slurm to the node may be a factor (note: I'm not an expert in slurm).

The first thing I would do is verify which python is being called on the node: the system one with pyworkflow and scipion-em, or the correct one created by the installer. Can you see the command sent to slurm and received by the node? I would also check whether the installer created a conda environment or a virtualenv one.

Let's hunt this! ;-)
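For example, something along these lines should show which interpreter and which pyworkflow a job on the node actually picks up (a rough sketch; "gpu" is a placeholder for your partition name):

    # Which python3 does a job on the GPU partition resolve?
    srun -p gpu which python3

    # Where do sys.executable and pyworkflow point from inside such a job?
    # An ImportError here would also be informative.
    srun -p gpu python3 -c "import sys, pyworkflow; print(sys.executable); print(pyworkflow.__file__)"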
On 10/11/22 12:33, helder veras wrote:
> Hi Pablo!
>
> Thank you for the suggestion!
>
> I tested executing the protocols command on the login node (the node where I'm currently running the GUI) and also executing the sbatch file that calls the protocols command and runs on the computing node. Both returned the relion plugins as expected.
> There are two things I noted during the Scipion installation in the singularity container that may be related to this problem.
>
> 1. Since I'm installing Scipion inside the container, I didn't use a conda environment. Is that conda environment necessary for the correct installation of Scipion and the plugins?
> 2. After installation, I noted some errors related to scipion-pyworkflow and scipion-em: the modules were not found when executing the protocols. So I installed them directly with pip inside the container. Do you think that could be a problem?
>
> I've attached the singularity definition file (a text file); as you can see, it contains the commands I used to install Scipion and the plugins inside the container.
>
> Best,
>
> Helder
> ------------------------------------------------------------------------
> *From:* Pablo Conesa <pc...@cn...>
> *Sent:* Thursday, November 10, 2022 03:38
> *To:* sci...@li...
> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>
> This could happen if you have a different installation on the node where you are missing some plugins.
>
> I.e.: you have the relion plugin on the "login node", but it is not present on the computing node?
>
> One way to check is to compare the output of:
>
>     scipion3 protocols
>
> from both machines. Are they equal?
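A quick way to do that comparison in one go (again a sketch; "gpu" is a placeholder partition name, and scipion3 must resolve to the same launcher on both sides):

    # Dump the protocol list as seen from the login node and from a compute node, then compare.
    scipion3 protocols > /tmp/protocols_login.txt
    srun -p gpu scipion3 protocols > /tmp/protocols_node.txt
    diff /tmp/protocols_login.txt /tmp/protocols_node.txt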
> On 8/11/22 21:06, helder veras wrote:
>> Hi Pablo!
>>
>> Thank you for your thoughts and comments!
>>
>> I tested modifying the hosts.conf file to execute singularity from the sbatch scripts. It seems the previous errors were solved, but now I got this one:
>>
>> *run.stdout:*
>> RUNNING PROTOCOL -----------------
>> Protocol starts
>> Hostname: gpu01.cnpem.local
>> PID: 2044741
>> pyworkflow: 3.0.27
>> plugin: orphan
>> currentDir: /home/helder.ribeiro/ScipionUserData/projects/scipion_teste
>> workingDir: Runs/000641_ProtRelionClassify2D
>> runMode: Restart
>> MPI: 1
>> threads: 1
>> 'NoneType' object has no attribute 'Plugin' LegacyProtocol installation couldn't be validated. Possible cause could be a configuration issue. Try to run scipion config.
>> Protocol has validation errors:
>> 'NoneType' object has no attribute 'Plugin' LegacyProtocol installation couldn't be validated. Possible cause could be a configuration issue. Try to run scipion config.
>> ------------------- PROTOCOL FAILED (DONE 0/0)
>>
>> Do you have any idea what could be causing this error?
>>
>> Best,
>>
>> Helder
>> ------------------------------------------------------------------------
>> *From:* Pablo Conesa <pc...@cn...>
>> *Sent:* Friday, November 4, 2022 03:12
>> *To:* sci...@li...
>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>
>> Hi Helder!
>>
>> I have no experience with singularity, but I guess you have a cluster of singularity nodes?
>>
>> The nodes should have Scipion installed in the same way (same paths) as the "login node".
>>
>> I guess the challenge here is to make slurm "talk to the other singularity nodes"?
>>
>> Regarding the error:
>>
>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
>> uniq: gpu01: No such file or directory
>> python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory
>>
>> I only understand the last line (and I'm not sure whether the first two are a consequence of the third one). That path looks correct, provided you have a "Scipion virtualenv" installation at /opt.
>>
>> On 3/11/22 19:36, helder veras wrote:
>>> Hi Mohamad!
>>>
>>> Thank you for your reply! Yes, I'm using the host configuration file to launch slurm.
>>> Let me provide more details about my issue:
>>>
>>> I built a singularity container that has Scipion and all the required dependencies and programs installed. The container works fine: I tested it on a desktop machine and on an HPC node without the queue option, and programs inside Scipion are executed correctly.
>>> To be able to launch Scipion using the queue option with slurm, I had to bind the slurm/munge paths into the container and export some paths, just as presented in
>>> https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:usage_of_slurm_within_a_singularity_container
>>> (I also added the slurm user to the container). By doing this, Scipion was able to see the queue (which I changed in the hosts.conf file) and successfully submit the job to the queue. The problem is that the sbatch script calls the pw_protocol_run.py that only exists inside the container, which raises this error in the .err file:
>>>
>>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No such file or directory
>>> uniq: gpu01: No such file or directory
>>> python3: can't open file '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py': [Errno 2] No such file or directory
>>>
>>> I think the problem is that slurm is trying to execute a script that is only available inside the container.
>>>
>>> Best,
>>>
>>> Helder Ribeiro
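One possible direction for that last point (a rough sketch only, not a tested recipe: the image path and partition name are placeholders, the bind path is taken from the project directory shown in run.stdout, and the pw_protocol_run.py arguments are whatever Scipion generates for the job) is to have the script that slurm executes re-enter the container with singularity exec, so that the path which only exists inside the image becomes resolvable on the compute node:

    #!/bin/bash
    #SBATCH -p gpu
    #SBATCH --gres=gpu:1
    # Sketch: run the Scipion-generated command inside the same image on the compute node,
    # binding the project data so paths match what the login node sees.
    singularity exec --nv \
        -B /home/helder.ribeiro/ScipionUserData \
        /path/to/scipion_container.sif \
        python3 /opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py "$@"

Whether something like this goes into the job template in hosts.conf or into a separate wrapper script depends on how the queue is configured on your side.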
>>> ------------------------------------------------------------------------
>>> *From:* Mohamad HARASTANI <moh...@so...>
>>> *Sent:* Thursday, November 3, 2022 09:29
>>> *To:* Mailing list for Scipion users <sci...@li...>
>>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>>
>>> Hello Helder,
>>>
>>> Have you taken a look at the host configuration here
>>> (https://scipion-em.github.io/docs/release-3.0.0/docs/scipion-modes/host-configuration.html)?
>>>
>>> Best of luck,
>>> Mohamad
>>>
>>> ------------------------------------------------------------------------
>>> *From:* "Helder Veras Ribeiro Filho" <hel...@ln...>
>>> *To:* sci...@li...
>>> *Sent:* Wednesday, November 2, 2022 5:07:53 PM
>>> *Subject:* [scipion-users] scipion - singularity - HPC
>>>
>>> Hello scipion group!
>>>
>>> I'm trying to launch Scipion from a singularity container on an HPC system with slurm as the scheduler. The container works fine and I'm able to execute Scipion routines correctly without using a queue. The problem is when I try to send Scipion jobs to the queue from the Scipion interface. I suppose it is a slurm/singularity configuration problem.
>>> Could anyone who has succeeded in sending jobs to the queue from a singularity-launched Scipion help me with some tips?
>>>
>>> Best,
>>>
>>> Helder
>>>
>>> *Helder Veras Ribeiro Filho, PhD*
>>> Brazilian Biosciences National Laboratory - LNBio
>>> Brazilian Center for Research in Energy and Materials - CNPEM
>>> 10,000 Giuseppe Maximo Scolfaro St.
>>> Campinas, SP - Brazil 13083-100
>>> +55 (19) 3512-1255
>>>
>>> Disclaimer: This email and its attachments may contain confidential and/or privileged information. Observe its content carefully and consider consulting the sender before copying, disclosing or distributing it. If you have received this email by mistake, please notify the sender and delete it immediately.
>>>
>>> _______________________________________________
>>> scipion-users mailing list
>>> sci...@li...
>>> https://lists.sourceforge.net/lists/listinfo/scipion-users

--
Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*