From: Pablo C. <pc...@cn...> - 2022-11-10 19:12:16
Hi Helder! Good job, and thank you for sharing the Singularity file. If
you have a public URL (like a GitHub repository) where it lives, we can
link to it on our site under the "Scipion ecosystem" menu.

I'm glad it worked. All the best,

Pablo

On 10/11/22 19:14, helder veras wrote:
> Hi Pablo!
>
> Your comment about the virtualenv was great! The container was not
> accessing the correct Python. I just exported the right Scipion Python
> in the container and now everything is working!
> So now I'm executing the container from a GUI node and I'm able to
> launch the Scipion interface. After setting up the protocols, I could
> successfully launch the jobs to the target queues.
>
> I've attached the new Singularity definition file in case someone is
> interested in testing or using it (I'll also improve the command lines
> to make it easier for users to launch the container on the cluster,
> and then I can share that as well).
>
> Thank you (and the others) so much for all the help!
>
> Best,
>
> Helder
>
> ------------------------------------------------------------------------
> *From:* Pablo Conesa <pc...@cn...>
> *Sent:* Thursday, 10 November 2022 07:36
> *To:* sci...@li...
> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>
> It may be related... here are some comments.
>
> When you run:
>
>     python3 -m scipioninstaller -noXmipp -noAsk /opt
>
> the installer creates an environment for Scipion (a conda environment
> if conda is found, otherwise a virtualenv; the log should give you a
> hint). We do not want to use the system's Python. So my guess is that
> there is a Scipion environment, and the later installp command should
> be fine.
>
> What does not make sense is:
>
>     pip install scipion-pyworkflow scipion-em
>
> This goes to your system Python, probably leaving an "empty" Scipion
> core installation on the system. This is probably what brings the
> "Legacy" error.
>
> Your call to the GPU node is ending up in that "empty installation" on
> the system's Python.
>
> So, how, through Slurm, should the call end up in the right environment
> on the node?
>
> I think our calls always use an absolute path to the Python of the
> environment. This is meant to "define" the environment to use.
>
> Does this work for the case of a virtualenv environment? Not sure.
>
> Also, how environment variables are passed from Slurm to the node may
> be a factor (note: I'm not an expert in Slurm).
>
> The first thing I'd do is verify which Python is being called on the
> node: the system's one, with pyworkflow and scipion-em, or the correct
> one created by the installer.
>
> Can you see the command sent to Slurm and received by the node?
>
> I'd also check whether the installer created an environment (a conda
> one, or a virtualenv one?).
>
> Let's hunt this! ;-)
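[For readers of the archive: the fix Helder describes boils down to a couple
of lines in the Singularity definition file. Below is a minimal sketch, not
the actual file attached to the thread: the ubuntu:20.04 base image and the
package list are assumptions, and /opt/.scipion3 is the virtualenv path that
appears in the error logs further down the thread.]

    Bootstrap: docker
    From: ubuntu:20.04

    %post
        # System packages trimmed to a minimum for this sketch; a real
        # definition file also installs compilers, CUDA libraries, etc.
        apt-get update && apt-get install -y python3 python3-pip git
        pip3 install scipioninstaller
        # Same install command as in the thread; this creates the Scipion
        # environment (conda if available, otherwise a virtualenv) under /opt.
        python3 -m scipioninstaller -noXmipp -noAsk /opt

    %environment
        # Put the Scipion environment's Python first on PATH so node-side
        # jobs do not fall back to the system Python. /opt/.scipion3 is the
        # virtualenv seen in the error logs; adjust if conda was used.
        export PATH=/opt/.scipion3/bin:/opt:$PATH

[A quick check on a compute node is "singularity exec <image> which python3",
which should point into /opt/.scipion3 rather than /usr/bin.]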
> On 10/11/22 12:33, helder veras wrote:
>> Hi Pablo!
>>
>> Thank you for the suggestion!
>>
>> I've tested running the protocols command on the login node (the node
>> where I'm currently running the GUI), and I've also run the sbatch
>> file that calls the protocols command on the computing node. Both
>> returned the Relion plugins as expected.
>> There are two things I noted during the Scipion installation in the
>> Singularity container that may be related to this problem:
>>
>> 1. Since I'm installing Scipion inside the container, I didn't use a
>>    conda environment. Is that conda environment necessary for the
>>    correct installation of Scipion and the plugins?
>> 2. After installation, I noticed some errors related to
>>    scipion-pyworkflow and scipion-em: the modules were not found when
>>    executing the protocols, so I installed them directly with pip
>>    inside the container. Do you think that could be a problem?
>>
>> I've attached the Singularity definition file (a text file); as you
>> can see, it contains the commands I used to install Scipion and the
>> plugins inside the container.
>>
>> Best,
>>
>> Helder
>> ------------------------------------------------------------------------
>> *From:* Pablo Conesa <pc...@cn...>
>> *Sent:* Thursday, 10 November 2022 03:38
>> *To:* sci...@li...
>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>
>> This could happen if you have a different installation on the node,
>> where you are missing some plugins.
>>
>> I.e.: you have the Relion plugin on the "login node", but it is not
>> present on the computing node?
>>
>> One way to check is to compare the output of:
>>
>>     scipion3 protocols
>>
>> from both machines. Are they equal?
>>
>> On 8/11/22 21:06, helder veras wrote:
>>> Hi Pablo!
>>>
>>> Thank you for your thoughts and comments!
>>>
>>> I tried modifying the hosts.conf file to execute Singularity from the
>>> sbatch scripts. It seems the previous errors were solved, but now I
>>> get this one:
>>>
>>> *run.stdout:*
>>> RUNNING PROTOCOL -----------------
>>> Protocol starts
>>> Hostname: gpu01.cnpem.local
>>> PID: 2044741
>>> pyworkflow: 3.0.27
>>> plugin: orphan
>>> currentDir: /home/helder.ribeiro/ScipionUserData/projects/scipion_teste
>>> workingDir: Runs/000641_ProtRelionClassify2D
>>> runMode: Restart
>>> MPI: 1
>>> threads: 1
>>> 'NoneType' object has no attribute 'Plugin' LegacyProtocol
>>> installation couldn't be validated. Possible cause could be a
>>> configuration issue. Try to run scipion config.
>>> Protocol has validation errors:
>>> 'NoneType' object has no attribute 'Plugin' LegacyProtocol
>>> installation couldn't be validated. Possible cause could be a
>>> configuration issue. Try to run scipion config.
>>> ------------------- PROTOCOL FAILED (DONE 0/0)
>>>
>>> Do you have any idea what could be causing this error?
>>>
>>> Best,
>>>
>>> Helder
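[To make the "scipion3 protocols" comparison above concrete for the container
case, the check can be run from the login node along these lines. The image
name, the partition and the /opt/scipion3 launcher path are assumptions;
--wrap and --wait are plain sbatch options.]

    # Plugin list as seen from the login node (inside the container):
    singularity exec scipion.sif /opt/scipion3 protocols > protocols_login.txt

    # The same command, executed on a compute node through the queue:
    sbatch --partition=gpu --wait --wrap \
        "singularity exec scipion.sif /opt/scipion3 protocols > protocols_node.txt"

    # A plugin installed on only one side shows up as a difference:
    diff protocols_login.txt protocols_node.txt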
>>> ------------------------------------------------------------------------
>>> *From:* Pablo Conesa <pc...@cn...>
>>> *Sent:* Friday, 4 November 2022 03:12
>>> *To:* sci...@li...
>>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>>
>>> Hi Helder!
>>>
>>> I have no experience with Singularity, but I guess you have a cluster
>>> of Singularity nodes?
>>>
>>> The nodes should have Scipion installed in the same way (same paths)
>>> as the "login node".
>>>
>>> I guess the challenge here is to make Slurm "talk to the other
>>> Singularity nodes"?
>>>
>>> Regarding the error:
>>>
>>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No
>>> such file or directory
>>> uniq: gpu01: No such file or directory
>>> python3: can't open file
>>> '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py':
>>> [Errno 2] No such file or directory
>>>
>>> I only understand the last line (I'm not sure whether the first two
>>> are a consequence of the third). That path looks correct, provided
>>> you have a "Scipion virtualenv" installation at /opt.
>>>
>>> On 3/11/22 19:36, helder veras wrote:
>>>> Hi Mohamad!
>>>>
>>>> Thank you for your reply! Yes, I'm using the host configuration file
>>>> to launch Slurm.
>>>> Let me provide more details about my issue:
>>>>
>>>> I built a Singularity container that has Scipion and all the
>>>> required dependencies and programs installed. The container works
>>>> fine: I tested it on a desktop machine and on an HPC node without
>>>> the queue option. Programs inside Scipion are executed correctly and
>>>> everything works.
>>>> To be able to launch Scipion using the queue option with Slurm, I
>>>> had to bind the Slurm/munge paths into the container and export some
>>>> paths, just as presented in
>>>> https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:usage_of_slurm_within_a_singularity_container
>>>> (I also added the slurm user to the container). By doing this,
>>>> Scipion was able to see the queue (which I configured in the
>>>> hosts.conf file) and successfully submit the job to the queue. The
>>>> problem is that the sbatch script calls the pw_protocol_run.py that
>>>> is inside the container, which raises this error in the .err file:
>>>>
>>>> /var/spool/slurmd/slurmd/job02147/slurm_script: line 28: gpu01: No
>>>> such file or directory
>>>> uniq: gpu01: No such file or directory
>>>> python3: can't open file
>>>> '/opt/.scipion3/lib/python3.8/site-packages/pyworkflow/apps/pw_protocol_run.py':
>>>> [Errno 2] No such file or directory
>>>>
>>>> I think the problem is that Slurm is trying to execute a script that
>>>> is only available inside the container.
>>>>
>>>> Best,
>>>>
>>>> Helder Ribeiro
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* Mohamad HARASTANI <moh...@so...>
>>>> *Sent:* Thursday, 3 November 2022 09:29
>>>> *To:* Mailing list for Scipion users <sci...@li...>
>>>> *Subject:* Re: [scipion-users] scipion - singularity - HPC
>>>>
>>>> Hello Helder,
>>>>
>>>> Have you taken a look at the host configuration here:
>>>> https://scipion-em.github.io/docs/release-3.0.0/docs/scipion-modes/host-configuration.html ?
>>>>
>>>> Best of luck,
>>>> Mohamad
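[For later readers: "modify the hosts.conf file to execute Singularity from
the sbatch scripts" (Helder's message of 8/11 above) essentially means
wrapping the job command in "singularity exec" inside the submit template.
A rough sketch only: the #SBATCH block is elided, the image path and bind
list are assumptions, and %_(JOB_SCRIPT)s / %_(JOB_COMMAND)s are the
placeholders used on the host-configuration page linked above.]

    # hosts.conf (fragment) -- only the submission-related lines are shown
    SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
    SUBMIT_TEMPLATE = #!/bin/bash
        ### the usual #SBATCH directives (partition, nodes, GPUs, ...) go here
        # Run the Scipion command inside the container on the compute node;
        # bind the project data and add --nv for GPU protocols.
        singularity exec --nv \
            --bind $HOME/ScipionUserData \
            /path/to/scipion.sif %_(JOB_COMMAND)s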
>>>> ------------------------------------------------------------------------
>>>> *From:* "Helder Veras Ribeiro Filho" <hel...@ln...>
>>>> *To:* sci...@li...
>>>> *Sent:* Wednesday, November 2, 2022 5:07:53 PM
>>>> *Subject:* [scipion-users] scipion - singularity - HPC
>>>>
>>>> Hello Scipion group!
>>>>
>>>> I'm trying to launch Scipion from a Singularity container on an HPC
>>>> cluster that uses Slurm as the scheduler. The container works fine
>>>> and I'm able to execute Scipion routines correctly without using a
>>>> queue. The problem appears when I try to send Scipion jobs to the
>>>> queue from the Scipion interface. I suppose it is a Slurm/Singularity
>>>> configuration problem.
>>>> Could anyone who has succeeded in sending jobs to a queue from a
>>>> Scipion launched inside Singularity help me with some tips?
>>>>
>>>> Best,
>>>>
>>>> Helder
>>>>
>>>> *Helder Veras Ribeiro Filho, PhD*
>>>> Brazilian Biosciences National Laboratory - LNBio
>>>> Brazilian Center for Research in Energy and Materials - CNPEM
>>>> 10,000 Giuseppe Maximo Scolfaro St.
>>>> Campinas, SP - Brazil, 13083-100
>>>> +55 (19) 3512-1255
>>>>
>>>> Disclaimer: This email and its attachments may contain confidential
>>>> and/or privileged information. Observe its content carefully and
>>>> consider consulting the sender before copying, disclosing or
>>>> distributing it. If you have received this email by mistake, please
>>>> notify the sender and delete it immediately.

--
Pablo Conesa - *Madrid Scipion <http://scipion.i2pc.es> team*