From: Grigory S. <sha...@gm...> - 2021-01-19 11:37:26
Hi Yangyang, happy new year! Let's try a step-by-step approach. First, you might check that xmipp was installed successfully by importing any 3D volume and trying to display the results. If that fails, the xmipp compilation was not successful. Second, does the same job you are trying to run work locally (without submission to the slurm cluster) with fewer MPI processes/threads? This will sort out MPI-related issues. Third, what is the error you are getting with slurm (I didn't notice anything wrong with the config)? You can attach run.stdout to the email.

Best regards,
Grigory

--------------------------------------------------------------------------------
Grigory Sharov, Ph.D.
MRC Laboratory of Molecular Biology,
Francis Crick Avenue,
Cambridge Biomedical Campus,
Cambridge CB2 0QH, UK.
tel. +44 (0) 1223 267228
e-mail: gs...@mr...

On Tue, Jan 19, 2021 at 9:12 AM Yangyang Yi <yy...@si...> wrote:

> Thanks for your reply. Happy new year. I see. I tried to install
> scipion-em-xmipp again; it reported a successful install, but when I
> submit a job via slurm, it fails to run. I also tried to run a small job
> on the head node (CPU only, since our head node does not have GPU cards),
> and it works. Maybe it is something related to my config files, especially
> my host.conf? Here are my config files:
>
> host.conf:
>
> [localhost]
> PARALLEL_COMMAND = mpirun -np %_(JOB_NODES)d %_(COMMAND)s
> NAME = SCIPION_SLURM
> MANDATORY = False
> SUBMIT_COMMAND = sbatch %_(JOB_SCRIPT)s
> CANCEL_COMMAND = scancel %_(JOB_ID)s
> CHECK_COMMAND = squeue -h -j %_(JOB_ID)s
> SUBMIT_TEMPLATE = #!/bin/bash
>     ### Inherit all current environment variables
>     #SBATCH --export=ALL
>     ### Job name
>     #SBATCH -J %_(JOB_NAME)s
>     ### Queue name
>     #SBATCH -p %_(JOB_QUEUE)s
>     ### Standard output and standard error messages
>     #SBATCH -o %_(JOB_LOGS)s.out
>     #SBATCH -e %_(JOB_LOGS)s.err
>     ### Specify the number of nodes and threads (ppn) for your job.
>     #SBATCH --ntasks=%_(JOB_NODES)d
>     #SBATCH --cpus-per-task=%_(JOB_THREADS)d
>     #SBATCH --mem=%_(JOB_MEMORY)s
>     #SBATCH --gres=gpu:%_(GPU_COUNT)s
>     ### Tell PBS the anticipated run-time for your job, where walltime=HH:MM:SS
>     #SBATCH --time=%_(JOB_TIME)s:00:00
>     # Use as working dir the path where qsub was launched
>     WORKDIR=$SLURM_SUBMIT_DIR
>     #################################
>     ### Set environment variable to know running mode is non interactive
>     export XMIPP_IN_QUEUE=1
>     ### Switch to the working directory
>     cd $WORKDIR
>     # Make a copy of PBS_NODEFILE
>     cp $SLURM_JOB_NODELIST %_(JOB_NODEFILE)s
>     # Calculate the number of processors allocated to this run.
>     NPROCS=`wc -l < $SLURM_JOB_NODELIST`
>     # Calculate the number of nodes allocated.
>     ###NNODES=`uniq $SLURM_JOB_NODELIST | wc -l`
>     ### Display the job context
>     echo Running on host `hostname`
>     echo Time is `date`
>     echo Working directory is `pwd`
>     ###echo Using ${NPROCS} processors across ${NNODES} nodes
>     echo NODE LIST - config:
>     echo $SLURM_JOB_NODELIST
>     echo CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES
>     module load cuda-10.1
>     module load impi-2019.4
>     module load relion-3.1.0
>     module load java-1.8.0
>     #################################
>     # echo '%_(JOB_COMMAND)s' >> /tmp/slurm-jobs.log
>     %_(JOB_COMMAND)s
>     ###find "$SLURM_SUBMIT_DIR" -type f -user $USER -perm 644 -exec chmod 664 {} +
>
> # Next variable is used to provide a regex to check if a job is finished on a queue system
> QUEUES = {"gpu": [["JOB_MEMORY", "8192", "Memory (MB)", "Select amount of memory (in megabytes) for this job"],
>     ["JOB_TIME", "48", "Time (hours)", "Select the time expected (in hours) for this job"],
>     ["GPU_COUNT", "8", "Number of GPUs", "Select the number of GPUs if protocol has been set up to use them"],
>     ["QUEUE_FOR_JOBS", "N", "Use queue for jobs", "Send individual jobs to queue"]]}
>
> scipion.conf:
>
> [PYWORKFLOW]
> CONDA_ACTIVATION_CMD = eval "$(/share/apps/software/anconda/bin/conda shell.bash hook)"
> SCIPION_DOMAIN = pwem
> SCIPION_FONT_NAME = Helvetica
> SCIPION_LOG = ~/ScipionUserData/logs/scipion.log
> SCIPION_LOGO = scipion_logo.gif
> SCIPION_LOGS = ~/ScipionUserData/logs
> SCIPION_NOTES_FILE = notes.txt
> SCIPION_NOTIFY = True
> SCIPION_PLUGIN_REPO_URL = http://scipion.i2pc.es/getplugins/
> SCIPION_SOFTWARE = ${SCIPION_HOME}/software
> SCIPION_SUPPORT_EMAIL = sc...@cn...
> SCIPION_TESTS = ${SCIPION_HOME}/data/tests
> SCIPION_TESTS_CMD = scipion3 tests
> SCIPION_TESTS_OUTPUT = ~/ScipionUserData/Tests
> SCIPION_TMP = ~/ScipionUserData/tmp
> SCIPION_URL = http://scipion.cnb.csic.es/downloads/scipion
> SCIPION_URL_SOFTWARE = http://scipion.cnb.csic.es/downloads/scipion/software
> SCIPION_URL_TESTDATA = http://scipion.cnb.csic.es/downloads/scipion/data/tests
> SCIPION_USER_DATA = ~/ScipionUserData
> WIZARD_MASK_COLOR = [0.125, 0.909, 0.972]
> SCIPION_NOTES_ARGS =
> SCIPION_NOTES_PROGRAM =
>
> [PLUGINS]
> EM_ROOT = software/em
> MAXIT_HOME = %(EM_ROOT)s/maxit-10.1
> XMIPP_HOME = %(EM_ROOT)s/xmipp
> CUDA_BIN = /share/apps/software/cuda-10.1/bin
> CUDA_LIB = /share/apps/software/cuda-10.1/lib64
> CHIMERA_HOME = %(EM_ROOT)s/chimerax-1.1
> GCTF_HOME = %(EM_ROOT)s/Gctf_v1.18
> GAUTOMATCH = Gautomatch_v0.56_sm30-75_cu10.1
> MOTIONCOR2_CUDA_LIB = /share/apps/software/cuda-10.1/lib64
> RELION_HOME = software/em/relion-3.1.0
> RESMAP = ResMap-1.95-cuda-Centos7x64
> GCTF_CUDA_LIB = /share/apps/software/cuda-10.1/lib64
> MOTIONCOR2_BIN = MotionCor2_1.4.0_Cuda101
> CRYOSPARC_HOME =
> RESMAP_GPU_LIB = ResMap_krnl-cuda-V8.0.61-sm60_gpu.so
> CRYO_PROJECTS_DIR = scipion_projects
> GAUTOMATCH_HOME = software/em/gautomatch-0.56
> RELION_CUDA_LIB = /share/apps/software/cuda-10.1/lib64
> RELION_CUDA_BIN = /share/apps/software/cuda-10.1/bin
> CTFFIND4_HOME = software/em/ctffind4-4.1.14
> RESMAP_HOME = software/em/resmap-1.95
> GAUTOMATCH_CUDA_LIB = /share/apps/software/cuda-10.1/lib64
> GCTF = Gctf_v1.18_sm30-75_cu10.1
> CISTEM_HOME = software/em/cistem-1.0.0-beta
> MOTIONCOR2_HOME = software/em/motioncor2-1.4.0
>
> On Dec 30, 2020, at 4:50 PM, Grigory Sharov <sha...@gm...> wrote:
>
> Hi Yangyang, the line
> 00016: GHOST in place, read call ignored
> means that you don't have xmipp installed, which is still required for all
> basic operations.
>
> Grigory
>
> On Wed, Dec 30, 2020, 08:44 Yangyang Yi <yy...@si...> wrote:
>
>> Hi,
>> Happy Holidays!
>> Thanks for the help in the last email about the Scipion-2.0 cluster config.
>> I have solved that problem: I had set "Use queue for jobs: Y" instead of
>> "N", and some plugins had only downloaded partially. I edited those
>> parameters and reinstalled some of the plugins, and now it works.
>> However, our users now prefer Scipion-3.0, and I found new errors with
>> similar settings. All the motion correction jobs now work, and the result
>> files (micrographs and shifts) are generated in the extra directory;
>> however, I failed to run CTF estimation after motion correction. I also
>> imported some motion-corrected micrographs but still failed to run CTF
>> (both Gctf and CTFFIND). The error is similar to the one below:
>>
>> CTFFIND:
>> 00002: Hostname: gpu04
>> 00003: PID: 138990
>> 00004: pyworkflow: 3.0.8
>> 00005: plugin: cistem
>> 00006: plugin v: 3.0.9
>> 00007: currentDir: /share/home/biotest/stu01/ScipionUserData/projects/test
>> 00008: workingDir: Runs/001767_CistemProtCTFFind
>> 00009: runMode: Restart
>> 00010: MPI: 1
>> 00011: threads: 1
>> 00012: Starting at step: 1
>> 00013: Running steps
>> 00014: STARTED: estimateCtfStep, step 1, time 2020-12-30 15:35:57.887435
>> 00015: Estimating CTF of micrograph: 1
>> 00016: GHOST in place, read call ignored!.
>> 00017: Traceback (most recent call last):
>> 00018: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 189, in run
>> 00019: self._run()
>> 00020: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 240, in _run
>> 00021: resultFiles = self._runFunc()
>> 00022: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/pyworkflow/protocol/protocol.py", line 236, in _runFunc
>> 00023: return self._func(*self._args)
>> 00024: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/pwem/protocols/protocol_micrographs.py", line 247, in estimateCtfStep
>> 00025: self._estimateCTF(mic, *args)
>> 00026: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/cistem/protocols/protocol_ctffind.py", line 108, in _estimateCTF
>> 00027: self._doCtfEstimation(mic)
>> 00028: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/cistem/protocols/protocol_ctffind.py", line 78, in _doCtfEstimation
>> 00029: ih.convert(micFn, micFnMrc, emlib.DT_FLOAT)
>> 00030: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 170, in convert
>> 00031: self._img.write(outputLoc)
>> 00032: AttributeError: 'Image' object has no attribute 'write'
>> 00033: Protocol failed: 'Image' object has no attribute 'write'
>> 00034: FAILED: estimateCtfStep, step 1, time 2020-12-30 15:35:57.909908
>>
>> Gctf:
>> 00002: Hostname: gpu08
>> 00003: PID: 101015
>> 00004: pyworkflow: 3.0.8
>> 00005: plugin: gctf
>> 00006: plugin v: 3.0.11
>> 00007: currentDir: /share/home/biotest/stu01/ScipionUserData/projects/test
>> 00008: workingDir: Runs/001508_ProtGctf
>> 00009: runMode: Restart
>> 00010: MPI: 1
>> 00011: threads: 1
>> 00012: Starting at step: 1
>> 00013: Running steps
>> 00014: STARTED: estimateCtfStep, step 1, time 2020-12-30 16:20:37.950836
>> 00015: Estimating CTF of micrograph: 1
>> 00016: GHOST in place, read call ignored!.
>> 00017: ERROR: Gctf has failed on Runs/001508_ProtGctf/tmp/mic_0001/*.mrc
>> 00018: Traceback (most recent call last):
>> 00019: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/gctf/protocols/protocol_gctf.py", line 82, in _estimateCtfList
>> 00020: ih.convert(micFn, micFnMrc, emlib.DT_FLOAT)
>> 00021: File "/share/apps/software/anconda/envs/scipion3/lib/python3.8/site-packages/pwem/emlib/image/image_handler.py", line 170, in convert
>> 00022: self._img.write(outputLoc)
>> 00023: AttributeError: 'Image' object has no attribute 'write'
>> 00024: FINISHED: estimateCtfStep, step 1, time 2020-12-30 16:20:38.060255
>>
>> Are there any suggestions? Thanks!
>>
>> _______________________________________________
>> scipion-users mailing list
>> sci...@li...
>> https://lists.sourceforge.net/lists/listinfo/scipion-users
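[Editor's note] The `%_(VAR)s` placeholders in the host.conf above are filled in by pyworkflow before the script is handed to `sbatch`, using Python's mapping-based `%` formatting; the names in the `QUEUES` dictionary must therefore match the placeholders in `SUBMIT_TEMPLATE`. A minimal sketch of that substitution follows; the `"%_(" -> "%("` rewrite is an assumption about pyworkflow's internals, shown only to illustrate the mechanism, not its actual implementation:

```python
# Sketch: fill a SUBMIT_TEMPLATE fragment with %_(VAR)s placeholders.
# Assumption: pyworkflow rewrites "%_(" to "%(" and then applies standard
# Python %-formatting with a dict of job parameters.
template = (
    "#SBATCH --ntasks=%_(JOB_NODES)d\n"
    "#SBATCH --cpus-per-task=%_(JOB_THREADS)d\n"
    "#SBATCH --time=%_(JOB_TIME)s:00:00\n"
    "#SBATCH --gres=gpu:%_(GPU_COUNT)s"
)
# Values as they would come from the protocol form / QUEUES defaults.
values = {"JOB_NODES": 4, "JOB_THREADS": 8, "JOB_TIME": "48", "GPU_COUNT": "8"}
# Rewrite the custom marker into ordinary %(name)s syntax, then format.
filled = template.replace("%_(", "%(") % values
print(filled)
```

A mismatch between a placeholder name in the template and a key in the mapping raises `KeyError`, which is one quick thing to check when a submission script comes out malformed.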