From: Walenz, B. <bw...@jc...> - 2012-07-27 20:42:35
Hi-

runCA isn’t using -sync. It uses -hold_jid to tell SGE to hold a job in the queue until all other jobs with that name have completed.

One ‘feature’ of runCA that might help you out here is “useGrid=1 scriptOnGrid=0”. This will set all the jobs up for SGE, but not actually submit anything. It’ll tell you to submit an array of jobs and then, when those finish, to restart runCA.

What scheduler are you using?

b

On 7/27/12 4:33 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all,

We do not have SGE-proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so that everything is compatible with our distributed computing environment. We’ve got pacBioToCA working now with the system, and just today I’ve started to try to get runCA working. Oddly, it seems like runCA is submitting jobs without the “-sync” flag. Predictably, this causes downstream applications to fail. Here is the out file:

----------------------------------------START Fri Jul 27 15:31:51 2012
/opt/bin/wgs-bin/gatekeeper -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.BUILDING -T -F /mnt/scratch/test_assembly/50X/20X_PB/illum5x.frg /mnt/scratch/test_assembly/50X/20X_PB/test.LongReads.20X.corrected.frg > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.err 2>&1
----------------------------------------END Fri Jul 27 15:32:00 2012 (9 seconds)
numFrags = 329675
----------------------------------------START Fri Jul 27 15:32:00 2012
/opt/bin/wgs-bin/meryl -B -C -v -m 11 -memory 128000 -threads 8 -c 0 -L 2 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/meryl.err 2>&1
----------------------------------------END Fri Jul 27 15:32:35 2012 (35 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/estimate-mer-threshold -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -m /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.out 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta.err
----------------------------------------END Fri Jul 27 15:32:36 2012 (1 seconds)
Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -cwd -N mbt_Distributed_test \
  -t 1-1 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.\$TASK_ID.sge.err \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh
Your job 3756459 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.$TASK_ID.sge.err 2> STDIN.embt_Distributed_test") has been submitted
DEBUG START mbt_Distributed_test: jobs 3756459
DEBUG NOSYNC mbt_Distributed_test
----------------------------------------END Fri Jul 27 15:32:36 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 -hold_jid "mbt_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3756460 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:32:37 2012 (1 seconds)

You can see that during 0-mertrim it does not sync. If I rerun things a little later, it gets to the 0-overlaptrim-overlap stage, but again there is no syncing, so it fails when trying to move forward:

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:54:59 2012
find /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim -name \*.merTrim -print | sort > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list
----------------------------------------END Fri Jul 27 15:54:59 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:54:59 2012
/opt/bin/wgs-bin/merTrimApply \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -L /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list \
  -l /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.log \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrimApply.err 2>&1
----------------------------------------END Fri Jul 27 15:55:01 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:01 2012
/opt/bin/wgs-bin/initialTrim \
  -log /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.log \
  -frg /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.summary \
  2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.err
----------------------------------------END Fri Jul 27 15:55:04 2012 (3 seconds)
----------------------------------------START Fri Jul 27 15:55:05 2012
/opt/bin/wgs-bin/overlap_partition \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -bl 20000000 \
  -bs 0 \
  -rs 5000000 \
  -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap
HASH 1- 210916 REFR 1- 329675 STRINGS 210916 BASES 20000041
HASH 210917- 289851 REFR 1- 329675 STRINGS 78935 BASES 20000051
HASH 289852- 303091 REFR 1- 329675 STRINGS 13240 BASES 20002127
HASH 303092- 317562 REFR 1- 329675 STRINGS 14471 BASES 20000574
HASH 317563- 329675 REFR 1- 329675 STRINGS 12113 BASES 16625470
----------------------------------------END Fri Jul 27 15:55:05 2012 (0 seconds)
Created 5 overlap jobs. Last batch '001', last job '000005'.
----------------------------------------START Fri Jul 27 15:55:05 2012
qsub -A assembly -pe threads 4 -cwd -N ovl_Distributed_test \
  -t 1-5 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/\$TASK_ID.out \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh
Your job 3757130 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757132 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757133 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757134 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757136 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
DEBUG START ovl_Distributed_test: jobs 3757130 3757132 3757133 3757134 3757136
DEBUG NOSYNC ovl_Distributed_test
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:07 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds)

Any thoughts?
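
As a rough illustration of the -hold_jid mechanism described in the reply at the top of this message, the sketch below prints the two kinds of qsub command lines that appear in the log; it submits nothing. The job names (ovl_example, rCA_example) and script names (overlap.sh, restart.sh) are hypothetical placeholders, and only flags that already appear in the log are used. Whether a given SGE-compatible qsub honors -hold_jid with a job name is an assumption to check locally.

```shell
#!/bin/sh
# Dry-run sketch: prints qsub command lines instead of submitting them.
# All job and script names passed in below are hypothetical placeholders.

# Array job for one pipeline stage. Without -sync, qsub returns as soon as
# the job is queued, so the caller does not wait for the tasks to finish.
stage_cmd() {
    # $1 = job name, $2 = number of array tasks, $3 = script to run
    printf 'qsub -cwd -N %s -t 1-%s -j y %s\n' "$1" "$2" "$3"
}

# Follow-up job that the scheduler is asked to keep in the queue until
# the job(s) named $2 have completed (the -hold_jid dependency).
restart_cmd() {
    # $1 = job name, $2 = name of the job to wait for, $3 = script to run
    printf 'qsub -cwd -N %s -j y -hold_jid %s %s\n' "$1" "$2" "$3"
}

stage_cmd ovl_example 5 overlap.sh
restart_cmd rCA_example ovl_example restart.sh
```

If the scheduler ignores or mishandles -hold_jid, the second job starts before the array tasks finish, which would match the failures described above.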