From: Walenz, B. <bw...@jc...> - 2012-07-28 04:57:28
I saw the syntax error. I don't think it's from runCA. All it is doing at that point is running the reported qsub command through perl's system(). I don't see any misplaced quotes in there.

Actually, I was a little confused on the first message as to what the error was; everything looked fine... except for that syntax error.

Do you have the 'sync' capability? The non-grid run mode in runCA is to run N concurrent processes. I think you should be able to modify this so that, instead of running the job on the local machine, it submits to the grid with -sync. When a job finishes, the qsub command returns, and runCA runs the next job.

This might work for LSF too, if anyone is listening that cares.

b

________________________________________
From: Powers, Jason [jp...@ex...]
Sent: Friday, July 27, 2012 9:12 PM
To: Walenz, Brian; wgs...@li...
Subject: RE: [wgs-assembler-users] runCA and qsub...no syncing?

Hi Brian,

We use an internally developed scheduler called grun. At one point it was available on this website, http://code.google.com/p/ea-utils/, but we haven’t been maintaining that site very well and it appears as though it was pulled recently. Anyway, I thought we had hold_jid working, but perhaps not. I will have to check it out on Monday.

Alternatively, do you have any other theories regarding why it’s not working? As you can see, that error is actually “Syntax error”; my assumption was that the underlying cause was that jobs weren’t waiting for each other appropriately. Do you think that’s the root of it all?

Thanks
Jason

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, July 27, 2012 4:42 PM
To: Powers, Jason; wgs...@li...
Subject: Re: [wgs-assembler-users] runCA and qsub...no syncing?

Hi-

runCA isn’t using -sync. It uses -hold_jid to tell SGE to hold a job in queue until all other jobs with that name have completed.

One ‘feature’ of runCA that might help you out here is “useGrid=1 scriptOnGrid=0”.
This will set all the jobs up for sge, but not actually submit anything. It’ll tell you to submit an array of jobs, then when those finish, to restart runCA.

What scheduler are you using?

b

On 7/27/12 4:33 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all,

We do not have SGE-proper here so myself and co-workers have been getting an SGE-compatible qsub up and running so everything is compatible with our distributed computing environment. We’ve got pacBioToCA working now with the system, and just today I’ve started to try to get runCA working. Oddly, it seems like runCA is submitting jobs without the “-sync” flag. Predictably, this causes downstream applications to fail. Here is the out file:

----------------------------------------START Fri Jul 27 15:31:51 2012
/opt/bin/wgs-bin/gatekeeper -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.BUILDING -T -F /mnt/scratch/test_assembly/50X/20X_PB/illum5x.frg /mnt/scratch/test_assembly/50X/20X_PB/test.LongReads.20X.corrected.frg > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.err 2>&1
----------------------------------------END Fri Jul 27 15:32:00 2012 (9 seconds)
numFrags = 329675
----------------------------------------START Fri Jul 27 15:32:00 2012
/opt/bin/wgs-bin/meryl -B -C -v -m 11 -memory 128000 -threads 8 -c 0 -L 2 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/meryl.err 2>&1
----------------------------------------END Fri Jul 27 15:32:35 2012 (35 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/estimate-mer-threshold -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -m /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.out 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta.err
----------------------------------------END Fri Jul 27 15:32:36 2012 (1 seconds)
Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -cwd -N mbt_Distributed_test \
  -t 1-1 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.\$TASK_ID.sge.err \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh
Your job 3756459 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.$TASK_ID.sge.err 2> STDIN.embt_Distributed_test") has been submitted
DEBUG START mbt_Distributed_test: jobs 3756459
DEBUG NOSYNC mbt_Distributed_test
----------------------------------------END Fri Jul 27 15:32:36 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 -hold_jid "mbt_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3756460 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:32:37 2012 (1 seconds)

You can see that during 0-mertrim it does not sync. If I rerun things a little later, it gets to the 0-overlaptrim-overlap stage, but again no syncing, and so it fails trying to move forward:

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:54:59 2012
find /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim -name \*.merTrim -print | sort > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list
----------------------------------------END Fri Jul 27 15:54:59 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:54:59 2012
/opt/bin/wgs-bin/merTrimApply \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -L /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list \
  -l /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.log \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrimApply.err 2>&1
----------------------------------------END Fri Jul 27 15:55:01 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:01 2012
/opt/bin/wgs-bin/initialTrim \
  -log /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.log \
  -frg /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.summary \
  2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.err
----------------------------------------END Fri Jul 27 15:55:04 2012 (3 seconds)
----------------------------------------START Fri Jul 27 15:55:05 2012
/opt/bin/wgs-bin/overlap_partition \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -bl 20000000 \
  -bs 0 \
  -rs 5000000 \
  -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap
HASH      1- 210916 REFR 1- 329675 STRINGS 210916 BASES 20000041
HASH 210917- 289851 REFR 1- 329675 STRINGS  78935 BASES 20000051
HASH 289852- 303091 REFR 1- 329675 STRINGS  13240 BASES 20002127
HASH 303092- 317562 REFR 1- 329675 STRINGS  14471 BASES 20000574
HASH 317563- 329675 REFR 1- 329675 STRINGS  12113 BASES 16625470
----------------------------------------END Fri Jul 27 15:55:05 2012 (0 seconds)
Created 5 overlap jobs. Last batch '001', last job '000005'.
----------------------------------------START Fri Jul 27 15:55:05 2012
qsub -A assembly -pe threads 4 -cwd -N ovl_Distributed_test \
  -t 1-5 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/\$TASK_ID.out \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh
Your job 3757130 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757132 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757133 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757134 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757136 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
DEBUG START ovl_Distributed_test: jobs 3757130 3757132 3757133 3757134 3757136
DEBUG NOSYNC ovl_Distributed_test
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:07 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds)

Any thoughts?
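
Brian's -sync suggestion at the top of the thread (keep runCA's "N concurrent processes" local mode, but have each slot submit a blocking qsub instead of running the script directly) might look roughly like the Python sketch below. It is only an illustration of the idea, not runCA's code: runCA itself is Perl, `run_blocking_jobs` and `sync_qsub_command` are hypothetical names, `-sync y` is the SGE spelling of the flag and may not be supported by an SGE-compatible wrapper such as grun, and passing the task number as a script argument is an assumption about how the job scripts pick their task.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_blocking_jobs(commands, max_concurrent):
    """Run every command to completion, at most max_concurrent at a time.

    Each command is expected to block until its job is done (as
    `qsub -sync y ...` does on SGE), so a worker slot is freed only when
    that job has finished. Returns the exit codes in submission order.
    """
    def run_one(argv):
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_one, commands))


def sync_qsub_command(script, task_id, extra_args=()):
    """Build one blocking submission for a single task (hypothetical helper).

    Assumption: the job script accepts its task number as its first
    argument; adjust if the scheduler passes it some other way.
    """
    return ["qsub", "-sync", "y", "-cwd", *extra_args, script, str(task_id)]
```

For the five overlap jobs in the log, this would mean building one command per task with `sync_qsub_command(script, i)` for i = 1..5 and handing the list to `run_blocking_jobs`; any nonzero exit code in the result flags a task that needs a rerun before the next stage starts.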