From: Walenz, B. <bw...@jc...> - 2012-07-30 16:44:29
Hi, Heiner-

Working backwards through your email:

We've also noticed the 'large scaffold gets lots of little contigs added' problem. This seems to be dominating our run time. I'm working on this problem at the moment. Our previous solution was basically what you did: let it run until we get impatient, then kill it and restart from the next checkpoint label. The CVS tip has a slight improvement in cgw, committed around the 20th. I hope to have much more within the next week.

You can ignore the mates in the library, but not the reads. To ignore the mates, simply delete the mate link from gkpStore. At the very bottom of the 'gatekeeper' page on the wiki is 'allfragsunmated', which will remove the mate link from all reads in a single library. This is a destructive operation! Save a backup of gkpStore/fnm and gkpStore/fpk if you want to revert. (These two files store metadata for long and short fragments, respectively.)

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Gatekeeper

FYI- the 5-consensus-insert-size directory has a plot of the insert-size histogram for each library. These are based on unitigs, and so the 20k library might not be represented well. tigStore (the command) can also analyze mate pairs for contigs/unitigs in the store with -d matepair.

b

On 7/26/12 5:39 PM, "kuhl" <ku...@mo...> wrote:

> Hi Brian et al.,
>
> I am currently running a huge assembly with CA7 (2.5 Gb, 30x Illumina + 454;
> cgw takes 150-300 GB RAM). It is now in step 7-2 and I have just stopped cgw
> at MergeScaffoldsAggressive iteration 1641 and restarted it at ckp08-2SM. I
> did this also in 7-0 at iteration 2xxx. Now I am not sure if I should
> maybe rerun scaffolding without 20 kb mate pairs, which I think are
> responsible for this mess. So I have two questions:
>
> How can I convince cgw to ignore a certain library without doing steps 0-5
> again?
>
> Is there a rule of thumb for when MergeScaffoldsAggressive should be stopped?
>
> In my case it looks like cgw is only very slightly progressing with each
> iteration, and there is one large scaffold that is growing more and more...
>
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60498
> ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8774 out of 60498
> ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10594 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20348 out of 60500
> ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8773 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 9 at idx 16854 out of 60489
> ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60486
> ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60486
> ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10593 out of 60486
> ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20428 out of 60486
>
> Regards, Heiner
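The backup Brian recommends before the destructive 'allfragsunmated' edit boils down to copying two files aside. A sketch, exercised on a scratch directory rather than a real store (the store path and file contents here are stand-ins; the actual mate-link removal is done with the gatekeeper command described on the wiki page):

```shell
set -e
# Stand-in for a real <prefix>.gkpStore directory.
store=$(mktemp -d)/asm.gkpStore
mkdir -p "$store"
printf 'long-frag-metadata'  > "$store/fnm"
printf 'short-frag-metadata' > "$store/fpk"

# Back up the two metadata files before removing mate links:
# fnm holds metadata for long fragments, fpk for short ones.
cp "$store/fnm" "$store/fnm.bak"
cp "$store/fpk" "$store/fpk.bak"

# To revert after an unwanted edit:
cp "$store/fnm.bak" "$store/fnm"
cp "$store/fpk.bak" "$store/fpk"
```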
From: Walenz, B. <bw...@jc...> - 2012-07-28 04:57:28
I saw the syntax error. I don't think it's from runCA. All runCA is doing at that point is running the reported qsub command through perl's system(). I don't see any misplaced quotes in there. Actually, I was a little confused by the first message as to what the error was; everything looked fine... except for that syntax error.

Do you have the 'sync' capability? The non-grid run mode in runCA is to run N concurrent processes. I think you should be able to modify this so that, instead of running the job on the local machine, it submits to the grid with -sync. When a job finishes, the qsub command returns, and runCA runs the next job.

This might work for LSF too, if anyone is listening that cares.

b

________________________________________
From: Powers, Jason [jp...@ex...]
Sent: Friday, July 27, 2012 9:12 PM
To: Walenz, Brian; wgs...@li...
Subject: RE: [wgs-assembler-users] runCA and qsub...no syncing?

[Quoted message clipped; it appears in full as the next message below.]
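Brian's suggested modification — keep runCA's concurrent-process loop, but make each "process" a blocking grid submission — can be sketched as below. The placeholder `sh -c "sleep 0"` jobs stand in for the real submissions so the sketch runs anywhere; on SGE the `submit` function would instead call `qsub -sync y "$1"`, which returns only when the job finishes:

```shell
# Launch a batch of jobs concurrently, then wait for the whole batch
# before moving on -- a simplified version of runCA's non-grid mode.
submit() {
    # Real grid version would be:  qsub -sync y "$1"
    sh -c "$1"
}

pids=""
for job in "sleep 0" "sleep 0" "sleep 0" "sleep 0"; do
    submit "$job" &
    pids="$pids $!"
done

# Collect exit statuses; any failure marks the batch as failed.
fail=0
for p in $pids; do
    wait "$p" || fail=1
done
```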
From: Powers, J. <jp...@ex...> - 2012-07-28 01:12:16
Hi Brian,

We use an internally developed scheduler called grun. At one point it was available on this website, http://code.google.com/p/ea-utils/, but we haven't been maintaining that site very well and it appears as though it was pulled recently. Anyway, I thought we had hold_jid working, but perhaps not. I will have to check it out on Monday.

Alternatively, do you have any other theories regarding why it's not working? As you can see, that error is actually "Syntax error" - my assumption was that the underlying cause was that jobs weren't waiting for each other appropriately. Do you think that's the root of it all?

Thanks,
Jason

From: Walenz, Brian [mailto:bw...@jc...]
Sent: Friday, July 27, 2012 4:42 PM
To: Powers, Jason; wgs...@li...
Subject: Re: [wgs-assembler-users] runCA and qsub...no syncing?

Hi-

runCA isn't using -sync. It uses -hold_jid to tell SGE to hold a job in queue until all other jobs with that name have completed.

One 'feature' of runCA that might help you out here is "useGrid=1 scriptOnGrid=0". This will set all the jobs up for SGE, but not actually submit anything. It'll tell you to submit an array of jobs, then, when those finish, to restart runCA.

What scheduler are you using?

b

On 7/27/12 4:33 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all,

We do not have SGE-proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so everything is compatible with our distributed computing environment. We've got pacBioToCA working now with the system, and just today I've started to try to get runCA working. Oddly, it seems like runCA is submitting jobs without the "-sync" flag. Predictably, this causes downstream applications to fail.
Here is the out file:

----------------------------------------START Fri Jul 27 15:31:51 2012
/opt/bin/wgs-bin/gatekeeper -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.BUILDING -T -F /mnt/scratch/test_assembly/50X/20X_PB/illum5x.frg /mnt/scratch/test_assembly/50X/20X_PB/test.LongReads.20X.corrected.frg > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.err 2>&1
----------------------------------------END Fri Jul 27 15:32:00 2012 (9 seconds)

numFrags = 329675

----------------------------------------START Fri Jul 27 15:32:00 2012
/opt/bin/wgs-bin/meryl -B -C -v -m 11 -memory 128000 -threads 8 -c 0 -L 2 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/meryl.err 2>&1
----------------------------------------END Fri Jul 27 15:32:35 2012 (35 seconds)

----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/estimate-mer-threshold -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -m /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.out 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta.err
----------------------------------------END Fri Jul 27 15:32:36 2012 (1 seconds)

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.

----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -cwd -N mbt_Distributed_test \
  -t 1-1 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.\$TASK_ID.sge.err \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh
Your job 3756459 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.$TASK_ID.sge.err 2> STDIN.embt_Distributed_test") has been submitted
DEBUG START mbt_Distributed_test: jobs 3756459
DEBUG NOSYNC mbt_Distributed_test
----------------------------------------END Fri Jul 27 15:32:36 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 -hold_jid "mbt_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3756460 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:32:37 2012 (1 seconds)

You can see that during 0-mertrim it does not sync. If I rerun things a little later, it gets to the 0-overlaptrim-overlap stage, but again no syncing, and so it fails trying to move forward.

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.

----------------------------------------START Fri Jul 27 15:54:59 2012
find /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim -name \*.merTrim -print | sort > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list
----------------------------------------END Fri Jul 27 15:54:59 2012 (0 seconds)

----------------------------------------START Fri Jul 27 15:54:59 2012
/opt/bin/wgs-bin/merTrimApply \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -L /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list \
  -l /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.log \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrimApply.err 2>&1
----------------------------------------END Fri Jul 27 15:55:01 2012 (2 seconds)

----------------------------------------START Fri Jul 27 15:55:01 2012
/opt/bin/wgs-bin/initialTrim \
  -log /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.log \
  -frg /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.summary \
  2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.err
----------------------------------------END Fri Jul 27 15:55:04 2012 (3 seconds)

----------------------------------------START Fri Jul 27 15:55:05 2012
/opt/bin/wgs-bin/overlap_partition \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -bl 20000000 \
  -bs 0 \
  -rs 5000000 \
  -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap
HASH      1- 210916  REFR 1- 329675  STRINGS 210916  BASES 20000041
HASH 210917- 289851  REFR 1- 329675  STRINGS  78935  BASES 20000051
HASH 289852- 303091  REFR 1- 329675  STRINGS  13240  BASES 20002127
HASH 303092- 317562  REFR 1- 329675  STRINGS  14471  BASES 20000574
HASH 317563- 329675  REFR 1- 329675  STRINGS  12113  BASES 16625470
----------------------------------------END Fri Jul 27 15:55:05 2012 (0 seconds)

Created 5 overlap jobs. Last batch '001', last job '000005'.

----------------------------------------START Fri Jul 27 15:55:05 2012
qsub -A assembly -pe threads 4 -cwd -N ovl_Distributed_test \
  -t 1-5 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/\$TASK_ID.out \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh
Your job 3757130 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757132 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757133 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757134 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757136 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
DEBUG START ovl_Distributed_test: jobs 3757130 3757132 3757133 3757134 3757136
DEBUG NOSYNC ovl_Distributed_test
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds)

----------------------------------------START Fri Jul 27 15:55:07 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds)

Any thoughts?
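One plausible source of the "Unterminated quoted string" (an assumption about the qsub replacement, not a diagnosis of grun itself): if a wrapper re-joins already-parsed arguments into one flat string for `sh -c` without re-quoting, an embedded quote reaches sh unbalanced. A minimal reproduction:

```shell
# Hypothetical wrapper logic: the command line is handed to 'sh -c'
# as a single string, but one double quote is left unbalanced.
naive='echo -N "rCA_Distributed_test'   # note the lone double quote
sh -c "$naive" 2>/dev/null
naive_status=$?                         # nonzero: Unterminated quoted string

# With balanced quoting the extra shell layer is harmless.
safe='echo -N "rCA_Distributed_test"'
sh -c "$safe" >/dev/null
safe_status=$?
```

In Perl or Python wrappers the fix is to pass the argument list directly (system() with a list), or to re-quote each argument (e.g. shell-escape it) before flattening.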
From: Walenz, B. <bw...@jc...> - 2012-07-27 20:42:35
Hi-

runCA isn't using -sync. It uses -hold_jid to tell SGE to hold a job in queue until all other jobs with that name have completed.

One 'feature' of runCA that might help you out here is "useGrid=1 scriptOnGrid=0". This will set all the jobs up for SGE, but not actually submit anything. It'll tell you to submit an array of jobs, then, when those finish, to restart runCA.

What scheduler are you using?

b

On 7/27/12 4:33 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all,

We do not have SGE-proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so everything is compatible with our distributed computing environment. We've got pacBioToCA working now with the system, and just today I've started to try to get runCA working. Oddly, it seems like runCA is submitting jobs without the "-sync" flag. Predictably, this causes downstream applications to fail.

[Quoted qsub log clipped; it appears in full in the message above.]
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds) ----------------------------------------START Fri Jul 27 15:55:07 2012 qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh sh: 2: Syntax error: Unterminated quoted string Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted ----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds) Any thoughts? |
From: Powers, J. <jp...@ex...> - 2012-07-27 20:33:34
|
Hi all,

We do not have SGE proper here, so my co-workers and I have been getting an SGE-compatible qsub up and running so that everything works with our distributed computing environment. We've got pacBioToCA working with the system, and just today I started trying to get runCA working. Oddly, it seems that runCA is submitting jobs without the "-sync" flag. Predictably, this causes downstream applications to fail. Here is the out file:

----------------------------------------START Fri Jul 27 15:31:51 2012
/opt/bin/wgs-bin/gatekeeper -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.BUILDING -T -F /mnt/scratch/test_assembly/50X/20X_PB/illum5x.frg /mnt/scratch/test_assembly/50X/20X_PB/test.LongReads.20X.corrected.frg > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore.err 2>&1
----------------------------------------END Fri Jul 27 15:32:00 2012 (9 seconds)
numFrags = 329675
----------------------------------------START Fri Jul 27 15:32:00 2012
/opt/bin/wgs-bin/meryl -B -C -v -m 11 -memory 128000 -threads 8 -c 0 -L 2 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/meryl.err 2>&1
----------------------------------------END Fri Jul 27 15:32:35 2012 (35 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/estimate-mer-threshold -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore:chain -m /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.out 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0.estMerThresh.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.ovl.fasta.err
----------------------------------------END Fri Jul 27 15:32:35 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:35 2012
/opt/bin/wgs-bin/meryl -Dt -n 421 -s /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test-C-ms11-cm0 > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta 2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mercounts/Distributed_test.nmers.obt.fasta.err
----------------------------------------END Fri Jul 27 15:32:36 2012 (1 seconds)
Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -cwd -N mbt_Distributed_test \
  -t 1-1 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.\$TASK_ID.sge.err \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh
Your job 3756459 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/mertrim.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.$TASK_ID.sge.err 2> STDIN.embt_Distributed_test") has been submitted
DEBUG START mbt_Distributed_test: jobs 3756459
DEBUG NOSYNC mbt_Distributed_test
----------------------------------------END Fri Jul 27 15:32:36 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:32:36 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 -hold_jid "mbt_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3756460 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.00 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:32:37 2012 (1 seconds)

You can see that during 0-mertrim it does not sync. If I rerun things a little later, it gets to the 0-overlaptrim-overlap stage, but again there is no syncing, and so it fails trying to move forward.

Reset OBT mer threshold from auto to 421.
Reset OVL mer threshold from auto to 421.
----------------------------------------START Fri Jul 27 15:54:59 2012
find /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim -name \*.merTrim -print | sort > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list
----------------------------------------END Fri Jul 27 15:54:59 2012 (0 seconds)
----------------------------------------START Fri Jul 27 15:54:59 2012
/opt/bin/wgs-bin/merTrimApply \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -L /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.list \
  -l /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrim.log \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-mertrim/Distributed_test.merTrimApply.err 2>&1
----------------------------------------END Fri Jul 27 15:55:01 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:01 2012
/opt/bin/wgs-bin/initialTrim \
  -log /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.log \
  -frg /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.summary \
  2> /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim/Distributed_test.initialTrim.err
----------------------------------------END Fri Jul 27 15:55:04 2012 (3 seconds)
----------------------------------------START Fri Jul 27 15:55:05 2012
/opt/bin/wgs-bin/overlap_partition \
  -g /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/Distributed_test.gkpStore \
  -bl 20000000 \
  -bs 0 \
  -rs 5000000 \
  -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap
HASH      1- 210916  REFR 1- 329675  STRINGS 210916  BASES 20000041
HASH 210917- 289851  REFR 1- 329675  STRINGS  78935  BASES 20000051
HASH 289852- 303091  REFR 1- 329675  STRINGS  13240  BASES 20002127
HASH 303092- 317562  REFR 1- 329675  STRINGS  14471  BASES 20000574
HASH 317563- 329675  REFR 1- 329675  STRINGS  12113  BASES 16625470
----------------------------------------END Fri Jul 27 15:55:05 2012 (0 seconds)
Created 5 overlap jobs. Last batch '001', last job '000005'.
----------------------------------------START Fri Jul 27 15:55:05 2012
qsub -A assembly -pe threads 4 -cwd -N ovl_Distributed_test \
  -t 1-5 \
  -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/\$TASK_ID.out \
  /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh
Your job 3757130 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757132 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757133 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757134 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
Your job 3757136 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/overlap.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/0-overlaptrim-overlap/$TASK_ID.out 2> STDIN.eovl_Distributed_test") has been submitted
DEBUG START ovl_Distributed_test: jobs 3757130 3757132 3757133 3757134 3757136
DEBUG NOSYNC ovl_Distributed_test
----------------------------------------END Fri Jul 27 15:55:07 2012 (2 seconds)
----------------------------------------START Fri Jul 27 15:55:07 2012
qsub -A assembly -pe threads 4 -cwd -N "rCA_Distributed_test" -j y -o /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 -hold_jid "ovl_Distributed_test" /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh
sh: 2: Syntax error: Unterminated quoted string
Your job 3757137 ("/mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01.sh > /mnt/scratch/test_assembly/50X/20X_PB/distributed_test/runCA.sge.out.01 2> STDIN.erCA_Distributed_test") has been submitted
----------------------------------------END Fri Jul 27 15:55:08 2012 (1 seconds)

Any thoughts? |
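Until the qsub clone implements `-sync y`, one workaround is to emulate the blocking behaviour: submit a job, then poll until it leaves the queue before letting the next stage run. This is only a sketch under assumptions — the function name is hypothetical, the 60-second poll interval is arbitrary, it assumes `qstat -j <jobid>` exits non-zero once the job is gone, and `QSTAT` is injectable purely so the logic can be exercised without a live grid. It also does not address the separate "Unterminated quoted string" error in the log, which suggests the clone mishandles the quoted arguments runCA passes.

```shell
# Rough stand-in for `qsub -sync y`: block until an already-submitted job
# disappears from the queue. QSTAT defaults to qstat but can be overridden.
wait_for_job() {
  jobid=$1
  while "${QSTAT:-qstat}" -j "$jobid" >/dev/null 2>&1; do
    sleep 60   # poll once a minute; tune to taste
  done
}
```

Calling `wait_for_job 3756459` between runCA's submit and the held `rCA_*` restart job would give the synchronization that `-sync y` normally provides.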
From: kuhl <ku...@mo...> - 2012-07-26 21:39:23
|
Hi Brian et al.,

I am currently running a huge assembly with CA7 (2.5 Gb, 30x Illumina + 454; cgw takes 150-300 GB RAM). It is now in step 7-2, and I have just stopped cgw at MergeScaffoldsAggressive iteration 1641 and restarted it at ckp08-2SM. I did this in 7-0 as well, at iteration 2xxx. Now I am not sure whether I should rerun scaffolding without the 20 kb mate pairs, which I think are responsible for this mess. So I have two questions:

How can I convince cgw to ignore a certain library without doing steps 0-5 again?

Is there a rule of thumb for when MergeScaffoldsAggressive should be stopped? In my case it looks like cgw is making only very slight progress with each iteration, and there is one large scaffold that is growing more and more...

ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60498
ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8774 out of 60498
ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10594 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20348 out of 60500
ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 19 at idx 8773 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 9 at idx 16854 out of 60489
ExamineUsableSEdges()- maxWeightEdge from 0 to 55 at idx 286 out of 60486
ExamineUsableSEdges()- maxWeightEdge from 0 to 32 at idx 3355 out of 60486
ExamineUsableSEdges()- maxWeightEdge from 0 to 16 at idx 10593 out of 60486
ExamineUsableSEdges()- maxWeightEdge from 0 to 7 at idx 20428 out of 60486

Regards,
Heiner |
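Brian's reply in this thread points to gatekeeper's 'allfragsunmated' operation for removing the mate links from a single library, and warns that it is destructive: back up gkpStore/fnm and gkpStore/fpk (the long- and short-fragment metadata) first. A sketch of that backup step — the store path and function name are hypothetical, and the exact 'allfragsunmated' invocation is documented on the gatekeeper wiki page, not shown here:

```shell
# Save the gkpStore metadata files so the mate links can be restored later
# if stripping them turns out to be the wrong call.
backup_gkp_meta() {
  store=$1
  cp "$store/fnm" "$store/fnm.bak" && cp "$store/fpk" "$store/fpk.bak"
}
# After backing up, run gatekeeper's 'allfragsunmated' on the offending
# library (e.g. the 20 kb one); see the wiki page for the exact flags.
```

Reverting is then just copying the two .bak files back into place.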
From: Thomas H. <tho...@un...> - 2012-07-23 19:09:18
|
Hi,

thanks for the quick reply. I tried a lot, including the fixes from the "Known Issues" section. I also did not believe that the pipeline would create overlaps between PacBio and PacBio. What got me wondering was that I ran into the same error after I removed all but one PacBio read, e.g. from the PacBio E. coli sample data set, and tried to correct this one read with the corresponding Illumina data set. I also tried all kinds of shortRead settings and smaller kmer sizes, and basically the pacbio.spec from the sourceforge web site.

For the sample data I used 100 bp raw Illumina reads (phred+64) and 2-10 kb fragments from contigs of an assembly of these reads, with and without artificially introduced indels. In addition I used a mapper (Shrimp2) to manually map Illumina reads onto my sample PacBio reads, which always produced valid mappings.

I will try again tomorrow - a bit more organized - and log the different settings and scenarios to pinpoint the problem.

'til then, thanks
Thomas

On 23.07.2012 17:31, Sergey Koren wrote:
> Hi Thomas,
>
> As Brian mentioned, the pipeline will only overlap PacBio to Illumina (or short read data). So the no overlaps error means that no short-reads could be mapped to the PacBio reads and thus no correction can be done. This error is normally caused by either the fastq format output by the 1.3.0 SMRTportal software (which is since fixed in 1.3.1) or by short Illumina data (<64bp). The pacBioToCA wiki page lists suggested solutions to both issues under the Known Issues section:
> https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=PacBioToCA#Known_Issues
>
> Please let me know if neither one of those suggestions fixes your issue.
>
> Thanks,
> Sergey
> Bioinformatics Scientist
> NBACC
>
> On Jul 23, 2012, at 11:23 AM, Walenz, Brian wrote:
>
>> Hi, Thomas-
>>
>> The pipeline is only looking for overlaps between Illumina and PacBio, not
>> Illumina-to-Illumina or PacBio-to-PacBio. From the overlaps it builds a
>> consensus sequence representing the PacBio read, and that is the corrected
>> read.
>>
>> Can you describe your data a bit? How short are the Illumina? Any changes
>> to parameters for the pipeline?
>>
>> bri
>> --
>> Brian Walenz
>> Senior Software Engineer
>> J. Craig Venter Institute
>>
>> On 7/23/12 10:13 AM, "Thomas Hackl" <tho...@un...> wrote:
>>
>>> Hello,
>>>
>>> I am currently testing the pacBioToCA pipeline on some sample data in
>>> preparation for some upcoming experiments. The pipeline always stopped
>>> with the error: "No Overlaps found". I already figured out that
>>> obviously these overlaps are required within the pacBio read library, or
>>> at least that this solves the issue.
>>>
>>> Since I do not necessarily expect overlaps in our data but still want to
>>> correct them with Illumina short reads for further analysis, I would
>>> like to know, if this is even possible with your pipeline. And also I do
>>> not really understand, why there is an overlap computation step before
>>> the error correction step?
>>>
>>> Regards
>>> Thomas
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users

--
Thomas Hackl
Julius-Maximilians-Universität
Department of Bioinformatics
97074 Würzburg, Germany
Fon: +49 931 - 31 86883
Mail: tho...@un... |
From: Sergey K. <se...@um...> - 2012-07-23 15:31:26
|
Hi Thomas,

As Brian mentioned, the pipeline will only overlap PacBio to Illumina (or short read data). So the "no overlaps" error means that no short reads could be mapped to the PacBio reads, and thus no correction can be done. This error is normally caused either by the fastq format output by the 1.3.0 SMRTportal software (since fixed in 1.3.1) or by short Illumina data (<64bp). The pacBioToCA wiki page lists suggested solutions to both issues under the Known Issues section:
https://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=PacBioToCA#Known_Issues

Please let me know if neither one of those suggestions fixes your issue.

Thanks,
Sergey
Bioinformatics Scientist
NBACC

On Jul 23, 2012, at 11:23 AM, Walenz, Brian wrote:

> Hi, Thomas-
>
> The pipeline is only looking for overlaps between Illumina and PacBio, not
> Illumina-to-Illumina or PacBio-to-PacBio. From the overlaps it builds a
> consensus sequence representing the PacBio read, and that is the corrected
> read.
>
> Can you describe your data a bit? How short are the Illumina? Any changes
> to parameters for the pipeline?
>
> bri
> --
> Brian Walenz
> Senior Software Engineer
> J. Craig Venter Institute
>
> On 7/23/12 10:13 AM, "Thomas Hackl" <tho...@un...> wrote:
>
>> Hello,
>>
>> I am currently testing the pacBioToCA pipeline on some sample data in
>> preparation for some upcoming experiments. The pipeline always stopped
>> with the error: "No Overlaps found". I already figured out that
>> obviously these overlaps are required within the pacBio read library, or
>> at least that this solves the issue.
>>
>> Since I do not necessarily expect overlaps in our data but still want to
>> correct them with Illumina short reads for further analysis, I would
>> like to know, if this is even possible with your pipeline. And also I do
>> not really understand, why there is an overlap computation step before
>> the error correction step?
>>
>> Regards
>> Thomas |
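The two failure modes Sergey lists can be pre-checked cheaply before rerunning the pipeline. A crude sketch that reports the shortest read in an Illumina fastq (the pipeline wants reads of at least 64 bp) and guesses the quality encoding; the function name is made up, and the encoding guess is only a heuristic (any quality character below ';' can only be phred+33, but high-quality phred+33 data with no low scores can still be misreported as +64):

```shell
# Report minimum read length and a rough guess at the fastq quality offset.
# Reads the fastq on stdin; assumes plain 4-line records, no wrapping.
fq_check() {
  awk 'NR%4==2 { if (min == "" || length($0) < min) min = length($0) }
       NR%4==0 { for (i = 1; i <= length($0); i++)
                   if (substr($0, i, 1) < ";") p33 = 1 }
       END     { printf "min read length: %d\n", min
                 printf "quality looks like: phred+%s\n", (p33 ? "33" : "64?") }'
}
```

Usage: `fq_check < illumina.fastq`. A minimum length under 64 or a phred+64 verdict points at one of the two known issues above.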
From: Walenz, B. <bw...@jc...> - 2012-07-23 15:23:51
|
Hi, Thomas-

The pipeline is only looking for overlaps between Illumina and PacBio, not Illumina-to-Illumina or PacBio-to-PacBio. From the overlaps it builds a consensus sequence representing the PacBio read, and that is the corrected read.

Can you describe your data a bit? How short are the Illumina? Any changes to parameters for the pipeline?

bri
--
Brian Walenz
Senior Software Engineer
J. Craig Venter Institute

On 7/23/12 10:13 AM, "Thomas Hackl" <tho...@un...> wrote:

> Hello,
>
> I am currently testing the pacBioToCA pipeline on some sample data in
> preparation for some upcoming experiments. The pipeline always stopped
> with the error: "No Overlaps found". I already figured out that
> obviously these overlaps are required within the pacBio read library, or
> at least that this solves the issue.
>
> Since I do not necessarily expect overlaps in our data but still want to
> correct them with Illumina short reads for further analysis, I would
> like to know, if this is even possible with your pipeline. And also I do
> not really understand, why there is an overlap computation step before
> the error correction step?
>
> Regards
> Thomas |
From: Thomas H. <tho...@un...> - 2012-07-23 14:43:39
|
Hello,

I am currently testing the pacBioToCA pipeline on some sample data in preparation for some upcoming experiments. The pipeline always stopped with the error: "No Overlaps found". I already figured out that obviously these overlaps are required within the pacBio read library, or at least that this solves the issue.

Since I do not necessarily expect overlaps in our data but still want to correct them with Illumina short reads for further analysis, I would like to know if this is even possible with your pipeline. I also do not really understand why there is an overlap computation step before the error correction step.

Regards
Thomas |
From: Ole K. T. <o.k...@bi...> - 2012-07-19 12:29:12
|
Hi Heiner,

I have another 454 8 kb library and an Illumina 5 kb library (plus a 3 kb 454 library). I'm really interested in hearing your workaround.

Ole

On 19 July 2012 13:31, kuhl <ku...@mo...> wrote:

> Hi Ole,
>
> what other kind of large insert libraries do you have? I might have a
> suitable workaround for such problems.
>
> Best wishes,
>
> Heiner
>
> On Wed, 18 Jul 2012 11:02:56 +0200, Ole Kristian Tørresen
> <o.k...@bi...> wrote:
>> Hi,
>> I have a bimodal mate pair library with one peak around 3kbp and the
>> largest and real peak at 19kbp. When I include this in the assembly,
>> the scaffolding takes ages and the result is not that good (larger than
>> biologically probable scaffolds). If I don't include it, scaffolding is
>> quite quick, but the assembly is a bit more fragmented than with it
>> included. Without it I get around 20000 scaffolds, and I get 7000
>> scaffolds with it included (around 7000 with Newbler too, which
>> handles this kind of library better than CA, I think). So I would
>> rather like to include the library.
>>
>> classifyMates does not seem to be able to find innie oriented mates,
>> only outtie PE mates as far as I can see. Could I get around this in a
>> way? Pretend that the library is outtie, and search for outtie mates
>> with separation up to 5kbp or something?
>>
>> Thank you.
>>
>> Ole |
From: kuhl <ku...@mo...> - 2012-07-19 11:51:19
|
Hi Ole,

what other kind of large insert libraries do you have? I might have a suitable workaround for such problems.

Best wishes,

Heiner

On Wed, 18 Jul 2012 11:02:56 +0200, Ole Kristian Tørresen <o.k...@bi...> wrote:

> Hi,
> I have a bimodal mate pair library with one peak around 3kbp and the
> largest and real peak at 19kbp. When I include this in the assembly,
> the scaffolding takes ages and the result is not that good (larger than
> biologically probable scaffolds). If I don't include it, scaffolding is
> quite quick, but the assembly is a bit more fragmented than with it
> included. Without it I get around 20000 scaffolds, and I get 7000
> scaffolds with it included (around 7000 with Newbler too, which
> handles this kind of library better than CA, I think). So I would
> rather like to include the library.
>
> classifyMates does not seem to be able to find innie oriented mates,
> only outtie PE mates as far as I can see. Could I get around this in a
> way? Pretend that the library is outtie, and search for outtie mates
> with separation up to 5kbp or something?
>
> Thank you.
>
> Ole |
From: Walenz, B. <bw...@jc...> - 2012-07-19 11:08:05
|
Hi, Ole-

We tried this on a plant genome for the exact same issue, with little success. The 3kb mates spanned repeats that classifyMates was unable to search through, resulting in enormous compute times.

If you want to try it, the latest in CVS, I think, allows innie mates again. Be sure to use -nosuspicious. If not, flipping the reads works too. Use the 'random path' search, -rfs. Instead of exhaustively searching, this will follow N random paths through the overlap graph.

b

On 7/18/12 5:02 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
> Hi,
> I have a bimodal mate pair library with one peak around 3kbp and the
> largest and real peak at 19kbp. When I include this in the assembly,
> the scaffolding take ages and the result is not that good (larger than
> biological probable scaffolds). If I don't include it, scaffold is
> quite quick, but the assembly is a bit more fragmented that with it
> included. Without I get around 20000 scaffolds, and I get 7000
> scaffolds with it included (around 7000 with Newbler too, which
> handles this kinds of library better than CA I think). So I would
> rather like to include the library.
>
> classifyMates does not seem to be able to find innie oriented mates,
> only outtie PE mates as far as I can see. Could I get around this in a
> way? Pretend that the library is outtie, and search for outtie mates
> with seperation up to 5kbp or something?
>
> Thank you.
>
> Ole
|
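[Editor's note] Brian's 'random path' search (-rfs) trades completeness for bounded run time: instead of exhaustively exploring the overlap graph between a read and its mate, it follows N random walks and gives up if none connects them. A minimal sketch of the idea, on a toy graph — the node names, graph layout, and function are invented for illustration and are not classifyMates' actual data structures:

```python
import random

# Toy overlap graph: read ID -> IDs of overlapping reads (illustrative only).
OVERLAPS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def random_path_search(graph, start, target, n_paths=200, max_steps=20, seed=42):
    """Follow n_paths random walks of at most max_steps edges from start;
    report whether any walk reached target. Work is bounded by
    n_paths * max_steps, unlike an exhaustive search of a repeat tangle."""
    rng = random.Random(seed)
    for _ in range(n_paths):
        node = start
        for _ in range(max_steps):
            if node == target:
                return True
            neighbours = graph.get(node, [])
            if not neighbours:
                break  # dead end; abandon this walk and start another
            node = rng.choice(neighbours)
        if node == target:
            return True
    return False
```

With a fixed seed the result is deterministic; with enough paths a genuinely connected mate pair is found with high probability, while a repeat-tangled pair simply costs the bounded walk budget instead of an exponential exhaustive search.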
From: Ole K. T. <o.k...@bi...> - 2012-07-18 09:03:06
|
Hi,

I have a bimodal mate pair library with one peak around 3kbp and the largest and real peak at 19kbp. When I include this in the assembly, the scaffolding takes ages and the result is not that good (larger than biologically probable scaffolds). If I don't include it, scaffolding is quite quick, but the assembly is a bit more fragmented than with it included. Without it I get around 20000 scaffolds, and 7000 scaffolds with it included (around 7000 with Newbler too, which handles this kind of library better than CA, I think). So I would rather like to include the library.

classifyMates does not seem to be able to find innie oriented mates, only outtie PE mates as far as I can see. Could I get around this in a way? Pretend that the library is outtie, and search for outtie mates with separation up to 5kbp or something?

Thank you.

Ole
|
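[Editor's note] A bimodal library like Ole's can be spotted before assembly by histogramming mapped insert sizes and looking for two peaks. A quick self-contained sketch — the bin width, function name, and data are arbitrary choices for illustration, not a CA tool:

```python
from collections import Counter

def insert_size_peaks(sizes, bin_width=1000):
    """Bin insert sizes and return the centers of bins that are local maxima.
    Two returned peaks (e.g. ~3 kbp and ~19 kbp) flag a bimodal library."""
    bins = Counter(size // bin_width for size in sizes)
    peaks = []
    for b in sorted(bins):
        # A bin is a peak if it beats its left neighbour and at least
        # ties its right neighbour (missing neighbours count as zero).
        if bins[b] > bins.get(b - 1, 0) and bins[b] >= bins.get(b + 1, 0):
            peaks.append(b * bin_width + bin_width // 2)
    return peaks
```

Running this on the insert sizes reported by a quick alignment of a read subsample would show whether the 3 kbp shoulder is large enough to be worth splitting out or suppressing.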
From: Walenz, B. <bw...@jc...> - 2012-07-17 17:53:45
|
Hi, Jason-

Can you send the spec and at least one of the error files from overlapper? Is this CA7 (the release) or the CVS version?

I suspect overlapper might be exhausting memory. The error file reports how much memory it is allocating for the large data structures. In CA7, Illumina reads must be loaded before long reads, otherwise memory usage is higher than it should be. In the CVS version, the assembler will check for this problem.

bri

On 7/17/12 12:37 PM, "Powers, Jason" <jp...@ex...> wrote:

Hi all, Trying to do a hybrid assembly with PacBio and Illum short reads. Essentially this is what I am trying to do: Correct PacBio with Illumina. Assembly Corrected PacBio reads with about 5X of Illumina reads. The reason I want to add in the paired end Illumina reads is that at the end of the assembly, I would like to use amosvalidate to evaluate assembly-correctness. While you can do amosvalidate without paired end reads, it is more powerful if you can incorporate that data, so I thought by sprinkling in a low amount of Illumina reads into the assembly, I could take advantage of this. Unfortunately assembly using both consistently fails during the overlap phase. Sometimes it gets to 1-overlapper, sometimes it fails on 0-overlaptrim-overlap. But it just doesn’t want to finish. I’ve encountered overlap failures before, and found that some nodes in the cluster I am using that seem to have problems with the installation/dependencies. However tracking the nodes here, it seems pretty scattershot. Any thoughts on what might be happening, or alternatively, how I can assemble the pacbio reads and add in Illumina reads post-assembly for use with amosvalidate? Thanks, Jason
|
From: Powers, J. <jp...@ex...> - 2012-07-17 16:49:34
|
Hi all,

Trying to do a hybrid assembly with PacBio and Illumina short reads. Essentially this is what I am trying to do: correct PacBio with Illumina, then assemble the corrected PacBio reads with about 5X of Illumina reads.

The reason I want to add in the paired end Illumina reads is that at the end of the assembly, I would like to use amosvalidate to evaluate assembly correctness. While you can do amosvalidate without paired end reads, it is more powerful if you can incorporate that data, so I thought by sprinkling a low amount of Illumina reads into the assembly, I could take advantage of this.

Unfortunately, assembly using both consistently fails during the overlap phase. Sometimes it gets to 1-overlapper, sometimes it fails on 0-overlaptrim-overlap. But it just doesn't want to finish. I've encountered overlap failures before, and found that some nodes in the cluster I am using seem to have problems with the installation/dependencies. However, tracking the nodes here, it seems pretty scattershot.

Any thoughts on what might be happening, or alternatively, how I can assemble the PacBio reads and add in Illumina reads post-assembly for use with amosvalidate?

Thanks,
Jason
|
From: Christoph H. <chr...@gm...> - 2012-07-13 20:54:20
|
Thanks! Had not seen this one. I have resumed it now on a 30g node. Do you think that will be sufficient? Sorry to always ask these questions about memory. I understand that it depends on the data/assembly, but I do not have any idea, so maybe you can give me a hint from your experience.

Along the same lines: is there any educated guess of how long buildUnitigs may run? Sorry again for this kind of question. These things are always an issue for me because I have to specify job runtime and memory usage in advance, and according to memory usage I am charged CPU hours, of which I only have a limited amount. Very annoying.

The unitigger.err file says the following now:

Bubble popping = on
Intersection breaking = on
Bad mate threshold = -7
Error threshold = 0.030 (3.000%)
Error limit = 2.500 errors
sizeof(ufPath) = 24
FragmentInfo()-- Loading fragment information for 224632508 fragments and 3 libraries from cache '/projects/nn9201k/Celera/work2/salaris1/4-unitigger/salaris.fragmentInfo'
setLogFile()-- Now logging to '/projects/nn9201k/Celera/work2/salaris1/4-unitigger/salaris.001.bestoverlapgraph-containments.log'

Once again, I understand that it is impossible to estimate these parameters accurately. Just a rough guess would help already. Thanks!!

cheers,
Christoph

On 07/13/2012 09:55 PM, Walenz, Brian wrote: > If the third step finished with no errors, then you can delete all the > buckets, and all the ####.idx and ####.ovs files. Those would have been > automagically removed had you enabled either of the delete options. If you > have the 'ovs' file, it also likely finished successfully. > > Build unitigs seems to have made it up to loading overlaps. It still has to > allocate space for unitigs. It will grow, but I don't know by how much. > Safe guess is by more than 1gb -- so yes, move it somewhere with more > memory. I'm not sure how hard it will be to get a good guess on total > memory usage so we could abort the run early. It's on the todo list now. 
> > b > > > > On 7/13/12 2:35 PM, "Christoph Hahn" <chr...@gm...> wrote: > >> Hi Brian, >> >> No, I did not enable any of these delete functions, so I will delete the >> bucket directories manually now. I do have ####.idx and ####.ovs files >> (for the first 100 of 418 - 2-sort.sh ran 100 jobs. Is that a problem? >> Yes, I think the bucket???? directories make up most of the difference >> in disk space. >> >> Concerning the buildUnitigs, I was just wondering because it is now >> running with constantly 15g on a 16g machine. Its running for almost 2 >> hours now and has just created the following files at the beginning. >> They are unchanged so far. >> >> -rw-r--r-- 1 chrishah users 2.6G Jul 13 18:48 salaris.fragmentInfo >> -rw-r--r-- 1 chrishah users 0 Jul 13 18:48 >> salaris.001.bestoverlapgraph-containments.log >> -rw-r--r-- 1 chrishah users 2.4K Jul 13 18:48 unitigger.err >> >> Is there any increase of memory usage to be expected? If yes, I would be >> inclined to stop it now and start it over again on a bigger machine >> right away. >> >> Thanks for your help! I appreciate it! >> >> cheers, >> Christoph >> >> On 13.07.2012 20:20, Walenz, Brian wrote: >>> Hi, Christoph- >>> >>> Good to hear! You're the third person (I know of) to run the parallel >>> version. Instead of fixing the older store build, I'd rather spend time to >>> integrate the new one with runCA, either as a set of jobs for SGE, or a >>> series of sequential jobs. It's just scripting, but there might be some >>> performance issues to optimize. >>> >>> If the store is complete, the bucket directories can be deleted. The third >>> step should have done this for you. Maybe not if you didn't enable >>> deleteearly or deletelate. The store is complete if you have just the #### >>> files, an 'idx' and an 'ovs' file. You should not have any ####.idx or >>> ####.ovs files. Is the extra space in the bucket??? directories? The >>> difference (546 - 320 = 226) seems to be a reasonable size for the buckets. 
>>> >>> Memory for buildUnitgs (aka bog) cannot be specified. There isn't any data >>> we can keep on disk, or not load, or compute differently in a smaller memory >>> size. Memory is used to store fragment meta data (clear lengths, mate >>> pairs) and best overlaps, and constructed unitigs. The first two are of a >>> known size. The number of unitigs depends on the assembly. We've seen an >>> assembly that exhausted memory in bog, caused by junk fragments creating an >>> enormous number of single-fragment unitigs. >>> >>> b >>> >>> >>> >>> On 7/13/12 1:53 PM, "Christoph Hahn"<chr...@gm...> wrote: >>> >>>> Hi Brian, >>>> >>>> It s done! I have by now also updated the overlapStore with the frg- and >>>> ovlcorr and I am in the process of building unitigs now. >>>> >>>> I like this parallel version for building the ovlStore. You were right >>>> the last jobs needed double the memory. When distributing the jobs to >>>> several CPUs it is very time efficient and also used fewer overall >>>> CPUhours in comparison to the regular overlapStore command. One thing >>>> though is that I think it needs substantially more disk space. I am not >>>> 100% sure (because its gone now..), but I believe the *.ovlStore build >>>> by the regular command used some 320G of disk space, while the one I >>>> have now is using 546G. Are all the bucket???? directories in *.ovlStore >>>> still needed? >>>> >>>> Overall I think I learned a lot about CA by running the latest steps >>>> again with the parallel version of ovlStore build and your help. Are >>>> there plans to include a failsafe for the overlapStore update function, >>>> until the process is finished? So that it can be resumed in case it >>>> stops for whatever reason. >>>> >>>> One more thing: Is there a way to specify the memory buildUnitigs is >>>> using? >>>> >>>> Thanks again for your help!! >>>> >>>> cheers, >>>> Christoph >>>> >>>> >>>> On 12.07.2012 18:52, Walenz, Brian wrote: >>>>> You've captured the process nicely. 
>>>>> >>>>> After #1 finishes, check that you have one 'sliceSizes' file per bucket >>>>> directory. If any are missing, run that bucket again. I think (hope) that >>>>> #2 will complain if any are missing, but this has been a problem in the >>>>> past. >>>>> >>>>> Hopefully memory won't be an issue during sorting. I estimate memory size >>>>> as >>>>> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads >>>>> (454+, >>>>> Sanger), the balancing is screwed up and the early jobs (overlaps of >>>>> Illumina >>>>> to Illumina) have fewer overlaps than the later jobs (Illumina to long >>>>> reads). Every time I've run this, I could do 90-95% of the sort jobs on >>>>> our >>>>> grid, but had to use large memory machines for the rest. >>>>> >>>>> More jobs creates more files, but I don't think it is necessarily slower. >>>>> I >>>>> haven't benchmarked it though. >>>>> >>>>> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >>>>> usually run this interactively off grid. >>>>> >>>>> b >>>>> >>>>> ________________________________________ >>>>> From: Christoph Hahn [chr...@gm...] >>>>> Sent: Thursday, July 12, 2012 9:31 AM >>>>> To: Walenz, Brian >>>>> Cc: wgs...@li... >>>>> Subject: Re: [wgs-assembler-users] runCA stopped while updating >>>>> overlapStore >>>>> - how to resume??? >>>>> >>>>> Hi Brian, >>>>> >>>>> I ran the runCA-overlapStoreBuild.pl script now. It created the three >>>>> scripts: >>>>> 1-bucketize.sh >>>>> 2-sort.sh >>>>> 3-index.sh >>>>> >>>>> right now I am running 1-bucketize.sh for every job index from 1 to >>>>> 2135. I have distributed the jobs on several CPUs and that works nicely. >>>>> >>>>> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >>>>> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >>>>> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >>>>> the jobID in this case is actually the slicenumber, right? so, for e.g. 
>>>>> 2-sort.sh 2 it will look through all bucket directories and pull out >>>>> slice002.gz, read them into memory and write the overlaps into the store. >>>>> >>>>> When this is done I just need to run 3-index.sh once. No jobIDs >>>>> required, right? >>>>> >>>>> Am I missing anything? >>>>> >>>>> cheers, >>>>> Christoph >>>>> >>>>> >>>>> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>>>>> The first step will create 1 job for each overlapper job. These should be >>>>>> small memory, but there is some internal buffering done and I usually >>>>>> request 2gb for them anyway. >>>>>> >>>>>> The second step will create '-jobs j' jobs. Memory size here is a giant >>>>>> unknown. The '-memory m' option will cause the job to not run if it needs >>>>>> more than that much memory. Currently, you'll have to increase -memory >>>>>> for >>>>>> these jobs and find a bigger machine. >>>>>> >>>>>> All jobs in both steps are single-threaded and run independently of each >>>>>> other. >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>> >>>>>>> Hi Brian, >>>>>>> >>>>>>> Thanks! overlaps are being computed now and CVS version of CA has been >>>>>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>>>>> overlapper is finished. One question there: I understand that the memory >>>>>>> usage is regulated by the -jobs j parameter. higher value for j means >>>>>>> less memory for every job. How can I specify the number of CPUs to be >>>>>>> used in the parallel steps? >>>>>>> >>>>>>> Thanks for your help! I appreciate it! >>>>>>> >>>>>>> cheers, >>>>>>> Christoph >>>>>>> >>>>>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>>>>> complete, then continuing on to frgcorr. 
runCA tests for the existence >>>>>>>> of >>>>>>>> name.ovlStore to determine if overlaps are finished; it doesn't check >>>>>>>> that >>>>>>>> the store is valid. So, delete *ovlStore* too. >>>>>>>> >>>>>>>> Your latest build (from scratch) is suffering from a long standing >>>>>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>>>>> >>>>>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>>>>> `classifyMates.o'. Stop. >>>>>>>> make[1]: *** Waiting for unfinished jobs.... >>>>>>>> make: *** [objs] Error 1 >>>>>>>> >>>>>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>>>>> >>>>>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>>>>> need to grab it from subversion. >>>>>>>> >>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Chec >>>>>>>> k_ >>>>>>>> ou >>>>>>>> t_and_Compile >>>>>>>> >>>>>>>> b >>>>>>>> >>>>>>>> >>>>>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>>>>>> least(it took some 10000 CPUhours)?? >>>>>>>>> >>>>>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>>>>> AS_RUN-makelog). >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Christoph >>>>>>>>> >>>>>>>>> >>>>>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>>>>> built. >>>>>>>>>> runCA might have been confused by the attempt to rerun. The easiest >>>>>>>>>> will >>>>>>>>>> be >>>>>>>>>> to recompute. 
:-( >>>>>>>>>> >>>>>>>>>> I've never seen the 'libCA.a' error before. That particular program >>>>>>>>>> is >>>>>>>>>> the >>>>>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for >>>>>>>>>> most >>>>>>>>>> strange compile errors is to remove the entire Linux-amd64 directory >>>>>>>>>> and >>>>>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>>>>> I'll >>>>>>>>>> take a look. >>>>>>>>>> >>>>>>>>>> b >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Brian, >>>>>>>>>>> >>>>>>>>>>> Thanks for your reply! >>>>>>>>>>> >>>>>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>>>>> have >>>>>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>>>>> understanding that correctly? >>>>>>>>>>> >>>>>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. >>>>>>>>>>> Stop. >>>>>>>>>>> >>>>>>>>>>> I really appreciate your help! >>>>>>>>>>> >>>>>>>>>>> cheers, >>>>>>>>>>> Christoph >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>>>>> Hi, Christoph- >>>>>>>>>>>> >>>>>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>>>>> can >>>>>>>>>>>> be >>>>>>>>>>>> done, but it will take code changes that are probably specific to >>>>>>>>>>>> the >>>>>>>>>>>> case >>>>>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from >>>>>>>>>>>> overlapper >>>>>>>>>>>> will >>>>>>>>>>>> I suggest this. >>>>>>>>>>>> >>>>>>>>>>>> Option 1 is then to restart. 
>>>>>>>>>>>> >>>>>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>>>>> grid >>>>>>>>>>>> jobs. The first job is parallel, and transfers the overlapper >>>>>>>>>>>> output >>>>>>>>>>>> into >>>>>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>>>>> bucket. >>>>>>>>>>>> The >>>>>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>>>>> compute >>>>>>>>>>>> is >>>>>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>>>>> >>>>>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>>>>> estimated >>>>>>>>>>>> would take 2 days using the original sequential build, finish in a >>>>>>>>>>>> few >>>>>>>>>>>> (4?) >>>>>>>>>>>> hours with the data parallel version. But on our development >>>>>>>>>>>> cluster, >>>>>>>>>>>> it >>>>>>>>>>>> is >>>>>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>>>>> throughput. >>>>>>>>>>>> Our >>>>>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production >>>>>>>>>>>> side >>>>>>>>>>>> has >>>>>>>>>>>> a >>>>>>>>>>>> big Isilon. >>>>>>>>>>>> >>>>>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>>>>> documentation, so do an update first. >>>>>>>>>>>> >>>>>>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>>>>>> accept >>>>>>>>>>>> better documentation. >>>>>>>>>>>> >>>>>>>>>>>> b >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hei Ole, >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you >>>>>>>>>>>>> are >>>>>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>>>>> using! 
>>>>>>>>>>>>> Will definitely consider that to make the assembly more effective >>>>>>>>>>>>> in >>>>>>>>>>>>> a >>>>>>>>>>>>> next try. Thanks for that! >>>>>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>>>>> correction >>>>>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can >>>>>>>>>>>>> see >>>>>>>>>>>>> the >>>>>>>>>>>>> next step is already building the unitigs, so I ll try to finish >>>>>>>>>>>>> this >>>>>>>>>>>>> assembly as it is now. Will try to improve it afterwards. I am >>>>>>>>>>>>> really >>>>>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) >>>>>>>>>>>>> will >>>>>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), >>>>>>>>>>>>> I >>>>>>>>>>>>> think. >>>>>>>>>>>>> >>>>>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>>>>> now >>>>>>>>>>>>> will not work. For the next step (the unitigger) I ll need an >>>>>>>>>>>>> intact >>>>>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>>>>> not >>>>>>>>>>>>> working as I thought it would. >>>>>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what >>>>>>>>>>>>> to >>>>>>>>>>>>> do!! >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>>> >>>>>>>>>>>>> much obliged, >>>>>>>>>>>>> Christoph >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>>>> Hi Christoph. >>>>>>>>>>>>> >>>>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>>>> work-around. 
If I remember correctly, you have both Illumina and >>>>>>>>>>>>> 454 >>>>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>>>> based >>>>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina >>>>>>>>>>>>> reads >>>>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>>>> I >>>>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. >>>>>>>>>>>>> Then >>>>>>>>>>>>> you can use the set of trusted k-mers to correct different >>>>>>>>>>>>> datasets. >>>>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>>>> trimming) >>>>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do >>>>>>>>>>>>> this >>>>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>>>> deduplication on all the datasets too, then you'll end up with >>>>>>>>>>>>> reads >>>>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>>>> >>>>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>>>> you're >>>>>>>>>>>>> using anyway. >>>>>>>>>>>>> >>>>>>>>>>>>> If you have a set of corrected reads, you can use these settings >>>>>>>>>>>>> for >>>>>>>>>>>>> CA: >>>>>>>>>>>>> doOBT=0 >>>>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>>>> >>>>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>>>> steps >>>>>>>>>>>>> too. Apply with caution. 
>>>>>>>>>>>>> >>>>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>>>> > http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title>>>>>>>>>>>> > = >>>>>>>>>>>>> Pre >>>>>>>>>>>>> pr >>>>>>>>>>>>> oc >>>>>>>>>>>>> es >>>>>>>>>>>>> sing >>>>>>>>>>>>> and discussion with Brian. >>>>>>>>>>>>> >>>>>>>>>>>>> Ole >>>>>>>>>>>>> >>>>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>>> >>>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>>> completed >>>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>>> stopped >>>>>>>>>>>>> in >>>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>>> incorrectly >>>>>>>>>>>>> set >>>>>>>>>>>>> time limit.. >>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>>> error: >>>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>>> Sto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> pd >>>>>>>>>>>>> ate-erates.err >>>>>>>>>>>>> 2>&1 >>>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> (1 >>>>>>>>>>>>> seconds) >>>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>>> =================================================================== >>>>>>>>>>>>> === >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> ==== >>>>>>>>>>>>> >>>>>>>>>>>>> runCA failed. 
>>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Stack trace: >>>>>>>>>>>>> >>>>>>>>>>>>> at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 1237 >>>>>>>>>>>>> main::caFailure('failed to apply the overlap >>>>>>>>>>>>> corrections', >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>>>> called >>>>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 5880 >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overla >>>>>>>>>>>>> pSt >>>>>>>>>>>>> or >>>>>>>>>>>>> e- >>>>>>>>>>>>> up >>>>>>>>>>>>> date-erates.err): >>>>>>>>>>>>> >>>>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' >>>>>>>>>>>>> for >>>>>>>>>>>>> reading: No such file or directory >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Failure message: >>>>>>>>>>>>> >>>>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>>>> reason >>>>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>>>> already >>>>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact >>>>>>>>>>>>> it >>>>>>>>>>>>> seems >>>>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). 
Is >>>>>>>>>>>>> there >>>>>>>>>>>>> a >>>>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, >>>>>>>>>>>>> instead >>>>>>>>>>>>> of >>>>>>>>>>>>> from >>>>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>>>> Another solution I was thinking of is to run the previous >>>>>>>>>>>>> overlapStore >>>>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>>>> frgcorr >>>>>>>>>>>>> and ovlcorr: >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -c >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING >>>>>>>>>>>>> -g >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>>>> 14000 >>>>>>>>>>>>> -L >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) >>>>>>>>>>>>> to >>>>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, >>>>>>>>>>>>> before >>>>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>>>> ovlcorr >>>>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>>>> >>>>>>>>>>>>> I would really appreciate any comments or suggestions to my >>>>>>>>>>>>> problem! >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> in advance for your help! 
>>>>>>>>>>>>> much obliged, >>>>>>>>>>>>> Christoph >>>>>>>>>>>>> >>>>>>>>>>>>> University of Oslo >>>>> |
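[Editor's note] The three grid stages of runCA-overlapStoreBuild.pl discussed above (parallel bucketize, parallel per-slice sort, sequential index) amount to a distributed bucket sort. A toy model of the control flow, including Brian's rule-of-thumb memory estimate for the sort jobs — the in-memory data layout here is invented for illustration and is not CA's binary .ovb.gz format:

```python
def bucketize(overlap_files, n_slices):
    # Stage 1 (one job per overlapper output): scatter each overlap
    # into a slice chosen by its A-read ID.
    slices = [[] for _ in range(n_slices)]
    for ovl_file in overlap_files:
        for a_id, b_id in ovl_file:
            slices[a_id % n_slices].append((a_id, b_id))
    return slices

def sort_slices(slices):
    # Stage 2 (one job per slice): each slice is sorted independently.
    return [sorted(s) for s in slices]

def index_slices(slices):
    # Stage 3 (sequential, cheap): record where each slice starts
    # in the final concatenated store.
    offsets, pos = [], 0
    for s in slices:
        offsets.append(pos)
        pos += len(s)
    return offsets

def sort_job_memory_gb(total_gz_bytes, n_jobs):
    # Brian's estimate: 3 * (size of the .ovb.gz files) / number of sort jobs.
    return 3 * total_gz_bytes / n_jobs / 1e9
```

Because every stage is a bag of independent jobs, a failed bucket or slice can be rerun in isolation, which is exactly why this build resumes easily where the sequential overlapStore -c build does not.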
From: Walenz, B. <bw...@jc...> - 2012-07-13 19:56:33
|
If the third step finished with no errors, then you can delete all the buckets, and all the ####.idx and ####.ovs files. Those would have been automagically removed had you enabled either of the delete options. If you have the 'ovs' file, it also likely finished successfully. Build unitigs seems to have made it up to loading overlaps. It still has to allocate space for unitigs. It will grow, but I don't know by how much. Safe guess is by more than 1gb -- so yes, move it somewhere with more memory. I'm not sure how hard it will be to get a good guess on total memory usage so we could abort the run early. It's on the todo list now. b On 7/13/12 2:35 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > No, I did not enable any of these delete functions, so I will delete the > bucket directories manually now. I do have ####.idx and ####.ovs files > (for the first 100 of 418 - 2-sort.sh ran 100 jobs. Is that a problem? > Yes, I think the bucket???? directories make up most of the difference > in disk space. > > Concerning the buildUnitigs, I was just wondering because it is now > running with constantly 15g on a 16g machine. Its running for almost 2 > hours now and has just created the following files at the beginning. > They are unchanged so far. > > -rw-r--r-- 1 chrishah users 2.6G Jul 13 18:48 salaris.fragmentInfo > -rw-r--r-- 1 chrishah users 0 Jul 13 18:48 > salaris.001.bestoverlapgraph-containments.log > -rw-r--r-- 1 chrishah users 2.4K Jul 13 18:48 unitigger.err > > Is there any increase of memory usage to be expected? If yes, I would be > inclined to stop it now and start it over again on a bigger machine > right away. > > Thanks for your help! I appreciate it! > > cheers, > Christoph > > On 13.07.2012 20:20, Walenz, Brian wrote: >> Hi, Christoph- >> >> Good to hear! You're the third person (I know of) to run the parallel >> version. 
Instead of fixing the older store build, I'd rather spend time to >> integrate the new one with runCA, either as a set of jobs for SGE, or a >> series of sequential jobs. It's just scripting, but there might be some >> performance issues to optimize. >> >> If the store is complete, the bucket directories can be deleted. The third >> step should have done this for you. Maybe not if you didn't enable >> deleteearly or deletelate. The store is complete if you have just the #### >> files, an 'idx' and an 'ovs' file. You should not have any ####.idx or >> ####.ovs files. Is the extra space in the bucket??? directories? The >> difference (546 - 320 = 226) seems to be a reasonable size for the buckets. >> >> Memory for buildUnitgs (aka bog) cannot be specified. There isn't any data >> we can keep on disk, or not load, or compute differently in a smaller memory >> size. Memory is used to store fragment meta data (clear lengths, mate >> pairs) and best overlaps, and constructed unitigs. The first two are of a >> known size. The number of unitigs depends on the assembly. We've seen an >> assembly that exhausted memory in bog, caused by junk fragments creating an >> enormous number of single-fragment unitigs. >> >> b >> >> >> >> On 7/13/12 1:53 PM, "Christoph Hahn"<chr...@gm...> wrote: >> >>> Hi Brian, >>> >>> It s done! I have by now also updated the overlapStore with the frg- and >>> ovlcorr and I am in the process of building unitigs now. >>> >>> I like this parallel version for building the ovlStore. You were right >>> the last jobs needed double the memory. When distributing the jobs to >>> several CPUs it is very time efficient and also used fewer overall >>> CPUhours in comparison to the regular overlapStore command. One thing >>> though is that I think it needs substantially more disk space. 
I am not >>> 100% sure (because its gone now..), but I believe the *.ovlStore build >>> by the regular command used some 320G of disk space, while the one I >>> have now is using 546G. Are all the bucket???? directories in *.ovlStore >>> still needed? >>> >>> Overall I think I learned a lot about CA by running the latest steps >>> again with the parallel version of ovlStore build and your help. Are >>> there plans to include a failsafe for the overlapStore update function, >>> until the process is finished? So that it can be resumed in case it >>> stops for whatever reason. >>> >>> One more thing: Is there a way to specify the memory buildUnitigs is >>> using? >>> >>> Thanks again for your help!! >>> >>> cheers, >>> Christoph >>> >>> >>> On 12.07.2012 18:52, Walenz, Brian wrote: >>>> You've captured the process nicely. >>>> >>>> After #1 finishes, check that you have one 'sliceSizes' file per bucket >>>> directory. If any are missing, run that bucket again. I think (hope) that >>>> #2 will complain if any are missing, but this has been a problem in the >>>> past. >>>> >>>> Hopefully memory won't be an issue during sorting. I estimate memory size >>>> as >>>> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads >>>> (454+, >>>> Sanger), the balancing is screwed up and the early jobs (overlaps of >>>> Illumina >>>> to Illumina) have fewer overlaps than the later jobs (Illumina to long >>>> reads). Every time I've run this, I could do 90-95% of the sort jobs on >>>> our >>>> grid, but had to use large memory machines for the rest. >>>> >>>> More jobs creates more files, but I don't think it is necessarily slower. >>>> I >>>> haven't benchmarked it though. >>>> >>>> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >>>> usually run this interactively off grid. >>>> >>>> b >>>> >>>> ________________________________________ >>>> From: Christoph Hahn [chr...@gm...] 
>>>> Sent: Thursday, July 12, 2012 9:31 AM >>>> To: Walenz, Brian >>>> Cc: wgs...@li... >>>> Subject: Re: [wgs-assembler-users] runCA stopped while updating >>>> overlapStore >>>> - how to resume??? >>>> >>>> Hi Brian, >>>> >>>> I ran the runCA-overlapStoreBuild.pl script now. It created the three >>>> scripts: >>>> 1-bucketize.sh >>>> 2-sort.sh >>>> 3-index.sh >>>> >>>> right now I am running 1-bucketize.sh for every job index from 1 to >>>> 2135. I have distributed the jobs on several CPUs and that works nicely. >>>> >>>> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >>>> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >>>> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >>>> the jobID in this case is actually the slicenumber, right? so, for e.g. >>>> 2-sort.sh 2 it will look through all bucket directories and pull out >>>> slice002.gz, read them into memory and write the overlaps into the store. >>>> >>>> When this is done I just need to run 3-index.sh once. No jobIDs >>>> required, right? >>>> >>>> Am I missing anything? >>>> >>>> cheers, >>>> Christoph >>>> >>>> >>>> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>>>> The first step will create 1 job for each overlapper job. These should be >>>>> small memory, but there is some internal buffering done and I usually >>>>> request 2gb for them anyway. >>>>> >>>>> The second step will create '-jobs j' jobs. Memory size here is a giant >>>>> unknown. The '-memory m' option will cause the job to not run if it needs >>>>> more than that much memory. Currently, you'll have to increase -memory >>>>> for >>>>> these jobs and find a bigger machine. >>>>> >>>>> All jobs in both steps are single-threaded and run independently of each >>>>> other. >>>>> >>>>> b >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks! 
overlaps are being computed now and CVS version of CA has been >>>>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>>>> overlapper is finished. One question there: I understand that the memory >>>>>> usage is regulated by the -jobs j parameter. higher value for j means >>>>>> less memory for every job. How can I specify the number of CPUs to be >>>>>> used in the parallel steps? >>>>>> >>>>>> Thanks for your help! I appreciate it! >>>>>> >>>>>> cheers, >>>>>> Christoph >>>>>> >>>>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>>>> complete, then continuing on to frgcorr. runCA tests for the existence >>>>>>> of >>>>>>> name.ovlStore to determine if overlaps are finished; it doesn't check >>>>>>> that >>>>>>> the store is valid. So, delete *ovlStore* too. >>>>>>> >>>>>>> Your latest build (from scratch) is suffering from a long standing >>>>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>>>> >>>>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>>>> `classifyMates.o'. Stop. >>>>>>> make[1]: *** Waiting for unfinished jobs.... >>>>>>> make: *** [objs] Error 1 >>>>>>> >>>>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>>>> >>>>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>>>> need to grab it from subversion. >>>>>>> >>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Chec >>>>>>> k_ >>>>>>> ou >>>>>>> t_and_Compile >>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>>>> very start? 
Is there a way to avoid recomputing the initial overlaps at >>>>>>>> least(it took some 10000 CPUhours)?? >>>>>>>> >>>>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>>>> AS_RUN-makelog). >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>>>> built. >>>>>>>>> runCA might have been confused by the attempt to rerun. The easiest >>>>>>>>> will >>>>>>>>> be >>>>>>>>> to recompute. :-( >>>>>>>>> >>>>>>>>> I've never seen the 'libCA.a' error before. That particular program >>>>>>>>> is >>>>>>>>> the >>>>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for >>>>>>>>> most >>>>>>>>> strange compile errors is to remove the entire Linux-amd64 directory >>>>>>>>> and >>>>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>>>> I'll >>>>>>>>> take a look. >>>>>>>>> >>>>>>>>> b >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>> >>>>>>>>>> Hi Brian, >>>>>>>>>> >>>>>>>>>> Thanks for your reply! >>>>>>>>>> >>>>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>>>> have >>>>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>>>> understanding that correctly? >>>>>>>>>> >>>>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. >>>>>>>>>> Stop. >>>>>>>>>> >>>>>>>>>> I really appreciate your help! 
>>>>>>>>>> >>>>>>>>>> cheers, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>>>> Hi, Christoph- >>>>>>>>>>> >>>>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>>>> can >>>>>>>>>>> be >>>>>>>>>>> done, but it will take code changes that are probably specific to >>>>>>>>>>> the >>>>>>>>>>> case >>>>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from >>>>>>>>>>> overlapper >>>>>>>>>>> will >>>>>>>>>>> I suggest this. >>>>>>>>>>> >>>>>>>>>>> Option 1 is then to restart. >>>>>>>>>>> >>>>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>>>> grid >>>>>>>>>>> jobs. The first job is parallel, and transfers the overlapper >>>>>>>>>>> output >>>>>>>>>>> into >>>>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>>>> bucket. >>>>>>>>>>> The >>>>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>>>> compute >>>>>>>>>>> is >>>>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>>>> >>>>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>>>> estimated >>>>>>>>>>> would take 2 days using the original sequential build, finish in a >>>>>>>>>>> few >>>>>>>>>>> (4?) >>>>>>>>>>> hours with the data parallel version. But on our development >>>>>>>>>>> cluster, >>>>>>>>>>> it >>>>>>>>>>> is >>>>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>>>> throughput. >>>>>>>>>>> Our >>>>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production >>>>>>>>>>> side >>>>>>>>>>> has >>>>>>>>>>> a >>>>>>>>>>> big Isilon. >>>>>>>>>>> >>>>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>>>> documentation, so do an update first. >>>>>>>>>>> >>>>>>>>>>> Happy to provide help if you want to try it out. 
More than happy to >>>>>>>>>>> accept >>>>>>>>>>> better documentation. >>>>>>>>>>> >>>>>>>>>>> b >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hei Ole, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you >>>>>>>>>>>> are >>>>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>>>> using! >>>>>>>>>>>> Will definitely consider that to make the assembly more effective >>>>>>>>>>>> in >>>>>>>>>>>> a >>>>>>>>>>>> next try. Thanks for that! >>>>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>>>> correction >>>>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can >>>>>>>>>>>> see >>>>>>>>>>>> the >>>>>>>>>>>> next step is already building the unitigs, so I ll try to finish >>>>>>>>>>>> this >>>>>>>>>>>> assembly as it is now. Will try to improve it afterwards. I am >>>>>>>>>>>> really >>>>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) >>>>>>>>>>>> will >>>>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), >>>>>>>>>>>> I >>>>>>>>>>>> think. >>>>>>>>>>>> >>>>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>>>> now >>>>>>>>>>>> will not work. For the next step (the unitigger) I ll need an >>>>>>>>>>>> intact >>>>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>>>> not >>>>>>>>>>>> working as I thought it would. >>>>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what >>>>>>>>>>>> to >>>>>>>>>>>> do!! 
>>>>>>>>>>>> >>>>>>>>>>>> Thanks for your help! >>>>>>>>>>>> >>>>>>>>>>>> much obliged, >>>>>>>>>>>> Christoph >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>>> Hi Christoph. >>>>>>>>>>>> >>>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>>> work-around. If I remember correctly, you have both Illumina and >>>>>>>>>>>> 454 >>>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>>> based >>>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina >>>>>>>>>>>> reads >>>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>>> I >>>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. >>>>>>>>>>>> Then >>>>>>>>>>>> you can use the set of trusted k-mers to correct different >>>>>>>>>>>> datasets. >>>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>>> trimming) >>>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do >>>>>>>>>>>> this >>>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>>> deduplication on all the datasets too, then you'll end up with >>>>>>>>>>>> reads >>>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>>> >>>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>>> you're >>>>>>>>>>>> using anyway. >>>>>>>>>>>> >>>>>>>>>>>> If you have a set of corrected reads, you can use these settings >>>>>>>>>>>> for >>>>>>>>>>>> CA: >>>>>>>>>>>> doOBT=0 >>>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>>> >>>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>>> assembly now. 
You might have to clean up your directory tree, like >>>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>>> steps >>>>>>>>>>>> too. Apply with caution. >>>>>>>>>>>> >>>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title>>>>>>>>>>>> = >>>>>>>>>>>> Pre >>>>>>>>>>>> pr >>>>>>>>>>>> oc >>>>>>>>>>>> es >>>>>>>>>>>> sing >>>>>>>>>>>> and discussion with Brian. >>>>>>>>>>>> >>>>>>>>>>>> Ole >>>>>>>>>>>> >>>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>>> wrote: >>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>> >>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>> completed >>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>> stopped >>>>>>>>>>>> in >>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>> incorrectly >>>>>>>>>>>> set >>>>>>>>>>>> time limit.. 
>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>> error: >>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>> 2012 >>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>> Sto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> pd >>>>>>>>>>>> ate-erates.err >>>>>>>>>>>> 2>&1 >>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>> 2012 >>>>>>>>>>>> (1 >>>>>>>>>>>> seconds) >>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>> =================================================================== >>>>>>>>>>>> === >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> ==== >>>>>>>>>>>> >>>>>>>>>>>> runCA failed. 
>>>>>>>>>>>> >>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>> Stack trace: >>>>>>>>>>>> >>>>>>>>>>>> at >>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>> line >>>>>>>>>>>> 1237 >>>>>>>>>>>> main::caFailure('failed to apply the overlap >>>>>>>>>>>> corrections', >>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>>> called >>>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>> line >>>>>>>>>>>> 5880 >>>>>>>>>>>> >>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overla >>>>>>>>>>>> pSt >>>>>>>>>>>> or >>>>>>>>>>>> e- >>>>>>>>>>>> up >>>>>>>>>>>> date-erates.err): >>>>>>>>>>>> >>>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' >>>>>>>>>>>> for >>>>>>>>>>>> reading: No such file or directory >>>>>>>>>>>> >>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>> Failure message: >>>>>>>>>>>> >>>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>>> reason >>>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>>> already >>>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact >>>>>>>>>>>> it >>>>>>>>>>>> seems >>>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). 
Is >>>>>>>>>>>> there >>>>>>>>>>>> a >>>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, >>>>>>>>>>>> instead >>>>>>>>>>>> of >>>>>>>>>>>> from >>>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>>> Another solution I was thinking of is to run the previous >>>>>>>>>>>> overlapStore >>>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>>> frgcorr >>>>>>>>>>>> and ovlcorr: >>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>> re >>>>>>>>>>>> -c >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING >>>>>>>>>>>> -g >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>>> 14000 >>>>>>>>>>>> -L >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) >>>>>>>>>>>> to >>>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, >>>>>>>>>>>> before >>>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>>> ovlcorr >>>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>>> >>>>>>>>>>>> I would really appreciate any comments or suggestions to my >>>>>>>>>>>> problem! >>>>>>>>>>>> Thanks >>>>>>>>>>>> in advance for your help! >>>>>>>>>>>> >>>>>>>>>>>> much obliged, >>>>>>>>>>>> Christoph >>>>>>>>>>>> >>>>>>>>>>>> University of Oslo |
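The data-parallel overlap store build described in this thread amounts to two arrays of independent single-threaded jobs followed by one sequential indexing step. The sketch below only generates the per-index command lines for a driver; the script names (1-bucketize.sh, 2-sort.sh, 3-index.sh) are the ones runCA-overlapStoreBuild.pl writes according to the thread, while the default job counts (2135 bucketize jobs, 100 sort slices) are just the example numbers mentioned above, not universal values.

```python
# Sketch only: command lines for the three-stage parallel overlap store
# build discussed in this thread. Counts are the thread's example numbers.

def store_build_commands(n_buckets=2135, n_slices=100):
    """Return (bucketize, sort, index) command lists.

    Stage 1 (one job per overlapper output) and stage 2 (one job per
    slice) are independent jobs that can be distributed across CPUs or a
    grid; stage 3 is a single small sequential indexing step.
    """
    bucketize = ["sh 1-bucketize.sh %d" % i for i in range(1, n_buckets + 1)]
    sort_jobs = ["sh 2-sort.sh %d" % i for i in range(1, n_slices + 1)]
    index     = ["sh 3-index.sh"]
    return bucketize, sort_jobs, index
```

Each 2-sort.sh job j reads slice j from every bucket directory, so the sort jobs only interact through the finished buckets; that independence is what makes each stage easy to restart or resume after a failure.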
From: Christoph H. <chr...@gm...> - 2012-07-13 18:35:43
|
Hi Brian, No, I did not enable any of these delete functions, so I will delete the bucket directories manually now. I do have ####.idx and ####.ovs files (for the first 100 of 418; 2-sort.sh ran 100 jobs). Is that a problem? Yes, I think the bucket???? directories make up most of the difference in disk space. Concerning the buildUnitigs, I was just wondering because it is now running at a constant 15g on a 16g machine. It's been running for almost 2 hours now and has just created the following files at the beginning. They are unchanged so far. -rw-r--r-- 1 chrishah users 2.6G Jul 13 18:48 salaris.fragmentInfo -rw-r--r-- 1 chrishah users 0 Jul 13 18:48 salaris.001.bestoverlapgraph-containments.log -rw-r--r-- 1 chrishah users 2.4K Jul 13 18:48 unitigger.err Is any further increase in memory usage to be expected? If yes, I would be inclined to stop it now and start it over again on a bigger machine right away. Thanks for your help! I appreciate it! cheers, Christoph On 13.07.2012 20:20, Walenz, Brian wrote: > Hi, Christoph- > > Good to hear! You're the third person (I know of) to run the parallel > version. Instead of fixing the older store build, I'd rather spend time to > integrate the new one with runCA, either as a set of jobs for SGE, or a > series of sequential jobs. It's just scripting, but there might be some > performance issues to optimize. > > If the store is complete, the bucket directories can be deleted. The third > step should have done this for you. Maybe not if you didn't enable > deleteearly or deletelate. The store is complete if you have just the #### > files, an 'idx' and an 'ovs' file. You should not have any ####.idx or > ####.ovs files. Is the extra space in the bucket??? directories? The > difference (546 - 320 = 226) seems to be a reasonable size for the buckets. > > Memory for buildUnitgs (aka bog) cannot be specified. There isn't any data > we can keep on disk, or not load, or compute differently in a smaller memory > size. 
Memory is used to store fragment meta data (clear lengths, mate > pairs) and best overlaps, and constructed unitigs. The first two are of a > known size. The number of unitigs depends on the assembly. We've seen an > assembly that exhausted memory in bog, caused by junk fragments creating an > enormous number of single-fragment unitigs. > > b > > > > On 7/13/12 1:53 PM, "Christoph Hahn"<chr...@gm...> wrote: > >> Hi Brian, >> >> It s done! I have by now also updated the overlapStore with the frg- and >> ovlcorr and I am in the process of building unitigs now. >> >> I like this parallel version for building the ovlStore. You were right >> the last jobs needed double the memory. When distributing the jobs to >> several CPUs it is very time efficient and also used fewer overall >> CPUhours in comparison to the regular overlapStore command. One thing >> though is that I think it needs substantially more disk space. I am not >> 100% sure (because its gone now..), but I believe the *.ovlStore build >> by the regular command used some 320G of disk space, while the one I >> have now is using 546G. Are all the bucket???? directories in *.ovlStore >> still needed? >> >> Overall I think I learned a lot about CA by running the latest steps >> again with the parallel version of ovlStore build and your help. Are >> there plans to include a failsafe for the overlapStore update function, >> until the process is finished? So that it can be resumed in case it >> stops for whatever reason. >> >> One more thing: Is there a way to specify the memory buildUnitigs is >> using? >> >> Thanks again for your help!! >> >> cheers, >> Christoph >> >> >> On 12.07.2012 18:52, Walenz, Brian wrote: >>> You've captured the process nicely. >>> >>> After #1 finishes, check that you have one 'sliceSizes' file per bucket >>> directory. If any are missing, run that bucket again. I think (hope) that >>> #2 will complain if any are missing, but this has been a problem in the past. 
>>> >>> Hopefully memory won't be an issue during sorting. I estimate memory size as >>> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, >>> Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina >>> to Illumina) have fewer overlaps than the later jobs (Illumina to long >>> reads). Every time I've run this, I could do 90-95% of the sort jobs on our >>> grid, but had to use large memory machines for the rest. >>> >>> More jobs creates more files, but I don't think it is necessarily slower. I >>> haven't benchmarked it though. >>> >>> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >>> usually run this interactively off grid. >>> >>> b >>> >>> ________________________________________ >>> From: Christoph Hahn [chr...@gm...] >>> Sent: Thursday, July 12, 2012 9:31 AM >>> To: Walenz, Brian >>> Cc: wgs...@li... >>> Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore >>> - how to resume??? >>> >>> Hi Brian, >>> >>> I ran the runCA-overlapStoreBuild.pl script now. It created the three >>> scripts: >>> 1-bucketize.sh >>> 2-sort.sh >>> 3-index.sh >>> >>> right now I am running 1-bucketize.sh for every job index from 1 to >>> 2135. I have distributed the jobs on several CPUs and that works nicely. >>> >>> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >>> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >>> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >>> the jobID in this case is actually the slicenumber, right? so, for e.g. >>> 2-sort.sh 2 it will look through all bucket directories and pull out >>> slice002.gz, read them into memory and write the overlaps into the store. >>> >>> When this is done I just need to run 3-index.sh once. No jobIDs >>> required, right? >>> >>> Am I missing anything? 
>>> >>> cheers, >>> Christoph >>> >>> >>> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>>> The first step will create 1 job for each overlapper job. These should be >>>> small memory, but there is some internal buffering done and I usually >>>> request 2gb for them anyway. >>>> >>>> The second step will create '-jobs j' jobs. Memory size here is a giant >>>> unknown. The '-memory m' option will cause the job to not run if it needs >>>> more than that much memory. Currently, you'll have to increase -memory for >>>> these jobs and find a bigger machine. >>>> >>>> All jobs in both steps are single-threaded and run independently of each >>>> other. >>>> >>>> b >>>> >>>> >>>> >>>> >>>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>> >>>>> Hi Brian, >>>>> >>>>> Thanks! overlaps are being computed now and CVS version of CA has been >>>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>>> overlapper is finished. One question there: I understand that the memory >>>>> usage is regulated by the -jobs j parameter. higher value for j means >>>>> less memory for every job. How can I specify the number of CPUs to be >>>>> used in the parallel steps? >>>>> >>>>> Thanks for your help! I appreciate it! >>>>> >>>>> cheers, >>>>> Christoph >>>>> >>>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>>> complete, then continuing on to frgcorr. runCA tests for the existence of >>>>>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>>>>> the store is valid. So, delete *ovlStore* too. >>>>>> >>>>>> Your latest build (from scratch) is suffering from a long standing >>>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>>> >>>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>>> `classifyMates.o'. Stop. >>>>>> make[1]: *** Waiting for unfinished jobs.... 
>>>>>> make: *** [objs] Error 1 >>>>>> >>>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>>> >>>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>>> need to grab it from subversion. >>>>>> >>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ >>>>>> ou >>>>>> t_and_Compile >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>>>> least(it took some 10000 CPUhours)?? >>>>>>> >>>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>>> AS_RUN-makelog). >>>>>>> >>>>>>> Thanks, >>>>>>> Christoph >>>>>>> >>>>>>> >>>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>>> built. >>>>>>>> runCA might have been confused by the attempt to rerun. The easiest >>>>>>>> will >>>>>>>> be >>>>>>>> to recompute. :-( >>>>>>>> >>>>>>>> I've never seen the 'libCA.a' error before. That particular program is >>>>>>>> the >>>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>>> I'll >>>>>>>> take a look. >>>>>>>> >>>>>>>> b >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>> >>>>>>>>> Hi Brian, >>>>>>>>> >>>>>>>>> Thanks for your reply! 
>>>>>>>>> >>>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>>> have >>>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>>> understanding that correctly? >>>>>>>>> >>>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>>>>> >>>>>>>>> I really appreciate your help! >>>>>>>>> >>>>>>>>> cheers, >>>>>>>>> Christoph >>>>>>>>> >>>>>>>>> >>>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>>> Hi, Christoph- >>>>>>>>>> >>>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>>> can >>>>>>>>>> be >>>>>>>>>> done, but it will take code changes that are probably specific to the >>>>>>>>>> case >>>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>>>>> will >>>>>>>>>> I suggest this. >>>>>>>>>> >>>>>>>>>> Option 1 is then to restart. >>>>>>>>>> >>>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>>> grid >>>>>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>>>>> into >>>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>>> bucket. >>>>>>>>>> The >>>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>>> compute >>>>>>>>>> is >>>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>>> >>>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>>> estimated >>>>>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>>>>> (4?) >>>>>>>>>> hours with the data parallel version. 
But on our development cluster, >>>>>>>>>> it >>>>>>>>>> is >>>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>>> throughput. >>>>>>>>>> Our >>>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side >>>>>>>>>> has >>>>>>>>>> a >>>>>>>>>> big Isilon. >>>>>>>>>> >>>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>>> documentation, so do an update first. >>>>>>>>>> >>>>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>>>> accept >>>>>>>>>> better documentation. >>>>>>>>>> >>>>>>>>>> b >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>>> >>>>>>>>>>> Hei Ole, >>>>>>>>>>> >>>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>>> using! >>>>>>>>>>> Will definitely consider that to make the assembly more effective in >>>>>>>>>>> a >>>>>>>>>>> next try. Thanks for that! >>>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>>> correction >>>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can see >>>>>>>>>>> the >>>>>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>>>>> think. >>>>>>>>>>> >>>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>>> now >>>>>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>>> half-updated.. 
I also discovered that just rerunning the previous >>>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>>> not >>>>>>>>>>> working as I thought it would. >>>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>>>>> do!! >>>>>>>>>>> >>>>>>>>>>> Thanks for your help! >>>>>>>>>>> >>>>>>>>>>> much obliged, >>>>>>>>>>> Christoph >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>>> Hi Christoph. >>>>>>>>>>>> >>>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>>> based >>>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>>> I >>>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>>> trimming) >>>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>>> >>>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>>> you're >>>>>>>>>>>> using anyway. 
>>>>>>>>>>>> >>>>>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>>>>> CA: >>>>>>>>>>>> doOBT=0 >>>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>>> >>>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>>> steps >>>>>>>>>>>> too. Apply with caution. >>>>>>>>>>>> >>>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title= >>>>>>>>>>>> Pre >>>>>>>>>>>> pr >>>>>>>>>>>> oc >>>>>>>>>>>> es >>>>>>>>>>>> sing >>>>>>>>>>>> and discussion with Brian. >>>>>>>>>>>> >>>>>>>>>>>> Ole >>>>>>>>>>>> >>>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>>> >>>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>>> completed >>>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>>> stopped >>>>>>>>>>>>> in >>>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>>> incorrectly >>>>>>>>>>>>> set >>>>>>>>>>>>> time limit.. 
>>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>>> error: >>>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>>> Sto >>>>>>>>>>>>> re >>>>>>>>>>>>> -u >>>>>>>>>>>>> pd >>>>>>>>>>>>> ate-erates.err >>>>>>>>>>>>> 2>&1 >>>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>>> 2012 >>>>>>>>>>>>> (1 >>>>>>>>>>>>> seconds) >>>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>>> =================================================================== >>>>>>>>>>>>> === >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> == >>>>>>>>>>>>> ==== >>>>>>>>>>>>> >>>>>>>>>>>>> runCA failed. 
>>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Stack trace: >>>>>>>>>>>>> >>>>>>>>>>>>> at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 1237 >>>>>>>>>>>>> main::caFailure('failed to apply the overlap >>>>>>>>>>>>> corrections', >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>>>> called >>>>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>>>> line >>>>>>>>>>>>> 5880 >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overla >>>>>>>>>>>>> pSt >>>>>>>>>>>>> or >>>>>>>>>>>>> e- >>>>>>>>>>>>> up >>>>>>>>>>>>> date-erates.err): >>>>>>>>>>>>> >>>>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' >>>>>>>>>>>>> for >>>>>>>>>>>>> reading: No such file or directory >>>>>>>>>>>>> >>>>>>>>>>>>> ---------------------------------------- >>>>>>>>>>>>> Failure message: >>>>>>>>>>>>> >>>>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>>>> reason >>>>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>>>> already >>>>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact >>>>>>>>>>>>> it >>>>>>>>>>>>> seems >>>>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). 
Is >>>>>>>>>>>>> there >>>>>>>>>>>>> a >>>>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, >>>>>>>>>>>>> instead >>>>>>>>>>>>> of >>>>>>>>>>>>> from >>>>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>>>> Another solution I was thinking of is to run the previous >>>>>>>>>>>>> overlapStore >>>>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>>>> frgcorr >>>>>>>>>>>>> and ovlcorr: >>>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>>> re >>>>>>>>>>>>> -c >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING >>>>>>>>>>>>> -g >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>>>> 14000 >>>>>>>>>>>>> -L >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) >>>>>>>>>>>>> to >>>>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, >>>>>>>>>>>>> before >>>>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>>>> ovlcorr >>>>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>>>> >>>>>>>>>>>>> I would really appreciate any comments or suggestions to my >>>>>>>>>>>>> problem! >>>>>>>>>>>>> Thanks >>>>>>>>>>>>> in advance for your help! 
>>>>>>>>>>>>> >>>>>>>>>>>>> much obliged, >>>>>>>>>>>>> Christoph >>>>>>>>>>>>> >>>>>>>>>>>>> University of Oslo >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> ------------------------------------------------------------------- >>>>>>>>>>>>> --- >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> -- >>>>>>>>>>>>> Live Security Virtual Conference >>>>>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>>>>> Discussions >>>>>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>>>>> malware >>>>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>>>>> wgs...@li... >>>>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>>>>>>> >>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>> --- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> Live Security Virtual Conference >>>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>>> Discussions >>>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>>> malware >>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>>> wgs...@li... >>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>> >>> >>> >> > |
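[Editor's note] The three-stage parallel overlap store build discussed in the quoted messages above (1-bucketize.sh, 2-sort.sh, 3-index.sh, as reported by Christoph) can be sketched as a dry run. The script names come from the thread; the job counts below are small illustrative values, not the real ones (2135 bucketize jobs, 100 sort slices were used in the actual run).

```shell
# Dry-run sketch: print the commands for the three-stage parallel
# overlap store build. Stage 1 and 2 jobs are independent and can be
# distributed across CPUs; stage 3 runs once, sequentially.
NBUCKETS=3   # in practice: one job per overlapper output (e.g. 2135)
NSLICES=2    # in practice: the value passed as '-jobs' (e.g. 100)

build_cmds() {
  for j in $(seq 1 "$NBUCKETS"); do
    echo "sh 1-bucketize.sh $j"   # stage 1, parallel: bucket overlaps for sorting
  done
  for s in $(seq 1 "$NSLICES"); do
    echo "sh 2-sort.sh $s"        # stage 2, parallel: sort one slice across buckets
  done
  echo "sh 3-index.sh"            # stage 3, sequential: build the store index
}

build_cmds
```

Since each stage is just a list of grid jobs, any failed job index can be rerun in isolation, which is the restartability advantage described in the thread.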
From: Walenz, B. <bw...@jc...> - 2012-07-13 18:20:55
Hi, Christoph- Good to hear! You're the third person (I know of) to run the parallel version. Instead of fixing the older store build, I'd rather spend time to integrate the new one with runCA, either as a set of jobs for SGE, or a series of sequential jobs. It's just scripting, but there might be some performance issues to optimize. If the store is complete, the bucket directories can be deleted. The third step should have done this for you. Maybe not if you didn't enable deleteearly or deletelate. The store is complete if you have just the #### files, an 'idx' and an 'ovs' file. You should not have any ####.idx or ####.ovs files. Is the extra space in the bucket??? directories? The difference (546 - 320 = 226) seems to be a reasonable size for the buckets. Memory for buildUnitigs (aka bog) cannot be specified. There isn't any data we can keep on disk, or not load, or compute differently in a smaller memory size. Memory is used to store fragment metadata (clear lengths, mate pairs), best overlaps, and the constructed unitigs. The first two are of a known size. The number of unitigs depends on the assembly. We've seen an assembly that exhausted memory in bog, caused by junk fragments creating an enormous number of single-fragment unitigs. b On 7/13/12 1:53 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > It s done! I have by now also updated the overlapStore with the frg- and > ovlcorr and I am in the process of building unitigs now. > > I like this parallel version for building the ovlStore. You were right > the last jobs needed double the memory. When distributing the jobs to > several CPUs it is very time efficient and also used fewer overall > CPUhours in comparison to the regular overlapStore command. One thing > though is that I think it needs substantially more disk space. I am not > 100% sure (because its gone now..), but I believe the *.ovlStore build > by the regular command used some 320G of disk space, while the one I > have now is using 546G. 
Are all the bucket???? directories in *.ovlStore > still needed? > > Overall I think I learned a lot about CA by running the latest steps > again with the parallel version of ovlStore build and your help. Are > there plans to include a failsafe for the overlapStore update function, > until the process is finished? So that it can be resumed in case it > stops for whatever reason. > > One more thing: Is there a way to specify the memory buildUnitigs is > using? > > Thanks again for your help!! > > cheers, > Christoph > > > On 12.07.2012 18:52, Walenz, Brian wrote: >> You've captured the process nicely. >> >> After #1 finishes, check that you have one 'sliceSizes' file per bucket >> directory. If any are missing, run that bucket again. I think (hope) that >> #2 will complain if any are missing, but this has been a problem in the past. >> >> Hopefully memory won't be an issue during sorting. I estimate memory size as >> 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, >> Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina >> to Illumina) have fewer overlaps than the later jobs (Illumina to long >> reads). Every time I've run this, I could do 90-95% of the sort jobs on our >> grid, but had to use large memory machines for the rest. >> >> More jobs creates more files, but I don't think it is necessarily slower. I >> haven't benchmarked it though. >> >> No jobID for #3, it is tiny, does little compute, and not too much I/O. I >> usually run this interactively off grid. >> >> b >> >> ________________________________________ >> From: Christoph Hahn [chr...@gm...] >> Sent: Thursday, July 12, 2012 9:31 AM >> To: Walenz, Brian >> Cc: wgs...@li... >> Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore >> - how to resume??? >> >> Hi Brian, >> >> I ran the runCA-overlapStoreBuild.pl script now. 
It created the three >> scripts: >> 1-bucketize.sh >> 2-sort.sh >> 3-index.sh >> >> right now I am running 1-bucketize.sh for every job index from 1 to >> 2135. I have distributed the jobs on several CPUs and that works nicely. >> >> when this is finished I need to run 2-sort.sh. I specified -jobs 100 in >> the runCA-overlapStoreBuild.pl, so as far as I understand it should have >> created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? >> the jobID in this case is actually the slicenumber, right? so, for e.g. >> 2-sort.sh 2 it will look through all bucket directories and pull out >> slice002.gz, read them into memory and write the overlaps into the store. >> >> When this is done I just need to run 3-index.sh once. No jobIDs >> required, right? >> >> Am I missing anything? >> >> cheers, >> Christoph >> >> >> On 07/11/2012 05:54 AM, Walenz, Brian wrote: >>> The first step will create 1 job for each overlapper job. These should be >>> small memory, but there is some internal buffering done and I usually >>> request 2gb for them anyway. >>> >>> The second step will create '-jobs j' jobs. Memory size here is a giant >>> unknown. The '-memory m' option will cause the job to not run if it needs >>> more than that much memory. Currently, you'll have to increase -memory for >>> these jobs and find a bigger machine. >>> >>> All jobs in both steps are single-threaded and run independently of each >>> other. >>> >>> b >>> >>> >>> >>> >>> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >>> >>>> Hi Brian, >>>> >>>> Thanks! overlaps are being computed now and CVS version of CA has been >>>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>>> overlapper is finished. One question there: I understand that the memory >>>> usage is regulated by the -jobs j parameter. higher value for j means >>>> less memory for every job. How can I specify the number of CPUs to be >>>> used in the parallel steps? >>>> >>>> Thanks for your help! 
I appreciate it! >>>> >>>> cheers, >>>> Christoph >>>> >>>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>>> complete, then continuing on to frgcorr. runCA tests for the existence of >>>>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>>>> the store is valid. So, delete *ovlStore* too. >>>>> >>>>> Your latest build (from scratch) is suffering from a long standing >>>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>>> >>>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>>> `classifyMates.o'. Stop. >>>>> make[1]: *** Waiting for unfinished jobs.... >>>>> make: *** [objs] Error 1 >>>>> >>>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>>> >>>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>>> need to grab it from subversion. >>>>> >>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ >>>>> ou >>>>> t_and_Compile >>>>> >>>>> b >>>>> >>>>> >>>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>>> least(it took some 10000 CPUhours)?? >>>>>> >>>>>> Tried to compile it again - not successful. Ran make in the src >>>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>>> AS_RUN-makelog). >>>>>> >>>>>> Thanks, >>>>>> Christoph >>>>>> >>>>>> >>>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>>> Odd, the *gz should only be deleted after the store is successfully >>>>>>> built. >>>>>>> runCA might have been confused by the attempt to rerun. 
The easiest >>>>>>> will >>>>>>> be >>>>>>> to recompute. :-( >>>>>>> >>>>>>> I've never seen the 'libCA.a' error before. That particular program is >>>>>>> the >>>>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>>>> recompile. If that fails, send along the complete output of make and >>>>>>> I'll >>>>>>> take a look. >>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>> >>>>>>>> Hi Brian, >>>>>>>> >>>>>>>> Thanks for your reply! >>>>>>>> >>>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont >>>>>>>> have >>>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>>> understanding that correctly? >>>>>>>> >>>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>>>> >>>>>>>> I really appreciate your help! >>>>>>>> >>>>>>>> cheers, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>>> Hi, Christoph- >>>>>>>>> >>>>>>>>> The original overlap store build is difficult to resume. I think it >>>>>>>>> can >>>>>>>>> be >>>>>>>>> done, but it will take code changes that are probably specific to the >>>>>>>>> case >>>>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>>>> will >>>>>>>>> I suggest this. >>>>>>>>> >>>>>>>>> Option 1 is then to restart. >>>>>>>>> >>>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three >>>>>>>>> grid >>>>>>>>> jobs. 
The first job is parallel, and transfers the overlapper output >>>>>>>>> into >>>>>>>>> buckets for sorting. The second job, also parallel, sorts each >>>>>>>>> bucket. >>>>>>>>> The >>>>>>>>> final job, sequential, builds an index for the store. Since this >>>>>>>>> compute >>>>>>>>> is >>>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>>> >>>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>>> estimated >>>>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>>>> (4?) >>>>>>>>> hours with the data parallel version. But on our development cluster, >>>>>>>>> it >>>>>>>>> is >>>>>>>>> slower than the sequential version. It depends on the disk >>>>>>>>> throughput. >>>>>>>>> Our >>>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side >>>>>>>>> has >>>>>>>>> a >>>>>>>>> big Isilon. >>>>>>>>> >>>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>>> documentation, so do an update first. >>>>>>>>> >>>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>>> accept >>>>>>>>> better documentation. >>>>>>>>> >>>>>>>>> b >>>>>>>>> >>>>>>>>> >>>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>>> >>>>>>>>>> Hei Ole, >>>>>>>>>> >>>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>>>> referring to just recently. Sounds like a good approach you are >>>>>>>>>> using! >>>>>>>>>> Will definitely consider that to make the assembly more effective in >>>>>>>>>> a >>>>>>>>>> next try. Thanks for that! >>>>>>>>>> For now, I think I am pretty much over all the trimming and >>>>>>>>>> correction >>>>>>>>>> steps (once I get this last thing sorted out..). As far as I can see >>>>>>>>>> the >>>>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>>>> assembly as it is now. Will try to improve it afterwards. 
I am really >>>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>>>> think. >>>>>>>>>> >>>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly >>>>>>>>>> now >>>>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is >>>>>>>>>> not >>>>>>>>>> working as I thought it would. >>>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>>>> do!! >>>>>>>>>> >>>>>>>>>> Thanks for your help! >>>>>>>>>> >>>>>>>>>> much obliged, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>>> Hi Christoph. >>>>>>>>>>> >>>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap >>>>>>>>>>> based >>>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, >>>>>>>>>>> I >>>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based >>>>>>>>>>> trimming) >>>>>>>>>>> for my 454 reads, and then output the result as fastq-files. 
I used >>>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>>> steps as frgcorrection. >>>>>>>>>>> >>>>>>>>>>> I don't think frgcorrection is that useful for the type of data >>>>>>>>>>> you're >>>>>>>>>>> using anyway. >>>>>>>>>>> >>>>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>>>> CA: >>>>>>>>>>> doOBT=0 >>>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>>> >>>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other >>>>>>>>>>> steps >>>>>>>>>>> too. Apply with caution. >>>>>>>>>>> >>>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title= >>>>>>>>>>> Pre >>>>>>>>>>> pr >>>>>>>>>>> oc >>>>>>>>>>> es >>>>>>>>>>> sing >>>>>>>>>>> and discussion with Brian. >>>>>>>>>>> >>>>>>>>>>> Ole >>>>>>>>>>> >>>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> >>>>>>>>>>> wrote: >>>>>>>>>>>> Dear users and developers, >>>>>>>>>>>> >>>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>>> completed >>>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>>> stopped >>>>>>>>>>>> in >>>>>>>>>>>> the subsequent updating of the overlapStore, because of an >>>>>>>>>>>> incorrectly >>>>>>>>>>>> set >>>>>>>>>>>> time limit.. 
>>>>>>>>>>>> If I am trying to resume the assembly now, I get the following >>>>>>>>>>>> error: >>>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 >>>>>>>>>>>> 2012 >>>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapSto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlap >>>>>>>>>>>> Sto >>>>>>>>>>>> re >>>>>>>>>>>> -u >>>>>>>>>>>> pd >>>>>>>>>>>> ate-erates.err >>>>>>>>>>>> 2>&1 >>>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 >>>>>>>>>>>> 2012 >>>>>>>>>>>> (1 >>>>>>>>>>>> seconds) >>>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>>> =================================================================== >>>>>>>>>>>> === >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> == >>>>>>>>>>>> ==== >>>>>>>>>>>> >>>>>>>>>>>> runCA failed. 
|
From: Christoph H. <chr...@gm...> - 2012-07-13 17:53:30
|
Hi Brian, It's done! I have by now also updated the overlapStore with the frg- and ovlcorr, and I am in the process of building unitigs now. I like this parallel version for building the ovlStore. You were right, the last jobs needed double the memory. When distributing the jobs to several CPUs it is very time-efficient and also used fewer overall CPU hours in comparison to the regular overlapStore command. One thing, though, is that I think it needs substantially more disk space. I am not 100% sure (because it's gone now..), but I believe the *.ovlStore built by the regular command used some 320G of disk space, while the one I have now is using 546G. Are all the bucket???? directories in *.ovlStore still needed? Overall, I think I learned a lot about CA by running the latest steps again with the parallel version of the ovlStore build and your help. Are there plans to include a failsafe in the overlapStore update function, so that it can be resumed in case it stops for whatever reason before the process is finished? One more thing: is there a way to specify the memory buildUnitigs uses? Thanks again for your help!! cheers, Christoph On 12.07.2012 18:52, Walenz, Brian wrote: > You've captured the process nicely. > > After #1 finishes, check that you have one 'sliceSizes' file per bucket directory. If any are missing, run that bucket again. I think (hope) that #2 will complain if any are missing, but this has been a problem in the past. > > Hopefully memory won't be an issue during sorting. I estimate memory size as 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina to Illumina) have fewer overlaps than the later jobs (Illumina to long reads). Every time I've run this, I could do 90-95% of the sort jobs on our grid, but had to use large memory machines for the rest. > > More jobs creates more files, but I don't think it is necessarily slower. 
I haven't benchmarked it though. > > No jobID for #3, it is tiny, does little compute, and not too much I/O. I usually run this interactively off grid. > > b > > ________________________________________ > From: Christoph Hahn [chr...@gm...] > Sent: Thursday, July 12, 2012 9:31 AM > To: Walenz, Brian > Cc: wgs...@li... > Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore - how to resume??? > > Hi Brian, > > I ran the runCA-overlapStoreBuild.pl script now. It created the three > scripts: > 1-bucketize.sh > 2-sort.sh > 3-index.sh > > right now I am running 1-bucketize.sh for every job index from 1 to > 2135. I have distributed the jobs on several CPUs and that works nicely. > > when this is finished I need to run 2-sort.sh. I specified -jobs 100 in > the runCA-overlapStoreBuild.pl, so as far as I understand it should have > created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? > the jobID in this case is actually the slicenumber, right? so, for e.g. > 2-sort.sh 2 it will look through all bucket directories and pull out > slice002.gz, read them into memory and write the overlaps into the store. > > When this is done I just need to run 3-index.sh once. No jobIDs > required, right? > > Am I missing anything? > > cheers, > Christoph > > > On 07/11/2012 05:54 AM, Walenz, Brian wrote: >> The first step will create 1 job for each overlapper job. These should be >> small memory, but there is some internal buffering done and I usually >> request 2gb for them anyway. >> >> The second step will create '-jobs j' jobs. Memory size here is a giant >> unknown. The '-memory m' option will cause the job to not run if it needs >> more than that much memory. Currently, you'll have to increase -memory for >> these jobs and find a bigger machine. >> >> All jobs in both steps are single-threaded and run independently of each >> other. >> >> b >> >> >> >> >> On 7/10/12 6:46 PM, "Christoph Hahn"<chr...@gm...> wrote: >> >>> Hi Brian, >>> >>> Thanks! 
overlaps are being computed now and CVS version of CA has been >>> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >>> overlapper is finished. One question there: I understand that the memory >>> usage is regulated by the -jobs j parameter. higher value for j means >>> less memory for every job. How can I specify the number of CPUs to be >>> used in the parallel steps? >>> >>> Thanks for your help! I appreciate it! >>> >>> cheers, >>> Christoph >>> >>> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>>> complete, then continuing on to frgcorr. runCA tests for the existence of >>>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>>> the store is valid. So, delete *ovlStore* too. >>>> >>>> Your latest build (from scratch) is suffering from a long standing >>>> dependency issue. It needs kmer checked out and 'make install'ed. >>>> >>>> make[1]: *** No rule to make target `sweatShop.H', needed by >>>> `classifyMates.o'. Stop. >>>> make[1]: *** Waiting for unfinished jobs.... >>>> make: *** [objs] Error 1 >>>> >>>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>>> >>>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>>> need to grab it from subversion. >>>> >>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >>>> t_and_Compile >>>> >>>> b >>>> >>>> >>>> On 7/10/12 4:00 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>> >>>>> Hi, >>>>> >>>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>>> and the 3-overlapcorrection directories and just ran runCA and it >>>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>>> least(it took some 10000 CPUhours)?? >>>>> >>>>> Tried to compile it again - not successful. 
Ran make in the src >>>>> directory (output in makelog) and also in the AS_RUN directory (output >>>>> AS_RUN-makelog). >>>>> >>>>> Thanks, >>>>> Christoph >>>>> >>>>> >>>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>>> Odd, the *gz should only be deleted after the store is successfully built. >>>>>> runCA might have been confused by the attempt to rerun. The easiest will >>>>>> be >>>>>> to recompute. :-( >>>>>> >>>>>> I've never seen the 'libCA.a' error before. That particular program is the >>>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>>> recompile. If that fails, send along the complete output of make and I'll >>>>>> take a look. >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 7/10/12 2:15 PM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>> >>>>>>> Hi Brian, >>>>>>> >>>>>>> Thanks for your reply! >>>>>>> >>>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>>> understanding that correctly? >>>>>>> >>>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>>> >>>>>>> I really appreciate your help! >>>>>>> >>>>>>> cheers, >>>>>>> Christoph >>>>>>> >>>>>>> >>>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>>> Hi, Christoph- >>>>>>>> >>>>>>>> The original overlap store build is difficult to resume. I think it can >>>>>>>> be >>>>>>>> done, but it will take code changes that are probably specific to the >>>>>>>> case >>>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>>> will >>>>>>>> I suggest this. >>>>>>>> >>>>>>>> Option 1 is then to restart. 
>>>>>>>> >>>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>>> into >>>>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>>>> The >>>>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>>>> is >>>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>>> >>>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>>> estimated >>>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>>> (4?) >>>>>>>> hours with the data parallel version. But on our development cluster, it >>>>>>>> is >>>>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>>>> Our >>>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>>>> a >>>>>>>> big Isilon. >>>>>>>> >>>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>>> documentation, so do an update first. >>>>>>>> >>>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>>> accept >>>>>>>> better documentation. >>>>>>>> >>>>>>>> b >>>>>>>> >>>>>>>> >>>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn"<chr...@gm...> wrote: >>>>>>>> >>>>>>>>> Hei Ole, >>>>>>>>> >>>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>>> referring to just recently. Sounds like a good approach you are using! >>>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>>>> next try. Thanks for that! >>>>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>>> assembly as it is now. 
Will try to improve it afterwards. I am really >>>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>>> think. >>>>>>>>> >>>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>>>> working as I thought it would. >>>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>>> do!! >>>>>>>>> >>>>>>>>> Thanks for your help! >>>>>>>>> >>>>>>>>> much obliged, >>>>>>>>> Christoph >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>>> Hi Christoph. >>>>>>>>>> >>>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based >>>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>>>> for my 454 reads, and then output the result as fastq-files. 
I used >>>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>>> steps as frgcorrection. >>>>>>>>>> >>>>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>>>> using anyway. >>>>>>>>>> >>>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>>> CA: >>>>>>>>>> doOBT=0 >>>>>>>>>> doFragmentCorrection=0 >>>>>>>>>> >>>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>>>> too. Apply with caution. >>>>>>>>>> >>>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>>>> pr >>>>>>>>>> oc >>>>>>>>>> es >>>>>>>>>> sing >>>>>>>>>> and discussion with Brian. >>>>>>>>>> >>>>>>>>>> Ole >>>>>>>>>> >>>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>>>> Dear users and developers, >>>>>>>>>>> >>>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>>> completed >>>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>>> stopped >>>>>>>>>>> in >>>>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>>>> set >>>>>>>>>>> time limit.. 
>>>>>>>>>>> If I am trying to resume the assembly now, I get the following error: >>>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012 >>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>>> -u >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>>> rrection/salaris.erates> >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSto >>>>>>>>>>> re >>>>>>>>>>> -u >>>>>>>>>>> pd >>>>>>>>>>> ate-erates.err >>>>>>>>>>> 2>&1 >>>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 >>>>>>>>>>> (1 >>>>>>>>>>> seconds) >>>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>>> ====================================================================== >>>>>>>>>>> == >>>>>>>>>>> == >>>>>>>>>>> == >>>>>>>>>>> ==== >>>>>>>>>>> >>>>>>>>>>> runCA failed. >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------- >>>>>>>>>>> Stack trace: >>>>>>>>>>> >>>>>>>>>>> at >>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>> line >>>>>>>>>>> 1237 >>>>>>>>>>> main::caFailure('failed to apply the overlap corrections', >>>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>>> called >>>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>>> line >>>>>>>>>>> 5880 >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------- >>>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSt >>>>>>>>>>> or >>>>>>>>>>> e- >>>>>>>>>>> up >>>>>>>>>>> date-erates.err): >>>>>>>>>>> >>>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>>> 
'/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for >>>>>>>>>>> reading: No such file or directory >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------- >>>>>>>>>>> Failure message: >>>>>>>>>>> >>>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>>> reason >>>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>>> already >>>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it >>>>>>>>>>> seems >>>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is >>>>>>>>>>> there >>>>>>>>>>> a >>>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead >>>>>>>>>>> of >>>>>>>>>>> from >>>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore >>>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>>> frgcorr >>>>>>>>>>> and ovlcorr: >>>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>>> -c >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>>> 14000 >>>>>>>>>>> -L >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to >>>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, before >>>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>>> ovlcorr >>>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>>> >>>>>>>>>>> I would really appreciate any comments or suggestions to my problem! 
>>>>>>>>>>> Thanks >>>>>>>>>>> in advance for your help! >>>>>>>>>>> >>>>>>>>>>> much obliged, >>>>>>>>>>> Christoph >>>>>>>>>>> >>>>>>>>>>> University of Oslo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ---------------------------------------------------------------------- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> -- >>>>>>>>>>> Live Security Virtual Conference >>>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>>> Discussions >>>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>>> malware >>>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>>> wgs...@li... >>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>>>>> >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> -- >>>>>>>>> -- >>>>>>>>> -- >>>>>>>>> Live Security Virtual Conference >>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>> Discussions >>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>> malware >>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>> _______________________________________________ >>>>>>>>> wgs-assembler-users mailing list >>>>>>>>> wgs...@li... >>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>> > > |
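[Editor's note] The three-phase store build that Christoph and Brian walk through in this thread can be sketched as a plain-shell driver. The job counts (2135 bucketize jobs, 100 sort slices) are the ones quoted in the thread; the echo lines stand in for real grid submissions (qsub/sbatch), and the commented sliceSizes check follows Brian's advice — the bucket path pattern there is illustrative, not confirmed.

```shell
#!/bin/sh
# Sketch of driving the three store-build phases by hand.
# 2135 bucketize jobs and 100 sort slices are the counts from this thread;
# 'echo' stands in for an actual grid submission (qsub/sbatch/...).
NBUCKETS=2135
NSLICES=100

# Phase 1: bucketize each overlapper output (independent, parallel jobs).
i=1
while [ "$i" -le "$NBUCKETS" ]; do
    echo "1-bucketize.sh $i"
    i=$((i + 1))
done

# Before phase 2: every bucket directory must contain a 'sliceSizes' file;
# rerun any bucket that is missing one (path pattern is illustrative):
#   for d in *.ovlStore.BUILDING/bucket????; do
#       [ -e "$d/sliceSizes" ] || echo "rerun bucket $d"
#   done

# Phase 2: sort one slice per job; the job ID is the slice number
# (independent, parallel jobs).
s=1
while [ "$s" -le "$NSLICES" ]; do
    echo "2-sort.sh $s"
    s=$((s + 1))
done

# Phase 3: build the store index once, sequentially
# (tiny; Brian runs it interactively off-grid).
echo "3-index.sh"
```

Because every phase-1 and phase-2 job is independent, any failed job can simply be rerun with the same index — which is the whole point of this build over the original sequential one.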
From: Ole K. T. <o.k...@bi...> - 2012-07-12 18:07:50
|
Hi, sorry if this e-mail is a bit long, but it's a strange and annoying problem. I have both 454 and Illumina datasets for my species, and I've finished two pure 454 assemblies with CA and one with mixed data. The genome should be around 830 Mb, and I think that if computeCoverageStat is very wrong in its estimate, my assembly gets screwed up. On the first 454 assembly I ran (there were some Sanger reads in it too, but not many), it estimated the genome size to be 816,110,291.72 bp, quite close to what we think it is. I ran it with mostly default settings and with bog. Then I ran a mixed assembly (about 24x 454 reads and 20x Illumina reads) with bogart, and the estimated genome size was 1,213,867,868.06 bp; I got a lot of degenerates and a messed-up assembly. Mostly the same settings as the first assembly, with regard to error rates at the different stages at least. The big difference in the 454 reads between this and the first one was that I removed all the 454 shotgun reads shorter than 300 bp, which might have done some harm too. We have speculated a lot about what might have caused the misestimate; Jason suggested it might be a pile-up of Illumina reads at the ends of 454 reads. The genome is quite plagued with short tandem repeats (ACACACACACA), and the 454 platform can't sequence through these, so a lot of the reads end with the STR. Illumina can sequence through it, and the guess was that a lot of Illumina reads were just STRs and piled up on the ends of 454 reads/unitigs, thereby causing the misestimate. I've tried to look into it, but I can't find that this holds true. The highest coverage, as far as I can see, is in the middle of the unitigs/degenerates. Then I created a set of 454 reads that I had run Overlap Based Trimming on, saved for use in later assemblies (as suggested by Brian and the preprocessing site on the wiki). I ran this assembly with bogart, because I wanted a baseline against later (mixed) assemblies. 
All 454 reads that survived OBT were included here. computeCoverageStat estimated the genome size to be 1,118,909,921.55 bp, quite close to the mixed assembly. I then copied the assembly, removed the tigStore, 4-unitigger and later folders, and reran with bog. Then the estimated genome size was 954,398,150.47 bp. This assembly is scaffolding as I write this, so I'm not yet sure how it will turn out. Hopefully it will be pretty good. I don't remember the differences between bog and bogart right now, but is it understandable that bogart does a bad job on a (mostly) 454 assembly? I've just started an assembly with about 26x 454 reads and 52x Illumina reads, where all reads have been merTrimmed using k-mers from about 20x of overlapping Illumina reads (merged with FLASH) as evidence. If there's something about the 454 reads that confused bogart, then the merTrimmed 454 reads and the predominance of Illumina reads will hopefully overcome it. I have most data and logs available, so any hints on what I could do to fix it, or where I should look to figure it out, are welcome. Thank you. Ole |
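[Editor's note] A back-of-the-envelope arithmetic check helps when a coverage-based size estimate looks off, as in Ole's message. The sketch below uses only the figures quoted in the message (830 Mb expected genome, 24x 454 + 20x Illumina); computeCoverageStat's real model is based on unitig arrival rates, not this naive total-bases/coverage ratio.

```shell
#!/bin/sh
# Naive sanity check, assuming coverage = total_bases / genome_size.
# Figures are the ones quoted in the message; computeCoverageStat itself
# uses unitig arrival rates, not this simple ratio.
GENOME_MB=830            # expected genome size, in Mb
COV_454=24               # quoted 454 coverage
COV_ILLUMINA=20          # quoted Illumina coverage

TOTAL_COV=$((COV_454 + COV_ILLUMINA))
TOTAL_GB=$((GENOME_MB * TOTAL_COV / 1000))   # total input data, Gb (integer math)

echo "expected input: ~${TOTAL_GB} Gb of reads at ${TOTAL_COV}x over ${GENOME_MB} Mb"
# If the estimator instead reports a ~1,214 Mb genome, the implied coverage
# for the same input drops to roughly TOTAL_GB*1000/1214 (~30x) -- a quick
# red flag that the size estimate, not the data volume, is what changed.
```

Comparing the coverage implied by an estimated genome size against the known input volume is a cheap first test before digging into unitig coverage profiles.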
From: Walenz, B. <bw...@jc...> - 2012-07-12 16:53:39
|
You've captured the process nicely. After #1 finishes, check that you have one 'sliceSizes' file per bucket directory. If any are missing, run that bucket again. I think (hope) that #2 will complain if any are missing, but this has been a problem in the past. Hopefully memory won't be an issue during sorting. I estimate memory size as 3 * (sizeof gz files) / #jobs. But, if you have Illumina + long reads (454+, Sanger), the balancing is screwed up and the early jobs (overlaps of Illumina to Illumina) have fewer overlaps than the later jobs (Illumina to long reads). Every time I've run this, I could do 90-95% of the sort jobs on our grid, but had to use large memory machines for the rest. More jobs creates more files, but I don't think it is necessarily slower. I haven't benchmarked it though. No jobID for #3, it is tiny, does little compute, and not too much I/O. I usually run this interactively off grid. b ________________________________________ From: Christoph Hahn [chr...@gm...] Sent: Thursday, July 12, 2012 9:31 AM To: Walenz, Brian Cc: wgs...@li... Subject: Re: [wgs-assembler-users] runCA stopped while updating overlapStore - how to resume??? Hi Brian, I ran the runCA-overlapStoreBuild.pl script now. It created the three scripts: 1-bucketize.sh 2-sort.sh 3-index.sh right now I am running 1-bucketize.sh for every job index from 1 to 2135. I have distributed the jobs on several CPUs and that works nicely. when this is finished I need to run 2-sort.sh. I specified -jobs 100 in the runCA-overlapStoreBuild.pl, so as far as I understand it should have created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? the jobID in this case is actually the slicenumber, right? so, for e.g. 2-sort.sh 2 it will look through all bucket directories and pull out slice002.gz, read them into memory and write the overlaps into the store. When this is done I just need to run 3-index.sh once. No jobIDs required, right? Am I missing anything? 
cheers, Christoph On 07/11/2012 05:54 AM, Walenz, Brian wrote: > The first step will create 1 job for each overlapper job. These should be > small memory, but there is some internal buffering done and I usually > request 2gb for them anyway. > > The second step will create '-jobs j' jobs. Memory size here is a giant > unknown. The '-memory m' option will cause the job to not run if it needs > more than that much memory. Currently, you'll have to increase -memory for > these jobs and find a bigger machine. > > All jobs in both steps are single-threaded and run independently of each > other. > > b > > > > > On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote: > >> Hi Brian, >> >> Thanks! overlaps are being computed now and CVS version of CA has been >> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >> overlapper is finished. One question there: I understand that the memory >> usage is regulated by the -jobs j parameter. higher value for j means >> less memory for every job. How can I specify the number of CPUs to be >> used in the parallel steps? >> >> Thanks for your help! I appreciate it! >> >> cheers, >> Christoph >> >> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>> complete, then continuing on to frgcorr. runCA tests for the existence of >>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>> the store is valid. So, delete *ovlStore* too. >>> >>> Your latest build (from scratch) is suffering from a long standing >>> dependency issue. It needs kmer checked out and 'make install'ed. >>> >>> make[1]: *** No rule to make target `sweatShop.H', needed by >>> `classifyMates.o'. Stop. >>> make[1]: *** Waiting for unfinished jobs.... >>> make: *** [objs] Error 1 >>> >>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. 
>>> >>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>> need to grab it from subversion. >>> >>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >>> t_and_Compile >>> >>> b >>> >>> >>> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote: >>> >>>> Hi, >>>> >>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>> and the 3-overlapcorrection directories and just ran runCA and it >>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>> least(it took some 10000 CPUhours)?? >>>> >>>> Tried to compile it again - not successful. Ran make in the src >>>> directory (output in makelog) and also in the AS_RUN directory (output >>>> AS_RUN-makelog). >>>> >>>> Thanks, >>>> Christoph >>>> >>>> >>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>> Odd, the *gz should only be deleted after the store is successfully built. >>>>> runCA might have been confused by the attempt to rerun. The easiest will >>>>> be >>>>> to recompute. :-( >>>>> >>>>> I've never seen the 'libCA.a' error before. That particular program is the >>>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>> recompile. If that fails, send along the complete output of make and I'll >>>>> take a look. >>>>> >>>>> b >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks for your reply! >>>>>> >>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>> understanding that correctly? 
>>>>>> >>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>> >>>>>> I really appreciate your help! >>>>>> >>>>>> cheers, >>>>>> Christoph >>>>>> >>>>>> >>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>> Hi, Christoph- >>>>>>> >>>>>>> The original overlap store build is difficult to resume. I think it can >>>>>>> be >>>>>>> done, but it will take code changes that are probably specific to the >>>>>>> case >>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>> will >>>>>>> I suggest this. >>>>>>> >>>>>>> Option 1 is then to restart. >>>>>>> >>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>> into >>>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>>> The >>>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>>> is >>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>>> >>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>> estimated >>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>> (4?) >>>>>>> hours with the data parallel version. But on our development cluster, it >>>>>>> is >>>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>>> Our >>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>>> a >>>>>>> big Isilon. >>>>>>> >>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>> documentation, so do an update first. >>>>>>> >>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>> accept >>>>>>> better documentation. 
>>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote: >>>>>>> >>>>>>>> Hei Ole, >>>>>>>> >>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>> referring to just recently. Sounds like a good approach you are using! >>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>>> next try. Thanks for that! >>>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>> think. >>>>>>>> >>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>>> working as I thought it would. >>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>> do!! >>>>>>>> >>>>>>>> Thanks for your help! >>>>>>>> >>>>>>>> much obliged, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>> Hi Christoph. >>>>>>>>> >>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>> reads. 
Celera runs, as you see below, frgcorrection and overlap based >>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>> steps as frgcorrection. >>>>>>>>> >>>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>>> using anyway. >>>>>>>>> >>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>> CA: >>>>>>>>> doOBT=0 >>>>>>>>> doFragmentCorrection=0 >>>>>>>>> >>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>>> too. Apply with caution. >>>>>>>>> >>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>>> pr >>>>>>>>> oc >>>>>>>>> es >>>>>>>>> sing >>>>>>>>> and discussion with Brian. 
>>>>>>>>> >>>>>>>>> Ole >>>>>>>>> >>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>>> Dear users and developers, >>>>>>>>>> >>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>> completed >>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>> stopped >>>>>>>>>> in >>>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>>> set >>>>>>>>>> time limit.. >>>>>>>>>> If I am trying to resume the assembly now, I get the following error: >>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012 >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -u >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>> rrection/salaris.erates> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSto >>>>>>>>>> re >>>>>>>>>> -u >>>>>>>>>> pd >>>>>>>>>> ate-erates.err >>>>>>>>>> 2>&1 >>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 >>>>>>>>>> (1 >>>>>>>>>> seconds) >>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>> ====================================================================== >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> ==== >>>>>>>>>> >>>>>>>>>> runCA failed. 
>>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Stack trace: >>>>>>>>>> >>>>>>>>>> at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 1237 >>>>>>>>>> main::caFailure('failed to apply the overlap corrections', >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>> called >>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 5880 >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSt >>>>>>>>>> or >>>>>>>>>> e- >>>>>>>>>> up >>>>>>>>>> date-erates.err): >>>>>>>>>> >>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for >>>>>>>>>> reading: No such file or directory >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Failure message: >>>>>>>>>> >>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>> reason >>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>> already >>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it >>>>>>>>>> seems >>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is >>>>>>>>>> there >>>>>>>>>> a >>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead >>>>>>>>>> of >>>>>>>>>> from >>>>>>>>>> 0001~, which is obviously not there any more?? 
>>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore >>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>> frgcorr >>>>>>>>>> and ovlcorr: >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -c >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>> 14000 >>>>>>>>>> -L >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to >>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, before >>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>> ovlcorr >>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>> >>>>>>>>>> I would really appreciate any comments or suggestions to my problem! >>>>>>>>>> Thanks >>>>>>>>>> in advance for your help! >>>>>>>>>> >>>>>>>>>> much obliged, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> University of Oslo >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ---------------------------------------------------------------------- >>>>>>>>>> -- >>>>>>>>>> -- >>>>>>>>>> -- >>>>>>>>>> -- >>>>>>>>>> Live Security Virtual Conference >>>>>>>>>> Exclusive live event will cover all the ways today's security and >>>>>>>>>> threat landscape has changed and how IT managers can respond. >>>>>>>>>> Discussions >>>>>>>>>> will include endpoint security, mobile security and the latest in >>>>>>>>>> malware >>>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>>>>>>>> _______________________________________________ >>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>> wgs...@li... 
>>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> wgs-assembler-users mailing list >>>>>>>> wgs...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >> |
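[Editor's note] The workaround discussed in the thread above depends on knowing where a killed `overlapStore -u` run stopped: the thread states that each slice backup `NNNN~` is consumed as the update rewrites slice `NNNN`, so the lowest surviving `NNNN~` file marks the resume point (here, `0250~` of 430). A minimal shell sketch of that check, assuming only the `NNNN`/`NNNN~` naming described in the error log; the demo directory and file names are mocked up for illustration:

```shell
# Find the first overlap-store slice that was NOT yet updated.
# Assumption (from the thread): overlapStore -u replaces each
# backup NNNN~ with an updated NNNN as it applies corrections,
# so the lowest remaining NNNN~ is where a killed update stopped.
first_pending_slice() {
    ls "$1" | grep '~$' | sort | head -n 1
}

# Demo on a mock store: slices 0001-0249 updated, 0250~ onward
# untouched (mirrors the situation described in the thread).
store=$(mktemp -d)
touch "$store/0001" "$store/0249" "$store/0250~" "$store/0430~"
first_pending_slice "$store"    # prints: 0250~
rm -rf "$store"
```

Note this only locates the resume point; as Brian says in the thread, actually resuming the original sequential build still needs case-specific code changes.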
From: Christoph H. <chr...@gm...> - 2012-07-12 13:32:08
|
Hi Brian, I ran the runCA-overlapStoreBuild.pl script now. It created the three scripts: 1-bucketize.sh 2-sort.sh 3-index.sh right now I am running 1-bucketize.sh for every job index from 1 to 2135. I have distributed the jobs on several CPUs and that works nicely. when this is finished I need to run 2-sort.sh. I specified -jobs 100 in the runCA-overlapStoreBuild.pl, so as far as I understand it should have created 100 jobs, right? So, I run 2-sort.sh for jobIDs 1 to 100, then? the jobID in this case is actually the slicenumber, right? so, for e.g. 2-sort.sh 2 it will look through all bucket directories and pull out slice002.gz, read them into memory and write the overlaps into the store. When this is done I just need to run 3-index.sh once. No jobIDs required, right? Am I missing anything? cheers, Christoph On 07/11/2012 05:54 AM, Walenz, Brian wrote: > The first step will create 1 job for each overlapper job. These should be > small memory, but there is some internal buffering done and I usually > request 2gb for them anyway. > > The second step will create '-jobs j' jobs. Memory size here is a giant > unknown. The '-memory m' option will cause the job to not run if it needs > more than that much memory. Currently, you'll have to increase -memory for > these jobs and find a bigger machine. > > All jobs in both steps are single-threaded and run independently of each > other. > > b > > > > > On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote: > >> Hi Brian, >> >> Thanks! overlaps are being computed now and CVS version of CA has been >> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the >> overlapper is finished. One question there: I understand that the memory >> usage is regulated by the -jobs j parameter. higher value for j means >> less memory for every job. How can I specify the number of CPUs to be >> used in the parallel steps? >> >> Thanks for your help! I appreciate it! 
>> >> cheers, >> Christoph >> >> On 07/10/2012 10:18 PM, Walenz, Brian wrote: >>> Quick guess is that runCA is finding the old ovlStore and assuming it is >>> complete, then continuing on to frgcorr. runCA tests for the existence of >>> name.ovlStore to determine if overlaps are finished; it doesn't check that >>> the store is valid. So, delete *ovlStore* too. >>> >>> Your latest build (from scratch) is suffering from a long standing >>> dependency issue. It needs kmer checked out and 'make install'ed. >>> >>> make[1]: *** No rule to make target `sweatShop.H', needed by >>> `classifyMates.o'. Stop. >>> make[1]: *** Waiting for unfinished jobs.... >>> make: *** [objs] Error 1 >>> >>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >>> >>> The kmer included in CA7 is too old for the CVS version of CA, so you'll >>> need to grab it from subversion. >>> >>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >>> t_and_Compile >>> >>> b >>> >>> >>> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote: >>> >>>> Hi, >>>> >>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>>> and the 3-overlapcorrection directories and just ran runCA and it >>>> immediately starts with doing frgcorr. Do you mean recompute from the >>>> very start? Is there a way to avoid recomputing the initial overlaps at >>>> least(it took some 10000 CPUhours)?? >>>> >>>> Tried to compile it again - not successful. Ran make in the src >>>> directory (output in makelog) and also in the AS_RUN directory (output >>>> AS_RUN-makelog). >>>> >>>> Thanks, >>>> Christoph >>>> >>>> >>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>>> Odd, the *gz should only be deleted after the store is successfully built. >>>>> runCA might have been confused by the attempt to rerun. The easiest will >>>>> be >>>>> to recompute. :-( >>>>> >>>>> I've never seen the 'libCA.a' error before. That particular program is the >>>>> first to get built. 
Looks like libCA.a wasn't created. My fix for most >>>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>>> recompile. If that fails, send along the complete output of make and I'll >>>>> take a look. >>>>> >>>>> b >>>>> >>>>> >>>>> >>>>> >>>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks for your reply! >>>>>> >>>>>> I would be happy to try the new parallel overlap store build, but I >>>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>>> them any more. Looks like they were deleted after the ovlStore was >>>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>>> understanding that correctly? >>>>>> >>>>>> I have downloaded the cvs and tried to make, but I get: >>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>>> >>>>>> I really appreciate your help! >>>>>> >>>>>> cheers, >>>>>> Christoph >>>>>> >>>>>> >>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>>> Hi, Christoph- >>>>>>> >>>>>>> The original overlap store build is difficult to resume. I think it can >>>>>>> be >>>>>>> done, but it will take code changes that are probably specific to the >>>>>>> case >>>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>>> will >>>>>>> I suggest this. >>>>>>> >>>>>>> Option 1 is then to restart. >>>>>>> >>>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>>> into >>>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>>> The >>>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>>> is >>>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. 
>>>>>>> >>>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>>> estimated >>>>>>> would take 2 days using the original sequential build, finish in a few >>>>>>> (4?) >>>>>>> hours with the data parallel version. But on our development cluster, it >>>>>>> is >>>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>>> Our >>>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>>> a >>>>>>> big Isilon. >>>>>>> >>>>>>> It is only in CVS. I just added command line help and a bit of >>>>>>> documentation, so do an update first. >>>>>>> >>>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>>> accept >>>>>>> better documentation. >>>>>>> >>>>>>> b >>>>>>> >>>>>>> >>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote: >>>>>>> >>>>>>>> Hei Ole, >>>>>>>> >>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>>> referring to just recently. Sounds like a good approach you are using! >>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>>> next try. Thanks for that! >>>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>>> think. >>>>>>>> >>>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>>> half-updated.. 
I also discovered that just rerunning the previous >>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>>> working as I thought it would. >>>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>>> do!! >>>>>>>> >>>>>>>> Thanks for your help! >>>>>>>> >>>>>>>> much obliged, >>>>>>>> Christoph >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>>> Hi Christoph. >>>>>>>>> >>>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based >>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>>> (can also be used on 454 reads). What I've been doing lately, is to >>>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>>> steps as frgcorrection. >>>>>>>>> >>>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>>> using anyway. 
>>>>>>>>> >>>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>>> CA: >>>>>>>>> doOBT=0 >>>>>>>>> doFragmentCorrection=0 >>>>>>>>> >>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>>> too. Apply with caution. >>>>>>>>> >>>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>>> pr >>>>>>>>> oc >>>>>>>>> es >>>>>>>>> sing >>>>>>>>> and discussion with Brian. >>>>>>>>> >>>>>>>>> Ole >>>>>>>>> >>>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>>> Dear users and developers, >>>>>>>>>> >>>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>>> completed >>>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>>> stopped >>>>>>>>>> in >>>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>>> set >>>>>>>>>> time limit.. 
>>>>>>>>>> If I am trying to resume the assembly now, I get the following error: >>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012 >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -u >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapco >>>>>>>>>> rrection/salaris.erates> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSto >>>>>>>>>> re >>>>>>>>>> -u >>>>>>>>>> pd >>>>>>>>>> ate-erates.err >>>>>>>>>> 2>&1 >>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 >>>>>>>>>> (1 >>>>>>>>>> seconds) >>>>>>>>>> ERROR: Failed with signal HUP (1) >>>>>>>>>> ====================================================================== >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> == >>>>>>>>>> ==== >>>>>>>>>> >>>>>>>>>> runCA failed. >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Stack trace: >>>>>>>>>> >>>>>>>>>> at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 1237 >>>>>>>>>> main::caFailure('failed to apply the overlap corrections', >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') >>>>>>>>>> called >>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs >>>>>>>>>> -7.0/Linux-amd64/bin/./runCA line 4077 >>>>>>>>>> main::overlapCorrection() called at >>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA >>>>>>>>>> line >>>>>>>>>> 5880 >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Last few lines of the relevant log file >>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapSt >>>>>>>>>> or >>>>>>>>>> e- >>>>>>>>>> up >>>>>>>>>> date-erates.err): >>>>>>>>>> >>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open >>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for 
>>>>>>>>>> reading: No such file or directory >>>>>>>>>> >>>>>>>>>> ---------------------------------------- >>>>>>>>>> Failure message: >>>>>>>>>> >>>>>>>>>> failed to apply the overlap corrections >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The >>>>>>>>>> reason >>>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has >>>>>>>>>> already >>>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it >>>>>>>>>> seems >>>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is >>>>>>>>>> there >>>>>>>>>> a >>>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead >>>>>>>>>> of >>>>>>>>>> from >>>>>>>>>> 0001~, which is obviously not there any more?? >>>>>>>>>> Another solution I was thinking of is to run the previous overlapStore >>>>>>>>>> command again manually (the one that was done before starting the >>>>>>>>>> frgcorr >>>>>>>>>> and ovlcorr: >>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore >>>>>>>>>> -c >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M >>>>>>>>>> 14000 >>>>>>>>>> -L >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list> >>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to >>>>>>>>>> restore the status from before the frgcorr and ovlcorr steps, before >>>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most >>>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and >>>>>>>>>> ovlcorr >>>>>>>>>> steps, because these steps were really resource intensive. >>>>>>>>>> >>>>>>>>>> I would really appreciate any comments or suggestions to my problem! >>>>>>>>>> Thanks >>>>>>>>>> in advance for your help! 
>>>>>>>>>> >>>>>>>>>> much obliged, >>>>>>>>>> Christoph >>>>>>>>>> >>>>>>>>>> University of Oslo >>>>>>>>>> _______________________________________________ >>>>>>>>>> wgs-assembler-users mailing list >>>>>>>>>> wgs...@li... >>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >>>>>>>> _______________________________________________ >>>>>>>> wgs-assembler-users mailing list >>>>>>>> wgs...@li... >>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users >> |
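[Editor's note] Christoph's message above lays out the whole data-parallel store build: run `1-bucketize.sh` once per overlapper output (1..2135 in his run), then `2-sort.sh` once per slice (`-jobs 100`, so job IDs 1..100), then `3-index.sh` once with no job ID. A sketch of a local driver for those three stages; the job counts are the ones from this thread, and `RUN=echo` makes it a dry run (swap in `RUN=sh` to execute the real scripts, or submit stages 1 and 2 as grid array jobs as the thread intends):

```shell
# Dry-run driver for the three-stage overlap store build.
# run_stage invokes one script once per job index, in order;
# on a grid, stages 1 and 2 would instead be array jobs since
# their jobs are independent and single-threaded.
RUN=${RUN:-echo}            # RUN=sh to actually execute

run_stage() {               # $1 = script name, $2 = job count
    i=1
    while [ "$i" -le "$2" ]; do
        $RUN "$1" "$i"
        i=$((i + 1))
    done
}

run_stage 1-bucketize.sh 3  # in the thread: 2135 bucketize jobs
run_stage 2-sort.sh 2       # in the thread: -jobs 100 sort slices
$RUN 3-index.sh             # sequential; run once, no job ID
```

This matches the scheme Brian confirms below: stage 2's job ID is the slice number (job N collects `sliceNNN.gz` from every bucket directory), and only stage 3 is sequential.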
From: Walenz, B. <bw...@jc...> - 2012-07-11 03:54:51
|
The first step will create 1 job for each overlapper job. These should be small memory, but there is some internal buffering done and I usually request 2gb for them anyway. The second step will create '-jobs j' jobs. Memory size here is a giant unknown. The '-memory m' option will cause the job to not run if it needs more than that much memory. Currently, you'll have to increase -memory for these jobs and find a bigger machine. All jobs in both steps are single-threaded and run independently of each other. b On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote: > Hi Brian, > > Thanks! overlaps are being computed now and CVS version of CA has been > successfully compiled. Will try the runCA-overlapStoreBuild.pl once the > overlapper is finished. One question there: I understand that the memory > usage is regulated by the -jobs j parameter. higher value for j means > less memory for every job. How can I specify the number of CPUs to be > used in the parallel steps? > > Thanks for your help! I appreciate it! > > cheers, > Christoph > > On 07/10/2012 10:18 PM, Walenz, Brian wrote: >> Quick guess is that runCA is finding the old ovlStore and assuming it is >> complete, then continuing on to frgcorr. runCA tests for the existence of >> name.ovlStore to determine if overlaps are finished; it doesn't check that >> the store is valid. So, delete *ovlStore* too. >> >> Your latest build (from scratch) is suffering from a long standing >> dependency issue. It needs kmer checked out and 'make install'ed. >> >> make[1]: *** No rule to make target `sweatShop.H', needed by >> `classifyMates.o'. Stop. >> make[1]: *** Waiting for unfinished jobs.... >> make: *** [objs] Error 1 >> >> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild. >> >> The kmer included in CA7 is too old for the CVS version of CA, so you'll >> need to grab it from subversion. 
>> >> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_ou >> t_and_Compile >> >> b >> >> >> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote: >> >>> Hi, >>> >>> I actually tried to just rerun the overlapper. I moved the 1-overlapper >>> and the 3-overlapcorrection directories and just ran runCA and it >>> immediately starts with doing frgcorr. Do you mean recompute from the >>> very start? Is there a way to avoid recomputing the initial overlaps at >>> least(it took some 10000 CPUhours)?? >>> >>> Tried to compile it again - not successful. Ran make in the src >>> directory (output in makelog) and also in the AS_RUN directory (output >>> AS_RUN-makelog). >>> >>> Thanks, >>> Christoph >>> >>> >>> On 07/10/2012 09:04 PM, Walenz, Brian wrote: >>>> Odd, the *gz should only be deleted after the store is successfully built. >>>> runCA might have been confused by the attempt to rerun. The easiest will >>>> be >>>> to recompute. :-( >>>> >>>> I've never seen the 'libCA.a' error before. That particular program is the >>>> first to get built. Looks like libCA.a wasn't created. My fix for most >>>> strange compile errors is to remove the entire Linux-amd64 directory and >>>> recompile. If that fails, send along the complete output of make and I'll >>>> take a look. >>>> >>>> b >>>> >>>> >>>> >>>> >>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote: >>>> >>>>> Hi Brian, >>>>> >>>>> Thanks for your reply! >>>>> >>>>> I would be happy to try the new parallel overlap store build, but I >>>>> think I need the *.ovb.gz outputs for that and unfortunately I dont have >>>>> them any more. Looks like they were deleted after the ovlStore was >>>>> build. So I guess I ll need to run the overlapper again, first. Am I >>>>> understanding that correctly? >>>>> >>>>> I have downloaded the cvs and tried to make, but I get: >>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop. >>>>> >>>>> I really appreciate your help! 
>>>>> >>>>> cheers, >>>>> Christoph >>>>> >>>>> >>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote: >>>>>> Hi, Christoph- >>>>>> >>>>>> The original overlap store build is difficult to resume. I think it can >>>>>> be >>>>>> done, but it will take code changes that are probably specific to the >>>>>> case >>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper >>>>>> will >>>>>> I suggest this. >>>>>> >>>>>> Option 1 is then to restart. >>>>>> >>>>>> Option 2 is to use a new 'data-parallel' overlap store build >>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid >>>>>> jobs. The first job is parallel, and transfers the overlapper output >>>>>> into >>>>>> buckets for sorting. The second job, also parallel, sorts each bucket. >>>>>> The >>>>>> final job, sequential, builds an index for the store. Since this compute >>>>>> is >>>>>> just a collection of jobs, it can be restarted/resumed/fixed easily. >>>>>> >>>>>> Its performance can be great -- at JCVI we've seen builds that we >>>>>> estimated >>>>>> would take 2 days using the original sequential build, finish in a few >>>>>> (4?) >>>>>> hours with the data parallel version. But on our development cluster, it >>>>>> is >>>>>> slower than the sequential version. It depends on the disk throughput. >>>>>> Our >>>>>> dev cluster is powered off of a 6-disk ZFS, while the production side has >>>>>> a >>>>>> big Isilon. >>>>>> >>>>>> It is only in CVS. I just added command line help and a bit of >>>>>> documentation, so do an update first. >>>>>> >>>>>> Happy to provide help if you want to try it out. More than happy to >>>>>> accept >>>>>> better documentation. >>>>>> >>>>>> b >>>>>> >>>>>> >>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote: >>>>>> >>>>>>> Hei Ole, >>>>>>> >>>>>>> Thanks for your reply. I had looked on the preprocessing page you are >>>>>>> referring to just recently. Sounds like a good approach you are using! 
>>>>>>> Will definitely consider that to make the assembly more effective in a >>>>>>> next try. Thanks for that! >>>>>>> For now, I think I am pretty much over all the trimming and correction >>>>>>> steps (once I get this last thing sorted out..). As far as I can see the >>>>>>> next step is already building the unitigs, so I ll try to finish this >>>>>>> assembly as it is now. Will try to improve it afterwards. I am really >>>>>>> curious how a first attempt of a hybrid approach (454+illumina) will >>>>>>> perform in comparison to the pure illumina assemblies which I have >>>>>>> pretty much optimized now (and with which I am pretty happy, btw), I >>>>>>> think. >>>>>>> >>>>>>> I am afraid, your suggestion to do doFragmentCorrection=0 directly now >>>>>>> will not work. For the next step (the unitigger) I ll need an intact >>>>>>> overlap store. As it is now, I think it is useless, being only >>>>>>> half-updated.. I also discovered that just rerunning the previous >>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not >>>>>>> working as I thought it would. >>>>>>> Seems to be a very unfortunate situation - really dont know how to >>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to >>>>>>> do!! >>>>>>> >>>>>>> Thanks for your help! >>>>>>> >>>>>>> much obliged, >>>>>>> Christoph >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote: >>>>>>>> Hi Christoph. >>>>>>>> >>>>>>>> This is not an answer to your question, but a suggestion for a >>>>>>>> work-around. If I remember correctly, you have both Illumina and 454 >>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based >>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads >>>>>>>> (can also be used on 454 reads). 
What I've been doing lately, is to >>>>>>>> run meryl on a trusted set of Illumina reads, pair end for example, I >>>>>>>> ran it on some overlapping reads which I had merged with FLASH. Then >>>>>>>> you can use the set of trusted k-mers to correct different datasets. >>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming) >>>>>>>> for my 454 reads, and then output the result as fastq-files. I used >>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this >>>>>>>> for all your reads, used either merTim or merTrim/OBT, and do >>>>>>>> deduplication on all the datasets too, then you'll end up with reads >>>>>>>> that you can use in assemblies where you skip relatively expensive >>>>>>>> steps as frgcorrection. >>>>>>>> >>>>>>>> I don't think frgcorrection is that useful for the type of data you're >>>>>>>> using anyway. >>>>>>>> >>>>>>>> If you have a set of corrected reads, you can use these settings for >>>>>>>> CA: >>>>>>>> doOBT=0 >>>>>>>> doFragmentCorrection=0 >>>>>>>> >>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this >>>>>>>> assembly now. You might have to clean up your directory tree, like >>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps >>>>>>>> too. Apply with caution. >>>>>>>> >>>>>>>> Most of the stuff I've mentioned I've taken from here: >>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Pre >>>>>>>> pr >>>>>>>> oc >>>>>>>> es >>>>>>>> sing >>>>>>>> and discussion with Brian. >>>>>>>> >>>>>>>> Ole >>>>>>>> >>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote: >>>>>>>>> Dear users and developers, >>>>>>>>> >>>>>>>>> I have the following problem: In my assembly process I have just >>>>>>>>> completed >>>>>>>>> the fragment- and overlap error correction. Unfortunately runCA >>>>>>>>> stopped >>>>>>>>> in >>>>>>>>> the subsequent updating of the overlapStore, because of an incorrectly >>>>>>>>> set >>>>>>>>> time limit.. 
>>>>>>>>> If I try to resume the assembly now, I get the following error:
>>>>>>>>>
>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012
>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates >
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 (1 seconds)
>>>>>>>>> ERROR: Failed with signal HUP (1)
>>>>>>>>> ================================================================================
>>>>>>>>>
>>>>>>>>> runCA failed.
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Stack trace:
>>>>>>>>>
>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
>>>>>>>>> main::caFailure('failed to apply the overlap corrections',
>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called
>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>>>>>>>> main::overlapCorrection() called at
>>>>>>>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Last few lines of the relevant log file
>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>>>>>>>
>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open
>>>>>>>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for
>>>>>>>>> reading: No such file or directory
>>>>>>>>> ----------------------------------------
>>>>>>>>> Failure message:
>>>>>>>>>
>>>>>>>>> failed to apply the overlap corrections
>>>>>>>>>
>>>>>>>>> So it obviously cannot find the file salaris.ovlStore/0001~. The
>>>>>>>>> reason, as far as I can see, is that the 0001~ file had already been
>>>>>>>>> updated to 0001 before the run stopped. In fact it seems to have
>>>>>>>>> stopped after updating salaris.ovlStore/0249 (of 430). Is there a way
>>>>>>>>> to tell runCA to continue from 0250~, instead of from 0001~, which is
>>>>>>>>> obviously not there any more?
>>>>>>>>> Another solution I was thinking of is to manually rerun the previous
>>>>>>>>> overlapStore command (the one that was run before starting the
>>>>>>>>> frgcorr and ovlcorr:
>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list >
>>>>>>>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1)
>>>>>>>>> to restore the state from before the frgcorr and ovlcorr steps, and
>>>>>>>>> then resume runCA. This should restore the 0001~ file, right? The
>>>>>>>>> most important thing is that I want to avoid rerunning the frgcorr
>>>>>>>>> and ovlcorr steps, because they were really resource intensive.
>>>>>>>>>
>>>>>>>>> I would really appreciate any comments or suggestions on my problem!
>>>>>>>>> Thanks in advance for your help!
>>>>>>>>>
>>>>>>>>> much obliged,
>>>>>>>>> Christoph
>>>>>>>>>
>>>>>>>>> University of Oslo
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Live Security Virtual Conference
>>>>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>>>>> threat landscape has changed and how IT managers can respond.
>>>>>>>>> Discussions will include endpoint security, mobile security and the
>>>>>>>>> latest in malware threats.
>>>>>>>>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>>>>> _______________________________________________
>>>>>>>>> wgs-assembler-users mailing list
>>>>>>>>> wgs...@li...
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
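
The settings Ole recommends go in the runCA spec file. A minimal fragment, using only the two options quoted in his mail; the comments are explanatory assumptions, not official documentation:

```
doOBT=0                   # reads already trimmed (e.g. via merTrim), skip OBT
doFragmentCorrection=0    # skip the expensive frg/ovl correction stage
```

As Ole notes, applying this to an assembly already in progress may require removing the 3-overlapcorrection directory first.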
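
Christoph's diagnosis suggests a quick way to see where an interrupted `overlapStore -u` stopped: slices whose `~` backup still exists were never rewritten. A hedged sketch, assuming the `NNNN` / `NNNN~` store layout inferred from the error messages above (this is not an official CA tool):

```shell
# Print store slices whose '~' backup still exists, i.e. slices that an
# interrupted 'overlapStore -u' never got to rewrite.  The NNNN / NNNN~
# naming is inferred from the log above, not from documented behavior.
find_pending() {
    store=$1
    for f in "$store"/*~; do
        [ -e "$f" ] || continue   # glob may match nothing; skip the literal
        basename "$f"
    done
}
```

For Christoph's store, `find_pending salaris.ovlStore` would list the remaining un-updated slices (0250~ through 0430~, by his account). Run it against a copy; it only reads, but caution costs nothing.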