From: Walenz, B. <bw...@jc...> - 2012-07-10 15:09:21
Hi, Christoph-

The original overlap store build is difficult to resume. I think it can be
done, but it will take code changes that are probably specific to the case
you have. Only if you do not have the *ovb.gz outputs from overlapper will I
suggest this.

Option 1 is then to restart.

Option 2 is to use a new 'data-parallel' overlap store build
(AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid jobs.
The first job is parallel, and transfers the overlapper output into buckets
for sorting. The second job, also parallel, sorts each bucket. The final
job, sequential, builds an index for the store.

Since this compute is just a collection of jobs, it can be
restarted/resumed/fixed easily.

Its performance can be great -- at JCVI we've seen builds that we estimated
would take 2 days using the original sequential build finish in a few (4?)
hours with the data-parallel version. But on our development cluster, it is
slower than the sequential version. It depends on the disk throughput. Our
dev cluster is powered off of a 6-disk ZFS, while the production side has a
big Isilon.

It is only in CVS. I just added command line help and a bit of
documentation, so do an update first.

Happy to provide help if you want to try it out. More than happy to accept
better documentation.

b

On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote:

> Hi Ole,
>
> Thanks for your reply. I had looked at the preprocessing page you are
> referring to just recently. Sounds like a good approach you are using!
> Will definitely consider that to make the assembly more effective in a
> next try. Thanks for that!
> For now, I think I am pretty much over all the trimming and correction
> steps (once I get this last thing sorted out..). As far as I can see the
> next step is already building the unitigs, so I'll try to finish this
> assembly as it is now. Will try to improve it afterwards.
> I am really curious how a first attempt at a hybrid approach
> (454+Illumina) will perform in comparison to the pure Illumina assemblies,
> which I have pretty much optimized now, I think (and with which I am
> pretty happy, btw).
>
> I am afraid your suggestion to set doFragmentCorrection=0 directly now
> will not work. For the next step (the unitigger) I'll need an intact
> overlap store. As it is now, I think it is useless, being only
> half-updated. I also discovered that just rerunning the previous
> overlapStore command (the one before the frg- and ovlcorrection) is not
> working as I thought it would.
> Seems to be a very unfortunate situation - I really don't know how to
> proceed. It would be fantastic if anyone could give me a tip on what to do!
>
> Thanks for your help!
>
> much obliged,
> Christoph
>
>
> On 09.07.2012 13:20, Ole Kristian Tørresen wrote:
>> Hi Christoph.
>>
>> This is not an answer to your question, but a suggestion for a
>> work-around. If I remember correctly, you have both Illumina and 454
>> reads. Celera runs, as you see below, frgcorrection and overlap based
>> trimming to correct 454 reads, and merTrim to correct Illumina reads
>> (can also be used on 454 reads). What I've been doing lately is to
>> run meryl on a trusted set of Illumina reads, paired-end for example; I
>> ran it on some overlapping reads which I had merged with FLASH. Then
>> you can use the set of trusted k-mers to correct different datasets.
>> For example, I first ran CA to the end of OBT (overlap based trimming)
>> for my 454 reads, and then output the result as fastq files. I used
>> the trusted k-mer set to correct these 454 reads too. If you do this
>> for all your reads, using either merTrim or merTrim/OBT, and do
>> deduplication on all the datasets too, then you'll end up with reads
>> that you can use in assemblies where you skip relatively expensive
>> steps such as frgcorrection.
>>
>> I don't think frgcorrection is that useful for the type of data you're
>> using anyway.
>>
>> If you have a set of corrected reads, you can use these settings for CA:
>> doOBT=0
>> doFragmentCorrection=0
>>
>> When I think of it, you might use doFragmentCorrection=0 on this
>> assembly now. You might have to clean up your directory tree, like
>> removing the 3-overlapcorrection directory and maybe some other steps
>> too. Apply with caution.
>>
>> Most of the stuff I've mentioned I've taken from here:
>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing
>> and discussion with Brian.
>>
>> Ole
>>
>> On 9 July 2012 12:47, Christoph Hahn <chr...@gm...> wrote:
>>> Dear users and developers,
>>>
>>> I have the following problem: In my assembly process I have just completed
>>> the fragment- and overlap error correction. Unfortunately runCA stopped in
>>> the subsequent updating of the overlapStore, because of an incorrectly set
>>> time limit.
>>> If I try to resume the assembly now, I get the following error:
>>> ----------------------------------------START Mon Jul 9 11:05:53 2012
>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u
>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore
>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates >
>>> /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err
>>> 2>&1
>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 (1
>>> seconds)
>>> ERROR: Failed with signal HUP (1)
>>> ================================================================================
>>>
>>> runCA failed.
>>>
>>> ----------------------------------------
>>> Stack trace:
>>>
>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line
>>> 1237
>>> main::caFailure('failed to apply the overlap corrections',
>>> '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called
>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>> main::overlapCorrection() called at
>>> /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>
>>> ----------------------------------------
>>> Last few lines of the relevant log file
>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>
>>> AS_OVS_openBinaryOverlapFile()-- Failed to open
>>> '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for
>>> reading: No such file or directory
>>>
>>> ----------------------------------------
>>> Failure message:
>>>
>>> failed to apply the overlap corrections
>>>
>>>
>>> So it can obviously not find the file /salaris.ovlStore/0001~. The reason
>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has already
>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it seems
>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is there a
>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead of from
>>> 0001~, which is obviously not there any more??
>>> Another solution I was thinking of is to run the previous overlapStore
>>> command again manually (the one that was done before starting the frgcorr
>>> and ovlcorr:
>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c
>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g
>>> /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L
>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list >
>>> /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1) to
>>> restore the status from before the frgcorr and ovlcorr steps, before
>>> resuming runCA. This should restore the 0001~ file, right? The most
>>> important thing is that I want to avoid rerunning the frgcorr and ovlcorr
>>> steps, because these steps were really resource intensive.
>>>
>>> I would really appreciate any comments or suggestions to my problem! Thanks
>>> in advance for your help!
>>>
>>> much obliged,
>>> Christoph
>>>
>>> University of Oslo
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
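
Editorial sketch: the three-stage 'data-parallel' store build described in the reply at the top of this thread (bucket the overlapper output, sort each bucket, then build an index sequentially) can be illustrated roughly as below. This is not the code in AS_RUN/runCA-overlapStoreBuild.pl; the record layout (a_id, b_id, error_rate), the function names, and the reads_per_bucket parameter are all assumptions made for illustration, and everything runs in memory rather than as grid jobs.

```python
from collections import defaultdict

# Hypothetical overlap record: (a_id, b_id, error_rate).
# The real overlap store format is different; this only shows the pattern.

def bucketize(overlaps, reads_per_bucket):
    """Stage 1 (one independent job per overlapper output file):
    route each record to a bucket chosen by its a_id."""
    buckets = defaultdict(list)
    for rec in overlaps:
        buckets[rec[0] // reads_per_bucket].append(rec)
    return buckets

def merge_buckets(per_file_buckets):
    """Gather the pieces of each bucket written by the stage-1 jobs."""
    merged = defaultdict(list)
    for buckets in per_file_buckets:
        for bucket_id, recs in buckets.items():
            merged[bucket_id].extend(recs)
    return merged

def sort_bucket(records):
    """Stage 2 (one independent job per bucket): sort a single bucket."""
    return sorted(records)

def build_index(sorted_buckets):
    """Stage 3 (sequential): map each a_id to (bucket, first offset, count)."""
    index = {}
    for bucket_id in sorted(sorted_buckets):
        for offset, rec in enumerate(sorted_buckets[bucket_id]):
            a_id = rec[0]
            if a_id in index:
                bucket, first, count = index[a_id]
                index[a_id] = (bucket, first, count + 1)
            else:
                index[a_id] = (bucket_id, offset, 1)
    return index

def build_store(overlap_files, reads_per_bucket=100):
    # Each list element in stage1 and each dict entry in stage2 is an
    # independent unit of work, so a failed unit could be rerun on its own.
    stage1 = [bucketize(f, reads_per_bucket) for f in overlap_files]
    merged = merge_buckets(stage1)
    stage2 = {b: sort_bucket(recs) for b, recs in merged.items()}
    return stage2, build_index(stage2)
```

Because stages 1 and 2 consist of independent units, an interrupted run only needs the missing units redone before the sequential index step, which is the property that makes this kind of build easy to restart.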