From: Christoph H. <chr...@gm...> - 2012-07-10 22:46:20
|
Hi Brian,

Thanks! Overlaps are being computed now, and the CVS version of CA has compiled successfully. I will try runCA-overlapStoreBuild.pl once the overlapper has finished.

One question there: I understand that memory usage is regulated by the -jobs j parameter, and that a higher value for j means less memory for every job. How can I specify the number of CPUs to be used in the parallel steps?

Thanks for your help! I appreciate it!

cheers,
Christoph

On 07/10/2012 10:18 PM, Walenz, Brian wrote:
> Quick guess is that runCA is finding the old ovlStore and assuming it is
> complete, then continuing on to frgcorr. runCA tests for the existence of
> name.ovlStore to determine if overlaps are finished; it doesn't check that
> the store is valid. So, delete *ovlStore* too.
>
> Your latest build (from scratch) is suffering from a long-standing
> dependency issue. It needs kmer checked out and 'make install'ed.
>
> make[1]: *** No rule to make target `sweatShop.H', needed by `classifyMates.o'. Stop.
> make[1]: *** Waiting for unfinished jobs....
> make: *** [objs] Error 1
>
> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild.
>
> The kmer included in CA7 is too old for the CVS version of CA, so you'll
> need to grab it from subversion.
>
> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile
>
> b
>
>
> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote:
>
>> Hi,
>>
>> I actually tried to just rerun the overlapper. I moved the 1-overlapper
>> and the 3-overlapcorrection directories and just ran runCA, and it
>> immediately starts with doing frgcorr. Do you mean recompute from the
>> very start? Is there a way to avoid recomputing the initial overlaps at
>> least (it took some 10000 CPU hours)?
>>
>> Tried to compile it again - not successful. Ran make in the src
>> directory (output in makelog) and also in the AS_RUN directory (output
>> AS_RUN-makelog).
>>
>> Thanks,
>> Christoph
>>
>>
>> On 07/10/2012 09:04 PM, Walenz, Brian wrote:
>>> Odd, the *gz should only be deleted after the store is successfully built.
>>> runCA might have been confused by the attempt to rerun. The easiest will be
>>> to recompute. :-(
>>>
>>> I've never seen the 'libCA.a' error before. That particular program is the
>>> first to get built. Looks like libCA.a wasn't created. My fix for most
>>> strange compile errors is to remove the entire Linux-amd64 directory and
>>> recompile. If that fails, send along the complete output of make and I'll
>>> take a look.
>>>
>>> b
>>>
>>>
>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote:
>>>
>>>> Hi Brian,
>>>>
>>>> Thanks for your reply!
>>>>
>>>> I would be happy to try the new parallel overlap store build, but I
>>>> think I need the *.ovb.gz outputs for that and unfortunately I don't have
>>>> them any more. Looks like they were deleted after the ovlStore was
>>>> built. So I guess I'll need to run the overlapper again first. Am I
>>>> understanding that correctly?
>>>>
>>>> I have downloaded the CVS version and tried to make, but I get:
>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop.
>>>>
>>>> I really appreciate your help!
>>>>
>>>> cheers,
>>>> Christoph
>>>>
>>>>
>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote:
>>>>> Hi, Christoph-
>>>>>
>>>>> The original overlap store build is difficult to resume. I think it can be
>>>>> done, but it will take code changes that are probably specific to the case
>>>>> you have. Only if you do not have the *ovb.gz outputs from overlapper will
>>>>> I suggest this.
>>>>>
>>>>> Option 1 is then to restart.
>>>>>
>>>>> Option 2 is to use a new 'data-parallel' overlap store build
>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid
>>>>> jobs. The first job is parallel, and transfers the overlapper output into
>>>>> buckets for sorting. The second job, also parallel, sorts each bucket.
>>>>> The final job, sequential, builds an index for the store. Since this
>>>>> compute is just a collection of jobs, it can be restarted/resumed/fixed
>>>>> easily.
>>>>>
>>>>> Its performance can be great -- at JCVI we've seen builds that we estimated
>>>>> would take 2 days using the original sequential build, finish in a few (4?)
>>>>> hours with the data parallel version. But on our development cluster, it
>>>>> is slower than the sequential version. It depends on the disk throughput.
>>>>> Our dev cluster is powered off of a 6-disk ZFS, while the production side
>>>>> has a big Isilon.
>>>>>
>>>>> It is only in CVS. I just added command line help and a bit of
>>>>> documentation, so do an update first.
>>>>>
>>>>> Happy to provide help if you want to try it out. More than happy to accept
>>>>> better documentation.
>>>>>
>>>>> b
>>>>>
>>>>>
>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote:
>>>>>
>>>>>> Hi Ole,
>>>>>>
>>>>>> Thanks for your reply. I had looked at the preprocessing page you are
>>>>>> referring to just recently. Sounds like a good approach you are using!
>>>>>> Will definitely consider that to make the assembly more effective in a
>>>>>> next try. Thanks for that!
>>>>>> For now, I think I am pretty much over all the trimming and correction
>>>>>> steps (once I get this last thing sorted out). As far as I can see, the
>>>>>> next step is already building the unitigs, so I'll try to finish this
>>>>>> assembly as it is now. Will try to improve it afterwards. I am really
>>>>>> curious how a first attempt at a hybrid approach (454+Illumina) will
>>>>>> perform in comparison to the pure Illumina assemblies, which I think I
>>>>>> have pretty much optimized now (and with which I am pretty happy, btw).
>>>>>>
>>>>>> I am afraid your suggestion to do doFragmentCorrection=0 directly now
>>>>>> will not work. For the next step (the unitigger) I'll need an intact
>>>>>> overlap store.
>>>>>> As it is now, I think it is useless, being only
>>>>>> half-updated. I also discovered that just rerunning the previous
>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is not
>>>>>> working as I thought it would.
>>>>>> Seems to be a very unfortunate situation - I really don't know how to
>>>>>> proceed. It would be fantastic if anyone could give me a tip on what to do!
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> much obliged,
>>>>>> Christoph
>>>>>>
>>>>>>
>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote:
>>>>>>> Hi Christoph.
>>>>>>>
>>>>>>> This is not an answer to your question, but a suggestion for a
>>>>>>> work-around. If I remember correctly, you have both Illumina and 454
>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based
>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads
>>>>>>> (can also be used on 454 reads). What I've been doing lately is to
>>>>>>> run meryl on a trusted set of Illumina reads (paired-end, for example;
>>>>>>> I ran it on some overlapping reads which I had merged with FLASH). Then
>>>>>>> you can use the set of trusted k-mers to correct different datasets.
>>>>>>> For example, I first ran CA to the end of OBT (overlap based trimming)
>>>>>>> for my 454 reads, and then output the result as fastq files. I used
>>>>>>> the trusted k-mer set to correct these 454 reads too. If you do this
>>>>>>> for all your reads, using either merTrim or merTrim/OBT, and do
>>>>>>> deduplication on all the datasets too, then you'll end up with reads
>>>>>>> that you can use in assemblies where you skip relatively expensive
>>>>>>> steps such as frgcorrection.
>>>>>>>
>>>>>>> I don't think frgcorrection is that useful for the type of data you're
>>>>>>> using anyway.
>>>>>>>
>>>>>>> If you have a set of corrected reads, you can use these settings for CA:
>>>>>>> doOBT=0
>>>>>>> doFragmentCorrection=0
>>>>>>>
>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this
>>>>>>> assembly now. You might have to clean up your directory tree, like
>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps
>>>>>>> too. Apply with caution.
>>>>>>>
>>>>>>> Most of the stuff I've mentioned I've taken from here:
>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing
>>>>>>> and discussion with Brian.
>>>>>>>
>>>>>>> Ole
>>>>>>>
>>>>>>> On 9 July 2012 12:47, Christoph Hahn<chr...@gm...> wrote:
>>>>>>>> Dear users and developers,
>>>>>>>>
>>>>>>>> I have the following problem: In my assembly process I have just
>>>>>>>> completed the fragment- and overlap error correction. Unfortunately runCA
>>>>>>>> stopped in the subsequent updating of the overlapStore, because of an
>>>>>>>> incorrectly set time limit..
>>>>>>>> If I am trying to resume the assembly now, I get the following error:
>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012
>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates > /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 (1 seconds)
>>>>>>>> ERROR: Failed with signal HUP (1)
>>>>>>>> ================================================================================
>>>>>>>>
>>>>>>>> runCA failed.
>>>>>>>>
>>>>>>>> ----------------------------------------
>>>>>>>> Stack trace:
>>>>>>>>
>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
>>>>>>>> main::caFailure('failed to apply the overlap corrections', '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>>>>>>> main::overlapCorrection() called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>>>>>>
>>>>>>>> ----------------------------------------
>>>>>>>> Last few lines of the relevant log file
>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>>>>>>
>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for reading: No such file or directory
>>>>>>>>
>>>>>>>> ----------------------------------------
>>>>>>>> Failure message:
>>>>>>>>
>>>>>>>> failed to apply the overlap corrections
>>>>>>>>
>>>>>>>>
>>>>>>>> So it can obviously not find the file /salaris.ovlStore/0001~. The reason
>>>>>>>> is, from what I can see, that the /salaris.ovlStore/0001~ file has already
>>>>>>>> been updated to /salaris.ovlStore/0001 before it stopped. In fact it seems
>>>>>>>> to have stopped after updating /salaris.ovlStore/0249 (of 430). Is there a
>>>>>>>> way to tell runCA to continue from /salaris.ovlStore/0250~, instead of from
>>>>>>>> 0001~, which is obviously not there any more??
>>>>>>>> Another solution I was thinking of is to run the previous overlapStore
>>>>>>>> command again manually (the one that was done before starting the frgcorr
>>>>>>>> and ovlcorr:
>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list > /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1)
>>>>>>>> to restore the status from before the frgcorr and ovlcorr steps, before
>>>>>>>> resuming runCA. This should restore the 0001~ file, right? The most
>>>>>>>> important thing is that I want to avoid rerunning the frgcorr and ovlcorr
>>>>>>>> steps, because these steps were really resource intensive.
>>>>>>>>
>>>>>>>> I would really appreciate any comments or suggestions to my problem!
>>>>>>>> Thanks in advance for your help!
>>>>>>>>
>>>>>>>> much obliged,
>>>>>>>> Christoph
>>>>>>>>
>>>>>>>> University of Oslo
>>>>>>>>
>>>>>>>>
>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>> Live Security Virtual Conference
>>>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>>>> threat landscape has changed and how IT managers can respond. Discussions
>>>>>>>> will include endpoint security, mobile security and the latest in malware
>>>>>>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>>>> _______________________________________________
>>>>>>>> wgs-assembler-users mailing list
>>>>>>>> wgs...@li...
>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
|
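Editorial sketch: the three-stage 'data-parallel' store build that Brian describes in this thread (transfer overlaps into buckets, sort each bucket, then build an index sequentially) can be pictured with the toy Python below. This is only an illustration of the idea and not the code in AS_RUN/runCA-overlapStoreBuild.pl. The record layout (a_read, b_read, error_rate), the function names, and the reads_per_bucket parameter are all hypothetical; the real store format and options are not described in the thread.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical overlap record: (a_read_id, b_read_id, error_rate).
# Read IDs are assumed to be non-negative integers.


def bucketize(overlaps, reads_per_bucket):
    """Stage 1: route each overlap to a bucket chosen by its first read ID."""
    buckets = defaultdict(list)
    for ovl in overlaps:
        buckets[ovl[0] // reads_per_bucket].append(ovl)
    return buckets


def sort_buckets(buckets, workers=2):
    """Stage 2: sort every bucket independently; each sort could be its own job."""
    keys = sorted(buckets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_lists = list(pool.map(sorted, (buckets[k] for k in keys)))
    return dict(zip(keys, sorted_lists))


def build_index(sorted_buckets):
    """Stage 3: one sequential pass that records where each read's overlaps start."""
    index, store = {}, []
    for key in sorted(sorted_buckets):
        for ovl in sorted_buckets[key]:
            index.setdefault(ovl[0], len(store))
            store.append(ovl)
    return store, index


def build_store(overlaps, reads_per_bucket=100, workers=2):
    """Run the three stages in order and return (sorted store, read -> offset index)."""
    buckets = bucketize(overlaps, reads_per_bucket)
    return build_index(sort_buckets(buckets, workers))
```

Because every bucket is sorted on its own, a failed sort can be rerun without touching the others, which matches the point above that the build can be restarted or resumed easily. In this toy, a smaller reads_per_bucket gives more, smaller buckets, so each sort holds fewer records in memory; that is presumably the same trade-off as the -jobs parameter asked about at the top of the thread, but that link is an assumption.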