From: Walenz, B. <bw...@jc...> - 2012-07-11 03:54:51
The first step will create 1 job for each overlapper job. These should be
small memory, but there is some internal buffering done and I usually
request 2 GB for them anyway.

The second step will create '-jobs j' jobs. Memory size here is a giant
unknown. The '-memory m' option will cause the job to not run if it needs
more than that much memory. Currently, if a job refuses to run, you'll have
to increase -memory for these jobs and find a bigger machine.

All jobs in both steps are single-threaded and run independently of each
other.

b

On 7/10/12 6:46 PM, "Christoph Hahn" <chr...@gm...> wrote:

> Hi Brian,
>
> Thanks! Overlaps are being computed now and the CVS version of CA has been
> successfully compiled. Will try the runCA-overlapStoreBuild.pl once the
> overlapper is finished. One question there: I understand that the memory
> usage is regulated by the -jobs j parameter. A higher value for j means
> less memory for every job. How can I specify the number of CPUs to be
> used in the parallel steps?
>
> Thanks for your help! I appreciate it!
>
> cheers,
> Christoph
>
> On 07/10/2012 10:18 PM, Walenz, Brian wrote:
>> Quick guess is that runCA is finding the old ovlStore and assuming it is
>> complete, then continuing on to frgcorr. runCA tests for the existence of
>> name.ovlStore to determine if overlaps are finished; it doesn't check that
>> the store is valid. So, delete *ovlStore* too.
>>
>> Your latest build (from scratch) is suffering from a long standing
>> dependency issue. It needs kmer checked out and 'make install'ed.
>>
>> make[1]: *** No rule to make target `sweatShop.H', needed by `classifyMates.o'. Stop.
>> make[1]: *** Waiting for unfinished jobs....
>> make: *** [objs] Error 1
>>
>> Once kmer is installed, wipe (again) the Linux-amd64 and rebuild.
>>
>> The kmer included in CA7 is too old for the CVS version of CA, so you'll
>> need to grab it from subversion.
>>
>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Check_out_and_Compile
>>
>> b
>>
>>
>> On 7/10/12 4:00 PM, "Christoph Hahn" <chr...@gm...> wrote:
>>
>>> Hi,
>>>
>>> I actually tried to just rerun the overlapper. I moved the 1-overlapper
>>> and the 3-overlapcorrection directories and just ran runCA and it
>>> immediately starts with doing frgcorr. Do you mean recompute from the
>>> very start? Is there a way to avoid recomputing the initial overlaps at
>>> least (it took some 10000 CPU hours)?
>>>
>>> Tried to compile it again - not successful. Ran make in the src
>>> directory (output in makelog) and also in the AS_RUN directory (output
>>> AS_RUN-makelog).
>>>
>>> Thanks,
>>> Christoph
>>>
>>>
>>> On 07/10/2012 09:04 PM, Walenz, Brian wrote:
>>>> Odd, the *gz should only be deleted after the store is successfully
>>>> built. runCA might have been confused by the attempt to rerun. The
>>>> easiest will be to recompute. :-(
>>>>
>>>> I've never seen the 'libCA.a' error before. That particular program is
>>>> the first to get built. Looks like libCA.a wasn't created. My fix for
>>>> most strange compile errors is to remove the entire Linux-amd64
>>>> directory and recompile. If that fails, send along the complete output
>>>> of make and I'll take a look.
>>>>
>>>> b
>>>>
>>>>
>>>> On 7/10/12 2:15 PM, "Christoph Hahn" <chr...@gm...> wrote:
>>>>
>>>>> Hi Brian,
>>>>>
>>>>> Thanks for your reply!
>>>>>
>>>>> I would be happy to try the new parallel overlap store build, but I
>>>>> think I need the *.ovb.gz outputs for that and unfortunately I don't
>>>>> have them any more. Looks like they were deleted after the ovlStore was
>>>>> built. So I guess I'll need to run the overlapper again, first. Am I
>>>>> understanding that correctly?
>>>>>
>>>>> I have downloaded the cvs and tried to make, but I get:
>>>>> *** No rule to make target `libCA.a', needed by `fragmentDepth'. Stop.
>>>>>
>>>>> I really appreciate your help!
>>>>>
>>>>> cheers,
>>>>> Christoph
>>>>>
>>>>>
>>>>> On 07/10/2012 05:09 PM, Walenz, Brian wrote:
>>>>>> Hi, Christoph-
>>>>>>
>>>>>> The original overlap store build is difficult to resume. I think it
>>>>>> can be done, but it will take code changes that are probably specific
>>>>>> to the case you have. Only if you do not have the *ovb.gz outputs from
>>>>>> overlapper will I suggest this.
>>>>>>
>>>>>> Option 1 is then to restart.
>>>>>>
>>>>>> Option 2 is to use a new 'data-parallel' overlap store build
>>>>>> (AS_RUN/runCA-overlapStoreBuild.pl). It runs as a series of three grid
>>>>>> jobs. The first job is parallel, and transfers the overlapper output
>>>>>> into buckets for sorting. The second job, also parallel, sorts each
>>>>>> bucket. The final job, sequential, builds an index for the store.
>>>>>> Since this compute is just a collection of jobs, it can be
>>>>>> restarted/resumed/fixed easily.
>>>>>>
>>>>>> Its performance can be great -- at JCVI we've seen builds that we
>>>>>> estimated would take 2 days using the original sequential build,
>>>>>> finish in a few (4?) hours with the data parallel version. But on our
>>>>>> development cluster, it is slower than the sequential version. It
>>>>>> depends on the disk throughput. Our dev cluster is powered off of a
>>>>>> 6-disk ZFS, while the production side has a big Isilon.
>>>>>>
>>>>>> It is only in CVS. I just added command line help and a bit of
>>>>>> documentation, so do an update first.
>>>>>>
>>>>>> Happy to provide help if you want to try it out. More than happy to
>>>>>> accept better documentation.
>>>>>>
>>>>>> b
>>>>>>
>>>>>>
>>>>>> On 7/10/12 6:47 AM, "Christoph Hahn" <chr...@gm...> wrote:
>>>>>>
>>>>>>> Hi Ole,
>>>>>>>
>>>>>>> Thanks for your reply. I had looked on the preprocessing page you are
>>>>>>> referring to just recently. Sounds like a good approach you are using!
>>>>>>> Will definitely consider that to make the assembly more effective in a
>>>>>>> next try. Thanks for that!
>>>>>>> For now, I think I am pretty much over all the trimming and correction
>>>>>>> steps (once I get this last thing sorted out..). As far as I can see
>>>>>>> the next step is already building the unitigs, so I'll try to finish
>>>>>>> this assembly as it is now. Will try to improve it afterwards. I am
>>>>>>> really curious how a first attempt of a hybrid approach (454+illumina)
>>>>>>> will perform in comparison to the pure illumina assemblies which I
>>>>>>> have pretty much optimized now (and with which I am pretty happy,
>>>>>>> btw), I think.
>>>>>>>
>>>>>>> I am afraid your suggestion to do doFragmentCorrection=0 directly now
>>>>>>> will not work. For the next step (the unitigger) I'll need an intact
>>>>>>> overlap store. As it is now, I think it is useless, being only
>>>>>>> half-updated.. I also discovered that just rerunning the previous
>>>>>>> overlapStore command (the one before the frg- and ovlcorrection) is
>>>>>>> not working as I thought it would.
>>>>>>> Seems to be a very unfortunate situation - really don't know how to
>>>>>>> proceed.. It would be fantastic if anyone could give me a tip what to
>>>>>>> do!!
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>>
>>>>>>> much obliged,
>>>>>>> Christoph
>>>>>>>
>>>>>>>
>>>>>>> On 09.07.2012 13:20, Ole Kristian Tørresen wrote:
>>>>>>>> Hi Christoph.
>>>>>>>>
>>>>>>>> This is not an answer to your question, but a suggestion for a
>>>>>>>> work-around. If I remember correctly, you have both Illumina and 454
>>>>>>>> reads. Celera runs, as you see below, frgcorrection and overlap based
>>>>>>>> trimming to correct 454 reads, and merTrim to correct Illumina reads
>>>>>>>> (can also be used on 454 reads).
>>>>>>>> What I've been doing lately is to run meryl on a trusted set of
>>>>>>>> Illumina reads, paired-end for example; I ran it on some overlapping
>>>>>>>> reads which I had merged with FLASH. Then you can use the set of
>>>>>>>> trusted k-mers to correct different datasets. For example, I first
>>>>>>>> ran CA to the end of OBT (overlap based trimming) for my 454 reads,
>>>>>>>> and then output the result as fastq-files. I used the trusted k-mer
>>>>>>>> set to correct these 454 reads too. If you do this for all your
>>>>>>>> reads, using either merTrim or merTrim/OBT, and do deduplication on
>>>>>>>> all the datasets too, then you'll end up with reads that you can use
>>>>>>>> in assemblies where you skip relatively expensive steps such as
>>>>>>>> frgcorrection.
>>>>>>>>
>>>>>>>> I don't think frgcorrection is that useful for the type of data
>>>>>>>> you're using anyway.
>>>>>>>>
>>>>>>>> If you have a set of corrected reads, you can use these settings for
>>>>>>>> CA:
>>>>>>>> doOBT=0
>>>>>>>> doFragmentCorrection=0
>>>>>>>>
>>>>>>>> When I think of it, you might use doFragmentCorrection=0 on this
>>>>>>>> assembly now. You might have to clean up your directory tree, like
>>>>>>>> removing the 3-overlapcorrection directory and maybe some other steps
>>>>>>>> too. Apply with caution.
>>>>>>>>
>>>>>>>> Most of the stuff I've mentioned I've taken from here:
>>>>>>>> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Preprocessing
>>>>>>>> and discussion with Brian.
>>>>>>>>
>>>>>>>> Ole
>>>>>>>>
>>>>>>>> On 9 July 2012 12:47, Christoph Hahn <chr...@gm...> wrote:
>>>>>>>>> Dear users and developers,
>>>>>>>>>
>>>>>>>>> I have the following problem: In my assembly process I have just
>>>>>>>>> completed the fragment- and overlap error correction. Unfortunately
>>>>>>>>> runCA stopped in the subsequent updating of the overlapStore,
>>>>>>>>> because of an incorrectly set time limit..
>>>>>>>>> If I am trying to resume the assembly now, I get the following error:
>>>>>>>>> ----------------------------------------START Mon Jul 9 11:05:53 2012
>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -u /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/salaris.erates > /projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err 2>&1
>>>>>>>>> ----------------------------------------END Mon Jul 9 11:05:54 2012 (1 seconds)
>>>>>>>>> ERROR: Failed with signal HUP (1)
>>>>>>>>> ================================================================================
>>>>>>>>>
>>>>>>>>> runCA failed.
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Stack trace:
>>>>>>>>>
>>>>>>>>> at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 1237
>>>>>>>>> main::caFailure('failed to apply the overlap corrections', '/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/o...') called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 4077
>>>>>>>>> main::overlapCorrection() called at /usit/titan/u1/chrishah/programmes/wgs-7.0/Linux-amd64/bin/./runCA line 5880
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Last few lines of the relevant log file
>>>>>>>>> (/projects/nn9201k/Celera/work2/salaris1/3-overlapcorrection/overlapStore-update-erates.err):
>>>>>>>>>
>>>>>>>>> AS_OVS_openBinaryOverlapFile()-- Failed to open '/projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore/0001~' for reading: No such file or directory
>>>>>>>>>
>>>>>>>>> ----------------------------------------
>>>>>>>>> Failure message:
>>>>>>>>>
>>>>>>>>> failed to apply the overlap corrections
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So it obviously cannot find the file /salaris.ovlStore/0001~. The
>>>>>>>>> reason is, from what I can see, that the /salaris.ovlStore/0001~
>>>>>>>>> file had already been updated to /salaris.ovlStore/0001 before it
>>>>>>>>> stopped. In fact it seems to have stopped after updating
>>>>>>>>> /salaris.ovlStore/0249 (of 430). Is there a way to tell runCA to
>>>>>>>>> continue from /salaris.ovlStore/0250~, instead of from 0001~, which
>>>>>>>>> is obviously not there any more?
>>>>>>>>>
>>>>>>>>> Another solution I was thinking of is to run the previous
>>>>>>>>> overlapStore command again manually (the one that was done before
>>>>>>>>> starting the frgcorr and ovlcorr:
>>>>>>>>> /xanadu/home/chrishah/programmes/wgs-7.0/Linux-amd64/bin/overlapStore -c /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.BUILDING -g /projects/nn9201k/Celera/work2/salaris1/salaris.gkpStore -i 0 -M 14000 -L /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.list > /projects/nn9201k/Celera/work2/salaris1/salaris.ovlStore.err 2>&1)
>>>>>>>>> to restore the status from before the frgcorr and ovlcorr steps,
>>>>>>>>> before resuming runCA. This should restore the 0001~ file, right?
>>>>>>>>> The most important thing is that I want to avoid rerunning the
>>>>>>>>> frgcorr and ovlcorr steps, because these steps were really resource
>>>>>>>>> intensive.
>>>>>>>>>
>>>>>>>>> I would really appreciate any comments or suggestions to my problem!
>>>>>>>>> Thanks in advance for your help!
>>>>>>>>>
>>>>>>>>> much obliged,
>>>>>>>>> Christoph
>>>>>>>>>
>>>>>>>>> University of Oslo
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------------
>>>>>>>>> Live Security Virtual Conference
>>>>>>>>> Exclusive live event will cover all the ways today's security and
>>>>>>>>> threat landscape has changed and how IT managers can respond.
>>>>>>>>> Discussions will include endpoint security, mobile security and the
>>>>>>>>> latest in malware threats.
>>>>>>>>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>>>>>>> _______________________________________________
>>>>>>>>> wgs-assembler-users mailing list
>>>>>>>>> wgs...@li...
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
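
Editor's sketch: the three-stage 'data-parallel' store build described in the thread (transfer overlaps into buckets, sort each bucket, then build an index sequentially) can be illustrated with a toy Python example. This is not the code in AS_RUN/runCA-overlapStoreBuild.pl. The function names, the (a_id, b_id) tuples standing in for overlap records, the in-memory lists standing in for files on disk, and the max_read_id parameter are all assumptions made for illustration.

```python
def bucketize(job_outputs, num_buckets, max_read_id):
    """Stage 1 (one task per overlapper job in a real run):
    route each overlap to a bucket chosen by the range of its first read ID."""
    width = max_read_id // num_buckets + 1   # IDs covered by each bucket
    buckets = [[] for _ in range(num_buckets)]
    for records in job_outputs:              # each overlapper job's output
        for a_id, b_id in records:
            buckets[a_id // width].append((a_id, b_id))
    return buckets


def sort_bucket(bucket):
    """Stage 2 (one independent task per bucket): sort the bucket's overlaps."""
    return sorted(bucket)


def build_index(sorted_buckets):
    """Stage 3 (sequential): record where each read's overlaps begin
    in the concatenation of the sorted buckets."""
    index = {}
    position = 0
    for bucket in sorted_buckets:
        for a_id, _b_id in bucket:
            index.setdefault(a_id, position)  # keep the first position only
            position += 1
    return index


if __name__ == "__main__":
    # Two hypothetical overlapper job outputs.
    jobs = [[(5, 1), (2, 9)], [(2, 3), (7, 4)]]
    buckets = bucketize(jobs, num_buckets=2, max_read_id=7)
    sorted_buckets = [sort_bucket(b) for b in buckets]
    print(sorted_buckets)
    print(build_index(sorted_buckets))
```

Because the buckets cover disjoint, increasing ID ranges, sorting each bucket on its own yields a globally sorted store, which is why the stage 1 and stage 2 tasks can run independently and a failed task can be rerun without redoing the others. In this toy version, more buckets means fewer records per sort task, which matches the observation in the thread that a higher '-jobs j' value means less memory per job.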