From: Brian W. <th...@gm...> - 2015-05-11 11:12:03
60 TB?  Repeats ho!

For future use: the output of the overlapper is gzip compressed, and will be
uncompressed, and then doubled (each overlap is stored twice, once for each
read) when it goes into the ovlStore.

What CA version are you using?  Later versions were optimized more for PacBio
than for Illumina, and one change made data sizes larger than they really
need to be.  In file AS_global.H, AS_READ_MAX_NORMAL_LEN_BITS should be set
to 11 (2047 bases); a sketch of that edit is below.  Smaller won't help.

Definitely drop the shorter reads.  The historical minimum is 64 bases, but
I'd just drop anything that didn't stitch together.  Losing the mates here
won't degrade the assembly, and it will make the assembly easier to run.

I don't have a good feel for how much coverage is too much.  One way to look
at coverage is to compute the expected overlap size.  For 60x coverage of
280bp reads, you'd sample a read about every 5bp, so the expected overlap is
275bp; 30x coverage would drop this only to 270bp (the arithmetic is sketched
below).  Do you have a (cumulative) histogram of read lengths?

In addition to throwing out short reads, definitely increase the minimum
overlap size (ovlMinLen) to the length of the shortest read, minus 1 (or 2,
or ...); example spec lines are below.

What kmer threshold did it pick (0-mercounts, one of the *err files)?  Can
you send the histogram file?  Plotting the first two columns should show a
definite hump at the expected coverage, with a large tail.  Any humps after
that are repeats that probably should be excluded from seeding overlaps.  Be
sure to check way out on the X axis, with Y zoomed in, for any very common
repeats.

Increasing the kmer size probably won't reduce disk usage, but it might
reduce run time, at the expense of sensitivity.  Illumina reads are usually
pretty clean, and you can probably get away with merSize=28.

Any chance there is adapter present?  I did that once.  Had it completed, I
would have ended up with a near complete graph: every read overlapping every
other read.

If you have pre-trimmed your reads, you can skip OBT, saving a round of
overlapper and ovlStore.

Finally, nope, the mer overlapper will use more disk.  Lots and lots more.
It was appropriate for 454 microbes.

b
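P.S.  Here is roughly what that AS_global.H edit looks like.  This is only a
sketch; the default value and the surrounding comments differ between CA
releases, so edit your own copy of the file rather than pasting this in, and
recompile afterwards.

  // AS_global.H -- read lengths are stored in a fixed number of bits, so
  // the longest representable read is 2^BITS - 1 bases.  11 bits (2047
  // bases) is plenty for stitched Illumina reads and keeps the on-disk
  // records smaller than the PacBio-oriented default.
  #define AS_READ_MAX_NORMAL_LEN_BITS   11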
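The expected-overlap arithmetic, as a minimal sketch.  It assumes read starts
are spaced evenly along the genome, which is an idealization, but it shows
how little the overlaps shrink when you halve the coverage:

  #include <cstdio>

  int main() {
    const double readLen    = 280.0;                 // stitched read length
    const double coverage[] = { 120.0, 60.0, 30.0 };

    for (double c : coverage) {
      double spacing = readLen / c;        // bases between adjacent read starts
      double overlap = readLen - spacing;  // expected overlap of adjacent reads
      printf("%5.0fx: a read starts every %4.1f bp, expected overlap %5.1f bp\n",
             c, spacing, overlap);
    }
    return 0;
  }

That prints about 277.7bp at 120x, 275.3bp at 60x, and 270.7bp at 30x, which
is why throwing away half the data costs almost nothing in overlap length.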
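And example spec lines for the two settings mentioned above.  The 149 is a
hypothetical placeholder for "length of the shortest read you keep, minus
one"; substitute your own cutoff:

  # require overlaps nearly as long as the shortest kept read
  ovlMinLen = 149

  # larger seed mer; Illumina reads are usually clean enough for this
  merSize = 28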
On Fri, May 8, 2015 at 9:01 PM, Langhorst, Brad <Lan...@ne...> wrote:
> Hi:
>
> I’m trying to build unitigs for a ~3 Gbase organism.
>
> I’m about 1/6 of the way through the 0-overlaptrim-overlap step and I’m
> already using about 10T of disk, and I had to pause the job since that’s
> all I have... I can’t get 60T of disk.
>
> I have ~1.5B stitched Illumina reads, ~50-280bp (used PEAR to stitch,
> since wgs could not handle R1+R2 frags due to integer overflow).
> I figure about 120X, or 60X per allele.
> Too much?
> Should I toss half of the shorter reads?
> How much will that help with disk space?
>
> I’m using the ovl overlapper…
> Will the mer overlapper use less disk?
>
> Is there another setting I can change to use less disk?
>
> Brad
>
> Spec file:
>
> merylMemory = 128000
> merylThreads = 16
>
> mbtThreads = 8
>
> ovlStoreMemory=8192
> #ovlStoreMemory=10000
>
> useGrid = 1
> scriptOnGrid = 0
>
> ovlHashBits=25
>
> # with this setting I observe about 2 runs at full capacity and one at
> # 20 - can't use a smaller number because the job count is too large for sge
> #ovlHashBlockLength=1800000000
> ovlHashBlockLength=2880000000   # should be about 400%
> ovlThreads=8
>
> #ovlOverlapper=mer
> #merOverlapperThreads=8
> #merOverlapperSeedBatchSize=1000000
>
> frgCorrBatchSize = 10000000
> frgCorrThreads = 8
>
> unitigger=bogart
> batThreads=8
> stopAfter=unitigger
> sge=-p -100 -pe smp 8
>
> frgCorrOnGrid=1
> ovlCorrOnGrid=1
>
> --
> Brad Langhorst, Ph.D.
> Applications and Product Development Scientist
>
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users