From: Walenz, B. <bw...@jc...> - 2012-11-20 12:41:02
Hi, Xueping. That’s frustrating! Can you send along the QC report?

We’re just finishing up a repetitive fish. We had some success changing the ‘astat’ cutoffs for labeling unitigs unique/not-unique. We used astatHighBound=0 and astatLowBound=-20, based on a plot of unitig length vs. astat (the numbers came from 5-consensus-coverage-stat, but I didn’t do the analysis and would have to pester someone to get any scripts to pass along). If there are large degenerate contigs, this will help by labeling them as unique and letting them be used in scaffolds.

Or, it’s possible that unitig construction was poor. I’ll have to think about how to measure this: are the unitigs small because of bad trimming, low coverage, biased coverage, or repeat boundaries? The signal for all of these looks basically the same, but the way to fix each is quite different.

Sorry I’m not much help yet.

b

On 11/20/12 6:05 AM, "Quan, Xueping" <x....@im...> wrote:

Dear all,

I have a large plant genome (3.5 Gb) with high repeat content (more than 60%). The sequencing data are about 45x of Illumina paired-end and mate-pair reads (after data cleaning) plus 0.5x of 454 mate-pair reads. I have finished the assembly using Celera. However, the contigs (600 Mb) and scaffolds (660 Mb) cover only a small fraction of the genome. Most of the unitig sequence (about 5 Gb) failed to be placed in any scaffold.

Below is my spec file. Could anyone suggest how to improve the assembly?

"
utgGraphErrorRate=0.03 # bogart uses utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate, utgMergeErrorLimit
utgGraphErrorLimit=3.25
#
utgMergeErrorRate=0.045
utgMergeErrorLimit=5.25
ovlErrorRate=0.04 # Larger than utg to allow for correction.
cnsErrorRate=0.08 # Larger than utg to avoid occasional consensus failures
cgwErrorRate=0.10 # Larger than utg to allow contig merges across high-error ends
gkpAllowInefficientStorage=1
#
frgMinLen=64 # fragments shorter than this length are not loaded into the assembler
ovlMinLen=40 # overlaps shorter than this length are not computed
#
merSize =22 # default=22; use lower to combine across heterozygosity, higher to separate near-identical repeat copies
overlapper=ovl # the mer overlapper for 454-like data is insensitive to homopolymer problems but requires more RAM and disk
#UNITIGGER configuration
unitigger = bogart
batMemory=650
utgBubblePopping = 1
batThreads=64
#
utgGenomeSize = 3.5gb
#
# MERYL calculates K-mer seeds
merylMemory = 512000
merylThreads = 32
#
# OVERLAPPER calculates overlaps
ovlHashBits=24
ovlHashBlockLength=700000000
ovlThreads = 2
ovlConcurrency = 32
ovlRefBlockSize = 320000000
#
# OVERLAP STORE build the database
ovlStoreMemory = 109210 # MB
# ERROR CORRECTION not applied to overlaps
doFragmentCorrection=0
# Scaffolder
# CONSENSUS configuration
cnsConcurrency = 64
L1_GAIIx.frg
L2_GAIIx.frg
L3_GAIIx.frg
L4_GAIIx.frg
L5_GAIIx.frg
L6_GAIIx.frg
L7_GAIIx.frg
L8_GAIIx.frg
L3_HiSeq.frg
L4_HiSeq.frg
L5_HiSeq.frg
L6_HiSeq.frg
L1_454.frg
L2_454.frg
L3_454.frg
L4_454.frg
L5_454.frg
L6_454.frg
L7_454.frg
L8_454.frg
L9_454.frg
L10_454.frg
"

Thanks very much!

Xueping Quan
Imperial College London
Tel: +44(0)207 594 17 80
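The ‘astat’ cutoffs in the reply refer to the A-statistic. This is the log-odds that a unitig's read count fits a single-copy region rather than a collapsed two-copy repeat. For a unitig of length ρ holding k reads, with a global arrival rate r = n/G (n reads over genome size G), Poisson arrivals give A = ρ·r − k·ln 2. Positive values favor "unique" and negative values favor "repeat". The sketch below computes A and bins unitigs using the astatLowBound=-20 / astatHighBound=0 values from the reply. It rests on several assumptions. The `Unitig` record and the function names are hypothetical. Using the full unitig length for ρ is a simplification. The three-way unique/ambiguous/repeat split is only an illustration, and Celera Assembler's own logic may treat the range between the two bounds differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Unitig:
    # Hypothetical record; a real pipeline would parse these values from assembler output.
    name: str
    length: int     # unitig length in bp, used here as a stand-in for rho
    num_reads: int  # k: number of reads placed in the unitig


def arrival_rate(total_reads: int, genome_size: int) -> float:
    """Global read-arrival rate r = n / G (read starts per base)."""
    return total_reads / genome_size


def a_stat(u: Unitig, rate: float) -> float:
    """Log-odds that u is single-copy rather than a collapsed two-copy repeat.

    With Poisson arrivals, ln(P_unique / P_two_copy) = rho * r - k * ln 2;
    positive values favor 'unique', negative values favor 'repeat'.
    """
    return u.length * rate - u.num_reads * math.log(2)


def classify(astat: float, low: float = -20.0, high: float = 0.0) -> str:
    """Hypothetical three-way split using the low/high bounds from the reply."""
    if astat >= high:
        return "unique"
    if astat < low:
        return "repeat"
    return "ambiguous"


if __name__ == "__main__":
    # Toy numbers only: 1M reads over a 100 Mb genome gives r = 0.01,
    # so a 10 kb single-copy unitig should hold about 100 reads.
    r = arrival_rate(1_000_000, 100_000_000)
    for u in [Unitig("utg_a", 10_000, 100),   # coverage as expected
              Unitig("utg_b", 10_000, 200),   # twice the expected reads
              Unitig("utg_c", 10_000, 150)]:  # somewhere in between
        a = a_stat(u, r)
        print(f"{u.name}\t{a:.2f}\t{classify(a)}")
```

With these toy numbers, utg_a scores about 30.7 (unique) and utg_b about -38.6 (repeat). utg_c scores about -4.0, which falls between the two bounds. Plotting unitig length against A for a real assembly is one way to see where such bounds should sit.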