From: Walenz, B. <bw...@jc...> - 2012-05-10 17:50:41
Hi, Ole-

ovlHashLibrary=2 does mean to load only reads from the second library
into the hash table. In this case, that's the PacBio reads. The 'ref'
library specifies which fragments we search against the hash table.
ovlRefLibrary=1-1 translates to 'starting at library 1 and ending at
library 1'. Overlaps will be computed between library 1 and library 2,
but not within the same library.

I should point out that this isn't implemented perfectly. The overlap
jobs for computing overlaps within library 1 are still launched, and the
hash tables are still built, but no overlaps are output. The
'overlap_partition' command is responsible for setting up the hash and
reference ranges for each overlap job, and it isn't aware of the
ovlHashLibrary/ovlRefLibrary options.

We've recently been disabling OBT (and fragmentCorrection) in runCA, and
doing all trimming/correction outside the assembler. In your case, you
can run the assembler up through OBT on all your 454 reads, then dump
gatekeeper to build a trimmed fragment set. If you're using CVS tip,
dumping as fastq will work too. With the PacBio reads, this is
mandatory, since the pipeline will split some of the PacBio reads into
multiple pieces.

The obt overlaps and the ovl overlaps used for assembly aren't
compatible. The obt overlaps are more like blast matches (align a-b in
read 1 to c-d in read 2) while the ovl overlaps are ... overlaps; see
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Overlaps .
Since trimming will change the length of the read, it's impossible to
translate the overlaps on untrimmed reads to overlaps on trimmed reads.

b

On 5/10/12 4:53 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:

> Hi,
> we have started doing some sequencing on PacBio, and correcting the
> reads with the PacBioToCA pipeline. The genome is about 800 Mb, and we're
> trying to correct the PacBio reads from two SMRTcells with about 20x
> in 454 reads.
> This translates to 130,389 PacBio reads with 126 Mb
> sequence, and 47M 454 reads and 17.6 Gb sequence.
>
> We see that 0-overlaptrim-overlap uses quite a bit of time, and I fear
> that 1-overlapper will use a long time too. Is it possible to compute
> the overlaps between the 454 reads ahead of time, and use the overlaps
> from that store to only compute the overlaps between 454 reads and
> PacBio reads? I guess most of the time is spent computing the
> overlaps between 454 reads. This could be useful for assembly in
> general too: sometimes we only input some data to have a faster
> assembly, while later on we input more.
>
> When I look at the command that's used to run CA in the error
> correction step: runCA -s pacbio.spec -p asm -d temppacbio
> ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1
> obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm
> stopAfter=overlapper, does it actually do something like what I ask for? It
> only loads hash fragments from library 2, but it loads all libraries
> in the other *Library options (1-1 = 0)? Could anyone explain to me
> what that really means?
>
> Sincerely,
> Ole
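
For reference, the library selection described in the reply can be sketched in a few lines of Python. This is only an illustration of the stated semantics, not code from the assembler: the function names are hypothetical, it assumes a bare number such as "2" is shorthand for the range "2-2", and it assumes the hash and ref filters are applied together as a simple "both must match" test.

```python
def parse_library_range(spec):
    """Parse a spec like '2' or '1-1' into an inclusive (first, last) pair.

    Assumption: a bare number N is treated as the range N-N.
    """
    first, _, last = spec.partition("-")
    first = int(first)
    last = int(last) if last else first
    if first > last:
        raise ValueError("range start exceeds range end: %r" % spec)
    return first, last


def in_range(library, bounds):
    """True if the library ID falls inside the inclusive bounds."""
    return bounds[0] <= library <= bounds[1]


def overlaps_are_output(hash_library, ref_library, hash_spec, ref_spec):
    """Model of the filter: hash-side reads must come from the hash range,
    and the reads searched against them must come from the ref range."""
    return (in_range(hash_library, parse_library_range(hash_spec))
            and in_range(ref_library, parse_library_range(ref_spec)))


# Settings from the quoted runCA command: ovlHashLibrary=2 ovlRefLibrary=1-1
libraries = (1, 2)  # 1 = 454 reads, 2 = PacBio reads
pairs = [(h, r) for h in libraries for r in libraries
         if overlaps_are_output(h, r, "2", "1-1")]
print(pairs)
```

Under these assumptions the only (hash, ref) combination that survives is library 2 against library 1, which matches the description that overlaps are computed between the two libraries but not within either one.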