From: Ole K. T. <o.k...@bi...> - 2012-05-10 18:55:20
Hi Brian.

Thank you for this, good to know. Our PacBio fastq files were over
multiple lines (SMRT-Portal 1.3... Thank you a lot PacBio!), and the
correction pipeline ran for 17 days taking up 48 CPUs, and I guess we can
just kill it now.

On 10 May 2012 19:50, Walenz, Brian <bw...@jc...> wrote:
> Hi, Ole-
>
> ovlHashLibrary=2 does mean to load only reads from the second library into
> the hash table. In this case, it's the pac bio reads. The 'ref' library is
> what fragments we search against the hash table. ovlRefLibrary=1-1
> translates to 'starting at library 1 and ending at library 1'. Overlaps
> will be computed between library 1 and 2, but not in the same library.
>
> I should point out that this isn't implemented perfectly. The overlap jobs
> for computing overlaps within library 1 are still launched, and the hash
> tables are still built, but no overlaps are output. The 'overlap_partition'
> command is responsible for setting up the hash and reference ranges for each
> overlap job, and this isn't aware of the ovlHashLibrary/ovlRefLibrary
> options.
>
> We've been recently disabling OBT (and fragmentCorrection) in runCA, and
> doing all trimming/correction outside the assembler. In your case, you can
> run the assembler up through OBT on all your 454 reads, then dump gatekeeper
> to build a trimmed fragment set. If you're using CVS tip, dumping as fastq
> will work too. With the pacbio reads, this is mandatory, since the pipeline
> will split some of the pacbio reads into multiple pieces.

I saw some submissions to the CVS about this, but couldn't figure out
exactly what it meant. This clears that up. I recently started an
assembly with 454 and Illumina reads (Illumina corrected with Quake), and
correct-frags has run for several days now. Should I run OBT on all my
454 reads, dump the trimmed reads, and use them in a new assembly with
the error-corrected Illumina reads? The default with the CVS tip will
then be to not run correct-frags etc. on those reads?
What will be the effect of using these trimmed 454 reads for PacBio
error correction?

>
> The obt overlaps and ovl overlaps used for assembly aren't compatible. The
> obt overlaps are more like blast matches (align a-b in read 1 to c-d in read
> 2) while the ovl overlaps are ... overlaps; see
> http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Overlaps .
> Since trimming will change the length of the read, it's impossible to
> translate the overlaps on untrimmed reads to overlaps on trimmed reads.

I hadn't seen that page. It's a useful reference (as are other "hidden"
pages at that wiki).

Ole

>
> b
>
>
>
> On 5/10/12 4:53 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:
>
>> Hi,
>> we have started doing some sequencing on PacBio, and correcting the
>> reads with the PacBioToCA pipeline. The genome is about 800 Mb, and we're
>> trying to correct the PacBio reads from two SMRTcells with about 20x
>> in 454 reads. This translates to 130,389 PacBio reads with 126 Mb
>> sequence, and 47M 454 reads and 17.6 Gb sequence.
>>
>> We see that 0-overlaptrim-overlap uses quite a bit of time, and I fear
>> that 1-overlapper will use a long time too. Is it possible to compute
>> the overlaps between the 454 reads ahead of time, and use the overlaps
>> from that store to only compute the overlaps between 454 reads and
>> PacBio reads? I guess most of the time is spent computing the
>> overlaps between 454 reads. This could be useful for assembly in
>> general too: sometimes we only input some data to have a faster
>> assembly, while later on we input more.
>>
>> When I look at the command that's used to run CA in the error
>> correction step: runCA -s pacbio.spec -p asm -d temppacbio
>> ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1
>> obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm
>> stopAfter=overlapper, does it actually do what I ask for?
>> It only loads hash fragments from library 2, but it loads all libraries
>> in the other *Library options (1-1 = 0)? Could anyone explain to me
>> what that really means?
>>
>> Sincerely,
>> Ole
>>
>> _______________________________________________
>> wgs-assembler-users mailing list
>> wgs...@li...
>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
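
Editor's note: the library-range semantics Brian describes in this thread (ovlHashLibrary=2 restricts the hash table to library 2, ovlRefLibrary=1-1 means "starting at library 1 and ending at library 1") can be illustrated with a small Python sketch. This is not runCA's code. The function names `parse_library_range` and `overlap_library_pairs` are hypothetical, and treating a bare number such as `2` as a range of one library is an assumption based only on the command quoted above.

```python
def parse_library_range(spec):
    """Turn a spec such as '2' or '1-1' into an inclusive (first, last) pair.

    Assumption: a bare number means a range containing only that library.
    """
    first, sep, last = spec.partition("-")
    if not sep:
        last = first
    first, last = int(first), int(last)
    if first > last:
        raise ValueError("range start is after range end: " + spec)
    return first, last


def overlap_library_pairs(hash_spec, ref_spec):
    """List the (reference library, hash library) pairs that would be compared."""
    hash_first, hash_last = parse_library_range(hash_spec)
    ref_first, ref_last = parse_library_range(ref_spec)
    return [
        (ref_lib, hash_lib)
        for ref_lib in range(ref_first, ref_last + 1)
        for hash_lib in range(hash_first, hash_last + 1)
    ]


# Values taken from the command in the thread:
# ovlHashLibrary=2 ovlRefLibrary=1-1
pairs = overlap_library_pairs("2", "1-1")
print(pairs)  # [(1, 2)]
```

With the values from the thread, the only pair is reference library 1 (the 454 reads) against hash library 2 (the PacBio reads), which matches Brian's statement that overlaps are computed between library 1 and 2 but not within the same library.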