From: Ole K. T. <o.k...@bi...> - 2012-05-10 08:53:47
Hi,

We have started doing some sequencing on PacBio, and are correcting the reads with the PacBioToCA pipeline. The genome is about 800 Mb, and we're trying to correct the PacBio reads from two SMRTcells with about 20x coverage in 454 reads. This translates to 130,389 PacBio reads with 126 Mb of sequence, and 47M 454 reads with 17.6 Gb of sequence.

We see that 0-overlaptrim-overlap takes quite a bit of time, and I fear that 1-overlapper will take a long time too. Is it possible to compute the overlaps between the 454 reads ahead of time, and then use that overlap store so that only the overlaps between 454 reads and PacBio reads need to be computed? I guess most of the time is spent computing the overlaps between 454 reads. This could be useful for assembly in general too: sometimes we input only some of the data to get a faster assembly, and later on we add more.

This is the command that's used to run CA in the error correction step:

    runCA -s pacbio.spec -p asm -d temppacbio ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1 obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm stopAfter=overlapper

Does it actually do what I am asking for? It only loads hash fragments from library 2, but it loads all libraries in the other *Library options (1-1 = 0)? Could anyone explain to me what that really means?

Sincerely,
Ole
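
For a rough sense of scale, the figures quoted above can be turned into coverage, mean read length, and a naive count of candidate read pairs. This is only a back-of-the-envelope sketch in Python: the function names are made up for illustration, and the all-pairs count is an assumed upper bound, not how the CA overlapper works (a real overlapper seeds on shared k-mers and examines far fewer pairs).

```python
# Figures quoted in the message; everything else here is illustrative.
GENOME_BP = 800_000_000        # ~800 Mb genome
PACBIO_READS = 130_389
PACBIO_BP = 126_000_000        # 126 Mb of PacBio sequence
R454_READS = 47_000_000
R454_BP = 17_600_000_000       # 17.6 Gb of 454 sequence


def coverage(total_bp, genome_bp):
    """Fold coverage: sequenced bases divided by genome size."""
    return total_bp / genome_bp


def mean_read_length(total_bp, n_reads):
    """Average read length in bases."""
    return total_bp / n_reads


def pairs_within(n):
    """Unordered pairs inside one read set (naive all-vs-all bound)."""
    return n * (n - 1) // 2


def pairs_between(n, m):
    """Pairs with one read from each of two read sets (naive bound)."""
    return n * m


if __name__ == "__main__":
    print("454 coverage:       ", coverage(R454_BP, GENOME_BP))
    print("PacBio coverage:    ", coverage(PACBIO_BP, GENOME_BP))
    print("mean PacBio length: ", mean_read_length(PACBIO_BP, PACBIO_READS))
    print("mean 454 length:    ", mean_read_length(R454_BP, R454_READS))

    within = pairs_within(R454_READS)
    between = pairs_between(R454_READS, PACBIO_READS)
    print("454-vs-454 pairs:   ", within)
    print("454-vs-PacBio pairs:", between)
    print("ratio:              ", within / between)
```

Under this naive bound, the 454-vs-454 comparisons outnumber the 454-vs-PacBio comparisons by roughly two orders of magnitude, which is consistent with the guess that most of the time goes into overlaps among the 454 reads.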