[wgs-assembler-users] Saving and combining overlap stores (error correction PacBio reads for ex.)


Hi,
we have started doing some sequencing on PacBio, and are correcting the
reads with the PacBioToCA pipeline. The genome is about 800 Mb, and
we're trying to correct the PacBio reads from two SMRT cells with about
20x coverage in 454 reads. This translates to 130,389 PacBio reads with
126 Mb of sequence, and 47M 454 reads with 17.6 Gb of sequence.
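
As a rough sanity check on those numbers, the coverage and mean read
lengths work out as in the sketch below. It is only back-of-the-envelope
arithmetic: the 800 Mb genome size is an estimate, and the function and
variable names are made up for illustration.

```python
# Back-of-the-envelope coverage arithmetic for the data set described above.
# All names here are illustrative; the genome size is only an estimate.

def coverage(total_bases, genome_size):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size

GENOME_SIZE = 800e6      # ~800 Mb (estimate)
PACBIO_READS = 130_389   # reads from two SMRT cells
PACBIO_BASES = 126e6     # 126 Mb
R454_READS = 47e6        # 47M reads
R454_BASES = 17.6e9      # 17.6 Gb

pacbio_cov = coverage(PACBIO_BASES, GENOME_SIZE)   # well below 1x
r454_cov = coverage(R454_BASES, GENOME_SIZE)       # about 22x
pacbio_mean_len = PACBIO_BASES / PACBIO_READS      # about 966 bp
r454_mean_len = R454_BASES / R454_READS            # about 374 bp

print(f"PacBio: {pacbio_cov:.2f}x, mean read length {pacbio_mean_len:.0f} bp")
print(f"454:    {r454_cov:.1f}x, mean read length {r454_mean_len:.0f} bp")
```

So the 454 data is closer to 22x than 20x, and the PacBio data covers
only a fraction of the genome.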

We see that 0-overlaptrim-overlap uses quite a bit of time, and I fear
that 1-overlapper will take a long time too. Is it possible to compute
the overlaps between the 454 reads ahead of time, and reuse that
overlap store so that only the overlaps between 454 reads and PacBio
reads need to be computed? I guess most of the time is spent computing
the overlaps between 454 reads. This could be useful for assembly in
general too: sometimes we input only part of the data to get a faster
assembly, and later on we add more.
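
To illustrate why I suspect the 454-vs-454 overlaps dominate, here is a
count of the possible read pairs in each class. This is only an upper
bound on the pairs that could be compared, not a model of the
overlapper itself (as far as I understand it is seed-based and does not
test every pair), so the ratio is just a rough indication.

```python
# Count possible read pairs per class, using the read counts from above.
# This is a naive all-against-all bound, not how the overlapper works.

N_454 = 47_000_000    # 454 reads
N_PACBIO = 130_389    # PacBio reads

# Unordered pairs among the 454 reads themselves.
pairs_454_vs_454 = N_454 * (N_454 - 1) // 2

# Every 454 read against every PacBio read.
pairs_454_vs_pacbio = N_454 * N_PACBIO

ratio = pairs_454_vs_454 / pairs_454_vs_pacbio  # roughly 180

print(f"454 vs 454 pairs:    {pairs_454_vs_454:.3e}")
print(f"454 vs PacBio pairs: {pairs_454_vs_pacbio:.3e}")
print(f"ratio:               {ratio:.0f}x")
```

By this naive count there are roughly 180 times more 454-vs-454 pairs
than 454-vs-PacBio pairs.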

When I look at the command that's used to run CA in the error
correction step:

  runCA -s pacbio.spec -p asm -d temppacbio
    ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1
    obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm
    stopAfter=overlapper

does it actually do something like what I'm asking for? As I read it,
it only loads hash fragments from library 2, but does it load all
libraries for the other *Library options (1-1 = 0)? Could anyone
explain to me what that really means?

Sincerely,
Ole