Re: [wgs-assembler-users] Saving and combining overlap stores (error correction PacBio reads for ex


Hi, Ole-

ovlHashLibrary=2 does mean that only reads from the second library are
loaded into the hash table.  In this case, that's the PacBio reads.  The
'ref' library selects which fragments are searched against the hash table.
ovlRefLibrary=1-1 translates to 'starting at library 1 and ending at
library 1'.  Overlaps will be computed between library 1 and library 2, but
not between reads within the same library.
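
As a rough illustration of those semantics (a toy model only, not the
overlapper's actual code; the read names and the helper functions
parse_library_range and overlap_pairs are made up for this sketch):

```python
from itertools import product

def parse_library_range(spec):
    """Turn '2' or '1-1' into the set of library ids it covers (hypothetical helper)."""
    first, _, last = spec.partition("-")
    return set(range(int(first), int(last or first) + 1))

def overlap_pairs(read_library, hash_spec, ref_spec):
    """List (ref_read, hash_read) pairs that would be compared in this toy model."""
    hash_libs = parse_library_range(hash_spec)
    ref_libs = parse_library_range(ref_spec)
    hash_reads = [r for r, lib in read_library.items() if lib in hash_libs]
    ref_reads = [r for r, lib in read_library.items() if lib in ref_libs]
    return [(ref, h) for ref, h in product(ref_reads, hash_reads) if ref != h]

# Library 1 = 454 reads, library 2 = PacBio reads (names are invented).
reads = {"454_a": 1, "454_b": 1, "pb_x": 2}
pairs = overlap_pairs(reads, hash_spec="2", ref_spec="1-1")
print(pairs)
```

Only 454-vs-PacBio pairs come out; no 454-vs-454 or PacBio-vs-PacBio pair
is listed.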

I should point out that this isn't implemented perfectly.  The overlap jobs
for computing overlaps within library 1 are still launched, and the hash
tables are still built, but no overlaps are output.  The 'overlap_partition'
command is responsible for setting up the hash and reference ranges for each
overlap job, and this isn't aware of the ovlHashLibrary/ovlRefLibrary
options.
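
A toy model of that inefficiency, assuming (purely for illustration) that
jobs are planned as every hash block paired with every reference block; the
real partitioning scheme in overlap_partition may differ, and plan_jobs and
job_emits_overlaps are hypothetical names:

```python
def plan_jobs(blocks):
    """Library-unaware planning: pair every hash block with every ref block."""
    n = len(blocks)
    return [(h, r) for h in range(n) for r in range(n)]

def job_emits_overlaps(blocks, job, hash_libs, ref_libs):
    """A job produces output only if its blocks hold reads from the right libraries."""
    h, r = job
    has_hash = any(lib in hash_libs for lib in blocks[h])
    has_ref = any(lib in ref_libs for lib in blocks[r])
    return has_hash and has_ref

# Each block lists the library id of the reads it contains:
# two blocks of 454 reads (library 1), one block of PacBio reads (library 2).
blocks = [[1, 1], [1, 1], [2, 2]]

jobs = plan_jobs(blocks)
productive = [j for j in jobs if job_emits_overlaps(blocks, j, {2}, {1})]
wasted = len(jobs) - len(productive)
print(len(jobs), "jobs launched,", wasted, "produce no overlaps")
```

In this model all nine jobs are launched, but only the two that pair the
PacBio hash block with a 454 reference block have anything to output.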

We've recently been disabling OBT (and fragmentCorrection) in runCA, and
doing all trimming/correction outside the assembler.  In your case, you can
run the assembler up through OBT on all your 454 reads, then dump gatekeeper
to build a trimmed fragment set.  If you're using CVS tip, dumping as fastq
will work too.  With the pacbio reads, this is mandatory, since the pipeline
will split some of the pacbio reads into multiple pieces.

The obt overlaps and ovl overlaps used for assembly aren't compatible.  The
obt overlaps are more like blast matches (align a-b in read 1 to c-d in read
2) while the ovl overlaps are ... overlaps; see
http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=Overlaps
.  Since trimming will change the length of the read, it's impossible to
translate the overlaps on untrimmed reads to overlaps on trimmed reads.
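
A small sketch of why the stored values go stale, assuming ovl overlaps are
described by hangs measured relative to the read ends and, for simplicity,
an ungapped match (real alignments contain indels).  The function hangs and
all coordinates are invented for the example:

```python
def hangs(a_len, a_begin, a_end, b_len, b_begin, b_end):
    """Hang-style view of an ungapped match A[a_begin:a_end] ~ B[b_begin:b_end].

    a_hang: bases of A before the start of B.
    b_hang: bases of B past the end of A.
    """
    a_hang = a_begin - b_begin
    b_hang = (b_len - b_end) - (a_len - a_end)
    return a_hang, b_hang

# Untrimmed reads, both 100 bases: A[40:100] matches B[0:60].
untrimmed = hangs(100, 40, 100, 100, 0, 60)

# Trim 20 bases off the end of A and 10 off the start of B.
# A is now 80 bases, B is 90 bases, and the surviving match is
# A[50:80] ~ B[0:30] in the trimmed coordinates.
trimmed = hangs(80, 50, 80, 90, 0, 30)

print(untrimmed, trimmed)
```

The same pair of reads gets different hangs once the read ends move, so the
numbers stored for untrimmed reads can't simply be reused for trimmed ones.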

b

On 5/10/12 4:53 AM, "Ole Kristian Tørresen" <o.k...@bi...> wrote:

> Hi,
> we have started doing some sequencing on PacBio, and correcting the
> reads with the PacBioToCA pipeline. The genome is about 800 Mb, and we're
> trying to correct the PacBio reads from two SMRTcells with about 20x
> in 454 reads. This translates to 130,389 PacBio reads with 126 Mb
> sequence, and 47M 454 reads with 17.6 Gb sequence.
> 
> We see that 0-overlaptrim-overlap takes quite a bit of time, and I fear
> that 1-overlapper will take a long time too. Is it possible to compute
> the overlaps between the 454 reads ahead of time, and use the overlaps
> from that store so that only the overlaps between 454 reads and
> PacBio reads need to be computed? I guess most of the time is spent
> computing the overlaps between 454 reads. This could be useful for
> assembly in general too: sometimes we input only some of the data to get
> a faster assembly, and add more later on.
> 
> When I look at the command that's used to run CA in the error
> correction step: runCA -s pacbio.spec -p asm -d temppacbio
> ovlHashLibrary=2 ovlRefLibrary=1-1 obtHashLibrary=1-1
> obtRefLibrary=1-1 sge=" -sync y" sgePropagateHold=corAsm
> stopAfter=overlapper, does it actually do what I'm asking for? It
> only loads hash fragments from library 2, but does it load all libraries
> for the other *Library options (1-1 = 0)? Could anyone explain to me
> what that really means?
> 
> Sincerely,
> Ole