Re: [wgs-assembler-users] Spec file setup and long run time issues


Hi,

I’d suggest starting with the PBcR wiki page, which has examples and spec files:
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR
You can test those datasets to make sure the installation is working on your system.

From your command, it looks like you are correcting PacBio reads with Illumina reads. As documented on the PBcR wiki page, this mode is no longer being updated and is significantly slower than using only PacBio data (self-correction requires at least 30X PacBio coverage, but 50X+ is best). If you have enough coverage, I’d recommend that approach instead. You could also try alternative tools for correcting PacBio data with Illumina reads (like ECTools, proovread, LorDEC, etc.).
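
A rough way to check whether you have enough PacBio data for self-correction is to divide the total number of read bases by the expected genome size. The snippet below is a minimal Python sketch, assuming a plain FASTQ with exactly four lines per record and a genome-size estimate you supply yourself; the helper names are hypothetical and not part of PBcR.

```python
from typing import Iterable

# Thresholds from the advice above: at least 30X PacBio coverage is
# required for self-correction, and 50X or more is best.
MIN_SELF_CORRECTION_COV = 30.0
BEST_SELF_CORRECTION_COV = 50.0


def fastq_total_bases(lines: Iterable[str]) -> int:
    """Sum the sequence lengths in a FASTQ stream (hypothetical helper).

    Assumes each record is exactly four lines (header, sequence, '+',
    quality), i.e. no wrapped sequence lines.
    """
    total = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of each record holds the bases
            total += len(line.rstrip("\r\n"))
    return total


def coverage(total_bases: int, genome_size: int) -> float:
    """Estimated depth of coverage: total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def self_correction_advice(cov: float) -> str:
    """Classify a coverage estimate against the 30X / 50X guidance."""
    if cov >= BEST_SELF_CORRECTION_COV:
        return "self-correction recommended"
    if cov >= MIN_SELF_CORRECTION_COV:
        return "self-correction possible"
    return "too little coverage for self-correction"
```

For example, opening PI440795_A08.fastq and passing the file handle to fastq_total_bases, then calling coverage() with your own genome-size estimate, gives a figure you can compare against the 30X/50X guidance.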

If you’d still like to use PBcR for Illumina-based correction, the wiki page documents how to set parameters when you have SMRTportal installed and in your path:
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Correcting_Large_.28.3E_100Mbp.29_Genomes_.28Using_high-identity_data_or_CA_8.1.29
and when you do not:
http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Correcting_Large_.28.3E_100Mbp.29_Genomes_With_CA_7.0_or_older_.28not_recommended.29

That should help you configure your run. However, as I said, correction with Illumina data will be significantly slower than self-correction of PacBio data, and you should expect it to take a few hours even on bacterial genomes. I’d also recommend using no more than 50X of Illumina data.
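
If your Illumina set is well above 50X, one option is to randomly downsample it before building the .frg inputs. The snippet below is a minimal Python sketch, assuming you know the total Illumina bases and an estimated genome size; keep_fraction and subsample are hypothetical helpers for illustration, not wgs-assembler tools.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Cap suggested above for Illumina-based correction (hypothetical constant).
MAX_ILLUMINA_COV = 50.0


def keep_fraction(total_bases: int, genome_size: int,
                  max_cov: float = MAX_ILLUMINA_COV) -> float:
    """Fraction of reads to keep so coverage does not exceed max_cov."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    cov = total_bases / genome_size
    if cov <= max_cov:
        return 1.0  # already at or below the cap: keep everything
    return max_cov / cov


def subsample(records: Iterable[T], fraction: float,
              seed: int = 1) -> Iterator[T]:
    """Keep each record independently with probability `fraction`.

    Pass read pairs as single records (e.g. tuples) so mates stay
    together. A fixed seed makes the subsample reproducible.
    """
    rng = random.Random(seed)
    for rec in records:
        if rng.random() < fraction:
            yield rec
```

Keeping each read (or pair) with a fixed probability lowers coverage roughly in proportion to the fraction without favouring any region of the genome, which is why plain random sampling is a reasonable choice here.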

Sergey

> On Mar 20, 2015, at 10:57 AM, Seth Munholland <mu...@uw...> wrote:
> 
> Hello Everyone,
> 
> I'm new to CA and I'm trying to use 8.3 to correct some PacBio reads.  Installation went smoothly, but when it came time to run, I first hit the problem of a missing spec file.  After some googling I found an example posted on the seqanswers forums (http://seqanswers.com/forums/archive/index.php/t-18478.html); however, it's for Celera 7.  I went through and compared the spec options to the option parameters that print at the start of the PBcR run and removed everything except the memory-related entries.  I wanted to see what the default values gave me before I tried tweaking things, but I have more memory available to me, and I presumed the command-line -threads option did the same thing as altering the spec values concerning threads.
> 
> My spec file consisted of the following:
> assemble = 0
> ovlMemory = 250
> merylMemory = 256000
> ovlStoreMemory = 256000
> 
> The command I ran it with was:
> PBcR -threads 30 -libraryname PI440795_A08 -s PI440795.spec -fastq PI440795_A08.fastq Pacu1.frg Pacu2.frg
> 
> Once I ran it, however, I realized I'd done something wrong.  The PBcR run has been in OverlapInCore for hours at this point and is using ~5GB of RAM.
> 
> The second problem came when I tried a smaller dataset to see whether it was a size-based issue.  That run moved through that stage within a day and got past the correction, but then it stalled on runPartition.sh, using ~10GB of RAM and taking ~1.5 hours per partition while showing essentially no CPU usage.
> 
> I've since come across the RunCA wiki page, which outlines many of the spec options, and found that many of the options I started with from the example spec file no longer exist.  Would anyone be able and willing to lend me a hand so I can properly configure my Celera pipeline to correct my PacBio reads, please?
> 
> Seth Munholland, B.Sc.
> Department of Biological Sciences
> Rm. 304 Biology Building
> University of Windsor
> 401 Sunset Ave. N9B 3P4
> T: (519) 253-3000 Ext: 4755
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users