Re: [wgs-assembler-users] S. purpuratus parameters

In reply to your earlier error with bogart:

    bogart -G copygkpStore -O ..ovlStore -T e10.tigStore -o test.bogart \
      -eg 0.10 -Eg $VAL -em 0.10 -Em $VAL 2>&1 | tee bogart_25.log

When writing output, bogart splits the resulting contigs into multiple partitions so that consensus can be computed in parallel. Since you’re not setting the partition size, it puts one contig per partition, and so you are running out of file handles. Adding -B 75000 should fix the issue (-B specifies the number of sequences per partition). With 10M sequences this gives about 134 partitions (10,000,000 / 75,000, rounded up). You can raise 75000 further to make sure you end up with fewer than 1024 partitions and stay within your open file limit.
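
The partition arithmetic above can be checked with a small Python sketch. It assumes every partition holds exactly -B sequences (bogart may not fill partitions that evenly, so the real count can differ a little) and reserves a hypothetical margin of 64 file descriptors for the stores and logs bogart also keeps open. The function names are made up for illustration, and resource.getrlimit is Unix-only.

```python
import math
import resource  # Unix-only: used to read the process's open-file limit


def partitions_needed(num_sequences: int, seqs_per_partition: int) -> int:
    """Partitions written if each holds at most seqs_per_partition sequences.

    Assumes evenly filled partitions; bogart's real count may differ a little.
    """
    return math.ceil(num_sequences / seqs_per_partition)


def suggest_partition_size(num_sequences: int, open_file_limit: int,
                           reserved: int = 64) -> int:
    """Smallest -B value whose partition count fits under the open-file limit.

    `reserved` is a hypothetical safety margin for the other files bogart
    has open (stores, logs); it is an assumption, not a bogart setting.
    """
    usable = open_file_limit - reserved
    if usable <= 0:
        raise ValueError("open-file limit too small for the reserved margin")
    return math.ceil(num_sequences / usable)


if __name__ == "__main__":
    n_seqs = 10_000_000
    print("partitions with -B 75000:", partitions_needed(n_seqs, 75_000))  # 134

    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        print("no soft limit on open files; -B 75000 is fine")
    else:
        b = suggest_partition_size(n_seqs, soft_limit)
        print(f"soft limit {soft_limit}: use -B {b} or larger "
              f"({partitions_needed(n_seqs, b)} partitions)")
```

With the common default soft limit of 1024, this suggests -B 10417 (960 partitions); any larger value, including 75000, also stays under the limit.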

As for specifying innie/outtie: I don’t think the classification runs automatically. It has to be enabled with the runCA parameters dncMPlibraries and dncBBlibraries, where dncMPlibraries is the list of your mate-pair libraries and dncBBlibraries is the list of your paired-end libraries. I think it will correct the innie/outtie designation for any library listed in dncMPlibraries. Brian would know better, so he can correct me.
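
For reference, the innie/outtie distinction comes down to how the two reads of a pair sit on the sequence they came from: innie pairs point toward each other (→ ←) and outtie pairs point away from each other (← →). In Illumina mate-pair data, true long-insert pairs typically come out outtie, while contaminating paired-end fragments come out innie. The Python sketch below is a minimal illustration, assuming both reads of a pair have already been aligned to the same contig; Hit and classify_pair are hypothetical names, and this is not the classification runCA itself performs (that is described on the wiki page quoted below).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Orientation(Enum):
    INNIE = "innie"              # ->  <-  reads point toward each other
    OUTTIE = "outtie"            # <-  ->  reads point away from each other
    SAME_STRAND = "same-strand"  # ->  ->  or  <-  <-  (neither)


@dataclass
class Hit:
    """Hypothetical alignment of one read on a contig."""
    start: int     # leftmost aligned coordinate on the contig
    forward: bool  # True if the read aligned to the forward strand


def classify_pair(a: Hit, b: Hit) -> Orientation:
    """Classify a pair whose two reads aligned to the same contig."""
    if a.forward == b.forward:
        return Orientation.SAME_STRAND
    # Order the reads by position; the strand of the leftmost read decides.
    left = a if a.start <= b.start else b
    return Orientation.INNIE if left.forward else Orientation.OUTTIE


if __name__ == "__main__":
    # Made-up alignments, only to show how a per-library tally might look.
    pairs = [
        (Hit(100, False), Hit(3100, True)),   # long insert, outtie
        (Hit(5000, True), Hit(2000, False)),  # long insert, outtie
        (Hit(700, True), Hit(950, False)),    # short insert, innie
    ]
    print(Counter(classify_pair(a, b).value for a, b in pairs))
```

Counting orientations per library this way is one quick check on whether an frg file’s innie/outtie setting matches the data.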

I’d also second Brian’s suggestion to run MaSuRCA or another assembler such as ALLPATHS-LG (if you have the required libraries) on an Illumina dataset. Celera Assembler is not well designed for Illumina datasets, and there are other, faster options available for assembly.

Sergey

> On Jan 21, 2015, at 3:45 PM, mathog <ma...@ca...> wrote:
> 
> Three of the four Illumina data sets used are "Nextera Mate Pair Reads",
> but the frg files for those differ from the other one only in library
> and file names, and this:
> 
> < mea:1000.000
> < std:100.000
> ---
>> mea:3000.000
>> std:300.000
> 
> These mate pair reads were described as "innie" in the frg files, but 
> with respect to the source DNA, I'm thinking now they probably should 
> have been "outie".  Or maybe not, since there is supposed to be code to 
> detect and handle these:
> 
> http://wgs-assembler.sourceforge.net/wiki/index.php/Pair_classification_within_Illumina_mate_pair_data
> 
> Does the analysis described in the preceding link occur automatically 
> for Illumina data, or is something special needed to turn it on?
> 
> Thanks,
> 
> David Mathog
> ma...@ca...
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> 
> _______________________________________________
> wgs-assembler-users mailing list
> wgs...@li...
> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users