From: Serge K. <ser...@gm...> - 2015-09-15 21:07:56
Hi,

1. Most likely PBDAGCON or BLASR could not be found, which is causing the issue with the E. coli dataset. Can you send the full output from that run along with the contents of the tempK12/runPartition.sh script?

2. As for the number of contigs in yeast, the numbers in the paper are after filtering out contigs with fewer than 50 reads. Without filtering you should have about 36 contigs. You can see the unfiltered CA 8.3 results here:
http://wgs-assembler.sourceforge.net/wiki/index.php/Version_8.3_Release_Notes

As long as your max and N50 sizes are similar to those reported, your assembly is running correctly.

Serge

> On Sep 14, 2015, at 10:37 PM, A. Bernardo Carvalho <ber...@gm...> wrote:
>
> Dear all,
> I installed CA 8.3rc2 on my server (Dell PE2900 running CentOS 6.6,
> with 8 cores and 64 GB RAM). As I have not used CA before, I ran two
> sets of test data from the PBcR web pages (or from the MHAP paper).
> In both cases my assembly was considerably more fragmented than the
> reported ones. The datasets are:
>
>
> 1) 30x coverage E. coli (
> http://wgs-assembler.sourceforge.net/wiki/index.php/PBcR#Assembling_an_E._coli
> ). It should have assembled as a single contig, and I always got 11
> contigs. My command line and spec file follow:
>
> PBcR -pbCNS -length 500 -l K12_attempt11 -s /home/tools/CA_tests/yeast/yeast2.spec -fastq /home/tools/CA_tests/ecoli_83/selfSampleData/pacbio_filtered.fastq genomeSize=4650000 > run11.out 2>&1 &
>
> spec file:
> useGrid = 0
> scriptOnGrid = 0
> assemble = 1
> javaPath=/usr/bin/
> ovlThreads = 12
> threads = 12
> ovlConcurrency = 1
> cnsConcurrency = 12
> merylThreads = 12
>
> I also removed the last 5 lines of the spec file (allowing PBcR to
> choose nearly everything), but again got 11 contigs. However, if I
> run the program with the full E. coli data set (downloaded from the AWS
> snapshot), I get a single contig.
>
>
>
> 2) The yeast data set reported in the MHAP paper (Berlin et al. 2015),
> downloaded from
> http://gembox.cbcb.umd.edu/mhap/raw/yeast_filtered.fastq.gz
> The MHAP paper reports that the assembly resulted in 21 contigs,
> whereas I am always getting around 30. The command line follows (the
> spec file is the same one used for E. coli):
>
> PBcR -length 500 -l yeast2 -s yeast2.spec -fastq /home/tools/CA_tests/yeast/yeast_filtered.fastq genomeSize=12100000 > yeast2.out 2>&1 &
>
> I also tried to force the use of PBDAGCON, instead of falcon_sense, by
> adding the line "pbcns=1" to the spec file and removing the
> -pbCNS from the command line. The assembly was much slower and, to my
> surprise, more fragmented: 39 contigs.
>
> Are these results normal, or do they indicate some problem in my
> installation of the Celera Assembler?
>
> Yours,
> Bernardo
>
>
> A. Bernardo Carvalho
>
> Departamento de Genética
> Universidade Federal do Rio de Janeiro
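
The comparison suggested in the reply (contig count, max size and N50, with and without contigs of fewer than 50 reads) can be sketched in Python as follows. This is only an illustration: the (name, length, read count) tuples are a hypothetical input format that would have to be extracted from the assembly output by other means, and the function names are made up for this example. Nothing here is part of PBcR or Celera Assembler.

```python
from typing import Dict, List, Tuple

# Hypothetical record: (contig name, length in bases, number of reads).
Contig = Tuple[str, int, int]


def n50(lengths: List[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def filter_by_reads(contigs: List[Contig], min_reads: int = 50) -> List[Contig]:
    """Keep only contigs supported by at least min_reads reads."""
    return [c for c in contigs if c[2] >= min_reads]


def summarize(contigs: List[Contig]) -> Dict[str, int]:
    """Contig count, maximum contig length and N50 for a set of contigs."""
    lengths = [length for _, length, _ in contigs]
    return {
        "contigs": len(lengths),
        "max": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = [("ctg1", 100, 60), ("ctg2", 80, 55), ("ctg3", 20, 10)]
    print("unfiltered:", summarize(example))
    print("filtered:  ", summarize(filter_by_reads(example)))
```

Running the summary on both the unfiltered and the filtered set makes it easy to see whether a higher contig count comes only from small, low-coverage contigs while max and N50 stay close to the reported values.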