From: Serge K. <ser...@gm...> - 2015-12-04 22:30:17
Hi,

The issue is that PBDAGCON relies on the BLASR libraries to do alignments in our implementation, and for whatever reason BLASR performance on D. melanogaster is extremely poor. As a result, PBDAGCON is very slow, and I wouldn't recommend running it on this genome unless you can run all the partitions in parallel in a grid environment.

Also, we have a new version of the assembler, canu, which includes an updated falcon_sense that may work better for your assembly. You can get the falcon_sense Linux binary here:

http://github.com/marbl/canu/blob/master/src/falcon_sense/falcon_sense.Linux-amd64.bin?raw=true

Try replacing the version in CA 8.3 with it to see if it improves the Y assembly.

Sergey

> On Dec 1, 2015, at 8:31 AM, A. Bernardo Carvalho <ber...@gm...> wrote:
>
> Hi,
> I noticed that while the Drosophila melanogaster MHAP assembly is very good in general, it has many gaps in single-copy Y-linked genes. I guess this is caused by low coverage: the DNA came from males and was assembled at 25x, which leaves the Y genes at 12.5x (theoretically). Furthermore, it seems that Y-linked reads are being lost during the first correction step (done by falcon-sense; I checked the uncorrected and the corrected reads).
>
> I am trying to fix these problems by increasing the coverage of the corrected reads used in the "post-correction" steps (by adding assembleCoverage=40 to the spec file instead of the default 25x), and by forcing the use of pbdagcon instead of falcon-sense (by adding pbcns=1 to the spec file). The assembly with 40x and falcon-sense worked fine, but when I tried 40x with pbdagcon, the run seemed abnormally slow.
> Specifically, the machine I used is a Dell with 24 processors / 144 GB RAM, and after 9 days of running it was still processing the first two partitions of runPartition.sh:
>
>   # /home3/users/bernardo/drosophila//tempdros10/runPartition.sh 1
>   # /home3/users/bernardo/drosophila//tempdros10/runPartition.sh 2
>
> I checked the runPartition.sh script, and it seems to use only 8 threads (instead of 24):
>
>   cat /home3/users/bernardo/drosophila//tempdros10/runPartition.sh
>
>   $bin/outputLayout \
>     -L \
>     -e 0.35 -M 1500 \
>     -i /home3/users/bernardo/drosophila//tempdros10/asm \
>     -o /home3/users/bernardo/drosophila//tempdros10/asm \
>     -p $jobid \
>     -l 500 \
>     \
>     -P \
>     -G /home3/users/bernardo/drosophila//tempdros10/asm.gkpStore \
>     2> /home3/users/bernardo/drosophila//tempdros10/$jobid.lay.err | $bin/convertToPBCNS -consensus pbdagcon -path /home3/users/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/ -output /home3/users/bernardo/drosophila//tempdros10/$jobid.fasta -prefix /home3/users/bernardo/drosophila//tempdros10/$jobid.tmp -length 500 -coverage 4 -threads 8 > /home3/users/bernardo/drosophila//tempdros10/$jobid.err 2>&1 && touch
>
> In this particular run I did not specify cnsConcurrency or consensusConcurrency in the spec file (so PBcR chose the values; I only set threads=20), but in another run I added
>
>   cnsConcurrency=20
>   consensusConcurrency=20
>
> to the spec file, and again in 10 days it processed only 3 of the 200 partitions.
>
> I had tried the ecoli 30x and the yeast data before, and both worked fine with pbdagcon (although slower than falcon-sense). Is there some limitation to using pbdagcon with higher-coverage data? Is the -threads 8 option of the convertToPBCNS program correct?
>
> Thanks,
> Bernardo
>
> A. Bernardo Carvalho
> Departamento de Genética
> Universidade Federal do Rio de Janeiro

_______________________________________________
wgs-assembler-users mailing list
wgs...@li...
https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
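[Editor's note] For readers trying to reproduce the settings discussed in this thread, the spec-file options mentioned above would be combined like the following fragment. The values are the ones quoted in the thread; treat the combination as illustrative, not a tested recommendation.

```
# PBcR spec-file fragment (illustrative; values taken from this thread)
assembleCoverage=40      # coverage of corrected reads used post-correction (thread default: 25)
pbcns=1                  # force pbdagcon instead of falcon-sense for consensus
threads=20               # threads per job
cnsConcurrency=20        # concurrent consensus jobs
consensusConcurrency=20  # the thread sets both concurrency names
```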
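[Editor's note] The `-threads 8` value is hard-coded into the generated runPartition.sh, so one low-tech workaround (whether 8 is intentional or not) is to edit the script before launching the jobs. A sketch, demonstrated on a temporary stand-in file; in practice you would point SCRIPT at your tempdros10/runPartition.sh, and the choice of 24 threads is an assumption to match the 24-core machine described above.

```shell
# Sketch: bump the hard-coded "-threads 8" in a generated runPartition.sh.
# Demonstrated on a temporary stand-in file so it is safe to run anywhere.
SCRIPT=$(mktemp)
printf '%s\n' 'convertToPBCNS -consensus pbdagcon -length 500 -coverage 4 -threads 8' > "$SCRIPT"

# Edit in place, keeping a .bak backup of the original script.
sed -i.bak 's/-threads [0-9][0-9]*/-threads 24/' "$SCRIPT"

grep -o -- '-threads [0-9]*' "$SCRIPT"   # prints: -threads 24
```

Note that raising the per-job thread count only helps if the machine has idle cores; total load is roughly (concurrent jobs) x (threads per job).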
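[Editor's note] Sergey's suggestion to run all partitions in parallel can be approximated on a single multi-core machine with `xargs -P`; on a real grid you would submit one runPartition.sh job per partition instead. In this safe-to-run sketch, `echo` stands in for the actual `sh runPartition.sh $jobid` invocation, and the job count and concurrency are illustrative.

```shell
# Sketch: run consensus partitions in parallel, at most 4 at a time.
# 'echo' stands in for: sh tempdros10/runPartition.sh <jobid>
seq 1 8 | xargs -n 1 -P 4 -I{} echo "would run runPartition.sh {}"
```

With `-P 4`, xargs keeps four jobs in flight at once; output order may vary between runs.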