From: Serge K. <ser...@gm...> - 2015-12-22 20:19:46
Hmm, I haven’t seen that error before. It sounds like one of the previous steps didn’t advance the unitig version (the versions advance as the unitigs are built: version 1 is layouts, version 2 is consensus, etc.). I think you should be able to get your unitigs without trying to re-run the rest of the pipeline. If you run:

tigStore -g dros1nf/asm.gkpStore -t dros1nf/asm.tigStore 2 -U -d consensus -nreads 2 1000000 > asm.fasta

it should dump all the unitigs which have consensus called.

Sergey

> On Dec 14, 2015, at 3:24 PM, A. Bernardo Carvalho <ber...@gm...> wrote:
>
> Hi Serge,
> Thank you for your suggestion. I followed it, but got stopped by another error (below; probably at the unitigger). Please let me know if you have any other suggestion.
> best,
> Bernardo
>
> I issued the following commands:
>
> cd /draft1/bernardo1/drosophila
> rm dros1nf.fastq
> rm dros1nf.frg
> rm -fr dros1nf
> java -jar /home/bernardo/programs/convertFastaAndQualToFastq.jar dros1nf.fasta > dros1nf.fastq
> fastqToCA -libraryname dros1nf -technology pacbio-corrected -type sanger -reads dros1nf.fastq > dros1nf.frg
> runCA -s /draft1/bernardo1/drosophila//tempdros1nf/dros1nf.spec -p asm -d dros1nf ovlRefBlockLength=100000000000 ovlRefBlockSize=0 useGrid=0 scriptOnGrid=0 unitigger=bogart ovlErrorRate=0.03 utgErrorRate=0.025 cgwErrorRate=0.1 cnsErrorRate=0.1 utgGraphErrorLimit=0 utgGraphErrorRate=0.025 utgMergeErrorLimit=0 utgMergeErrorRate=0.025 frgCorrBatchSize=100000 doOverlapBasedTrimming=1 obtErrorRate=0.03 obtErrorLimit=4.5 frgMinLen=26 ovlMinLen=40 "batOptions=-RS -NS -CS" consensus=pbutgcns merSize=22 cnsMaxCoverage=1 cnsReuseUnitigs=1 gridEnginePropagateHold="pBcR_asm" dros1nf.frg > dros1nf.out 2>&1
>
> OUTPUT:
> ...
>
> ----------------------------------------START Mon Dec 14 09:51:20 2015
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/bogart -O /draft1/bernardo1/drosophila/dros1nf/asm.ovlStore -G /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore -T /draft1/bernardo1/drosophila/dros1nf/asm.tigStore -B 4189 -eg 0.025 -Eg 0 -em 0.025 -Em 0 -RS -NS -CS -o /draft1/bernardo1/drosophila/dros1nf/4-unitigger/asm > /draft1/bernardo1/drosophila/dros1nf/4-unitigger/unitigger.err 2>&1
> ----------------------------------------END Mon Dec 14 09:52:38 2015 (78 seconds)
> ----------------------------------------START Mon Dec 14 09:52:38 2015
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -P /draft1/bernardo1/drosophila/dros1nf/4-unitigger/asm.partitioning /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore > /draft1/bernardo1/drosophila/dros1nf/5-consensus/asm.partitioned.err 2>&1
> ----------------------------------------END Mon Dec 14 09:53:02 2015 (24 seconds)
> ----------------------------------------START CONCURRENT Mon Dec 14 09:53:02 2015
> /draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 1 > /dev/null 2>&1
> /draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 2 > /dev/null 2>&1
> ...
> /draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 67 > /dev/null 2>&1
> /draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 68 > /dev/null 2>&1
> /draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 69 > /dev/null 2>&1
> ----------------------------------------END CONCURRENT Mon Dec 14 17:52:36 2015 (28774 seconds)
> ----------------------------------------START Mon Dec 14 17:52:36 2015
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/tigStore -g /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore -t /draft1/bernardo1/drosophila/dros1nf/asm.tigStore 2 -N -R /draft1/bernardo1/drosophila/dros1nf/5-consensus/asm.fixes > asm.fixes.err 2>&1
> ----------------------------------------END Mon Dec 14 17:52:36 2015 (0 seconds)
> ----------------------------------------START Mon Dec 14 17:52:36 2015
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/tigStore \
> -g /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore \
> -t /draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/asm.tigStore 3 \
> -d matepair -U \
> > /draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/estimates.out 2>&1
> ----------------------------------------END Mon Dec 14 17:52:36 2015 (0 seconds)
> ERROR: Failed with signal HUP (1)
> ================================================================================
>
> runCA failed.
>
> ----------------------------------------
> Stack trace:
>
> at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin//runCA line 1628.
> main::caFailure("Insert size estimation failed", "/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes"...) called at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin//runCA line 4814
> main::postUnitiggerConsensus() called at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin//runCA line 6259
>
> ----------------------------------------
> Last few lines of the relevant log file (/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/estimates.out):
>
> MultiAlignStore::MultiAlignStore()-- ERROR, didn't find any unitigs or contigs in the store.
> MultiAlignStore::MultiAlignStore()-- asked for store '/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/asm.tigStore', correct?
> MultiAlignStore::MultiAlignStore()-- asked for version '3', correct?
> MultiAlignStore::MultiAlignStore()-- asked for partition unitig=0 contig=0, correct?
> MultiAlignStore::MultiAlignStore()-- asked for writable=0 inplace=0 append=0, correct?
>
> ----------------------------------------
> Failure message:
>
> Insert size estimation failed
>
> A. Bernardo Carvalho
>
> Departamento de Genética
> Universidade Federal do Rio de Janeiro
>
> On 12 December 2015 at 17:46, Serge Koren <ser...@gm...> wrote:
> Ah yes, it outputs multi-line fasta, which the previous version did not, and the code assumes one line per sequence, so it’s generating an invalid fastq file. The dros1nf.fasta file itself should be valid. Convert it to a fastq with a fixed QV value, make a frg file, and re-run the last failed command.
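The conversion step described above (multi-line fasta to fastq with a fixed QV) could be sketched roughly as follows. This is only an illustration, not the converter used in the thread: the function name `fasta_to_fastq` and the QV of 40 are assumptions, and the Phred+33 offset is chosen to match the `-type sanger` option passed to fastqToCA.

```python
def fasta_to_fastq(fasta_lines, qv=40, offset=33):
    """Yield one FASTQ record (a 4-line string) per FASTA sequence.

    Sequence lines that wrap over several lines are joined, so each
    FASTQ record has exactly one sequence line and one quality line.
    qv=40 is a hypothetical fixed quality value; offset=33 is the
    Sanger (Phred+33) encoding.
    """
    qual_char = chr(qv + offset)
    name = None
    chunks = []

    def record():
        seq = "".join(chunks)
        return f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n"

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield record()  # emit the previous sequence
            name = line[1:]
            chunks = []
        else:
            chunks.append(line)  # collect wrapped sequence lines
    if name is not None:
        yield record()  # emit the last sequence


# Small in-memory demonstration with a wrapped sequence.
demo = [">read1", "ACGT", "AC", ">read2", "GG"]
print("".join(fasta_to_fastq(demo)), end="")
```

For a real file, the same generator can be fed an open file handle for dros1nf.fasta and its output written to dros1nf.fastq before running fastqToCA.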
>
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/fastqToCA -libraryname dros1nf -technology pacbio-corrected -type sanger -reads dros1nf.fastq > dros1nf.frg
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA -s /draft1/bernardo1/drosophila//tempdros1nf/dros1nf.spec -p asm -d dros1nf ovlRefBlockLength=100000000000 ovlRefBlockSize=0 useGrid=0 scriptOnGrid=0 unitigger=bogart ovlErrorRate=0.03 utgErrorRate=0.025 cgwErrorRate=0.1 cnsErrorRate=0.1 utgGraphErrorLimit=0 utgGraphErrorRate=0.025 utgMergeErrorLimit=0 utgMergeErrorRate=0.025 frgCorrBatchSize=100000 doOverlapBasedTrimming=1 obtErrorRate=0.03 obtErrorLimit=4.5 frgMinLen=26 ovlMinLen=40 "batOptions=-RS -NS -CS" consensus=pbutgcns merSize=22 cnsMaxCoverage=1 cnsReuseUnitigs=1 gridEnginePropagateHold="pBcR_asm" dros1nf.frg
>
> Sergey
>
>> On Dec 12, 2015, at 1:43 PM, A. Bernardo Carvalho <ber...@gm...> wrote:
>>
>> Dear Sergey,
>> Thank you for your suggestion. I tried twice to use the falcon_sense program from canu inside the PBcR script, and got the same error in both attempts (error message copied below). It seems that the output of the new falcon_sense (from canu) is somehow incompatible with the PBcR script. Please let me know if you have any suggestion on how to proceed; if none, I will wait for the canu release.
>>
>> Yours,
>> Bernardo
>>
>> ********* Finished correcting 7200013631 bp (using 15743312583 bp).
>> ********* Assembling corrected sequences.
>> Assembling with average 52 (min frag 26) and using ovl is 40
>> ----------------------------------------START Fri Dec 11 19:16:41 2015
>> ln -sf dros1nf.frg dros1nf.longest25.frg
>> ----------------------------------------END Fri Dec 11 19:16:41 2015 (0 seconds)
>> ----------------------------------------START Fri Dec 11 19:16:41 2015
>> ln -sf dros1nf.fastq dros1nf.longest25.fastq
>> ----------------------------------------END Fri Dec 11 19:16:42 2015 (1 seconds)
>> ----------------------------------------START Fri Dec 11 19:16:42 2015
>> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA -s /draft1/bernardo1/drosophila//tempdros1nf/dros1nf.spec -p asm -d dros1nf ovlRefBlockLength=100000000000 ovlRefBlockSize=0 useGrid=0 scriptOnGrid=0 unitigger=bogart ovlErrorRate=0.03 utgErrorRate=0.025 cgwErrorRate=0.1 cnsErrorRate=0.1 utgGraphErrorLimit=0 utgGraphErrorRate=0.025 utgMergeErrorLimit=0 utgMergeErrorRate=0.025 frgCorrBatchSize=100000 doOverlapBasedTrimming=1 obtErrorRate=0.03 obtErrorLimit=4.5 frgMinLen=26 ovlMinLen=40 "batOptions=-RS -NS -CS" consensus=pbutgcns merSize=22 cnsMaxCoverage=1 cnsReuseUnitigs=1 gridEnginePropagateHold="pBcR_asm" dros1nf.longest25.frg
>> ----------------------------------------START Fri Dec 11 19:16:42 2015
>> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -o /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.BUILDING -T -F /draft1/bernardo1/drosophila/dros1nf.longest25.frg > /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.err 2>&1
>> ----------------------------------------END Fri Dec 11 19:18:32 2015 (110 seconds)
>> ERROR: Failed with signal HUP (1)
>> ================================================================================
>>
>> runCA failed.
>>
>> ----------------------------------------
>> Stack trace:
>>
>> at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA line 1628.
>> main::caFailure("gatekeeper failed", "/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.err") called at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA line 1957
>> main::preoverlap("/draft1/bernardo1/drosophila/dros1nf.longest25.frg") called at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA line 6250
>>
>> ----------------------------------------
>> Last few lines of the relevant log file (/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.err):
>>
>> Starting file '/draft1/bernardo1/drosophila/dros1nf.longest25.frg'.
>>
>> Processing SINGLE-ENDED SANGER QV encoding reads from:
>> '/draft1/bernardo1/drosophila//dros1nf.fastq'
>>
>> GKP finished with 68766632 alerts or errors:
>> 68766632 # ILL Error: not a sequence start line.
>>
>> ERROR: library IID 1 'dros1nf' has 51263.29% errors or warnings.
>>
>> ----------------------------------------
>> Failure message:
>>
>> gatekeeper failed
>>
>> A. Bernardo Carvalho
>>
>> Departamento de Genética
>> Universidade Federal do Rio de Janeiro
>>
>> On 4 December 2015 at 20:32, Serge Koren <ser...@gm...> wrote:
>> Hi,
>>
>> The issue is that PBDAGCON relies on BLASR libraries to do alignments in our implementation. For whatever reason, BLASR performance on D. melanogaster is extremely poor. Thus, PBDAGCON is very slow and I wouldn’t recommend running PBDAGCON on this genome unless you can run all the partitions in parallel on a grid environment.
>>
>> Also, we have a new version of the assembler, canu, which has an updated falcon_sense version which may work better for your assembly. You can get the falcon_sense Linux binary here:
>> http://github.com/marbl/canu/blob/master/src/falcon_sense/falcon_sense.Linux-amd64.bin?raw=true <https://github.com/marbl/canu>
>> and just try replacing the version in CA 8.3 to see if it improves the Y assembly.
>>
>> Sergey
>>
>>> On Dec 1, 2015, at 8:31 AM, A. Bernardo Carvalho <ber...@gm...> wrote:
>>>
>>> Hi,
>>> I noticed that while the Drosophila melanogaster MHAP assembly is very good in general, it has many gaps in single-copy Y-linked genes. I guess that this is caused by low coverage: the DNA came from males, and was assembled at 25x, which leaves the Y genes at 12.5x (theoretically). Furthermore, it seems that Y-linked reads are being lost during the first correction step (done by falcon-sense; I checked the uncorrected and the corrected reads).
>>>
>>> I am trying to fix these problems by increasing the coverage of the corrected reads used in the "post-correction" steps (by adding assembleCoverage=40 in the spec file, instead of the default 25x), and by forcing the use of pbdagcon instead of falcon-sense (by adding pbcns=1 in the spec file). The assembly with 40x and falcon-sense worked fine, but when I tried 40x with pbdagcon, the run seemed to be abnormally slow. Specifically, the machine I used is a Dell with 24 processors / 144 Gb RAM, and after 9 days running it was still processing the first two partitions of runPartition.sh
>>>
>>> # /home3/users/bernardo/drosophila//tempdros10/runPartition.sh 1
>>> # /home3/users/bernardo/drosophila//tempdros10/runPartition.sh 2
>>>
>>> I checked the runPartition.sh script, and it seems to use only 8 threads (instead of 24):
>>>
>>> cat /home3/users/bernardo/drosophila//tempdros10/runPartition.sh
>>>
>>> $bin/outputLayout \
>>> -L \
>>> -e 0.35 -M 1500 \
>>> -i /home3/users/bernardo/drosophila//tempdros10/asm \
>>> -o /home3/users/bernardo/drosophila//tempdros10/asm \
>>> -p $jobid \
>>> -l 500 \
>>> \
>>> -P \
>>> -G /home3/users/bernardo/drosophila//tempdros10/asm.gkpStore \
>>> 2> /home3/users/bernardo/drosophila//tempdros10/$jobid.lay.err | $bin/convertToPBCNS -consensus pbdagcon -path /home3/users/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/ -output /home3/users/bernardo/drosophila//tempdros10/$jobid.fasta -prefix /home3/users/bernardo/drosophila//tempdros10/$jobid.tmp -length 500 -coverage 4 -threads 8 > /home3/users/bernardo/drosophila//tempdros10/$jobid.err 2>&1 && touch
>>>
>>> In this particular run I have not specified cnsConcurrency or consensusConcurrency in the spec file (so PBcR chooses the values; I only set threads=20), but in another run I added
>>> cnsConcurrency=20
>>> consensusConcurrency=20
>>> to the spec file, and again in 10 days it processed only 3 of the 200 partitions.
>>>
>>> I previously tried the E. coli 30x and the yeast data, and both worked fine with pbdagcon (although slower than falcon-sense). Is there some limitation on using pbdagcon with higher-coverage data? Is the -threads 8 option of the convertToPBCNS program correct?
>>>
>>> Thanks,
>>> Bernardo
>>>
>>> A. Bernardo Carvalho
>>>
>>> Departamento de Genética
>>> Universidade Federal do Rio de Janeiro
>>> ------------------------------------------------------------------------------
>>> wgs-assembler-users mailing list
>>> wgs...@li...
>>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
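
The coverage reasoning in the first message of the thread (male DNA assembled at 25x leaves single-copy Y-linked genes at a theoretical 12.5x) can be written out as a small calculation. This is a rough sketch under the simple assumption that the Y chromosome is present in one copy against two copies of each autosome; the function name `y_linked_coverage` is illustrative only.

```python
def y_linked_coverage(assembled_coverage, y_copies=1, autosome_copies=2):
    """Theoretical coverage of single-copy Y-linked sequence in a male sample.

    Assumes reads are sampled in proportion to copy number, so the Y
    (one copy) receives half the coverage of the autosomes (two copies).
    """
    return assembled_coverage * y_copies / autosome_copies


for cov in (25, 40):
    print(f"{cov}x assembled -> {y_linked_coverage(cov)}x on Y-linked genes")
```

Under the same assumption, raising assembleCoverage from 25 to 40 would bring the Y-linked genes from about 12.5x to about 20x, provided the Y-linked reads survive the correction step.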