From: A. B. C. <ber...@gm...> - 2015-12-14 20:25:32
Hi Serge,
Thank you for your suggestion. I followed it, but got stopped by another
error (below; probably at the unitigger). Please let me know if you have
any other suggestions.
best,
Bernardo
I issued the following commands:
cd /draft1/bernardo1/drosophila
rm dros1nf.fastq
rm dros1nf.frg
rm -fr dros1nf
java -jar /home/bernardo/programs/convertFastaAndQualToFastq.jar
dros1nf.fasta > dros1nf.fastq
fastqToCA -libraryname dros1nf -technology pacbio-corrected -type sanger
-reads dros1nf.fastq > dros1nf.frg
runCA -s /draft1/bernardo1/drosophila//tempdros1nf/dros1nf.spec -p asm -d
dros1nf ovlRefBlockLength=100000000000 ovlRefBlockSize=0 useGrid=0
scriptOnGrid=0 unitigger=bogart ovlErrorRate=0.03 utgErrorRate=0.025
cgwErrorRate=0.1 cnsErrorRate=0.1 utgGraphErrorLimit=0
utgGraphErrorRate=0.025 utgMergeErrorLimit=0 utgMergeErrorRate=0.025
frgCorrBatchSize=100000 doOverlapBasedTrimming=1 obtErrorRate=0.03
obtErrorLimit=4.5 frgMinLen=26 ovlMinLen=40 "batOptions=-RS -NS -CS"
consensus=pbutgcns merSize=22 cnsMaxCoverage=1 cnsReuseUnitigs=1
gridEnginePropagateHold="pBcR_asm" dros1nf.frg > dros1nf.out 2>&1
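
For reference, the fasta-to-fastq step above can be approximated without the
jar. The following is a minimal Python sketch, not the
convertFastaAndQualToFastq.jar implementation: it joins wrapped sequence lines
and gives every base the same quality value. The function name, the QV of 20
and the Phred+33 offset (matching "-type sanger") are assumptions chosen for
illustration.

```python
import io


def fasta_to_fastq(fasta_lines, out, qv=20, phred_offset=33):
    """Write one four-line FASTQ record per FASTA record.

    Wrapped (multi-line) sequences are joined onto a single line.
    qv is a placeholder quality value (an assumption, not from the thread).
    """
    qual_char = chr(qv + phred_offset)
    name, chunks = None, []

    def flush():
        # Emit the record collected so far, if any.
        if name is not None:
            seq = "".join(chunks)
            out.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()


if __name__ == "__main__":
    # Tiny in-memory demo with a wrapped sequence.
    demo = io.StringIO()
    fasta_to_fastq([">read1\n", "ACGT\n", "AC\n"], demo)
    print(demo.getvalue(), end="")
```

To convert a real file, open it and pass the file handle as fasta_lines and an
output handle as out; the result can then be given to fastqToCA as above.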
OUTPUT:
...
----------------------------------------START Mon Dec 14 09:51:20 2015
/home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/bogart -O
/draft1/bernardo1/drosophila/dros1nf/asm.ovlStore -G
/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore -T
/draft1/bernardo1/drosophila/dros1nf/asm.tigStore -B 4189 -eg 0.025 -Eg
0 -em 0.025 -Em 0 -RS -NS -CS -o
/draft1/bernardo1/drosophila/dros1nf/4-unitigger/asm >
/draft1/bernardo1/drosophila/dros1nf/4-unitigger/unitigger.err 2>&1
----------------------------------------END Mon Dec 14 09:52:38 2015 (78
seconds)
----------------------------------------START Mon Dec 14 09:52:38 2015
/home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -P
/draft1/bernardo1/drosophila/dros1nf/4-unitigger/asm.partitioning
/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore >
/draft1/bernardo1/drosophila/dros1nf/5-consensus/asm.partitioned.err 2>&1
----------------------------------------END Mon Dec 14 09:53:02 2015 (24
seconds)
----------------------------------------START CONCURRENT Mon Dec 14
09:53:02 2015
/draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 1 > /dev/null
2>&1
/draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 2 > /dev/null
2>&1
...
/draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 67 >
/dev/null 2>&1
/draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 68 >
/dev/null 2>&1
/draft1/bernardo1/drosophila/dros1nf/5-consensus/consensus.sh 69 >
/dev/null 2>&1
----------------------------------------END CONCURRENT Mon Dec 14 17:52:36
2015 (28774 seconds)
----------------------------------------START Mon Dec 14 17:52:36 2015
/home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/tigStore -g
/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore -t
/draft1/bernardo1/drosophila/dros1nf/asm.tigStore 2 -N -R
/draft1/bernardo1/drosophila/dros1nf/5-consensus/asm.fixes > asm.fixes.err
2>&1
----------------------------------------END Mon Dec 14 17:52:36 2015 (0
seconds)
----------------------------------------START Mon Dec 14 17:52:36 2015
/home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/tigStore \
-g /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore \
-t
/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/asm.tigStore
3 \
-d matepair -U \
>
/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/estimates.out
2>&1
----------------------------------------END Mon Dec 14 17:52:36 2015 (0
seconds)
ERROR: Failed with signal HUP (1)
================================================================================
runCA failed.
----------------------------------------
Stack trace:
at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin//runCA line 1628.
main::caFailure("Insert size estimation failed",
"/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes"...) called
at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin//runCA line 4814
main::postUnitiggerConsensus() called at
/home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin//runCA line 6259
----------------------------------------
Last few lines of the relevant log file
(/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/estimates.out):
MultiAlignStore::MultiAlignStore()-- ERROR, didn't find any unitigs or
contigs in the store.
MultiAlignStore::MultiAlignStore()-- asked for store
'/draft1/bernardo1/drosophila/dros1nf/5-consensus-insert-sizes/asm.tigStore',
correct?
MultiAlignStore::MultiAlignStore()-- asked for version '3', correct?
MultiAlignStore::MultiAlignStore()-- asked for partition unitig=0
contig=0, correct?
MultiAlignStore::MultiAlignStore()-- asked for writable=0 inplace=0
append=0, correct?
----------------------------------------
Failure message:
Insert size estimation failed
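
The START/END markers in the runCA output above record how long each step
took, which helps to see what ran, and for how long, before the failure. A
minimal Python sketch, assuming the log layout shown in this message
(including the line wrapping added by the mail client); the function name and
the regular expression are illustrative and not part of runCA:

```python
import re

# One START ... END block; DOTALL lets fields span wrapped lines.
BLOCK = re.compile(
    r"-{10,}START(?: CONCURRENT)? (?P<start>.*?\d{4})\s+"
    r"(?P<cmd>.*?)\s*"
    r"-{10,}END(?: CONCURRENT)? .*?\((?P<secs>\d+)\s+seconds\)",
    re.DOTALL,
)


def step_durations(log_text):
    """Return [(program_name, seconds), ...] in the order the steps ran."""
    steps = []
    for m in BLOCK.finditer(log_text):
        words = m.group("cmd").split()
        # Keep only the executable's base name of the first command.
        program = words[0].rsplit("/", 1)[-1] if words else ""
        steps.append((program, int(m.group("secs"))))
    return steps


if __name__ == "__main__":
    sample = (
        "----------------------------------------START Mon Dec 14 09:51:20 2015\n"
        "/path/to/bin/bogart -O asm.ovlStore > unitigger.err 2>&1\n"
        "----------------------------------------END Mon Dec 14 09:52:38 2015 (78\n"
        "seconds)\n"
    )
    for program, secs in step_durations(sample):
        print(program, secs)
```

A START line that is followed by another START before its own END (as in
nested runCA calls) is reported as a single block by this sketch.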
A. Bernardo Carvalho
Departamento de Genética
Universidade Federal do Rio de Janeiro
On 12 December 2015 at 17:46, Serge Koren <ser...@gm...> wrote:
> Ah yes, the new falcon_sense outputs multi-line fasta, which the previous
> version did not. The code assumes one line per sequence, so it generates
> an invalid fastq file. The dros1nf.fasta file itself should be valid:
> convert it to a fastq with a fixed QV value, make a frg file, and
> re-run the last failed command.
>
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/fastqToCA -libraryname
> dros1nf -technology pacbio-corrected -type sanger -reads dros1nf.fastq >
> dros1nf.frg
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA -s
> /draft1/bernardo1/drosophila//tempdros1nf/dros1nf.spec -p asm -d dros1nf
> ovlRefBlockLength=100000000000 ovlRefBlockSize=0 useGrid=0 scriptOnGrid=0
> unitigger=bogart ovlErrorRate=0.03 utgErrorRate=0.025 cgwErrorRate=0.1
> cnsErrorRate=0.1 utgGraphErrorLimit=0 utgGraphErrorRate=0.025
> utgMergeErrorLimit=0 utgMergeErrorRate=0.025 frgCorrBatchSize=100000
> doOverlapBasedTrimming=1 obtErrorRate=0.03 obtErrorLimit=4.5 frgMinLen=26
> ovlMinLen=40 "batOptions=-RS -NS -CS" consensus=pbutgcns merSize=22
> cnsMaxCoverage=1 cnsReuseUnitigs=1 gridEnginePropagateHold="pBcR_asm"
> dros1nf.frg
>
> Sergey
>
> On Dec 12, 2015, at 1:43 PM, A. Bernardo Carvalho <ber...@gm...>
> wrote:
>
> Dear Sergey,
> Thank you for your suggestion. I tried twice to use the falcon_sense
> program from canu inside the PBcR script, and got the same error in both
> attempts (error message copied below). It seems that the output of the new
> falcon_sense (from canu) is somehow incompatible with the PBcR script.
> Please let me know if you have any suggestions on how to proceed; if not,
> I will wait for the canu release.
>
> Yours,
> Bernardo
>
>
> ********* Finished correcting 7200013631 bp (using 15743312583 bp).
> ********* Assembling corrected sequences.
> Assembling with average 52 (min frag 26) and using ovl is 40
> ----------------------------------------START Fri Dec 11 19:16:41 2015
> ln -sf dros1nf.frg dros1nf.longest25.frg
> ----------------------------------------END Fri Dec 11 19:16:41 2015 (0
> seconds)
> ----------------------------------------START Fri Dec 11 19:16:41 2015
> ln -sf dros1nf.fastq dros1nf.longest25.fastq
> ----------------------------------------END Fri Dec 11 19:16:42 2015 (1
> seconds)
> ----------------------------------------START Fri Dec 11 19:16:42 2015
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA -s
> /draft1/bernardo1/drosophila//tempdros1nf/dros1nf.spec -p asm -d dros1nf
> ovlRefBlockLength=100000000000 ovlRefBlockSize=0 useGrid=0 scriptOnGrid=0
> unitigger=bogart ovlErrorRate=0.03 utgErrorRate=0.025 cgwErrorRate=0.1
> cnsErrorRate=0.1 utgGraphErrorLimit=0 utgGraphErrorRate=0.025
> utgMergeErrorLimit=0 utgMergeErrorRate=0.025 frgCorrBatchSize=100000
> doOverlapBasedTrimming=1 obtErrorRate=0.03 obtErrorLimit=4.5 frgMinLen=26
> ovlMinLen=40 "batOptions=-RS -NS -CS" consensus=pbutgcns merSize=22
> cnsMaxCoverage=1 cnsReuseUnitigs=1 gridEnginePropagateHold="pBcR_asm"
> dros1nf.longest25.frg
> ----------------------------------------START Fri Dec 11 19:16:42 2015
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/gatekeeper -o
> /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.BUILDING -T -F
> /draft1/bernardo1/drosophila/dros1nf.longest25.frg >
> /draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.err 2>&1
> ----------------------------------------END Fri Dec 11 19:18:32 2015 (110
> seconds)
> ERROR: Failed with signal HUP (1)
>
> ================================================================================
>
> runCA failed.
>
> ----------------------------------------
> Stack trace:
>
> at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA line 1628.
> main::caFailure("gatekeeper failed",
> "/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.err") called at
> /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA line 1957
>
> main::preoverlap("/draft1/bernardo1/drosophila/dros1nf.longest25.frg")
> called at /home/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/runCA line 6250
>
> ----------------------------------------
> Last few lines of the relevant log file
> (/draft1/bernardo1/drosophila/dros1nf/asm.gkpStore.err):
>
>
> Starting file '/draft1/bernardo1/drosophila/dros1nf.longest25.frg'.
>
> Processing SINGLE-ENDED SANGER QV encoding reads from:
> '/draft1/bernardo1/drosophila//dros1nf.fastq'
>
>
> GKP finished with 68766632 alerts or errors:
> 68766632 # ILL Error: not a sequence start line.
>
> ERROR: library IID 1 'dros1nf' has 51263.29% errors or warnings.
>
> ----------------------------------------
> Failure message:
>
> gatekeeper failed
>
>
> A. Bernardo Carvalho
>
> Departamento de Genética
> Universidade Federal do Rio de Janeiro
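
The gatekeeper message quoted above ("not a sequence start line") is the kind
of error a fastq with wrapped sequence lines produces, since each record is
expected to occupy exactly four lines. A minimal Python sketch of a quick
pre-check, assuming strict four-line records; it is not gatekeeper's own
validation, and the function name and messages are illustrative:

```python
from itertools import islice


def check_fastq(lines, max_report=5):
    """Return (records_read, problems) for strict four-line FASTQ input.

    Stops after max_report problems, because one framing error usually
    makes every later record look wrong as well.
    """
    it = iter(lines)
    n, problems = 0, []
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if not rec:
            break
        n += 1
        if len(rec) < 4:
            problems.append((n, "truncated record"))
        elif not rec[0].startswith("@"):
            problems.append((n, "header does not start with '@'"))
        elif not rec[2].startswith("+"):
            problems.append((n, "separator line does not start with '+'"))
        elif len(rec[1]) != len(rec[3]):
            problems.append((n, "sequence and quality lengths differ"))
        if len(problems) >= max_report:
            break
    return n, problems


if __name__ == "__main__":
    wrapped = ["@r1\n", "ACGT\n", "AC\n", "+\n", "555555\n"]
    print(check_fastq(wrapped))
```

Running this on the fastq before fastqToCA gives an early warning; a clean
result does not guarantee that gatekeeper will accept the file.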
>
> On 4 December 2015 at 20:32, Serge Koren <ser...@gm...> wrote:
>
>> Hi,
>>
>> The issue is that PBDAGCON relies on BLASR libraries to do alignments in
>> our implementation. For whatever reason, BLASR performance on D.
>> melanogaster is extremely poor. Thus, PBDAGCON is very slow and I wouldn’t
>> recommend running PBDAGCON on this genome unless you can run all the
>> partitions in parallel on a grid environment.
>>
>> Also, we have a new version of the assembler, canu, which has an updated
>> falcon_sense version which may work better for your assembly. You can get
>> the falcon_sense Linux binary here:
>>
>> http://github.com/marbl/canu/blob/master/src/falcon_sense/falcon_sense.Linux-amd64.bin?raw=true
>> <https://github.com/marbl/canu>
>> and just try replacing the version in CA 8.3 to see if it improves the Y
>> assembly.
>>
>> Sergey
>>
>> On Dec 1, 2015, at 8:31 AM, A. Bernardo Carvalho <ber...@gm...>
>> wrote:
>>
>> Hi,
>> I noticed that while the Drosophila melanogaster MHAP assembly is very
>> good in general, it has many gaps in single-copy Y-linked genes. I guess
>> that this is caused by low coverage: the DNA came from males, and was
>> assembled at 25x, which leaves the Y genes at 12.5x (theoretically).
>> Furthermore, it seems that Y-linked reads are being lost during the first
>> correction step (done by falcon-sense; I checked the uncorrected and the
>> corrected reads).
>>
>> I am trying to fix these problems by increasing the coverage of the
>> corrected reads used in the "post-correction" steps (by adding
>> assembleCoverage=40 to the spec file, instead of the default 25x), and
>> by forcing the use of pbdagcon instead of falcon-sense (by adding pbcns=1
>> to the spec file). The assembly with 40x and falcon-sense worked fine,
>> but when I tried 40x with pbdagcon, the run seemed to be abnormally slow.
>> Specifically, the machine I used is a Dell with 24 processors / 144 GB RAM,
>> and after 9 days running it was still processing the first two partitions
>> of runPartition.sh:
>>
>> # /home3/users/bernardo/drosophila//tempdros10/runPartition.sh 1
>> # /home3/users/bernardo/drosophila//tempdros10/runPartition.sh 2
>>
>> I checked the runPartition.sh script, and it seems to use only 8 threads
>> (instead of 24):
>>
>> cat /home3/users/bernardo/drosophila//tempdros10/runPartition.sh
>>
>> $bin/outputLayout \
>> -L \
>> -e 0.35 -M 1500 \
>> -i /home3/users/bernardo/drosophila//tempdros10/asm \
>> -o /home3/users/bernardo/drosophila//tempdros10/asm \
>> -p $jobid \
>> -l 500 \
>> \
>> -P \
>> -G /home3/users/bernardo/drosophila//tempdros10/asm.gkpStore \
>> 2> /home3/users/bernardo/drosophila//tempdros10/$jobid.lay.err |
>> $bin/convertToPBCNS -consensus pbdagcon -path
>> /home3/users/bernardo/programs/wgs-8.3rc2/Linux-amd64/bin/ -output
>> /home3/users/bernardo/drosophila//tempdros10/$jobid.fasta -prefix
>> /home3/users/bernardo/drosophila//tempdros10/$jobid.tmp -length 500
>> -coverage 4 -threads 8 >
>> /home3/users/bernardo/drosophila//tempdros10/$jobid.err 2>&1 && touch
>>
>> In this particular run I did not specify cnsConcurrency or
>> consensusConcurrency in the spec file (so PBcR chose the values; I
>> only set threads=20), but in another run I added cnsConcurrency=20
>> consensusConcurrency=20
>> to the spec file, and again in 10 days it processed only 3 of the 200
>> partitions.
>>
>> I previously tried the ecoli 30x and the yeast data, and both worked fine
>> with pbdagcon (although slower than falcon-sense). Is there some
>> limitation on using pbdagcon with higher-coverage data? Is the -threads 8
>> option of the convertToPBCNS program correct?
>>
>> Thanks,
>> Bernardo
>>
>> A. Bernardo Carvalho
>>
>> Departamento de Genética
>> Universidade Federal do Rio de Janeiro
>>
>> _______________________________________________
>> wgs-assembler-users mailing list
>> wgs...@li...
>> https://lists.sourceforge.net/lists/listinfo/wgs-assembler-users
>>
>>
>>
>
>