Hello
I have been trying to run PBcR on my PacBio data. The pipeline was running smoothly and produced 4 SAM files, as specified in my pacbio.spec.
1.sam, 2.sam, 3.sam and 4.sam were converted into 1.ovls, 2.ovls, 3.ovls and 4.ovls.
The pipeline is currently still running and producing 5.sam, 6.sam, 7.sam and 8.sam.
However, I see that blasr has failed for 6.sam:
Running partition 000002 with options -h 659897171-660029876 -r 425206338-659897170 --hashstrings 132706 --hashdatalen 1000002482 start 425206337 end 659897170 total 234690833 zero job 1 and stride 2
Running partition 000001 with options -h 659897171-660029876 -r 1-425206337 --hashstrings 132706 --hashdatalen 1000002482 start 0 end 425206337 total 425206337 zero job 0 and stride 2
Running partition 000003 with options -h 660029877-660162937 -r 1-425206337 --hashstrings 133061 --hashdatalen 1000008291 start 0 end 425206337 total 425206337 zero job 0 and stride 2
Running partition 000004 with options -h 660029877-660162937 -r 425206338-659897170 --hashstrings 133061 --hashdatalen 1000008291 start 425206337 end 659897170 total 234690833 zero job 1 and stride 2
Blasr completed.
Blasr completed.
SamToCA conversion completed.
/home/iivr/Desktop/PBCR//tempec_pacbio/1-overlapper/overlap.sh 5
Running partition 000005 with options -h 660162938-660298518 -r 1-425206337 --hashstrings 135581 --hashdatalen 1000000244 start 0 end 425206337 total 425206337 zero job 0 and stride 2
SamToCA conversion completed.
/home/iivr/Desktop/PBCR//tempec_pacbio/1-overlapper/overlap.sh 6
Running partition 000006 with options -h 660162938-660298518 -r 425206338-659897170 --hashstrings 135581 --hashdatalen 1000000244 start 425206337 end 659897170 total 234690833 zero job 1 and stride 2
Blasr completed.
Blasr failed.
[INFO] 2016-10-30T02:36:04 [blasr] started.
blasr: ../../lib/cpp/alignment/algorithms/anchoring/MapBySuffixArrayImpl.hpp:303: int MapReadToGenome(T_RefSequence&, T_SuffixArray&, T_Sequence&, unsigned int, std::vector<T_MatchPos, std::allocator<_Tp2> >&, AnchorParameters&) [with T_SuffixArray = SuffixArray<unsigned char, std::vector<int, std::allocator<int> >, DefaultCompareStrings<unsigned char>, DNATuple>, T_RefSequence = FASTASequence, T_Sequence = SMRTSequence, T_MatchPos = ChainedMatchPos]: Assertion `sa.index[mp] + matchLength[matchIndex] <= reference.length' failed.
Command terminated by signal 6
564420.20user 36035.95system 9:15:09elapsed 1802%CPU (0avgtext+0avgdata 10329280maxresident)k
9933936inputs+158959392outputs (3637major+4045822067minor)pagefaults 0swaps
/home/iivr/Desktop/PBCR//tempec_pacbio/1-overlapper/overlap.sh 7
Running partition 000007 with options -h 660298519-660438109 -r 1-425206337 --hashstrings 139591 --hashdatalen 1000002830 start 0 end 425206337 total 425206337 zero job 0 and stride 2
Blasr completed.
Right now PBCR is still running and I don't want to interrupt it. Is there any way that I can restart my failed blasr?
This looks like an internal blasr error so restarting won’t likely fix it (the match went out of the index bounds). The pipeline will stop when it finds a missing job from the run, at which point you could try re-running blasr or switching to an updated version of blasr. However, I would recommend using Canu instead, as PBcR is no longer supported or maintained.
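To see which overlap jobs would need re-running once the pipeline stops, something like the following could help. It is only a rough sketch: it assumes the N.ovls files sit in a single directory (the 1-overlapper path from the log is a guess at where they live), it only checks that each file exists, and the function name and job count are made up for illustration. PBcR's own check for a missing job may well differ.

```python
import os


def missing_overlap_jobs(overlap_dir, total_jobs):
    """Return the 1-based job indexes that have no N.ovls file in overlap_dir.

    Hypothetical helper: only tests for file existence, not for whether
    the file is complete or valid.
    """
    missing = []
    for job in range(1, total_jobs + 1):
        path = os.path.join(overlap_dir, f"{job}.ovls")
        if not os.path.isfile(path):
            missing.append(job)
    return missing


if __name__ == "__main__":
    # Both the directory and the job count (8) are assumptions for this example.
    overlap_dir = "/home/iivr/Desktop/PBCR/tempec_pacbio/1-overlapper"
    print(missing_overlap_jobs(overlap_dir, 8))
```

Any index it reports corresponds to the job number passed to overlap.sh in the log above (for example, overlap.sh 6), so that would be the job to retry by hand.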
Related
Support Requests: #32
Hello Sergey
'This looks like an internal blasr error so restarting won’t likely fix it (the match went out of the index bounds).'
Does this mean that this particular step (1-overlapper) cannot complete even if I restart it after it fails?
I cannot use Canu since I am doing a hybrid assembly with only around 20x PacBio reads and 150x Illumina reads.
Also, I cannot use the latest version of blasr because of the recent changes, so I had to downgrade to a version that actually works with PBcR:
./blasr --version
Note : prior to Blasr version 5.1 , use ./blasr -version (single dash) incompatible with pbcr 8.3
I understand that PBcR is not being maintained. Can you advise me on how best to complete a hybrid assembly, or on how I can use the latest blasr with this PBcR? That would probably solve all my problems.
Thanks!
20X coverage is the minimum recommended for Canu so you can give it a shot with the low-coverage parameters suggested here:
http://canu.readthedocs.io/en/latest/quick-start.html#assembling-low-coverage-datasets
If you want to get PBcR to work with the latest blasr, you would probably have to update the code: change any reference to -version to --version. It’s just a Perl script, so if you change it you don’t need to recompile. You can also pass arbitrary parameters to blasr with the blasr option to PBcR, which should let you customize the parameters to match what the latest blasr expects.
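The flag change could be done with a plain text substitution over the script. The sketch below is an assumption about what the edit looks like, not a tested patch for PBcR: the regular expression and the function name are my own, and I have not checked how the script actually spells its blasr calls. Keep a backup copy of the script before writing anything back.

```python
import re

# Matches a standalone single-dash "-version". It skips "--version" (already
# correct) and hyphenated words such as "my-version" or "-versions".
_SINGLE_DASH_VERSION = re.compile(r"(?<![-\w])-version\b")


def use_double_dash_version(script_text):
    """Return script_text with every standalone -version changed to --version."""
    return _SINGLE_DASH_VERSION.sub("--version", script_text)


if __name__ == "__main__":
    # Illustrative input only; this line is not taken from the PBcR source.
    print(use_double_dash_version("blasr -version"))
```

To apply it, read the script into a string, pass it through use_double_dash_version, and write the result to a new file so the original stays available for comparison.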