[Bio-bwa-help] Fwd: using bwasw for aligning pacbio reads to much chimera?
From: Heng Li <lh...@sa...> - 2012-06-21 14:14:53
CC'ing bio-bwa-help. The bandwidth largely limits the maximum value of "abs(#ins-#del)" in the alignment. A smaller bandwidth improves speed. You may google "banded dynamic programming" for more information. What are the two alignments in SAM format for the read?

Heng

Begin forwarded message:

> From: "Geest, Henri van de" <hen...@wu...>
> Subject: RE: using bwasw for aligning pacbio reads to much chimera?
> Date: June 21, 2012 8:59:52 AM EDT
> To: 'Heng Li' <he...@br...>
>
> Hi Heng,
>
> Thanks for your answer. Increasing the bandwidth leads me to the following alignment:
>
> (almost there) Increasing the bandwidth further does not help (I tried up to 5000). What does it actually do?
>
> I use E. coli gi|238899406|ref|NC_012759.1|. I see this behaviour for a lot of my reads; almost all of them show up with multiple alignments.
>
> From: Heng Li [mailto:he...@br...]
> Sent: Wednesday, June 20, 2012 15:54
> To: Geest, Henri van de
> Subject: Re: using bwasw for aligning pacbio reads to much chimera?
>
> You may consider increasing "-w" to 500. If it is still not working, please give me the accession number of your E. coli strain.
>
> Heng
>
> On Jun 20, 2012, at 9:08 AM, Geest, Henri van de wrote:
>
> > Hello Heng Li,
> >
> > I have some problems aligning PacBio reads to my reference, and I think BWA-SW is best suited for aligning those long reads to a reference sequence.
> >
> > With the advised parameters for PacBio reads (“For the current PacBio reads (end of 2010), '-b5 -q2 -r1 -z10' is recommended”), bwasw reports multiple alignments at the same spot (chimeras?). However, I think these are not chimeras but just a parameter issue: by eye, some alignments can be extended at least further than reported. Can you help me out with this problem?
> > I also BLASTed this sequence at NCBI (alignment over 87% of the read, 2 HSPs):
> >
> > <image004.jpg>
> >
> > <image006.jpg>
> >
> > Below you can see 11 alignments for the same PacBio read.
> > Dark green is aligned; light green is unaligned.
> >
> > <image013.jpg>
> >
> > Settings:
> > Version: 0.6.2-r126
> > ~/bin/bwa/bwa bwasw -t 10 -a1 -b5 -q2 -r1 -z10 ./single_ecoli.fa ../pacbio_small.fasta
> > The input read is 7,700 bp long. The reference is E. coli, ~5 Mb.
> >
> > Attached you can find the PacBio read I used. The reference is E. coli.
> >
> > If I increase the match reward to 3, I get single alignments, but then most of the alignments in my bigger set no longer make sense. Decreasing the -z value to 1 also removes some chimeras, but not all.
> >
> > PacBio read:
>m120510_130729_42179_c100335562550000001523020009201201_s1_p0/937/160_7855
TAGGATAACACGGCGAGACCACAATCTCGAGGCGCACACTCTCTATCCCGCAATCGGAATGGCAGCTGACGTCCTCGGCGGCACTCCGACAGGCGAATTACACCGCCGACCTCATTACCGTTGCATCTCAACCATGGAGACGCCGCCGTCGCGGCGCAGCGCACGCCCACGTCGCATTGATAATCTCGGAGAGTGCTCGGAACCAGTCTCAGAGCTGCCGCTGCAGCCCGGCCCCTCCTCCAGAATCCCTGACTTGAGGAGCCACCCCCGCGAGTGGTTTTGGGGTGTAGCCGAGGGTAGAAGAGCACCACGGGGCGATATCAGCATGTGTGTTAGGACCAAAACGCAGCTGCGCAATACACGAAGGCCACACCCGAGGACTATACCCTCCATACCGAAGTGAACACCACCGCTCACCTGTCGCTTCGATAGCATTGCTTACCTACGCGAGGCATCATCTCTCCCCACGGCCCCACAGCCCGAATATAGCAGAAGAAACTCCCGGCGACGCAGTACCGACACGCGCACCCTCCTGCATTATTCTCAGACACCACGTAGATTACCTCTCTAACACGGTGAAGATAACAAAGCAAAAGCTCCAGCATTCATGAAAACAGACGAGATTGAATGCCACCCCTGTCGCGTGGAGTGACCAGCGGACACAACAGATGCGTAACTGGCAAGATGGCCCGAACAGTCGCCACCGTGGCGAAGAACTGCTATCTCTACATCGGTACACACAGCGCAGCGGCGCCCCGGAACTCGCACGCGGACAATCGCAGACCAGTAACGCAGTGAAGAGGGCCGGAAGGGGATGTAATCTGGCCCAGAATCGCCGGCGATCGTGTGAACGCTGAAAGGTCAAAACCGAAGAGACAGCTCCTCTTCATGACCCGAGACCGCACCGCTGGTCGGCTGATTGACCAACGTCTGTATGTACCATCCGGACCCGCAGCTGCGAAACGCTTCCCGAAATTGTCCCGCATCCCGCCGCAAGATGACAGTTCTGCGTGCACCGCCCAGCGACTGCACCGTCACACACGCGGAAGCGGCACGCGCCCTCTGGCAGAACAGCGCGCAGGTCGTCGCGTTGGTGCGTCCGGGTGTTCAAAACTCCGTCCAATCCAACCGTGCTGGCGAGCTGGCACAGCTATGAGGCAAGACGCGCGTTTTTTGTGAGTGACCGATGACGAAAGAACATCCAGCCGAAGAGTGGGTGAACAGGAGTTAATGCGGCGTTCGGCGCGTGCTGCGGCGCATCGCTCCGGAGTATCTGGTGCAAATGGGTGGCGACGTTTGCAGCAGCTGGGCGGTGGTGAAGCCATTCCGCGCTGGAGATGCCGTGAAGGAAACATTGTTTGCGA
AGCTGCCAGGAAATGGACAGGAGCGCGCCGTCGATAGCTTGCGCTGCCCGAAGCCCGATTTAAGCTCATTACGGCAGCCCTAAGCTTTAAGTGCAAAATGGCGGTCTGTGTTTAACCGCATTTTTTACTGGAGAAAACATGCGTTACCTAATCTTGCTCCTCGAGTACTGACCCCGGCCATCTGACGATCGCCGTCGCATTCGCCGCCGCGATTGGTTTGCAGCGCCCGAACATCGGACCGTCGCACTGATGACCACCTCCGTCGCGGGTAATTCTCGGGTGAAAAACCTGACCCTCAATGCCTGGCACCACGCTGCCGCATTTCTGAATGCGGAGATTCCGCTCGCCCAGGGGGCCGCTGTGCCACTGGTAACGCGCCCGCGCGTATGCGGCATCTGTCGCACGGGCCGCGGAATCGCGCTAATGGGCTCTAGCGACTTTGGTGAGCCACCCGAAGCCGCTCGGATACCCGGCGTTTTCTGGCGCATCGGCGATGCCCTGATGCGCACCAGAGCGTGCTTACCGGCCTGGTGGCCATGCGCCCGTTAACCAATACACACTGCTTCAAATGCGCCTCTCGGATGCAAGCCGTTATCAGTCGCCGTCTGGTGGATCATGGGTGGTTCTGCCGGACGCGGCAACTGTACGCAACGCGCGAGTTTGAATCTTGGCTGGCCGAATCCAGAGCTGGCTGCCCTGTGCTCTTCCCGCAGTGCGTTTGAAATCGTCATGGGTCGGTTTGATGTCCACCAATCAGCCGCAATATTAGAATCCTGACATCTCTCTAGCACTGCGCGCAGTTAAACCGTACCGGAGAAATGCTTCCACCGCCTGTTTGAGCCACTACCGTGAGCGGCAGTATGCAAAGCGGCCTTGCGACAGCCGACTCTCTCGCGCCATCGCCCCTGGCCTGGTGCGCCCGGACCTCGTTTCACGTCTCAACCACTGCTTTGCCGGCAGTGCGACACTCAGGGCGGAGCTACCCTCAGGCCGACCGGTGGTTGCATATCGACCGGTGCCCCTGCAAGCCAGTCCCAATGTACAGGCTGCATTGATCTGGACTGTGAAAGCGCTTCCAGCATGGTGGCTGAGGTGCGGCTCTGGCGTCGTAAAGCCTGTCAAATGTTATCTGGCCATGAGTCATTCATGCCGAACTCATGCCTTTCACTGATGATCACCGGCCCTGTTTATCCGATTAATTTCTAACTATCAGCGGTTTTTTGGCTGGCGCGGTAGCGATGCGCTGTTACTGCCTGAAACGGTCTATCGCACAATAACAAAGAGGAATAGCTATCCGATGATGACAACACTCGCTTGCCCAGTCGCGGGACGCGGGGGCTATGGCGGCCGCCAGTTGACTTCAGGCCCGACGCGGCTGAGGCATTCAGAGGCCGTGGCAGTTGGCGCCTGCGCTGCGAGGTGTGAACGATCTGCTTACTCGGCAGCGACGCGTGAACTGGCCGGAGCGCCGGGAACAGGCTTACCCGGTCAACAGCCTCGATGCGGTAAAAGATGATTTTGATGTCTGTTTACTCGAGAGATTTACCGCCGCGGGCCCGTCAGAAGGCGTACGTCGTGGAACCTCTCGCTTTTTGCTCGCCAGCATGGCAAGGGAGGTGATCGGCAACTACGGGGGTTTGACGAAACCGTAACAAGCATTCGTGACGCAGCCGCTGCCGATATTGCATGTGACTTGCGCTGCCATTGTTAAGAGCGTGTTGGCGTAACGTCATGCCTAGCTGCTGGAGAAGCGCCAGCCAAAGGATGGGTCGGAATAACGAATAGTCGAAATTATTGAAGCACATCATAGACATAAACGTTGATCCCCGTCGGCACGCACTGGCAAGTGGGAAGAGCGATCGCCCACGCCCCTTTGATAAAGATCGGTCTGAAAGATGCGCGGTCCTACGTCGGCGAAGGCCACCGCACGGTGAACGGTGCCTGGCAGCAATGTTTCTGCCCCGCGGTGCGTGGCAGGTGACATCGTTGGTGAACATACC
GCGATGTTTGCGGGATATGTGGCGAGCGTCGGAGCAGCTCACCCATAAGGCGGTCACAGCCGTATGACATTTGCTACCGGACGCGCTAATCACCACACGAAGTCGGCTTTTGCTGGTTGAAGTGGTAAGGAAAGCGTCTTTTTGATAGCGAATGTACTGCTGATTCAATAATTAGTAAGCCACAAATATTTTGTTAGTGTGCAAAGAATAACACATTTAATTTATTGATTATCAAGGGGCTTTAATTTCTTGGCCCTTTTATTTTTCGGTGTATGGTGCTTTTGAATTGTCTAAAGTGCAAAGATTACATGTTTTGTGCTTCTGTTTTCTGTTCTTTTAAATGTAAATTTGGACATGGGTTTGTCCACTTTTTTCTGCTCCCGTCTGTTTATTTCCATGGCAATCTCTTGCTGGCGCAGGCCGTTTTCCAGAACAGGTTACGGAGATCCTTTTTGTCGCTATGCCTGTAAATGACAATGCATGACCACAAATACATAACCAACACATCCCCGCCATTACGCCCGTGCACTTTTAGCGCCCATCCACGCCCAGAATTGCCGCCTGTGGGCCAGAATTCCGTGCGTAGGGCCCAGATTCTGCACTCTGATTACGTCCATCACTGTGTGAATGAATATCGGCACACTAAAGTGAGTGAACTCGATTCTCCTGGAGGGTGTTAATGCATCTTAAGTCAGCGCTATCTGGTTCCTGGACGACGGAACCCCCAGTTTCCGGCGGTCTGGCATAGGACCCACAGGTTTCGGCATCGCGCCACCGGCCTTGCGGGGAAGCGTTCAATACGCGTCTGCGCCGCGCGACCCCGCGTTAATCAAGAAAGTCCTCACCTCGATCCTTCCTCATGGCCTGTCAAATCGTTACTGACTCGCGCCCTTACTTTCCGCAATAGGTTCGGGCAATCCGGTCGGGCACCACAAACAAATGAAAGAAATAAAATCAGCATAAAAAAAAACAAAGTGCCGCTAAGAAAATGCCAGCAACTTCCATAAACTAACAGAAGACCTACTATCATTACACAACTGAAACACAATAACATCGAGAGTGGCGATTTAGACAAAAATAATCGATAACACACAAAACGACAATATTAACTAGGACAGAAGAAAAAAGAGACAGATGGCTGCATTAATCGCGGCGATAACCCAGATGCCGGCGCTGGTTCAGAAAGAGCCCGGCGCTTCAAACAGCGGTTGAAGAGGATCTAGCAAGAAGAAGTAGCCACCACGAGCAGAAGAGACTAATAGCCATAGACACCAAAGAAACAAGAACACAATAAAACCGAGATAGGACATGCCAGACACGAGAAAAAGCAAGCGAGCTGACGTTCATACACAGATCTATAACAATCAAAATATTATTCGAAAAATAGACAAGAAACAAAACAAAACAACATACTAAGAGACGGAAAGGATAAATATAGAAAGAAAGATACACCTAACCATCGTTCGGCGCCAAACTTCTGCGGAGATGTTACAACAAACCTCCAAAAAACAGAACCGGAGAACAACGGCCGCAAAACAGAGGACAACAAAAAGAAAACGATTAAGAAAAAGAAAACACAGTACCAAAAAAGAAATCACACACGAATTACTCGCAGACAAAACACAGAGAGACAATAAAAAAAAAAGATACTATAAAACAAAAAAGACACTGTCCTCGGTCATCGCAGGACAGCAGGAAAGCGCTCGGCCGGAGCGTGGCGAGACAGTCAAAACATGAAGATTTGTCGAAAACCCCGACCACGCAGCAGCAACATGACAACGCGATACACAAAGACACAATAAACAAACAAAAAAAAAAAAAAAAAAAAAAAAGAAAAGAACAAAGACAACAATTATAATAGATGGAAATAACATAAACATAAAGAGCAAAAAAAACAAATCCTGTTCCGTCGGTTAGCTTACAGGCAGTTCAAATCAAGCCGCACGGATACCCCGCAATAAGAGACAACCCTCGAGAAAAGACCAGAGTGGACCAAACGA
ACAAACAGACAACAAGAAAAATCGCAAAATATACAAGTTAGAAAATATAAGAAAGAGGAAAAAAATACCGTAAAACACGACTAAAGTAAAGATAAAGAATAAAAGAAAAATACACAAAAGTAGATACAACAAAACAAAAAATAACCCTGCCATCTCTGGGCGCGCCGATTGTTATCAGTCAGCAAATGAAAGAACAGAACACGCGCGTAAGGGATTGACTAAATCAAACAAAAATGACCAAAACACAGACGAAGAGAGATGGAAAAGCCGCGTGAGCGTAACGAGATAGACCCGCCAAAAAGTCAGAGAGAGGAGAAAAAAAACGAGCGAGTGGGGCGCGCGCGCGCGATTCTGGTGAGACTCCAACCCTGCGACCATCCATGACCGACCGCACCGGAAATGGCTGATGCAAACCTACATCGGAGCCGATTCACTGGAACGAAAATATACAAAAAGAAAAAGCGCCCGAACAAGATGGCTGCAGACCGATGGGCGGCGACAAAAAACAAGAGCGCTGCGAACTGCGCGCTGGAGCTGGAACATCAGCCGGCTGTCGATGGGAAGCAGTTAGGTGTCACCACTGATCTAAGGACAACTGCAAAACAAATATAAGAGAGACAAAAACGCAAAAGCCCGCCCGTCGTTCGACGTAGCGATGAGAAAAGTTGGTCTGGAACCGCCAGCGTCGCGGGTGATCCACACACGGATGGAACGAACCGCCTGGCGTCTTCGCCCGCCTGACGTGGGCGTTTCCCGTGCATTTATCCGCCATCCCGCGCCTCTTATACCGCCCACTGGCGGGTAGCGGCGCCGTACCGCTTAAACCGTGACACGATTGAAAGAAATTGGGCGGCCCCGCCGCTCTGGATCTCTCATCCCGACCCGACCGAGAGTTCCGCGATTGATCCAGACGTCGCTGAGGCGCCTGGCCCTGCAAGCGCTACGAGGATGGCAAGTGCGGGCGTGAGCAAACGAAACTCGCATCATCGTGCTGCTCTATCGGAAACAGCTGTCGATGCCCGATGGCATCCACACGCGCGGTGACGTCACATCACTGTCCGCGCCAGCCCAAACGATGGACCGACCAGAGAAATGATCAAGGTCATCGCGTAACGCCTCGGATGGCGGCTGCTGCGTGGAATCGGCGGAAACCGTGTTCCCAACGTTCAATTTGCGGCGGAACCAGGCGAAACGGTCGGTCTGATTGTTATCGAAATAACCCGACCAGGGCGTGTCCCGTTCCTTCGTACGCTGGCGCAGAACAGCGACCCGGTTTCCCGGCTGCTAAACGGTGGGCGAAACTGCGGTGGGTTACACCCTCCGACGAACTGATGAACCGACATCAGCGTCCCCGGGACCGGACGTCACTCCGCCTCCTTCGAGGCGTTCGCCATCACTACATGCGCTCGGTGACGTGACACAATCATCTCGCTTGAACTTCGAGAAAATCCGCCGTGCCCTAACGAGCGGTCTCACCACTCAGATGAAACTCGGTCTTGGCGAAGTGAGGCGACATTTGGTCGCGACCAGACAGCAGATCCTGGAAAAGCGCGTGCGCGCGGCCGTGCACTCGGTGCCGCTGGGATTCGACCCGAAGAGTGGCCTGGAGTTGACCGCGAAGCGCGCTTACGCCAAAACCGTCGCGAACTGAAGACGCAGGCGCGATCGTCGCTCGTGGTATACTGGTACACGTCGCACGATGCGGTTCCGTGCGGCGCCTCGTCCTGTGGACGAGCGTCTTCCAACCTCGACCCACCGACCGCTGGTTCCGTGGGTACACGATCTGAAGCAGCTGCGTGCGTCCTGGAAGCAGAAAGTGCGGAAGTGGGGCATCACTGGCCTTAACGCCTGACTTCCTGCCGGCCAGCTGAAACGCAAAGCGCCTGTGCGGAATGCGCGCTTTTGGCAAACACTGGCGGGCGTACGAGAAGCGCGAATCCGTACCGCGTGAGCCAGTGATGCCGAAACTGCACCCGGTTTAGCTAAAGCGCGGTGGATAG
GCTGGTTGCGGGCAGAGCTTGCGCCACCGACACCAGGCGGTGGATCCGCTTACATGTACTGCCCCCCACTGTATGAACAAAGTGCGAAGGGCCGAATCGCGTCAACCGCGCGTGAAGAAAATCACTGTGCCTCTGGCGCGGCGGCCCCGACCGTAATCGGGGTCAGGGCTTGCAATCGCGTCGAACTACTGTGTCGAGTACACGCCGGGCTCCGCTGGCGCTCGCACGCGACAGAACGTTACGCAACATTATGGGTTAACTGCTGAAGCCCGGACGCGACGTCCACCCGAGCTCACGCGACACTCTCCGACCGCCCCTCTCTGTCAGAGGC
> >
> > In another example (2 alignments, 1 input read) you can clearly see that the alignment is not bad at all, although BWA-SW stopped aligning (light green).
> > (settings: /bwa bwasw -t 10 -a2 -b3 -q2 -r2 -z1, i.e. the same scores/penalties as BLAST)
> >
> > <image014.jpg>
> >
> > Zoomed-out view of the screen above:
> > <image015.jpg>
> >
> > I hope you find this interesting to read, and that it might help you or me tune the parameters for PacBio read data. If you need more information, please ask.
> >
> > Looking forward to your answer,
> >
> > Kind regards,
> >
> > Henri van de Geest
> > Assistant researcher Bioinformatics
> > Wageningen UR, Plant Research International, Applied Bioinformatics
> > PO Box 167, 6700 AD, Wageningen, the Netherlands
> > Wageningen Campus Building 107, Droevendaalsesteeg 1, 6708 PB, Wageningen, The Netherlands
> > +31-317-480756
> > Hen...@wu...
> > Skype: henri-van-de-geest

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
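Heng's remark that the bandwidth limits "abs(#ins-#del)" can be made concrete with a small banded dynamic-programming sketch. This is not BWA-SW's implementation. The function name `banded_global_score`, the choice of global (end-to-end) rather than local alignment, and the full-matrix storage are all assumptions made for illustration. The parameters mirror bwasw's options (-a match score, -b mismatch penalty, -q gap open, -r gap extension, -w band width), and a gap of length k is assumed to cost q + k*r. Only cells with |i - j| <= w are filled. Along any path, the offset j - i equals #del - #ins so far, so the band caps the running |#ins - #del| at w.

```python
"""Minimal banded global alignment with affine gaps, for illustration only.

Assumptions (this is NOT BWA-SW's code): scores mirror bwasw's -a/-b/-q/-r
options, a gap of length k costs q + k*r, and only cells with |i - j| <= w
are filled, where i indexes the read and j indexes the reference.
"""

NEG_INF = float("-inf")


def banded_global_score(read, ref, a=1, b=3, q=5, r=2, w=50):
    """Best end-to-end score of read vs ref inside the band, or None.

    a: match score, b: mismatch penalty, q: gap open, r: gap extension,
    w: band width. Returns None when the end cell lies outside the band.
    """
    m, n = len(read), len(ref)
    if abs(m - n) > w:
        # Reaching (m, n) would need |#ins - #del| > w at the last cell.
        return None

    # H: best score at (i, j); E: ends in a deletion (gap in read, consumes ref);
    # F: ends in an insertion (gap in ref, consumes read). Full matrices for clarity.
    H = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    E = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    F = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    H[0][0] = 0
    for j in range(1, min(n, w) + 1):  # leading deletion, inside the band
        E[0][j] = H[0][j] = -(q + j * r)
    for i in range(1, min(m, w) + 1):  # leading insertion, inside the band
        F[i][0] = H[i][0] = -(q + i * r)

    for i in range(1, m + 1):
        # Visit only columns within w diagonals of the main diagonal.
        for j in range(max(1, i - w), min(n, i + w) + 1):
            # Cells outside the band stay -inf, so no path can leave it.
            E[i][j] = max(E[i][j - 1] - r, H[i][j - 1] - (q + r))
            F[i][j] = max(F[i - 1][j] - r, H[i - 1][j] - (q + r))
            s = a if read[i - 1] == ref[j - 1] else -b
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    return H[m][n]
```

Consider a read that carries a 3-base insertion relative to the reference. Called with the thread's PacBio-style scores, for example `banded_global_score(read, ref, a=1, b=5, q=2, r=1, w=3)`, it aligns once w >= 3 but returns None with w = 2. Each row visits at most 2w + 1 columns, so the work grows with len(read) * (2w + 1) rather than len(read) * len(ref). This is why a smaller band is faster. Once the band already contains the optimal path, widening it cannot change the score in this sketch. That is consistent with Henri's observation that going up to 5000 did not help.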