David,

I re-ran everything. It seems that many of the reads that Bioscope maps are mapped in the filtering step. A lot of tags map there because this is transcriptome data; however, those sequences (rRNA and tRNA) should also be present in the genome.
n=50588166
unaligned=31%
aligned only by bioscope=33%
aligned by both=35%
aligned only by bfast=1%

A lot of this is probably due to the filtering stage. Bioscope maps 28% of the total reads to the filter
reference, and of those, 92% are mapped only by Bioscope.

Looking only at reads that are mapped but not to the filter:
n=21554029
Both= 77%
Bioscope only= 16%
Bfast only = 8%

Let's ignore the filter step for now, since those reads are uninteresting anyway. As a share of all reads, Bioscope is mapping about 7% that bfast does not map, and bfast is mapping about 3% that Bioscope does not map (the 16% and 8% of the 21554029 non-filter reads above).
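
For reference, here is roughly how a breakdown like the one above can be computed from two sets of mapped read names, and how a percentage of the non-filter subset converts to a percentage of all reads. This is only a sketch: the function names are made up for illustration, and it assumes the read names have already been pulled out of each aligner's SAM output.

```python
def overlap_breakdown(bioscope_ids, bfast_ids):
    """Return percentages of mapped reads found by both aligners or by one only."""
    bioscope_ids, bfast_ids = set(bioscope_ids), set(bfast_ids)
    union = bioscope_ids | bfast_ids
    if not union:
        raise ValueError("no mapped reads")
    n = len(union)
    return {
        "both": 100.0 * len(bioscope_ids & bfast_ids) / n,
        "bioscope_only": 100.0 * len(bioscope_ids - bfast_ids) / n,
        "bfast_only": 100.0 * len(bfast_ids - bioscope_ids) / n,
    }


def rescale(pct_of_subset, subset_n, total_n):
    """Convert a percentage of a subset into a percentage of all reads."""
    return pct_of_subset * subset_n / total_n


# Counts reported in this message.
TOTAL_READS = 50588166
NON_FILTER_MAPPED = 21554029

# 16% and 8% of the non-filter reads, expressed as a share of all reads.
bioscope_only_of_total = rescale(16, NON_FILTER_MAPPED, TOTAL_READS)  # about 6.8
bfast_only_of_total = rescale(8, NON_FILTER_MAPPED, TOTAL_READS)      # about 3.4
```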

I pulled out a handful of SAM records for these reads to get a sense of what sort of problems bfast might be having. A lot of them are truncated, but as you pointed out, this shouldn't affect alignment, just quality.
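
To put a number on "truncated", the clipped portion of each read can be read straight off the CIGAR string (for example 26M24H in the first record below). A minimal sketch, assuming standard SAM CIGAR operators; the helper name is hypothetical:

```python
import re

# One CIGAR element: a length followed by a SAM operator.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_fraction(cigar):
    """Return (aligned_read_bases, clipped_bases, fraction_clipped)."""
    ops = CIGAR_OP.findall(cigar)
    # Reject anything that is not a clean sequence of <length><op> pairs.
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    # M, I, = and X consume read bases that take part in the alignment.
    aligned = sum(int(n) for n, op in ops if op in "MI=X")
    # S (soft clip) and H (hard clip) are read bases left out of it.
    clipped = sum(int(n) for n, op in ops if op in "SH")
    total = aligned + clipped
    return aligned, clipped, (clipped / total if total else 0.0)
```

With the records below, 26M24H and 24H26M both come out as 26 aligned and 24 clipped bases, and 50M as fully aligned.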

I've included them below. Let me know if you have any ideas.

Regards,
Aaron

1177_441_1937   0       chrXI   105119  22      26M24H  *       0       0      TCAGTGAAATGTTTTGGCCCAATTAA       IIIIIIIIIIIIIIIIIIII$$@IA)      RG:Z:2010080419375805   CS:Z:T02121120031100010300113030120100001210230110102021        AS:i:22CQ:Z:99>48;6<>5;<<?<><1<?:$)84.,)*;1&%(>)%&$37%.7<%(&+;  NH:i:1  IH:i:1  HI:i:1 MD:Z:26


1185_1352_1910  16      chrV    194556  11      24H26M  *       0       0      TCGATTATACCACCTGTTTTATGCTT       "IIIIIIIIIIIIIIIIIIIIIIIII      RG:Z:2010080419375805   CS:Z:T30231330001120110133303232133020103031311231230222        AS:i:19CQ:Z:;<><A<6;?==><=@==;88;882<:><9;=>7,9;9659><,/;:?;8<  NH:i:1  IH:i:1  HI:i:1 MD:Z:5A20


1238_1039_1037  256     chrII   478347  3       50M     *       0       0      AAGCCATTGACGCCATTGAACAACCATCTAGACCAACTGACAAGCCATTG       IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHICIFF!      RG:Z:2010080419375805   CS:Z:T30230130121330130120110101322322101012121102301301        AS:i:49 CQ:Z:767<=9>:9><><6=>:?5;<8;8;>><78<564:93859@7909139.9 XN:i:49 NH:i:2  IH:i:2  HI:i:1  CC:Z:chrXVI     CP:i:701274     MD:Z:50
1238_1039_1037  0       chrXVI  701274  3       50M     *       0       0      AAGCCATTGACGCCATTGAACAACCATCTAGACCAACTGACAAGCCATTG       IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHICIFF!      RG:Z:2010080419375805   CS:Z:T30230130121330130120110101322322101012121102301301        AS:i:49 CQ:Z:767<=9>:9><><6=>:?5;<8;8;>><78<564:93859@7909139.9 XN:i:49 NH:i:2  IH:i:2  HI:i:2  MD:Z:50

866_547_1295    272     chrXII  1018430 3       50M     *       0       0      GAAGATTGAGCGTAAGAGTGTCTCTTAACTTGGTTAGCTTGCTTTCTGGT       !IB3GIID''II,,III,,IIIIBIII"""IIIIIIIIIIIIIIIIIIII      RG:Z:2010080419375805   CS:Z:T31012200231023230101000030222211222201133021032202        AS:i:34 CQ:Z:AA@=A@><A=@??>@><>;@1*&>:?,7?<<7,A>=5,>5;')<3=+):< XN:i:34 NH:i:2  IH:i:2  HI:i:1  CC:Z:chrXIII    CP:i:146908     MD:Z:50
866_547_1295    0       chrXIII 146908  3       50M     *       0       0      ACCAGAAAGCAAGCTAACCAAGTTAAGAGACACTCTTACGCTCAATCTTC       IIIIIIIIIIIIIIIIIIII"""IIIBIIII,,III,,II''DIIG3BI!      RG:Z:2010080419375805   CS:Z:T31012200231023230101000030222211222201133021032202        AS:i:34 CQ:Z:AA@=A@><A=@??>@><>;@1*&>:?,7?<<7,A>=5,>5;')<3=+):< XN:i:34 NH:i:2  IH:i:2  HI:i:2  MD:Z:50


On Tue, Jan 18, 2011 at 9:20 AM, Aaron Goodman <aaronjg@genomics.upenn.edu> wrote:
Yes. I'm re-running the analysis. I think I picked some minimum cut-off to eliminate obvious mis-alignments. I suppose that I could do the trimming on the SAM file and recalculate the alignment scores based on the trimmed reads.
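
One way to picture the trim-and-rescore idea: take a read and the reference bases it was aligned against without gaps, find the contiguous stretch with the highest score, and treat everything outside it as clipped. The sketch below does this with a maximum-scoring-segment scan. The match and mismatch scores are placeholders picked for illustration, not bfast's or Bioscope's actual scoring, and it ignores color space and indels.

```python
def best_ungapped_segment(read, ref, match=1, mismatch=-3):
    """Return (score, start, end) of the highest-scoring contiguous span.

    read and ref are equal-length strings compared position by position.
    The span is read[start:end]; an empty span scores 0.
    """
    if len(read) != len(ref):
        raise ValueError("read and ref must have the same length")
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i, (a, b) in enumerate(zip(read, ref)):
        run_score += match if a == b else mismatch
        if run_score <= 0:
            # A non-positive running score cannot improve any later span,
            # so start a fresh span at the next position.
            run_score, run_start = 0, i + 1
        elif run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best
```

With these placeholder scores, a 10-base read whose last two bases mismatch scores 8 over its first eight bases, versus 2 if it is forced to align end to end.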

I'll report back when it is finished.

Cheers,
Aaron

On Mon, Jan 17, 2011 at 8:00 PM, David Rio <driodeiros@gmail.com> wrote:
I understand we may pick the sub-optimal hit, but at least we report an
alignment. The substring strategy will improve the local alignment for
those cases, but it will not recover that 2% Aaron was reporting.

Aaron, could you please post the 2% you are referring to?

-drd

On Mon, Jan 17, 2011 at 11:01 AM, Nils Homer <nilshomer@gmail.com> wrote:
> What happens if there are two hits, one theoretically better than the other,
> but the sub-optimal one is affected less by the 3' end (alignment score).
> Then the mapping quality could be reduced, or the read will not be mapped.
> I would have to think it through more.
>
> Nils
>
> On Mon, Jan 17, 2011 at 8:08 AM, David Rio <driodeiros@gmail.com> wrote:
>>
>> At what step does bfast fail to align those reads (Aaron reports 2% more
>> reads aligned)?
>> I do not see how changing the dynamic programming to allow partial
>> local alignment will help.
>> Despite having a poor quality at the 3' end, if a read generates CALs
>> (under a certain amount) bfast will perform local alignment and report
>> it.
>>
>> -drd
>>
>> On Wed, Nov 10, 2010 at 7:43 PM, Nils Homer <nilshomer@gmail.com> wrote:
>> > This is a great idea, and would entail simply changing the local aligner
>> > to
>> > not have to align the full read as it does now.  This would be a
>> > beginning
>> > project for someone in bioinformatics.  Any takers?
>> >
>> > Nils
>> >
>> >
>> > On 11/10/10 12:08 PM, "Aaron Goodman" <aaronjg@genomics.upenn.edu>
>> > wrote:
>> >
>> > I am a bioinformatician at the Penn Genome Frontiers Institute. We have
>> > recently started a High Throughput Sequencing initiative and I am
>> > working on
>> > developing an informatics pipeline for our data. I have been looking at
>> > various aligners, and have been impressed with BFAST's performance and
>> > accuracy. However in our analysis it is not as thorough as ABI's
>> > Bioscope
>> > pipeline (though it is much faster).
>> >
>> > It seems like the advantage of Bioscope is that it allows for the
>> > mapping of
>> > partial reads. Thus, I would like to suggest a feature to be added to
>> > BFAST.
>> > It would be very useful to align partial reads. This would be
>> > particularly
>> > beneficial in the case of SOLiD data where the accuracy of the color call
>> > towards the ends of reads is rather low. In the current implementation,
>> > these reads are not aligned.
>> >
>> > In order to counter this some groups have used a recursive mapping
>> > strategy
>> > where bases are trimmed from the ends of reads until a match is found.
>> > (Cloonan et al.'s RNA-MATE, as well as a recent post to the bfast help
>> > group).
>> >
>> > ABI's mapper bioscope handles this situation by doing a seed and extend
>> > alignment, but allowing the extension step to terminate before the ends
>> > of
>> > the reads. By doing this they are able to map a higher proportion of
>> > the
>> > reads.
>> >
>> > It would be very helpful to my group if BFAST could have this
>> > functionality.
>> > I think it would also make it able to map more reads than ABI's
>> > Bioscope.
>> >
>> > Regards,
>> > Aaron Goodman
>> >
>> >
>> > _______________________________________________
>> > Bfast-devel mailing list
>> > Bfast-devel@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/bfast-devel
>> >
>> >
>> >
>> >
>>
>>
>
>
