From: Björn N. <bjo...@sc...> - 2010-10-27 08:59:31
Thanks to everyone for valuable input :-)

If the mapping output is influenced by the order of reference seqs in the reference file, I think it is a clear bug. (If reads that map with an overhang were kept in a consistent way that would be a nice feature, but that's not the case, it seems.)

In my opinion it would be nice to have this fixed. This is not a major issue as long as you are aware of it; however, the problem is the number of small things you need to be aware of to make sound inferences. At the very least, it is important enough to be clearly documented on the homepage.

Best
Björn

On 27 Oct 2010, at 03:26, Joseph Fass wrote:

> I seem to recall Heng Li mentioning that reference sequences were simply concatenated prior to indexing ... in which case, there wouldn't be biological significance in most cases.
>
> I guess I find John's points compelling ... so hopefully a consensus will arise!
>
> ~Joe
>
> On Tue, Oct 26, 2010 at 4:18 PM, Bob Harris <rsh...@bx...> wrote:
> Joseph Fass wrote:
> [Re unmapped reads with positions off the ends of reference sequences]
>
> This has been discussed on the BWA list before, and I wouldn't consider it
> a bug.
>
> and John Marshall replied:
>
> (I think you will find that just about every other user on the list
> considers it a bug!)
>
> I think it's useful to consider two aspects of this BWA behaviour separately:
>
> 1. BWA is mapping these reads in a biologically meaningless way, bridging
> two separate chromosomes that are only coincidentally presented as
> adjacent.
>
> 2. BWA is outputting these mappings as reads with the UNMAPPED flag set
> and with RNAME/POS at the end of a chromosome, with the read running past
> the end.
>
> The relevant questions are whether each of these aspects is a bug, and if
> so whether the bug is worth fixing.
>
> The first aspect is clearly a bug. Without having studied the code :-),
> it seems to me that this could be fixed by inserting a sentinel letter
> between reference sequences, and assigning it a horrifying mismatch
> penalty. However the implementation would be at best nontrivial and if it
> pushed the alphabet from 4 letters to 5, the memory implications could
> also be nontrivial. So I would grudgingly admit that it might well not be
> worth fixing.
>
> Well, that isn't the only way to fix it. A simple post-processing step (inside BWA, just prior to emitting the read) could compare the mapped position and read length to the length of the scaffold/chromosome, and discard such (allegedly) erroneous mappings.
>
> But as often as the over-the-end-mapped-read has been defended on this list, I presume there's some other reason people think they should be kept. Do they really result from straddling the end of one scaffold and the start of the next? Or do they simply extend off the end of a scaffold? If it is the latter case, then they may have biological significance in that the end of the scaffold doesn't necessarily represent the end of any biological entity.
>
> Bob H (no, not THAT Bob H)
>
> --
> Joseph Fass
> Bioinformatics Programmer
> UC Davis Bioinformatics Core
> joseph.fass -at- gmail.com (professional)
> 970.227.5928 (c) || 530.752.2698 (w)

------------------------------------------------
Bjorn Nystedt, PhD
Bioinformatics scientist
SciLifeLab, Stockholm
www.scilifelab.se

Visiting address:
Karolinska Institutet Science Park
Tomtebodavägen 23 A
171 65 Solna

Postal address:
Box 1031
171 21 Solna

E-mail: bjo...@sc...
Phone: +46 (0)8 5248 1477
Mobile phone: +46 (0)73 625 1477
------------------------------------------------
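
For readers following the thread, here is a minimal Python sketch of the two ideas discussed above: how a coordinate in a concatenated reference maps back to an individual sequence, and the post-processing check Bob describes (compare mapped position plus read length against the sequence length). It is not BWA's code. The function names are hypothetical, positions are assumed to be 1-based as in SAM POS, and the read is assumed to align without gaps or clipping, so its reference span equals its length.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(ref_lengths):
    """0-based start offset of each reference within the concatenation."""
    return [0] + list(accumulate(ref_lengths))[:-1]


def locate(concat_pos, ref_names, ref_lengths):
    """Map a 0-based concatenated coordinate to (name, 1-based local position)."""
    offsets = build_offsets(ref_lengths)
    # Index of the last reference whose start offset is <= concat_pos.
    i = bisect_right(offsets, concat_pos) - 1
    return ref_names[i], concat_pos - offsets[i] + 1


def runs_past_end(pos, read_len, ref_len):
    """True if an ungapped read starting at 1-based pos extends beyond ref_len."""
    return pos + read_len - 1 > ref_len


# Hypothetical two-sequence reference, purely for illustration.
names = ["chr1", "chr2"]
lengths = [100, 50]

# The last base of chr1 and the first base of chr2 are adjacent in the
# concatenation, which is why a read can appear to bridge the two.
last_of_chr1 = locate(99, names, lengths)
first_of_chr2 = locate(100, names, lengths)

# A 5 bp read at chr1:96 ends exactly at base 100; at chr1:97 it overhangs.
fits = runs_past_end(96, 5, lengths[0])
overhangs = runs_past_end(97, 5, lengths[0])
```

A filter built on `runs_past_end` could either discard such records or only flag them; which is preferable is exactly the open question in the thread, since an overhang at the end of a scaffold may still be biologically meaningful.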