From: Afif E. <ael...@sd...> - 2015-11-03 20:31:11
Hi, Adam,

On 11/03/2015 12:05 PM, Adam Phillippy wrote:
> Nucmer alignments will often overlap for the same reason MUMs can
> overlap (repeats). In the example you give the alignments overlap on
> the reference, but they do not overlap on the query. delta-filter -q
> tries to find the best alignments that cover as much of the query
> sequence as possible. Thus, this little 714 bp sequence appears
> duplicated in your query, but not your reference. delta-filter keeps
> it because it explains positions 3744105-3744818 on the query.

I understood that one, but I was referring to the middle line. Here is
the snippet again:

3569553 3701911  3570622  *3702980* 132359 132359 99.99 4419977 4425991 2.99 2.99 ...
3701734 3741905 *3702919*  3743090   40172  40172 99.98 4419977 4425991 0.91 0.91 ...
3741192 3741905  3744105   3744818     714    714 98.60 4419977 4425991 0.02 0.02 ...

The actual positions 3702919-3702980 in the query overlap in the first
and second lines. I understand why the alignment is overlapping, but
shouldn't delta-filter -q have filtered out the alignment corresponding
to the second line here? In other words, is delta-filter -q /supposed/
to throw out all alignments that result in query overlaps?

>
> I promise you will find many similar examples as you continue to play
> with the data. These types of alignments are very common when aligning
> assemblies to references (due either to mis-assembly or true
> variants), and sorting them all out is a very difficult problem.
>

Of course.

Many thanks and regards
Afif
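
For reference, the overlaps discussed in this message can be checked with a few lines of Python. This is only a minimal sketch and is not part of MUMmer or delta-filter: the names `overlap_len` and `pairwise_overlaps` and the tuple layout are made up for illustration. It assumes 1-based, inclusive coordinates with start <= end (forward-strand alignments), and the numbers are copied from the three lines of the snippet.

```python
# Illustrative only: not part of MUMmer; helper names are hypothetical.

def overlap_len(a_start, a_end, b_start, b_end):
    """Length in bp of the overlap between two closed, 1-based intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

# (ref_start, ref_end, qry_start, qry_end) for the three alignments
# quoted in the snippet, in the same order.
alignments = [
    (3569553, 3701911, 3570622, 3702980),
    (3701734, 3741905, 3702919, 3743090),
    (3741192, 3741905, 3744105, 3744818),
]

def pairwise_overlaps(alns):
    """Return (i, j, ref_overlap, qry_overlap) for every pair of alignments."""
    out = []
    for i in range(len(alns)):
        for j in range(i + 1, len(alns)):
            r1s, r1e, q1s, q1e = alns[i]
            r2s, r2e, q2s, q2e = alns[j]
            ref_ov = overlap_len(r1s, r1e, r2s, r2e)
            qry_ov = overlap_len(q1s, q1e, q2s, q2e)
            out.append((i + 1, j + 1, ref_ov, qry_ov))
    return out

for i, j, ref_ov, qry_ov in pairwise_overlaps(alignments):
    print(f"lines {i} and {j}: reference overlap {ref_ov} bp, "
          f"query overlap {qry_ov} bp")
```

With these coordinates, lines 1 and 2 share 62 bp on the query (3702919-3702980) and 178 bp on the reference, while lines 2 and 3 share 714 bp on the reference and nothing on the query. The sketch only measures the overlaps; it says nothing about how delta-filter -q decides which alignments to keep.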