From: Michael H. <hi...@mp...> - 2014-10-13 16:39:00
Hi,
I am using nucmer to align DNA sequences, and while experimenting with
my input data I noticed the following bug or strange behavior.
I aligned two small files (input.fa with sequences "a" and "b" against
input2.fa with sequence "c") and found a hit between "a" and "c":
cat input.fa
>a
GTGTCTTTGGTGCACTGGATGTAGATGGATTCTTGCGCCCATTCACGATTGTGGTAGAAGGGCTAGTAAAGGTGGAGTGGATAATCCAACCCGCAACTCCGATATGAATGGCCGTCTTTTGATCAGCCGCATATTAGACTTAACGTGCTCAAAAACACGGTTCATAATCCTTGATTATGCTGTTGACCAGCTTGCTGGCCAGTGTAGGATCAATTCGTTTATTATATAAGTCCTGCAGTCAAAGATCAGTAAGAAATTATTGATTGCTTTCTAACGTTTTTTCCCTTTCATGAAGTGCTTGGGGTTTTGCCAAAGAAATCCTTTAGCCTTTAGTAGTAAATAAGCAGACTAGAGTCCACATACATCGACAATATATCAGCTTCTGCTGGCGGCGAATCAGAGGGGCCGCTGTCGCCTCCACCTATGAGGCCCCTCAGGGCGAGCTGGCTCCATTGGGAGTAGATGGAACACTGGCGGCTTCAGTGGGTTAGACCTGTGTAATAGAGACACGCCAGATTTGCACCCTGCCCCTGGTGGTGCACTGTCGCCAATCGCCCAAGCGTGGCGACACCTTGATCCTGGCCTGTTCCCGATTGGGCAATCGGGCCTGGGGGGCAGTCGTTCCAGGCCCAAATTAGCTGCATTGCATCGTGTGGCATCACTGGCGCACCTGCAGTGGCGCCTCCGGCCCTCGCCTTCCTGCTCCTACTGTGTCAGCCAGTGCATTCACGCATCTATCCATAGAGATACATTTGGGTGTGCCTGGGCTTTCGCACCAGCGATTAACTTTTTGCTGCCAGTTTCCTCCTTGGCTTACTGCAGGCGGCAAGATGGAACAGCAGAATGTCAGCTGACATTCGTTCACAAAAGATCATGAAACTAACACCAGGGCCGGAATTGTAGAGATGTATCCACTTCCTTGTGTGTTTAGGAACTAACACAAGTTGGTGGATGGGTGGTAGTAACAATGACAGAAGAATTGCCACCCGG
>b
AAAAGAGAGCATTTATGCTTTTGAATTTAACTTAAGTTTGATGCACTAAAATAATAAAGGTATTCGATGTACTGTTCTATTTAAGTGCATAAAACTATTTTTTGTAAGGGTGCTATATCCTTTTTTCAGATAGATTCTTTGTTGATTCTCTTCCTTTTGATAAGAAGGAGTAGTTCCTGGTCAAGAGGCTAAAGTGTGGAAATGACCACATTTTCCACACTTTCTGCCAACCAGTGACAACATATCATAGGTGGCTAGCAGTGCAGAAAGAAGTCAAAAAGGTGTCCTGGCTTCAGGTTAGCCTCCTCCAGCCTAGCCAACAAAATGTGACTGCTTTGAATTCTCTGTAAACAAGAGAATTCATAAAAGCCATAAAACCAAAAAAAAAAACTCTGCACACTTCCAAGCAGGAGCTGTAGCTGGCAGATGTAGATCTAAGTTGGGTTTTATCCAGCTAGGGCCAATTATCCTCTCCTGTTCTGTTAGCCAAATTGTGGGATTCTCTAAAAGCCCAGAAATTACTCGCTGATAGAGTTGGAAACACCTAGACAAAGGGCTAAAAAGTTCAGGAGCATTAAAAAATGGTATTCCTGCATTGAACAGTACTCGCCCTCCAAAGAATTCATCTATTTGAAGGTTAAAAATGTGGGATCAAATATAAAAGTAGATCCTTAACCCAATACCAATGTGTGTTTACGCAACATTAAGGCTTTTATTTACAAGATTTGGGAGAAGAGAAGATGGATGAAGGCTGTTGAGTAAAATAATGAAACAGATGAATTTTTCCCCTTGAAAACAGAGACACAGGTTAAACCAGGAGGGGTTGGAGAATAGCCAATGGTTAACATGTATGCCCAACTGATGAGATGTTTTTAAAGATGATAAAGAGAAGGAGGGTTCCTTCCTGGTTGAGTAGAAACGGAGGGTGGCTTTTTTGAAGAGGAGCGAAA
cat input2.fa
>c
TATAAATAGCTAAATACATAACTAATGTTATGAATTCTTCTAGAAAATGTCAAGAAGTCATTGCAATTTATTTCTCGCTTGCAACTTTGAAACAATATTGTATGCTTTAGGTTCCACTTTCCACTTTTTTCTTGCCTCTGTCTGGTTGATTATTTTTGTGTCAGTGGTGCCACCTCGTGGAGTTTAAGTTTTACTGGCAAACTCTCCTTTTAAAATAAACACAGTGTAGTATTATTATATGGGAAAGAGAACAGGATTCTTGTGCATTCAGTAGTTATACAGAACACCTGCTTAAATTTTGTCTTTGGTACAACTATTAGAAGCACAGAAGTCCTTCTAACTGTACTAAAGTGGATCATTGCAAAGTCATTATGAAAAGAATTCTAACAAAACGCATACATTGCAACAGAATGAAAACATGGCAATTCTTGTATTTTATAACCCATCCAAAACTTGTGTTAGTTCCTAAACACAAGGAATGGAGTACATCTACAGTTCTGTTAGTTCATGATTTTTGTGAACATGTCAGTGACATTCTGCTGTTCCATCTTGCCGCCTGCAGTAGCCAAGAGAAGTGGCAGCAAAAAGTTAATCACTGGTGCAAACCCAGCAACCAAATTATCTCTATGGATAATGGTGAAGGCACTGGCTGACACAGTAGGAGATGAAGGCGAGGCGAGGCGAGGCCATGCAGTGCGCAGTGATGCACAGATGCAATCAGCTAATTGGGCCTGAACGATGCCCCAGGCCCGATTGCCCAATCGAACAGGCAGGATCAATGTGTCGCAGTGGCCGGGACA
nucmer --maxmatch -l 11 -b 1000 -g 1000 input.fa input2.fa ; show-coords out.delta
gives:
NUCMER
    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
     565      985  |      787      420  |      421      368  |    83.89  | a  c
Now, if I remove sequence "b" (which shows no alignment itself) from
input.fa, the alignment between "a" and "c" is no longer found:
cat input.fa
>a
GTGTCTTTGGTGCACTGGATGTAGATGGATTCTTGCGCCCATTCACGATTGTGGTAGAAGGGCTAGTAAAGGTGGAGTGGATAATCCAACCCGCAACTCCGATATGAATGGCCGTCTTTTGATCAGCCGCATATTAGACTTAACGTGCTCAAAAACACGGTTCATAATCCTTGATTATGCTGTTGACCAGCTTGCTGGCCAGTGTAGGATCAATTCGTTTATTATATAAGTCCTGCAGTCAAAGATCAGTAAGAAATTATTGATTGCTTTCTAACGTTTTTTCCCTTTCATGAAGTGCTTGGGGTTTTGCCAAAGAAATCCTTTAGCCTTTAGTAGTAAATAAGCAGACTAGAGTCCACATACATCGACAATATATCAGCTTCTGCTGGCGGCGAATCAGAGGGGCCGCTGTCGCCTCCACCTATGAGGCCCCTCAGGGCGAGCTGGCTCCATTGGGAGTAGATGGAACACTGGCGGCTTCAGTGGGTTAGACCTGTGTAATAGAGACACGCCAGATTTGCACCCTGCCCCTGGTGGTGCACTGTCGCCAATCGCCCAAGCGTGGCGACACCTTGATCCTGGCCTGTTCCCGATTGGGCAATCGGGCCTGGGGGGCAGTCGTTCCAGGCCCAAATTAGCTGCATTGCATCGTGTGGCATCACTGGCGCACCTGCAGTGGCGCCTCCGGCCCTCGCCTTCCTGCTCCTACTGTGTCAGCCAGTGCATTCACGCATCTATCCATAGAGATACATTTGGGTGTGCCTGGGCTTTCGCACCAGCGATTAACTTTTTGCTGCCAGTTTCCTCCTTGGCTTACTGCAGGCGGCAAGATGGAACAGCAGAATGTCAGCTGACATTCGTTCACAAAAGATCATGAAACTAACACCAGGGCCGGAATTGTAGAGATGTATCCACTTCCTTGTGTGTTTAGGAACTAACACAAGTTGGTGGATGGGTGGTAGTAACAATGACAGAAGAATTGCCACCCGG
cat input2.fa
>c
TATAAATAGCTAAATACATAACTAATGTTATGAATTCTTCTAGAAAATGTCAAGAAGTCATTGCAATTTATTTCTCGCTTGCAACTTTGAAACAATATTGTATGCTTTAGGTTCCACTTTCCACTTTTTTCTTGCCTCTGTCTGGTTGATTATTTTTGTGTCAGTGGTGCCACCTCGTGGAGTTTAAGTTTTACTGGCAAACTCTCCTTTTAAAATAAACACAGTGTAGTATTATTATATGGGAAAGAGAACAGGATTCTTGTGCATTCAGTAGTTATACAGAACACCTGCTTAAATTTTGTCTTTGGTACAACTATTAGAAGCACAGAAGTCCTTCTAACTGTACTAAAGTGGATCATTGCAAAGTCATTATGAAAAGAATTCTAACAAAACGCATACATTGCAACAGAATGAAAACATGGCAATTCTTGTATTTTATAACCCATCCAAAACTTGTGTTAGTTCCTAAACACAAGGAATGGAGTACATCTACAGTTCTGTTAGTTCATGATTTTTGTGAACATGTCAGTGACATTCTGCTGTTCCATCTTGCCGCCTGCAGTAGCCAAGAGAAGTGGCAGCAAAAAGTTAATCACTGGTGCAAACCCAGCAACCAAATTATCTCTATGGATAATGGTGAAGGCACTGGCTGACACAGTAGGAGATGAAGGCGAGGCGAGGCGAGGCCATGCAGTGCGCAGTGATGCACAGATGCAATCAGCTAATTGGGCCTGAACGATGCCCCAGGCCCGATTGCCCAATCGAACAGGCAGGATCAATGTGTCGCAGTGGCCGGGACA
nucmer --maxmatch -l 11 -b 1000 -g 1000 input.fa input2.fa ; show-coords out.delta
gives:
NUCMER
    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
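For anyone who wants to repeat the comparison on their own files: the second input.fa is simply the first one with record "b" dropped. A minimal Python sketch of that removal step follows. It assumes plain FASTA text where the record ID is the first token after ">"; the helper name drop_record and the short example sequences are made up for illustration and are not part of MUMmer or of my real data.

```python
def drop_record(fasta_text, name):
    """Return fasta_text without the record whose header ID equals name."""
    kept = []
    keep = True
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            keep = not fields or fields[0] != name
        if keep:
            kept.append(line)
    return "\n".join(kept) + "\n" if kept else ""


# Placeholder sequences, far shorter than the real ones in this report.
example = ">a\nGTGT\nCTTT\n>b\nAAAA\n>c\nTATA\n"
print(drop_record(example, "b"), end="")
```

Writing the result to a new file and running the same nucmer command on both versions gives the two cases compared here.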
I ran mummer and mgaps manually on the .ntref file, and it seems that
mummer does produce output but mgaps then removes the alignment.
Could you please check whether this is a bug and, if it is not, tell me
which mgaps parameters would keep that alignment?
Thanks a lot
- Michael