#145 allversusall bug + needle scoring issue

Status: open
Owner: nobody
Priority: 2
Updated: 2010-07-07
Created: 2008-07-31
Creator: Anonymous
Private: No

Dear Laura,

Which EMBOSS program are you using? I don't find this effect with EMBOSS
needle:

$ cat seq_a.fa
>seq_a
MGQMQIV
$ cat seq_b.fa
>seq_b
IV
$ needle
Needleman-Wunsch global alignment.
Input sequence: seq_a.fa
Second sequence(s): seq_b.fa
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [seq_a.needle]:
$ cat seq_a.needle
########################################
# Program: needle
# Rundate: Wed 9 Jul 2008 12:34:46
# Commandline: needle
# -asequence seq_a.fa
# -bsequence seq_b.fa
# Align_format: srspair
# Report_file: seq_a.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: seq_a
# 2: seq_b
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 7
# Identity: 2/7 (28.6%)
# Similarity: 2/7 (28.6%)
# Gaps: 5/7 (71.4%)
# Score: 8.0
#
#
#=======================================

seq_a 1 MGQMQIV 7
             ||
seq_b 1 -----IV 2

#---------------------------------------
#---------------------------------------

I'm not sure it's relevant to your question but note that, in EMBOSS
needle, the score is unaffected by "hanging ends". I consider this odd,
in fact not really a global alignment score. E.g. a protein with domain
architecture -a-b-c-d- would get approx. the same score if aligned
against a protein of domain architecture -c-d-, as it would when aligned
against a protein of domain architecture -c-d-e-f-g-h-i-j-k-l-m-. In my
view this goes against the spirit of global alignment - but this
approach is briefly justified in the needle documentation, and I believe
is not unusual for "global" alignment programs. Here's what I mean:

$ cat seq_c.fa
>seq_c
IVPPLKP
$ needle
Needleman-Wunsch global alignment.
Input sequence: seq_a.fa
Second sequence(s): seq_c.fa
Gap opening penalty [10.0]:
Gap extension penalty [0.5]:
Output alignment [seq_a.needle]:
$ cat seq_a.needle
########################################
# Program: needle
# Rundate: Wed 9 Jul 2008 12:37:01
# Commandline: needle
# -asequence seq_a.fa
# -bsequence seq_c.fa
# Align_format: srspair
# Report_file: seq_a.needle
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: seq_a
# 2: seq_c
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 12
# Identity: 2/12 (16.7%)
# Similarity: 2/12 (16.7%)
# Gaps: 10/12 (83.3%)
# Score: 8.0
#
#
#=======================================

seq_a 1 MGQMQIV----- 7
             ||
seq_c 1 -----IVPPLKP 7

#---------------------------------------
#---------------------------------------

Note that identity, similarity and gaps have all changed but score
remains the same as when seq_a and seq_b were aligned, since the only
difference is a "hanging end".
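
To make the arithmetic explicit, here is a minimal Python sketch that rescores the two alignments above. It is not EMBOSS code. The function names and the `penalize_ends` switch are hypothetical, the score table holds only the two BLOSUM62 entries these alignments need (I/I = 4 and V/V = 4), and the gap cost of open + (length - 1) * extend is my assumption about how the penalties combine.

```python
# Toy scorer for an already-aligned pair of rows; NOT EMBOSS code.
# Only the two BLOSUM62 entries needed here; any other residue pair raises KeyError.
MATCH_SCORES = {("I", "I"): 4.0, ("V", "V"): 4.0}


def gap_runs(row):
    """Return (start, end) column ranges of consecutive '-' characters in one row."""
    runs, start = [], None
    for i, ch in enumerate(row):
        if ch == "-":
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(row)))
    return runs


def alignment_score(a, b, gap_open=10.0, gap_extend=0.5, penalize_ends=False):
    """Score two aligned rows; end gaps are free unless penalize_ends is True."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score = 0.0
    for x, y in zip(a, b):
        if x != "-" and y != "-":
            score += MATCH_SCORES[(x, y)]
    for row in (a, b):
        for start, end in gap_runs(row):
            is_end_gap = start == 0 or end == len(row)
            if is_end_gap and not penalize_ends:
                continue  # a "hanging end" costs nothing
            # Assumed affine cost: open for the first position, extend for the rest.
            score -= gap_open + (end - start - 1) * gap_extend
    return score


print(alignment_score("MGQMQIV", "-----IV"))            # 8.0
print(alignment_score("MGQMQIV-----", "-----IVPPLKP"))  # 8.0
print(alignment_score("MGQMQIV", "-----IV", penalize_ends=True))            # -4.0
print(alignment_score("MGQMQIV-----", "-----IVPPLKP", penalize_ends=True))  # -16.0
```

With free end gaps both alignments score 8.0, matching the needle reports. Once end gaps are charged under the assumed cost, the two scores diverge.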

Best regards,

Daniel

laura wrote:
> Dear emboss users,
>
> I am using the allversusall tool for global sequence alignment. I am writing
> to you because I am obtaining perfect alignments between sequences that have
> very different lengths. For example, if I have a 100-residue protein
> sequence and a 2-residue protein sequence, I obtain 100% identity when I
> perform the alignment, where I would expect a very poor sequence
> identity. Is there any way to prevent this, or is it a possible bug in the
> program?
>
> I would be grateful if you could answer as soon as possible.
>
> Regards,
>
> Laura.
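
As an illustration only, the Python sketch below shows how the choice of denominator changes the identity figure for the seq_a/seq_b alignment shown earlier. The thread does not say how allversusall computes identity, so the "overlap" variant is an assumption about one way a 100% figure could arise, not a description of that program. The function name and dictionary keys are hypothetical.

```python
def alignment_stats(a, b):
    """Count identities and gap columns for two aligned rows of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    length = len(a)
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    overlap = length - gaps  # columns where both rows hold a residue
    return {
        "length": length,
        "identical": identical,
        "gaps": gaps,
        # Denominator used in the needle reports above: full alignment length.
        "identity_pct_alignment": round(100.0 * identical / length, 1),
        "gaps_pct_alignment": round(100.0 * gaps / length, 1),
        # Hypothetical alternative: divide by the ungapped overlap only.
        "identity_pct_overlap": round(100.0 * identical / overlap, 1) if overlap else 0.0,
    }


stats = alignment_stats("MGQMQIV", "-----IV")
print(stats["identical"], stats["length"], stats["identity_pct_alignment"])  # 2 7 28.6
print(stats["gaps"], stats["gaps_pct_alignment"])                            # 5 71.4
print(stats["identity_pct_overlap"])                                         # 100.0
```

Dividing by the alignment length reproduces the 2/7 (28.6%) identity and 5/7 (71.4%) gaps that needle reports. Dividing by the two overlapping columns gives 100%.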

Discussion

  • Mahmut Uludag - 2010-07-07

    The end-weight support now implemented in needle addresses the needle side of the question.

     
  • Mahmut Uludag - 2010-07-07
    • priority: 5 --> 2