Smith-Waterman optimal local alignment
Isaac Turner
turner.isaac@gmail.com
== Build ==
$ make
or
$ make DEBUG=1
== Run ==
$ smith_waterman ACCGATA CCGT
To get a list of options/help:
$ smith_waterman
== Scoring systems ==
Proteins:
Query Length   Substitution Matrix   Gap Costs (gap_open,gap_extend)
<35            PAM-30                (9,1)
35-50          PAM-70                (10,1)
50-85          BLOSUM-80             (10,1)
>85            BLOSUM-62             (10,1)
[table from http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html]
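As a rough illustration, the table can be turned into a lookup. This is only a sketch: the function name suggest_protein_scoring is hypothetical and not part of smith_waterman. It assumes that the last row means "longer than 85" and that a length sitting on a shared boundary (50 or 85) belongs to the shorter-query row.

```python
def suggest_protein_scoring(query_length):
    """Return (matrix_name, (gap_open, gap_extend)) for a protein query.

    Values are copied from the table above. Boundary handling is an
    assumption: 50 goes to the 35-50 row and 85 to the 50-85 row.
    """
    if query_length < 1:
        raise ValueError("query_length must be at least 1")
    if query_length < 35:
        return ("PAM-30", (9, 1))
    if query_length <= 50:
        return ("PAM-70", (10, 1))
    if query_length <= 85:
        return ("BLOSUM-80", (10, 1))
    return ("BLOSUM-62", (10, 1))


if __name__ == "__main__":
    for length in (20, 40, 70, 300):
        print(length, suggest_protein_scoring(length))
```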
Penalty for a gap of length N: gap_open + N*gap_extend
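A small sketch of the gap penalty formula, to make the arithmetic concrete. The function name gap_penalty is hypothetical. The result keeps whatever sign convention is passed in: positive costs as in the protein table, or negative scores as in the nucleotide defaults.

```python
def gap_penalty(length, gap_open, gap_extend):
    """Penalty for one gap of `length` positions: gap_open + length*gap_extend."""
    if length < 1:
        raise ValueError("a gap has at least one position")
    return gap_open + length * gap_extend


if __name__ == "__main__":
    # Protein-style costs (10,1): a gap of length 3 costs 10 + 3*1.
    print(gap_penalty(3, 10, 1))
    # Negative-score convention (-1,-1): a gap of length 1 scores -1 + 1*(-1).
    print(gap_penalty(1, -1, -1))
```

Note that under this formula a gap of length 1 already pays both the open and one extend term.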
NCBI BLAST Quote:
Many nucleotide searches use a simple scoring system that consists of a "reward"
for a match and a "penalty" for a mismatch. The (absolute) reward/penalty ratio
should be increased as one looks at more divergent sequences. A ratio of 0.33
(1/-3) is appropriate for sequences that are about 99% conserved; a ratio of 0.5
(1/-2) is best for sequences that are 95% conserved; a ratio of about one (1/-1)
is best for sequences that are 75% conserved [1].
[from: http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#Reward-penalty]
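To relate a match/mismatch pair to the ratios in the quote, the absolute ratio can be computed directly. This is only an illustrative sketch and the helper name reward_penalty_ratio is hypothetical. By the quote's guidance, a pair such as (2,-2) has a ratio of one, which corresponds to sequences that are about 75% conserved.

```python
def reward_penalty_ratio(match, mismatch):
    """Absolute reward/penalty ratio, e.g. (1, -3) gives about 0.33."""
    if mismatch == 0:
        raise ValueError("mismatch penalty must be non-zero")
    return abs(match / mismatch)


if __name__ == "__main__":
    for match, mismatch in [(1, -3), (1, -2), (1, -1), (2, -2)]:
        ratio = reward_penalty_ratio(match, mismatch)
        print("match=%d mismatch=%d ratio=%.2f" % (match, mismatch, ratio))
```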
NCBI Gap (open, extend) values:
-5, -2
-2, -2
-1, -2
0, -2
-3, -1
-2, -1
-1, -1
Our defaults (for now) are:
gap_open/gap_extend: (-1,-1)
match/mismatch: (2,-2)
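To show how these defaults combine, here is a minimal score-only sketch of local alignment with affine gaps (the standard Gotoh-style recurrence). It is not the code used by smith_waterman; the function name sw_score and its structure are assumptions made for illustration. It uses the gap formula above, so a gap of length N scores gap_open + N*gap_extend.

```python
def sw_score(a, b, match=2, mismatch=-2, gap_open=-1, gap_extend=-1):
    """Best local alignment score of strings a and b (score only, no traceback)."""
    neg_inf = float("-inf")
    cols = len(b) + 1
    h_prev = [0] * cols        # best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * cols  # same, but ending in a gap that consumes a[i-1]
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # ending at (i, j-1) in a gap that consumes b
        for j in range(1, cols):
            # Open a new gap (pays open + one extend) or extend an existing one.
            e = max(h_cur[j - 1] + gap_open + gap_extend, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open + gap_extend,
                           f_prev[j] + gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: never let a cell drop below zero.
            h_cur[j] = max(0, h_prev[j - 1] + sub, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(sw_score("ACCGATA", "CCGT"))
```

With the defaults, the example pair from the Run section scores 6: the three matches CCG/CCG give 3*2, and extending to CCGAT/CCG-T adds a match (+2) but pays for a length-1 gap (-2), so it ties rather than improves.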
== Development ==
- No current goals - please suggest some!