- labels: 611191 --> Internals
- assigned_to: nobody --> peterrice
Hello,
I was using fuzznuc to find a particular motif in a set of short sequences
and noticed some odd behaviour:
This happens both with EMBOSS 5.0.0 on the command line and with the
Jemboss (version 1.5) interface:
>F03
TTTATTTTCTATGGGCCGGTATCTTG
Pattern TTN(0,22)TT
fuzznuc -sequence F03 -complement -rformat seqtable
#---------------------------------------
#=======================================
#
# Sequence: F03 from: 1 to: 26
# HitCount: 5
#
# Pattern_name Mismatch Pattern
# pattern1 0 TTN(0,22)TT
#
# Complement: Yes
#
#=======================================
Start  End  Pattern_name  Mismatch  Sequence
7      25   pattern1      .         TTCTATGGGCCGGTATCTT
6      25   pattern1      .         TTTCTATGGGCCGGTATCTT
5      25   pattern1      .         TTTTCTATGGGCCGGTATCTT
2      25   pattern1      .         TTATTTTCTATGGGCCGGTATCTT
1      25   pattern1      .         TTTATTTTCTATGGGCCGGTATCTT
#---------------------------------------
To my mind this is missing a number of hits:
Start  End  Pattern_name  Mismatch  Sequence
5      8    pattern1      .         TTTT
2      6    pattern1      .         TTATT
2      7    pattern1      .         TTATTT
2      8    pattern1      .         TTATTTT
1      6    pattern1      .         TTTATT
1      7    pattern1      .         TTTATTT
1      8    pattern1      .         TTTATTTT
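
As a cross-check on the list above, here is a minimal Python sketch of what
I mean by an exhaustive search. It is not fuzznuc's algorithm; the function
name all_hits and its parameters are made up for illustration, and it
assumes N matches any single base and that the two TT anchors may not
overlap. It tries every start position and every gap length from 0 to 22:

```python
def all_hits(seq, left="TT", right="TT", min_gap=0, max_gap=22):
    """Enumerate every match of left + N(min_gap,max_gap) + right.

    Illustrative brute force only (not how fuzznuc works internally).
    Returns (start, end, text) tuples with 1-based inclusive coordinates.
    """
    hits = []
    for s in range(len(seq)):
        # The match must begin with the left anchor at this position.
        if not seq.startswith(left, s):
            continue
        # Try every allowed gap length; any base is accepted in the gap.
        for gap in range(min_gap, max_gap + 1):
            r = s + len(left) + gap  # 0-based start of the right anchor
            if seq.startswith(right, r):
                end = r + len(right)  # 0-based exclusive == 1-based inclusive
                hits.append((s + 1, end, seq[s:end]))
    return hits


seq = "TTTATTTTCTATGGGCCGGTATCTTG"  # F03
hits = all_hits(seq)
for start, end, text in hits:
    print(start, end, text)
print("HitCount:", len(hits))
```

For F03 this gives 12 hits: the 5 that fuzznuc reports (each is the longest
match at its start position) plus the 7 listed above.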
If I change the pattern to TTN(0,12), some of the missing hits are found,
but not all of them. Where there is a second run of Ts, only the longest
version is reported: TTATTTT, but not TTATT or TTATTT.
Is this a flaw in fuzznuc, or a flaw in my understanding of what fuzznuc
should be doing, i.e. an exhaustive pattern search?
Thanks.
--
Mike Mitchell
Bioinformatics & Biostatistics Service
Cancer Research UK +44 (0) 207 269 3115