#398 SAM format extension for features ...

Next Release
open
nobody
5
2011-02-10
2011-02-10
Jon Ison
No

From the mailing list - request for features in SAM format output (output for fuzznuc)

> It would be nice to actually show the pattern matched in the note field.

or both. What it shows is the "name" of the pattern, which usually
defaults to pattern. We could show the name and then the pattern. I will
add it to the next release.

My motivation was to build BAM tracks showing matches of lots of patterns in the genome sequence.
I hadn't thought about proteins, but I guess you could do something similar.

The SAM file would show the position of each match, one match per line, with the CIGAR string
containing the matched pattern and SEQ (col 10) containing the query pattern expanded to show the
match. The original pattern could be in the OPT field. I see there is a tag for mismatching
positions (MD), which would work for regex-style matches (so good for 'dreg'), but I am not sure
it would be strictly legal for a PROSITE-like pattern.
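To make the proposed layout concrete, here is a minimal Python sketch of one SAM line per match. It is only one possible reading of the request, not what fuzznuc does: CIGAR is taken to be a plain run of M operations covering the match, SEQ is the matched sequence, and the original pattern goes in an optional field. The tag name XP, the function name sam_record, the reference name chr1 and the hand-written regex are all hypothetical.

```python
import re

def sam_record(qname, rname, start0, matched, pattern):
    """Build one SAM line for a pattern match (hypothetical helper).

    start0 is the 0-based start of the match; SAM POS is 1-based.
    """
    fields = [
        qname,                # QNAME: name of the pattern
        "0",                  # FLAG: forward strand, nothing else set
        rname,                # RNAME: reference sequence name
        str(start0 + 1),      # POS: 1-based leftmost position
        "255",                # MAPQ: 255 means "not available"
        f"{len(matched)}M",   # CIGAR: assumed ungapped match of this length
        "*",                  # RNEXT: unset
        "0",                  # PNEXT: unset
        "0",                  # TLEN: unset
        matched,              # SEQ (col 10): the matched sequence
        "*",                  # QUAL: not stored
        f"XP:Z:{pattern}",    # assumed user-defined tag holding the pattern
    ]
    return "\t".join(fields)

# Toy reference; the regex is a hand translation of the example pattern.
reference = "TTCCGGCTGTACGCAA"
hit = re.search(r"[CG]{5}TG[^A][ACGT]{1,5}C", reference)
line = sam_record("pattern1", "chr1", hit.start(), hit.group(),
                  "[CG](5)TG{A}N(1,5)C")
print(line)
```

Tags beginning with X are set aside for local use in the SAM specification, which is why the sketch uses one rather than a predefined tag.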

e.g. for [CG](5)TG{A}N(1,5)C

Could you have

MD:Z:[CG](5)TG{A}N(1,5)C

?

It looks like {,} is not allowed. So perhaps you would have to translate the pattern to a regex or
generate an alternative optional tag. I am not a SAM expert, so apologies if I am proposing to
violate the format rules!
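Both options can be sketched in Python. As far as I can tell from the SAM specification, the MD value must match [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*, so the pattern is not a legal MD value, while a general Z-typed tag accepts any printable characters, braces included. The translation below is an assumption-laden sketch, not the EMBOSS pattern parser: it handles only the constructs in the example, treats N as [ACGT] and ignores other ambiguity codes. The names is_valid_md, is_valid_z and pattern_to_regex are hypothetical.

```python
import re

# Value syntax for the MD tag and for Z-typed tags, as given in the SAM spec.
MD_VALUE = re.compile(r"[0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*")
Z_VALUE = re.compile(r"[ !-~]*")

def is_valid_md(value):
    return MD_VALUE.fullmatch(value) is not None

def is_valid_z(value):
    return Z_VALUE.fullmatch(value) is not None

# Tokens of the pattern syntax used in the example (assumed subset).
TOKEN = re.compile(
    r"\[(?P<cls>[A-Z]+)\]"        # [CG]  any one of these
    r"|\{(?P<neg>[A-Z]+)\}"       # {A}   anything except these
    r"|\((?P<rep>\d+(?:,\d+)?)\)" # (5) or (1,5) repeat count
    r"|(?P<sym>[A-Z])"            # single symbol
)

def pattern_to_regex(pattern):
    """Translate a PROSITE-like nucleotide pattern to a Python regex."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        if m.start() != pos:
            raise ValueError(f"unsupported syntax at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        text = m.group(kind)
        if kind == "cls":
            parts.append(f"[{text}]")
        elif kind == "neg":
            parts.append(f"[^{text}]")
        elif kind == "rep":
            parts.append("{" + text + "}")
        else:
            # Simplification: N means any of the four unambiguous bases.
            parts.append("[ACGT]" if text == "N" else text)
    if pos != len(pattern):
        raise ValueError(f"unsupported syntax at offset {pos}")
    return "".join(parts)

pattern = "[CG](5)TG{A}N(1,5)C"
print(is_valid_md(pattern))       # the pattern as an MD value
print(is_valid_z(pattern))        # the pattern as a generic Z value
print(pattern_to_regex(pattern))  # regex form of the pattern
```

If that reading of the specification is right, the simplest route is to leave MD alone and carry the untranslated pattern in a separate user-defined tag.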

Incidentally, I would use dreg but it doesn't allow mismatches to be easily specified.
