Thread: [Mauve-users] DNA ambiguity codes
From: Christine J. <chr...@gm...> - 2016-11-04 15:04:01
Dear mauve-users,

I am trying to align sequences with a lot of ambiguous DNA bases (M, R, W, S, Y, ...) with progressiveMauve, and I was wondering how the tool handles these. For example, is the pair (A, M) considered a match or a mismatch? How about (M, R)? (M = A or C; R = A or G.) Are ambiguous bases even considered when matching, or are they replaced by "N" as in some other alignment tools?

I tried to find the answer in the substitution matrix used, but all references to the HOXD matrix only contain A, T, C, G...

Thanks in advance,
Christine
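As a point of reference for the question above, the Python sketch below spells out the set-overlap reading of IUPAC codes, in which two codes can match when they share at least one possible base. The table follows the standard IUPAC nucleotide ambiguity codes; the `compatible` helper is hypothetical and illustrative only, not taken from progressiveMauve.

```python
# Standard IUPAC nucleotide codes mapped to the set of bases each may represent.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def compatible(x: str, y: str) -> bool:
    """Hypothetical helper: True if the two codes could denote the same base."""
    return bool(IUPAC[x.upper()] & IUPAC[y.upper()])


print(compatible("A", "M"))  # True: M may be A
print(compatible("M", "R"))  # True: both may be A
print(compatible("M", "K"))  # False: {A, C} and {G, T} share no base
```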
From: Aaron D. <aar...@ut...> - 2016-11-07 20:27:07
Hi Christine,

There are two levels of the algorithm to consider when it comes to ambiguities.

The first is alignment anchoring, which uses spaced seeds to find strings of gap-free matches among the input sequences. For this step, the sequences are encoded in a two-bit representation, e.g. 00 for A, 01 for C, etc. Any IUPAC ambiguity code that includes A is collapsed to 00, any remaining ambiguity code that includes C becomes 01, and so on for G and T. This means that, for example, M and S will not match in the two-bit representation even though both could encode a C. However, the anchoring tolerates mismatches at positions dictated by the seed pattern; see the Darling et al. 2006 WABI publication for more details about those seed patterns.

Second, once a set of anchors has been selected, progressiveMauve uses the MUSCLE algorithm to compute the gapped alignment between anchors and subsequently to refine the alignment around anchors. In this stage the sequences, including any IUPAC codes, are passed on to MUSCLE. For details about how MUSCLE handles these characters, I think your best bet is to inquire with Bob Edgar, who should be able to give the authoritative answer.

Best,
-Aaron

--
Aaron E. Darling, Ph.D.
Associate Professor, ithree institute
University of Technology Sydney
Australia
http://darlinglab.org
twitter: @koadman
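The anchoring behaviour described in the reply can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mauve's code: the reply only gives A = 00 and C = 01, so G = 10 and T = 11 are assumed here, and the `collapse`/`seed_hit` helpers and the seed pattern `1101` are hypothetical (the real seed patterns are described in the 2006 WABI paper).

```python
# Minimal sketch of the two-bit collapse described in the reply, NOT Mauve's source.
# IUPAC nucleotide codes mapped to the bases each may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Only A = 00 and C = 01 are stated; G = 10 and T = 11 are assumptions.
TWO_BIT = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def collapse(code: str) -> int:
    """Collapse an IUPAC code to two bits, preferring A, then C, then G, then T."""
    bases = IUPAC.get(code.upper())
    if bases is None:
        raise ValueError(f"not an IUPAC nucleotide code: {code!r}")
    return TWO_BIT[next(b for b in "ACGT" if b in bases)]


def seed_hit(s1: str, s2: str, pattern: str) -> bool:
    """Hypothetical spaced-seed check: collapsed bases must agree at every '1'
    in the pattern; '0' positions are don't-care and may mismatch."""
    if not len(s1) == len(s2) == len(pattern):
        raise ValueError("sequences and pattern must have equal length")
    return all(
        collapse(a) == collapse(b)
        for a, b, p in zip(s1, s2, pattern)
        if p == "1"
    )


print(collapse("M"), collapse("S"))      # 0 1
print(seed_hit("ACMT", "ACST", "1111"))  # False: M (00) vs S (01)
print(seed_hit("ACMT", "ACST", "1101"))  # True: the mismatch falls on a '0'
```

Under this rule A, M and R all collapse to 00, so in this sketch (A, M) and (M, R) agree at a seed position, while M and S do not; the last call shows a don't-care position in the hypothetical pattern absorbing that mismatch.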